Run the Bench

Reproduce Parametric CAD Bench locally with Harbor, then publish your run artifacts to Hugging Face and open a submission PR. The task suite itself is free; you pay only your model-API costs.

1. Run the bench

Pull the task images from Harbor Hub and run them against the (agent, model) pair of your choice:

harbor run -d gnucleus-ai/cad-bench@v1 \
  -a <your-agent> \
  -m <your-model>

Each task image bakes in the FreeCAD validator at /opt/grader/, so any local re-grade agrees with the leaderboard scores by construction.

2. Push your run artifacts

Upload your per-trial outputs to a Hugging Face dataset you control. Use the same runs/<agent>/<model>/<task_id>/ layout the internal baseline uses (result.json, answer.FCStd, agent.log, trajectory.jsonl).
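Before uploading, it can help to mirror the expected tree locally and confirm the four files are present. The sketch below is illustrative: the agent, model, and task names are placeholders, the result.json fields are assumptions (not a documented schema), and the final upload command assumes you have huggingface-cli installed and are logged in.

```shell
# Build the runs/<agent>/<model>/<task_id>/ layout with placeholder names.
DIR=runs/my-agent/my-model/task-001
mkdir -p "$DIR"

# Each trial directory holds the four per-trial artifacts.
echo '{"task_id": "task-001"}' > "$DIR/result.json"   # fields illustrative only
touch "$DIR/answer.FCStd" "$DIR/agent.log" "$DIR/trajectory.jsonl"

# Then push the whole tree to a dataset repo you control (requires login):
# huggingface-cli upload <your-user>/<your-dataset> runs runs --repo-type dataset
```

Uploading the runs/ directory as a whole keeps the on-Hub layout identical to the local one, which is what the submission review relies on.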

3. Open a submission PR

Add a manifest YAML under submissions/ in the submission repo. The manifest is a lightweight pointer that names your Hugging Face dataset, the exact commit OID to pin, and the declared summary metrics. A maintainer reviews each PR by hand (schema check, spot-check re-grade with gnucleus-freecad-validator, trajectory sanity, cost re-derivation from token counts) and then merges.
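As a sketch, such a manifest could look like the fragment below. Every field name here is an assumption for illustration; the source only specifies the three pieces of information the manifest must carry (dataset, pinned commit OID, declared summary metrics), not the actual schema.

```yaml
# Hypothetical manifest shape -- field names are illustrative, not the repo's schema.
dataset: <your-user>/<your-dataset>   # Hugging Face dataset holding your runs/ tree
commit: <commit-oid>                  # exact dataset commit to pin for review
metrics:                              # declared summary metrics, re-checked by hand
  pass_rate: <fraction-of-tasks-passed>
  total_cost_usd: <cost-derived-from-token-counts>
```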

Links