Run the Bench
Reproduce the Parametric CAD Bench locally with Harbor, then publish your run artifacts to Hugging Face and open a submission PR. The task suite itself is free; you pay only your model-API costs.
1. Run the bench
Pull the task images from Harbor Hub and run them against the (agent, model) pair of your choice:
harbor run -d gnucleus-ai/cad-bench@v1 \
  -a <your-agent> \
  -m <your-model>
Each task image bakes in the FreeCAD validator at /opt/grader/, so any local re-grade agrees with the leaderboard scores by construction.
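To spot-check a score before submitting, you can re-run the baked-in validator against your own answer file. A minimal sketch, assuming the task image runs under plain Docker and the validator exposes a CLI entry point; the image tag, entry-point name, and flags below are illustrative assumptions, not a documented interface:

# Mount one trial's artifacts and invoke the baked-in grader on them.
docker run --rm -v "$PWD/runs/my-agent/my-model/task-001:/work" \
  gnucleus-ai/cad-bench-task-001 \
  /opt/grader/validate --answer /work/answer.FCStd --report /work/result.json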
2. Push your run artifacts
Upload your per-trial outputs to a Hugging Face dataset you control. Use the same runs/<agent>/<model>/<task_id>/ layout the internal baseline uses (result.json, answer.FCStd, agent.log, trajectory.jsonl).
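As a sketch, one way to push that tree is the official huggingface-cli; the dataset name your-org/cad-bench-runs is a placeholder for a repo you own:

# Authenticate once, then upload the local runs/ tree into the dataset's runs/ path.
huggingface-cli login
huggingface-cli upload your-org/cad-bench-runs ./runs runs --repo-type dataset
# Copy the resulting commit's OID from the dataset's commit history on the Hub;
# that is the value your submission manifest pins in the next step.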
3. Open a submission PR
Add a manifest YAML under submissions/ in the submission repo. The manifest is a lightweight pointer that names your Hugging Face dataset, the exact commit OID to pin, and your declared summary metrics. A maintainer reviews each PR by hand (schema check, spot-check re-grade with gnucleus-freecad-validator, trajectory sanity check, cost re-derivation from token counts) and then merges. A sketch of a manifest follows.
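For orientation, a hypothetical manifest might look like this; the field names and values are illustrative, and the authoritative schema lives in the submission repo:

# submissions/my-agent_my-model.yaml (illustrative only)
agent: my-agent
model: my-model
dataset: your-org/cad-bench-runs   # the Hugging Face dataset you pushed in step 2
commit_oid: <hf-commit-oid>        # exact dataset commit to pin
metrics:                           # declared; re-derived by the maintainer during review
  pass_rate: 0.00                  # placeholder value
  total_cost_usd: 0.00             # placeholder value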
Links
Harbor task suite
The published task images. Reproducible end-to-end from a single harbor run command — reference geometry, validator, and scorer are all baked in.
https://hub.harborframework.com/datasets/gnucleus-ai/cad-bench/latest

Results dataset (Hugging Face)
Per-trial run artifacts under runs/<agent>/<model>/<task_id>/: result.json, answer.FCStd, agent.log, trajectory.jsonl. The internal baseline lives here; mirror this layout for your own submissions.
https://huggingface.co/datasets/gnucleus-ai/cad-gen-freecad-bench

Submission portal (GitHub)
Open a PR with one manifest YAML under submissions/ pointing at your Hugging Face dataset. A maintainer re-grades it against gnucleus-freecad-validator and merges.
https://github.com/gNucleus-AI/cad-bench-submission