Run the Bench

Reproduce Parametric CAD Bench locally with Harbor, then publish your run artifacts to Hugging Face and open a submission PR. The task suite itself is free; you pay only your model-API costs.

1. Run the bench

Pull the task images from Harbor Hub and run them against the (agent, model) pair of your choice:

harbor run -d gnucleus-ai/cad-bench@v1 \
  -a <your-agent> \
  -m <your-model>

Each task image bakes in the FreeCAD validator at /opt/grader/, so any local re-grade agrees with the leaderboard scores by construction.

2. Push your run artifacts

Upload your per-trial outputs to a Hugging Face dataset you control. Use the same runs/<agent>/<model>/<task_id>/ layout the internal baseline uses (result.json, answer.FCStd, agent.log, trajectory.jsonl).
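Before uploading, it can help to mirror the expected tree locally and confirm the four files are present. The sketch below is illustrative: the agent, model, and task names are placeholders, the result.json fields are assumptions (not a documented schema), and the final upload command assumes you have huggingface-cli installed and are logged in.

```shell
# Build the runs/<agent>/<model>/<task_id>/ layout with placeholder names.
DIR=runs/my-agent/my-model/task-001
mkdir -p "$DIR"

# Each trial directory holds the four per-trial artifacts.
echo '{"task_id": "task-001"}' > "$DIR/result.json"   # fields illustrative only
touch "$DIR/answer.FCStd" "$DIR/agent.log" "$DIR/trajectory.jsonl"

# Then push the whole tree to a dataset repo you control (requires login):
# huggingface-cli upload <your-user>/<your-dataset> runs runs --repo-type dataset
```

Uploading the runs/ directory as a whole keeps the on-Hub layout identical to the local one, which is what the submission review relies on.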

3. Open a submission PR

Add a manifest YAML under submissions/ in the submission repo. The manifest is a lightweight pointer that names your Hugging Face dataset, the exact commit OID to pin, and the declared summary metrics. A maintainer reviews each PR by hand (schema check, spot-check re-grade with gnucleus-freecad-validator, trajectory sanity, cost re-derivation from token counts) and then merges.
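As a sketch, such a manifest could look like the fragment below. Every field name here is an assumption for illustration; the source only specifies the three pieces of information the manifest must carry (dataset, pinned commit OID, declared summary metrics), not the actual schema.

```yaml
# Hypothetical manifest shape -- field names are illustrative, not the repo's schema.
dataset: <your-user>/<your-dataset>   # Hugging Face dataset holding your runs/ tree
commit: <commit-oid>                  # exact dataset commit to pin for review
metrics:                              # declared summary metrics, re-checked by hand
  pass_rate: <fraction-of-tasks-passed>
  total_cost_usd: <cost-derived-from-token-counts>
```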

Links