whitepaper · AI-first engineering simulation

The Anvil Sim technical thesis.

Anvil Sim is a bet that deep research, reusable playbooks, native evidence, and benchmark-driven solver development can turn serious computational physics into usable engineering software.

Operating model

deep research -> specs -> code -> benchmarks

Current evidence

17 v1-ready packs, 9 verified, 32 validation steps

Product surface

native Mac evidence console

North star

validated engineering physics across more domains

1. The real thesis

Anvil Sim should become AI-first engineering simulation software: FEA, CFD, thermal, modal, geometry, meshing, optimization, and multiphysics workflows that are built from first principles, validated against benchmarks, and made inspectable in a native Mac product.

AI works around the solver: it reads literature, turns research into architecture, proposes testable setups, generates code, critiques assumptions, finds the next benchmarks, writes reviewer packets, and helps the team move faster while preserving validation evidence.

The edge is a technical loop for turning frontier reasoning into validated physics capability.

2. Why now

Classical simulation platforms have decades of solver depth. Anvil can start narrower and move differently: every demo becomes a capability decomposition, every validation focus becomes a build slice, and every benchmark closure makes the next visual scene more credible.

The practical wedge is still bounded engineering decision support. The long-horizon ambition is state-of-the-art computational physics across expanding domains. The bridge between those two is architecture, tests, reference data, convergence, and clear validation notes.

3. Operating model

Deep research becomes build direction

Research is useful when it produces architecture implications, benchmark targets, first tests, validation notes, and a build slice that can ship.

Reusable playbooks preserve execution intelligence

Repeated workflows become playbooks: CFD benchmark cycle, geometry intake, native evidence slice, evidence-packet cycle, review, verification, and release.

Knowledge systems keep context alive

Anvil needs a living knowledge base for CFD, FEA, meshing, numerical methods, Mac-native product design, customer workflows, and competitive references so research turns into better build choices.

Orchestration keeps work reviewable

Bounded product lanes converge through tests, benchmark evidence, visual review, and explicit validation steps.

This is why Anvil needs an execution harness: it keeps the product from fragmenting into scattered demos and stale documents. The ideal loop is simple:

deep research
  -> architecture decision
  -> build-slice spec
  -> test or benchmark
  -> solver/native/product code
  -> evidence packet and visual evidence
  -> retrospective
  -> skill update
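
The loop above can be sketched as an ordered set of stages with a gate between them. This is a minimal illustration in Python, not Anvil's actual harness. The `Stage` enum, the `advance` helper, the rule that a stage must leave a recorded artifact before the next one starts, and the wrap-around from skill update back to deep research are all assumptions made for the example.

```python
from enum import Enum


class Stage(Enum):
    # Stage names follow the loop listed above; the enum itself is hypothetical.
    DEEP_RESEARCH = "deep research"
    ARCHITECTURE_DECISION = "architecture decision"
    BUILD_SLICE_SPEC = "build-slice spec"
    TEST_OR_BENCHMARK = "test or benchmark"
    CODE = "solver/native/product code"
    EVIDENCE = "evidence packet and visual evidence"
    RETROSPECTIVE = "retrospective"
    SKILL_UPDATE = "skill update"


# Enum members iterate in definition order, so this is the loop order.
ORDER = list(Stage)


def advance(stage: Stage, artifacts: dict[Stage, str]) -> Stage:
    """Return the next stage, but only if the current one left an artifact.

    Assumption: each stage records a reference (a path, a note, a test id)
    in `artifacts`. The last stage wraps around to the first.
    """
    if not artifacts.get(stage):
        raise ValueError(f"stage '{stage.value}' has no recorded artifact")
    return ORDER[(ORDER.index(stage) + 1) % len(ORDER)]


# Usage: a benchmark has been written, so code is allowed to start.
artifacts = {Stage.TEST_OR_BENCHMARK: "benchmarks/example-case.md"}  # hypothetical path
next_stage = advance(Stage.TEST_OR_BENCHMARK, artifacts)
```

Placing the test or benchmark stage ahead of code in a single ordered list is what makes the loop reviewable in this sketch: a slice cannot reach the code stage unless a benchmark reference exists.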

4. Product architecture

Anvil is built as a layered system so capability compounds instead of splitting into disconnected scenes.

Layer	What it must prove
Geometry and meshing	CAD intake, units, source-face provenance, mesh quality, mesh convergence, and source-bound loads.
Solver kernels	Rust-owned computed evidence for structural, modal, thermal, CFD, and eventually coupled workflows.
Benchmark gauntlet	Analytic checks, manufactured solutions, NAFEMS-style structural cases, CFD reference rungs, convergence, and reference-solver comparisons.
Native evidence	Swift and Metal surfaces that show geometry, mesh, solver fields, benchmark deltas, provenance, validation steps, and visual replay.
AI workflow	AI proposes setups, critiques meshes, reads papers, drafts specs, and diagnoses solver behavior while benchmark evidence carries public claims.
Evidence packets	A replayable packet that binds claim, input, solver, benchmark, visual replay, reviewer packet, and validation notes.
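
One way to picture the evidence packet in the last row is as an immutable record with a content digest, so that a replay can confirm it is looking at the same claim, input, and solver. The sketch below is illustrative only. The `EvidencePacket` class, its field names, the sample values, and the use of SHA-256 over a JSON serialization are assumptions, not Anvil's packet format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvidencePacket:
    # Fields mirror the items the table says a packet binds together.
    claim: str
    input_ref: str
    solver: str
    benchmark: str
    visual_replay: str
    reviewer_packet: str
    validation_notes: tuple[str, ...] = ()

    def digest(self) -> str:
        """Deterministic SHA-256 over the packet contents (keys sorted)."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, in declaration order."""
        return [name for name, value in asdict(self).items() if not value]


# Hypothetical packet: every value here is a placeholder for illustration.
packet = EvidencePacket(
    claim="tip deflection matches the analytic reference",
    input_ref="inputs/example-beam.json",
    solver="structural-kernel@example",
    benchmark="analytic-cantilever",
    visual_replay="",
    reviewer_packet="reviews/example.md",
)
open_items = packet.missing_fields()
```

In this sketch an empty `visual_replay` or an empty `validation_notes` tuple shows up in `open_items`, which is one simple way a packet could report what is still needed before a claim is shown publicly.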

5. Visuals still matter

A benchmark table alone cannot carry the whole product story. Visuals are the front door: boomerang flight paths, flame motion, Saturn V subclaims, robot dynamics, thermal fields, modal shapes, and pressure maps make the product legible. But every visual must click through to the evidence behind it.

The Boomerang demo is a good example of the desired product language. The flight-path replay is memorable. The analysis panel shows the solver gate. The validation note explains how measured-flight or aerodynamic correlation can expand the evidence.

6. Benchmarks are the roadmap

Evidence packets are executable roadmaps. A validation packet should tell us exactly what to build next: source-face binding, mesh convergence, reference solver comparison, CFD conservation, thermal energy balance, modal correlation, or native evidence routing.

Current evidence inventory: 17 v1-ready packs (9 verified evidence packs and 8 validation-track packs) and 32 tracked validation steps. The job is to turn those validation steps into code, tests, benchmark closures, and stronger evidence packets.
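
As a hedged illustration of a single validation step closing as an executable test, the sketch below combines an analytic check with a thermal energy balance. It is a toy stand-in, not an Anvil solver: steady 1D conduction with a uniform heat source and both ends held at zero temperature, solved by central differences. All function names, parameters, and tolerances are assumptions. Because the exact solution is quadratic, central differences reproduce it at the nodes up to round-off, so this check exercises assembly and solve plumbing rather than discretization error.

```python
def solve_heated_bar(n, length=1.0, k=1.0, q=1.0):
    """Finite-difference solve of -k*T'' = q on [0, length], T = 0 at both ends.

    n is the number of equal intervals (n >= 3); returns n + 1 nodal values.
    """
    h = length / n
    m = n - 1                      # number of interior unknowns
    a, b, c = -1.0, 2.0, -1.0      # sub-, main, and super-diagonal entries
    d = q * h * h / k              # right-hand side at every interior node
    cp, dp = [0.0] * m, [0.0] * m  # Thomas-algorithm work arrays
    cp[0], dp[0] = c / b, d / b
    for i in range(1, m):          # forward elimination
        denom = b - a * cp[i - 1]
        cp[i] = c / denom
        dp[i] = (d - a * dp[i - 1]) / denom
    interior = [0.0] * m
    interior[-1] = dp[-1]
    for i in range(m - 2, -1, -1):  # back substitution
        interior[i] = dp[i] - cp[i] * interior[i + 1]
    return [0.0] + interior + [0.0]


def max_analytic_error(temps, length=1.0, k=1.0, q=1.0):
    """Largest nodal difference from the exact T(x) = q*x*(length - x)/(2k)."""
    h = length / (len(temps) - 1)
    return max(
        abs(t - q * (i * h) * (length - i * h) / (2.0 * k))
        for i, t in enumerate(temps)
    )


def energy_imbalance(temps, length=1.0, k=1.0, q=1.0):
    """Relative mismatch between heat generated and heat conducted out.

    Quantities are per unit cross-sectional area.
    """
    h = length / (len(temps) - 1)
    # Second-order one-sided differences for the boundary gradients.
    grad_left = (-3.0 * temps[0] + 4.0 * temps[1] - temps[2]) / (2.0 * h)
    grad_right = (3.0 * temps[-1] - 4.0 * temps[-2] + temps[-3]) / (2.0 * h)
    heat_out = k * grad_left - k * grad_right
    heat_generated = q * length
    return abs(heat_out - heat_generated) / heat_generated


temps = solve_heated_bar(n=20)
closure = {
    "analytic_error": max_analytic_error(temps),
    "energy_imbalance": energy_imbalance(temps),
}
# Illustrative tolerances; a real closure would justify its own thresholds.
closure["passed"] = (
    closure["analytic_error"] < 1e-10 and closure["energy_imbalance"] < 1e-10
)
```

The `closure` dictionary is the kind of small, machine-checkable result that could feed a benchmark row: two numbers, one pass flag, and a test that fails loudly if either drifts.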

7. Pushing toward theoretical limits

"Pushing toward theoretical limits" means that each narrow workflow should be pushed to the limit of what can be known from equations, discretization, reference solvers, analytic checks, experiments, and uncertainty accounting. When Anvil claims a result, it should be clear which limit was approached and which expansion path comes next.

This is why CFD matters so much: it is visually compelling, and it demands careful validation. The Anvil path runs from conservation and mesh sensitivity through simple flows, canonical public references, and same-input comparisons, and only then to flagship compositions.
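
Mesh sensitivity is often quantified with a three-grid study: compute the same quantity on a coarse, medium, and fine grid with a constant refinement ratio r, estimate the observed order of accuracy as p = ln((f_coarse - f_medium) / (f_medium - f_fine)) / ln(r), and use Richardson extrapolation to estimate the grid-independent value. The sketch below shows the arithmetic under loud assumptions: the "solver output" is a stand-in (a trapezoid-rule integral of sin(x) on [0, pi], whose exact value is 2), and the function names are hypothetical rather than part of any Anvil API.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal cells; stands in for a solver run."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return h * total


def observed_order(coarse, medium, fine, ratio):
    """Observed order of accuracy from three grids refined by a constant ratio."""
    num = coarse - medium
    den = medium - fine
    if den == 0.0 or num / den <= 0.0:
        # The formula only applies to monotone convergence.
        raise ValueError("solutions are not converging monotonically")
    return math.log(num / den) / math.log(ratio)


def richardson_extrapolate(medium, fine, ratio, order):
    """Estimate of the grid-independent value from the two finest grids."""
    return fine + (fine - medium) / (ratio ** order - 1.0)


# Three "meshes" with 16, 32, and 64 cells, so the refinement ratio is 2.
coarse, medium, fine = (trapezoid(math.sin, 0.0, math.pi, n) for n in (16, 32, 64))
p = observed_order(coarse, medium, fine, ratio=2.0)
estimate = richardson_extrapolate(medium, fine, ratio=2.0, order=p)
```

For this stand-in the observed order comes out close to the trapezoid rule's formal order of 2, and the extrapolated estimate is closer to the exact value than the fine-grid result. In a real study an observed order far from the scheme's formal order would be a signal that the grids are not yet in the asymptotic range.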

8. Validation principles

Engineering decision support first; certification workflows require dedicated evidence packages.
Narrow benchmark closures compound toward broader solver coverage.
AI accelerates setup, critique, and review while solvers and validation carry the claim.
Visual demos route into benchmark rows, evidence packets, and reviewer surfaces.
Flagship scenes become proof by decomposing into smaller subclaims: Saturn V, flame, boomerang, CFD, FEA, and thermal.

9. The product shape

Deep research supplies expert context. Reusable playbooks preserve successful workflows. The knowledge system keeps literature, decisions, benchmarks, and lessons searchable. The native product shows the evidence.

The product loop reinforces itself. Better research produces better solver work. Better solver work produces better evidence packets. Better evidence packets produce better native evidence. Better reviews make the next build loop faster.

10. Closing

Anvil Sim is a system for making engineering judgment sharper: more research-aware, more testable, more visual, more reproducible, and clearer about what the evidence supports next.

The near-term product is evidence-routed simulation for narrow workflows. The long-term ambition is AI-first engineering simulation pushed as close as possible to the edge of what software, solvers, literature, and validation can prove.

Whitepaper updated May 13, 2026. This page states the company thesis and operating model. It keeps public claims tied to the current benchmark, provenance, and validation-focus evidence.