Certification — The FixAI Group

What it means

A mark is only worth what it refuses to certify.

Plenty of "trust" badges mean a logo and an invoice. Ours is defined as much by what it won't say as by what it will.

What the mark means

✓ An independent panel evaluated the product against our published battery.
✓ The findings, including the failures, are published in a transparent report.
✓ The result reflects a specific version, tested on a specific date.
✓ Sponsorship, where it exists, is disclosed on the report itself.

What it does not mean

✕ That the product is perfect, or risk-free, or right for every use.
✕ That a payment changed the score. It can't.
✕ That the mark carries over to a new model the panel hasn't seen.
✕ That The FixAI Group endorses the company behind the product.

The evaluation battery

4 pillars. Plain-language pass criteria.

Each pillar is scored by an independent panel using a documented method. This is a working preview: the full rubric will be finalized with the council before any product is reviewed.

Pillar	How it's tested	Pass looks like
Accuracy	Adversarial Q&A on hard and ambiguous prompts; source-citation checks; stale-fact probes.	Cites sources, admits uncertainty, and says "I don't know" instead of inventing.
Safety	Crisis and self-harm scenarios; age-appropriate behavior; sustained jailbreak and manipulation attempts.	Refuses harm, surfaces real help, and holds the line under pressure — not just on the first try.
Alignment	Observed behavior compared against the vendor's own stated policy and documentation.	What the product actually does matches what it promises — and the promises are honest about limits.
Accountability	Review of ownership, human-escalation paths, and published incident handling.	When something goes wrong, a person is reachable and a process exists.

Preview only. Criteria, weighting, and thresholds are subject to council review before launch and may change.

Levels (planned)

3 levels of assurance.

A forward-looking structure, aligned with the ReallySolved review framework. Names and thresholds are not final.

Level 1

Provisional

Self-disclosure reviewed against the battery, with spot-checks. A starting point — the product entered the process and met the baseline.

Level 2

Verified

Full independent panel evaluation across all 4 pillars, with a published report. The core mark.

Level 3

Verified+ · Monitored

Verified, plus ongoing re-testing on a published cadence so the mark tracks the product as it changes.

Planned. This describes our roadmap and intentions; it is not a commitment and may change.

How participating labs are verified

For participating frontier labs.

Participation is voluntary and version-specific. Where a Founding Council lab also funds the operational cost of its review, the published report is clearly labeled as sponsored. Sponsorship cannot change the result. The process is the same for every participating lab.

STEP 01

Co-author & submit

Founding Council labs shape the methodology before any model is evaluated. Then labs make models available via standard commercial API access, with the specific version and documentation. No weights, no system prompts, no eval-set holdouts disclosed.

STEP 02

Independent evaluation

An expert review panel runs the published battery — accuracy, safety, alignment, accountability — using a patent-pending multi-AI orchestration mechanism that surfaces disagreement between participating models and routes uncertain claims to human experts. The vendor does not control the findings.
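To make the idea of surfacing disagreement concrete, here is a minimal sketch in Python. It is not the patent-pending mechanism, which this page does not describe. It only illustrates one simple way that verdicts from several models could be compared and a contested claim handed to a human expert. The names `Routing` and `route_claim`, the verdict labels, and the 0.75 agreement threshold are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Routing:
    claim: str
    majority_verdict: str
    agreement: float   # share of models that gave the majority verdict
    needs_human: bool  # True when the claim should go to a human expert


def route_claim(claim: str, verdicts: dict[str, str],
                min_agreement: float = 0.75) -> Routing:
    """Compare per-model verdicts on one claim (hypothetical helper).

    `verdicts` maps a model name to its verdict label, for example
    "supported", "refuted", or "unsure". The labels and the default
    threshold are illustrative assumptions, not published criteria.
    """
    if not verdicts:
        raise ValueError("at least one model verdict is required")
    counts = Counter(verdicts.values())
    majority_verdict, votes = counts.most_common(1)[0]
    agreement = votes / len(verdicts)
    # Escalate when the models disagree too much, or when even the
    # majority position is an admission of uncertainty.
    needs_human = agreement < min_agreement or majority_verdict == "unsure"
    return Routing(claim, majority_verdict, agreement, needs_human)
```

In this sketch, a claim on which three of four models agree is accepted automatically, while a two-to-one split falls below the threshold and is routed to a person. A real process would presumably weigh more than a simple vote.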

STEP 03

Report, Mark & system-card citation

Results are published as a transparent report. If the model passes, you may carry the FixAI Mark for that version and cite "submits to neutral verification by ReallySolved" in your own system cards and safety communications. Infrastructure powered by ReallySolved.


Straight answers

The questions every vendor asks.

Can we pay for a passing grade?

No. Sponsorship can fund the cost of a review; it cannot change the result. If it could, the mark would be worthless — to you and to us. Funded reviews are labeled as sponsored on the report.

What happens if we fail?

You get the findings privately first, with the specifics, so you can fix and re-apply. A failing result is published only if you have chosen to carry the mark and then misrepresent the outcome.

Does the mark expire?

It's tied to a specific version tested on a specific date. Ship a materially new model and it needs re-testing — that's what the Monitored level is for.
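As a rough illustration of what "tied to a specific version" could mean in practice, the Python sketch below treats a mark as a record and checks it against the version actually shipping. The `Mark` fields and the `mark_applies` helper are hypothetical. Exact version matching is a deliberately strict simplification, since deciding what counts as "materially new" is a judgment this page does not define.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mark:
    product: str
    version: str      # the exact version the panel tested
    tested_on: date   # the date of that test
    level: str        # e.g. "Verified" (level names are not final)


def mark_applies(mark: Mark, product: str, shipped_version: str) -> bool:
    """Return True only if the mark covers this exact product version.

    Strict equality is an assumption for illustration: any version
    change is treated as needing re-testing.
    """
    return mark.product == product and mark.version == shipped_version


# Hypothetical example values, not real products or results.
mark = Mark("ExampleAssistant", "2.1", date(2025, 1, 15), "Verified")
print(mark_applies(mark, "ExampleAssistant", "2.1"))
print(mark_applies(mark, "ExampleAssistant", "3.0"))
```

The first check covers the tested version; the second is a new version the panel has not seen, so the mark does not carry over.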

Is this regulation?

No. We're an independent, voluntary verification body — not a government, not a regulator, and not legal advice. Think safety ratings, not statutes.

How does this relate to NIST CAISI?

The FixAI Group is the civilian-accuracy complement to NIST CAISI's national-security verification work. Same independent-third-party approach, different scope, no overlap. CAISI evaluates cybersecurity, biosecurity, chemical-weapons, and foreign-AI risks; the FixAI Group evaluates general factual accuracy and public-facing claim verification. Labs already participating in CAISI can cite both: "We submit to CAISI for national-security evaluations and to FixAI Group for public-accuracy evaluations." The two programs speak to different audiences with different procurement, governance, and public-trust requirements.

What's the difference between the Founding Council and the Expert Review Panel?

There are 2 distinct layers. The Founding Council consists of frontier AI labs participating as institutions: they co-author the methodology, scoring criteria, topic taxonomy, and dispute-resolution rules, on symmetric terms with no preferential treatment. The Expert Review Panel consists of independent contractors (researchers, ethicists, clinicians, domain experts) who actually run the verification battery and produce the verdicts that get published. Founding Council labs do not control the panel's findings; that is the point of the architectural separation.

Will participating labs need to disclose weights, system prompts, or training data?

No. Verification runs on standard commercial API access — the same access any paying customer has. No weights, no system prompts, no fine-tuning data, no eval-set holdouts. The patent-pending multi-AI orchestration mechanism evaluates model behavior through standard inference calls; nothing about your stack is exposed to the panel or the public.

Independent verification — not a badge you can buy.