Pre-launch draft. The FixAI Group — Independent AI Verification Council. Copy under review.
The ReallySolved family — making AI safer and more accurate:
The FixAI Mark

Independent verification — not a badge you can buy.

The FixAI Mark is designed to say one thing, and mean it: an independent expert panel tests an AI product for accuracy, safety, and alignment against a methodology co-authored with Founding Council labs, and publishes what it finds. The vendor doesn't write the test, score the test, or bury the result, and no participating lab gets preferential treatment.

Civilian-accuracy complement to NIST CAISI's national-security verification. Same independent-third-party approach, different scope, no overlap. Patent pending.

What it means

A mark is only worth what it refuses to certify.

Plenty of "trust" badges mean a logo and an invoice. Ours is defined as much by what it won't say as by what it will.

What the mark means

  • An independent panel evaluated the product against our published battery.
  • The findings — including the failures — are published in a transparent report.
  • The result reflects a specific version, tested on a specific date.
  • Sponsorship, where it exists, is disclosed on the report itself.

What it does not mean

  • That the product is perfect, or risk-free, or right for every use.
  • That a payment changed the score. It can't.
  • That the mark carries over to a new model the panel hasn't seen.
  • That The FixAI Group endorses the company behind the product.
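
The version-and-date scoping above can be made concrete with a small sketch. This is an illustration only, not a published FixAI schema: the `MarkRecord` type, its field names, the `mark_applies` helper, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MarkRecord:
    """Hypothetical record of one published result (illustrative field names)."""
    product: str
    version: str      # the exact version the panel tested
    tested_on: date   # the date of the evaluation
    passed: bool      # the panel's finding
    sponsored: bool   # disclosed on the report; never an input to `passed`


def mark_applies(record: MarkRecord, product: str, deployed_version: str) -> bool:
    """The mark covers only the tested product at the tested version."""
    return (
        record.passed
        and record.product == product
        and record.version == deployed_version
    )


# Made-up sample values, for illustration only.
record = MarkRecord("ExampleAssistant", "2.1", date(2025, 1, 15), passed=True, sponsored=True)
print(mark_applies(record, "ExampleAssistant", "2.1"))  # True
print(mark_applies(record, "ExampleAssistant", "3.0"))  # False
```

In this sketch the `sponsored` flag is carried on the record for disclosure but is never read by `mark_applies`, which mirrors the rule that a payment cannot change the score.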

The evaluation battery

Four pillars. Plain-language pass criteria.

Each pillar is scored by an independent panel using a documented method. This is a working preview; the full rubric will be finalized with the council before any product is reviewed.

Pillar: Accuracy
  • How it's tested: Adversarial Q&A on hard and ambiguous prompts; source-citation checks; stale-fact probes.
  • Pass looks like: Cites sources, admits uncertainty, and says "I don't know" instead of inventing.

Pillar: Safety
  • How it's tested: Crisis and self-harm scenarios; age-appropriate behavior; sustained jailbreak and manipulation attempts.
  • Pass looks like: Refuses harm, surfaces real help, and holds the line under pressure, not just on the first try.

Pillar: Alignment
  • How it's tested: Observed behavior compared against the vendor's own stated policy and documentation.
  • Pass looks like: What the product actually does matches what it promises, and the promises are honest about limits.

Pillar: Accountability
  • How it's tested: Review of ownership, human-escalation paths, and published incident handling.
  • Pass looks like: When something goes wrong, a person is reachable and a process exists.

Preview only. Criteria, weighting, and thresholds are subject to council review before launch and may change.

Levels (planned)

Three levels of assurance.

A forward-looking structure, aligned with the ReallySolved review framework. Names and thresholds are not final.

Level 1
Provisional

Self-disclosure reviewed against the battery, with spot-checks. A starting point — the product entered the process and met the baseline.

Level 2
Verified

Full independent panel evaluation across all four pillars, with a published report. The core mark.

Level 3
Verified+ · Monitored

Verified, plus ongoing re-testing on a published cadence so the mark tracks the product as it changes.

Planned. This describes our roadmap and intentions; it is not a commitment and may change.

How participating labs are verified

For participating frontier labs.

Participation is voluntary and version-specific. Where a Founding Council lab also funds the operational cost of its review, the published report is clearly labeled as sponsored. Sponsorship cannot change the result, and the process is the same for every participating lab.

STEP 01
Co-author & submit
Founding Council labs shape the methodology before any model is evaluated. Then labs make models available via standard commercial API access, with the specific version and documentation. No weights, no system prompts, no eval-set holdouts disclosed.
STEP 02
Independent evaluation
An expert review panel runs the published battery — accuracy, safety, alignment, accountability — using a patent-pending multi-AI orchestration mechanism that surfaces disagreement between participating models and routes uncertain claims to human experts. The vendor does not control the findings.
STEP 03
Report, Mark & system-card citation
Results are published as a transparent report. Pass, and you may carry the FixAI Mark for that version and cite "submits to neutral verification by ReallySolved" in your own system cards and safety communications. Infrastructure powered by ReallySolved.
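
The idea in Step 02 of surfacing disagreement between models and routing uncertain claims to human experts can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent-pending mechanism itself, whose details are not described here: the `route_claim` function, the majority-vote agreement measure, the `min_agreement` threshold of 0.75, and the model names are all hypothetical.

```python
from collections import Counter


def route_claim(answers: dict[str, str], min_agreement: float = 0.75) -> dict:
    """Surface disagreement among model answers for one claim.

    `answers` maps a (hypothetical) model name to its normalized answer.
    `min_agreement` is an illustrative threshold, not a published one.
    """
    if not answers:
        raise ValueError("need at least one model answer")
    counts = Counter(answers.values())
    top_answer, top_count = counts.most_common(1)[0]
    agreement = top_count / len(answers)
    # Models whose answer differs from the majority answer.
    dissenting = sorted(m for m, a in answers.items() if a != top_answer)
    return {
        "majority_answer": top_answer,
        "agreement": agreement,
        "dissenting_models": dissenting,
        # Low agreement means the claim goes to a human expert.
        "route_to_human": agreement < min_agreement,
    }


result = route_claim({"model_a": "yes", "model_b": "yes", "model_c": "no"})
print(result["dissenting_models"], result["route_to_human"])  # ['model_c'] True
```

Simple majority voting is only one way to measure agreement; a real battery could weight models differently or compare answers semantically rather than by exact match.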
Founding Council inquiry →

Straight answers

The questions every vendor asks.

Can we pay for a passing grade?
No. Sponsorship can fund the cost of a review; it cannot change the result. If it could, the mark would be worthless — to you and to us. Funded reviews are labeled as sponsored on the report.
What happens if we fail?
You get the findings privately first, with the specifics, so you can fix and re-apply. A failing result is published only if you claim the mark anyway or otherwise misrepresent the outcome.
Does the mark expire?
It's tied to a specific version tested on a specific date. Ship a materially new model and it needs re-testing — that's what the Monitored level is for.
Is this regulation?
No. We're an independent, voluntary verification body — not a government, not a regulator, and not legal advice. Think safety ratings, not statutes.
How does this relate to NIST CAISI?
The FixAI Group is the civilian-accuracy complement to NIST CAISI's national-security verification work. Same independent-third-party approach, different scope, no overlap. CAISI evaluates cybersecurity, biosecurity, chemical-weapons, and foreign-AI risks; the FixAI Group evaluates general factual accuracy and public-facing claim verification. Labs already participating in CAISI can cite both: "We submit to CAISI for national-security evaluations and to FixAI Group for public-accuracy evaluations." The two programs speak to different audiences with different procurement, governance, and public-trust requirements.
What's the difference between the Founding Council and the Expert Review Panel?
Two distinct layers. The Founding Council consists of frontier AI labs acting as institutional participants: they co-author the methodology, scoring criteria, topic taxonomy, and dispute-resolution rules, on symmetric terms and with no preferential treatment. The Expert Review Panel consists of independent contractors (researchers, ethicists, clinicians, domain experts) who actually run the verification battery and produce the verdicts that get published. Founding Council labs do not control the panel's findings; that is the point of the architectural separation.
Will participating labs need to disclose weights, system prompts, or training data?
No. Verification runs on standard commercial API access — the same access any paying customer has. No weights, no system prompts, no fine-tuning data, no eval-set holdouts. The patent-pending multi-AI orchestration mechanism evaluates model behavior through standard inference calls; nothing about your stack is exposed to the panel or the public.

Confident? Then prove it to someone independent.

The companies that ask to be checked first are the ones that have nothing to hide. Be early.

Request verification →