Documentation
Trust Agent whitepaper
Audit-first. Provenance-aware. Sovereign by design. Why an honest agent marketplace needs more than a star rating, and how Trust Agent enforces it.
1. The trust deficit
Agent marketplaces today rely on the seller's self-description. There is no standard way to verify that an agent does what it claims, that its prompt is free of hidden instructions, or that it will refuse the requests it should refuse. The buyer has to trust the seller's word — or run their own evaluation, which most cannot.
Trust Agent's thesis: the trust gap is not a UX problem to be solved by better ratings or richer reviews. It is an audit problem. The marketplace itself must produce machine-checkable evidence that each listing matches its description, signed by an identifiable auditor, and re-checked on every version.
We treat AI listings the way an accounting firm treats a public company: independent review, signed reports, version history, and a public registry the buyer can verify without having to trust either party.
2. What Trust Agent is
Trust Agent is a marketplace and audit layer for AI roles, skills, and agents. Three distinct entities live in the system:
- Role — a complete persona-shaped AI helper for a specific job (Finance Advisor, HR Coach, Tutor). The unit most non-technical buyers hire.
- Skill — an optional capability that plugs into a Role (e.g. Financial Modelling attached to a Finance Advisor). Audited independently.
- Agent — a self-contained executable AI module published by developers and consumed via the API. The unit technical buyers integrate into their own apps.
All three flow through the same audit pipeline. The output of an audit is a public, signed certificate at /audits/{id} that references the exact artefactHash that was reviewed.
3. The audit pipeline
Every Role, Skill, and Agent submitted runs through a four-stage pipeline (configuration integrity, behaviour testing, a 70-check security scan, and human review) before it can carry a badge.
3.1 Stage 1 — configuration integrity
Static analysis of the manifest, declared permissions, metadata, license, and dependency footprint. Catches missing capability declarations, undeclared network access, and permission drift between the listing's claims and its actual configuration.
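Permission drift can be sketched as a set comparison between what the listing declares and what its configuration actually exercises. The permission strings and field names below are illustrative, not the real manifest schema:

```python
def permission_drift(declared: set[str], observed: set[str]) -> dict[str, set[str]]:
    """Compare a listing's declared permissions against those found in its
    actual configuration. Field names here are hypothetical examples."""
    return {
        "undeclared": observed - declared,  # exercised but never declared, e.g. hidden network access
        "unused": declared - observed,      # declared but never exercised
    }

drift = permission_drift(
    declared={"fs:read", "net:api.example.com"},
    observed={"fs:read", "net:api.example.com", "net:exfil.example.net"},
)
# a non-empty "undeclared" set is exactly the drift stage 1 flags
```

Anything in `undeclared` fails the integrity check; `unused` entries are lower-severity findings about over-broad claims.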
3.2 Stage 2 — behaviour testing
Observed responses are checked against the declared behaviour spec. Refusal behaviour, escalation paths, prompt-injection resistance, and consistency between persona name and observed tone are all sampled.
3.3 Security scan — 70 SC checks
SC-001 through SC-070 fall into four categories, each carrying a different weight in the blended securityScore:
- Security (50%) — malware markers, secrets, privilege escalation, CVE / supply-chain, network exfiltration, sandbox escape, LLM-judge attack-surface checks, code audit.
- Safety (25%) — content safety, refusal coverage, persona drift, boundary enforcement.
- Compliance (15%) — GDPR, age-gating, sensitive-domain handling, prohibited claims, privacy declarations.
- Behaviour (10%) — prompt injection, indirect injection, jailbreak resistance, claim-vs-behaviour drift.
Eight checks are LLM-judged: hidden instructions, manipulation, unsafe automation, consistency, indirect injection, role confusion, prompt extraction, hypothetical bypass. The remaining 62 are deterministic regex / structural detectors.
3.4 Stage 3 — human review
An approved auditor claims the audit, reads the automated findings, edits the public narrative, sets a final human-review score, and signs. Only then does the listing publish with its badge. See the Auditor Code of Conduct for independence and confidentiality requirements.
4. The trust score formula
The final 0-100 score is a weighted blend:
finalScore = 0.18 · stage1
+ 0.18 · stage2
+ 0.20 · stage3 (replaced by humanReviewScore on sign-off)
+ 0.06 · communityScore
+ 0.38 · securityScore (severity-weighted, sub-blended)
× 0.7 if any critical-severity SC check failed
+ 5 bonus when an approved auditor has signed

A single critical security failure triggers a multiplicative 0.7× penalty so it cannot be averaged away by strong stage1/2 scores. Auditor sign-off adds a small confidence bonus (capped at 100) that reflects human verification of the automated findings.
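The blend can be expressed directly. This is a sketch of the published formula; the ordering (penalty applied to the weighted sum, bonus added last, result clamped to 100) is assumed from the surrounding notes:

```python
def final_score(stage1: float, stage2: float, stage3: float,
                community: float, security: float,
                critical_failed: bool = False,
                auditor_signed: bool = False) -> float:
    """All inputs are 0-100. stage3 is replaced by the humanReviewScore
    once an auditor signs off."""
    score = (0.18 * stage1 + 0.18 * stage2 + 0.20 * stage3
             + 0.06 * community + 0.38 * security)
    if critical_failed:
        score *= 0.7   # a critical SC failure cannot be averaged away
    if auditor_signed:
        score += 5     # auditor confidence bonus
    return min(score, 100.0)

final_score(90, 85, 88, 75, 92, auditor_signed=True)
# weighted sum ≈ 88.56, plus sign-off bonus ≈ 93.6
```

Note the multiplicative penalty bites hardest near the top: a 92 with one critical finding drops to roughly 64, out of badge range entirely.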
4.1 Badge thresholds
- Platinum 92-100 — zero critical findings, recommended for regulated-sector deployment.
- Gold 84-91 — no high-severity findings, suitable for most professional use.
- Silver 74-83 — minor findings disclosed with remediation plan.
- Bronze 62-73 — meets the audit bar with documented limitations; low-stakes use only.
- Advisory below 62 — below the audit bar; not recommended for production.
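The tiers above map to a straightforward threshold ladder. The finding-count gates for Platinum and Gold are assumptions inferred from the threshold notes, not a documented API:

```python
def badge(score: float, critical_findings: int = 0, high_findings: int = 0) -> str:
    """Map a final trust score to a badge tier. The critical/high gates
    on Platinum and Gold are assumed from the tier descriptions."""
    if score >= 92 and critical_findings == 0:
        return "Platinum"
    if score >= 84 and high_findings == 0:
        return "Gold"
    if score >= 74:
        return "Silver"
    if score >= 62:
        return "Bronze"
    return "Advisory"
```

Because the gates are checked top-down, a high score with outstanding high-severity findings falls through to Silver rather than claiming Gold.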
5. Sovereignty
Trust Agent is operated by AgentCore LTD from infrastructure in the United Kingdom. We do not use Neon, AWS, or other US-headquartered hyperscaler database / compute services for production. Database, file storage, and inference all run on infrastructure we operate inside the UK AI Growth Zone.
The Brain — the persistent memory each user's companion builds across sessions — is encrypted client-side (AES-256-GCM) and stored on the user's own cloud (Google Drive, iCloud, OneDrive). Trust Agent cannot read its contents, and the file remains the user's property even if the subscription lapses.
6. Provenance and versioning
Every listing version is content-addressed by its SHA-256 artefact hash. The audit certificate at /audits/{id} embeds that hash, so the buyer can verify that the listing they hire is the same version that was audited.
Substantive changes to the prompt, behaviour spec, or capabilities trigger a new version. The new version must pass audit before it carries the badge. Until then, the previous audited version remains in service. Audit history is preserved per version, so version-2 buyers can compare what changed between v1 and v2 without guessing.
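The buyer-side check described above reduces to recomputing the SHA-256 digest and comparing it to the hash embedded in the certificate. The function names are illustrative; only the content-addressing scheme is from the whitepaper:

```python
import hashlib

def artefact_hash(artefact_bytes: bytes) -> str:
    """Content-address a listing version by its SHA-256 digest."""
    return hashlib.sha256(artefact_bytes).hexdigest()

def matches_certificate(artefact_bytes: bytes, certified_hash: str) -> bool:
    """Verify that the listing being hired is byte-identical to the version
    that was audited. certified_hash would come from /audits/{id}."""
    return artefact_hash(artefact_bytes) == certified_hash
```

Any change to the prompt or behaviour spec, however small, yields a different digest, which is why substantive edits force a new version and a fresh audit.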
7. Liability
Trust Agent is the audit and trust layer, not the author or operator of any listing. The badge attests to the rigour of the audit applied to a specific version — it does not warrant the truthfulness, accuracy, currency, or fitness-for-purpose of any output the listing produces. Liability for output rests with the Creator who published the listing and the Buyer who configured and used it. See the Terms of Service Section 7 (Limitation of liability) for the full carve-out.
8. Roadmap
- On-chain audit registry — every signed audit certificate published as an NFT-style attestation so third parties can verify rigour without trusting Trust Agent's own database.
- Sector-specific audit profiles — additional check sets for medical, legal, financial, and child-safeguarding listings on top of the SC-001 to SC-070 base.
- Re-audit on dependency change — when a Role's upstream Skill or LLM provider updates, the Role's audit decays until re-verified.
- Auditor reputation — public per-auditor track record (audits signed, override rate, post-audit incident rate) so buyers can weigh signatures against each auditor's history.
9. Reading list
For the financial model, see the Financial paper. For the live audit-pipeline implementation, browse /security. For the company and team, see About. For the legal posture, see Terms of Service and the AI Output Disclaimer.