Open Source · AI Agent Infrastructure

The reliability
layer AI agents
depend on.

Nine open-source tools that validate, test, monitor, govern, and verify AI agents in production. Not applications — infrastructure.

"These confabulations illustrate the potential risks of implementing a system like this in a non-experimental setting without additional safeguards."

— Anthropic, Project Deal Research Paper, 2026

We built those safeguards.

pip install iron-thread

or npm install iron-thread · Start with any tool. Each works standalone.


Tools Live

Infrastructure Cost

PyPI + npm Packages

thread-suite — production

$ pip install iron-thread testthread policythread

Successfully installed 3 packages

$ python audit.py

Running Thread Suite audit...

✓ Iron-Thread — structure validated

✓ TestThread — 47/50 tests passing

⚠ PolicyThread — 2 violations detected

→ "No medical diagnoses" [CRITICAL]

→ "Competitor mention" [HIGH]

Attestation chain: verified

Pipeline health: 0.84

Why This Exists

Agents are in production.
Nobody is watching.

AI agents are being deployed into production right now. They are making autonomous decisions, running destructive actions, confabulating details mid-negotiation, and operating outside their defined scope — without anyone noticing. The infrastructure layer that governs, verifies, and audits these agents does not exist yet at scale. The Thread Suite is building it.

"A Claude agent deleted a company's entire production database in 9 seconds. Every backup gone. The AI admitted afterward: it was guessing."

Real incident — May 2026

"Anthropic ran 186 autonomous agent-to-agent deals and documented agents confabulating details mid-negotiation."

Anthropic Research — 2026

"The EU AI Act mandates ongoing behavioral monitoring for high-risk AI systems by August 2026. Fines up to €15M for non-compliance."

Regulatory reality

The Suite

Nine tools. Nine questions.
Complete coverage.

Each tool answers a question the others cannot. Use one or use all — they work standalone and work better together.

Iron-Thread v1.2.0

"Did the AI return the right structure?"

Validates AI outputs against defined schemas before they reach your database. Confidence scoring flags statistically anomalous values that pass schema. Tamper-evident SHA-256 hash chain on every validation run.

pip install iron-thread

npm install iron-thread

Docs ↗ Live API ↗ GitHub ↗
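
As a rough illustration of the two checks this card describes, the sketch below validates an output against a simple type schema, then flags a numeric value that passes the schema but sits far from recent history. It does not use the iron-thread package or its API. The names `validate_structure` and `anomaly_score`, the schema format, the sample data, and the 3.0 threshold are all assumptions made for the example.

```python
from statistics import mean, stdev

# Hypothetical schema format: field name -> required Python type.
SCHEMA = {"customer_id": str, "amount": float}


def validate_structure(output: dict, schema: dict) -> list:
    """Return a list of schema errors; an empty list means the shape is valid."""
    errors = []
    for field, expected in schema.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            errors.append(f"wrong type for {field}")
    return errors


def anomaly_score(value: float, history: list) -> float:
    """Z-score of value against past values; 0.0 if history is too short or flat."""
    if len(history) < 2:
        return 0.0
    spread = stdev(history)
    if spread == 0:
        return 0.0
    return abs(value - mean(history)) / spread


# Made-up sample data: past order amounts and one new AI output.
history = [19.99, 24.50, 21.00, 22.75, 20.10]
output = {"customer_id": "c-102", "amount": 4800.0}

errors = validate_structure(output, SCHEMA)        # shape check
score = anomaly_score(output["amount"], history)   # statistical check
is_anomalous = score > 3.0                         # assumed threshold
```

Here the output has the right shape, so `errors` is empty, yet the amount is far outside the historical range and would be flagged for review rather than written to the database.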

TestThread v0.12.0

"Did the agent do the right thing?"

Behavioral testing framework for AI agents. Define what your agent should do, run it, get pass/fail with AI diagnosis on failures. Adversarial test generation. Continuous drift monitoring.

pip install testthread

npm install testthread

Docs ↗ Live API ↗ GitHub ↗

PromptThread v0.4.0

"Is my prompt the best version of itself?"

Version control and performance tracking for prompts. Full history, rollback, and diff. Run logging with latency, cost, and pass rate. A world drift signal flags when your model's answers about verifiable facts are no longer correct.

pip install promptthread

npm install promptthread

Docs ↗ Live API ↗ GitHub ↗

ChainThread v0.10.0

"Did the handoff between agents succeed?"

Verification and governance for agent-to-agent handoffs. Signed envelopes, contract assertions, confidence decay, PII detection, dead letter queue, and agent reputation layer.

pip install chainthread

npm install chainthread

Docs ↗ Live API ↗ GitHub ↗

PolicyThread v1.3.0

"Is the AI staying within our rules in production?"

Always-on compliance monitoring for every live interaction. Semantic and deterministic evaluation. Cryptographic attestation chain. Audit reports a regulator can actually read.

pip install policythread

npm install policythread

Docs ↗ Live API ↗ GitHub ↗

ThreadWatch v0.6.0

"Is the entire pipeline healthy right now?"

Cross-layer vigilance system. Ingests signals from all suite tools simultaneously, detects anomalies, diagnoses root causes, and correlates internal failures with external provider incidents.

pip install threadwatch

npm install threadwatch

Docs ↗ Live API ↗ GitHub ↗

Behavioral Fingerprint v0.6.0

"Has this agent's behavioral profile changed?"

Captures how your agent behaves at deployment across six dimensions — verbosity, hedging, refusal, confidence, consistency, adherence — and monitors drift over time.

pip install behavioralfingerprint

npm install behavioralfingerprint

Docs ↗ Live API ↗ GitHub ↗
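
One way to picture drift monitoring across the six dimensions named above: store a baseline score per dimension at deployment, score the agent again later, and report the dimensions that moved more than a tolerance. This is a minimal sketch, not the behavioralfingerprint package. The 0 to 1 scoring scale, the `drift_report` function, the sample scores, and the 0.15 threshold are hypothetical.

```python
# The six dimensions come from the card above; everything else is assumed.
DIMENSIONS = ("verbosity", "hedging", "refusal",
              "confidence", "consistency", "adherence")


def drift_report(baseline: dict, current: dict, threshold: float = 0.15) -> dict:
    """Return the dimensions whose score moved more than threshold since baseline."""
    drifted = {}
    for name in DIMENSIONS:
        delta = current[name] - baseline[name]
        if abs(delta) > threshold:
            drifted[name] = round(delta, 3)  # signed: positive means it increased
    return drifted


# Made-up scores on an assumed 0-to-1 scale.
baseline = {"verbosity": 0.40, "hedging": 0.20, "refusal": 0.05,
            "confidence": 0.80, "consistency": 0.90, "adherence": 0.95}
current = {"verbosity": 0.45, "hedging": 0.50, "refusal": 0.05,
           "confidence": 0.60, "consistency": 0.88, "adherence": 0.94}

report = drift_report(baseline, current)
```

With these sample numbers the report contains only hedging (up) and confidence (down), which is the kind of paired shift that often follows a silent model update.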

AgentID v0.5.0

"Who is this agent and can we trust it?"

Cryptographic identity and reputation system for AI agents. Every agent gets a verifiable credential and a track record. Trust is earned, not assumed.

pip install threadagentid

npm install threadagentid

Docs ↗ Live API ↗ GitHub ↗

DriftWatch v0.5.0

"Does my model still know what is true?"

Monitors whether your AI model's knowledge has gone stale against verified real-world facts. Staleness score, decay curve, and domain-specific ground truth anchors.

pip install thread-driftwatch

npm install thread-driftwatch

Docs ↗ Live API ↗ GitHub ↗

The Pipeline

Every tool is standalone.
Together they cover everything.

Prompt enters

→

PromptThread

→

AI output

→

Iron-Thread

→

PolicyThread

→

ChainThread

→

AgentID

→

TestThread

→

Behavioral Fingerprint

→

DriftWatch

→

ThreadWatch

The cryptographic trust layer runs through the entire suite. Iron-Thread hashes every validation run. ChainThread signs every handoff. PolicyThread chains every compliance evaluation. AgentID signs every credential. Tampering with any record is detectable as soon as its hash or signature is re-verified. This is not a feature; it is an architecture.
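
To make "signs every handoff" concrete, here is a minimal sketch of a signed envelope using HMAC-SHA256 from the Python standard library. It is an illustration under stated assumptions, not ChainThread's actual envelope format: the shared key, the field names, and the functions `sign_envelope` and `verify_envelope` are hypothetical, and a real deployment might use asymmetric signatures instead of a shared secret.

```python
import hashlib
import hmac
import json

SHARED_KEY = b"demo-key-not-for-production"  # hypothetical shared secret


def _canonical(payload: dict) -> bytes:
    """Serialize the payload deterministically so both sides hash the same bytes."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()


def sign_envelope(payload: dict, key: bytes) -> dict:
    """Wrap a payload with an HMAC-SHA256 signature over its canonical JSON form."""
    signature = hmac.new(key, _canonical(payload), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_envelope(envelope: dict, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(key, _canonical(envelope["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])


# Agent A hands a task to agent B.
envelope = sign_envelope({"from": "agent-a", "task": "summarize"}, SHARED_KEY)
ok_before = verify_envelope(envelope, SHARED_KEY)

# Any change to the payload after signing invalidates the signature.
envelope["payload"]["task"] = "delete"
ok_after = verify_envelope(envelope, SHARED_KEY)
```

The receiving agent verifies before acting, so a payload altered in transit is rejected instead of executed.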

Get Started

Where do you begin?

I want to validate my agent's outputs

Start with Iron-Thread. Install in 30 seconds, send your first validation in 5 minutes. Define a schema, submit your AI's output, get a pass/fail with confidence score and tamper-evident log.

pip install iron-thread

Read the docs →

I want to test my agent's behavior

Start with TestThread. Define what your agent should do, run it, see exactly what passes and what fails. AI diagnosis explains failures. Adversarial generation finds the breaking points.

pip install testthread

Read the docs →

I need compliance monitoring in production

Start with PolicyThread. Define your rules in plain English, watch every live interaction automatically. Semantic evaluation catches meaning-based violations. Cryptographic audit trail for regulators.

pip install policythread

Read the docs →

I want to understand the suite before picking a tool

Read the suite story first. Five minutes. The architecture, the reasoning, the cryptographic trust layer, why nine tools and not one — all of it is there. Then you will know exactly which tool your system needs most.

Read the story →

Learn

Use the suite effectively.

The Thread Suite enforces your thinking. Define a poor schema and Iron-Thread enforces poor validation. Write shallow test cases and TestThread enforces shallow testing. The tools are only as powerful as the thinking you put into them.

PATTERN 01

Stack your match types

Contains tells you if the agent spoke the right language. Semantic tells you if it said the right thing. Use both on the same input and you get a diagnostic layer, not just a pass/fail. Contains passes, semantic fails — the word is there but the meaning is not. That is a different problem than both failing. Different problems need different fixes.
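
The stacking idea can be sketched as a function that runs both match types on the same response and returns one of four outcomes instead of a single pass/fail. This is not TestThread's API. The function names are hypothetical, and `toy_semantic_match` is a deliberately crude word-overlap stand-in for a real semantic check, which would normally use an embedding model or an LLM judge.

```python
def contains_match(response: str, keyword: str) -> bool:
    """Deterministic check: the keyword appears in the response."""
    return keyword.lower() in response.lower()


def toy_semantic_match(response: str, expected: str, threshold: float = 0.5) -> bool:
    """Crude stand-in for a semantic check: word-overlap (Jaccard) similarity."""
    a, b = set(response.lower().split()), set(expected.lower().split())
    if not a or not b:
        return False
    return len(a & b) / len(a | b) >= threshold


def diagnose(response: str, keyword: str, expected: str) -> str:
    """Combine both match types into one of four diagnostic outcomes."""
    has_word = contains_match(response, keyword)
    has_meaning = toy_semantic_match(response, expected)
    if has_word and has_meaning:
        return "pass"
    if has_word:
        return "word present, meaning wrong"
    if has_meaning:
        return "meaning right, wording unexpected"
    return "fail"


EXPECTED = "your refund has been approved"
r1 = diagnose("your refund has been approved", "refund", EXPECTED)
r2 = diagnose("we cannot offer a refund", "refund", EXPECTED)
r3 = diagnose("your reimbursement has been approved", "refund", EXPECTED)
r4 = diagnose("hello there", "refund", EXPECTED)
```

The second and third cases are the useful ones: each would be a plain failure under a single match type, but together they point to different fixes (the agent's answer versus the test's wording).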

PATTERN 02

Read the gap between deterministic and semantic

When deterministic fails and semantic passes — pay attention. Your agent found a correct answer your rules did not anticipate. That gap is telling you something about the limits of your own thinking. Update your rules. Your suite just taught you something about your system that you did not know before.

PATTERN 03

Why you need more than one tool

Iron-Thread tells you the shape is right. TestThread tells you the answer is right. You need both because a perfectly structured wrong answer is still wrong. Every tool answers a question the others cannot. Reliability is not one question — it is nine. Treat them as one system.

PATTERN 04

The pipeline is a signal chain

ThreadWatch was not designed top-down. It was discovered by asking: what happens when ChainThread fails and TestThread has no explanation? The answer was a tool that watches everything simultaneously. Every tool in this suite exists because a real gap was found and filled. Nothing is arbitrary.

PATTERN 05

The audit trail is proof, not a log

PolicyThread's attestation chain uses SHA-256 hash chaining. Every evaluation record incorporates the hash of the previous record. Tampering with any record changes its hash, so that record and every link after it fail verification. This is not logging; it is cryptographic proof. Hand it to a regulator. They can verify it independently.
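
The chaining described above can be sketched in a few lines with Python's `hashlib`. This is a generic hash-chain illustration, not PolicyThread's actual record format: the all-zero genesis hash, the record fields, and the function names are assumptions.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def record_hash(record: dict, prev_hash: str) -> str:
    """SHA-256 over the previous hash plus the record's canonical JSON."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def build_chain(records: list) -> list:
    """Link each record to its predecessor by embedding the previous hash."""
    chain, prev = [], GENESIS
    for record in records:
        digest = record_hash(record, prev)
        chain.append({"record": record, "prev_hash": prev, "hash": digest})
        prev = digest
    return chain


def first_broken_link(chain: list) -> Optional[int]:
    """Return the index of the first entry that fails verification, or None."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        recomputed = record_hash(entry["record"], prev)
        if entry["prev_hash"] != prev or recomputed != entry["hash"]:
            return i
        prev = entry["hash"]
    return None


chain = build_chain([
    {"id": 1, "verdict": "pass"},
    {"id": 2, "verdict": "violation"},
    {"id": 3, "verdict": "pass"},
])
intact = first_broken_link(chain)

# Quietly rewriting a past verdict is caught when the chain is re-verified.
chain[1]["record"]["verdict"] = "pass"
broken_at = first_broken_link(chain)
```

An auditor needs only the records and the hash function to repeat this check, which is what makes independent verification possible.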

The Story

Built at the layer everyone
else is ignoring.

Everyone is building AI applications. The gap, the same one that existed in the early internet, is in what AI needs to be industrial-grade. Not smarter. Not faster. More reliable, safer, more trusted, more steerable, more accountable. The Thread Suite is that layer. Nine tools. One person. Accra, Ghana. Celeron processor. 4GB RAM. Borrowed data some days. $0 infrastructure cost. Every tool live. Every API responding. Organic downloads with zero marketing. Research submitted to Anthropic. If it runs here, it runs anywhere.

March '26

ICA submitted to Anthropic

Total infrastructure cost

Tools live in production

The Research Arm — ICA

The Infinite Conversation Architecture (ICA) is a five-component memory framework that makes AI conversation genuinely continuous without growing the context window. The novel contribution is retrieval that begins while the user is still typing, so it adds no waiting time once the message is sent. The work also describes a Distributed Memory Verification Protocol for cryptographic attestation of AI memory across agent instances. Published open-source, submitted to Anthropic.

View on GitHub ↗

Keep the suite running
and growing.

Questions, suggestions,
complaints — all welcome.

General Contact

Bug Reports & Suggestions

Developer Questions

Coming Soon — Ask the Suite
