Open Source · AI Agent Infrastructure

The reliability layer AI agents depend on.

Nine open-source tools that validate, test, monitor, govern, and verify AI agents in production. Not applications — infrastructure.

"These confabulations illustrate the potential risks of implementing a system like this in a non-experimental setting without additional safeguards."
— Anthropic, Project Deal Research Paper, 2026
We built those safeguards.
pip install iron-thread

or npm install iron-thread · Start with any tool. Each works standalone.

9 tools live · $0 infrastructure cost · 18 PyPI + npm packages
thread-suite — production
$ pip install iron-thread testthread policythread
Successfully installed 3 packages

$ python audit.py
Running Thread Suite audit...

✓ Iron-Thread — structure validated
✓ TestThread — 47/50 tests passing
⚠ PolicyThread — 2 violations detected
→ "No medical diagnoses" [CRITICAL]
→ "Competitor mention" [HIGH]

Attestation chain: verified
Pipeline health: 0.84

$

Why This Exists

Agents are in production. Nobody is watching.

AI agents are being deployed into production right now. They are making autonomous decisions, running destructive actions, confabulating details mid-negotiation, and operating outside their defined scope — without anyone noticing. The infrastructure layer that governs, verifies, and audits these agents does not exist yet at scale. The Thread Suite is building it.

"A Claude agent deleted a company's entire production database in 9 seconds. Every backup gone. The AI admitted afterward: it was guessing."

Real incident — May 2026

"Anthropic ran 186 autonomous agent-to-agent deals and documented agents confabulating details mid-negotiation."

Anthropic Research — 2026

"The EU AI Act mandates ongoing behavioral monitoring for high-risk AI systems by August 2026. Fines up to €15M for non-compliance."

Regulatory reality

The Suite

Nine tools. Nine questions. Complete coverage.

Each tool answers a question the others cannot. Use one or use all — they work standalone and work better together.

Iron-Thread v1.2.0
"Did the AI return the right structure?"

Validates AI outputs against defined schemas before they reach your database. Confidence scoring flags statistically anomalous values that pass schema. Tamper-evident SHA-256 hash chain on every validation run.

pip install iron-thread
npm install iron-thread
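
To make the idea concrete, here is a minimal sketch of the two checks described above: a structural check against a schema, then a statistical check on a value that passes the schema. It uses only the Python standard library and is not the Iron-Thread API. The schema format, function names, z-score threshold, and sample data are all assumptions made for illustration.

```python
import statistics

# Hypothetical schema format (an assumption): field name -> required Python type.
SCHEMA = {"name": str, "age": int, "email": str}

def validate_structure(output: dict, schema: dict) -> list:
    """Return a list of structural errors; an empty list means the output passes."""
    errors = []
    for field, expected_type in schema.items():
        if field not in output:
            errors.append(f"missing field: {field}")
        elif not isinstance(output[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return errors

def is_anomalous(value: float, history: list, threshold: float = 3.0) -> bool:
    """Flag a value that passes schema but sits far from earlier values (z-score)."""
    if len(history) < 2:
        return False  # not enough data to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Illustrative data: the age is a valid int, so the schema passes,
# but it is far outside what earlier runs produced.
output = {"name": "Ada", "age": 412, "email": "ada@example.com"}
errors = validate_structure(output, SCHEMA)
flagged = is_anomalous(output["age"], history=[31, 45, 28, 52, 39])
print(errors, flagged)
```

The point of the second check is that a schema can only say "this is an integer". It cannot say "this integer is implausible".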
TestThread v0.12.0
"Did the agent do the right thing?"

Behavioral testing framework for AI agents. Define what your agent should do, run it, get pass/fail with AI diagnosis on failures. Adversarial test generation. Continuous drift monitoring.

pip install testthread
npm install testthread
PromptThread v0.4.0
"Is my prompt the best version of itself?"

Version control and performance tracking for prompts. Full history, rollback, and diff. Run logging with latency, cost, and pass rate. The world drift signal detects when your model's answers about verifiable facts have gone out of date.

pip install promptthread
npm install promptthread
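
As a rough illustration of history, diff, and rollback for prompts, the sketch below keeps versions in memory and uses `difflib` from the standard library. The `PromptHistory` class and its methods are hypothetical names invented for this example. They are not the PromptThread API.

```python
import difflib

class PromptHistory:
    """Hypothetical in-memory prompt store (illustration only)."""

    def __init__(self):
        self.versions = []

    def commit(self, text: str) -> int:
        """Store a new version and return its 1-based version number."""
        self.versions.append(text)
        return len(self.versions)

    def diff(self, old: int, new: int) -> str:
        """Unified diff between two stored versions."""
        a = self.versions[old - 1].splitlines()
        b = self.versions[new - 1].splitlines()
        return "\n".join(
            difflib.unified_diff(a, b, f"v{old}", f"v{new}", lineterm="")
        )

    def rollback(self, to: int) -> int:
        """Re-commit an earlier version as the newest one, keeping full history."""
        return self.commit(self.versions[to - 1])

history = PromptHistory()
history.commit("You are a support agent.\nAnswer briefly.")
history.commit("You are a support agent.\nAnswer in detail.")
print(history.diff(1, 2))
restored = history.rollback(1)  # version 3 now has the same text as version 1
```

Rolling back by re-committing, rather than deleting later versions, keeps the history complete. That is a design choice of this sketch, not a statement about how PromptThread stores versions.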
ChainThread v0.10.0
"Did the handoff between agents succeed?"

Verification and governance for agent-to-agent handoffs. Signed envelopes, contract assertions, confidence decay, PII detection, dead letter queue, and agent reputation layer.

pip install chainthread
npm install chainthread
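
One way to picture a signed envelope with confidence decay is the sketch below. It is an assumption-heavy illustration, not ChainThread's wire format or API: it signs with HMAC-SHA256 over a shared secret, and the field names, the shared key, and the 10% decay per hop are all invented for the example.

```python
import hashlib
import hmac
import json

SECRET = b"demo-shared-secret"  # assumption: both agents share this key
DECAY_PER_HOP = 0.9             # assumption: confidence drops 10% per handoff

def _signature(body: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the envelope body."""
    message = json.dumps(body, sort_keys=True).encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def sign_envelope(payload: dict, confidence: float, hops: int = 0) -> dict:
    """Wrap a payload with hop count, confidence, and a signature."""
    body = {"payload": payload, "confidence": confidence, "hops": hops}
    return {**body, "signature": _signature(body)}

def verify_envelope(envelope: dict) -> bool:
    """Recompute the signature and compare in constant time."""
    body = {k: v for k, v in envelope.items() if k != "signature"}
    return hmac.compare_digest(_signature(body), envelope.get("signature", ""))

def hand_off(envelope: dict) -> dict:
    """Verify the incoming envelope, then re-sign it with decayed confidence."""
    if not verify_envelope(envelope):
        raise ValueError("signature check failed; route to the dead letter queue")
    return sign_envelope(
        envelope["payload"],
        envelope["confidence"] * DECAY_PER_HOP,
        envelope["hops"] + 1,
    )

first = sign_envelope({"task": "summarize"}, confidence=1.0)
second = hand_off(first)
print(second["hops"], round(second["confidence"], 2))
```

A production system would more likely use per-agent asymmetric keys so the receiver cannot forge the sender's signature. The shared secret here only keeps the example inside the standard library.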
PolicyThread v1.3.0
"Is the AI staying within our rules in production?"

Always-on compliance monitoring for every live interaction. Semantic and deterministic evaluation. Cryptographic attestation chain. Audit reports a regulator can actually read.

pip install policythread
npm install policythread
ThreadWatch v0.6.0
"Is the entire pipeline healthy right now?"

Cross-layer vigilance system. Ingests signals from all suite tools simultaneously, detects anomalies, diagnoses root causes, and correlates internal failures with external provider incidents.

pip install threadwatch
npm install threadwatch
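
Correlating internal failures with external provider incidents can be sketched as a time-window match. The function below is a hypothetical illustration, not ThreadWatch's implementation. The signal names, timestamps, and five-minute slack are made up for the example.

```python
from datetime import datetime, timedelta

def correlate(failures, incidents, slack=timedelta(minutes=5)):
    """Map each internal failure to the provider incidents whose time window
    (padded by `slack` on both sides) contains the failure timestamp."""
    report = {}
    for name, timestamp in failures:
        report[name] = [
            label
            for label, start, end in incidents
            if start - slack <= timestamp <= end + slack
        ]
    return report

# Illustrative signals only; names and times are invented.
failures = [
    ("chainthread_handoff_timeout", datetime(2026, 1, 1, 12, 3)),
    ("testthread_regression", datetime(2026, 1, 1, 15, 0)),
]
incidents = [
    ("provider_api_degraded", datetime(2026, 1, 1, 12, 0), datetime(2026, 1, 1, 12, 30)),
]
report = correlate(failures, incidents)
print(report)
```

A failure with an empty list has no external explanation in the window, which suggests the root cause is inside the pipeline.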
Behavioral Fingerprint v0.6.0
"Has this agent's behavioral profile changed?"

Captures how your agent behaves at deployment across six dimensions — verbosity, hedging, refusal, confidence, consistency, adherence — and monitors drift over time.

pip install behavioralfingerprint
npm install behavioralfingerprint
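
A fingerprint over the six dimensions can be treated as a vector, and drift as the distance between the baseline and the current measurement. The sketch below assumes each dimension is already scored in the range 0 to 1. How those scores are produced, the distance metric, and the tolerance are assumptions for illustration, not the Behavioral Fingerprint API.

```python
import math

DIMENSIONS = ("verbosity", "hedging", "refusal", "confidence", "consistency", "adherence")

def drift(baseline: dict, current: dict) -> float:
    """Euclidean distance between two fingerprints across all six dimensions."""
    return math.sqrt(sum((current[d] - baseline[d]) ** 2 for d in DIMENSIONS))

def drifted_dimensions(baseline: dict, current: dict, tolerance: float = 0.15) -> list:
    """Dimensions whose score moved by more than `tolerance` since deployment."""
    return [d for d in DIMENSIONS if abs(current[d] - baseline[d]) > tolerance]

# Illustrative scores only.
baseline = {"verbosity": 0.5, "hedging": 0.2, "refusal": 0.1,
            "confidence": 0.8, "consistency": 0.9, "adherence": 0.95}
current = {**baseline, "hedging": 0.5, "refusal": 0.5}

print(round(drift(baseline, current), 3), drifted_dimensions(baseline, current))
```

The overall distance says how much the agent changed. The per-dimension list says where.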
AgentID v0.5.0
"Who is this agent and can we trust it?"

Cryptographic identity and reputation system for AI agents. Every agent gets a verifiable credential and a track record. Trust is earned, not assumed.

pip install threadagentid
npm install threadagentid
DriftWatch v0.5.0
"Does my model still know what is true?"

Monitors whether your AI model's knowledge has gone stale against verified real-world facts. Staleness score, decay curve, and domain-specific ground truth anchors.

pip install thread-driftwatch
npm install thread-driftwatch
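
A staleness score can be as simple as the share of ground-truth anchors the model now gets wrong. The sketch below assumes that definition and exact-match comparison. Both are simplifications chosen for illustration, and the anchors are placeholder facts, not real data or the DriftWatch API.

```python
def staleness_score(model_answers: dict, anchors: dict) -> float:
    """Fraction of anchors answered incorrectly: 0.0 is fresh, 1.0 is fully stale.
    A missing answer counts as incorrect."""
    if not anchors:
        return 0.0
    wrong = sum(
        1
        for question, truth in anchors.items()
        if model_answers.get(question, "").strip().lower() != truth.strip().lower()
    )
    return wrong / len(anchors)

# Placeholder anchors for a fictional domain (invented for this example).
anchors = {
    "current_major_version_of_tool_x": "4",
    "ceo_of_example_corp": "Jane Doe",
    "capital_of_example_land": "Sample City",
    "default_port_of_service_y": "8080",
}
model_answers = {
    "current_major_version_of_tool_x": "3",   # stale: the model still says 3
    "ceo_of_example_corp": "jane doe",
    "capital_of_example_land": "Sample City",
    "default_port_of_service_y": "8080",
}
score = staleness_score(model_answers, anchors)
print(score)
```

Recording this score on a schedule gives the points of a decay curve over time.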

The Pipeline

Every tool is standalone. Together they cover everything.

Prompt enters → PromptThread → AI output → Iron-Thread → PolicyThread → ChainThread → AgentID → TestThread → Behavioral Fingerprint → DriftWatch → ThreadWatch

The cryptographic trust layer runs through the entire suite. Iron-Thread hashes every validation run. ChainThread signs every handoff. PolicyThread chains every compliance evaluation. AgentID signs every credential. Any tampering with any record is immediately detectable. This is not a feature — it is an architecture.


Get Started

Where do you begin?

01
I want to validate my agent's outputs

Start with Iron-Thread. Install in 30 seconds, send your first validation in 5 minutes. Define a schema, submit your AI's output, get a pass/fail with confidence score and tamper-evident log.

pip install iron-thread
Read the docs →
02
I want to test my agent's behavior

Start with TestThread. Define what your agent should do, run it, see exactly what passes and what fails. AI diagnosis explains failures. Adversarial generation finds the breaking points.

pip install testthread
Read the docs →
03
I need compliance monitoring in production

Start with PolicyThread. Define your rules in plain English and every live interaction is checked against them automatically. Semantic evaluation catches meaning-based violations. Cryptographic audit trail for regulators.

pip install policythread
Read the docs →
04
I want to understand the suite before picking a tool

Read the suite story first. Five minutes. The architecture, the reasoning, the cryptographic trust layer, why nine tools and not one — all of it is there. Then you will know exactly which tool your system needs most.

Read the story →

Learn

Use the suite effectively.

The Thread Suite enforces your thinking. Define a poor schema and Iron-Thread enforces poor validation. Write shallow test cases and TestThread enforces shallow testing. The tools are only as powerful as the thinking you put into them.

PATTERN 01
Stack your match types

Contains tells you if the agent spoke the right language. Semantic tells you if it said the right thing. Use both on the same input and you get a diagnostic layer, not just a pass/fail. Contains passes, semantic fails — the word is there but the meaning is not. That is a different problem than both failing. Different problems need different fixes.

PATTERN 02
Read the gap between deterministic and semantic

When deterministic fails and semantic passes — pay attention. Your agent found a correct answer your rules did not anticipate. That gap is telling you something about the limits of your own thinking. Update your rules. Your suite just taught you something about your system that you did not know before.
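
Patterns 01 and 02 both come down to reading two checks together instead of one. The sketch below shows the four possible outcomes. It is not the TestThread or PolicyThread API. The semantic check is passed in as a function, and `toy_semantic_match` is a deliberately crude stand-in (it only looks for negation words) for what would really be an embedding comparison or an LLM judge.

```python
from typing import Callable

def diagnose(output: str, keyword: str, expected_meaning: str,
             semantic_match: Callable[[str, str], bool]) -> str:
    """Combine a deterministic contains check with a pluggable semantic check."""
    contains_ok = keyword.lower() in output.lower()
    semantic_ok = semantic_match(output, expected_meaning)
    if contains_ok and semantic_ok:
        return "pass"
    if contains_ok:
        return "keyword present, meaning wrong"   # Pattern 01
    if semantic_ok:
        return "meaning right, keyword missing"   # Pattern 02: the rule is too narrow
    return "both failed"

def toy_semantic_match(output: str, expected_meaning: str) -> bool:
    """Stand-in only: treats any negated sentence as not matching.
    It ignores `expected_meaning`; a real check would compare the two."""
    negations = ("not ", "no ", "cannot ", "can't ")
    return not any(word in output.lower() for word in negations)

expected = "refunds are available"
for reply in (
    "Refunds are available within 30 days.",
    "Refunds are not available after 30 days.",
    "You can get your money back within 30 days.",
):
    print(diagnose(reply, "refund", expected, toy_semantic_match))
```

Each outcome points to a different fix: change the agent when the meaning is wrong, change the rule when only the keyword is missing.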

PATTERN 03
Why you need more than one tool

Iron-Thread tells you the shape is right. TestThread tells you the answer is right. You need both because a perfectly structured wrong answer is still wrong. Every tool answers a question the others cannot. Reliability is not one question — it is nine. Treat them as one system.

PATTERN 04
The pipeline is a signal chain

ThreadWatch was not designed top-down. It was discovered by asking: what happens when ChainThread fails and TestThread has no explanation? The answer was a tool that watches everything simultaneously. Every tool in this suite exists because a real gap was found and filled. Nothing is arbitrary.

PATTERN 05
The audit trail is proof, not a log

PolicyThread's attestation chain uses SHA-256 hash chaining. Every evaluation record incorporates the hash of the previous record, so changing any record changes its hash and invalidates every link after it. The break shows up the moment the chain is re-verified. This is not logging. It is cryptographic proof. Hand it to a regulator. They can recompute the hashes and verify it independently.
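
The hash chaining described above can be sketched in a few lines with `hashlib`. This is a generic illustration of the technique, not PolicyThread's record format. The record fields, the all-zero genesis hash, and the JSON encoding are assumptions.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumption: placeholder "previous hash" for the first record

def record_hash(record: dict, prev_hash: str) -> str:
    """SHA-256 over the record's canonical JSON plus the previous record's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

def append(chain: list, record: dict) -> None:
    """Add a record whose hash incorporates the hash of the record before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev_hash": prev_hash,
                  "hash": record_hash(record, prev_hash)})

def verify(chain: list) -> Optional[int]:
    """Return the index of the first broken link, or None if the chain is intact."""
    prev_hash = GENESIS
    for index, entry in enumerate(chain):
        expected = record_hash(entry["record"], prev_hash)
        if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
            return index
        prev_hash = entry["hash"]
    return None

chain = []
append(chain, {"rule": "No medical diagnoses", "verdict": "pass"})
append(chain, {"rule": "Competitor mention", "verdict": "violation"})
append(chain, {"rule": "No medical diagnoses", "verdict": "pass"})
print(verify(chain))                    # chain as written

chain[1]["record"]["verdict"] = "pass"  # quietly edit one record
print(verify(chain))                    # verification now points at the edited record
```

Anyone holding the records can run the same verification without trusting whoever stored them, which is what makes the trail checkable by a third party. Publishing or signing the latest hash elsewhere guards against someone rewriting the whole chain.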


The Story

Built at the layer everyone else is ignoring.

Everyone is building AI applications. The gap, the same gap that existed in the early internet, is in what AI needs to be industrial-grade. Not smarter. Not faster. More reliable, safer, more trusted, more steerable, more accountable. The Thread Suite is that layer. Nine tools. One person. Accra, Ghana. Celeron processor. 4GB RAM. Borrowed data some days. $0 infrastructure cost. Every tool live. Every API responding. Organic downloads with zero marketing. Research submitted to Anthropic. If it runs here, it runs anywhere.

March '26: ICA submitted to Anthropic · $0 total infrastructure cost · 9 tools live in production

The Research Arm — ICA

The Infinite Conversation Architecture (ICA) is a five-component memory framework that makes AI conversation genuinely continuous without growing the context window. The novel contribution: retrieval that begins while the user is still typing, adding zero latency. Also part of the research: a Distributed Memory Verification Protocol for cryptographic attestation of AI memory across agent instances. Published open-source and submitted to Anthropic.
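
The "retrieval while typing" idea can be illustrated with `asyncio`: start the lookup on the partial input and only await it at submit time. This is a toy sketch under stated assumptions, not the ICA implementation. `retrieve` is a hypothetical stand-in for a memory lookup, and the sleeps simulate retrieval latency and typing time.

```python
import asyncio

async def retrieve(query: str) -> list:
    """Hypothetical memory lookup; the sleep stands in for retrieval latency."""
    await asyncio.sleep(0.05)
    return [f"memory related to '{query}'"]

async def handle_turn(partial_input: str, typing_time: float) -> list:
    # Start retrieval as soon as partial input is available.
    task = asyncio.create_task(retrieve(partial_input))
    # The user keeps typing; retrieval runs concurrently during this time.
    await asyncio.sleep(typing_time)
    # At submit time the result is usually already there, so awaiting it
    # adds little or no extra wait when typing takes longer than retrieval.
    return await task

memories = asyncio.run(handle_turn("quarterly report", typing_time=0.1))
print(memories)
```

The latency is hidden rather than removed: it overlaps with time the user was already spending on typing.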

View on GitHub ↗

Support

Keep the suite running and growing.

Everything in the Thread Suite is open source and free. If it has saved you time, helped you ship, or given you ideas — consider supporting the work. Every contribution goes directly toward infrastructure, development, and keeping the tools live.

Donations coming soon. For now, the best support is using the tools, starring the repos, and telling another developer.


Get in Touch

Questions, suggestions, complaints: all welcome.

General Contact

For partnerships, enterprise inquiries, or anything that needs a human response.

bitelance.team@gmail.com

Bug Reports & Suggestions

Open an issue on the relevant tool repo. Everything is tracked publicly and responded to.

github.com/eugene001dayne

Developer Questions

Have a question about integrating a tool? Drop it here and it will get answered. The best questions get added to the Learn section.

Coming Soon — Ask the Suite

A chatbot trained on every tool, every guide, every decision behind the architecture. Ask anything about the Thread Suite.