Synthetic Data Supply Chain · CertifiedData
Certify, sell, and verify synthetic datasets
Every dataset in your supply chain gets a certificate, a payment receipt, and two public verification URLs. Buyers prove compliance without calling you.
Definition
Synthetic data supply chain: A pipeline in which synthetic datasets are cryptographically certified at generation, sold through policy-gated payment flows, and delivered with signed receipts that buyers can independently verify.
The five-step chain
Generate
The dataset is generated synthetically (CTGAN, diffusion, or a custom pipeline). No real PII enters the pipeline.
Certify
CertifiedData hashes the dataset (SHA-256), issues a certificate, and signs it with Ed25519. The certificate_id is permanent.
Sell
Buyer's agent creates a transaction, attaches certificate_id + artifact_hash, and captures payment. Receipt is signed inline.
Deliver
Buyer receives: the dataset file, the certificate, and the payment receipt — all cryptographically bound.
Verify
Anyone verifies dataset integrity and payment proof with two public API calls. No account, no vendor required.
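The two public calls in the Verify step can be sketched with the standard library alone. This is a minimal sketch, not an official client: the endpoint paths and the certifieddata.io host are the ones shown on this page, while the helper names (`certificate_verify_url`, `receipt_verify_url`, `fetch_json`) are hypothetical. The response fields are not documented here, so the sketch returns the parsed JSON without interpreting it.

```python
import json
import urllib.request
from urllib.parse import quote

# Host as it appears in the examples on this page; adjust if yours differs.
BASE_URL = "https://certifieddata.io"


def certificate_verify_url(certificate_id: str) -> str:
    """Build the public certificate check URL: GET /api/verify/:certificate_id."""
    return f"{BASE_URL}/api/verify/{quote(certificate_id, safe='')}"


def receipt_verify_url(receipt_id: str) -> str:
    """Build the public receipt check URL: GET /api/payments/verify/:receipt_id."""
    return f"{BASE_URL}/api/payments/verify/{quote(receipt_id, safe='')}"


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Unauthenticated GET. The body is returned as parsed JSON, uninterpreted,
    because the response schema is an unknown in this sketch."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A caller would pass the `certificate_id` and `receipt_id` from the delivered bundle, for example `fetch_json(certificate_verify_url(certificate_id))`, and then inspect whatever fields the service returns.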
What the buyer receives
Three independently verifiable records — all cryptographically bound to each other. No vendor calls, no PDFs, no trust required.
| Item | Proves | How to verify |
|---|---|---|
| Dataset file | The actual asset delivered | sha256sum file → matches artifact_hash in receipt |
| CertifiedData certificate | Dataset is synthetically generated, hash matches, issuer signature valid | GET /api/verify/:certificate_id |
| Payment receipt | Spend was policy-approved, certificate_id is referenced, receipt is Ed25519-signed | GET /api/payments/verify/:receipt_id |
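The first row of the table, the local hash check, can also be done in Python rather than with `sha256sum`. The sketch below assumes the receipt carries the hash as `sha256:<hex>`, which is the format used for `artifact_hash` in the purchase example on this page. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large datasets never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_artifact_hash(path: Path, artifact_hash: str) -> bool:
    """Compare the local file hash with an artifact_hash value.

    Accepts either 'sha256:<hex>' (assumed receipt format) or a bare hex digest.
    """
    expected = artifact_hash.removeprefix("sha256:").lower()
    return sha256_file(path) == expected
```

A buyer would call `matches_artifact_hash(Path("dataset.parquet"), receipt_hash)`, where `receipt_hash` is the `artifact_hash` value read from the receipt. A `False` result means the delivered file is not the one the receipt refers to.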
Batch certify + expose commerce endpoints
For sellers with many datasets: certify all of them in one batch, then auto-expose purchase endpoints. Each dataset gets its own certificate_id and a payment endpoint that handles the full create → attachLinks → capture flow.
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import requests

# API key is read from the environment; the variable name is a placeholder
CD_API_KEY = os.environ["CD_API_KEY"]

def certify_dataset(path: Path) -> dict:
    # Hash the file locally; only the digest and metadata are sent
    sha = hashlib.sha256(path.read_bytes()).hexdigest()
    resp = requests.post(
        "https://certifieddata.io/api/certify",
        headers={"Authorization": f"Bearer {CD_API_KEY}"},
        json={
            "artifact_type": "synthetic_dataset",
            "sha256": sha,
            "metadata": {"filename": path.name},
        },
        timeout=30,
    )
    resp.raise_for_status()  # surface HTTP errors before reading the body
    cert = resp.json()
    return {
        "file": str(path),
        "sha256": sha,
        "certificate_id": cert["certificate_id"],
    }

files = list(Path("./datasets").glob("*.parquet"))

# Certify every file, using 5 parallel workers
with ThreadPoolExecutor(max_workers=5) as exe:
    catalog = list(exe.map(certify_dataset, files))
# catalog is now: [{file, sha256, certificate_id}, ...]

from certifieddata_payments import CertifiedDataPayments

# Index the catalog by certificate_id for lookup. Each entry must also
# carry price_cents and certificate_url, added by the seller beforehand.
catalog_by_id = {item["certificate_id"]: item for item in catalog}

def sell_dataset(certificate_id: str,
                 buyer_api_key: str) -> dict:
    """Buyer agent hits this endpoint to purchase."""
    item = catalog_by_id[certificate_id]
    cdp = CertifiedDataPayments(api_key=buyer_api_key)
    tx = cdp.transactions.create({
        "amount": item["price_cents"],
        "currency": "usd",
        "payee_id": "merch_dataset_seller",
        "rail": "stripe",
    })
    # Bind the payment to the certified artifact before capturing
    cdp.transactions.attach_links(
        tx["transaction_id"], {
            "certificate_id": certificate_id,
            "artifact_hash": f"sha256:{item['sha256']}",
            "decision_record_id": f"dec_{tx['transaction_id']}",
        }
    )
    capture = cdp.transactions.capture(
        tx["transaction_id"]
    )
    receipt = capture["receipt"]
    return {
        "receipt": receipt,
        "cert_url": item["certificate_url"],
        "verify_url": "https://certifieddata.io/api/payments"
                      f"/verify/{receipt['receipt_id']}",
    }

Each buyer gets the dataset file, a certificate_url, and a verify_url: the full proof bundle in one API response. See the full dataset purchase flow →
Why this matters for regulated buyers
Healthcare & clinical AI
Training data provenance is required for FDA/CE regulatory submissions. A certificate + receipt proves the dataset is synthetic and procured through a governed process.
Financial services
Model risk management frameworks (SR 11-7, DORA) require documentation of training data sourcing. Signed receipts are auditor-ready.
Legal / LegalTech
When synthetic data trains models used in legal workflows, the certification chain proves no real client data was used.
Enterprise AI governance
ISO 42001 and EU AI Act high-risk system requirements include data documentation. A certificate + receipt satisfies both lineage and payment traceability.
AI agent marketplaces
Buyers can resell or pass certified datasets downstream. The certificate travels with the data and is independently verifiable at any point.
Compliance automation
Automated systems can verify the certificate and receipt programmatically — no human review, no vendor contact required.
Related
AI artifact certification →
How CertifiedData certifies datasets and AI artifacts.
Synthetic data certification →
Certificate structure for synthetic datasets.
Dataset purchase flow →
The full create → attachLinks → capture payment flow.
Agent Commerce use cases →
All Agent Commerce use cases and patterns.
Certified outputs →
Artifact → payment → receipt → public verification.
Receipt schema →
Full payment_receipt.v1 field definitions.
Verify a certificate →
Public verification — no account required.
Artifact registry →
Browse certified artifacts in the public registry.
Transparency log →
Platform-wide event log for audit.
Make your synthetic datasets provably trustworthy
Free sandbox. Certificate + receipt on every purchase. Buyers verify independently.