Generate Synthetic Data
Create statistically accurate synthetic datasets. Every generated artifact receives a machine-verifiable certificate — a cryptographic provenance record proving the dataset was synthetically generated by CertifiedData.
- 10 dataset generations per 24h from 40+ industry templates
- Download the CSV and the matching Ed25519-signed certificate immediately
- Verify the certificate hash + signature at /verify — no key required
- Re-hash and re-verify outside our system with the published public key
Datasets pathway
Other workflows
Generate from template
Create a new synthetic dataset in the UI
Upload manifest
Certify an existing artifact or dataset · signed-in Trust plan required
AI Notary
Notarize any AI artifact instantly · no account required
Advanced generation modes
CertifiedData supports multiple synthesis approaches. Plan requirements vary by capability.
Template-based generation
Choose from 40+ industry schemas across finance, healthcare, energy, retail, manufacturing, and government. Every template produces statistically coherent synthetic records.
Prompt-based generation
Describe your dataset in natural language. The system infers column names, types, constraints, and relationships — then generates a schema-accurate synthetic output.
Upload + synthesize
Upload a real dataset. The engine learns statistical distributions and generates a new dataset that preserves the shape, schema, and correlations — without exposing source records.
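The page does not describe how the engine measures fidelity. As one hedged illustration, a recipient could compare pairwise Pearson correlations between a source table and its synthetic counterpart. The TypeScript sketch below assumes purely numeric columns. The `Table` shape and the `pearson` and `maxCorrelationGap` helpers are hypothetical and are not part of CertifiedData's API.

```typescript
// Hypothetical table shape for this sketch: column name -> numeric values, one per row.
type Table = Record<string, number[]>;

// Pearson correlation between two equal-length numeric columns.
// A constant column has zero variance, which makes the result NaN.
function pearson(x: number[], y: number[]): number {
  if (x.length !== y.length || x.length < 2) {
    throw new Error("pearson: need two equal-length columns with at least 2 values");
  }
  const mean = (v: number[]): number => v.reduce((acc, n) => acc + n, 0) / v.length;
  const mx = mean(x);
  const my = mean(y);
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < x.length; i++) {
    const dx = x[i] - mx;
    const dy = y[i] - my;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Hypothetical helper: the largest absolute difference between the correlation of
// any column pair in the source table and the same pair in the synthetic table.
// Assumes both tables have the same column names; their row counts may differ.
function maxCorrelationGap(source: Table, synthetic: Table): number {
  const cols = Object.keys(source);
  let gap = 0;
  for (let i = 0; i < cols.length; i++) {
    for (let j = i + 1; j < cols.length; j++) {
      const real = pearson(source[cols[i]], source[cols[j]]);
      const synth = pearson(synthetic[cols[i]], synthetic[cols[j]]);
      gap = Math.max(gap, Math.abs(real - synth));
    }
  }
  return gap;
}

// Illustrative numbers only.
const source: Table = { income: [30, 45, 60, 80], spend: [20, 28, 41, 52] };
const synthetic: Table = { income: [32, 47, 58, 79, 66], spend: [21, 30, 38, 50, 44] };
console.log(maxCorrelationGap(source, synthetic).toFixed(3));
```

A small gap suggests that pairwise linear relationships survived synthesis. It says nothing about non-linear structure, marginal distributions, or privacy, so treat it as one check among several.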
Schema-controlled generation
Explicitly define column types, value ranges, cardinality, nullability, and cross-column constraints. Use when statistical inference is insufficient for your compliance use case.
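To make cross-column constraints concrete, here is a minimal TypeScript sketch of how such a schema could be expressed and checked row by row. The `ColumnSpec`, `SchemaSpec`, and `validateRow` names and the loan example are hypothetical; CertifiedData's actual schema format is not shown on this page.

```typescript
// Hypothetical spec shapes for illustration only.
type ColumnSpec =
  | { name: string; type: "integer"; min: number; max: number; nullable: boolean }
  // A categorical column's cardinality is the size of its allowed value set.
  | { name: string; type: "category"; values: string[]; nullable: boolean };

type Row = Record<string, number | string | null>;

// A cross-column constraint is a predicate over a whole row.
interface RowConstraint {
  description: string;
  check: (row: Row) => boolean;
}

interface SchemaSpec {
  columns: ColumnSpec[];
  constraints: RowConstraint[];
}

// Returns human-readable violations for one row; an empty array means the row is valid.
function validateRow(spec: SchemaSpec, row: Row): string[] {
  const errors: string[] = [];
  for (const col of spec.columns) {
    const value = row[col.name];
    if (value == null) { // null or missing
      if (!col.nullable) errors.push(`${col.name}: null not allowed`);
      continue;
    }
    if (col.type === "integer") {
      if (typeof value !== "number" || !Number.isInteger(value)) {
        errors.push(`${col.name}: expected an integer`);
      } else if (value < col.min || value > col.max) {
        errors.push(`${col.name}: ${value} outside [${col.min}, ${col.max}]`);
      }
    } else if (typeof value !== "string" || !col.values.includes(value)) {
      errors.push(`${col.name}: expected one of ${col.values.join(", ")}`);
    }
  }
  // Cross-column constraints run only when every column passed its own checks,
  // so predicates see values that match their column specs (nullable ones may be null).
  if (errors.length === 0) {
    for (const c of spec.constraints) {
      if (!c.check(row)) errors.push(`constraint violated: ${c.description}`);
    }
  }
  return errors;
}

const loanSpec: SchemaSpec = {
  columns: [
    { name: "age", type: "integer", min: 18, max: 100, nullable: false },
    { name: "term_months", type: "integer", min: 6, max: 360, nullable: false },
    { name: "region", type: "category", values: ["north", "south", "east", "west"], nullable: true },
  ],
  constraints: [
    {
      description: "loan ends before borrower turns 100",
      check: (r) => (r.age as number) + (r.term_months as number) / 12 <= 100,
    },
  ],
};

console.log(validateRow(loanSpec, { age: 30, term_months: 120, region: "north" })); // []
// One violation: "constraint violated: loan ends before borrower turns 100"
console.log(validateRow(loanSpec, { age: 90, term_months: 360, region: null }));
```

Keeping relationships as predicates over the whole row separates per-column rules (type, range, nullability, cardinality) from rules that span columns.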
Privacy-preserving generation (coming soon)
A DP-CTGAN engine with epsilon-based privacy accounting is in development for regulated environments. The certificate will record whether differential privacy was enforced and at what epsilon level.
Certified output
Every generated dataset receives an Ed25519-signed certificate binding the dataset's SHA-256 hash to a provenance record. Certificates are machine-verifiable and logged to the public transparency log.
How certification works
CertifiedData acts as a certificate authority for AI artifacts. A certificate is not a badge — it is a cryptographic record binding the dataset to its generation event.
- Generate dataset: template, rows, format
- SHA-256 fingerprint: cryptographic dataset hash
- Issue certificate: provenance record created
- Ed25519 signature: tamper-evident signing
- Public verification: independently verifiable
Tamper-evident
The SHA-256 hash of the dataset bytes is embedded in the certificate payload. Any modification to the dataset changes the hash.
Ed25519 signed
The certificate payload is signed with CertifiedData's Ed25519 private key. Verification needs only the published public key, not access to CertifiedData's systems.
Publicly verifiable
Any auditor can verify a certificate via the public API or the /verify page — without authentication.
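A minimal TypeScript sketch (Node.js `node:crypto`) of the tamper-evident property: the digest is computed over the exact file bytes, so changing a single character produces a different hash. The sample CSV contents are illustrative.

```typescript
import { createHash } from "node:crypto";

// SHA-256 over the exact bytes of a dataset file, as 64 lowercase hex characters.
function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Illustrative CSV contents; the second differs by a single digit.
const encoder = new TextEncoder();
const original = encoder.encode("id,amount\n1,100\n2,250\n");
const tampered = encoder.encode("id,amount\n1,100\n2,251\n");

console.log(sha256Hex(original));
console.log(sha256Hex(tampered));
console.log(sha256Hex(original) === sha256Hex(tampered)); // false
```

Because the hash covers raw bytes, re-saving a file with different line endings (LF versus CRLF) or a different text encoding also changes it, so hash the CSV exactly as downloaded.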
Platform capabilities
Certificate levels
Every plan produces a real Ed25519-signed certificate. Higher tiers add lineage fields, retention, and audit exports.
cert.v1: Anonymous & free-account sandbox certificates
- SHA-256 fingerprint + Ed25519 signature
- Public verification URL per certificate
- 30-day retention for free accounts
cert.v2: Production-grade provenance with lineage
- Everything in v1 + registry listing
- Algorithm version + row/column metadata pinned
- Indefinite retention
cert.v3: Audit-ready with schema provenance
- Everything in v2 + schema hash + template hash
- Decision-record link per generation run
- JSON / CSV audit exports
Regulated environments
- Everything in v3 + approval workflow evidence
- Customer-managed signing keys supported
- Privacy accounting fields (when DP-CTGAN ships)
From sandbox to production
Start free. Every tier issues real Ed25519-signed certificates; higher tiers add retention, lineage and audit fields, and advanced generation modes.
Sandbox
No credit card · $0
- 10 generations / 24h · 1,000 rows each
- Real Ed25519 certificates (cert.v1)
- No persistence
Free account
Real API key · $0
- 5 jobs / month · 10,000 rows each
- Certificate persistence (30 days)
- Public verification URLs
Build
First production · $49/mo
- 50 jobs / month
- Prompt-based + upload + synthesize
- cert.v2 with registry listing
Trust
Audit-ready · $149/mo
- 500 jobs / month
- Schema-controlled generation
- cert.v3 + audit exports
Prefer code?
Open-source templates + data safety
Synthetic templates are MIT-licensed. The PII scanner is a separate safety utility — run it against real inputs before you feed them into generation.
git clone https://github.com/certifieddata/certifieddata-synthetic-templates
cd certifieddata-synthetic-templates
pnpm install && pnpm run example
For compliance, audit, and legal review
Evidence an auditor can verify
EU AI Act · Article 10
Data governance
Each certificate records the synthetic origin of the dataset plus its generation algorithm: evidence relevant to Article 10's requirements on data collection processes and the origin of training data.
EU AI Act · Article 12
Record-keeping
Certificates are the immutable record of how a training dataset was produced, timestamped and signed by the issuer.
EU AI Act · Article 50
Transparency obligations
Public verification URLs let downstream deployers independently confirm a dataset's synthetic origin before use.
Verify it yourself
The /verify page and the public verification API are live public endpoints: no account, no API key.
Frequently asked
Do I need an account to generate a certified dataset?
No. The anonymous sandbox allows 10 generations per 24 hours, up to 1,000 rows each, using any of the 40+ industry templates. Every run produces a real Ed25519-signed certificate you can verify at /verify.
Is differential privacy enforced?
Not currently. DP-CTGAN with epsilon-based privacy accounting is in development for enterprise. Today's certificates record the algorithm (CTGAN) and parameters but do not claim a differential-privacy guarantee. We publish this honestly so downstream users are not misled.
What's inside the certificate?
A canonical (RFC 8785) JSON payload including the dataset SHA-256 fingerprint, certification ID (UUID), ISO-8601 timestamp, issuer, algorithm spec, row and column counts, schema version, and an Ed25519 signature. Machine-verifiable with no dependency on CertifiedData.
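Canonicalization matters because the Ed25519 signature covers bytes, not abstract JSON: the same object serialized with a different key order or extra whitespace would produce different bytes and fail verification. The sketch below follows RFC 8785's core rules in TypeScript, assuming the payload contains only JSON-native values. Field names other than `dataset_hash` are illustrative, and production code should prefer a maintained JCS library.

```typescript
// Minimal JSON canonicalization following RFC 8785 (JCS), assuming the payload
// holds only JSON-native values: objects, arrays, strings, finite numbers, booleans, null.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    if (typeof value === "number" && !Number.isFinite(value)) {
      throw new Error("RFC 8785 does not allow NaN or Infinity");
    }
    // JCS reuses ECMAScript's JSON.stringify serialization for strings and numbers.
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Arrays keep their element order.
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  const obj: { [key: string]: Json } = value;
  // Members are sorted by key. JavaScript's default string sort compares UTF-16
  // code units, which is the ordering JCS specifies. No whitespace is emitted.
  const keys = Object.keys(obj).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k])).join(",") + "}";
}

// Same logical payload, different key order (illustrative field names).
const a = canonicalize({ row_count: 1000, dataset_hash: "ab12", issuer: "CertifiedData" });
const b = canonicalize({ issuer: "CertifiedData", dataset_hash: "ab12", row_count: 1000 });
console.log(a); // {"dataset_hash":"ab12","issuer":"CertifiedData","row_count":1000}
console.log(a === b); // true
```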
How do I verify a generated dataset?
Hash the downloaded CSV with SHA-256, confirm it matches dataset_hash in the certificate, then verify the Ed25519 signature against the public key at /.well-known/signing-keys.json. Or paste the cert ID at /verify.
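The two verification steps can be sketched in TypeScript with Node.js `node:crypto`. The certificate layout (a `payload` object plus a base64 `signature` over its RFC 8785 canonical form) is an assumption for illustration, and the demo signs with a locally generated key pair because the format of /.well-known/signing-keys.json is not described on this page.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Hypothetical certificate layout for this sketch; the real field names and
// encodings are defined by CertifiedData's certificate schema.
interface Certificate {
  payload: { dataset_hash: string; [key: string]: Json };
  signature: string; // assumed: base64 Ed25519 signature over canonicalize(payload)
}

// Compact RFC 8785-style canonicalization (JSON-native values only).
function canonicalize(value: Json): string {
  if (typeof value === "number" && !Number.isFinite(value)) throw new Error("non-finite number");
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return "[" + value.map(canonicalize).join(",") + "]";
  const obj: { [key: string]: Json } = value;
  return "{" + Object.keys(obj).sort().map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k])).join(",") + "}";
}

function verifyCertificate(csv: Uint8Array, cert: Certificate, publicKey: KeyObject) {
  // Step 1: re-hash the CSV bytes and compare with dataset_hash (assumed hex).
  const actualHash = createHash("sha256").update(csv).digest("hex");
  const hashMatches = actualHash === cert.payload.dataset_hash.toLowerCase();
  // Step 2: verify the Ed25519 signature over the canonical payload bytes.
  // For Ed25519, Node's crypto.verify takes null as the algorithm argument.
  const message = Buffer.from(canonicalize(cert.payload), "utf8");
  const signatureValid = verify(null, message, publicKey, Buffer.from(cert.signature, "base64"));
  return { hashMatches, signatureValid };
}

// Self-contained demo: a locally generated key pair stands in for the published key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const csv = Buffer.from("id,amount\n1,100\n", "utf8");
const payload = { dataset_hash: createHash("sha256").update(csv).digest("hex"), issuer: "demo-issuer" };
const cert: Certificate = {
  payload,
  signature: sign(null, Buffer.from(canonicalize(payload), "utf8"), privateKey).toString("base64"),
};

console.log(verifyCertificate(csv, cert, publicKey)); // { hashMatches: true, signatureValid: true }
```

Against a real certificate, load CertifiedData's public key from /.well-known/signing-keys.json (for example with Node's `createPublicKey`, once the key encoding is known) and pass it in place of the demo key. Both checks must pass: a valid signature with a mismatched hash means the file is not the certified one, and a matching hash with an invalid signature means the certificate was not signed by that key.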
What counts as a billable generation?
Only runs that complete and produce a signed certificate count toward plan limits. Failed runs are not counted.
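A hypothetical sketch of the counting rule: only runs that completed and produced a certificate count, measured here over a rolling 24-hour window against the sandbox limit of 10 generations. The run record fields, and the choice of a rolling rather than a fixed daily window, are assumptions; the page does not specify how the window is computed.

```typescript
// Hypothetical run record for this sketch; field names are not CertifiedData's API.
interface GenerationRun {
  status: "completed" | "failed";
  certificateId: string | null; // set only when a signed certificate was issued
  finishedAt: Date;
}

const WINDOW_MS = 24 * 60 * 60 * 1000; // 24 hours in milliseconds

// Counts runs that completed AND produced a signed certificate within the
// window ending at `now`. Assumes a rolling window, not a fixed daily reset.
function billableRuns(runs: GenerationRun[], now: Date): number {
  return runs.filter((run) => {
    const age = now.getTime() - run.finishedAt.getTime();
    return run.status === "completed" && run.certificateId !== null && age >= 0 && age < WINDOW_MS;
  }).length;
}

// Default limit mirrors the sandbox quota on this page: 10 generations per 24h.
function canStartGeneration(runs: GenerationRun[], now: Date, limit = 10): boolean {
  return billableRuns(runs, now) < limit;
}

const now = new Date("2025-01-02T12:00:00Z");
const runs: GenerationRun[] = [
  { status: "completed", certificateId: "cert-1", finishedAt: new Date("2025-01-02T09:00:00Z") },
  { status: "failed", certificateId: null, finishedAt: new Date("2025-01-02T10:00:00Z") }, // not counted
  { status: "completed", certificateId: "cert-2", finishedAt: new Date("2025-01-01T08:00:00Z") }, // older than 24h
];
console.log(billableRuns(runs, now)); // 1
console.log(canStartGeneration(runs, now)); // true
```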
Can I certify a dataset I already have?
Yes. Use /upload-manifest for manifest-based certification or /notary to notarize any AI artifact. Both produce the same Ed25519-signed certificate class as /generate.
Generate certified synthetic data
Synthetic data generation creates statistically representative datasets without exposing real-world records. CertifiedData extends this with cryptographic certification: every generated dataset is fingerprinted with SHA-256 and signed with an Ed25519 key, producing a machine-verifiable provenance record.
This transforms a synthetic dataset from an anonymous output into a traceable artifact — one that any auditor, regulator, or downstream system can independently verify without asking CertifiedData.
Why machine-verifiable provenance matters
AI governance frameworks expect organizations to demonstrate the provenance of training datasets and the integrity of AI outputs. For high-risk AI systems, the EU AI Act sets related obligations on data governance (Article 10), record-keeping (Article 12), and automatically generated logs (Article 19).
A certificate issued by CertifiedData provides an immutable audit artifact for that demonstration. It records what was generated, when, by whom, and with what algorithm, all bound to a cryptographic fingerprint of the artifact itself.
Supported generation workflows
- Template-based: Select from 40+ pre-built schemas. Generate in seconds. Available on all plans.
- Prompt-based: Describe your dataset in natural language. The engine infers the schema and generates structured output. Build plan.
- Upload + synthesize: Upload real data to generate a statistically similar synthetic version. No source data is retained. Build plan.
- Schema-controlled: Explicitly define field types, constraints, and relationships. Trust plan.
- Manifest upload / notarize existing artifact: Certify a dataset you already have. Use Upload Manifest or AI Notary.
- CI/CD + API: Generate and certify programmatically via the REST API. Integrate certification into MLOps pipelines.
Use cases for certified synthetic data
AI model training
Generate training data that carries a verifiable certificate of synthetic origin, the kind of provenance record emerging AI governance standards call for.
Regulatory compliance
Produce datasets with provenance documentation that supports EU AI Act, NIST AI RMF, and ISO/IEC 42001 requirements for training data.
Privacy-safe data sharing
Share datasets externally without exposing real-world records. Certificates prove synthetic origin to recipients.
Testing environments
Spin up realistic test data with known statistical properties. Certification makes the data traceable through test infrastructure.
Vendor/partner data exchange
Provide counterparties with certified datasets they can independently verify before use in their systems.
Audit and lineage documentation
Establish an immutable record of every dataset used in model development — discoverable in the public transparency log.