CertifiedData.io
No account · Cryptographically real · ~90s

Generate Synthetic Data

Create statistically accurate synthetic datasets. Every generated artifact receives a machine-verifiable certificate: a cryptographic provenance record, signed by CertifiedData, attesting that the dataset was synthetically generated on the platform.

  • 10 dataset generations per 24h from 40+ industry templates
  • Download the CSV and the matching Ed25519-signed certificate immediately
  • Verify the certificate hash + signature at /verify — no key required
  • Re-hash and re-verify outside our system with the published public key

Advanced generation modes

CertifiedData supports multiple synthesis approaches. The plan each capability requires is listed with it.

Template-based generation

Choose from 40+ industry schemas across finance, healthcare, energy, retail, manufacturing, and government. Every template produces statistically coherent synthetic records.

Prompt-based generation

Build

Describe your dataset in natural language. The system infers column names, types, constraints, and relationships, then generates synthetic output that conforms to that schema.

Upload + synthesize

Build

Upload a real dataset. The engine learns statistical distributions and generates a new dataset that preserves the shape, schema, and correlations — without exposing source records.

Schema-controlled generation

Trust

Explicitly define column types, value ranges, cardinality, nullability, and cross-column constraints. Use when statistical inference is insufficient for your compliance use case.

Privacy-preserving generation (coming soon)

Enterprise

A DP-CTGAN engine with epsilon-based privacy accounting is in development for regulated environments. The certificate will record whether differential privacy was enforced and at what epsilon level.
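
For context on what an "epsilon level" refers to, the standard (ε, δ)-differential privacy definition is shown below. The exact accounting DP-CTGAN will use is not specified on this page, so treat this as general background rather than a description of the planned engine.

```latex
% (\varepsilon,\delta)-differential privacy: for all datasets D, D' differing in one record
% and every set of outputs S, the randomized mechanism M satisfies
\[
  \Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
\]
% Basic sequential composition: k mechanisms with budgets (\varepsilon_i, \delta_i)
% applied to the same data are jointly
\[
  \Bigl(\textstyle\sum_{i=1}^{k} \varepsilon_i,\ \sum_{i=1}^{k} \delta_i\Bigr)\text{-differentially private.}
\]
```

Smaller ε (and δ) means the mechanism's output distributions on neighboring datasets are closer, which is a stronger guarantee. Accountants used for DP-SGD-style training, such as Rényi differential privacy accounting, usually certify a tighter total than the simple sum.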

Certified output

Every generated dataset receives an Ed25519-signed certificate binding the dataset's SHA-256 hash to a provenance record. Certificates are machine-verifiable and recorded in the public transparency log.

How certification works

CertifiedData acts as a certificate authority for AI artifacts. A certificate is not a badge — it is a cryptographic record binding the dataset to its generation event.

1. Generate dataset: template, rows, format
2. SHA-256 fingerprint: cryptographic dataset hash
3. Issue certificate: provenance record created
4. Ed25519 signature: tamper-evident signing
5. Public verification: independently verifiable
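
The five steps above can be sketched with Node's built-in `node:crypto` module. This is a minimal illustration under stated assumptions, not CertifiedData's implementation: apart from `dataset_hash` (named in the FAQ), the payload field names are hypothetical, as are the lowercase-hex hash encoding, the base64 signature encoding, and the in-memory key pair standing in for the issuer's key.

```typescript
import { createHash, generateKeyPairSync, randomUUID, sign, verify, KeyObject } from "node:crypto";

// Hypothetical flat certificate payload. Only `dataset_hash` is named in the
// FAQ; the other field names are illustrative assumptions.
interface CertificatePayload {
  dataset_hash: string;
  certification_id: string;
  timestamp: string;
  issuer: string;
  algorithm: string;
  row_count: number;
  column_count: number;
  schema_version: string;
}

// Step 2: SHA-256 fingerprint of the exact dataset bytes, as lowercase hex (assumed encoding).
function fingerprint(datasetBytes: Buffer): string {
  return createHash("sha256").update(datasetBytes).digest("hex");
}

// For a flat object of strings and finite numbers, compact JSON with sorted keys
// should match RFC 8785 (JCS) output. Nested objects need a recursive canonicalizer.
function canonicalize(payload: CertificatePayload): Buffer {
  return Buffer.from(JSON.stringify(payload, Object.keys(payload).sort()), "utf8");
}

// Steps 3-4: build the provenance record, then sign its canonical bytes.
// Node's Ed25519 signing takes `null` as the digest algorithm.
function issueCertificate(
  datasetBytes: Buffer,
  rowCount: number,
  columnCount: number,
  privateKey: KeyObject,
): { payload: CertificatePayload; signature: string } {
  const payload: CertificatePayload = {
    dataset_hash: fingerprint(datasetBytes),
    certification_id: randomUUID(),
    timestamp: new Date().toISOString(),
    issuer: "example-issuer", // placeholder value
    algorithm: "CTGAN", // example value
    row_count: rowCount,
    column_count: columnCount,
    schema_version: "cert.v1", // placeholder value
  };
  const signature = sign(null, canonicalize(payload), privateKey).toString("base64");
  return { payload, signature };
}

// Step 5 (demo): a throwaway key pair stands in for the issuer's signing key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const csv = Buffer.from("id,amount\n1,42.50\n2,17.00\n", "utf8");
const cert = issueCertificate(csv, 2, 2, privateKey);
const ok = verify(null, canonicalize(cert.payload), publicKey, Buffer.from(cert.signature, "base64"));
console.log(ok); // true
```

Signing the canonical bytes, rather than whatever JSON text happens to be on hand, is what lets a verifier rebuild the exact signed input from the parsed fields.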

Tamper-evident

The SHA-256 hash of the dataset bytes is embedded in the certificate payload. Changing even a single byte of the file produces a different hash.
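
A quick way to see the tamper-evidence property, using Node's built-in `node:crypto`. The two CSV strings are illustrative and differ by a single character, yet they produce different SHA-256 digests.

```typescript
import { createHash } from "node:crypto";

// Lowercase hex SHA-256 of the exact bytes of a file.
function sha256Hex(bytes: Buffer): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Illustrative CSV contents that differ by one character (42.50 vs 42.51).
const original = Buffer.from("id,amount\n1,42.50\n2,17.00\n", "utf8");
const modified = Buffer.from("id,amount\n1,42.51\n2,17.00\n", "utf8");

console.log(sha256Hex(original) === sha256Hex(modified)); // false
```

The same applies to changes that are invisible in a spreadsheet, such as converting line endings from `\n` to `\r\n` or re-quoting fields, so a dataset should be hashed exactly as downloaded.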

Ed25519 signed

The certificate payload is signed with CertifiedData's Ed25519 private key. Anyone holding the published public key can check the signature without relying on CertifiedData's servers.

Publicly verifiable

Any auditor can verify a certificate via the public API or the /verify page — without authentication.

Platform capabilities

  • 40+ industry templates: Available
  • CSV export: Available
  • JSON/JSONL export: Available
  • Parquet export: Coming soon
  • Ed25519 dataset certification: Available
  • SHA-256 fingerprinting: Available
  • Public transparency log: Available
  • Prompt-based generation: Build plan
  • Upload + synthesize: Build plan
  • Schema-controlled generation: Trust plan
  • Privacy-preserving generation (DP-CTGAN): Coming soon
  • CI/CD pipeline integration: Available

Need higher limits? View plans →

Certificate levels

Every plan produces a real Ed25519-signed certificate. Higher tiers add lineage fields, retention, and audit exports.

cert.v1 · Free

Anonymous & free-account sandbox certificates

  • SHA-256 fingerprint + Ed25519 signature
  • Public verification URL per certificate
  • 30-day retention for free accounts

cert.v2 · Build

Production-grade provenance with lineage

  • Everything in v1 + registry listing
  • Algorithm version + row/column metadata pinned
  • Indefinite retention

cert.v3 · Trust

Audit-ready with schema provenance

  • Everything in v2 + schema hash + template hash
  • Decision-record link per generation run
  • JSON / CSV audit exports

cert.v4 · Govern

Regulated environments

  • Everything in v3 + approval workflow evidence
  • Customer-managed signing keys supported
  • Privacy accounting fields (when DP-CTGAN ships)

From sandbox to production

Start free. Every tier uses the same signing scheme (SHA-256 fingerprint plus Ed25519 signature); higher tiers add certificate fields, retention, and advanced generation modes.

Full pricing →

Sandbox

No credit card

$0

  • 10 generations / 24h · up to 1,000 rows each
  • Real Ed25519 certificates (cert.v1)
  • No persistence

You're here

Free account

Real API key

$0

  • 5 jobs / month · 10,000 rows each
  • Certificate persistence (30 days)
  • Public verification URLs

Build

First production

$49/mo

  • 50 jobs / month
  • Prompt-based + upload+synthesize
  • cert.v2 with registry listing

Trust

Audit-ready

$149/mo

  • 500 jobs / month
  • Schema-controlled generation
  • cert.v3 + audit exports

Prefer code?

Open-source templates + data safety

Synthetic templates are MIT-licensed. The PII scanner is a separate safety utility — run it against real inputs before you feed them into generation.

git clone https://github.com/certifieddata/certifieddata-synthetic-templates
cd certifieddata-synthetic-templates
pnpm install && pnpm run example

For compliance, audit, and legal review

Evidence an auditor can verify

Article 12 deep-dive →

EU AI Act · Article 10

Data governance

Each certificate records the synthetic origin of the dataset and its generation algorithm, evidence that can support the training-data governance practices Article 10 sets out for high-risk AI systems.

EU AI Act · Article 12

Record-keeping

Certificates provide a tamper-evident record of how a training dataset was produced, timestamped and signed by the issuer.

EU AI Act · Article 50

Transparency obligations

Public verification URLs let downstream deployers independently confirm a dataset's synthetic origin before use.

Frequently asked

Do I need an account to generate a certified dataset?+

No. The anonymous sandbox allows 10 generations per 24 hours, up to 1,000 rows each, using any of the 40+ industry templates. Every run produces a real Ed25519-signed certificate you can verify at /verify.

Is differential privacy enforced?+

Not currently. DP-CTGAN with epsilon-based privacy accounting is in development for enterprise. Today's certificates record the algorithm (CTGAN) and parameters but do not claim a differential-privacy guarantee. We publish this honestly so downstream users are not misled.

What's inside the certificate?+

A canonical JSON payload (RFC 8785, JSON Canonicalization Scheme) containing the dataset's SHA-256 fingerprint, a certification ID (UUID), an ISO-8601 timestamp, the issuer, the algorithm specification, row and column counts, and the schema version. The payload is signed with Ed25519, so anyone with the published public key can verify it without calling CertifiedData's services.
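
Canonicalization matters because a signature covers bytes, not meaning: two JSON texts with the same fields in a different order, or with different whitespace, would otherwise produce different signature inputs. Below is a minimal sketch of an RFC 8785-style canonicalizer, assuming the payload contains only plain objects, arrays, strings, finite numbers, booleans, and null. It has not been checked against the RFC's test vectors, and the field names in the demo are hypothetical apart from `dataset_hash`.

```typescript
// JSON values this sketch accepts.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Minimal RFC 8785-style canonicalization: no whitespace, object members sorted
// by UTF-16 code units, numbers and strings in ECMAScript JSON.stringify form.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") {
    if (typeof value === "number" && !Number.isFinite(value)) {
      throw new Error("NaN and Infinity are not valid JSON");
    }
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    // Array element order is significant and is preserved.
    return "[" + value.map((item) => canonicalize(item)).join(",") + "]";
  }
  const obj: { [key: string]: Json } = value;
  // The default string sort compares UTF-16 code units, the order RFC 8785 specifies.
  const keys = Object.keys(obj).sort();
  return "{" + keys.map((key) => JSON.stringify(key) + ":" + canonicalize(obj[key])).join(",") + "}";
}

// Same hypothetical fields, different key order at both levels.
const a = canonicalize({ row_count: 1000, dataset_hash: "ab12", meta: { b: 2, a: 1 } });
const b = canonicalize({ meta: { a: 1, b: 2 }, dataset_hash: "ab12", row_count: 1000 });

console.log(a === b); // true
console.log(a); // {"dataset_hash":"ab12","meta":{"a":1,"b":2},"row_count":1000}
```

A complete implementation should also reject duplicate keys and validate that the input is I-JSON (RFC 7493); a tested RFC 8785 library is usually the safer choice in production.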

How do I verify a generated dataset?+

Hash the CSV exactly as downloaded with SHA-256, confirm the digest matches dataset_hash in the certificate, then verify the Ed25519 signature against the public key published at /.well-known/signing-keys.json. Alternatively, paste the certificate ID at /verify.
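
The offline verification steps in this answer can be sketched with Node's `node:crypto`. Several details are assumptions, not documented behavior: that a certificate separates a flat `payload` from a base64 `signature`, that `dataset_hash` is lowercase hex, that the signature covers the RFC 8785 bytes of the payload, and that signing-keys.json lists keys as Ed25519 JWKs. The helper names are hypothetical; check the real certificate and key file before relying on any of this.

```typescript
import { createHash, createPublicKey, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical certificate layout; only `dataset_hash` is named in the FAQ.
interface Certificate {
  payload: { dataset_hash: string; [field: string]: string | number };
  signature: string; // base64-encoded Ed25519 signature (assumed encoding)
}

// Flat payloads only: compact JSON with sorted keys should match RFC 8785 output.
function canonicalBytes(payload: Certificate["payload"]): Buffer {
  return Buffer.from(JSON.stringify(payload, Object.keys(payload).sort()), "utf8");
}

// Assumes the key file lists Ed25519 public keys as JWKs ({ kty: "OKP", crv: "Ed25519", x }).
function keyFromJwk(x: string): KeyObject {
  return createPublicKey({ key: { kty: "OKP", crv: "Ed25519", x }, format: "jwk" });
}

function verifyCertificate(csvBytes: Buffer, cert: Certificate, publicKey: KeyObject): boolean {
  // 1. Re-hash the CSV exactly as downloaded and compare with dataset_hash.
  const actualHash = createHash("sha256").update(csvBytes).digest("hex");
  if (actualHash !== cert.payload.dataset_hash.toLowerCase()) return false;
  // 2. Verify the Ed25519 signature over the canonical payload bytes.
  return verify(null, canonicalBytes(cert.payload), publicKey, Buffer.from(cert.signature, "base64"));
}

// Demo: a throwaway key pair stands in for the published signing key.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const jwkX = publicKey.export({ format: "jwk" }).x as string;
const csv = Buffer.from("id,amount\n1,42.50\n", "utf8");
const payload = { dataset_hash: createHash("sha256").update(csv).digest("hex"), row_count: 1 };
const cert: Certificate = {
  payload,
  signature: sign(null, canonicalBytes(payload), privateKey).toString("base64"),
};

console.log(verifyCertificate(csv, cert, keyFromJwk(jwkX))); // true
console.log(verifyCertificate(Buffer.from("id,amount\n1,42.51\n", "utf8"), cert, keyFromJwk(jwkX))); // false
```

Checking the hash first means a modified file fails fast; the signature check then confirms the certificate was issued by the holder of the signing key and was not edited afterward.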

What counts as a billable generation?+

Only runs that complete and produce a signed certificate count toward plan limits. Failed runs are not counted.

Can I certify a dataset I already have?+

Yes. Use /upload-manifest for manifest-based certification or /notary to notarize any AI artifact. Both produce the same Ed25519-signed certificate class as /generate.

Generate certified synthetic data

Synthetic data generation creates statistically representative datasets without exposing real-world records. CertifiedData extends this with cryptographic certification: every generated dataset is fingerprinted with SHA-256 and signed with an Ed25519 key, producing a machine-verifiable provenance record.

This transforms a synthetic dataset from an anonymous output into a traceable artifact — one that any auditor, regulator, or downstream system can independently verify without asking CertifiedData.

Why machine-verifiable provenance matters

AI governance frameworks, including the EU AI Act's Article 12 (record-keeping) and Article 19 (automatically generated logs), expect organizations to be able to demonstrate the provenance of training datasets and the integrity of AI outputs.

A certificate issued by CertifiedData provides a tamper-evident audit artifact for that demonstration. It records what was generated, when, by whom, and with what algorithm, all bound to a cryptographic fingerprint of the artifact itself.

Supported generation workflows

  • Template-based: Select from 40+ pre-built schemas. Generate in seconds. Available on all plans.
  • Prompt-based: Describe your dataset in natural language. The engine infers schema and generates structured output. Build plan.
  • Upload + synthesize: Upload real data to generate a statistically similar synthetic version. No source data is retained. Build plan.
  • Schema-controlled: Explicitly define field types, constraints, and relationships. Trust plan.
  • Manifest upload / notarize existing artifact: Certify a dataset you already have. Use Upload Manifest or AI Notary.
  • CI/CD + API: Generate and certify programmatically via the REST API. Integrate certification into MLOps pipelines.

Use cases for certified synthetic data

AI model training

Generate training data that carries a verifiable certificate of synthetic origin, as emerging AI governance standards increasingly expect.

Regulatory compliance

Produce datasets with provenance documentation that supports EU AI Act, NIST AI RMF, and ISO/IEC 42001 requirements for training data.

Privacy-safe data sharing

Share datasets externally without exposing real-world records. Certificates prove synthetic origin to recipients.

Testing environments

Spin up realistic test data with known statistical properties. Certification makes the data traceable through test infrastructure.

Vendor/partner data exchange

Provide counterparties with certified datasets they can independently verify before use in their systems.

Audit and lineage documentation

Establish an immutable record of every dataset used in model development — discoverable in the public transparency log.
