Generate Synthetic Data
Create statistically accurate synthetic datasets. Every generated artifact receives a machine-verifiable certificate: a cryptographic provenance record proving the dataset was synthetically generated by CertifiedData.
Datasets pathway
Other workflows
Generate from template
Create a new synthetic dataset in the UI
Active workflow
Upload manifest
Certify an existing artifact or dataset
Upload manifest →
AI Notary
Notarize any AI artifact instantly
Open notary →
Advanced generation modes
CertifiedData supports multiple synthesis approaches. Plan requirements are shown per capability.
Template-based generation
Choose from 40+ industry schemas across finance, healthcare, energy, retail, manufacturing, and government. Every template produces statistically coherent synthetic records.
Prompt-based generation
Describe your dataset in natural language. The system infers column names, types, constraints, and relationships, then generates a schema-accurate synthetic output.
Upload + synthesize
Upload a real dataset. The engine learns statistical distributions and generates a new dataset that preserves the shape, schema, and correlations without exposing source records.
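To make "learns statistical distributions and preserves correlations" concrete, here is a deliberately simple sketch. It is not CertifiedData's engine: it assumes two numeric columns and fits a bivariate Gaussian, which preserves only means, spreads, and linear correlation. The `fit` and `synthesize` helpers and the stand-in "real" data are illustrative assumptions.

```python
import math
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fit(xs, ys):
    """Learn per-column mean and spread plus the correlation between columns."""
    return {
        "mean_x": statistics.fmean(xs), "std_x": statistics.stdev(xs),
        "mean_y": statistics.fmean(ys), "std_y": statistics.stdev(ys),
        "rho": pearson(xs, ys),
    }

def synthesize(model, n, seed=0):
    """Draw fresh rows from the fitted model; no source row is copied."""
    rng = random.Random(seed)
    rho = model["rho"]
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = model["mean_x"] + model["std_x"] * z1
        # Mixing z1 into y reproduces the fitted correlation rho.
        y = model["mean_y"] + model["std_y"] * (rho * z1 + math.sqrt(1 - rho * rho) * z2)
        rows.append((x, y))
    return rows

# Stand-in "real" data (an assumption for the demo): income loosely driven by age.
src_rng = random.Random(1)
ages = [src_rng.uniform(20, 65) for _ in range(2000)]
incomes = [1000 * a + src_rng.gauss(0, 8000) for a in ages]

model = fit(ages, incomes)
synthetic = synthesize(model, n=5000, seed=2)
syn_rho = pearson([x for x, _ in synthetic], [y for _, y in synthetic])
print(f"source rho={model['rho']:.2f}, synthetic rho={syn_rho:.2f}")
```

Only the fitted parameters feed the sampler, so the output rows are new draws rather than perturbed copies of the source rows. A production engine would need to model non-Gaussian shapes and categorical columns as well.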
Schema-controlled generation
Explicitly define column types, value ranges, cardinality, nullability, and cross-column constraints. Use when statistical inference is insufficient for your compliance use case.
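As an illustration of what an explicit schema can express, the sketch below defines column types, value ranges, nullability, and one cross-column constraint, then generates rows that satisfy them. The `Column` class, its field names, and `generate_rows` are hypothetical and do not represent CertifiedData's schema format or API.

```python
import random
from dataclasses import dataclass

@dataclass
class Column:
    """Hypothetical column spec: type, range or categories, and null rate."""
    name: str
    kind: str                 # "int", "float", or "category"
    low: float = 0.0
    high: float = 1.0
    categories: tuple = ()
    null_rate: float = 0.0    # probability that a value is None

def sample_value(col, rng):
    """Draw one value for a column, honoring its null rate and range."""
    if rng.random() < col.null_rate:
        return None
    if col.kind == "int":
        return rng.randint(int(col.low), int(col.high))
    if col.kind == "float":
        return round(rng.uniform(col.low, col.high), 2)
    if col.kind == "category":
        return rng.choice(col.categories)
    raise ValueError(f"unknown column kind: {col.kind}")

def generate_rows(columns, constraints, n, seed=0, max_tries=1000):
    """Rejection-sample rows until every cross-column constraint holds."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        for _ in range(max_tries):
            row = {c.name: sample_value(c, rng) for c in columns}
            if all(check(row) for check in constraints):
                rows.append(row)
                break
        else:
            raise RuntimeError("constraints too strict for the given ranges")
    return rows

schema = [
    Column("age", "int", low=18, high=90),
    Column("account_type", "category", categories=("basic", "premium")),
    Column("credit_limit", "float", low=500, high=50_000),
    Column("balance", "float", low=0, high=50_000, null_rate=0.1),
]
# Cross-column constraint: a non-null balance may not exceed the credit limit.
constraints = [lambda r: r["balance"] is None or r["balance"] <= r["credit_limit"]]

rows = generate_rows(schema, constraints, n=100, seed=42)
```

Rejection sampling is the simplest way to honor cross-column rules. It becomes slow when constraints are tight, which is one reason a real generator would likely use a more targeted approach.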
Privacy-preserving generation (coming soon)
A DP-CTGAN engine with epsilon-based privacy accounting is in development for regulated environments. The certificate will record whether differential privacy was enforced and at what epsilon level.
Certified output
Every generated dataset receives an Ed25519-signed certificate binding the dataset's SHA-256 hash to a provenance record. Certificates are machine-verifiable and logged to the public transparency log.
How certification works
CertifiedData acts as a certificate authority for AI artifacts. A certificate is not a badge; it is a cryptographic record binding the dataset to its generation event.
- Generate dataset: template, rows, format
- SHA-256 fingerprint: cryptographic dataset hash
- Issue certificate: provenance record created
- Ed25519 signature: tamper-evident signing
- Public verification: independently verifiable
Tamper-evident
SHA-256 of the dataset bytes is embedded in the certificate payload. Any modification changes the hash.
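The tamper-evidence property can be checked with nothing more than Python's standard `hashlib`. This short sketch uses made-up CSV bytes as a stand-in dataset and shows that a one-character change no longer matches the recorded hash.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 fingerprint of the given bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()

# Stand-in dataset bytes (illustrative only).
original = b"id,amount\n1,100.00\n2,250.50\n"
tampered = b"id,amount\n1,100.00\n2,250.51\n"  # a single digit changed

recorded = sha256_hex(original)          # the value a certificate would embed

print(sha256_hex(original) == recorded)  # prints True
print(sha256_hex(tampered) == recorded)  # prints False
```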
Ed25519 signed
The certificate payload is signed with CertifiedData's Ed25519 private key. Anyone holding the corresponding public key can check the signature without contacting CertifiedData.
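The sketch below shows Ed25519 signing and verification with the third-party `cryptography` package. It assumes a locally generated key pair as a stand-in for CertifiedData's real key, and the payload layout (`dataset_sha256`, `generator`) is a made-up example, not the actual certificate format.

```python
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)

def is_valid(public_key: Ed25519PublicKey, payload: bytes, signature: bytes) -> bool:
    """Return True only if the signature matches the payload under this key."""
    try:
        public_key.verify(signature, payload)
        return True
    except InvalidSignature:
        return False

# Stand-in key pair generated locally for the demo (an assumption).
signing_key = Ed25519PrivateKey.generate()
public_key = signing_key.public_key()

# Hypothetical payload: field names are illustrative.
dataset = b"id,amount\n1,100.00\n2,250.50\n"
payload = json.dumps(
    {"dataset_sha256": hashlib.sha256(dataset).hexdigest(), "generator": "example"},
    sort_keys=True,
    separators=(",", ":"),
).encode("utf-8")

signature = signing_key.sign(payload)

print(is_valid(public_key, payload, signature))          # prints True
print(is_valid(public_key, payload + b" ", signature))   # prints False
```

The verifier never touches the private key. It needs only the payload, the signature, and a public key it has obtained through a channel it trusts.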
Publicly verifiable
Any auditor can verify a certificate via the public API or the /verify page, with no authentication required.
Platform capabilities
Need higher limits? View plans →
Generate certified synthetic data
Synthetic data generation creates statistically representative datasets without exposing real-world records. CertifiedData extends this with cryptographic certification: every generated dataset is fingerprinted with SHA-256 and signed with an Ed25519 key, producing a machine-verifiable provenance record.
This transforms a synthetic dataset from an anonymous output into a traceable artifact: one that any auditor, regulator, or downstream system can independently verify without asking CertifiedData.
Why machine-verifiable provenance matters
AI governance frameworks, including the EU AI Act Article 12 (logging obligations) and Article 19 (record-keeping), require organizations to demonstrate the provenance of training datasets and the integrity of AI outputs.
A certificate issued by CertifiedData provides the immutable audit artifact required for that demonstration. It records what was generated, when, by whom, and with what algorithm, all bound to a cryptographic fingerprint of the artifact itself.
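One way to picture such a record is as a small, immutable structure serialized deterministically so it can be hashed or signed. The sketch below is an assumption about shape only: the `ProvenanceRecord` class, its field names, and the example values are hypothetical and not CertifiedData's certificate schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record covering what, when, by whom, and how."""
    artifact_sha256: str   # fingerprint of the artifact bytes
    description: str       # what was generated
    generated_at: str      # when (ISO 8601, UTC)
    generated_by: str      # by whom
    algorithm: str         # with what algorithm

def canonical_bytes(record: ProvenanceRecord) -> bytes:
    """Serialize with sorted keys and fixed separators so bytes are stable."""
    return json.dumps(
        asdict(record), sort_keys=True, separators=(",", ":")
    ).encode("utf-8")

# Stand-in artifact and values (illustrative only).
dataset = b"id,amount\n1,100.00\n2,250.50\n"
record = ProvenanceRecord(
    artifact_sha256=hashlib.sha256(dataset).hexdigest(),
    description="retail transactions, 2 rows, CSV",
    generated_at=datetime(2025, 1, 1, tzinfo=timezone.utc).isoformat(),
    generated_by="example-org",
    algorithm="template-based",
)

print(canonical_bytes(record).decode("utf-8"))
```

Canonical serialization matters because a signature covers exact bytes: two parties must derive the same byte string from the same record before a signature check can succeed.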
Supported generation workflows
- Template-based: Select from 40+ pre-built schemas. Generate in seconds. Available on all plans.
- Prompt-based: Describe your dataset in natural language. The engine infers schema and generates structured output. Pro plan.
- Upload + synthesize: Upload real data to generate a statistically similar synthetic version. No source data is retained. Pro plan.
- Schema-controlled: Explicitly define field types, constraints, and relationships. Team plan.
- Manifest upload / notarize existing artifact: Certify a dataset you already have. Use Upload Manifest or AI Notary.
- CI/CD + API: Generate and certify programmatically via the REST API. Integrate certification into MLOps pipelines.
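For the CI/CD case, a pipeline step might gate on a local integrity check once a dataset and its certificate have been fetched. The sketch below makes no REST calls, since the endpoints are not described here, and it assumes the certificate is a JSON file with a hypothetical `dataset_sha256` field. Signature verification is left out of this sketch.

```python
import hashlib
import hmac
import json
from pathlib import Path

def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def dataset_matches_certificate(dataset_path, certificate_path):
    """Compare the dataset's hash with the one recorded in the certificate."""
    certificate = json.loads(Path(certificate_path).read_text(encoding="utf-8"))
    expected = certificate["dataset_sha256"]  # hypothetical field name
    return hmac.compare_digest(file_sha256(dataset_path), expected.lower())
```

A pipeline job could call `dataset_matches_certificate` and exit with a non-zero status when it returns `False`, which stops a modified dataset from reaching a training step.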
Use cases for certified synthetic data
AI model training
Generate training data that carries a verifiable certificate of synthetic origin, a requirement of emerging AI governance standards.
Regulatory compliance
Produce datasets meeting EU AI Act, NIST AI RMF, and ISO 42001 documentation requirements for training data provenance.
Privacy-safe data sharing
Share datasets externally without exposing real-world records. Certificates prove synthetic origin to recipients.
Testing environments
Spin up realistic test data with known statistical properties. Certification makes the data traceable through test infrastructure.
Vendor/partner data exchange
Provide counterparties with certified datasets they can independently verify before use in their systems.
Audit and lineage documentation
Establish an immutable record of every dataset used in model development, discoverable in the public transparency log.