Synthetic data certification transforms a synthetic dataset into a verifiable artifact. Rather than relying on documentation or assertions, certification creates a cryptographic record that anyone can independently verify.
CertifiedData acts as a certificate authority for synthetic datasets, issuing Ed25519-signed certificates that bind a SHA-256 fingerprint of the dataset to its generation metadata.
The certification record
When you generate a synthetic dataset with CertifiedData, the platform immediately hashes the output using SHA-256. This hash becomes the dataset's fingerprint, a deterministic identifier: identical bytes always produce the same hash, while changing even a single byte of the dataset produces a different one.
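The fingerprinting step can be sketched with Python's standard `hashlib` module. This is a minimal illustration, not CertifiedData's implementation: the function names `fingerprint_bytes` and `fingerprint_file` are hypothetical, and it assumes the fingerprint is the lowercase hex SHA-256 digest of the dataset's raw bytes. Streaming the file in chunks gives the same digest as hashing it in one piece, while a one-byte edit changes the result entirely.

```python
import hashlib
import os
import tempfile


def fingerprint_bytes(data: bytes) -> str:
    """SHA-256 fingerprint of an in-memory dataset, as lowercase hex (assumed format)."""
    return hashlib.sha256(data).hexdigest()


def fingerprint_file(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 fingerprint of a dataset file, read in fixed-size chunks.

    Streaming keeps memory use constant, so large datasets need not fit in RAM.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    original = b"id,age,income\n1,34,52000\n2,51,61000\n"
    tampered = b"id,age,income\n1,34,52000\n2,51,61001\n"  # a single byte differs

    with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
        tmp.write(original)
    try:
        # Hashing the file in chunks matches hashing the same bytes directly.
        print(fingerprint_file(tmp.name) == fingerprint_bytes(original))  # True
        # Changing one byte yields a different fingerprint.
        print(fingerprint_bytes(original) == fingerprint_bytes(tampered))  # False
    finally:
        os.remove(tmp.name)
```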
The fingerprint is placed in a structured certificate payload alongside generation metadata (algorithm, row count, schema, timestamp, and issuer), and the payload is then signed with an Ed25519 private key. A certificate includes the following fields:
- dataset_hash: SHA-256 fingerprint of the generated output
- algorithm: the generation engine used (CTGAN, Gaussian copula, or another engine)
- rows: exact row count at generation time
- timestamp: ISO-8601 generation timestamp
- issuer: Certified Data LLC
- signature: Ed25519 signature over all of the payload fields above
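Issuing a certificate with these fields can be sketched as follows. This is an illustration under stated assumptions, not CertifiedData's actual API: it assumes the third-party `cryptography` package for Ed25519, serializes the payload as sorted-key compact JSON before signing, and hex-encodes the signature. The helper names `canonical_bytes` and `issue_certificate` are hypothetical, and the `schema` field is omitted because its format is not specified here.

```python
import hashlib
import json
from datetime import datetime, timezone

# Assumption: the third-party "cryptography" package provides Ed25519.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def canonical_bytes(payload: dict) -> bytes:
    """Serialize deterministically (sorted keys, no extra whitespace).

    Signer and verifier must produce byte-identical input, so the
    serialization rules are fixed. This scheme is an assumption.
    """
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_certificate(dataset: bytes, algorithm: str, rows: int,
                      private_key: Ed25519PrivateKey) -> dict:
    """Build the certificate payload, sign it, and attach a hex-encoded signature."""
    payload = {
        "dataset_hash": hashlib.sha256(dataset).hexdigest(),
        "algorithm": algorithm,
        "rows": rows,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "issuer": "Certified Data LLC",
    }
    # Ed25519 hashes the message internally (with SHA-512), so the
    # canonical payload bytes are signed directly.
    signature = private_key.sign(canonical_bytes(payload))
    return {**payload, "signature": signature.hex()}


if __name__ == "__main__":
    # Stand-in for the issuer's long-lived signing key.
    key = Ed25519PrivateKey.generate()
    data = b"id,age\n1,34\n2,51\n"
    cert = issue_certificate(data, "CTGAN", rows=2, private_key=key)
    print(json.dumps(cert, indent=2))
    print(len(bytes.fromhex(cert["signature"])))  # 64: Ed25519 signatures are 64 bytes
```

Because every field is covered by one signature, altering any of them (for example, the row count) makes the signature check fail. Sorted-key JSON is a simple way to get deterministic bytes. A production system might instead adopt a formal scheme such as the JSON Canonicalization Scheme (RFC 8785), which pins down number and string encoding more strictly.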
Who uses synthetic data certification
Data science teams certify synthetic training datasets to provide verifiable provenance when sharing them with model training teams or including them in an AIBOM (AI bill of materials).
Compliance teams use certificates as evidence that training data meets documentation requirements under frameworks like the EU AI Act.
Enterprise procurement teams require certificates before accepting synthetic data components from third-party suppliers.
Verification without data access
A key property of synthetic data certification is that verification does not require access to the original real-world data the synthetic dataset was modeled on. A verifier needs only the synthetic dataset (or its SHA-256 hash) and the certificate ID.
Using CertifiedData's public verification key, published in the CertifiedData registry, the verifier can independently confirm that (1) the dataset's hash matches the one recorded in the certificate, and (2) the certificate's signature is valid.
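The two checks can be sketched as follows. This is a minimal illustration, not CertifiedData's verification tool. It assumes the certificate is a dict with the fields listed earlier, that the signed bytes are sorted-key compact JSON of every field except `signature`, that the signature is hex-encoded, and that the issuer's public key has already been fetched from the registry as 32 raw bytes (the registry lookup itself is not shown). It relies on the third-party `cryptography` package, and `verify_certificate` is a hypothetical name.

```python
import hashlib
import json

# Assumption: the third-party "cryptography" package provides Ed25519.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey


def canonical_bytes(payload: dict) -> bytes:
    """Must reproduce exactly what the issuer signed (assumed: sorted-key, compact JSON)."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_certificate(certificate: dict, dataset_hash: str, public_key_bytes: bytes) -> bool:
    """Return True only if the hash matches AND the issuer's signature is valid."""
    # Check 1: the dataset the verifier holds is the one the certificate describes.
    if certificate.get("dataset_hash") != dataset_hash.lower():
        return False
    # Check 2: the issuer's key signed exactly these payload fields.
    payload = {k: v for k, v in certificate.items() if k != "signature"}
    try:
        public_key = Ed25519PublicKey.from_public_bytes(public_key_bytes)
        public_key.verify(bytes.fromhex(certificate["signature"]), canonical_bytes(payload))
    except (InvalidSignature, ValueError, KeyError):
        return False
    return True


if __name__ == "__main__":
    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    # Stand-in issuer: a real verifier would hold only the public key.
    issuer_key = Ed25519PrivateKey.generate()
    public_raw = issuer_key.public_key().public_bytes(
        encoding=serialization.Encoding.Raw, format=serialization.PublicFormat.Raw
    )
    dataset = b"id,age\n1,34\n2,51\n"
    payload = {
        "dataset_hash": hashlib.sha256(dataset).hexdigest(),
        "algorithm": "CTGAN",
        "rows": 2,
        "timestamp": "2024-01-01T00:00:00+00:00",  # illustrative value
        "issuer": "Certified Data LLC",
    }
    cert = {**payload, "signature": issuer_key.sign(canonical_bytes(payload)).hex()}

    # The verifier only hashes its copy of the dataset; no source data is involved.
    local_hash = hashlib.sha256(dataset).hexdigest()
    print(verify_certificate(cert, local_hash, public_raw))  # True
    print(verify_certificate(cert, hashlib.sha256(dataset + b"3,29\n").hexdigest(), public_raw))  # False
    cert["rows"] = 3  # tampering with metadata invalidates the signature
    print(verify_certificate(cert, local_hash, public_raw))  # False
```

Returning `False` on any failure keeps the caller's logic simple. A real tool would likely report which of the two checks failed, since a hash mismatch (wrong or modified dataset) and a bad signature (forged or altered certificate) call for different responses.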
Frequently asked questions
Is certification the same as generating synthetic data?
No. Generation creates the synthetic dataset. Certification is the subsequent step that creates a cryptographic record proving the dataset's synthetic origin and integrity.
What generation algorithms does CertifiedData certify?
CertifiedData certifies datasets generated by CTGAN, Gaussian copula, light sampling, and hybrid engines. The specific algorithm is recorded in the certificate payload.
Generate and certify synthetic data
Create a certified synthetic dataset in minutes. Every dataset receives a cryptographic certificate containing its SHA-256 fingerprint.