Certification Methodology
Precise documentation of how CertifiedData hashes artifacts, constructs certificate payloads, signs them with Ed25519, and what each element proves.
Overview
CertifiedData acts as a certificate authority for synthetic datasets and AI artifacts. Every certified artifact receives a structured certificate payload that is cryptographically signed with an Ed25519 private key. The certificate records the SHA-256 fingerprints of the artifact files, enabling any party to independently verify that a downloaded file matches its certified version.
Certificate schema versions:
- cert.v1 — signature and provenance only; no artifact-level hashes
- cert.v2 — full artifact hashes (ZIP + inner files) enabling upload-based verification
What is hashed
For cert.v2 certificates, three categories of hashes are recorded:
- ZIP artifact hash — SHA-256 of the final ZIP file bytes exactly as delivered to the user. Stored in artifact_hash.
- Inner file hashes — SHA-256 of each individual file inside the ZIP (e.g. dataset.csv, manifest.json). Stored in inner_artifacts.
- Certificate payload hash — SHA-256 of the canonicalized certificate JSON (RFC 8785). Stored in hashes.certificate_payload_sha256. The Ed25519 signature covers these same canonical payload bytes.
Hashing sequence
The following sequence is performed before certificate issuance:
1. Generate synthetic data rows
2. Write each table to {name}.csv
3. Write manifest.json with table metadata
4. Hash each file:
inner_artifacts["dataset.csv"].sha256 = SHA256(csv_bytes)
inner_artifacts["manifest.json"].sha256 = SHA256(manifest_bytes)
5. Zip the directory:
ZIP = archive(csv + manifest.json)
6. Hash the ZIP:
artifact_hash = SHA256(zip_bytes)
7. Build certificate payload including artifact_hash + inner_artifacts
8. Canonicalize payload (RFC 8785 JSON Canonicalization)
9. Compute the SHA-256 hash of the canonicalized payload
10. Sign the canonical bytes with the Ed25519 private key
11. Persist the signed certificate
Hashes are computed from the exact bytes written to disk before upload to cloud storage. This ensures the certified hash matches what the user downloads.
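Steps 2 through 6 can be sketched in Python for a single table. This is a minimal illustration under stated assumptions, not CertifiedData's code: the `build_artifacts` helper, the in-memory ZIP, the JSON layout of manifest.json, and the fixed entry timestamp are all hypothetical. Only the `artifact_hash` and `inner_artifacts` names come from this page.

```python
import hashlib
import io
import json
import zipfile


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def build_artifacts(csv_bytes: bytes, manifest: dict) -> tuple[bytes, dict, str]:
    """Hypothetical helper: returns (zip_bytes, inner_artifacts, artifact_hash)."""
    # Steps 2-3: the files that go into the archive (manifest layout is assumed).
    files = {
        "dataset.csv": csv_bytes,
        "manifest.json": json.dumps(manifest, indent=2).encode("utf-8"),
    }
    # Step 4: hash the exact bytes of each file before zipping.
    inner_artifacts = {name: {"sha256": sha256_hex(data)} for name, data in files.items()}

    # Step 5: write the archive. The fixed date_time only makes this example
    # reproducible; it is an assumption, not documented behavior.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            info = zipfile.ZipInfo(name, date_time=(2024, 1, 1, 0, 0, 0))
            info.compress_type = zipfile.ZIP_DEFLATED
            zf.writestr(info, data)
    zip_bytes = buf.getvalue()

    # Step 6: hash the final ZIP bytes exactly as they would be delivered.
    return zip_bytes, inner_artifacts, sha256_hex(zip_bytes)
```

Step 7 would then place both results in the payload, e.g. `{"artifact_hash": artifact_hash, "inner_artifacts": inner_artifacts, ...}`. The methodology itself hashes whatever bytes were actually written, so it does not depend on the archive being reproducible across runs.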
ZIP hash vs CSV hash
The two hash types are related but distinct:
| Field | Covers | Use case |
|---|---|---|
| artifact_hash | Entire ZIP file bytes | Verify you received the exact ZIP |
| inner_artifacts["dataset.csv"] | CSV file bytes only | Verify the CSV if extracted from ZIP |
| inner_artifacts["manifest.json"] | Manifest file bytes only | Verify generation metadata integrity |
A ZIP hash and its CSV hash will always differ, because the ZIP file contains more than the CSV bytes: local file headers, a central directory, and usually a compressed rather than raw copy of the content.
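To see the distinction in practice, the sketch below builds a one-file ZIP in memory and compares its hash with the hash of the extracted CSV. `hashes_for` is a hypothetical helper, not part of any CertifiedData tooling.

```python
import hashlib
import io
import zipfile


def hashes_for(zip_bytes: bytes, member: str) -> tuple[str, str]:
    """Return (hash of the whole ZIP, hash of one extracted member)."""
    zip_hash = hashlib.sha256(zip_bytes).hexdigest()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        member_hash = hashlib.sha256(zf.read(member)).hexdigest()
    return zip_hash, member_hash


# A tiny archive built in memory for illustration.
csv_bytes = b"id,value\n1,42\n"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("dataset.csv", csv_bytes)

zip_hash, csv_hash = hashes_for(buf.getvalue(), "dataset.csv")
print(zip_hash == csv_hash)  # False: the ZIP wraps the CSV in headers and a directory
print(csv_hash == hashlib.sha256(csv_bytes).hexdigest())  # True: extraction restores the exact bytes
```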
What the Ed25519 signature covers
The Ed25519 signature covers the canonicalized certificate payload bytes, produced with the RFC 8785 JSON Canonicalization Scheme. This includes:
- All artifact hashes (artifact_hash, inner_artifacts)
- Certificate metadata (issuer, subject, timestamp, schema version)
- Generation metadata (engine, row count, template ID)
- Payload SHA-256 (hashes.certificate_payload_sha256)
- Clickwrap acceptance record, if present
The signature does not cover: the PDF rendering, the audit vault entry IDs (added after signing), or Supabase storage paths. These are post-signing annotations and are documented as such.
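Because the signature is computed over canonical bytes, a verifier must reproduce those bytes exactly; key order and whitespace in the JSON it received must not matter. The sketch below approximates RFC 8785 with Python's `json.dumps` (sorted keys, compact separators, UTF-8). It matches RFC 8785 only for the subset it accepts (ASCII keys, strings, integers within the IEEE 754 safe range, booleans, null, arrays, objects) and refuses floats, because RFC 8785 serializes numbers the ECMAScript way. The function names are hypothetical; a full JCS implementation is the safer choice for real payloads.

```python
import hashlib
import json

MAX_SAFE_INT = 2**53 - 1  # largest integer an IEEE 754 double (used by JCS) holds exactly


def _check_jcs_subset(value) -> None:
    """Reject values this simplified serializer cannot render as RFC 8785 would."""
    if isinstance(value, float):
        raise ValueError("floats require a full RFC 8785 number serializer")
    if isinstance(value, int) and not isinstance(value, bool) and abs(value) > MAX_SAFE_INT:
        raise ValueError("integer outside the exactly representable range")
    if isinstance(value, dict):
        for key, item in value.items():
            # ASCII keys sort identically by code point and by UTF-16 code unit.
            if not (isinstance(key, str) and key.isascii()):
                raise ValueError("keys must be ASCII strings in this sketch")
            _check_jcs_subset(item)
    elif isinstance(value, list):
        for item in value:
            _check_jcs_subset(item)


def canonicalize(payload: dict) -> bytes:
    """Compact JSON with sorted keys, UTF-8 encoded (an RFC 8785 approximation)."""
    _check_jcs_subset(payload)
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def payload_sha256(payload: dict) -> str:
    """Hex SHA-256 of the canonical bytes that the Ed25519 signature is computed over."""
    return hashlib.sha256(canonicalize(payload)).hexdigest()
```

For example, `{"b": 1, "a": [true]}` and `{"a": [true], "b": 1}` both serialize to `{"a":[true],"b":1}`, so they hash identically.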
The Ed25519 public key is published at: /.well-known/signing-keys.json
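openssl expects its public key as a PEM file (pubkey.pem in the verification steps). If signing-keys.json exposes the raw 32-byte Ed25519 key, for instance base64url-encoded in a JWK `x` member, the key can be wrapped in the fixed SubjectPublicKeyInfo header defined by RFC 8410. The JSON layout is an assumption, since this page does not document it, and `b64url_decode` and `ed25519_raw_to_pem` are hypothetical helpers.

```python
import base64

# DER header of an Ed25519 SubjectPublicKeyInfo (RFC 8410): SEQUENCE, the
# AlgorithmIdentifier for OID 1.3.101.112, then a BIT STRING header for 32 key bytes.
ED25519_SPKI_PREFIX = bytes.fromhex("302a300506032b6570032100")


def b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWK members omit."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def ed25519_raw_to_pem(raw: bytes) -> str:
    """Wrap a raw 32-byte Ed25519 public key as a PEM public key for openssl."""
    if len(raw) != 32:
        raise ValueError("an Ed25519 public key is exactly 32 bytes")
    body = base64.b64encode(ED25519_SPKI_PREFIX + raw).decode("ascii")
    return f"-----BEGIN PUBLIC KEY-----\n{body}\n-----END PUBLIC KEY-----\n"
```

Writing `ed25519_raw_to_pem(b64url_decode(x))` to pubkey.pem, where `x` stands for the (hypothetical) base64url key string, gives openssl the file it expects.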
What the certificate proves
A valid CertifiedData certificate proves:
- Integrity — the artifact file has not been modified since certification. Any change to a single byte produces a different SHA-256 that will not match the certified hash.
- Synthetic origin — the certificate records that the dataset was generated by CertifiedData, not sourced from real data.
- Issuer authenticity — the Ed25519 signature confirms the certificate was produced by CertifiedData and has not been forged.
- Provenance — generation metadata (engine, template, timestamp, row count) is bound to the certificate.
A certificate does not prove:
- Statistical quality or fidelity to any original distribution
- Differential privacy, unless dp_enforced: true is explicitly set in the genesis section
- Freedom from bias — use the Bias Evaluation Registry for bias auditing
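The differential-privacy exclusion is strict: only an explicit `true` counts as a DP claim. A minimal check, assuming the genesis section appears in the payload as a `genesis` object (the key name is inferred from the wording above, not documented) and using a hypothetical `dp_enforced` helper:

```python
def dp_enforced(payload: dict) -> bool:
    """True only when genesis.dp_enforced is the JSON boolean true.

    The "genesis" key name is an assumption based on this page's wording.
    Strings such as "true", the integer 1, or a missing key all mean no DP claim.
    """
    genesis = payload.get("genesis")
    return isinstance(genesis, dict) and genesis.get("dp_enforced") is True
```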
How to verify locally
Anyone can verify a CertifiedData certificate without using this website.
Step 1: Fetch the signed manifest
curl -H "Accept: application/certifieddata.manifest+json" \
  https://certifieddata.io/verify/{certificate_id}
Step 2: Hash your downloaded file
# Linux/macOS
sha256sum dataset.zip
# macOS alternative
shasum -a 256 dataset.zip
# Windows PowerShell
Get-FileHash dataset.zip -Algorithm SHA256
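The same digest can be computed in Python when you want to script the check. This sketch streams the file in chunks so a large ZIP need not fit in memory; `sha256_file` is a hypothetical helper name.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_file("dataset.zip")` prints the same lowercase hex as sha256sum. On Python 3.11+, `hashlib.file_digest(fh, "sha256")` performs the same chunked read.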
Step 3: Compare with artifact_hash in the certificate
# The SHA-256 of your file must match the artifact_hash field
# in the certificate payload. Compare case-insensitively:
# Get-FileHash prints uppercase hex.
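Scripted, the comparison reduces to reading the certified hash out of the payload. A sketch, assuming the manifest from Step 1 has been parsed into a dict with the certificate payload under a `payload` key and hashes stored as hex strings; `matches_certificate` is a hypothetical helper. Passing a file name checks an extracted file against `inner_artifacts` instead of the ZIP.

```python
from typing import Optional


def matches_certificate(manifest: dict, local_hex: str, member: Optional[str] = None) -> bool:
    """Compare a locally computed SHA-256 (hex) with the certified value.

    Assumes the parsed manifest holds the certificate payload under "payload".
    With member=None the ZIP's artifact_hash is used; otherwise the entry for
    that file in inner_artifacts (e.g. "dataset.csv").
    """
    payload = manifest.get("payload", {})
    if member is None:
        expected = payload.get("artifact_hash")
    else:
        expected = payload.get("inner_artifacts", {}).get(member, {}).get("sha256")
    if not isinstance(expected, str):
        # cert.v1 payloads carry no artifact hashes, so there is nothing to compare.
        raise ValueError("certified hash not found in payload")
    # Normalize case and surrounding whitespace before comparing.
    return expected.strip().lower() == local_hex.strip().lower()
```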
Step 4: Verify the Ed25519 signature
# Fetch public key
curl https://certifieddata.io/.well-known/signing-keys.json
# Verify the signature with openssl. Ed25519 (PureEdDSA) signs the raw
# message and applies SHA-512 internally, so use pkeyutl -rawin
# (OpenSSL 3.0+) rather than dgst -sha512. The message bytes are the
# canonical JSON of the payload field in the manifest:
printf '%s' '{canonical_payload_json}' > payload.json
openssl pkeyutl -verify -pubin -inkey pubkey.pem -rawin -in payload.json -sigfile sig.bin
The machine-readable manifest endpoint caches for one year (immutable). The canonical payload bytes are computed using RFC 8785 JSON Canonicalization.
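For a dependency-free check, Ed25519 verification can be written directly from RFC 8032. The sketch below follows the RFC's reference algorithm (section 5.1.7, using the cofactorless equation [S]B = R + [k]A that the RFC's reference code checks). It is illustrative and slow, and the function names are hypothetical; in practice a vetted library such as the `cryptography` package (`Ed25519PublicKey.from_public_bytes(...).verify(signature, data)`) is preferable. How the raw key and the 64-byte signature are encoded in signing-keys.json and the manifest is not specified on this page, so decoding them is left out.

```python
import hashlib

# Ed25519 domain parameters (RFC 8032, section 5.1).
P = 2**255 - 19
L = 2**252 + 27742317777372353535851937790883648493  # order of the base point
D = -121665 * pow(121666, P - 2, P) % P
SQRT_M1 = pow(2, (P - 1) // 4, P)


def _add(p1, p2):
    """Point addition in extended coordinates (X, Y, Z, T), per the RFC."""
    a = (p1[1] - p1[0]) * (p2[1] - p2[0]) % P
    b = (p1[1] + p1[0]) * (p2[1] + p2[0]) % P
    c = 2 * p1[3] * p2[3] * D % P
    zz = 2 * p1[2] * p2[2] % P
    e, f, g, h = b - a, zz - c, zz + c, b + a
    return (e * f % P, g * h % P, f * g % P, e * h % P)


def _mul(scalar, point):
    """Double-and-add; not constant-time, which is acceptable for public inputs."""
    result = (0, 1, 1, 0)  # neutral element
    while scalar > 0:
        if scalar & 1:
            result = _add(result, point)
        point = _add(point, point)
        scalar >>= 1
    return result


def _equal(p1, p2):
    """Projective equality: X1*Z2 == X2*Z1 and Y1*Z2 == Y2*Z1."""
    return ((p1[0] * p2[2] - p2[0] * p1[2]) % P == 0
            and (p1[1] * p2[2] - p2[1] * p1[2]) % P == 0)


def _recover_x(y, sign):
    """Solve the curve equation for x, choosing the root with the given low bit."""
    if y >= P:
        return None
    x2 = (y * y - 1) * pow(D * y * y + 1, P - 2, P) % P
    if x2 == 0:
        return None if sign else 0
    x = pow(x2, (P + 3) // 8, P)
    if (x * x - x2) % P != 0:
        x = x * SQRT_M1 % P
    if (x * x - x2) % P != 0:
        return None
    if (x & 1) != sign:
        x = P - x
    return x


def _decompress(data):
    """Decode a 32-byte point encoding; None if it is not a valid point."""
    y = int.from_bytes(data, "little")
    sign, y = y >> 255, y & ((1 << 255) - 1)
    x = _recover_x(y, sign)
    return None if x is None else (x, y, 1, x * y % P)


_GY = 4 * pow(5, P - 2, P) % P
_GX = _recover_x(_GY, 0)
BASE = (_GX, _GY, 1, _GX * _GY % P)


def verify(public_key: bytes, message: bytes, signature: bytes) -> bool:
    """Return True iff signature is a valid Ed25519 signature of message."""
    if len(public_key) != 32 or len(signature) != 64:
        return False
    a_point = _decompress(public_key)
    r_point = _decompress(signature[:32])
    s = int.from_bytes(signature[32:], "little")
    if a_point is None or r_point is None or s >= L:
        return False
    # k = SHA-512(R || A || M) read as a little-endian integer, reduced mod L.
    k = int.from_bytes(hashlib.sha512(signature[:32] + public_key + message).digest(), "little") % L
    return _equal(_mul(s, BASE), _add(r_point, _mul(k, a_point)))
```

Here `message` is the canonical payload bytes, `public_key` the raw 32-byte key, and `signature` the raw 64-byte signature. The sketch accepts the first test vector from RFC 8032 (empty message) and rejects it once the message or signature is altered.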
Schema versions
cert.v1
Original certificate format. Contains signature, provenance, and generation metadata. Does not contain artifact-level hashes. Upload-based file verification is not possible.
cert.v2 (current)
Adds artifact_hash, artifact_filename, artifact_mime_type, inner_artifacts, hash_method, certification_scope, and hashing_methodology_version at the root level of the certificate payload. Enables full upload-based verification.
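One way to tell which shape a payload has is to look for the root-level fields listed above. A sketch with a hypothetical `missing_v2_fields` helper; treating any absent field as "not cert.v2" is a simplification, and the certificate's own schema version metadata is the authoritative signal. A payload without `artifact_hash` cannot be checked by hash comparison.

```python
# Root-level fields that cert.v2 adds, as listed in this section.
CERT_V2_FIELDS = (
    "artifact_hash",
    "artifact_filename",
    "artifact_mime_type",
    "inner_artifacts",
    "hash_method",
    "certification_scope",
    "hashing_methodology_version",
)


def missing_v2_fields(payload: dict) -> list[str]:
    """Return the cert.v2 root-level fields absent from a certificate payload."""
    return [name for name in CERT_V2_FIELDS if name not in payload]
```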
Legacy cert.v1 limitations
Certificates issued before cert.v2 do not contain artifact hashes. Specifically:
- The artifact_hash field is absent — you cannot verify a downloaded file against a cert.v1 certificate by hash comparison.
- The Ed25519 signature and provenance metadata remain valid — the certificate still proves that a dataset was generated by CertifiedData.
- The /api/verify-upload endpoint will return INVALID_CERTIFICATE_PAYLOAD for cert.v1 certificates.
All certificates issued from cert.v2 onward include full artifact hashing. Legacy cert.v1 certificates are clearly labeled on their verification pages.