Certification Methodology

Precise documentation of how CertifiedData hashes artifacts, constructs certificate payloads, and signs them with Ed25519, and of what each element proves.

Overview

CertifiedData acts as a certificate authority for synthetic datasets and AI artifacts. Every certified artifact receives a structured certificate payload that is cryptographically signed with an Ed25519 private key. The certificate records the SHA-256 fingerprints of the artifact files, enabling any party to independently verify that a downloaded file matches its certified version.

Certificate schema versions:

  • cert.v1 — signature and provenance only; no artifact-level hashes
  • cert.v2 — full artifact hashes (ZIP + inner files) enabling upload-based verification

What is hashed

For cert.v2 certificates, three categories of hashes are recorded:

  1. ZIP artifact hash: SHA-256 of the final ZIP file bytes exactly as delivered to the user. Stored in artifact_hash.
  2. Inner file hashes: SHA-256 of each individual file inside the ZIP (e.g. dataset.csv, manifest.json). Stored in inner_artifacts.
  3. Certificate payload hash: SHA-256 of the canonicalized certificate JSON (RFC 8785). Stored in hashes.certificate_payload_sha256. The Ed25519 signature is computed over the same canonical bytes.

Hashing sequence

The following sequence is performed before certificate issuance:

1. Generate synthetic data rows
2. Write each table to {name}.csv
3. Write manifest.json with table metadata
4. Hash each file:
   inner_artifacts["dataset.csv"].sha256 = SHA256(csv_bytes)
   inner_artifacts["manifest.json"].sha256 = SHA256(manifest_bytes)
5. Zip the directory:
   ZIP = archive(csv + manifest.json)
6. Hash the ZIP:
   artifact_hash = SHA256(zip_bytes)
7. Build certificate payload including artifact_hash + inner_artifacts
8. Canonicalize payload (RFC 8785 JSON Canonicalization)
9. SHA-256 hash canonicalized payload
10. Sign the canonical bytes with Ed25519 private key
11. Persist signed certificate

Hashes are computed from the exact bytes written to disk before upload to cloud storage. This ensures the certified hash matches what the user downloads.
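
Steps 4 through 9 above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not CertifiedData's implementation: the sample file contents are invented, the payload holds only two of the documented fields, and canonicalize() is a simplified stand-in for RFC 8785 that agrees with it only for payloads made of ASCII strings, integers, booleans, and null. Ed25519 signing (step 10) is left out because it needs a third-party library.

```python
import hashlib
import io
import json
import zipfile


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def canonicalize(payload: dict) -> bytes:
    # Simplified stand-in for RFC 8785 (assumption): sorted keys, no
    # insignificant whitespace, UTF-8 output. Floats and non-ASCII keys
    # need a full JCS implementation.
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


# Invented sample files standing in for steps 1-3.
files = {
    "dataset.csv": b"id,value\n1,42\n",
    "manifest.json": b'{"tables":["dataset"]}',
}

# Step 4: hash each inner file.
inner_artifacts = {name: {"sha256": sha256_hex(data)} for name, data in files.items()}

# Step 5: build the ZIP in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    for name, data in files.items():
        zf.writestr(name, data)
zip_bytes = buf.getvalue()

# Steps 6-7: hash the ZIP and build a (partial) payload.
payload = {
    "artifact_hash": sha256_hex(zip_bytes),
    "inner_artifacts": inner_artifacts,
}

# Steps 8-9: canonicalize, then hash the canonical bytes.
canonical = canonicalize(payload)
payload_sha256 = sha256_hex(canonical)
```

Because ZIP entries embed timestamps, rebuilding the archive later generally yields different bytes, which is why the hash has to be taken from the exact ZIP that is delivered.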

ZIP hash vs CSV hash

The two hash types are related but distinct:

Field                            | Covers                   | Use case
artifact_hash                    | Entire ZIP file bytes    | Verify you received the exact ZIP
inner_artifacts["dataset.csv"]   | CSV file bytes only      | Verify the CSV if extracted from ZIP
inner_artifacts["manifest.json"] | Manifest file bytes only | Verify generation metadata integrity

A ZIP hash and its CSV hash will always differ because the ZIP file wraps the (usually compressed) CSV content in file headers and a central directory.
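
The relationship between the two hashes can be checked with a short Python sketch. The CSV content here is invented for illustration; the point is only that the container and its content hash differently, while extracting the file recovers bytes that hash to the inner-file value.

```python
import hashlib
import io
import zipfile

csv_bytes = b"id,value\n1,42\n2,17\n"  # invented sample content

# Wrap the CSV in a ZIP archive held in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("dataset.csv", csv_bytes)
zip_bytes = buf.getvalue()

zip_hash = hashlib.sha256(zip_bytes).hexdigest()  # what artifact_hash covers
csv_hash = hashlib.sha256(csv_bytes).hexdigest()  # what the inner file hash covers

# Extract the CSV again and hash the extracted bytes.
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
    extracted_hash = hashlib.sha256(zf.read("dataset.csv")).hexdigest()

hashes_differ = zip_hash != csv_hash
extraction_matches = extracted_hash == csv_hash
```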

What the Ed25519 signature covers

The Ed25519 signature covers the canonicalized certificate payload bytes using RFC 8785 JSON Canonicalization Scheme. This includes:

  • All artifact hashes (artifact_hash, inner_artifacts)
  • Certificate metadata (issuer, subject, timestamp, schema version)
  • Generation metadata (engine, row count, template ID)
  • Payload SHA-256 (hashes.certificate_payload_sha256)
  • Clickwrap acceptance record, if present

The signature does not cover: the PDF rendering, the audit vault entry IDs (added after signing), or Supabase storage paths. These are post-signing annotations and are documented as such.

The Ed25519 public key is published at: /.well-known/signing-keys.json
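
To see why canonicalization matters for the signature, the sketch below shows that two JSON texts differing only in key order and whitespace produce identical canonical bytes, while changing a covered value changes the digest. The two-field payload is hypothetical, and canonicalize() is a simplified approximation of RFC 8785 that is valid only for ASCII strings, integers, booleans, and null; a real verifier should use a full JCS implementation.

```python
import hashlib
import json


def canonicalize(payload: dict) -> bytes:
    # Simplified stand-in for RFC 8785 (assumption): sorted keys,
    # no insignificant whitespace, UTF-8 output.
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


# Hypothetical payload fragments; real certificates carry more fields.
a = json.loads('{"issuer": "CertifiedData", "artifact_hash": "ab12"}')
b = json.loads('{"artifact_hash":"ab12",   "issuer":"CertifiedData"}')
tampered = dict(a, artifact_hash="ab13")

# Same content, different serialization: identical canonical bytes.
same_bytes = canonicalize(a) == canonicalize(b)

# One changed character in a covered field: different digest, so a
# signature over the original canonical bytes would no longer verify.
digest_changed = (
    hashlib.sha256(canonicalize(a)).digest()
    != hashlib.sha256(canonicalize(tampered)).digest()
)
```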

What the certificate proves

A valid CertifiedData certificate proves:

  • Integrity — the artifact file has not been modified since certification. Any change to a single byte produces a different SHA-256 that will not match the certified hash.
  • Synthetic origin — the certificate records that the dataset was generated by CertifiedData, not sourced from real data.
  • Issuer authenticity — the Ed25519 signature confirms the certificate was produced by CertifiedData and has not been forged.
  • Provenance — generation metadata (engine, template, timestamp, row count) is bound to the certificate.

A certificate does not prove:

  • Statistical quality or fidelity to any original distribution
  • Differential privacy, unless dp_enforced: true is explicitly set in the genesis section
  • Freedom from bias — use the Bias Evaluation Registry for bias auditing

How to verify locally

Anyone can verify a CertifiedData certificate without using this website.

Step 1: Fetch the signed manifest

curl -H "Accept: application/certifieddata.manifest+json" \
  https://certifieddata.io/verify/{certificate_id}

Step 2: Hash your downloaded file

# Linux/macOS
sha256sum dataset.zip

# macOS alternative
shasum -a 256 dataset.zip

# Windows PowerShell
Get-FileHash dataset.zip -Algorithm SHA256

Step 3: Compare with artifact_hash in the certificate

# The SHA-256 of your file must match the artifact_hash field
# in the certificate payload.
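
Steps 2 and 3 can also be done programmatically. The following Python sketch hashes a file in chunks and compares the result with artifact_hash. It assumes the certificate payload has already been parsed into a dict and that artifact_hash is a hex-encoded string; the function names are illustrative.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large ZIPs do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_certificate(path: str, payload: dict) -> bool:
    """Compare the local file hash with payload["artifact_hash"].

    Assumes artifact_hash is hex-encoded; case is normalized first.
    """
    expected = payload["artifact_hash"].lower()
    return hmac.compare_digest(sha256_file(path), expected)
```

A call such as matches_certificate("dataset.zip", payload) returns True only when the downloaded ZIP is byte-for-byte identical to the certified one.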

Step 4: Verify the Ed25519 signature

# Fetch public key
curl https://certifieddata.io/.well-known/signing-keys.json

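# Verify the signature using OpenSSL 3.0 or later. Ed25519 signs the raw
# message (no separate digest step), so use pkeyutl with -rawin rather than
# dgst. The payload bytes are the canonical JSON of the payload field in
# the manifest; pubkey.pem and sig.bin are assumed to hold the PEM-encoded
# public key and the raw 64-byte signature.
printf '%s' '{canonical_payload_json}' > payload.bin
openssl pkeyutl -verify -pubin -inkey pubkey.pem -rawin \
  -in payload.bin -sigfile sig.bin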

Responses from the machine-readable manifest endpoint are cacheable for one year and marked immutable. The canonical payload bytes are computed using RFC 8785 JSON Canonicalization.

Schema versions

cert.v1

Original certificate format. Contains signature, provenance, and generation metadata. Does not contain artifact-level hashes. Upload-based file verification is not possible.

cert.v2 (current)

Adds artifact_hash, artifact_filename, artifact_mime_type, inner_artifacts, hash_method, certification_scope, and hashing_methodology_version at the root level of the certificate payload. Enables full upload-based verification.

Legacy cert.v1 limitations

Certificates issued before cert.v2 do not contain artifact hashes. Specifically:

  • The artifact_hash field is absent — you cannot verify a downloaded file against a cert.v1 certificate by hash comparison.
  • The Ed25519 signature and provenance metadata remain valid — the certificate still proves that a dataset was generated by CertifiedData.
  • The /api/verify-upload endpoint will return INVALID_CERTIFICATE_PAYLOAD for cert.v1 certificates.

All certificates issued since cert.v2 support was introduced include full artifact hashing. Legacy cert.v1 certificates are clearly labeled on their verification pages.
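
A client can detect the legacy case before attempting a hash comparison. The Python sketch below relies only on the documented fact that cert.v1 payloads lack artifact_hash; the function names and the returned status strings are hypothetical labels, not API response codes.

```python
def can_verify_by_hash(payload: dict) -> bool:
    """True when the payload carries a non-empty artifact_hash (cert.v2)."""
    return bool(payload.get("artifact_hash"))


def check_download(payload: dict, local_sha256: str) -> str:
    """Classify a local file hash against a certificate payload."""
    if not can_verify_by_hash(payload):
        # cert.v1: signature and provenance only, nothing to compare against.
        return "NO_ARTIFACT_HASH"
    if payload["artifact_hash"].lower() == local_sha256.lower():
        return "MATCH"
    return "MISMATCH"
```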