Technical Reference
This reference documents the cryptographic primitives, certificate schema, and public API used by CertifiedData to produce EU AI Act-compliant training data certification artifacts. All verification is deterministic and requires no trust in CertifiedData infrastructure.
Cryptographic Primitives
SHA-256 (dataset hash): Applied to the raw bytes of the exported dataset file (ZIP, CSV, or JSON). The hash is computed before any upload or transmission.
Ed25519 (certificate signature): Applied to the UTF-8 encoding of the JSON-canonicalized certificate payload (RFC 8785). The signature is stored as standard base64 and returned as base64url in API responses.
JSON Canonicalization Scheme (RFC 8785): Deterministic serialization of the certificate payload before signing. It yields the same payload bytes regardless of key ordering or whitespace differences in the input.
Hash-chained transparency log: Each transparency log entry includes the SHA-256 hash of the previous entry. Any insertion, deletion, or modification of historical entries breaks the chain.
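The hash-chaining idea can be illustrated with a minimal sketch. The entry layout (`sequence`, `prev_hash`, `payload`), the all-zero genesis value, and the JSON encoding used for hashing are assumptions made for this example; the actual log format is not specified in this reference.

```python
import hashlib
import json

# Hypothetical predecessor value for the very first entry.
GENESIS_HASH = "0" * 64


def entry_hash(entry: dict) -> str:
    """SHA-256 over a deterministic JSON encoding of an entry (assumed encoding)."""
    encoded = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log: list, payload: dict) -> None:
    """Append an entry that commits to the hash of the entry before it."""
    prev = entry_hash(log[-1]) if log else GENESIS_HASH
    log.append({"sequence": len(log), "prev_hash": prev, "payload": payload})


def verify_chain(log: list) -> bool:
    """Recompute every link; any edited, removed, or inserted entry breaks one."""
    expected_prev = GENESIS_HASH
    for position, entry in enumerate(log):
        if entry["sequence"] != position or entry["prev_hash"] != expected_prev:
            return False
        expected_prev = entry_hash(entry)
    return True
```

In this sketch a change to the newest entry is only detectable once a later entry (or an externally published head hash) commits to it, which is a general property of hash chains rather than a statement about CertifiedData's log.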
Certificate Schema (certifieddata.cert.v1)
{
  // Identity
  "certification_id": string, // UUID v4
  "schema_version": "certifieddata.cert.v1",
  "issuer": "Certified Data LLC",
  // Timing
  "timestamp": string, // ISO-8601 UTC
  // Dataset binding
  "dataset_hash": string, // "sha256:{hex}" of canonical dataset content
  "hashes": {
    "datasets": {
      "[filename]": {
        "sha256": string, // hex hash of exported file bytes
        "size_bytes": number,
        "row_count": number
      }
    }
  },
  // Generation provenance
  "algorithm": string, // e.g. "CTGAN"
  "algorithm_spec": {
    "name": string,
    "version": string,
    "parameters": object // engine-specific configuration
  },
  "rows": number,
  "columns": number,
  // Trust
  "key_id": string, // signing key identifier
  "signature": string, // Ed25519 signature, base64url
  "transparency_log_sequence": number, // position in append-only log
  // Resolution
  "verify_url": string
}
Verification Procedure
Any verifier can independently confirm a certificate is authentic and the dataset has not been modified since certification, without relying on CertifiedData's servers:
GET https://certifieddata.io/api/certificates/{certification_id}

GET https://certifieddata.io/.well-known/signing-keys.json
→ find entry where key_id matches certificate.key_id
→ extract public_key_raw_b64url (last 32 bytes of SPKI DER)

payload = stripUndefined(certificate_json)
payload_bytes = UTF8(JCS(payload)) // RFC 8785 canonicalization

sig_bytes = base64url_decode(certificate.signature)
ed25519.verify(public_key_bytes, sig_bytes, payload_bytes) → must return true

file_hash = sha256(read_file_bytes(dataset_file))
assert file_hash == certificate.hashes.datasets[filename].sha256
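The steps above might be sketched in Python roughly as follows. This is an illustration under stated assumptions, not CertifiedData's reference implementation: the helper names are invented for this example, `canonicalize` only approximates RFC 8785 (it should agree with JCS for simple payloads with ASCII keys and integer numbers within ±2^53, so a dedicated JCS library is the safer choice), and the caller is assumed to pass exactly the payload that was signed, since this section does not spell out which fields `stripUndefined` leaves in place. Signature checking relies on the third-party `cryptography` package.

```python
import base64
import hashlib
import json


def b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring any '=' padding that was stripped."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def canonicalize(payload: dict) -> bytes:
    """Approximation of RFC 8785: sorted keys, no whitespace, UTF-8 bytes.

    Assumption: adequate only for simple payloads; not a full JCS implementation.
    """
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def file_sha256(path: str) -> str:
    """Hex SHA-256 of the raw file bytes, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def signature_is_valid(payload: dict, signature_b64url: str, public_key_raw: bytes) -> bool:
    """Check the Ed25519 signature over the canonicalized payload."""
    # Third-party dependency, imported lazily: pip install cryptography
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    public_key = Ed25519PublicKey.from_public_bytes(public_key_raw)
    try:
        public_key.verify(b64url_decode(signature_b64url), canonicalize(payload))
    except InvalidSignature:
        return False
    return True


def dataset_matches(path: str, certificate: dict, filename: str) -> bool:
    """Compare the local file hash with the hash recorded in the certificate."""
    expected = certificate["hashes"]["datasets"][filename]["sha256"]
    return file_sha256(path) == expected.lower()
```

A certificate would be accepted only when both `signature_is_valid(...)` and `dataset_matches(...)` return `True`.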
Public API Endpoints
All read endpoints are public and require no authentication. They are rate-limited to 60 requests per minute per IP address.
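A client that fetches many certificates may want to pace itself to stay under the limit. The sketch below is one possible approach: the `Throttle` class and `fetch_certificate` helper are hypothetical names, the response is assumed to be a JSON document, and only the certificate URL shown in the verification procedure is used.

```python
import json
import time
import urllib.request


class Throttle:
    """Spaces calls so that at most `max_per_minute` happen per minute."""

    def __init__(self, max_per_minute: int = 60, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


def fetch_certificate(certification_id: str, throttle: Throttle) -> dict:
    """Fetch one certificate, pausing first if needed (assumes a JSON response)."""
    throttle.wait()
    url = "https://certifieddata.io/api/certificates/" + certification_id
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Sharing a single `Throttle` instance across all requests from one process keeps the pacing consistent; several machines behind the same IP address would need to coordinate separately.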
Signing Key Lifecycle
CertifiedData uses a single active Ed25519 signing key at any time. Key rotation follows a documented schedule. All historical public keys are retained at /.well-known/signing-keys.json so certificates signed with any prior key remain verifiable indefinitely.
Each signing key entry includes created_at, revoked_at (if applicable), and public_key_raw_b64url — the raw 32-byte Ed25519 public key in base64url format, suitable for direct use in Python's cryptography library, Go's crypto/ed25519 package, and Rust's ed25519-dalek crate.
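As an illustration of handling these key entries, the sketch below decodes `public_key_raw_b64url` and shows how the same 32 bytes relate to an Ed25519 SPKI DER structure, which is a fixed 12-byte prefix followed by the raw key. The function names and the assumption that the key document can be reduced to a list of entry objects are hypothetical; only the `key_id` and `public_key_raw_b64url` field names come from this reference.

```python
import base64

# DER prefix of an Ed25519 SubjectPublicKeyInfo (RFC 8410); the raw key follows it.
ED25519_SPKI_PREFIX = bytes.fromhex("302a300506032b6570032100")


def decode_raw_public_key(public_key_raw_b64url: str) -> bytes:
    """Decode the base64url field and check it is a 32-byte Ed25519 key."""
    padded = public_key_raw_b64url + "=" * (-len(public_key_raw_b64url) % 4)
    raw = base64.urlsafe_b64decode(padded)
    if len(raw) != 32:
        raise ValueError("expected 32 bytes, got %d" % len(raw))
    return raw


def raw_key_from_spki(spki_der: bytes) -> bytes:
    """Extract the raw key as the last 32 bytes of a 44-byte Ed25519 SPKI."""
    if len(spki_der) != 44 or not spki_der.startswith(ED25519_SPKI_PREFIX):
        raise ValueError("not an Ed25519 SPKI structure")
    return spki_der[-32:]


def find_key(entries: list, key_id: str) -> dict:
    """Return the entry whose key_id matches (entries: assumed list of dicts)."""
    for entry in entries:
        if entry.get("key_id") == key_id:
            return entry
    raise KeyError(key_id)
```

The decoded bytes can be handed directly to an Ed25519 implementation, for example `Ed25519PublicKey.from_public_bytes(raw)` in Python's `cryptography` package.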