CertifiedData.io

Technical Reference

This reference documents the cryptographic primitives, certificate schema, and public API used by CertifiedData to produce EU AI Act-compliant training data certification artifacts. All verification is deterministic and requires no trust in CertifiedData infrastructure.

Cryptographic Primitives

Dataset fingerprinting · SHA-256 · FIPS 180-4

Applied to the raw bytes of the exported dataset file (ZIP, CSV, or JSON). The hash is computed before any upload or transmission.
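As an illustration of the fingerprinting step, the following sketch hashes a file's raw bytes in chunks so that large exports need not fit in memory. The function name `sha256_file` and the chunk size are illustrative choices, not part of the CertifiedData specification.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 digest of a file's raw bytes."""
    digest = hashlib.sha256()
    # Binary mode: no newline or encoding translation touches the bytes.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because the hash covers the exact file bytes, re-exporting or re-compressing the same data would normally produce a different digest.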

Certificate signing · Ed25519 · RFC 8032

Applied to the UTF-8 encoding of the JSON-canonicalized certificate payload (RFC 8785). The signature is stored as standard base64 and returned as base64url in API responses.
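The two signature encodings carry the same 64 bytes and differ only in alphabet and padding. The sketch below converts between them; it assumes the base64url form omits `=` padding, which this reference does not state, and the function names are illustrative.

```python
import base64


def b64_to_b64url(sig_b64: str) -> str:
    """Convert a standard base64 string to unpadded base64url."""
    raw = base64.b64decode(sig_b64)
    # ASSUMPTION: API responses use unpadded base64url.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_to_bytes(sig_b64url: str) -> bytes:
    """Decode base64url, accepting input with or without '=' padding."""
    padded = sig_b64url + "=" * (-len(sig_b64url) % 4)
    return base64.urlsafe_b64decode(padded)
```

A 64-byte Ed25519 signature is 88 characters in padded standard base64 and 86 characters in unpadded base64url.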

JSON canonicalization · JCS · RFC 8785

Deterministic serialization of the certificate payload before signing. Ensures the same payload bytes regardless of key ordering or whitespace differences.
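To show why canonicalization matters, the following sketch approximates JCS with the Python standard library. It is not a full RFC 8785 implementation: it is only expected to agree with JCS for payloads whose keys are ASCII and whose numbers are integers, since RFC 8785 additionally specifies ECMAScript number formatting and key ordering by UTF-16 code units. A dedicated JCS library is the safer choice for production verification.

```python
import json


def canonicalize_approx(payload: dict) -> bytes:
    """Approximate RFC 8785 output for simple payloads (see caveats above)."""
    text = json.dumps(
        payload,
        sort_keys=True,          # deterministic key order
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,      # emit non-ASCII characters as UTF-8, not \uXXXX
        allow_nan=False,         # NaN and Infinity are not valid JSON
    )
    return text.encode("utf-8")
```

Two payloads that differ only in key order then serialize to identical bytes, for example `{"b": 1, "a": "x"}` and `{"a": "x", "b": 1}`.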

Log integrity · SHA-256 hash chaining · Internal v1

Each transparency log entry includes the SHA-256 hash of the previous entry. Any insertion, deletion, or modification of historical entries breaks the chain.
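The chaining rule can be illustrated with a short sketch. The entry layout here is hypothetical: this reference does not specify the log entry fields, how an entry is serialized before hashing, or the value used for the first entry, so `prev_hash`, the JSON serialization, and `GENESIS` are all assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # ASSUMPTION: placeholder previous-hash for the first entry


def entry_hash(entry: dict) -> str:
    """Hash an entry over a deterministic JSON encoding (assumed format)."""
    encoded = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def verify_chain(entries: list[dict]) -> bool:
    """Check that every entry records the hash of the entry before it."""
    expected_prev = GENESIS
    for entry in entries:
        if entry.get("prev_hash") != expected_prev:
            return False
        expected_prev = entry_hash(entry)
    return True
```

In a scheme like this, a change to the most recent entry is only detectable if the verifier also holds an independently obtained hash of the log head.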

Certificate Schema (certifieddata.cert.v1)

{
  // Identity
  "certification_id": string,              // UUID v4
  "schema_version": "certifieddata.cert.v1",
  "issuer": "Certified Data LLC",

  // Timing
  "timestamp": string,                     // ISO-8601 UTC

  // Dataset binding
  "dataset_hash": string,                  // "sha256:{hex}" of canonical dataset content
  "hashes": {
    "datasets": {
      "[filename]": {
        "sha256": string,                  // hex hash of exported file bytes
        "size_bytes": number,
        "row_count": number
      }
    }
  },

  // Generation provenance
  "algorithm": string,                     // e.g. "CTGAN"
  "algorithm_spec": {
    "name": string,
    "version": string,
    "parameters": object                   // engine-specific configuration
  },
  "rows": number,
  "columns": number,

  // Trust
  "key_id": string,                        // signing key identifier
  "signature": string,                     // Ed25519 signature, base64url
  "transparency_log_sequence": number,    // position in append-only log

  // Resolution
  "verify_url": string
}
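Before attempting cryptographic verification, a client may want to confirm that a fetched certificate has the expected shape. The sketch below treats every top-level field in the schema above as required and accepts upper- or lowercase hex; both are assumptions, as is the function name `check_certificate_shape`.

```python
import re

# ASSUMPTION: all top-level fields shown in the schema are mandatory.
REQUIRED_FIELDS = (
    "certification_id", "schema_version", "issuer", "timestamp",
    "dataset_hash", "hashes", "algorithm", "algorithm_spec",
    "rows", "columns", "key_id", "signature",
    "transparency_log_sequence", "verify_url",
)
HEX64 = r"[0-9a-fA-F]{64}"  # a SHA-256 digest is 64 hex characters


def check_certificate_shape(cert: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none found."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in cert]
    if cert.get("schema_version") != "certifieddata.cert.v1":
        problems.append("unexpected schema_version")
    if not re.fullmatch("sha256:" + HEX64, str(cert.get("dataset_hash", ""))):
        problems.append("dataset_hash is not 'sha256:' plus 64 hex characters")
    for filename, info in cert.get("hashes", {}).get("datasets", {}).items():
        if not re.fullmatch(HEX64, str(info.get("sha256", ""))):
            problems.append(f"bad sha256 for {filename}")
    return problems
```

A shape check of this kind says nothing about authenticity; it only catches malformed or truncated responses early.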

Verification Procedure

Any verifier can independently confirm a certificate is authentic and the dataset has not been modified since certification, without relying on CertifiedData's servers:

1. Fetch the certificate
   GET https://certifieddata.io/api/certificates/{certification_id}
2. Fetch the signing public key
   GET https://certifieddata.io/.well-known/signing-keys.json
   → find the entry whose key_id matches certificate.key_id
   → extract public_key_raw_b64url (last 32 bytes of SPKI DER)
   public_key_bytes = base64url_decode(public_key_raw_b64url)
3. Reconstruct the signed payload
   payload = stripUndefined(certificate_json)
   payload_bytes = UTF8(JCS(payload))  // RFC 8785 canonicalization
4. Verify the Ed25519 signature
   sig_bytes = base64url_decode(certificate.signature)
   ed25519.verify(public_key_bytes, sig_bytes, payload_bytes)
   → must return true
5. Verify the dataset hash
   file_hash = sha256(read_file_bytes(dataset_file))
   assert file_hash == certificate.hashes.datasets[filename].sha256
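Steps 3 to 5 can be sketched in Python using the third-party `cryptography` package. Two points are assumptions rather than documented behavior: the signed payload is taken to be the certificate with its `signature` field removed (a signature cannot cover itself, but the exact set of signed fields is not stated here), and JCS is approximated with `json.dumps`, which is only expected to match RFC 8785 for simple payloads. The function names are illustrative.

```python
import hashlib
import json


def signed_payload_bytes(certificate: dict) -> bytes:
    """Rebuild the bytes that were signed (see assumptions in the text)."""
    # ASSUMPTION: every field except "signature" is part of the signed payload.
    payload = {k: v for k, v in certificate.items() if k != "signature"}
    # Approximation of JCS; use a real RFC 8785 library for production checks.
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def verify_signature(payload_bytes: bytes, sig_bytes: bytes, public_key_bytes: bytes) -> bool:
    """Return True if sig_bytes is a valid Ed25519 signature over payload_bytes."""
    # Third-party dependency, imported here so the hash check below works without it.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    public_key = Ed25519PublicKey.from_public_bytes(public_key_bytes)
    try:
        public_key.verify(sig_bytes, payload_bytes)  # raises on mismatch
    except InvalidSignature:
        return False
    return True


def verify_dataset(certificate: dict, filename: str, file_bytes: bytes) -> bool:
    """Compare the SHA-256 of the file bytes with the certified per-file hash."""
    expected = certificate["hashes"]["datasets"][filename]["sha256"]
    return hashlib.sha256(file_bytes).hexdigest() == expected.lower()
```

A certificate should be accepted only when both `verify_signature` and `verify_dataset` return `True`.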

Public API Endpoints

GET   /api/certificates/{id}                    Retrieve a certificate by UUID
GET   /api/log?limit=N                          Transparency log, most recent N entries
POST  /api/verify/{id}                          Verify a certificate (no auth required)
GET   /.well-known/signing-keys.json            Active and historical signing public keys
GET   /.well-known/certifieddata-registry.json  Platform metadata and verification spec
GET   /api/decision-log?limit=N                 Public decision log entries

All read endpoints are public and require no authentication. Requests are rate-limited to 60 per minute per IP address.
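A minimal client for the read endpoints might look like the sketch below, which uses only the Python standard library. The helper names are illustrative, and the assumption that the log endpoint returns JSON is not confirmed by this reference.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://certifieddata.io"


def certificate_url(certification_id: str) -> str:
    """Build the URL for GET /api/certificates/{id}."""
    return f"{BASE_URL}/api/certificates/{urllib.parse.quote(certification_id, safe='')}"


def log_url(limit: int) -> str:
    """Build the URL for GET /api/log?limit=N."""
    return f"{BASE_URL}/api/log?{urllib.parse.urlencode({'limit': limit})}"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a URL and parse the body as JSON (ASSUMPTION: the response is JSON)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Spacing calls at least one second apart keeps a single client within the stated limit of 60 requests per minute.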

Signing Key Lifecycle

CertifiedData uses a single active Ed25519 signing key at any time. Key rotation follows a documented schedule. All historical public keys are retained at /.well-known/signing-keys.json so certificates signed with any prior key remain verifiable indefinitely.

Each signing key entry includes created_at, revoked_at (if applicable), and public_key_raw_b64url. The last of these is the raw 32-byte Ed25519 public key in base64url format, suitable for direct use in Python's cryptography library, Go's crypto/ed25519 package, and Rust's ed25519-dalek crate.
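Selecting the right key for a certificate can be sketched as follows. The top-level layout of signing-keys.json is not documented here, so the `keys` wrapper list is an assumption, as is the function name; the per-entry fields `key_id` and `public_key_raw_b64url` come from this reference. The sketch does not interpret `revoked_at`, since the policy for revoked keys is not specified.

```python
import base64


def find_public_key(key_document: dict, key_id: str) -> bytes:
    """Return the raw 32-byte Ed25519 public key for the given key_id."""
    # ASSUMPTION: entries are listed under a top-level "keys" array.
    for entry in key_document.get("keys", []):
        if entry.get("key_id") == key_id:
            encoded = entry["public_key_raw_b64url"]
            # Restore any stripped '=' padding before decoding.
            raw = base64.urlsafe_b64decode(encoded + "=" * (-len(encoded) % 4))
            if len(raw) != 32:
                raise ValueError(f"key {key_id!r} is {len(raw)} bytes, expected 32")
            return raw
    raise KeyError(key_id)
```

The returned bytes can be passed to `Ed25519PublicKey.from_public_bytes` in Python's cryptography library.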