Question 1

What is AI data provenance?

Accepted Answer

AI data provenance is the documented history of a training dataset: where it came from, how it was generated or collected, and whether it has been modified since creation. Cryptographic provenance, such as a CertifiedData certificate, provides machine-verifiable proof that a dataset was generated synthetically by binding the generation method, timestamp, and file hash to a signed record.

Question 2

Why does AI data provenance matter for compliance?

Accepted Answer

EU AI Act Article 10 requires providers of high-risk AI systems to apply data governance practices covering the origin, collection method, and processing of training data. Article 11, together with Annex IV, requires technical documentation that describes the training datasets, including their provenance (Article 12 covers record-keeping through automatic event logging). Cryptographically signed certificates give regulators and auditors machine-readable provenance records that can be independently verified without contacting the data issuer.

Question 3

How does CertifiedData establish data provenance?

Accepted Answer

CertifiedData generates synthetic datasets and immediately issues a certificate that records the SHA-256 fingerprint of the artifact, the generation algorithm (CTGAN or Light engine), row and column counts, generation timestamp, template ID, and issuer identity. This payload is signed with an Ed25519 key whose public counterpart is published at /.well-known/signing-keys.json.
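
To make the shape of such a certificate concrete, here is a minimal sketch of a payload holding the fields listed above, serialized deterministically so that a signer and a verifier operate on identical bytes. Everything in it is an assumption for illustration: the field names (only artifact_hash appears in this FAQ), the ISO 8601 timestamp format, the sample values, and the sorted-key JSON canonicalization. The actual CertifiedData schema and serialization may differ.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CertificatePayload:
    # Hypothetical field names; only artifact_hash is named in the FAQ.
    artifact_hash: str   # hex SHA-256 fingerprint of the dataset file
    algorithm: str       # generation algorithm, e.g. "CTGAN" or "Light"
    row_count: int
    column_count: int
    generated_at: str    # generation timestamp (assumed ISO 8601, UTC)
    template_id: str
    issuer: str


def canonical_bytes(payload: CertificatePayload) -> bytes:
    """Serialize the payload deterministically.

    Sorted keys and compact separators are an assumed canonicalization;
    the issuer's real scheme is not described in the source.
    """
    text = json.dumps(asdict(payload), sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


# Stand-in bytes for a generated dataset file (two rows, two columns).
artifact = b"id,age\n1,34\n2,51\n"

payload = CertificatePayload(
    artifact_hash=hashlib.sha256(artifact).hexdigest(),
    algorithm="CTGAN",
    row_count=2,
    column_count=2,
    generated_at="2025-01-01T00:00:00Z",
    template_id="example-template",
    issuer="example-issuer",
)

# These are the bytes an Ed25519 private key would sign under this sketch.
message = canonical_bytes(payload)
```

A deterministic serialization matters because an Ed25519 signature covers exact bytes: if the verifier re-serializes the payload with a different key order or whitespace, a valid signature will fail to verify.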

Question 4

Can AI data provenance be verified by a third party?

Accepted Answer

Yes. Any party with access to the dataset file and the certificate can independently verify provenance without using CertifiedData's website. They compute the file's SHA-256 hash, compare it against artifact_hash in the certificate, then verify the Ed25519 signature using the public key. This does not require a CertifiedData account or API key.
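
The verification steps above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not CertifiedData's reference verifier: the function names and parameters are hypothetical, and because the standard library has no Ed25519 support, the signature check is passed in as a callable. How the signed message bytes are derived from the certificate is also not specified in this FAQ, so the caller supplies them.

```python
import hashlib
from typing import Callable


def sha256_file_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_provenance(
    dataset_path: str,
    artifact_hash: str,
    signed_message: bytes,
    signature: bytes,
    verify_signature: Callable[[bytes, bytes], bool],
) -> bool:
    """Check the file hash first, then the certificate signature.

    verify_signature(message, signature) is a caller-supplied check
    (hypothetical hook). With the third-party `cryptography` package it
    could wrap Ed25519PublicKey.from_public_bytes(key).verify(signature,
    message), returning False when InvalidSignature is raised.
    """
    # Step 1: the dataset on disk must match the certified fingerprint.
    if sha256_file_hex(dataset_path) != artifact_hash.strip().lower():
        return False
    # Step 2: the certificate itself must carry a valid signature.
    return verify_signature(signed_message, signature)
```

Both checks are needed: the hash comparison shows the file is the one the certificate describes, and the signature check shows the certificate was issued by the holder of the published key and has not been edited.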

Question 5

What is the difference between data provenance and data lineage?

Accepted Answer

Data provenance records the origin and integrity of a dataset at a point in time: where it came from and whether it has been altered. Data lineage records how a dataset was used over time: which models were trained on it, which decisions it influenced, and how it flowed through systems. CertifiedData provides both, with certificates for provenance and a Decision Ledger integration for lineage.
