Data Lineage Tools

A directory of data lineage and catalog tools relevant to AI auditability, training data traceability, and artifact provenance.

Buyer intent

For data, governance, and platform teams that need to trace inputs, transformations, datasets, and AI artifacts across systems.

About this category

Data lineage tools in this directory track the lineage of data, features, datasets, training data, and AI artifacts across systems. This category includes pure-lineage platforms, catalogs with lineage as a core capability, ML feature stores with lineage tracking, AI artifact registries with provenance chains, and data observability platforms that expose lineage edges. It does not include pure data quality monitoring without lineage, pure metadata management without lineage edges, or model registries that do not track input-data lineage. The focus is on trace-through capability and the evidence trail that results.

Buyers are evaluating whether they can trace any AI output back through the model and the training or fine-tuning data that produced it, down to sources and transformations. They also need to know whether lineage can be exported, reconstructed, and preserved for regulated reviews, such as record-keeping under Article 12 of the EU AI Act. Engineering cares about coverage and recoverability when systems change; auditors care that the lineage presented is the same record that existed at a past point in time and that it can serve as Article 12 evidence.
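To make the trace-through question concrete, here is a minimal sketch of walking lineage edges from an AI output back to its sources. It assumes lineage is available as simple (upstream, downstream) pairs; the node names, the EDGES list, and the functions trace_upstream and sources are hypothetical and do not reflect any particular vendor's export format.

```python
from collections import deque

# Hypothetical lineage edges as (upstream, downstream) pairs.
# Every node name here is illustrative, not taken from a real system.
EDGES = [
    ("s3://raw/events", "transform:dedupe"),
    ("transform:dedupe", "dataset:train_v3"),
    ("vendor:labels", "dataset:train_v3"),
    ("dataset:train_v3", "model:classifier_v7"),
    ("model:classifier_v7", "output:decision_123"),
]


def trace_upstream(node, edges):
    """Return every ancestor of `node`, nearest first (breadth-first walk)."""
    parents = {}
    for up, down in edges:
        parents.setdefault(down, []).append(up)

    seen = {node}
    order = []
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for up in parents.get(current, []):
            if up not in seen:  # guards against cycles and shared ancestors
                seen.add(up)
                order.append(up)
                queue.append(up)
    return order


def sources(node, edges):
    """Ancestors of `node` that have no upstream edge of their own."""
    has_upstream = {down for _, down in edges}
    return sorted(n for n in trace_upstream(node, edges) if n not in has_upstream)


if __name__ == "__main__":
    print(trace_upstream("output:decision_123", EDGES))
    print(sources("output:decision_123", EDGES))
```

A tool with gaps in coverage would show up here as a walk that stops early: if the edge from a transformation to its raw source is missing, that transformation is reported as a source even though it is not one.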

The ladder clarifies what has been validated at each level. Listed means the vendor operates in lineage tooling; nothing about completeness or robustness is implied. Vendor-submitted means the vendor has provided lineage capability documentation, integration examples, or sample lineage exports that show how sources, transformations, and outputs connect. Public-source reviewed means CertifiedData verified those claims against public material. Evidence-reviewed means primary evidence was assessed: actual lineage exports from a running deployment, the capture mechanism behind edges, strategies for late-arriving data or schema drift, and whether lineage remains recoverable after source-system changes.

Certified applies to specific lineage exports, for example the complete training-data lineage for a particular high-risk AI system on a specific date. The export is hashed with SHA-256, any JSON manifest is JCS-canonicalized per RFC 8785, and an Ed25519 signature binds the content to an issuer and timestamp. Certification makes the lineage artifact tamper-evident and independently verifiable, similar to how a logged AI decision record can be protected for Article 12 review. It does not assert that the lineage is complete or correct; it proves that the certified export is the same artifact that was originally issued.
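The hashing and canonicalization steps can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not CertifiedData's implementation: the manifest field names and the helpers canonical_json, build_manifest, and matches_manifest are hypothetical, and json.dumps with sorted keys only approximates RFC 8785 (it agrees with JCS for ASCII keys with string and integer values, but not for all numbers or key orderings). The Ed25519 step is omitted to avoid a third-party dependency; in practice the signature would be computed over the canonical manifest bytes with an Ed25519 library.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8.

    Assumption: this is NOT a full RFC 8785 implementation. It is adequate
    only for ASCII keys with string and integer values, as used below.
    """
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(export_bytes: bytes, issuer: str, issued_at: str) -> dict:
    """Hypothetical manifest layout binding an export hash to issuer and time."""
    return {
        "export_sha256": sha256_hex(export_bytes),
        "issuer": issuer,
        "issued_at": issued_at,
    }


def matches_manifest(export_bytes: bytes, manifest: dict) -> bool:
    """True if the export is byte-identical to the one that was certified."""
    return sha256_hex(export_bytes) == manifest["export_sha256"]


if __name__ == "__main__":
    export = b'{"edges": [["dataset:train_v3", "model:classifier_v7"]]}'
    manifest = build_manifest(export, "example-issuer", "2025-01-01T00:00:00Z")
    to_sign = canonical_json(manifest)  # these bytes would be Ed25519-signed
    print(to_sign.decode("utf-8"))
    print(matches_manifest(export, manifest))          # unchanged export
    print(matches_manifest(export + b" ", manifest))   # one extra byte
```

The check only establishes that the bytes are unchanged since issuance. An export that omitted half of the real lineage would still verify, which is why certification is described as tamper evidence and not as a completeness claim.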

Profiles

Vendors in this category

Seed profiles for this category are being prepared. The category is live so future vendor records can be added without changing the URL architecture.