Lamin

Open data lakehouse for biology, context and memory at scale

Data & Analytics, bioinformatics, data-lakehouse, open-source, data-lineage, scientific-data, biology, python

About

Lamin is an open-source data platform for biology research teams. It provides a lineage-native lakehouse that supports biological file formats, registries, and ontologies. Data stays tracked across infrastructure, so users can query, trace, and validate datasets and models with a single line of code. The platform integrates with relational metadata stores and supports major cloud providers without vendor lock-in.

Problem

Biological research teams lack a unified, traceable way to manage datasets, models, and metadata at scale across diverse infrastructure.

For

Biology research teams and bioinformatics engineers

How it works

Lamin wraps object storage (S3, Google Cloud Storage, Azure) and databases (Postgres, SQLite) with a lineage-tracking layer and a Python/R API. This layer records data provenance, enforces schemas, and maps assets into a queryable lakehouse that supports biological formats such as AnnData and Zarr.
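
To make the idea of a lineage-tracking layer over a relational metadata store more concrete, here is a minimal sketch using only Python's standard library. It is not Lamin's API: the table layout and the function names (`open_registry`, `start_run`, `save_artifact`, `validate`, `upstream`) are hypothetical and chosen for illustration. It assumes that each artifact is registered with a content hash and the run that created it, and that each run lists its input artifacts.

```python
import hashlib
import sqlite3


def open_registry():
    """Create an in-memory SQLite metadata store (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE run (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE artifact (
            id INTEGER PRIMARY KEY,
            key TEXT NOT NULL,
            sha256 TEXT NOT NULL,
            created_by INTEGER REFERENCES run(id)
        );
        CREATE TABLE run_input (
            run_id INTEGER REFERENCES run(id),
            artifact_id INTEGER REFERENCES artifact(id)
        );
        """
    )
    return conn


def validate(rows, required_columns):
    """Toy schema check: every row (a dict) must contain the required columns."""
    for i, row in enumerate(rows):
        missing = set(required_columns) - set(row)
        if missing:
            raise ValueError(f"row {i} is missing columns: {sorted(missing)}")


def start_run(conn, name, inputs=()):
    """Record a run and the artifact ids it consumes."""
    run_id = conn.execute("INSERT INTO run (name) VALUES (?)", (name,)).lastrowid
    conn.executemany(
        "INSERT INTO run_input (run_id, artifact_id) VALUES (?, ?)",
        [(run_id, artifact_id) for artifact_id in inputs],
    )
    return run_id


def save_artifact(conn, key, content, run_id):
    """Register an artifact with its content hash and the run that produced it."""
    digest = hashlib.sha256(content).hexdigest()
    cursor = conn.execute(
        "INSERT INTO artifact (key, sha256, created_by) VALUES (?, ?, ?)",
        (key, digest, run_id),
    )
    return cursor.lastrowid


def upstream(conn, artifact_id):
    """Return the keys of all artifacts that the given artifact derives from."""
    rows = conn.execute(
        """
        WITH RECURSIVE ancestors(id) AS (
            SELECT ri.artifact_id
            FROM artifact a
            JOIN run_input ri ON ri.run_id = a.created_by
            WHERE a.id = ?
            UNION
            SELECT ri.artifact_id
            FROM ancestors anc
            JOIN artifact a ON a.id = anc.id
            JOIN run_input ri ON ri.run_id = a.created_by
        )
        SELECT key FROM artifact
        WHERE id IN (SELECT id FROM ancestors)
        ORDER BY id
        """,
        (artifact_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Usage: ingest a raw file, derive a processed one, then trace it back.
conn = open_registry()
records = [{"gene": "TP53", "count": 12}]
validate(records, {"gene", "count"})
ingest = start_run(conn, "ingest")
raw = save_artifact(conn, "raw/counts.csv", b"gene,count\nTP53,12\n", ingest)
normalize = start_run(conn, "normalize", inputs=[raw])
processed = save_artifact(conn, "processed/counts.csv", b"gene,count\nTP53,1.0\n", normalize)
print(upstream(conn, processed))
```

Keeping lineage in ordinary relational tables means provenance can be queried with plain SQL. A production system would additionally need versioning, access control, and real storage backends, none of which this sketch attempts.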

Business model

open-source

Status

launched