What Is Lakehouse Architecture?
Lakehouse architecture is the technical design that allows a single platform to serve both the structured, governed workloads of a data warehouse and the flexible, large-scale workloads of a data lake. Where the data lakehouse is the concept, lakehouse architecture is how it is actually built: the open table formats, the separation of storage and compute, the layered refinement of data, and the governance that together let one copy of the data support reporting, data science, and AI. Understanding the architecture is what separates a lakehouse that works from a data lake with extra steps.
The architecture emerged to solve a specific problem. Traditional warehouses gave reliability and governance but could not handle unstructured data or scale economically. Data lakes gave scale and flexibility but lacked the transactions, performance, and governance that reporting requires. Lakehouse architecture adds warehouse capabilities directly on top of open, lake-scale storage, so an organization no longer has to run and reconcile two separate systems.
“The architecture is where a lakehouse succeeds or struggles. Open table formats and a clean layered structure are what let one governed copy of the data serve a controller’s report and a data scientist’s model. When the architecture is sound, the rest tends to follow.”
Marla Nelson, CTO, QuickLaunch Analytics
The Layers of Lakehouse Architecture
A lakehouse is built from several architectural layers that work together:
Storage layer. Data sits in low-cost cloud object storage in open file formats such as Parquet. This is the same storage a data lake uses, and it is the foundation that gives the lakehouse its scale and economics.
Table format layer. An open table format, such as Delta Lake or Apache Iceberg, sits on top of the storage. This layer is what adds the warehouse-like capabilities: transactions, reliability, schema enforcement, and time travel. It is the single most important architectural element, because it is what turns raw files into reliable tables.
Compute layer. Separated from storage, compute engines spin up to process queries and transformations and spin down when idle. Because storage and compute scale independently, an organization pays for processing only when it runs and stores data cheaply in between.
Governance and metadata layer. Across the whole architecture, a governance layer manages security, access control, lineage, and the catalog of what data exists. This is what makes the lakehouse trustworthy enough for enterprise reporting, not just experimentation.
Semantic layer. On top of the governed, refined data, a semantic layer translates it into business terms for reporting and AI. This is where the architecture connects to the people and tools that consume it.
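To make the semantic layer a little more concrete, here is a minimal Python sketch of the idea: a small model maps business terms ("Revenue", "Period") onto physical gold-table columns, so a consumer asks in business language instead of column names. The table contents, column names, and the `query` helper are all hypothetical and chosen only for illustration; real semantic layers are far richer than this.

```python
# Hypothetical gold-layer rows; column names are illustrative only.
gold_sales = [
    {"cust_id": 1, "net_amt": 120.0, "fiscal_period": "2024-01"},
    {"cust_id": 2, "net_amt": 80.0, "fiscal_period": "2024-01"},
    {"cust_id": 1, "net_amt": 50.0, "fiscal_period": "2024-02"},
]

# The semantic model maps business terms to physical columns and aggregations.
SEMANTIC_MODEL = {
    "measures": {"Revenue": ("net_amt", sum)},
    "dimensions": {"Period": "fiscal_period", "Customer": "cust_id"},
}


def query(measure, by):
    """Answer a business question such as 'Revenue by Period'."""
    column, aggregate = SEMANTIC_MODEL["measures"][measure]
    dim_column = SEMANTIC_MODEL["dimensions"][by]
    groups = {}
    for row in gold_sales:
        groups.setdefault(row[dim_column], []).append(row[column])
    return {key: aggregate(values) for key, values in groups.items()}


revenue_by_period = query("Revenue", "Period")
revenue_by_customer = query("Revenue", "Customer")
```

Because the mapping lives in one place, a report and an AI tool that both ask for "Revenue" resolve to the same column and the same aggregation.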
Open Table Formats: The Heart of the Architecture
The defining innovation of lakehouse architecture is the open table format. Delta Lake and Apache Iceberg are two of the most widely used. They work by maintaining a transaction log and metadata alongside the data files, which provides capabilities that raw files in a data lake never had: atomic transactions so concurrent writes do not corrupt data, schema enforcement so the structure stays consistent, and time travel so previous versions of the data can be queried.
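The three capabilities can be illustrated with a toy, in-memory Python sketch. It is not how Delta Lake or Iceberg are implemented, and the `ToyTable` class and its methods are hypothetical; it only shows how an append-only commit log can give atomic commits, schema enforcement, and time travel. Real formats are more nuanced, for example about which concurrent writes actually conflict.

```python
class ToyTable:
    """In-memory toy table: batches of rows plus an append-only commit log."""

    def __init__(self, schema):
        self.schema = schema  # e.g. {"id": int, "amount": float}
        self.log = []         # one entry per committed version

    def version(self):
        return len(self.log) - 1  # -1 means no commits yet

    def commit(self, rows, read_version):
        # Schema enforcement: reject the whole batch if any row does not match.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"schema mismatch: {sorted(row)}")
            for name, expected in self.schema.items():
                if not isinstance(row[name], expected):
                    raise ValueError(f"bad type for column {name!r}")
        # Optimistic concurrency: a writer that started from a stale version
        # must retry, so two writers never silently overwrite each other.
        if read_version != self.version():
            raise RuntimeError("conflict: table changed since it was read")
        # The single append is the atomic step: a batch is fully visible or absent.
        self.log.append(list(rows))
        return self.version()

    def read(self, version=None):
        # Time travel: replay the log only up to the requested version.
        last = self.version() if version is None else version
        return [row for batch in self.log[: last + 1] for row in batch]


table = ToyTable({"id": int, "amount": float})
v0 = table.commit([{"id": 1, "amount": 10.0}], read_version=table.version())
v1 = table.commit([{"id": 2, "amount": 5.5}], read_version=v0)
as_of_v0 = table.read(version=0)  # the table as it looked after the first commit
```

In this sketch a writer that still holds `v0` after `v1` has landed gets a `RuntimeError` instead of overwriting data, and a row with the wrong columns or types is rejected before anything is committed.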
These capabilities are what allow reporting to trust the data. A data lake without them often became a “data swamp,” unreliable and ungoverned. The table format is the architectural piece that makes the same low-cost storage reliable enough for financial reporting. Microsoft Fabric is built on Delta Lake as its open table format, with OneLake as its storage foundation, and Databricks originally developed Delta Lake; both implement the same core architectural idea.
The Medallion Pattern
Most lakehouse architectures organize data through a layered refinement pattern, commonly called the medallion architecture, with bronze, silver, and gold stages. Raw data lands in the bronze layer as it arrives from source systems. It is cleaned, conformed, and de-duplicated in the silver layer. It is modeled into business-ready tables in the gold layer, ready for reporting and AI.
This staged approach is more than a convention. It makes data quality traceable, because each stage has a clear purpose and the transformations between them are visible. It also keeps the architecture maintainable, because raw data is always preserved in bronze and can be reprocessed if a downstream model changes. For ERP data in particular, where the raw structure is complex, the medallion pattern provides a clear path from raw source tables to clean business entities.
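A minimal Python sketch of the bronze, silver, and gold stages might look like the following. The sample records, field names, and the `to_silver` and `to_gold` helpers are hypothetical; a real pipeline would run on a table format and a compute engine, not on Python lists.

```python
from collections import defaultdict

# Bronze: raw records kept exactly as they arrived (hypothetical feed).
bronze = [
    {"order_id": "A1", "customer": " acme ", "amount": "100.50"},
    {"order_id": "A1", "customer": " acme ", "amount": "100.50"},  # duplicate
    {"order_id": "A2", "customer": "Globex", "amount": "20"},
    {"order_id": "A3", "customer": "ACME", "amount": "n/a"},       # bad amount
]


def to_silver(rows):
    """Clean, conform, and de-duplicate; bronze itself is never modified."""
    seen, silver = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would quarantine this row instead
        if row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        silver.append({
            "order_id": row["order_id"],
            "customer": row["customer"].strip().upper(),
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Model a business-ready table: total order amount per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Because `bronze` is left untouched, changing the cleaning rules or the gold model only means rerunning `to_silver` and `to_gold` over the same raw records.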
Lakehouse Architecture in ERP Environments
For organizations whose core data lives in ERP systems, lakehouse architecture is where that data is brought together and refined. Operational data from JD Edwards, NetSuite, Vista, or OneStream lands in the bronze layer, is cleaned and conformed through silver, and is modeled into governed business entities in gold. The architecture handles the complexity of the ERP source structures while presenting clean, consistent data at the top.
This is especially valuable for organizations running multiple ERPs. The lakehouse is the single governed environment where data from each system is consolidated and reconciled, and the layered architecture provides the structure to do that cleanly. The result is one foundation that serves financial reporting, operational analytics, and AI from the same governed data, regardless of which ERP each piece came from.
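As a rough illustration of consolidating more than one ERP, the Python sketch below conforms two differently shaped extracts into one shared invoice shape and tags each record with its origin. The system names `erp_a` and `erp_b`, every field name, and the cents-based amount are assumptions made up for the example; they do not reflect the actual schemas of any ERP named above.

```python
# Hypothetical extracts: each source names the same facts differently.
SOURCES = {
    "erp_a": [{"vendor_no": "100", "inv_amt_cents": 125000}],
    "erp_b": [{"vendorId": "V-7", "amount": 300.25}],
}

# One mapping per source: conformed field -> function that derives it.
MAPPINGS = {
    "erp_a": {
        "vendor_key": lambda r: r["vendor_no"],
        "amount": lambda r: r["inv_amt_cents"] / 100,
    },
    "erp_b": {
        "vendor_key": lambda r: r["vendorId"],
        "amount": lambda r: float(r["amount"]),
    },
}


def conform(sources, mappings):
    """Return one list of invoices in a shared shape, tagged with their origin."""
    conformed = []
    for system, rows in sources.items():
        for row in rows:
            record = {field: fn(row) for field, fn in mappings[system].items()}
            record["source_system"] = system  # keeps lineage back to the ERP
            conformed.append(record)
    return conformed


invoices = conform(SOURCES, MAPPINGS)
```

Keeping the per-source rules in `MAPPINGS` means adding another ERP is a new mapping entry, not a change to the shared invoice shape that downstream reports depend on.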
Common Challenges and Best Practices
- Choose one table format and commit. Delta Lake and Apache Iceberg both work well. Standardize on one based on the platform rather than mixing them without reason.
- Use the medallion pattern. Staging data from bronze to silver to gold keeps quality traceable and the architecture maintainable. Preserve raw data in bronze so it can be reprocessed.
- Govern from the start. The openness of lakehouse storage is a strength and a risk. Build access control, lineage, and quality into the governance layer as data lands.
- Build the semantic layer. A lakehouse full of governed gold tables still needs a semantic layer to deliver value to reporting and AI. Treat it as part of the architecture.
- Design for both reporting and AI. The architectural advantage of the lakehouse is serving both from one copy. Build it so reporting and AI workloads draw on the same governed data.
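The "govern from the start" practice above can be sketched in a few lines of Python: a policy decides which columns each role may read, and every access attempt is recorded. The `POLICY` structure, role names, table name, and `read_table` helper are hypothetical; in practice a platform's catalog and security features would enforce this, not application code.

```python
# Hypothetical policy: which roles may read which columns of a gold table.
POLICY = {
    "gold.invoices": {
        "controller": {"invoice_id", "vendor", "amount"},
        "data_scientist": {"invoice_id", "amount"},
    }
}

AUDIT_LOG = []  # every access attempt is recorded for later review


def read_table(table, rows, role):
    """Return only the columns the role may see; deny roles with no grant."""
    allowed = POLICY.get(table, {}).get(role)
    AUDIT_LOG.append({"table": table, "role": role, "granted": allowed is not None})
    if allowed is None:
        raise PermissionError(f"{role!r} has no access to {table!r}")
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


rows = [{"invoice_id": 1, "vendor": "ACME", "amount": 100.5}]
for_controller = read_table("gold.invoices", rows, "controller")
for_scientist = read_table("gold.invoices", rows, "data_scientist")
```

Both roles read the same governed copy of the data; only the visible columns differ, and a role with no grant is refused and logged.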
Frequently Asked Questions
What is the difference between a data lakehouse and lakehouse architecture?
The data lakehouse is the concept: a platform that combines warehouse and lake capabilities. Lakehouse architecture is the technical design that implements it: the open table formats, separated storage and compute, layered refinement, and governance that make the concept work in practice.
What are open table formats in lakehouse architecture?
Open table formats such as Delta Lake and Apache Iceberg sit on top of low-cost file storage and add warehouse-like capabilities: transactions, schema enforcement, and time travel. They are the architectural element that turns raw files into reliable tables, and they are central to how a lakehouse works.
What is the medallion architecture?
The medallion architecture is the common practice of refining data in stages within a lakehouse: bronze for raw data, silver for cleaned and conformed data, and gold for business-ready tables. It keeps data quality traceable and the architecture maintainable.
Is Microsoft Fabric a lakehouse architecture?
Yes. Microsoft Fabric is built on lakehouse architecture, using Delta Lake as its open table format and OneLake as its storage foundation. Databricks is another major platform built on this architecture. Both implement the same core design.
Lakehouse Architecture and QuickLaunch’s Approach
The governed data lakehouse architecture is the second of QuickLaunch Analytics’ three data foundations, between automated data pipelines and the enterprise semantic layer. QuickLaunch builds on Microsoft Fabric and Databricks, using open table formats and the medallion pattern to refine enterprise application data from raw source tables into governed, business-ready models.
For customers, this means starting from a sound lakehouse architecture rather than designing and building one from scratch. The structure, governance, and refinement are already in place and proven across 250+ enterprise implementations, serving both the reports people build and the AI tools that read the same governed data.