Medallion Architecture in Snowflake

In 2026, setting up a medallion architecture in Snowflake has become a standard approach for building scalable, governed, and AI-ready data platforms. Rather than being just a conceptual framework, it is now a practical engineering pattern used to structure ingestion, transformation, and consumption layers in a way that improves data quality and operational clarity.

This guide focuses on how to actually set it up inside Snowflake, from database design through to pipeline orchestration and modelling. The goal is to move from raw source data to trusted, analytics-ready datasets using a repeatable structure that can scale across multiple business domains.

Step 1: Design Your Snowflake Structure (Databases & Schemas)

The first step is defining a clear physical structure inside Snowflake. Most teams implement medallion architecture using either separate databases or schemas within a single database.

A common and clean setup looks like this:

RAW_DB (Bronze) – stores ingested data exactly as received
CURATED_DB (Silver) – cleaned, validated, and standardised datasets
ANALYTICS_DB (Gold) – aggregated, business-ready data models

Inside each database, you typically mirror schemas by source system or domain (e.g. salesforce, shopify, erp). This keeps ingestion modular and prevents cross-domain coupling.

Step 2: Set Up the Bronze Layer (Raw Ingestion)

The Bronze layer is your immutable landing zone. Data should be ingested with minimal transformation so you always retain a source-of-truth copy.

In Snowflake, ingestion typically happens via:

Snowpipe for continuous ingestion
Bulk COPY INTO for scheduled loads
External stages (S3, Azure Blob, GCS)

A recommended pattern is to store raw files as VARIANT columns for flexibility:

Example table design:

RAW_DB.SALESFORCE.OPPORTUNITIES_RAW

Columns typically include:

LOAD_ID – batch identifier
INGESTED_AT – timestamp of ingestion
RAW_DATA – VARIANT JSON payload
SOURCE_FILE – origin reference

The key principle here is: never overwrite raw data. Instead, append and version it for full auditability.

Step 3: Build the Silver Layer (Transformation & Cleaning)

The Silver layer is where data becomes usable. This is typically built using SQL transformations, dbt models, or Snowpark pipelines.

Core responsibilities include:

Flattening semi-structured data (JSON → relational tables)
Removing duplicates and null invalid records
Standardising date formats, currencies, and IDs
Joining related datasets (e.g. customers + orders)

Example transformation flow:

RAW_DB.SALESFORCE.OPPORTUNITIES_RAW → CURATED_DB.SALESFORCE.OPPORTUNITIES_CLEAN

A best practice is to enforce business rules here using views or materialised tables so downstream layers remain consistent.

For example, filtering invalid records:

– Remove records with missing customer IDs
– Exclude closed-lost opportunities with zero value
– Standardise statuses into controlled vocabulary

Step 4: Build the Gold Layer (Business Models)

The Gold layer is where data becomes analytics-ready. This is where KPI tables, reporting models, and dimensional structures are created for tools like Power BI or Tableau.

Common structures include:

Fact tables (sales, revenue, transactions)
Dimension tables (customers, products, time)
Aggregated KPI tables (daily revenue, cohort retention)

Example Gold model:

ANALYTICS_DB.SALES.FACT_DAILY_SALES

This might include:

Date
Product ID
Total Revenue
Units Sold
Customer Segment

At this stage, performance optimisation becomes important — using clustering keys, materialised views, or pre-aggregations to support fast dashboard performance.

Step 5: Orchestrate the Pipeline

Once the layers are defined, you need orchestration to automate movement between them. In Snowflake ecosystems, this is typically handled using:

Snowflake Tasks for scheduling SQL transformations
Streams for change data capture (CDC)
External orchestration tools (ADF, Airflow, dbt Cloud)

A typical flow looks like:

1. Snowpipe ingests raw data into Bronze
2. Stream detects new records
3. Task triggers transformation into Silver
4. Second Task aggregates into Gold

This creates an automated, near real-time data pipeline with minimal manual intervention.

Step 6: Apply Governance and Data Quality Controls

One of the biggest advantages of medallion architecture is built-in governance. However, it still needs to be explicitly configured.

Key practices include:

Role-based access control (RBAC) per layer
Data lineage tracking between Bronze → Gold
Automated validation checks in Silver layer
Monitoring failed loads and schema changes

Most mature setups also enforce strict “no direct access” rules to Bronze data, ensuring only Silver and Gold layers are consumed by analytics tools.

Step 7: Connect to BI and Analytics Tools

Once Gold datasets are ready, they are exposed to BI tools such as Power BI, Tableau, or Looker. At this stage, the goal is performance and simplicity — not transformation.

Best practice is to treat Gold tables as the single source of truth for reporting, avoiding complex logic in BI tools themselves.

Final Thoughts

Setting up a medallion architecture in Snowflake is less about technology and more about discipline in how data is structured and transformed. When implemented correctly, it creates a scalable foundation that supports analytics, AI, and operational reporting without constant rework.

In 2026, organisations that standardise on this layered approach are seeing faster delivery cycles, higher trust in data, and significantly reduced engineering overhead — making it one of the most practical modern data architecture patterns available.