In 2026, setting up a medallion architecture in Snowflake has become a standard approach for building scalable, governed, and AI-ready data platforms. Rather than being just a conceptual framework, it is now a practical engineering pattern used to structure ingestion, transformation, and consumption layers in a way that improves data quality and operational clarity.
This guide focuses on how to actually set it up inside Snowflake, from database design through to pipeline orchestration and modelling. The goal is to move from raw source data to trusted, analytics-ready datasets using a repeatable structure that can scale across multiple business domains.
Step 1: Design Your Snowflake Structure (Databases & Schemas)
The first step is defining a clear physical structure inside Snowflake. Most teams implement medallion architecture using either separate databases or schemas within a single database.
A common and clean setup looks like this:
- RAW_DB (Bronze) – stores ingested data exactly as received
- CURATED_DB (Silver) – cleaned, validated, and standardised datasets
- ANALYTICS_DB (Gold) – aggregated, business-ready data models
Inside each database, you typically mirror schemas by source system or domain (e.g. salesforce, shopify, erp). This keeps ingestion modular and prevents cross-domain coupling.
Step 2: Set Up the Bronze Layer (Raw Ingestion)
The Bronze layer is your immutable landing zone. Data should be ingested with minimal transformation so you always retain a source-of-truth copy.
In Snowflake, ingestion typically happens via:
- Snowpipe for continuous ingestion
- Bulk COPY INTO for scheduled loads
- External stages (S3, Azure Blob, GCS)
A recommended pattern is to store raw files as VARIANT columns for flexibility:
Example table design:
RAW_DB.SALESFORCE.OPPORTUNITIES_RAW
Columns typically include:
LOAD_ID– batch identifierINGESTED_AT– timestamp of ingestionRAW_DATA– VARIANT JSON payloadSOURCE_FILE– origin reference
The key principle here is: never overwrite raw data. Instead, append and version it for full auditability.
Step 3: Build the Silver Layer (Transformation & Cleaning)
The Silver layer is where data becomes usable. This is typically built using SQL transformations, dbt models, or Snowpark pipelines.
Core responsibilities include:
- Flattening semi-structured data (JSON → relational tables)
- Removing duplicates and null invalid records
- Standardising date formats, currencies, and IDs
- Joining related datasets (e.g. customers + orders)
Example transformation flow:
RAW_DB.SALESFORCE.OPPORTUNITIES_RAW → CURATED_DB.SALESFORCE.OPPORTUNITIES_CLEAN
A best practice is to enforce business rules here using views or materialised tables so downstream layers remain consistent.
For example, filtering invalid records:
– Remove records with missing customer IDs
– Exclude closed-lost opportunities with zero value
– Standardise statuses into controlled vocabulary
Step 4: Build the Gold Layer (Business Models)
The Gold layer is where data becomes analytics-ready. This is where KPI tables, reporting models, and dimensional structures are created for tools like Power BI or Tableau.
Common structures include:
- Fact tables (sales, revenue, transactions)
- Dimension tables (customers, products, time)
- Aggregated KPI tables (daily revenue, cohort retention)
Example Gold model:
ANALYTICS_DB.SALES.FACT_DAILY_SALES
This might include:
- Date
- Product ID
- Total Revenue
- Units Sold
- Customer Segment
At this stage, performance optimisation becomes important — using clustering keys, materialised views, or pre-aggregations to support fast dashboard performance.
Step 5: Orchestrate the Pipeline
Once the layers are defined, you need orchestration to automate movement between them. In Snowflake ecosystems, this is typically handled using:
- Snowflake Tasks for scheduling SQL transformations
- Streams for change data capture (CDC)
- External orchestration tools (ADF, Airflow, dbt Cloud)
A typical flow looks like:
1. Snowpipe ingests raw data into Bronze
2. Stream detects new records
3. Task triggers transformation into Silver
4. Second Task aggregates into Gold
This creates an automated, near real-time data pipeline with minimal manual intervention.
Step 6: Apply Governance and Data Quality Controls
One of the biggest advantages of medallion architecture is built-in governance. However, it still needs to be explicitly configured.
Key practices include:
- Role-based access control (RBAC) per layer
- Data lineage tracking between Bronze → Gold
- Automated validation checks in Silver layer
- Monitoring failed loads and schema changes
Most mature setups also enforce strict “no direct access” rules to Bronze data, ensuring only Silver and Gold layers are consumed by analytics tools.
Step 7: Connect to BI and Analytics Tools
Once Gold datasets are ready, they are exposed to BI tools such as Power BI, Tableau, or Looker. At this stage, the goal is performance and simplicity — not transformation.
Best practice is to treat Gold tables as the single source of truth for reporting, avoiding complex logic in BI tools themselves.
Final Thoughts
Setting up a medallion architecture in Snowflake is less about technology and more about discipline in how data is structured and transformed. When implemented correctly, it creates a scalable foundation that supports analytics, AI, and operational reporting without constant rework.
In 2026, organisations that standardise on this layered approach are seeing faster delivery cycles, higher trust in data, and significantly reduced engineering overhead — making it one of the most practical modern data architecture patterns available.
