```bash
npx skills add https://github.com/wshobson/agents --skill dbt-transformation-patterns
```

How Dbt Transformation Patterns fits into a Paperclip company
Dbt Transformation Patterns drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.
SKILL.md (556 lines)
---
name: dbt-transformation-patterns
description: Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
---

# dbt Transformation Patterns

Production-ready patterns for dbt (data build tool) including model organization, testing strategies, documentation, and incremental processing.

## When to Use This Skill

- Building data transformation pipelines with dbt
- Organizing models into staging, intermediate, and marts layers
- Implementing data quality tests
- Creating incremental models for large datasets
- Documenting data models and lineage
- Setting up dbt project structure

## Core Concepts

### 1. Model Layers (Medallion Architecture)

```
sources/        Raw data definitions
    ↓
staging/        1:1 with source, light cleaning
    ↓
intermediate/   Business logic, joins, aggregations
    ↓
marts/          Final analytics tables
```

### 2. Naming Conventions

| Layer        | Prefix         | Example                       |
| ------------ | -------------- | ----------------------------- |
| Staging      | `stg_`         | `stg_stripe__payments`        |
| Intermediate | `int_`         | `int_payments_pivoted`        |
| Marts        | `dim_`, `fct_` | `dim_customers`, `fct_orders` |

## Quick Start

```yaml
# dbt_project.yml
name: "analytics"
version: "1.0.0"
profile: "analytics"

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]

vars:
  start_date: "2020-01-01"

models:
  analytics:
    staging:
      +materialized: view
      +schema: staging
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
      +schema: analytics
```

```
# Project structure
models/
├── staging/
│   ├── stripe/
│   │   ├── _stripe__sources.yml
│   │   ├── _stripe__models.yml
│   │   ├── stg_stripe__customers.sql
│   │   └── stg_stripe__payments.sql
│   └── shopify/
│       ├── _shopify__sources.yml
│       └── stg_shopify__orders.sql
├── intermediate/
│   └── finance/
│       └── int_payments_pivoted.sql
└── marts/
    ├── core/
    │   ├── _core__models.yml
    │   ├── dim_customers.sql
    │   └── fct_orders.sql
    └── finance/
        └── fct_revenue.sql
```

## Patterns

### Pattern 1: Source Definitions

```yaml
# models/staging/stripe/_stripe__sources.yml
version: 2

sources:
  - name: stripe
    description: Raw Stripe data loaded via Fivetran
    database: raw
    schema: stripe
    loader: fivetran
    loaded_at_field: _fivetran_synced
    freshness:
      warn_after: { count: 12, period: hour }
      error_after: { count: 24, period: hour }
    tables:
      - name: customers
        description: Stripe customer records
        columns:
          - name: id
            description: Primary key
            tests:
              - unique
              - not_null
          - name: email
            description: Customer email
          - name: created
            description: Account creation timestamp
      - name: payments
        description: Stripe payment transactions
        columns:
          - name: id
            tests:
              - unique
              - not_null
          - name: customer_id
            tests:
              - not_null
              - relationships:
                  to: source('stripe', 'customers')
                  field: id
```

### Pattern 2: Staging Models

```sql
-- models/staging/stripe/stg_stripe__customers.sql
with source as (
    select * from {{ source('stripe', 'customers') }}
),

renamed as (
    select
        -- ids
        id as customer_id,

        -- strings
        lower(email) as email,
        name as customer_name,

        -- timestamps
        created as created_at,

        -- metadata
        _fivetran_synced as _loaded_at
    from source
)

select * from renamed
```

```sql
-- models/staging/stripe/stg_stripe__payments.sql
{{
    config(
        materialized='incremental',
        unique_key='payment_id',
        on_schema_change='append_new_columns'
    )
}}

with source as (
    select * from {{ source('stripe', 'payments') }}
    {% if is_incremental() %}
    where _fivetran_synced > (select max(_loaded_at) from {{ this }})
    {% endif %}
),

renamed as (
    select
        -- ids
        id as payment_id,
        customer_id,
        invoice_id,

        -- amounts (convert cents to dollars)
        amount / 100.0 as amount,
        amount_refunded / 100.0 as amount_refunded,

        -- status
        status as payment_status,

        -- timestamps
        created as created_at,

        -- metadata
        _fivetran_synced as _loaded_at
    from source
)

select * from renamed
```

### Pattern 3: Intermediate Models

```sql
-- models/intermediate/finance/int_payments_pivoted_to_customer.sql
with payments as (
    select * from {{ ref('stg_stripe__payments') }}
),

customers as (
    select * from {{ ref('stg_stripe__customers') }}
),

payment_summary as (
    select
        customer_id,
        count(*) as total_payments,
        count(case when payment_status = 'succeeded' then 1 end) as successful_payments,
        sum(case when payment_status = 'succeeded' then amount else 0 end) as total_amount_paid,
        min(created_at) as first_payment_at,
        max(created_at) as last_payment_at
    from payments
    group by customer_id
)

select
    customers.customer_id,
    customers.email,
    customers.created_at as customer_created_at,
    coalesce(payment_summary.total_payments, 0) as total_payments,
    coalesce(payment_summary.successful_payments, 0) as successful_payments,
    coalesce(payment_summary.total_amount_paid, 0) as lifetime_value,
    payment_summary.first_payment_at,
    payment_summary.last_payment_at
from customers
left join payment_summary using (customer_id)
```

### Pattern 4: Mart Models (Dimensions and Facts)

```sql
-- models/marts/core/dim_customers.sql
{{
    config(
        materialized='table',
        unique_key='customer_id'
    )
}}

with customers as (
    select * from {{ ref('int_payments_pivoted_to_customer') }}
),

orders as (
    select * from {{ ref('stg_shopify__orders') }}
),

order_summary as (
    select
        customer_id,
        count(*) as total_orders,
        sum(total_price) as total_order_value,
        min(created_at) as first_order_at,
        max(created_at) as last_order_at
    from orders
    group by customer_id
),

final as (
    select
        -- surrogate key
        {{ dbt_utils.generate_surrogate_key(['customers.customer_id']) }} as customer_key,

        -- natural key
        customers.customer_id,

        -- attributes
        customers.email,
        customers.customer_created_at,

        -- payment metrics
        customers.total_payments,
        customers.successful_payments,
        customers.lifetime_value,
        customers.first_payment_at,
        customers.last_payment_at,

        -- order metrics
        coalesce(order_summary.total_orders, 0) as total_orders,
        coalesce(order_summary.total_order_value, 0) as total_order_value,
        order_summary.first_order_at,
        order_summary.last_order_at,

        -- calculated fields
        case
            when customers.lifetime_value >= 1000 then 'high'
            when customers.lifetime_value >= 100 then 'medium'
            else 'low'
        end as customer_tier,

        -- timestamps
        current_timestamp as _loaded_at
    from customers
    left join order_summary using (customer_id)
)

select * from final
```

```sql
-- models/marts/core/fct_orders.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge'
    )
}}

with orders as (
    select * from {{ ref('stg_shopify__orders') }}
    {% if is_incremental() %}
    where updated_at > (select max(updated_at) from {{ this }})
    {% endif %}
),

customers as (
    select * from {{ ref('dim_customers') }}
),

final as (
    select
        -- keys
        orders.order_id,
        customers.customer_key,
        orders.customer_id,

        -- dimensions
        orders.order_status,
        orders.fulfillment_status,
        orders.payment_status,

        -- measures
        orders.subtotal,
        orders.tax,
        orders.shipping,
        orders.total_price,
        orders.total_discount,
        orders.item_count,

        -- timestamps
        orders.created_at,
        orders.updated_at,
        orders.fulfilled_at,

        -- metadata
        current_timestamp as _loaded_at
    from orders
    left join customers on orders.customer_id = customers.customer_id
)

select * from final
```

### Pattern 5: Testing and Documentation

```yaml
# models/marts/core/_core__models.yml
version: 2

models:
  - name: dim_customers
    description: Customer dimension with payment and order metrics
    columns:
      - name: customer_key
        description: Surrogate key for the customer dimension
        tests:
          - unique
          - not_null
      - name: customer_id
        description: Natural key from source system
        tests:
          - unique
          - not_null
      - name: email
        description: Customer email address
        tests:
          - not_null
      - name: customer_tier
        description: Customer value tier based on lifetime value
        tests:
          - accepted_values:
              values: ["high", "medium", "low"]
      - name: lifetime_value
        description: Total amount paid by customer
        tests:
          - dbt_utils.expression_is_true:
              expression: ">= 0"

  - name: fct_orders
    description: Order fact table with all order transactions
    tests:
      - dbt_utils.recency:
          datepart: day
          field: created_at
          interval: 1
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_key
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_key
```

### Pattern 6: Macros and DRY Code

```sql
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}

-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) %}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name }}
    {%- endif -%}
{% endmacro %}

-- macros/limit_data_in_dev.sql
{% macro limit_data_in_dev(column_name, days=3) %}
    {% if target.name == 'dev' %}
    where {{ column_name }} >= dateadd(day, -{{ days }}, current_date)
    {% endif %}
{% endmacro %}

-- Usage in a model
select * from {{ ref('stg_orders') }}
{{ limit_data_in_dev('created_at') }}
```

### Pattern 7: Incremental Strategies

```sql
-- Delete+Insert (default for most warehouses)
{{
    config(
        materialized='incremental',
        unique_key='id',
        incremental_strategy='delete+insert'
    )
}}

-- Merge (best for late-arriving data)
{{
    config(
        materialized='incremental',
        unique_key='id',
        incremental_strategy='merge',
        merge_update_columns=['status', 'amount', 'updated_at']
    )
}}

-- Insert Overwrite (partition-based)
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by={
            "field": "created_date",
            "data_type": "date",
            "granularity": "day"
        }
    )
}}

select
    *,
    date(created_at) as created_date
from {{ ref('stg_events') }}

{% if is_incremental() %}
where created_date >= dateadd(day, -3, current_date)
{% endif %}
```

## dbt Commands

```bash
# Development
dbt run                        # Run all models
dbt run --select staging       # Run staging models only
dbt run --select +fct_orders   # Run fct_orders and its upstream
dbt run --select fct_orders+   # Run fct_orders and its downstream
dbt run --full-refresh         # Rebuild incremental models

# Testing
dbt test                       # Run all tests
dbt test --select stg_stripe   # Test specific models
dbt build                      # Run + test in DAG order

# Documentation
dbt docs generate              # Generate docs
dbt docs serve                 # Serve docs locally

# Debugging
dbt compile                    # Compile SQL without running
dbt debug                      # Test connection
dbt ls --select tag:critical   # List models by tag
```

## Best Practices

### Do's

- **Use staging layer** - Clean data once, use everywhere
- **Test aggressively** - Not null, unique, relationships
- **Document everything** - Column descriptions, model descriptions
- **Use incremental** - For tables > 1M rows
- **Version control** - dbt project in Git

### Don'ts

- **Don't skip staging** - Raw → mart is tech debt
- **Don't hardcode dates** - Use `{{ var('start_date') }}`
- **Don't repeat logic** - Extract to macros
- **Don't test in prod** - Use dev target
- **Don't ignore freshness** - Monitor source data
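The mart and testing patterns above call macros from the `dbt_utils` package (`generate_surrogate_key`, `expression_is_true`, `recency`), which the project must declare and install before those models compile. A minimal `packages.yml` sketch; the version range is an assumption, pin whatever matches your dbt-core version:

```yaml
# packages.yml (project root)
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.0.0", "<2.0.0"]
```

Run `dbt deps` after editing this file to install the declared packages.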
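The Quick Start's `dbt_project.yml` points at `profile: "analytics"`, which dbt resolves from `profiles.yml` outside the repo. A sketch assuming a Postgres warehouse; the adapter type, host, and credential names are placeholders, and reading the password from an environment variable is one common convention, not a requirement:

```yaml
# ~/.dbt/profiles.yml (kept out of version control)
analytics:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: dbt_user
      password: "{{ env_var('DBT_PASSWORD') }}"
      dbname: analytics
      schema: dbt_dev
      threads: 4
```

A separate `dev` target is what makes `target.name == 'dev'` checks like `limit_data_in_dev` (Pattern 6) and the "Don't test in prod" rule workable.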
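The schema tests in Pattern 5 cover the common cases; a recurring business rule that none of the built-ins express can be captured once as a custom generic test. A sketch assuming dbt's `tests/generic/` discovery convention; the test name `non_negative` is illustrative, not part of the original skill:

```sql
-- tests/generic/non_negative.sql
-- Fails if any row has a negative value in the tested column.
{% test non_negative(model, column_name) %}

select {{ column_name }}
from {{ model }}
where {{ column_name }} < 0

{% endtest %}
```

Once defined, it is referenced from a model's YAML like any built-in test, e.g. `tests: [non_negative]` under a column entry.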