Claude Agent Skill · by Wshobson

ML Pipeline Workflow

Takes you from raw data to deployed model with actual pipeline orchestration, not just theory. Generates DAG templates for Airflow/Dagster and sets up data validation.

Install

```shell
npx skills add https://github.com/wshobson/agents --skill ml-pipeline-workflow
```
Works with Paperclip

How ML Pipeline Workflow fits into a Paperclip company.

ML Pipeline Workflow drops into any Paperclip agent that handles this kind of work. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat — no prompt engineering, no tool wiring.

Source file: SKILL.md (248 lines)
---
name: ml-pipeline-workflow
description: Build end-to-end MLOps pipelines from data preparation through model training, validation, and production deployment. Use when creating ML pipelines, implementing MLOps practices, or automating model training and deployment workflows.
---

# ML Pipeline Workflow

Complete end-to-end MLOps pipeline orchestration from data preparation through model deployment.

## Overview

This skill provides comprehensive guidance for building production ML pipelines that handle the full lifecycle: data ingestion → preparation → training → validation → deployment → monitoring.

## When to Use This Skill

- Building new ML pipelines from scratch
- Designing workflow orchestration for ML systems
- Implementing data → model → deployment automation
- Setting up reproducible training workflows
- Creating DAG-based ML orchestration
- Integrating ML components into production systems

## What This Skill Provides

### Core Capabilities

1. **Pipeline Architecture**
   - End-to-end workflow design
   - DAG orchestration patterns (Airflow, Dagster, Kubeflow)
   - Component dependencies and data flow
   - Error handling and retry strategies

2. **Data Preparation**
   - Data validation and quality checks
   - Feature engineering pipelines
   - Data versioning and lineage
   - Train/validation/test splitting strategies

3. **Model Training**
   - Training job orchestration
   - Hyperparameter management
   - Experiment tracking integration
   - Distributed training patterns

4. **Model Validation**
   - Validation frameworks and metrics
   - A/B testing infrastructure
   - Performance regression detection
   - Model comparison workflows

5. **Deployment Automation**
   - Model serving patterns
   - Canary deployments
   - Blue-green deployment strategies
   - Rollback mechanisms

### Reference Documentation

See the `references/` directory for detailed guides:

- **data-preparation.md** - Data cleaning, validation, and feature engineering
- **model-training.md** - Training workflows and best practices
- **model-validation.md** - Validation strategies and metrics
- **model-deployment.md** - Deployment patterns and serving architectures

### Assets and Templates

The `assets/` directory contains:

- **pipeline-dag.yaml.template** - DAG template for workflow orchestration
- **training-config.yaml** - Training configuration template
- **validation-checklist.md** - Pre-deployment validation checklist

## Usage Patterns

### Basic Pipeline Setup

```python
# 1. Define pipeline stages
stages = [
    "data_ingestion",
    "data_validation",
    "feature_engineering",
    "model_training",
    "model_validation",
    "model_deployment"
]

# 2. Configure dependencies
# See assets/pipeline-dag.yaml.template for full example
```

### Production Workflow

1. **Data Preparation Phase**
   - Ingest raw data from sources
   - Run data quality checks
   - Apply feature transformations
   - Version processed datasets

2. **Training Phase**
   - Load versioned training data
   - Execute training jobs
   - Track experiments and metrics
   - Save trained models

3. **Validation Phase**
   - Run validation test suite
   - Compare against baseline
   - Generate performance reports
   - Approve for deployment

4. **Deployment Phase**
   - Package model artifacts
   - Deploy to serving infrastructure
   - Configure monitoring
   - Validate production traffic

## Best Practices

### Pipeline Design

- **Modularity**: Each stage should be independently testable
- **Idempotency**: Re-running stages should be safe
- **Observability**: Log metrics at every stage
- **Versioning**: Track data, code, and model versions
- **Failure Handling**: Implement retry logic and alerting

### Data Management

- Use data validation libraries (Great Expectations, TFX)
- Version datasets with DVC or similar tools
- Document feature engineering transformations
- Maintain data lineage tracking

### Model Operations

- Separate training and serving infrastructure
- Use model registries (MLflow, Weights & Biases)
- Implement gradual rollouts for new models
- Monitor model performance drift
- Maintain rollback capabilities

### Deployment Strategies

- Start with shadow deployments
- Use canary releases for validation
- Implement A/B testing infrastructure
- Set up automated rollback triggers
- Monitor latency and throughput

## Integration Points

### Orchestration Tools

- **Apache Airflow**: DAG-based workflow orchestration
- **Dagster**: Asset-based pipeline orchestration
- **Kubeflow Pipelines**: Kubernetes-native ML workflows
- **Prefect**: Modern dataflow automation

### Experiment Tracking

- MLflow for experiment tracking and model registry
- Weights & Biases for visualization and collaboration
- TensorBoard for training metrics

### Deployment Platforms

- AWS SageMaker for managed ML infrastructure
- Google Vertex AI for GCP deployments
- Azure ML for Azure cloud
- OCI Data Science for Oracle Cloud Infrastructure deployments
- Kubernetes + KServe for cloud-agnostic serving

## Progressive Disclosure

Start with the basics and gradually add complexity:

1. **Level 1**: Simple linear pipeline (data → train → deploy)
2. **Level 2**: Add validation and monitoring stages
3. **Level 3**: Implement hyperparameter tuning
4. **Level 4**: Add A/B testing and gradual rollouts
5. **Level 5**: Multi-model pipelines with ensemble strategies

## Common Patterns

### Batch Training Pipeline

```yaml
# See assets/pipeline-dag.yaml.template
stages:
  - name: data_preparation
    dependencies: []
  - name: model_training
    dependencies: [data_preparation]
  - name: model_evaluation
    dependencies: [model_training]
  - name: model_deployment
    dependencies: [model_evaluation]
```

### Real-time Feature Pipeline

```python
# Stream processing for real-time features
# Combined with batch training
# See references/data-preparation.md
```

### Continuous Training

```python
# Automated retraining on schedule
# Triggered by data drift detection
# See references/model-training.md
```

## Troubleshooting

### Common Issues

- **Pipeline failures**: Check dependencies and data availability
- **Training instability**: Review hyperparameters and data quality
- **Deployment issues**: Validate model artifacts and serving config
- **Performance degradation**: Monitor data drift and model metrics

### Debugging Steps

1. Check pipeline logs for each stage
2. Validate input/output data at boundaries
3. Test components in isolation
4. Review experiment tracking metrics
5. Inspect model artifacts and metadata

## Next Steps

After setting up your pipeline:

1. Explore the **hyperparameter-tuning** skill for optimization
2. Learn **experiment-tracking-setup** for MLflow/W&B
3. Review **model-deployment-patterns** for serving strategies
4. Implement monitoring with observability tools

## Related Skills

- **experiment-tracking-setup**: MLflow and Weights & Biases integration
- **hyperparameter-tuning**: Automated hyperparameter optimization
- **model-deployment-patterns**: Advanced deployment strategies
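The batch training pipeline's dependency-ordered execution can be sketched with nothing but the standard library (the `run_stage` body is a hypothetical placeholder; a real pipeline would hand each stage to Airflow, Dagster, or similar):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Stage dependency graph mirroring assets/pipeline-dag.yaml.template:
# each stage maps to the set of stages it depends on.
dag = {
    "data_preparation": set(),
    "model_training": {"data_preparation"},
    "model_evaluation": {"model_training"},
    "model_deployment": {"model_evaluation"},
}

def run_stage(name):
    # Placeholder: a real stage would submit work to the orchestrator.
    print(f"running {name}")

def run_pipeline(dag):
    """Run stages in dependency order and return the order used."""
    order = list(TopologicalSorter(dag).static_order())
    for stage in order:
        run_stage(stage)
    return order
```

Keeping the graph as plain data makes the DAG easy to validate and diff before it ever reaches the orchestrator.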
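The "data quality checks" step in the data preparation phase can be as simple as a few declarative expectations run before training. This is a hand-rolled sketch (the column names are illustrative); libraries like Great Expectations express the same idea as reusable expectation suites:

```python
def validate_rows(rows, required_columns, non_null=()):
    """Return a list of human-readable validation failures (empty list = pass)."""
    failures = []
    for i, row in enumerate(rows):
        missing = required_columns - row.keys()
        if missing:
            failures.append(f"row {i}: missing columns {sorted(missing)}")
        for col in non_null:
            if col in row and row[col] is None:
                failures.append(f"row {i}: null value in {col!r}")
    return failures

# Illustrative records; a real pipeline would pull a batch from the ingestion stage.
rows = [
    {"user_id": 1, "amount": 9.99},
    {"user_id": 2, "amount": None},  # should trip the non-null check
]
failures = validate_rows(rows, required_columns={"user_id", "amount"},
                         non_null=("amount",))
```

Failing the pipeline on a non-empty failure list keeps bad batches from silently reaching training.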
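The retry logic called for under Failure Handling might look like the sketch below (exponential backoff with a capped attempt count); orchestrators such as Airflow expose the same behavior as task-level retry settings, so hand-rolling it is only needed for custom runners:

```python
import time

def with_retries(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure for alerting
            time.sleep(base_delay * 2 ** attempt)

# Example: a flaky stage that succeeds on its third invocation.
calls = {"n": 0}
def flaky_stage():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

result = with_retries(flaky_stage)
```

In production you would catch only transient error types and emit a metric per retry, so alerting can distinguish flakiness from hard failures.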
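The canary release strategy above comes down to two decisions: which requests hit the new model, and when to roll back. A toy sketch with hypothetical thresholds (real serving layers like KServe handle the traffic split declaratively):

```python
import zlib

def route(request_id, canary_fraction=0.05):
    """Deterministically route a request to 'canary' or 'stable' by hashing its id."""
    bucket = (zlib.crc32(request_id.encode()) % 100) / 100
    return "canary" if bucket < canary_fraction else "stable"

def should_rollback(canary_errors, canary_total, stable_error_rate, tolerance=0.01):
    """Roll back when the canary's error rate exceeds stable's by more than tolerance."""
    if canary_total == 0:
        return False  # no traffic observed yet; keep the canary running
    return canary_errors / canary_total > stable_error_rate + tolerance
```

Hashing the request id (rather than sampling randomly) keeps a given user pinned to one model variant, which keeps metrics comparable across requests.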
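The continuous training pattern needs a drift signal to trigger retraining. A minimal version compares the live feature mean against the training-time baseline via a z-score on the standard error (a sketch with made-up data; production systems more commonly use PSI or a Kolmogorov-Smirnov test):

```python
import statistics

def drifted(baseline, live, z_threshold=3.0):
    """Flag drift when the live mean is > z_threshold standard errors from baseline."""
    mu = statistics.fmean(baseline)
    sd = statistics.stdev(baseline)
    if sd == 0:
        return statistics.fmean(live) != mu
    standard_error = sd / len(live) ** 0.5
    z = abs(statistics.fmean(live) - mu) / standard_error
    return z > z_threshold

baseline = [10.0, 11.0, 9.0, 10.5, 9.5] * 20   # feature values seen at training time
stable_live = [10.2, 9.8, 10.1, 9.9] * 25      # live traffic, same distribution
shifted_live = [14.0, 15.0, 14.5, 15.5] * 25   # live traffic after an upstream change

trigger_retrain = drifted(baseline, shifted_live)
```

A scheduled job evaluating `drifted` per feature, and enqueueing a training run when any flag fires, is the simplest form of drift-triggered retraining.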