Name: Datanalysis Credit Risk
Author: Github

Install

Terminal · npx

$ npx skills add https://github.com/github/awesome-copilot --skill datanalysis-credit-risk

Works with Paperclip

How Datanalysis Credit Risk fits into a Paperclip company.

Datanalysis Credit Risk can be added to any Paperclip agent that handles credit risk data cleaning and variable screening. Assign it to a specialist inside a pre-configured PaperclipOrg company and the skill becomes available on every heartbeat, with no prompt engineering or tool wiring.

Source file

SKILL.md · 113 lines · markdown


---
name: datanalysis-credit-risk
description: Credit risk data cleaning and variable screening pipeline for pre-loan modeling. Use when working with raw credit data that needs quality assessment, missing value analysis, or variable selection before modeling. It covers data loading and formatting, abnormal period filtering, missing rate calculation, high-missing variable removal, low-IV variable filtering, high-PSI variable removal, Null Importance denoising, high-correlation variable removal, and cleaning report generation. Applicable scenarios are credit risk data cleaning, variable screening, and pre-loan modeling preprocessing.
---

# Data Cleaning and Variable Screening

## Quick Start

```bash
# Run the complete data cleaning pipeline
python ".github/skills/datanalysis-credit-risk/scripts/example.py"
```

## Complete Process Description

The data cleaning pipeline consists of the following 11 steps. Each step is executed independently and does not delete the original data:

1. **Get Data** - Load and format raw data
2. **Organization Sample Analysis** - Compute the sample count and bad sample rate for each organization
3. **Separate OOS Data** - Separate out-of-sample (OOS) samples from modeling samples
4. **Filter Abnormal Months** - Remove months with an insufficient bad sample count or total sample count
5. **Calculate Missing Rate** - Calculate overall and organization-level missing rates for each feature
6. **Drop High Missing Rate Features** - Remove features whose overall missing rate exceeds the threshold
7. **Drop Low IV Features** - Remove features whose overall IV is too low or whose IV is too low in too many organizations
8. **Drop High PSI Features** - Remove features with unstable PSI
9. **Null Importance Denoising** - Remove noise features using the label permutation method
10. **Drop High Correlation Features** - Remove highly correlated features, ranked by original gain
11. **Export Report** - Generate an Excel report containing the details and statistics of all steps

## Core Functions

| Function | Purpose | Module |
|------|------|----------|
| `get_dataset()` | Load and format data | references.func |
| `org_analysis()` | Organization sample analysis | references.func |
| `missing_check()` | Calculate missing rate | references.func |
| `drop_abnormal_ym()` | Filter abnormal months | references.analysis |
| `drop_highmiss_features()` | Drop high missing rate features | references.analysis |
| `drop_lowiv_features()` | Drop low IV features | references.analysis |
| `drop_highpsi_features()` | Drop high PSI features | references.analysis |
| `drop_highnoise_features()` | Null Importance denoising | references.analysis |
| `drop_highcorr_features()` | Drop high correlation features | references.analysis |
| `iv_distribution_by_org()` | IV distribution statistics | references.analysis |
| `psi_distribution_by_org()` | PSI distribution statistics | references.analysis |
| `value_ratio_distribution_by_org()` | Value ratio distribution statistics | references.analysis |
| `export_cleaning_report()` | Export cleaning report | references.analysis |

## Parameter Description

### Data Loading Parameters
- `DATA_PATH`: Data file path (Parquet format is preferred)
- `DATE_COL`: Date column name
- `Y_COL`: Label column name
- `ORG_COL`: Organization column name
- `KEY_COLS`: Primary key column name list

### OOS Organization Configuration
- `OOS_ORGS`: Out-of-sample organization list

### Abnormal Month Filtering Parameters
- `min_ym_bad_sample`: Minimum bad sample count per month (default 10)
- `min_ym_sample`: Minimum total sample count per month (default 500)

### Missing Rate Parameters
- `missing_ratio`: Overall missing rate threshold (default 0.6)

### IV Parameters
- `overall_iv_threshold`: Overall IV threshold (default 0.1)
- `org_iv_threshold`: Single organization IV threshold (default 0.1)
- `max_org_threshold`: Maximum tolerated number of low-IV organizations (default 2)

### PSI Parameters
- `psi_threshold`: PSI threshold (default 0.1)
- `max_months_ratio`: Maximum ratio of unstable months (default 1/3)
- `max_orgs`: Maximum number of unstable organizations (default 6)

### Null Importance Parameters
- `n_estimators`: Number of trees (default 100)
- `max_depth`: Maximum tree depth (default 5)
- `gain_threshold`: Gain difference threshold (default 50)

### High Correlation Parameters
- `max_corr`: Correlation threshold (default 0.9)
- `top_n_keep`: Keep the top N features by original gain ranking (default 20)

## Output Report

The generated Excel report contains the following sheets (sheet names are shown as generated, with an English gloss in parentheses):

1. **汇总** (Summary) - Summary information for all steps, including operation results and conditions
2. **机构样本统计** (Organization Sample Statistics) - Sample count and bad sample rate for each organization
3. **分离OOS数据** (Separate OOS Data) - OOS sample and modeling sample counts
4. **Step4-异常月份处理** (Abnormal Month Handling) - Abnormal months that were removed
5. **缺失率明细** (Missing Rate Details) - Overall and organization-level missing rates for each feature
6. **Step5-有值率分布统计** (Value Ratio Distribution Statistics) - Distribution of features across value ratio ranges
7. **Step6-高缺失率处理** (High Missing Rate Handling) - High missing rate features that were removed
8. **Step7-IV明细** (IV Details) - IV values of each feature, per organization and overall
9. **Step7-IV处理** (IV Handling) - Features that do not meet the IV conditions, and the low-IV organizations
10. **Step7-IV分布统计** (IV Distribution Statistics) - Distribution of features across IV ranges
11. **Step8-PSI明细** (PSI Details) - PSI values of each feature, per organization and per month
12. **Step8-PSI处理** (PSI Handling) - Features that do not meet the PSI conditions, and the unstable organizations
13. **Step8-PSI分布统计** (PSI Distribution Statistics) - Distribution of features across PSI ranges
14. **Step9-null importance处理** (Null Importance Handling) - Noise features that were removed
15. **Step10-高相关性剔除** (High Correlation Removal) - High correlation features that were removed

## Features

- **Interactive Input**: Parameters can be entered before each step runs, with default values supported
- **Independent Execution**: Each step is executed independently without deleting the original data, which makes comparative analysis easier
- **Complete Report**: Generates a complete Excel report containing details, statistics, and distributions
- **Multi-process Support**: IV and PSI calculations support multi-process acceleration
- **Organization-level Analysis**: Supports organization-level statistics and the modeling/OOS distinction
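
## Illustrative Sketch: Information Value (IV)

The Drop Low IV Features step compares each feature's IV against `overall_iv_threshold` (default 0.1). The skill's own implementation in `references.analysis` is not shown here, so the following is only a minimal sketch of the standard IV formula under stated assumptions: the feature is already binned, labels are 0/1 with 1 meaning a bad sample, and a small smoothing count `eps` avoids taking the log of zero. The function name `information_value` and the `eps` parameter are hypothetical and are not part of the skill.

```python
import math
from collections import defaultdict


def information_value(bins, labels, eps=0.5):
    """Illustrative IV for a pre-binned feature (not the skill's own code).

    bins   : bin identifier for each sample
    labels : 0/1 label for each sample (1 = bad), same length as bins
    eps    : smoothing count added to every cell (an assumption of this sketch)
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for b, y in zip(bins, labels):
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1

    keys = set(good) | set(bad)
    total_good = sum(good.values()) + eps * len(keys)
    total_bad = sum(bad.values()) + eps * len(keys)

    iv = 0.0
    for k in keys:
        p_good = (good[k] + eps) / total_good  # share of good samples in bin k
        p_bad = (bad[k] + eps) / total_bad     # share of bad samples in bin k
        iv += (p_good - p_bad) * math.log(p_good / p_bad)
    return iv


# A feature whose bins carry no information about the label
flat_bins = ["A"] * 20 + ["B"] * 20
flat_labels = ([0] * 10 + [1] * 10) * 2
iv_flat = information_value(flat_bins, flat_labels)

# A feature whose bins separate good and bad samples well
sep_bins = ["A"] * 10 + ["B"] * 10
sep_labels = [0] * 9 + [1] + [0] + [1] * 9
iv_sep = information_value(sep_bins, sep_labels)

overall_iv_threshold = 0.1  # default from the IV Parameters section
keep_flat = iv_flat >= overall_iv_threshold
keep_sep = iv_sep >= overall_iv_threshold
```

Under these assumptions the uninformative feature falls below the threshold and would be dropped, while the separating feature would be kept. The organization-level check (`org_iv_threshold`, `max_org_threshold`) could apply the same calculation per organization and count how many organizations fall below the threshold.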

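
## Illustrative Sketch: Population Stability Index (PSI)

The Drop High PSI Features step flags a feature as unstable when its PSI exceeds `psi_threshold` (default 0.1). The source does not describe the binning scheme or the baseline period the skill uses, so this is a sketch under assumptions: equal-width bins derived from the baseline sample, values outside the baseline range clipped into the edge bins, and a floor `eps` on empty bin shares. The function name `psi` and the parameters `n_bins` and `eps` are hypothetical.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Illustrative PSI between a baseline and a comparison sample.

    expected : numeric values from the baseline period
    actual   : numeric values from the period being compared
    n_bins   : number of equal-width bins (an assumption of this sketch)
    eps      : floor applied to empty bin shares to avoid log(0)
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clip out-of-range values
            counts[idx] += 1
        n = len(values)
        return [max(c / n, eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = list(range(100))
shifted = [v + 50 for v in baseline]

psi_same = psi(baseline, baseline)    # identical distributions
psi_shifted = psi(baseline, shifted)  # distribution shifted upward

psi_threshold = 0.1  # default from the PSI Parameters section
is_unstable = psi_shifted > psi_threshold
```

In the pipeline described above, a check like this would presumably be repeated per organization and per month, and the results combined using `max_months_ratio` and `max_orgs`. How exactly the skill combines them is not stated in the source.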