Data Cleaning Notebook Builder
Notebooks · Claude · Python · pandas · data cleaning · Jupyter
Prompt
Write a complete, ready-to-run Jupyter notebook for cleaning and validating my dataset.

Dataset description: [DESCRIBE YOUR DATA, e.g. 'CSV of sales transactions, 50k rows, columns: date, amount, customer_id, product_sku, region']
File name/path: [FILE NAME OR PATH]
Common issues I know about: [LIST KNOWN ISSUES, e.g. missing values in customer_id, date format inconsistencies, negative amounts]

The notebook should include:

1. **Setup cell**: imports (pandas, numpy, matplotlib)
2. **Load data**: read the file, print shape and dtypes
3. **Initial EDA**: .info(), .describe(), missing value counts, duplicate check
4. **Column-by-column cleaning**: fix each issue I listed, plus common ones you'd expect
5. **Validation checks**: assertions that catch if cleaning broke something
6. **Export**: save the cleaned file as [ORIGINAL_NAME]_cleaned.csv
7. **Summary markdown cell**: what was cleaned and how many rows were affected

Use clear markdown headers for each section. Add comments explaining non-obvious steps.
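For orientation, here is a rough sketch of the kind of cleaning, validation, and export logic a notebook generated from this prompt might contain. It is only an illustration under stated assumptions: the inline sample data, the `clean_sales` and `cleaned_path` helpers, and the choice to drop (rather than flag or impute) problem rows are all hypothetical. The column names are borrowed from the example dataset description in the prompt.

```python
import io
from pathlib import Path

import pandas as pd

# Hypothetical sample standing in for the real file. Columns follow the
# example dataset description in the prompt.
RAW_CSV = """date,amount,customer_id,product_sku,region
2024-01-05,19.99,C001,SKU-1,north
2024-01-06,-5.00,C002,SKU-2,South
not-a-date,12.50,C003,SKU-3,south
2024-01-07,8.00,,SKU-4,East
2024-01-05,19.99,C001,SKU-1,north
2024-01-08,42.00,C004,SKU-5,West
"""


def clean_sales(df):
    """Return (cleaned frame, report of rows affected per issue).

    Assumption: problem rows are dropped. A real notebook might instead
    flag or impute them, depending on the dataset.
    """
    report = {}

    # Remove exact duplicate rows and record how many were dropped.
    out = df.drop_duplicates().copy()
    report["duplicate_rows"] = len(df) - len(out)

    # Parse dates strictly; values that do not match become NaT.
    out["date"] = pd.to_datetime(out["date"], format="%Y-%m-%d", errors="coerce")
    # Normalize inconsistent casing and stray whitespace in region labels.
    out["region"] = out["region"].str.strip().str.lower()

    bad_date = out["date"].isna()
    missing_id = out["customer_id"].isna()
    negative = out["amount"] < 0
    report["unparseable_dates"] = int(bad_date.sum())
    report["missing_customer_id"] = int(missing_id.sum())
    report["negative_amounts"] = int(negative.sum())

    out = out[~(bad_date | missing_id | negative)].reset_index(drop=True)
    return out, report


def cleaned_path(original):
    """Build the [ORIGINAL_NAME]_cleaned.csv path next to the original file."""
    p = Path(original)
    return p.with_name(f"{p.stem}_cleaned.csv")


raw = pd.read_csv(io.StringIO(RAW_CSV))
cleaned, report = clean_sales(raw)

# Validation checks: fail loudly if cleaning left a known issue behind.
assert cleaned["date"].notna().all()
assert cleaned["customer_id"].notna().all()
assert (cleaned["amount"] >= 0).all()
assert not cleaned.duplicated().any()

print(report)
print("Would export to:", cleaned_path("sales.csv"))
```

In a real notebook the `io.StringIO` sample would be replaced by the actual file path, and the final step would call `cleaned.to_csv(cleaned_path(...), index=False)`. The `report` dictionary is one simple way to feed the row counts into the summary markdown cell.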