
Data Cleaning Notebook Builder

Notebooks · Advanced · Python · pandas · data cleaning · Jupyter

Prompt

Write a complete, ready-to-run Jupyter notebook for cleaning and validating my dataset.

Dataset description: [DESCRIBE YOUR DATA, e.g. 'CSV of sales transactions, 50k rows, columns: date, amount, customer_id, product_sku, region']
File name/path: [FILE NAME OR PATH]
Common issues I know about: [LIST KNOWN ISSUES, e.g. missing values in customer_id, date format inconsistencies, negative amounts]

The notebook should include:
1. **Setup cell** — imports (pandas, numpy, matplotlib)
2. **Load data** — read file, print shape and dtypes
3. **Initial EDA** — .info(), .describe(), missing value counts, duplicate check
4. **Column-by-column cleaning** — fix each issue I listed, plus common ones you'd expect for this kind of data
5. **Validation checks** — assertions that fail if cleaning broke something (e.g. nulls in required columns, wrong dtypes, unexpected row loss)
6. **Export** — save cleaned file as [ORIGINAL_NAME]_cleaned.csv
7. **Summary markdown cell** — what was cleaned and how many rows each step affected

Use clear markdown headers for each section. Add comments explaining non-obvious steps.
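
For orientation, here is a minimal sketch of the kind of code the cleaning, validation, export, and summary steps above might produce. It is not part of the prompt. It assumes the example columns from the dataset description (date, amount, customer_id, region), uses a small made-up inline CSV in place of a real file, and treats any date that is not in YYYY-MM-DD form as invalid, which is only one possible policy. The names `RAW_CSV`, `clean`, `validate`, and `cleaned_path` are hypothetical.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Made-up demo rows standing in for the real file (assumption, not real data).
RAW_CSV = """date,amount,customer_id,product_sku,region
2024-01-05,19.99,C001,SKU-1,north
2024-01-05,19.99,C001,SKU-1,north
05/01/2024,-5.00,,SKU-2,South
2024-02-10,42.50,C003,SKU-3, EAST
not a date,10.00,C004,SKU-4,West
"""


def clean(df: pd.DataFrame) -> tuple[pd.DataFrame, dict[str, int]]:
    """Return a cleaned copy of df plus the number of rows each step affected."""
    report: dict[str, int] = {}
    out = df.copy()

    # Normalize text first so rows differing only in case/whitespace can match.
    out["region"] = out["region"].str.strip().str.lower()

    before = len(out)
    out = out.drop_duplicates()
    report["duplicates_dropped"] = before - len(out)

    # Assumed policy: only ISO dates are valid; everything else becomes NaT.
    out["date"] = pd.to_datetime(out["date"], format="%Y-%m-%d", errors="coerce")
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")

    bad_date = out["date"].isna()
    bad_customer = out["customer_id"].isna()
    bad_amount = out["amount"].isna() | (out["amount"] < 0)
    report["unparseable_dates"] = int(bad_date.sum())
    report["missing_customer_id"] = int(bad_customer.sum())
    report["invalid_amounts"] = int(bad_amount.sum())

    # A row can fail several checks, so the counts above may overlap.
    out = out[~(bad_date | bad_customer | bad_amount)].reset_index(drop=True)
    report["rows_remaining"] = len(out)
    return out, report


def validate(df: pd.DataFrame) -> None:
    """Raise AssertionError if the cleaned frame violates basic expectations."""
    assert not df.duplicated().any(), "duplicate rows remain"
    required = ["date", "amount", "customer_id"]
    assert df[required].notna().all().all(), "nulls in required columns"
    assert (df["amount"] >= 0).all(), "negative amounts remain"
    assert pd.api.types.is_datetime64_any_dtype(df["date"]), "date is not datetime"


def cleaned_path(original: str) -> Path:
    """Map 'sales.csv' to 'sales_cleaned.csv' in the same directory."""
    p = Path(original)
    return p.with_name(f"{p.stem}_cleaned{p.suffix}")


raw = pd.read_csv(io.StringIO(RAW_CSV))
cleaned, report = clean(raw)
validate(cleaned)

# Export to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / cleaned_path("sales.csv").name
    cleaned.to_csv(target, index=False)

# The report dict is the raw material for the summary markdown cell.
for step, count in report.items():
    print(f"{step}: {count}")
```

Dropping bad rows keeps the sketch short. A generated notebook could instead repair or impute values, and should say in its summary cell which policy it chose.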

Related Prompts


Jupyter Notebook Explainer

I have a Jupyter notebook I need explained. Break it down cell by cell and give me a plain-English summary. Context: [WHAT THIS NOTEBOOK IS FOR / WHO WROTE IT]...
Notebooks · Intermediate · Workflow-ready

Pandas DataFrame Debugger

I'm getting an error in my Python/Pandas code. Help me debug it. What I was trying to do: [DESCRIBE YOUR GOAL IN ONE SENTENCE] Python version: [PYTHON VERSION]...
Notebooks · Intermediate · Workflow-ready

Python Script to Notebook Converter

Convert this Python script into a well-structured Jupyter notebook. Script purpose: [WHAT THIS SCRIPT DOES] Target audience: [WHO WILL USE / READ THIS NOTEBOOK...
Notebooks · Advanced · Workflow-ready

SQL to Pandas Translator

Translate my SQL queries into Pandas code. Database context: [DESCRIBE YOUR TABLES / DATA STRUCTURE] DataFrame variable names to use: [e.g. df_orders, df_custo...
Notebooks · Intermediate · Workflow-ready