jupyter nbconvert --to=script upload_data.ipynb
- jupyter nbconvert is a command-line tool from Jupyter
- --to=script specifies that we want to convert the notebook to a script (a Python script for a Python notebook)
- upload_data.ipynb is the source notebook file that will be converted
- The output is upload_data.py

Common use cases:
- .py files are more git-friendly than .ipynb

Note that nbconvert also supports other output formats, such as:
- HTML (--to=html)
- PDF (--to=pdf)
- Markdown (--to=markdown)

Notice that the script uses argparse. Why do we need it in a data engineering process? Is it only for formatting, or does it serve a more essential purpose? Let's look at the logic behind it.
In Data Engineering, argparse has widespread applications. Let me illustrate through several practical scenarios:
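import argparse
import pandas as pd

# Define the command-line interface for the job
parser = argparse.ArgumentParser(description='Data processing script')
parser.add_argument('--input_path', required=True, help='Input data path')
parser.add_argument('--output_path', required=True, help='Output data path')
parser.add_argument('--date', help='Processing date, format YYYY-MM-DD')
parser.add_argument('--mode', choices=['full', 'incremental'], default='incremental', help='Processing mode')
args = parser.parse_args()

# Incremental runs filter on --date, so fail early if it is missing
if args.mode == 'incremental' and args.date is None:
    parser.error('--date is required when --mode is incremental')

# ETL processing logic
df = pd.read_csv(args.input_path)
if args.mode == 'incremental':
    # Keep only rows for the requested date (assumes a 'date' column of YYYY-MM-DD strings)
    df = df[df['date'] == args.date]
# Process data...
# index=False avoids writing the DataFrame row index as an extra column
df.to_csv(args.output_path, index=False)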
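
To show that argparse does more than format options, here is a minimal sketch that validates the --date value while parsing and drives the parser from a list instead of the real command line. The helper names `valid_date` and `build_parser` and the file names `raw.csv` and `clean.csv` are illustrative assumptions, not part of the original script.

```python
import argparse
from datetime import date, datetime


def valid_date(text: str) -> date:
    """Hypothetical helper: convert a YYYY-MM-DD string to a date or reject it."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        # argparse turns this into a usage error and exits with status 2
        raise argparse.ArgumentTypeError(f"expected YYYY-MM-DD, got {text!r}")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: the same options as the script, with a typed --date."""
    parser = argparse.ArgumentParser(description="Data processing script")
    parser.add_argument("--input_path", required=True, help="Input data path")
    parser.add_argument("--output_path", required=True, help="Output data path")
    parser.add_argument("--date", type=valid_date,
                        help="Processing date, format YYYY-MM-DD")
    parser.add_argument("--mode", choices=["full", "incremental"],
                        default="incremental", help="Processing mode")
    return parser


# Passing a list to parse_args() bypasses sys.argv, which is handy in tests
# or when a scheduler wrapper builds the arguments programmatically.
args = build_parser().parse_args(
    ["--input_path", "raw.csv", "--output_path", "clean.csv", "--date", "2024-01-15"]
)
print(args.mode, args.date)
```

With this in place, a malformed date or an unknown mode stops the job before any data is read, which is usually what you want from a scheduled pipeline run.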