jupyter nbconvert --to=script upload_data.ipynb
jupyter nbconvert is a command-line tool from Jupyter.
--to=script specifies that we want to convert the notebook to a script (a .py file for a Python notebook).
upload_data.ipynb is the source notebook file that will be converted.
upload_data.py is the resulting script file.
Common use cases:
Version control (.py files are more git-friendly than .ipynb)
Note that nbconvert also supports other output formats, such as:
HTML (--to=html)
PDF (--to=pdf)
Markdown (--to=markdown)
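If the conversion has to run from Python rather than from the shell, for example as a step in a build script, the same command can be assembled and passed to subprocess. This is a minimal sketch: the helper names build_nbconvert_command and convert_notebook are hypothetical, and it assumes the jupyter executable is available on PATH.

```python
import subprocess


def build_nbconvert_command(notebook, output_format="script"):
    """Return the argument list for `jupyter nbconvert --to=<format> <notebook>`."""
    return ["jupyter", "nbconvert", f"--to={output_format}", str(notebook)]


def convert_notebook(notebook, output_format="script"):
    """Run the conversion; raises CalledProcessError if nbconvert exits with an error."""
    subprocess.run(build_nbconvert_command(notebook, output_format), check=True)


# Show the command that would be executed for the notebook used above
print(" ".join(build_nbconvert_command("upload_data.ipynb")))
```

Calling convert_notebook("upload_data.ipynb") would then run the same command as the one shown at the top of this section.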
Notice that we are using argparse. But why do we need it in a data engineering process? Is it only for formatting, or does it serve a more essential purpose? Let's look at the logic behind it.
In data engineering, argparse has widespread applications. Let me illustrate through several practical scenarios:
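import argparse

import pandas as pd

# Define the command-line interface for the script
parser = argparse.ArgumentParser(description='Data processing script')
parser.add_argument('--input_path', required=True, help='Input data path')
parser.add_argument('--output_path', required=True, help='Output data path')
parser.add_argument('--date', help='Processing date, format YYYY-MM-DD')
parser.add_argument('--mode', choices=['full', 'incremental'], default='incremental', help='Processing mode')
args = parser.parse_args()

# Incremental runs filter on a date, so --date must be supplied
if args.mode == 'incremental' and args.date is None:
    parser.error('--date is required when --mode is incremental')

# ETL processing logic
df = pd.read_csv(args.input_path)
if args.mode == 'incremental':
    # Keep only the rows for the requested date
    # (assumes the 'date' column holds YYYY-MM-DD strings)
    df = df[df['date'] == args.date]
# Process data...
# index=False avoids writing the row index as an extra column
df.to_csv(args.output_path, index=False)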
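To see what argparse contributes here beyond formatting, it can help to parse a few argument lists directly instead of reading them from the real command line. The sketch below rebuilds the same parser inside a hypothetical build_parser function and passes explicit lists to parse_args; the file names raw.csv and clean.csv and the date are made-up values for illustration.

```python
import argparse


def build_parser():
    """Same arguments as the script above, wrapped in a function for reuse."""
    parser = argparse.ArgumentParser(description='Data processing script')
    parser.add_argument('--input_path', required=True, help='Input data path')
    parser.add_argument('--output_path', required=True, help='Output data path')
    parser.add_argument('--date', help='Processing date, format YYYY-MM-DD')
    parser.add_argument('--mode', choices=['full', 'incremental'],
                        default='incremental', help='Processing mode')
    return parser


parser = build_parser()

# A daily incremental run: --mode is omitted, so the default applies
daily = parser.parse_args(
    ['--input_path', 'raw.csv', '--output_path', 'clean.csv', '--date', '2024-01-15']
)
print(daily.mode, daily.date)  # incremental 2024-01-15

# A full reload: --date is omitted, so it falls back to None
full_run = parser.parse_args(
    ['--input_path', 'raw.csv', '--output_path', 'clean.csv', '--mode', 'full']
)
print(full_run.mode, full_run.date)  # full None
```

The same script can therefore serve both a scheduled daily job and a one-off full reload, with the difference expressed entirely in the arguments. A value outside choices, or a missing required argument, makes argparse print a usage message and exit before any data is read.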