Hi, everyone, welcome to the week 2 of our learning! 🥳
Yes, this week, we have Kestra as our tool for DE pipeline workflow building. You may also know another tool named Apache Airflow. They are all very powerful tools, but here as a chance from the very first video of introduction, we could learn deeper about why we need these [Workflow Orchestration Tools]. Below, we are exploring the concepts deeper by the features.
Case: Imagine you're a data engineer at an e-commerce company
# Simplified Kestra ETL workflow example
tasks:
- name: extract_orders
type: mysql
query: "SELECT * FROM orders WHERE date = '{{tomorrow}}'"
- name: transform_data
type: python
code: |
# Process order data, calculate daily sales
df = pd.read_csv("raw_orders.csv")
daily_sales = df.groupby('date')['amount'].sum()
- name: load_to_warehouse
type: postgresql
query: "INSERT INTO daily_sales VALUES {{daily_sales}}"
Case: Building a new user registration flow
# API orchestration workflow example
tasks:
- name: verify_user
type: http
url: "<https://api.verify.com/check>"
method: POST
- name: send_welcome_email
type: http
url: "<https://api.email.com/send>"
condition: "{{verify_user.success}}"
- name: create_crm_profile
type: http
url: "<https://api.crm.com/customers>"
condition: "{{send_welcome_email.success}}"
Case: Inventory management for an e-commerce website