Step Snap 1 [Workflow Orchestration Tools]:

Hi, everyone, welcome to the week 2 of our learning! 🥳

Yes, this week, we have Kestra as our tool for DE pipeline workflow building. You may also know another tool named Apache Airflow. They are all very powerful tools, but here as a chance from the very first video of introduction, we could learn deeper about why we need these [Workflow Orchestration Tools]. Below, we are exploring the concepts deeper by the features.

image.png

1. ETL/ELT Data Processing

Case: Imagine you're a data engineer at an e-commerce company

# Simplified Kestra ETL workflow example
tasks:
  - name: extract_orders
    type: mysql
    query: "SELECT * FROM orders WHERE date = '{{tomorrow}}'"

  - name: transform_data
    type: python
    code: |
      # Process order data, calculate daily sales
      df = pd.read_csv("raw_orders.csv")
      daily_sales = df.groupby('date')['amount'].sum()

  - name: load_to_warehouse
    type: postgresql
    query: "INSERT INTO daily_sales VALUES {{daily_sales}}"

2. API Orchestration

Case: Building a new user registration flow

# API orchestration workflow example
tasks:
  - name: verify_user
    type: http
    url: "<https://api.verify.com/check>"
    method: POST

  - name: send_welcome_email
    type: http
    url: "<https://api.email.com/send>"
    condition: "{{verify_user.success}}"

  - name: create_crm_profile
    type: http
    url: "<https://api.crm.com/customers>"
    condition: "{{send_welcome_email.success}}"

3. Scheduled & Event-Driven Workflows

Case: Inventory management for an e-commerce website