Step Snap 1: [Google Cloud Storage (gsutil) Manipulation]

Installing gsutil

gsutil is a command-line tool for managing Google Cloud Storage. It ships as part of the Google Cloud SDK and can be installed in any of the following ways:

Method 1: Via Google Cloud SDK

# Download and install Google Cloud SDK
curl https://sdk.cloud.google.com | bash

# Start a new shell or run
source ~/.bashrc

# Initialize SDK and login
gcloud init

Method 2: Using package manager (Ubuntu/Debian)

# Add Google Cloud's apt repository
echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] https://packages.cloud.google.com/apt cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list

# Import the Google Cloud public key
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key --keyring /usr/share/keyrings/cloud.google.gpg add -

# Update and install
sudo apt-get update && sudo apt-get install google-cloud-sdk

Method 3: Via Conda (for Conda environments)

conda install -c conda-forge google-cloud-sdk
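
Whichever method you choose, it can help to confirm that the tools actually ended up on your PATH before moving on. The snippet below is a minimal sketch of such a check; `check_tool` is a hypothetical helper written for this note, not part of the Cloud SDK.

```shell
# Hypothetical helper (not part of the Cloud SDK):
# report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Check both CLIs; '|| true' keeps the script going if one is missing.
check_tool gcloud || true
check_tool gsutil || true
```

If both are found, running `gsutil version` prints the installed version.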

Authentication

Before using gsutil, you need to authenticate:

# User account authentication
gcloud auth login

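# OR service account authentication
# Registers the key with gcloud, whose credentials the bundled gsutil uses
gcloud auth activate-service-account --key-file="/path/to/your-service-account-key.json"

# Client libraries that use Application Default Credentials read this variable
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your-service-account-key.json"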

Note: Authentication is required even when you run gsutil from a local machine or server, because requests to non-public buckets must carry credentials that Google Cloud can verify.
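
A mistyped key path is an easy mistake to make, so the export can be wrapped in a small readability check. This is only a sketch: `use_service_account_key` is a hypothetical helper, and the path is the same placeholder used above.

```shell
# Hypothetical helper: export GOOGLE_APPLICATION_CREDENTIALS
# only if the key file exists and is readable.
use_service_account_key() {
    key_file="$1"
    if [ ! -r "$key_file" ]; then
        echo "Key file not readable: $key_file" >&2
        return 1
    fi
    export GOOGLE_APPLICATION_CREDENTIALS="$key_file"
    echo "Using service account key: $key_file"
}

# Placeholder path; replace it with the location of your own key file.
use_service_account_key "/path/to/your-service-account-key.json" || true
```

For user accounts, `gcloud auth list` shows which accounts gcloud currently holds credentials for and which one is active.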

Understanding the command: gsutil -m cp -r pq/ gs://my-de-zoomcamp-kestra/pq

Breaking down this command: