date_trunc
is a date-time truncation function used to truncate datetime type data to a specified precision (such as year, month, day, hour, etc.).
date_trunc
originally comes from PostgreSQL database system, not SparkSELECT
date_trunc('hour', lpep_pickup_datetime) AS hour,
PULocationID AS zone,
SUM(total_amount) AS amount,
COUNT(1) AS number_records
FROM
green
WHERE
lpep_pickup_datetime >= '2020-01-01 00:00:00'
GROUP BY
1, 2
The example above truncates timestamps to the hour level, facilitating grouping and statistics by hour.
Different SQL systems have similar but not identical functions:
date_trunc('hour', timestamp)
DATE_FORMAT(timestamp, '%Y-%m-%d %H:00:00')
DATETRUNC(hour, timestamp)
or using date part extraction functionsTRUNC(timestamp, 'HH')
TRUNC(timestamp, 'HH')