25.3. Querying External Files
To query files in a data lake (for example, on local or S3 storage), use the read_* functions, such as read_parquet and read_csv.
To access columns, use the r['column_name'] syntax.
Example 25.3.
-- Query a single Parquet file
SELECT
    r['product_id'],
    r['review_text']
FROM
    read_parquet('s3://my-bucket/reviews.parquet') r -- 'r' is a required alias
LIMIT 100;

-- Query multiple CSV files using a glob pattern
SELECT
    r['timestamp'],
    r['event_type'],
    COUNT(*) AS event_count
FROM
    read_csv('s3://my-datalake/logs/2024-*.csv') r
GROUP BY
    r['timestamp'],
    r['event_type'];
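
The same pieces can be combined with a filter. The following is a minimal sketch, not a definitive recipe: it assumes that the r['column_name'] syntax is also accepted in a WHERE clause, as it is in the select list and GROUP BY above, and the filter value 'login' and the output column names are hypothetical.

```sql
-- Sketch: count events of one type across the matched CSV files.
-- Assumes r['...'] may be used in WHERE; 'login' is a hypothetical value.
SELECT
    r['event_type'] AS event_type,
    COUNT(*) AS event_count
FROM
    read_csv('s3://my-datalake/logs/2024-*.csv') r -- alias 'r' is required
WHERE
    r['event_type'] = 'login'
GROUP BY
    r['event_type'];
```

Giving each selected column an explicit AS name keeps the result column names predictable regardless of how the subscript expression is labeled by default.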