25.3. Querying External Files

To query files in a data lake (for example, on local disk or in S3 storage), call one of the read_* functions, such as read_parquet or read_csv, in the FROM clause.

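The function call must be given an alias (r in the examples below). To access a column, index the alias by column name using the r['column_name'] syntax.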

Example 25.3. Querying Parquet and CSV Files

-- Query a single Parquet file
SELECT
  r['product_id'],
  r['review_text']
FROM
  read_parquet('s3://my-bucket/reviews.parquet') r -- 'r' is a required alias
LIMIT 100;

-- Query multiple CSV files using a glob pattern
SELECT
  r['timestamp'],
  r['event_type'],
  COUNT(*) AS event_count
FROM
  read_csv('s3://my-datalake/logs/2024-*.csv') r
GROUP BY
  r['timestamp'],
  r['event_type'];
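
The same alias syntax can be combined with ordinary aggregation and ordering. The following is a minimal sketch, not a definitive recipe: it reuses the illustrative reviews.parquet file from the first query and assumes it contains a product_id column. The output names product_id and review_count are chosen here for illustration only.

```sql
-- Sketch: count reviews per product and list the ten most-reviewed products.
-- Assumes the illustrative file from the example above has a product_id column.
SELECT
  r['product_id'] AS product_id,   -- name the output column explicitly
  COUNT(*) AS review_count
FROM
  read_parquet('s3://my-bucket/reviews.parquet') r -- alias is still required
GROUP BY
  r['product_id']
ORDER BY
  review_count DESC
LIMIT 10;
```

Naming each r['...'] expression with AS keeps the result set readable, since the expression itself does not necessarily yield a convenient column name.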