4.5. Syntax #

4.5.1. Creating a Postgres Pro Table #

Example 4.47. 

-- This is a standard Postgres Pro table
CREATE TABLE orders (
  order_id SERIAL PRIMARY KEY,
  product_name TEXT,
  amount NUMERIC,
  order_date DATE
);

INSERT INTO orders (product_name, amount, order_date)
VALUES ('Laptop', 1200.00, '2024-07-01'),
      ('Keyboard', 75.50, '2024-07-01'),
      ('Mouse', 25.00, '2024-07-02');

4.5.2. Querying Postgres Pro Tables #

For analytical queries on your existing Postgres Pro tables, use standard SQL. No special syntax is needed. pgpro_axe automatically accelerates these queries when you set duckdb.force_execution to true.

Example 4.48. 

SET duckdb.force_execution = true;
-- Standard SELECT on a Postgres Pro table
SELECT
  category,
  AVG(price) as avg_price,
  COUNT(*) as item_count
FROM
  products -- This is a regular Postgres Pro table
GROUP BY
  category
ORDER BY
  avg_price DESC;

SET duckdb.force_execution = true;
-- Standard SELECT on a Postgres Pro table
SELECT
  category,
  AVG(price) as avg_price,
  COUNT(*) as item_count
FROM
  products -- This is a regular Postgres Pro table
GROUP BY
  category
ORDER BY
  avg_price DESC;

4.5.3. Querying External Files #

To query files from a data lake (e.g., local or S3 storage), use read_* functions.

To access columns, use the r['column_name'] syntax.

Example 4.49. 

-- Query a single Parquet file
SELECT
  r['product_id'],
  r['review_text']
FROM
  read_parquet('s3://my-bucket/reviews.parquet') r -- 'r' is a required alias
LIMIT 100;

-- Query multiple CSV files using a glob pattern
SELECT
  r['timestamp'],
  r['event_type'],
  COUNT(*) as event_count
FROM
  read_csv('s3://my-datalake/logs/2024-*.csv') r
GROUP BY
  r['timestamp'],
  r['event_type'];

4.5.4. Joining Postgres Pro and External Data #

You can join Postgres Pro tables with external data sources in a single query.

Example 4.50. 

-- Join a local Postgres Pro 'customers' table with a remote Parquet file of 'orders'
SELECT
  c.customer_name,
  c.signup_date,
  SUM(r['order_total']) AS total_spent
FROM
  customers c -- This is a Postgres Pro table
JOIN
  read_parquet('s3://my-bucket/orders/*.parquet') r ON c.customer_id = r['customer_id']
WHERE
  c.status = 'active'
GROUP BY
  c.customer_name,
  c.signup_date
ORDER BY
  total_spent DESC;

4.5.5. Using duckdb.query() #

The duckdb.query() function is an advanced feature that is not required for most queries. You must only use it to run a query that uses the DuckDB-specific syntax that is not valid in Postgres Pro.

Example 4.51. 

-- This query uses DuckDB's PIVOT syntax, so it must be wrapped in duckdb.query()
SELECT * FROM duckdb.query($$
  PIVOT sales_summary
  ON product_category
  USING SUM(total_sales)
  GROUP BY region;
$$);