4.5. Syntax #
4.5.1. Creating a Postgres Pro Table #
Example 4.47.
-- This is a standard Postgres Pro table
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
product_name TEXT,
amount NUMERIC,
order_date DATE
);
INSERT INTO orders (product_name, amount, order_date)
VALUES ('Laptop', 1200.00, '2024-07-01'),
('Keyboard', 75.50, '2024-07-01'),
('Mouse', 25.00, '2024-07-02');
4.5.2. Querying Postgres Pro Tables #
For analytical queries on your existing Postgres Pro tables, use standard SQL. No special syntax is needed. pgpro_axe automatically accelerates these queries when you set duckdb.force_execution to true.
Example 4.48.
SET duckdb.force_execution = true; -- Standard SELECT on a Postgres Pro table SELECT category, AVG(price) as avg_price, COUNT(*) as item_count FROM products -- This is a regular Postgres Pro table GROUP BY category ORDER BY avg_price DESC;
SET duckdb.force_execution = true; -- Standard SELECT on a Postgres Pro table SELECT category, AVG(price) as avg_price, COUNT(*) as item_count FROM products -- This is a regular Postgres Pro table GROUP BY category ORDER BY avg_price DESC;
4.5.3. Querying External Files #
To query files from a data lake (e.g., local or S3 storage), use read_* functions.
To access columns, use the r[' syntax. column_name']
Example 4.49.
-- Query a single Parquet file
SELECT
r['product_id'],
r['review_text']
FROM
read_parquet('s3://my-bucket/reviews.parquet') r -- 'r' is a required alias
LIMIT 100;
-- Query multiple CSV files using a glob pattern
SELECT
r['timestamp'],
r['event_type'],
COUNT(*) as event_count
FROM
read_csv('s3://my-datalake/logs/2024-*.csv') r
GROUP BY
r['timestamp'],
r['event_type'];
4.5.4. Joining Postgres Pro and External Data #
You can join Postgres Pro tables with external data sources in a single query.
Example 4.50.
-- Join a local Postgres Pro 'customers' table with a remote Parquet file of 'orders'
SELECT
c.customer_name,
c.signup_date,
SUM(r['order_total']) AS total_spent
FROM
customers c -- This is a Postgres Pro table
JOIN
read_parquet('s3://my-bucket/orders/*.parquet') r ON c.customer_id = r['customer_id']
WHERE
c.status = 'active'
GROUP BY
c.customer_name,
c.signup_date
ORDER BY
total_spent DESC;
4.5.5. Using duckdb.query() #
The duckdb.query() function is an advanced feature that is not required for most queries. You must only use it to run a query that uses the DuckDB-specific syntax that is not valid in Postgres Pro.
Example 4.51.
-- This query uses DuckDB's PIVOT syntax, so it must be wrapped in duckdb.query() SELECT * FROM duckdb.query($$ PIVOT sales_summary ON product_category USING SUM(total_sales) GROUP BY region; $$);