Postgres Pro AXE : Documentation: 1: 21.6. Sampling Functions : Postgres Professional

21.6. Sampling Functions
Prev	Up	Chapter 21. Functions	Home	Next

21.6. Sampling Functions #

TABLESAMPLE (sampling_method(percentage | rows)) #

Samples a subset of rows from an analytical table or query result. This is useful for analyzing large datasets by working with representative samples, improving query performance for exploratory data analysis.

Example 21.20.

-- Sample 10% of rows from a table
SELECT * FROM large_table TABLESAMPLE SYSTEM(10);

-- Sample approximately 1000 rows
SELECT * FROM events TABLESAMPLE SYSTEM(1000 ROWS);

-- Sample from data lake files
SELECT * FROM read_parquet('s3://datalake/**/*.parquet') TABLESAMPLE SYSTEM(5);

-- Use sampling for quick data profiling
SELECT
    region,
    COUNT(*) as sample_count,
    AVG(revenue) as avg_revenue
FROM sales_data TABLESAMPLE SYSTEM(2)
GROUP BY region;

-- Sample from joins for performance
SELECT c.name, COUNT(o.id) as order_count
FROM customers c
JOIN orders o TABLESAMPLE SYSTEM(10) ON c.id = o.customer_id
GROUP BY c.name;

Sampling methods:

SYSTEM: Random sampling at the storage level (faster, approximate percentage).
BERNOULLI: Row-by-row random sampling (slower, exact percentage).

Example 21.21.

-- System sampling (recommended for large tables)
SELECT * FROM huge_table TABLESAMPLE SYSTEM(1);

-- Bernoulli sampling (exact percentage)
SELECT * FROM medium_table TABLESAMPLE BERNOULLI(5);

Use cases:

Data exploration: Quick analysis of large datasets.
Performance testing: Test queries on sample data.
Data profiling: Understand data distribution patterns.
ETL operations development: Develop pipelines on sample data.
Quality checks: Validate data quality on samples.

Required parameters:

Name	Type	Description
`sampling_method`	`keyword`	`SYSTEM` or `BERNOULLI`
`percentage`	`numeric`	Percentage of rows to sample (0-100)

Optional parameters:

Name	Type	Description
`rows`	`integer`	Approximate number of rows to sample (use with ROWS keyword)

Prev	Up	Next
21.5. Aggregates Functions	Home	21.7. Time Functions

epub pdf