Re: speedup COPY TO for partitioned table. - Mailing list pgsql-hackers

From newtglobal postgresql_contributors
Subject Re: speedup COPY TO for partitioned table.
Date
Msg-id 174219852967.294107.6195385625494034792.pgcf@coridan.postgresql.org
Whole thread Raw
In response to Re: speedup COPY TO for partitioned table.  (jian he <jian.universality@gmail.com>)
List pgsql-hackers
Hi Jian,
Tested this patch with COPY sales TO STDOUT; ~ 1.909ms, improving performance over the older COPY (SELECT * FROM sales)
TOSTDOUT; ~ 3.80ms method. This eliminates query planning overhead and significantly speeds up data export from
partitionedtables. 
 
Our test setup involved creating a partitioned table(sales), inserted 500 records, and comparing execution times.

-- Step 1: Create Partitioned Parent Table
CREATE TABLE sales (
    id SERIAL NOT NULL,
    sale_date DATE NOT NULL,
    region TEXT NOT NULL,
    amount NUMERIC(10,2) NOT NULL,
    category TEXT NOT NULL,
    PRIMARY KEY (id, sale_date,region)
) PARTITION BY RANGE (sale_date);

-- Step 2: Create Range Partitions (2023 & 2024)
CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
    PARTITION BY HASH (region);

CREATE TABLE sales_2024 PARTITION OF sales
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
    PARTITION BY HASH (region);

-- Step 3: Create Hash Partitions for sales_2023
CREATE TABLE sales_2023_part1 PARTITION OF sales_2023
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE sales_2023_part2 PARTITION OF sales_2023
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Step 4: Create Hash Partitions for sales_2024
CREATE TABLE sales_2024_part1 PARTITION OF sales_2024
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);

CREATE TABLE sales_2024_part2 PARTITION OF sales_2024
    FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Step 5: Insert Data **AFTER** Creating Partitions
INSERT INTO sales (sale_date, region, amount, category)
SELECT 
    ('2023-01-01'::DATE + (random() * 730)::int) AS sale_date,  -- Random date in 2023-2024 range
    CASE WHEN random() > 0.5 THEN 'North' ELSE 'South' END AS region,  -- Random region
    (random() * 1000)::NUMERIC(10,2) AS amount,  -- Random amount (0 to 1000)
    CASE WHEN random() > 0.5 THEN 'Electronics' ELSE 'Furniture' END AS category  -- Random category
FROM generate_series(1, 500);

COPY (SELECT * FROM SALES) TO STDOUT;  ~ 1.909ms

COPY SALES TO STDOUT; ~ 3.80ms

This change is recommended for better performance in PostgreSQL partitioned tables.

pgsql-hackers by date:

Previous
From: Daniil Davydov
Date:
Subject: Re: Forbid to DROP temp tables of other sessions
Next
From: Dmitry Dolgov
Date:
Subject: Re: pg_stat_statements and "IN" conditions