Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Make COPY format extendable: Extract COPY TO format implementations |
Date | |
Msg-id | Zbr6piWuVHDtFFOl@paquier.xyz Whole thread Raw |
In response to | Re: Make COPY format extendable: Extract COPY TO format implementations (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Make COPY format extendable: Extract COPY TO format implementations
Re: Make COPY format extendable: Extract COPY TO format implementations Re: Make COPY format extendable: Extract COPY TO format implementations |
List | pgsql-hackers |
On Wed, Jan 31, 2024 at 02:39:54PM +0900, Michael Paquier wrote: > Thanks, I'm looking into that now. I have much to say about the patch, but for now I have begun running some performance tests using the patches, because this thread won't get far until we are sure that the callbacks do not impact performance in some kind of worst-case scenario. First, here is what I used to setup a set of tables used for COPY FROM and COPY TO (requires [1] to feed COPY FROM's data to the void, and note that default values is to have a strict control on the size of the StringInfos used in the copy paths): CREATE EXTENSION blackhole_am; CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int) RETURNS VOID AS $func$ DECLARE query text; BEGIN query := 'CREATE UNLOGGED TABLE ' || tabname || ' ('; FOR i IN 1..num_cols LOOP query := query || 'a_' || i::text || ' int default 1'; IF i != num_cols THEN query := query || ', '; END IF; END LOOP; query := query || ')'; EXECUTE format(query); END $func$ LANGUAGE plpgsql; -- Tables used for COPY TO SELECT create_table_cols ('to_tab_1', 1); SELECT create_table_cols ('to_tab_10', 10); INSERT INTO to_tab_1 SELECT FROM generate_series(1, 10000000); INSERT INTO to_tab_10 SELECT FROM generate_series(1, 10000000); -- Data for COPY FROM COPY to_tab_1 TO '/tmp/to_tab_1.bin' WITH (format binary); COPY to_tab_10 TO '/tmp/to_tab_10.bin' WITH (format binary); COPY to_tab_1 TO '/tmp/to_tab_1.txt' WITH (format text); COPY to_tab_10 TO '/tmp/to_tab_10.txt' WITH (format text); -- Tables used for COPY FROM SELECT create_table_cols ('from_tab_1', 1); SELECT create_table_cols ('from_tab_10', 10); ALTER TABLE from_tab_1 SET ACCESS METHOD blackhole_am; ALTER TABLE from_tab_10 SET ACCESS METHOD blackhole_am; Then I have run a set of tests using HEAD, v7 and v10 with queries like that (adapt them depending on the format and table): COPY to_tab_1 TO '/dev/null' WITH (FORMAT text) \watch count=5 SET client_min_messages TO error; -- for blackhole_am COPY from_tab_1 FROM '/tmp/to_tab_1.txt' with (FORMAT 'text') \watch count=5 COPY from_tab_1 FROM '/tmp/to_tab_1.bin' with (FORMAT 'binary') \watch count=5 All the patches have been compiled with -O2, without assertions, etc. Postgres is run in tmpfs mode, on scissors, without fsync. Unlogged tables help a bit in focusing on the execution paths as we don't care about WAL, of course. I have also included v7 in the test of tests, as this version uses more simple per-row callbacks. And here are the results I get for text and binary (ms, average of 15 queries after discarding the three highest and three lowest values): test | master | v7 | v10 -----------------+--------+------+------ from_bin_1col | 1575 | 1546 | 1575 from_bin_10col | 5364 | 5208 | 5230 from_text_1col | 1690 | 1715 | 1684 from_text_10col | 4875 | 4793 | 4757 to_bin_1col | 1717 | 1730 | 1731 to_bin_10col | 7728 | 7707 | 7513 to_text_1col | 1710 | 1730 | 1698 to_text_10col | 5998 | 5960 | 5987 I am getting an interesting trend here in terms of a speedup between HEAD and the patches with a table that has 10 attributes filled with integers, especially for binary and text with COPY FROM. COPY TO binary also gets nice numbers, while text looks rather stable. Hmm. These were on my buildfarm animal, but we need to be more confident about all this. Could more people run these tests? I am going to do a second session on a local machine I have at hand and see what happens. Will publish the numbers here, the method will be the same. [1]: https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am -- Michael
Attachment
pgsql-hackers by date: