Re: insert vs select into performance - Mailing list pgsql-performance
From | PFC
---|---
Subject | Re: insert vs select into performance
Date | 
Msg-id | op.tvnqigq6cigqcu@apollo13
In response to | Re: insert vs select into performance (Thomas Finneid <tfinneid@student.matnat.uio.no>)
Responses | Re: insert vs select into performance
List | pgsql-performance
>> It's the time to parse statements, plan, execute, roundtrips with
>> the client, context switches, time for your client library to escape
>> the data and encode it and for postgres to decode it, etc. In a word:
>> OVERHEAD.
>
> I know there is some overhead, but that much when running it batched...?

Well, yeah ;)

> Unfortunately it's not fast enough, it needs to be done in no more than
> 1-2 seconds (and in production it will be maybe 20-50 columns of data,
> perhaps divided over 5-10 tables).
> Additionally it needs to scale to perhaps three times as many columns
> and perhaps 2-3 times as many rows in some situations within 1 second.
> Further on it needs to allow for about 20-50 clients reading much of
> that data before the next batch of data arrives.

Wow. What is the application?

Test run on a desktop PC, Athlon 64 3200+, 2 IDE disks in RAID1 (pretty slow):

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 11.463 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
INSERT 0 100000
Time: 721.579 ms

OK, so you see, insert speed is pretty fast. With a better CPU and faster disks, you can get a lot more.

test=> TRUNCATE TABLE test;
TRUNCATE TABLE
Time: 30.010 ms

test=> ALTER TABLE test ADD PRIMARY KEY (f);
INFO:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_pkey" for table "test"
ALTER TABLE
Time: 100.577 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
INSERT 0 100000
Time: 1915.928 ms

This includes the time to update the index.

test=> DROP TABLE test;
DROP TABLE
Time: 28.804 ms

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 1.626 ms

test=> CREATE OR REPLACE FUNCTION test_insert( ) RETURNS VOID LANGUAGE plpgsql AS
$$
DECLARE
    _i INTEGER;
BEGIN
    FOR _i IN 0..100000 LOOP
        INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
    END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test=> SELECT test_insert();
 test_insert
-------------

(1 row)

Time: 1885.382 ms

Now you see, performing 100K individual inserts inside a plpgsql function is also fast. The postgres engine is pretty damn fast; it's the communication overhead that you feel, especially the switching between client and server processes.

Another example:

=> INSERT INTO test (a,b,c,d,e,f) VALUES (... 100000 integer tuples)
INSERT 0 100000
Time: 1836.458 ms

Multi-row VALUES is actually pretty fast. Here there is no context switching; everything is done in one INSERT.

However, COPY is much faster still, because parsing and de-escaping the data carry less overhead. COPY is optimized for throughput.

So, my advice:

For optimum throughput, have your application build chunks of data into text files and load them with COPY. Or, if your client library supports the copy interface, use that directly.

You will need a fast disk system with xlog and data on separate disks, several CPU cores (one insert thread will max out one core, so use the others for selects), enough RAM that index updates don't need to seek, and tuning of bgwriter and checkpoints to avoid load spikes.
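For reference, a minimal sketch of the COPY approach suggested above; the file path and the tab-separated text format are illustrative assumptions, not part of the test run:

-- Server-side COPY: loads a tab-separated text file that the
-- application wrote out beforehand (the path is hypothetical).
COPY test (a, b, c, d, e, f) FROM '/tmp/batch_0001.txt';

-- If the file lives on the client machine, psql's \copy streams it
-- over the existing connection instead of reading it on the server:
-- \copy test (a, b, c, d, e, f) from 'batch_0001.txt'

Client libraries generally expose the same COPY protocol (libpq, for example, has PQputCopyData), so the intermediate file can be skipped entirely if you stream the rows from the application.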