Re: insert vs select into performance - Mailing list pgsql-performance

From PFC
Subject Re: insert vs select into performance
Msg-id op.tvnqigq6cigqcu@apollo13
In response to Re: insert vs select into performance  (Thomas Finneid <tfinneid@student.matnat.uio.no>)
Responses Re: insert vs select into performance  (Thomas Finneid <tfinneid@student.matnat.uio.no>)
List pgsql-performance
>>     It's the time to parse statements, plan, execute, roundtrips with
>> the client, context switches, time for your client library to escape
>> the data and encode it and for postgres to decode it, etc. In a word :
>> OVERHEAD.
>
> I know there is some overhead, but that much when running it batched...?

    Well, yeah ;)
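
    If you really have to send the rows as individual INSERTs, preparing the statement once at least removes the parse and plan part of that overhead on every call (a minimal sketch, using the six-column test table from the timings below; the roundtrips are still there):

PREPARE ins (INT, INT, INT, INT, INT, INT) AS
     INSERT INTO test (a,b,c,d,e,f) VALUES ($1,$2,$3,$4,$5,$6);

-- each row now only pays for execution plus the roundtrip
EXECUTE ins (1,2,3,4,5,1);
EXECUTE ins (1,2,3,4,5,2);

DEALLOCATE ins;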

> Unfortunately it's not fast enough, it needs to be done in no more than
> 1-2 seconds ( and in production it will be maybe 20-50 columns of data,
> perhaps divided over 5-10 tables).
> Additionally it needs to scale to perhaps three times as many columns
> and perhaps 2-3 times as many rows in some situations, within 1 second.
> Furthermore it needs to allow for about 20-50 clients reading much of
> that data before the next batch of data arrives.

    Wow. What is the application?

    Test run on a desktop PC, Athlon 64 3200+, 2 IDE disks in RAID 1 (pretty
slow):

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 11.463 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
INSERT 0 100000
Time: 721.579 ms

    OK, so you see, insert speed is pretty fast: roughly 140,000 rows per
second here. With a better CPU and faster disks, you can get a lot more.

test=> TRUNCATE TABLE test;
TRUNCATE TABLE
Time: 30.010 ms

test=> ALTER TABLE test ADD PRIMARY KEY (f);
INFO:  ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_pkey"
for table "test"
ALTER TABLE
Time: 100.577 ms

test=> INSERT INTO test SELECT 1,2,3,4,5,a FROM generate_series( 1, 100000 ) as a;
INSERT 0 100000
Time: 1915.928 ms

    This includes the time to update the index.

test=> DROP TABLE test;
DROP TABLE
Time: 28.804 ms

test=> CREATE TABLE test (a INT, b INT, c INT, d INT, e INT, f INT);
CREATE TABLE
Time: 1.626 ms

test=> CREATE OR REPLACE FUNCTION test_insert( )
     RETURNS VOID
     LANGUAGE plpgsql
     AS
$$
DECLARE
     _i INTEGER;
BEGIN
     FOR _i IN 0..100000 LOOP
         INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
     END LOOP;
END;
$$;
CREATE FUNCTION
Time: 51.948 ms

test=> SELECT test_insert();
  test_insert
-------------

(1 row)

Time: 1885.382 ms

    Now you see, performing 100K individual inserts inside a plpgsql function
is also fast.
    The postgres engine is pretty damn fast; it's the communication overhead
that you feel, especially the switching between client and server processes.

    Another example:

=> INSERT INTO test (a,b,c,d,e,f) VALUES (... 100000 integer tuples)
INSERT 0 100000
Time: 1836.458 ms

    Multi-row VALUES is actually pretty fast. Here there is no context
switching; everything is done in one INSERT.
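
    To be concrete, the multi-row form looks like this (a small sketch; the statement above carried 100000 such tuples):

INSERT INTO test (a,b,c,d,e,f) VALUES
     (1,2,3,4,5,1),
     (1,2,3,4,5,2),
     (1,2,3,4,5,3);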

    However, COPY is faster still, because parsing and de-escaping its input
format costs less. COPY is optimized for throughput.
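
    With psql, the same data can be fed through COPY's default text format, one tab-separated line per row, terminated by a backslash-period line (sketch):

COPY test (a,b,c,d,e,f) FROM STDIN;
1	2	3	4	5	1
1	2	3	4	5	2
\.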

    So, advice:

    For optimum throughput, have your application build chunks of data into
text files and load them with COPY; or, if your client library supports the
COPY interface, use that directly.
    You will need a fast disk system with the xlog and the data on separate
disks, several CPU cores (one insert thread will max out one core, so use the
others for selects), enough RAM that index updates don't have to seek, and
tuned bgwriter and checkpoint settings to avoid load spikes.
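
    For instance, assuming the application writes each batch to a tab-separated text file (hypothetical path /tmp/batch_0001.txt), loading it is just:

-- server-side COPY (needs superuser and a path readable by the backend):
COPY test (a,b,c,d,e,f) FROM '/tmp/batch_0001.txt';

-- or, from psql, \copy streams the same file from the client side:
\copy test (a,b,c,d,e,f) from '/tmp/batch_0001.txt'

    Most client libraries also expose the COPY protocol directly (libpq has
PQputCopyData, for instance), which avoids the intermediate file entirely.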
