Home > mailing lists

Re: insert vs select into performance - Mailing list pgsql-performance

From	Thomas Finneid
Subject	Re: insert vs select into performance
Date	July 18, 2007 16:08:55
Msg-id	469E6518.5000203@ifi.uio.no Whole thread Raw
In response to	Re: insert vs select into performance (PFC <lists@peufeu.com>)
List	pgsql-performance

Tree view

PFC wrote:
>> Unfortunately its not fast enough, it needs to be done in no more than
>> 1-2 seconds, ( and in production it will be maybe 20-50 columns of
>> data, perhaps divided over 5-10 tables.)
>> Additionally it needs to scale to perhaps three times as many columns
>> and perhaps 2 - 3 times as many rows in some situation within 1 seconds.
>> Further on it needs to allow for about 20 - 50 clients reading much of
>> that data before the next batch of data arrives.
>
>     Wow. What is the application ?

Geological surveys, where they perform realtime geo/hydro-phone shots of
areas of the size of 10x10km every 3-15 seconds.

> test=> CREATE OR REPLACE FUNCTION test_insert( )
>     RETURNS VOID
>     LANGUAGE plpgsql
>     AS
> $$
> DECLARE
>     _i INTEGER;
> BEGIN
>     FOR _i IN 0..100000 LOOP
>         INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
>     END LOOP;
> END;
> $$;
> CREATE FUNCTION
> Temps : 51,948 ms
>
> test=> SELECT test_insert();
>  test_insert
> -------------
>
> (1 ligne)
>
> Temps : 1885,382 ms

I tested this one and it took 4 seconds, compared to the jdbc insert
which took 14 seconds, so its a lot faster. but not as fast as the
SELECT INTO.

I also tested an INSERT INTO FROM SELECT, which took 1.8 seconds, now we
are starting to talk about real performance.

>     However COPY is much faster because the parsing overhead and
> de-escaping of data is faster. COPY is optimized for throughput.
>
>     So, advice :
>
>     For optimum throughput, have your application build chunks of data
> into text files and use COPY. Or if your client lib supports the copy
> interface, use it.

I did test COPY, i.e. the jdbc COPY patch for pg 8.1, it performs at
approx 1.8 seconds :) The test was done with text input, I am going to
test it with binary input, which I expect will increase the performance
with 20-50%.

All these test have ben performed on a laptop with a Kubuntu 6.10
version of pg 8.1 without any special pg performance tuning. So I expect
that compiling lates pg and doing some tuning on it and testing it on
the a representative server will give it an additional boost in performance.

The key here is that with abundance in performance, I can experiment
with the solution in a completely different way than if I had any
"artificial" restrictions.

>     You will need a fast disk system with xlog and data on separate
> disks, several CPU cores (1 insert thread will max out 1 core, use the
> others for selects), lots of RAM so index updates don't need to seek,
> and tuning of bgwriter and checkpoints to avoid load spikes.

will have a look at it.

regards

thomas

pgsql-performance by date:

From: "Jim C. Nasby"
Date: 18 July 2007, 15:53:00
Subject: Re: When/if to Reindex

From: Thomas Finneid
Date: 18 July 2007, 16:12:09
Subject: Re: insert vs select into performance

Re: insert vs select into performance - Mailing list pgsql-performance

Previous

Next