PFC wrote:
>> Unfortunately it's not fast enough; it needs to be done in no more than
>> 1-2 seconds (and in production it will be maybe 20-50 columns of
>> data, perhaps divided over 5-10 tables).
>> Additionally it needs to scale to perhaps three times as many columns
>> and perhaps 2-3 times as many rows in some situations, within 1 second.
>> Furthermore, it needs to allow for about 20-50 clients reading much of
>> that data before the next batch of data arrives.
>
> Wow. What is the application ?
Geological surveys, where they perform real-time geophone/hydrophone
shots of areas of about 10x10 km every 3-15 seconds.
> test=> CREATE OR REPLACE FUNCTION test_insert( )
> RETURNS VOID
> LANGUAGE plpgsql
> AS
> $$
> DECLARE
> _i INTEGER;
> BEGIN
> FOR _i IN 0..100000 LOOP
> INSERT INTO test (a,b,c,d,e,f) VALUES (1,2,3,4,5, _i);
> END LOOP;
> END;
> $$;
> CREATE FUNCTION
> Time: 51.948 ms
>
> test=> SELECT test_insert();
> test_insert
> -------------
>
> (1 row)
>
> Time: 1885.382 ms
I tested this one and it took 4 seconds, compared to the JDBC insert,
which took 14 seconds, so it's a lot faster, but not as fast as the
SELECT INTO.
I also tested an INSERT INTO ... SELECT, which took 1.8 seconds; now we
are starting to talk about real performance.
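For reference, minimal sketches of the two set-based statements I mean
(in my test the source data was different; generate_series, available
since pg 8.0, just makes the examples self-contained, and test_copy is
a placeholder name):

    -- one set-based insert instead of 100000 single-row statements
    INSERT INTO test (a, b, c, d, e, f)
    SELECT 1, 2, 3, 4, 5, i FROM generate_series(0, 100000) AS s(i);

    -- SELECT INTO creates and fills a new table in one pass
    SELECT a, b, c, d, e, f INTO test_copy FROM test;

Both avoid the per-row statement overhead of the loop, which is where
most of the time goes.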
> However COPY is much faster because the parsing and de-escaping of
> data carry less overhead. COPY is optimized for throughput.
>
> So, advice:
>
> For optimum throughput, have your application build chunks of data
> into text files and use COPY. Or if your client lib supports the copy
> interface, use it.
I did test COPY, i.e. the JDBC COPY patch for pg 8.1; it performs at
approx. 1.8 seconds :) The test was done with text input; I am going to
test it with binary input, which I expect will improve performance by
20-50%.
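For the curious, the server-side forms being compared look roughly like
this (the file paths are placeholders; the column list matches the test
table above, and WITH BINARY is the 8.1 syntax for the binary format):

    -- text format: one line per row, tab-delimited by default
    COPY test (a, b, c, d, e, f) FROM '/tmp/batch.txt';

    -- binary format: skips parsing/de-escaping of the text representation
    COPY test (a, b, c, d, e, f) FROM '/tmp/batch.dat' WITH BINARY;

The JDBC patch drives the same COPY mechanism from the client side, so
the relative cost of text vs. binary should be similar.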
All these tests have been performed on a laptop running the Kubuntu 6.10
version of pg 8.1, without any special pg performance tuning. So I expect
that compiling the latest pg, doing some tuning on it, and testing it on
a representative server will give an additional boost in performance.
The key here is that with an abundance of performance, I can experiment
with the solution in a completely different way than if I had any
"artificial" restrictions.
> You will need a fast disk system with xlog and data on separate
> disks, several CPU cores (1 insert thread will max out 1 core, use the
> others for selects), lots of RAM so index updates don't need to seek,
> and tuning of bgwriter and checkpoints to avoid load spikes.
I will have a look at it.
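For my own notes, a sketch of the pg 8.1 knobs I understand that advice
to point at (the values are illustrative placeholders for
postgresql.conf, not recommendations; putting pg_xlog on its own disk
is commonly done by symlinking the directory to another spindle):

    shared_buffers = 50000        # count of 8kB buffers in 8.1 (~400 MB)
    wal_buffers = 64              # more room for WAL between flushes
    checkpoint_segments = 32      # fewer, less spiky checkpoints
    checkpoint_timeout = 300      # seconds between forced checkpoints
    bgwriter_delay = 200          # ms between bgwriter rounds
    bgwriter_lru_percent = 2.0    # share of buffers scanned per round
    bgwriter_lru_maxpages = 100   # cap on pages written per round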
regards
thomas