Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION - Mailing list pgsql-hackers

From: Pavel Stehule
Subject: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Date:
Msg-id: 162867790911250232u75985e29mba6310358d4b2911@mail.gmail.com
In response to: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
List: pgsql-hackers
2009/11/25 Jeff Davis <pgsql@j-davis.com>:
> On Wed, 2009-11-25 at 09:23 +0100, Pavel Stehule wrote:
>> > If SRFs use a tuplestore in that situation, it sounds like that should
>> > be fixed. Why do we need to provide alternate syntax involving COPY?
>>
>> It isn't a problem of SRF function design. It allows both modes - row
>> and tuplestore.
>
> select * from generate_series(1,1000000000) limit 1;
>
> That statement takes a long time, which indicates to me that it's
> materializing the result of the SRF. And there's no insert there.

This is a missing optimisation. If I understand correctly, PostgreSQL
waits for the complete result set, so in this case the materialisation is
necessary. In your query PostgreSQL does the materialisation too early.

postgres=# select * from generate_series(1,100000) limit 1;
 generate_series
─────────────────
               1
(1 row)

Time: 59,540 ms
postgres=# select generate_series(1,100000) limit 1;
 generate_series
─────────────────
               1
(1 row)

Time: 1,107 ms
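
To see where the materialisation happens, EXPLAIN ANALYZE can be used (just a sketch, output omitted - plans and timings will differ per installation):

postgres=# explain analyze select * from generate_series(1,100000) limit 1;
postgres=# explain analyze select generate_series(1,100000) limit 1;

The first plan should show a Function Scan node under the Limit, and that node is where the whole SRF result set gets stored before the first row comes out.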

But usually we process all rows from an SRF, so the problem with LIMIT
isn't significant.

I tested:

1.
postgres=# select count(*) from generate_series(1,1000000);
  count
─────────
 1000000
(1 row)

Time: 930,720 ms

2.
postgres=# select count(*) from (select generate_series(1,1000000)) x;
  count
─────────
 1000000
(1 row)

Time: 276,511 ms

Case 2 is significantly faster than case 1 (there is no SRF materialisation).

postgres=# create table foo(a integer);
CREATE TABLE

postgres=# insert into foo select generate_series(1,1000000);
INSERT 0 1000000
Time: 4274,869 ms

postgres=# insert into foo select * from generate_series(1,1000000);
INSERT 0 1000000
Time: 4814,756 ms

postgres=# copy foo to '/tmp/xxx';
COPY 1000000
Time: 1033,078 ms

postgres=# set synchronous_commit to off;
SET

postgres=# copy foo from '/tmp/xxx';
COPY 1000000
Time: 2749,277 ms

postgres=# insert into foo select generate_series(1,1000000);
INSERT 0 1000000
Time: 3948,734 ms

generate_series is fast and simple - but COPY is still about 30% faster.

>
>> This is a problem of the INSERT statement, or rather of the INSERT INTO
>> SELECT implementation.
>
> If "tmp" is a new table, and "zero" is a table with a million zeros in
> it, then:
>  insert into tmp select 1/i from zero;
> fails instantly. That tells me that it's not materializing the result of
> the select; rather, it's feeding the rows in one at a time.
>

I think the materialisation happens every time you use an SQL
statement without a cursor.
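
For example, with a cursor the rows come back incrementally instead of being collected first (only a minimal sketch, assuming the foo table created above):

begin;
declare c cursor for select * from foo;
fetch 10 from c;   -- only a handful of rows is transferred per FETCH
close c;
commit;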

> Can show me in more detail what you mean? I'm having difficulty
> understanding your short replies.

I think COPY tab FROM/TO function() should be used for very large imports
and exports, where INSERT ... SELECT needs at least one
materialisation.
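
With the current syntax the closest analogue is COPY with a query, which already streams to a file without an intermediate table (just an illustration, not the proposed COPY ... FUNCTION form):

copy (select generate_series(1,1000000)) to '/tmp/xxx';
copy foo from '/tmp/xxx';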

P.S. I am sorry - I am not a native speaker, so I keep my replies short.

Pavel

>
> Regards,
>        Jeff Davis
>
>

