Re: Merging large volumes of data - Mailing list pgsql-performance

From Tom Lane
Subject Re: Merging large volumes of data
Date
Msg-id 6357.1178550568@sss.pgh.pa.us
Whole thread Raw
In response to Re: Merging large volumes of data  (Andreas Kostyrka <andreas@kostyrka.org>)
List pgsql-performance
Andreas Kostyrka <andreas@kostyrka.org> writes:
> Ambrus Wagner (IJ/ETH) wrote:
>> Is there a way to prevent PostgreSQL from doing a full sort on the result set after the unions have been completed?
Evenif I write 
>>
>> (select a,b,c,d,e from table1 order by a,b) union all
>> (select a,b,c,d,e from table2 order by a,b)  union all
>> etc...
>> (select a,b,c,d,e from tablen order by a,b)  order by a,b;
>>
>> PostgreSQL does not seem to realise (maybe it should not be able to do this trick anyway) that the last "order by"
clauseis merely a final merge step on the ordered data sets. 

At least to a first-order approximation, teaching it this would be a
waste of time.

Assume for simplicity that each of your K sub-selects yields N tuples.
Then there are KN items altogether, so if we just sort the big data set
it takes O(KN*log(KN)) time ... which is the same as O(KN*(log K + log N)).
OTOH, if we sort each sub-select by itself, that takes O(N*log N) time,
or O(KN*log N) for all K sub-sorts.  Now we've got to do a K-way merge,
which will take O(log K) comparisons for each of the KN tuples, ie,
O(KN*log K)).  Net result: exactly the same runtime.

Of course this argument fails if you have some way of obtaining the
sub-select values pre-sorted for free.  But it's never really free.
Historical experience is that full-table indexscans often underperform
explicit sorts, at least when there are enough tuples involved to
make the problem interesting.

So the bottom line is that the use-case for this optimization seems
far too narrow to justify the implementation effort.

            regards, tom lane

pgsql-performance by date:

Previous
From: Gregory Stark
Date:
Subject: Re: Merging large volumes of data
Next
From: Jim Nasby
Date:
Subject: Re: How to Find Cause of Long Vacuum Times - NOOB Question