Home > mailing lists

Re: queries with lots of UNIONed relations - Mailing list pgsql-performance

From	Jon Nelson
Subject	Re: queries with lots of UNIONed relations
Date	January 13, 2011 15:54:38
Msg-id	AANLkTikM9=JXPwYpAbfQ5KZgenWBCn5_J2=W6MRhq0U0@mail.gmail.com Whole thread
In response to	Re: queries with lots of UNIONed relations (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: queries with lots of UNIONed relations
List	pgsql-performance

Tree view

On Thu, Jan 13, 2011 at 11:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jon Nelson <jnelson+pgsql@jamponi.net> writes:
>> In the former case, the query plan was a bitmap heap scan for each
>> table. Then those results were Appended, Sorted, Uniqued, Sorted
>> again, and then returned.
>
>> In the latter, before Appending, each table's results were run through
>> HashAggregate.
>
> Probably the reason it did that is that each individual de-duplication
> looked like it would fit in work_mem, but a single de-duplication
> didn't.  Consider raising work_mem, at least for this one query.

I raised work_mem to as high as 512MB (SET LOCAL work_mem = '512MB',
within the transaction).  Nice. Instead of 7-10 minutes the result is
now about a minute (the same as with individual de-duplication).

Your comment regarding "each individual de-duplication looked like it
would fit in work_mem" doesn't really make sense, exactly. Maybe I'm
misunderstanding you.

What I'm asking is this: can postgresql apply a de-duplication to each
member of a UNION (as I did with SELECT DISTINCT) in order to reduce
the total number of rows that need to be de-duplicated when all of the
rows have been Appended?

The results of the various plans/tweaks are:

Initial state: (work_mem = 16MB, no DISTINCT, run time of 7-10 minutes):
Unique (Sort (Append ( Lots of Bitmap Heap Scans Here ) ) )

and (work_mem = 16MB, with DISTINCT, run time of ~ 1 minute):
HashAggregate ( Append ( Lots Of HashAggregate( Bitmap Heap Scan ) ) )

and (work_mem = 64kB, DISTINCT, run time of *15+ minutes*):
Unique (Sort ( Append ( Lots Of HashAggregate( Bitmap Heap Scan ) ) ) )

So I take from this the following:

1. if the result set fits in work_mem, hash aggregate is wicked fast.
About 1 jillion times faster than Unique+Sort.

2. it would be nifty if postgresql could be taught that, in a UNION,
to de-duplicate each contributory relation so as to reduce the total
set of rows that need to be re-de-duplicated. It's extra work, true,
and maybe there are some tricks here, but it seems to make a big
difference. This is useful so that the total result set is small
enough that hash aggregate might apply.

NOTE:

I have to have work_mem really low as a global on this machine because
other queries involving the same tables (such as those that involve
UNION ALL for SUM() or GROUP BY operations) cause the machine to run
out of memory. Indeed, even with work_mem at 1MB I run the machine out
of memory if I don't explicitly disable hashagg for some queries. Can
anything be done about that?

--
Jon

pgsql-performance by date:

From: Tom Lane
Date: 13 January 2011, 13:13:31
Subject: Re: queries with lots of UNIONed relations

From: Tom Lane
Date: 13 January 2011, 16:05:53
Subject: Re: queries with lots of UNIONed relations

Re: queries with lots of UNIONed relations - Mailing list pgsql-performance

Previous

Next