Re: queries with lots of UNIONed relations - Mailing list pgsql-performance

From Jon Nelson
Subject Re: queries with lots of UNIONed relations
Date
Msg-id AANLkTikM9=JXPwYpAbfQ5KZgenWBCn5_J2=W6MRhq0U0@mail.gmail.com
Whole thread Raw
In response to Re: queries with lots of UNIONed relations  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: queries with lots of UNIONed relations
List pgsql-performance
On Thu, Jan 13, 2011 at 11:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jon Nelson <jnelson+pgsql@jamponi.net> writes:
>> In the former case, the query plan was a bitmap heap scan for each
>> table. Then those results were Appended, Sorted, Uniqued, Sorted
>> again, and then returned.
>
>> In the latter, before Appending, each table's results were run through
>> HashAggregate.
>
> Probably the reason it did that is that each individual de-duplication
> looked like it would fit in work_mem, but a single de-duplication
> didn't.  Consider raising work_mem, at least for this one query.

I raised work_mem to as high as 512MB (SET LOCAL work_mem = '512MB',
within the transaction).  Nice. Instead of 7-10 minutes the result is
now about a minute (the same as with individual de-duplication).

Your comment regarding "each individual de-duplication looked like it
would fit in work_mem" doesn't really make sense, exactly. Maybe I'm
misunderstanding you.

What I'm asking is this: can postgresql apply a de-duplication to each
member of a UNION (as I did with SELECT DISTINCT) in order to reduce
the total number of rows that need to be de-duplicated when all of the
rows have been Appended?

The results of the various plans/tweaks are:

Initial state: (work_mem = 16MB, no DISTINCT, run time of 7-10 minutes):
Unique (Sort (Append ( Lots of Bitmap Heap Scans Here ) ) )

and (work_mem = 16MB, with DISTINCT, run time of ~ 1 minute):
HashAggregate ( Append ( Lots Of HashAggregate( Bitmap Heap Scan ) ) )

and (work_mem = 64kB, DISTINCT, run time of *15+ minutes*):
Unique (Sort ( Append ( Lots Of HashAggregate( Bitmap Heap Scan ) ) ) )

So I take from this the following:

1. if the result set fits in work_mem, hash aggregate is wicked fast.
About 1 jillion times faster than Unique+Sort.

2. it would be nifty if postgresql could be taught that, in a UNION,
to de-duplicate each contributory relation so as to reduce the total
set of rows that need to be re-de-duplicated. It's extra work, true,
and maybe there are some tricks here, but it seems to make a big
difference. This is useful so that the total result set is small
enough that hash aggregate might apply.

NOTE:

I have to have work_mem really low as a global on this machine because
other queries involving the same tables (such as those that involve
UNION ALL for SUM() or GROUP BY operations) cause the machine to run
out of memory. Indeed, even with work_mem at 1MB I run the machine out
of memory if I don't explicitly disable hashagg for some queries. Can
anything be done about that?


--
Jon

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: queries with lots of UNIONed relations
Next
From: Tom Lane
Date:
Subject: Re: queries with lots of UNIONed relations