Home > mailing lists

queries with lots of UNIONed relations - Mailing list pgsql-performance

From	Jon Nelson
Subject	queries with lots of UNIONed relations
Date	January 13, 2011 16:57:12
Msg-id	AANLkTinVjco+Q=9C3Fd8vFxmYY=LiUKFFg4rm=e2yFoX@mail.gmail.com Whole thread
List	pgsql-performance

Tree view

I was recently asked to look into why one particular set of queries
was taking a long time.
The queries are all of the same form.  They select the UNION of a few
columns from around 100 tables.
The query in particular was taking some 7-8 minutes to run.
On a whim, I changed the query from this form:

SELECT a, b FROM FOO_a WHERE <conditions>
UNION
SELECT a,b FROM FOO_b WHERE <conditions>
....

to:

SELECT DISTINCT a,b FROM FOO_a WHERE <conditions>
UNION
SELECT DISTINCT a,b FROM FOO_b WHERE <conditions>

and the query time dropped to under a minute.

In the former case, the query plan was a bitmap heap scan for each
table. Then those results were Appended, Sorted, Uniqued, Sorted
again, and then returned.

In the latter, before Appending, each table's results were run through
HashAggregate.

The total number of result rows is in the 500K range. Each table holds
approximately 150K matching rows (but this can vary a bit).

What I'm asking is this: since adding DISTINCT to each participating
member of the UNION query reduced the total number of appended rows,
is there some sort of heuristic that postgresql could use to do this
automatically?  The 12x speedup was quite nice.


--
Jon

pgsql-performance by date:

From: Jon Nelson
Date: 13 January 2011, 16:13:13
Subject: Re: queries with lots of UNIONed relations

From: kakarukeys
Date: 13 January 2011, 16:57:13
Subject: Re: adding foreign key constraint locks up table

queries with lots of UNIONed relations - Mailing list pgsql-performance

Previous

Next