queries with lots of UNIONed relations - Mailing list pgsql-performance

From Jon Nelson
Subject queries with lots of UNIONed relations
Date
Msg-id AANLkTinVjco+Q=9C3Fd8vFxmYY=LiUKFFg4rm=e2yFoX@mail.gmail.com
Whole thread Raw
List pgsql-performance
I was recently asked to look into why one particular set of queries
was taking a long time.
The queries are all of the same form.  They select the UNION of a few
columns from around 100 tables.
The query in particular was taking some 7-8 minutes to run.
On a whim, I changed the query from this form:

SELECT a, b FROM FOO_a WHERE <conditions>
UNION
SELECT a,b FROM FOO_b WHERE <conditions>
....

to:

SELECT DISTINCT a,b FROM FOO_a WHERE <conditions>
UNION
SELECT DISTINCT a,b FROM FOO_b WHERE <conditions>

and the query time dropped to under a minute.

In the former case, the query plan was a bitmap heap scan for each
table. Then those results were Appended, Sorted, Uniqued, Sorted
again, and then returned.

In the latter, before Appending, each table's results were run through
HashAggregate.

The total number of result rows is in the 500K range. Each table holds
approximately 150K matching rows (but this can vary a bit).

What I'm asking is this: since adding DISTINCT to each participating
member of the UNION query reduced the total number of appended rows,
is there some sort of heuristic that postgresql could use to do this
automatically?  The 12x speedup was quite nice.


--
Jon

pgsql-performance by date:

Previous
From: Jon Nelson
Date:
Subject: Re: queries with lots of UNIONed relations
Next
From: kakarukeys
Date:
Subject: Re: adding foreign key constraint locks up table