Re: Increase performance of a UNION query that thakes 655.07 msec to be runned ? - Mailing list pgsql-performance

From: Tom Lane
Subject: Re: Increase performance of a UNION query that thakes 655.07 msec to be runned ?
Msg-id: 29924.1076081304@sss.pgh.pa.us
In response to: Increase performance of a UNION query that thakes 655.07 msec to be runned ?  ("Bruno BAGUETTE" <pgsql-ml@baguette.net>)
Responses: RE : Increase performance of a UNION query that thakes 655.07 msec to be runned ?
List: pgsql-performance

"Bruno BAGUETTE" <pgsql-ml@baguette.net> writes:
> Do you see a way to get better performance from this query, which
> currently takes 655.07 msec to run?

> levure=> explain analyze SELECT distinct lower(substr(l_name, 1, 1)) AS initiale FROM people
> levure-> UNION
> levure-> SELECT distinct lower(substr(org_name, 1, 1)) AS initiale FROM organizations
> levure-> ORDER BY initiale;

This is inherently a bit inefficient since the UNION implies a DISTINCT
step, thus partially repeating the DISTINCT work done inside each SELECT.
It would likely be a tad faster to drop the DISTINCTs from the
subselects and rely on UNION to do the filtering.  However, you're still
gonna have a big SORT/UNIQUE step.
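
For reference, the variant without the inner DISTINCTs would look roughly
like this. It is only a sketch, reusing the table and column names from the
quoted query, and it should return the same rows as the original:

```sql
-- Sketch: UNION (without ALL) already removes duplicates across both
-- branches, so a DISTINCT inside each branch only repeats that work.
SELECT lower(substr(l_name, 1, 1)) AS initiale FROM people
UNION
SELECT lower(substr(org_name, 1, 1)) AS initiale FROM organizations
ORDER BY initiale;
```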

As of PG 7.4 you could probably get a performance win by converting the
thing to use GROUP BY instead of DISTINCT or UNION:

select initiale from (
  select lower(substr(l_name,1,1)) as initiale from people
  union all
  select lower(substr(org_name,1,1)) as initiale from organizations
) ss
group by initiale order by initiale;

This should use a HashAggregate to do the unique-ification.  I think
that will be faster than Sort/Unique.
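
One way to check which plan you actually get is to run the rewritten query
under EXPLAIN ANALYZE and look for a HashAggregate node above the Append of
the two scans. This is a sketch against the same tables; the planner may
still pick a sort-based plan, depending on its row estimates and memory
settings:

```sql
-- Sketch: inspect the plan and the measured runtime of the GROUP BY form.
-- A HashAggregate node in the output means the duplicates are being
-- removed by hashing rather than by Sort + Unique.
EXPLAIN ANALYZE
SELECT initiale FROM (
  SELECT lower(substr(l_name, 1, 1)) AS initiale FROM people
  UNION ALL
  SELECT lower(substr(org_name, 1, 1)) AS initiale FROM organizations
) ss
GROUP BY initiale
ORDER BY initiale;
```

The final ORDER BY still sorts, but only the handful of distinct initials
that come out of the aggregate rather than every input row.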

            regards, tom lane
