Thread: Unions and where optimisation

Unions and where optimisation

From
Boris Klug
Date:
Hello!

I am quite new in the PostgreSQL performance business, done a few years Oracle
stuff before. My ist question is the following:

We have three table, lets name them rk150, 151 and rk152. They all have a
timestamp and a order number in common but than different data after this.
Now I need the data from all tables in one view for a given order number, so
I created a view

create view orderevents as
   select ts, aufnr from rk150
     union
   select ts, aufnr from rk151
     union
   select ts, aufnr from rk152;

When I does a "select * from orderevents where aufnr='1234'" it takes over 14
seconds!
The problem is now that PostgreSQL first does the union with all the three
tables and after this sorts out the right rows:

Subquery Scan a  (cost=54699.06..56622.18 rows=38462 width=20)
  ->  Unique  (cost=54699.06..56622.18 rows=38462 width=20)
        ->  Sort  (cost=54699.06..54699.06 rows=384624 width=20)
              ->  Append  (cost=0.00..10689.24 rows=384624 width=20)
                    ->  Subquery Scan *SELECT* 1
                           (cost=0.00..8862.52 rows=314852 width=20)
                          ->  Seq Scan on rk150
                           (cost=0.00..8862.52 rows=314852 width=20)
                    ->  Subquery Scan *SELECT* 2
                           (cost=0.00..1208.58 rows=45858 width=20)
                          ->  Seq Scan on rk151
                           (cost=0.00..1208.58 rows=45858 width=20)
                    ->  Subquery Scan *SELECT* 3
                           (cost=0.00..618.14 rows=23914 width=20)
                          ->  Seq Scan on rk152
                           (cost=0.00..618.14 rows=23914 width=20)

A better thing would it (Oracle does this and I think I have seen it on
PostgreSQL before), that the where-clause is moved inside every select so we
have something like this (written by hand):

select * from (
   select zeit, aufnr from rk150 where aufnr='13153811'
     union
   select zeit, aufnr from rk151 where aufnr='13153811'
     union
   select zeit, aufnr from rk152 where aufnr='13153811')
 as A;

This takes less than 1 second because the nr of rows that have to be joined
are only 45 (optimizer expects 4), not > 300.000:

Subquery Scan a  (cost=45.97..46.19 rows=4 width=20)
  ->  Unique  (cost=45.97..46.19 rows=4 width=20)
        ->  Sort  (cost=45.97..45.97 rows=45 width=20)
              ->  Append  (cost=0.00..44.74 rows=45 width=20)
                    ->  Subquery Scan *SELECT* 1
                           (cost=0.00..32.22 rows=31 width=20)
                          ->  Index Scan using rk150_uidx_aufnr on rk150
                           (cost=0.00..32.22 rows=31 width=20)
                    ->  Subquery Scan *SELECT* 2
                           (cost=0.00..7.67 rows=9 width=20)
                          ->  Index Scan using rk151_uidx_aufnr on rk151
                           (cost=0.00..7.67 rows=9 width=20)
                    ->  Subquery Scan *SELECT* 3
                           (cost=0.00..4.85 rows=5 width=20)
                          ->  Index Scan using rk152_uidx_aufnr on rk152
                           (cost=0.00..4.85 rows=5 width=20)

My question now: Is the optimizer able to move the where clause into unions?
If so, how I can get him to do it?

Thank you for the help in advance!

--
Dipl. Inform. Boris Klug, control IT GmbH, Germany

Re: Unions and where optimisation

From
Hannu Krosing
Date:
On Wed, 2003-01-08 at 13:25, Boris Klug wrote:
> Hello!
>
> I am quite new in the PostgreSQL performance business, done a few years Oracle
> stuff before. My ist question is the following:
>
> We have three table, lets name them rk150, 151 and rk152. They all have a
> timestamp and a order number in common but than different data after this.
> Now I need the data from all tables in one view for a given order number, so
> I created a view
>
> create view orderevents as
>    select ts, aufnr from rk150
>      union
>    select ts, aufnr from rk151
>      union
>    select ts, aufnr from rk152;

try making the orderevents view like this:

create view orderevents as
select rk.aufnr, sub.ts
  from rk150 rk,
     ( select ts from rk150 where aufnr = rk.aufr
       union
       select ts from rk151 where aufnr = rk.aufr
       union
       select ts from rk152 where aufnr = rk.aufr
     ) as sub
;

this could/should force your desired behavior.

> My question now: Is the optimizer able to move the where clause into unions?
> If so, how I can get him to do it?
>
> Thank you for the help in advance!
--
Hannu Krosing <hannu@tm.ee>

Re: Unions and where optimisation

From
Tomasz Myrta
Date:
Hannu Krosing wrote:

>
> try making the orderevents view like this:
>
> create view orderevents as
> select rk.aufnr, sub.ts
>   from rk150 rk,
>      ( select ts from rk150 where aufnr = rk.aufr
>        union
>        select ts from rk151 where aufnr = rk.aufr
>        union
>        select ts from rk152 where aufnr = rk.aufr
>      ) as sub
> ;
>
> this could/should force your desired behavior.
>

Hannu, does it work?
Few months ago I lost some time trying to create this kind of query and
I always got error, that subselect doesn't knows anything about upper
(outer?) table.

In this query you should get error:
"relation rk does not exist".

What version of postgres do you have?
Tomasz Myrta


Re: Unions and where optimisation

From
Tomasz Myrta
Date:
Boris Klug wrote:

> create view orderevents as
>    select ts, aufnr from rk150
>      union
>    select ts, aufnr from rk151
>      union
>    select ts, aufnr from rk152;

I lost some time and I didn't find valid solution for this kind of query :-(

I solved it (nice to hear about better solution) using table inheritance.

create table rk_master(
fields...
fields...
);

create table rk150 () inherits rk_master;
create table rk151 () inherits rk_master;
create table rk152 () inherits rk_master;

now you can just create simple view:
select ts, aufnr from rk_master;

Regards,
Tomasz Myrta


Re: Unions and where optimisation

From
Boris Klug
Date:
Hello!

> Hannu, does it work?
> Few months ago I lost some time trying to create this kind of query and
> I always got error, that subselect doesn't knows anything about upper
> (outer?) table.

It does not work on my PostgreSQL 7.2.x

Get the same error like you: "relation rk does not exist"

Also the disadvantage of this solution is that the speed up is bound to
queries for the ordernr. If a statement has a where clause e.g. for a
timestamp, the view is still slow.

Does PostgreSQL not know how to move where clause inside each select in a
union?

--
Dipl. Inform. Boris Klug, control IT GmbH, Germany

Re: Unions and where optimisation

From
Hannu Krosing
Date:
On Wed, 2003-01-08 at 14:32, Tomasz Myrta wrote:
> Hannu Krosing wrote:
>
> >
> > try making the orderevents view like this:
> >
> > create view orderevents as
> > select rk.aufnr, sub.ts
> >   from rk150 rk,
> >      ( select ts from rk150 where aufnr = rk.aufr
> >        union
> >        select ts from rk151 where aufnr = rk.aufr
> >        union
> >        select ts from rk152 where aufnr = rk.aufr
> >      ) as sub
> > ;
> >
> > this could/should force your desired behavior.
> >
>
> Hannu, does it work?

Nope! Sorry.

SQL spec clearly states that subqueries in FROM clause must not see each
other ;(

It would work in WITH part of the query, which will hopefully be
implemented in some future PG version, perhaps even 7.4 as WITH is the
prerequisite for implementing SQL99 recursive queries, and RedHat has
shown an strongish interest in implementing these.

> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
--
Hannu Krosing <hannu@tm.ee>

Re: Unions and where optimisation

From
"Michael Paesold"
Date:
Boris Klug <boris.klug@control.de> wrote:

> > Hannu, does it work?
> > Few months ago I lost some time trying to create this kind of query and
> > I always got error, that subselect doesn't knows anything about upper
> > (outer?) table.
>
> It does not work on my PostgreSQL 7.2.x
>
> Get the same error like you: "relation rk does not exist"
>
> Also the disadvantage of this solution is that the speed up is bound to
> queries for the ordernr. If a statement has a where clause e.g. for a
> timestamp, the view is still slow.
>
> Does PostgreSQL not know how to move where clause inside each select in a
> union?

Hi Boris,

As far as I know, this has first been "fixed" in 7.3. I think it was Tom who
improved the optimizer to push the where clause into the selects of a union
view. I've done a test...

create view test as
   select updated, invoice_id from invoice
     union all
   select updated, invoice_id from inv2
     union all
   select updated, invoice_id from inv3;

... and it seems to work (postgresql 7.3 here):

billing=# explain select * from test where invoice_id = 111000;
                                         QUERY PLAN
----------------------------------------------------------------------------
----------------
 Subquery Scan test  (cost=0.00..413.24 rows=114 width=12)
   ->  Append  (cost=0.00..413.24 rows=114 width=12)
         ->  Subquery Scan "*SELECT* 1"  (cost=0.00..6.00 rows=1 width=12)
               ->  Index Scan using pk_invoice on invoice  (cost=0.00..6.00
rows=1 width=12)
                     Index Cond: (invoice_id = 111000)
         ->  Subquery Scan "*SELECT* 2"  (cost=0.00..203.62 rows=57
width=12)
               ->  Index Scan using idx_inv2 on inv2  (cost=0.00..203.62
rows=57 width=12)
                     Index Cond: (invoice_id = 111000)
         ->  Subquery Scan "*SELECT* 3"  (cost=0.00..203.62 rows=57
width=12)
               ->  Index Scan using idx_inv3 on inv3  (cost=0.00..203.62
rows=57 width=12)
                     Index Cond: (invoice_id = 111000)
(11 rows)

I hope this is helps. Can you upgrade to 7.3.1? I really think the upgrade
is worth the effort.

Best Regards,
Michael Paesold


Re: Unions and where optimisation

From
Boris Klug
Date:
Hello!

> As far as I know, this has first been "fixed" in 7.3. I think it was Tom
> who improved the optimizer to push the where clause into the selects of a
> union view. I've done a test...

Yes, I installed 7.3 and it works fine there. I think we will upgrade to 7.3.1
our development system soon.

Thank you!

--
Dipl. Inform. Boris Klug, control IT GmbH, Germany