Thread: Optimising queries involving unions

From:
Sam Mason
Date:

Hi,

I've got a query that I think the query optimiser should be able
to work it's magic on but it doesn't!  I've had a look around and
asked on the IRC channel and found that the current code doesn't
attempt to optimise for what I'm asking it to do at the moment.
Here's a bad example:

  SELECT u.txt
  FROM smalltable t, (
    SELECT id, txt FROM largetable1
    UNION ALL
    SELECT id, txt FROM largetable2) u
  WHERE t.id = u.id
    AND t.foo = 'bar';

I was hoping that "smalltable" would get moved up into the union,
but it doesn't at the moment and the database does a LOT of extra
work.  In this case, I can manually do quite a couple of transforms
to move things around and it does the right thing:

  SELECT txt
  FROM (
    SELECT l.id as lid, r.id as rid, r.foo, l.txt
      FROM largetable1 l, smalltable r
    UNION ALL
    SELECT l.id as lid, r.id as rid, r.foo, l.txt
      FROM largetable1 l, smalltable r)
  WHERE foo = 'bar';
    AND lid = rid

The optimiser is intelligent enough to move the where clauses up
into the union and end end up with a reasonably optimal query.
Unfortunatly, in real life, the query is much larger and reorganising
everything manually isn't really feasible!

Is this a good place to ask about this or is it more in the realm
of the hackers mailing list?

Thanks,
  Sam

From:
Tom Lane
Date:

Sam Mason <> writes:
> Here's a bad example:

>   SELECT u.txt
>   FROM smalltable t, (
>     SELECT id, txt FROM largetable1
>     UNION ALL
>     SELECT id, txt FROM largetable2) u
>   WHERE t.id = u.id
>     AND t.foo = 'bar';

> I was hoping that "smalltable" would get moved up into the union,
> but it doesn't at the moment and the database does a LOT of extra
> work.

I'm afraid we're a long way away from being able to do that; the
parse/plan representation of UNION wasn't chosen with an eye to
being able to optimize it at all :-(.  We can push restriction
clauses down into a union, but we can't do much with join clauses,
because they necessarily refer to tables that don't even exist
within the sub-query formed by the UNION.

It'd be nice to fix this someday, but don't hold your breath ...

            regards, tom lane

From:
Sam Mason
Date:

Tom Lane wrote:
>It'd be nice to fix this someday, but don't hold your breath ...

Thanks for the response!

Is it even worth me thinking about trying to figure out how to make
the current code do this sort of thing? or is it just not going to
happen with the code as it is?


  Sam

From:
Tom Lane
Date:

Sam Mason <> writes:
> Tom Lane wrote:
>> It'd be nice to fix this someday, but don't hold your breath ...

> Is it even worth me thinking about trying to figure out how to make
> the current code do this sort of thing?

Probably not :-(.  What we need is to integrate UNION (and the other
set-ops) into the normal querytree structure so that the planner can
consider alternative plans within its existing framework.  That requires
some fundamental changes in the Query structure --- in particular, we
have to get rid of the current situation that there is exactly one
targetlist per rangetable.  Decoupling targetlists and rangetables would
have some other benefits too (INSERT ... SELECT would get a lot cleaner)
but it's a wide-ranging change, and I think could only usefully be
tackled by someone who is already pretty familiar with the code.

            regards, tom lane

From:
"Marc Mamin"
Date:

Hi,

I'm using a workaround for this kind of issues:


consider:

    select A from

       (select B from T1 where C
      union
        select B from T2 where C
      union
        select B from T3 where C
        ) foo
    where D


in your case:

SELECT u.txt
  FROM  (
    SELECT id, txt FROM largetable1,smalltable t WHERE t.id = u.id   AND
t.foo = 'bar'
    UNION ALL
    SELECT id, txt FROM largetable2,smalltable t WHERE t.id = u.id   AND
t.foo = 'bar'
    ) u




and

    select A from foo where C and D

(A, B, C, D being everything you want, C and D may also include "GROUP
BY,ORDER...)

The first version will be handled correctly by the optimiser, whereas in the
second version,
Postgres will first build the UNION and then run the query on it.




I'm having large tables with identical structure, one per day.
Instead of defining a view on all tables,
I' using functions that "distribute" my query on all tables.

The only issue if that I need to define a type that match the result
structure and each return type needs its own function.


Example:
(The first parameter is a schema name, the four next corresponds to A, B, C,
D





---------------------
create type T_i2_vc1 as (int_1 int,int_2 int,vc_1 varchar);

CREATE OR REPLACE FUNCTION
vq_T_i2_vc1(varchar,varchar,varchar,varchar,varchar) RETURNS setof T_i2_vc1
AS $$


DECLARE
    result T_i2_vc1%rowtype;
    mviews RECORD;
    sql varchar;
    counter int;
BEGIN
    select into counter 1;

        -- loop on all daily tables
        FOR mviews IN SELECT distinct this_day FROM daylist order by plainday
desc LOOP

        IF counter =1 THEN
          select INTO  sql 'SELECT '||mviews.this_day||' AS plainday, '||$2||'
FROM '||$3||'_'||mviews.plainday||' WHERE '||$4;
        ELSE
          select INTO  sql sql||' UNION ALL SELECT '||mviews.this_day||' AS
plainday, '||$2||' FROM '||$3||'_'||mviews.plainday||' WHERE '||$4;
        END IF;

        select into counter counter+1;
        END LOOP;

        select INTO  sql 'SELECT  '||$1||' FROM ('||sql||')foo '||$5;

   for result in   EXECUTE (sql) LOOP
     return  NEXT result;
   end loop;
 return ;

END;
$$ LANGUAGE plpgsql;



Note: in your case the function shoud have a further parameter to join
largetable(n) to smalltable in the "sub queries"

HTH,

Marc





> I've got a query that I think the query optimiser should be able
> to work it's magic on but it doesn't!  I've had a look around and
> asked on the IRC channel and found that the current code doesn't
> attempt to optimise for what I'm asking it to do at the moment.
> Here's a bad example:
>
>   SELECT u.txt
>   FROM smalltable t, (
>     SELECT id, txt FROM largetable1
>     UNION ALL
>     SELECT id, txt FROM largetable2) u
>   WHERE t.id = u.id
>     AND t.foo = 'bar';
>
> I was hoping that "smalltable" would get moved up into the union,
> but it doesn't at the moment and the database does a LOT of extra
> work.  In this case, I can manually do quite a couple of transforms
> to move things around and it does the right thing:
>
>   SELECT txt
>   FROM (
>     SELECT l.id as lid, r.id as rid, r.foo, l.txt
>       FROM largetable1 l, smalltable r
>     UNION ALL
>     SELECT l.id as lid, r.id as rid, r.foo, l.txt
>       FROM largetable1 l, smalltable r)
>   WHERE foo = 'bar';
>     AND lid = rid
>
> The optimiser is intelligent enough to move the where clauses up
> into the union and end end up with a reasonably optimal query.
> Unfortunatly, in real life, the query is much larger and reorganising
> everything manually isn't really feasible!

--
Weitersagen: GMX DSL-Flatrates mit Tempo-Garantie!
Ab 4,99 Euro/Monat: http://www.gmx.net/de/go/dsl