Thread: inheritance, and plans

inheritance, and plans

From

Grzegorz Jaśkiewicz

Date:

06 February 2009, 14:03:58

Hey folks,

I have few tables, that inherit from table X.
The query I perform, tries to obtain information about changes in all
tables that inherit from X,
aside from that, I have table Y that keeps another information related
to changes, but in bit different schema.
Anyway, there is one unique id field, shared amongst them.

When I want to obtain all that information, I do:


select updateid from (
  select updateid from r.skel
union all
  select updateid from r.history
) as foo
  where  updateid > 1232634919168805;

And what amazes me, is that no matter what value I choose in where X >
, postgres will always think this is the best plan:

                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Subquery Scan foo  (cost=0.00..167736.75 rows=978726 width=8)
   Filter: (foo.updateid > 1232634919168805::bigint)
   ->  Append  (cost=0.00..131034.54 rows=2936177 width=8)
         ->  Subquery Scan "*SELECT* 1"  (cost=0.00..130999.94
rows=2934947 width=8)
               ->  Result  (cost=0.00..101650.47 rows=2934947 width=8)
                     ->  Append  (cost=0.00..101650.47 rows=2934947 width=8)
                           ->  Seq Scan on skel  (cost=0.00..24.80
rows=1480 width=8)
                           ->  Seq Scan on a skel
(cost=0.00..22028.96 rows=923596 width=8)
                           ->  Seq Scan on b skel  (cost=0.00..8.01
rows=201 width=8)
                           ->  Seq Scan on c skel  (cost=0.00..1.81
rows=81 width=8)
                           ->  Seq Scan on d skel
(cost=0.00..22117.94 rows=923594 width=8)
                           ->  Seq Scan on e skel  (cost=0.00..6.03
rows=303 width=8)
                           ->  Seq Scan on f skel  (cost=0.00..6.02
rows=202 width=8)
                           ->  Seq Scan on g skel  (cost=0.00..1987.40
rows=85140 width=8)
                           ->  Seq Scan on h skel  (cost=0.00..1.01
rows=1 width=8)
                           ->  Seq Scan on i skel
(cost=0.00..55454.99 rows=999999 width=8)
                           ->  Seq Scan on j skel  (cost=0.00..13.50
rows=350 width=8)
         ->  Seq Scan on history  (cost=0.00..22.30 rows=1230 width=8)
(18 rows)

so my question is, why isn't postgres use index on some tables , and
search for the X > N individually ?

Because, yet - I tried to recreate problem, but I wasn't able. I have
this test db:

create schema r;

create sequence fooseq;
create domain r.fooint AS bigint NOT NULL default  nextval('fooseq');

create table skel(aid r.fooint, cd timestamp default now() not null);

create table one( a bigserial, aid r.fooint, cd timestamp not null);
create table two( a bigserial, aid r.fooint, cd timestamp not null);
create table three( a bigserial, aid r.fooint, cd timestamp not null);
create table four( a bigserial, aid r.fooint, cd timestamp not null);
create table five( a bigserial, aid r.fooint, cd timestamp not null);

create unique index one_aid on one(aid);
create unique index two_aid on two(aid);
create unique index three_aid on three(aid);
create unique index four_aid on four(aid);
create unique index five_aid on five(aid);


create table numbers( something int default random()*666, aid_foo r.fooint);

create unique index numbers_aid on numbers(aid_foo);

insert into one(a, cd) select generate_series(1,2000000), now();
insert into two(a, cd) select generate_series(1,200000), now();
insert into three(a, cd) select generate_series(1,2200000), now();
insert into four(a, cd) select generate_series(1,2200000), now();
insert into five(a, cd) select generate_series(1,2200000), now();

insert into numbers(something) select generate_series(1,870000);

alter table one inherit skel;
alter table two inherit skel;
alter table three inherit skel;
alter table four inherit skel;
alter table five inherit skel;


But no matter how many tables I throw in ( and I got to 20 ) - it will
always do it right:
gjaskie=# explain  select aid from (select aid from skel union all
select aid_foo as aid from numbers) AS foo where  aid > 999000;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Result  (cost=0.00..178034.88 rows=8661268 width=8)
   ->  Append  (cost=0.00..178034.88 rows=8661268 width=8)
         ->  Seq Scan on skel  (cost=0.00..32.12 rows=590 width=8)
               Filter: ((aid)::bigint > 999000)
         ->  Index Scan using one_aid on one skel
(cost=0.00..34549.76 rows=991445 width=8)
               Index Cond: ((aid)::bigint > 999000)
         ->  Seq Scan on two skel  (cost=0.00..3774.00 rows=199980 width=8)
               Filter: ((aid)::bigint > 999000)
         ->  Seq Scan on three skel  (cost=0.00..41513.00 rows=2199780 width=8)
               Filter: ((aid)::bigint > 999000)
         ->  Seq Scan on four skel  (cost=0.00..41513.00 rows=2199780 width=8)
               Filter: ((aid)::bigint > 999000)
         ->  Seq Scan on five skel  (cost=0.00..41513.00 rows=2199780 width=8)
               Filter: ((aid)::bigint > 999000)
         ->  Seq Scan on numbers  (cost=0.00..15140.00 rows=869913 width=8)
               Filter: ((aid_foo)::bigint > 999000)
(16 rows)

Time: 36.326 ms



But, if I add another union, it screws it up:

gjaskie=# explain  select aid from (select aid from skel union all
select aid_foo as aid from numbers union all select 1 aid) AS foo
where  aid > 999000;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Subquery Scan foo  (cost=0.00..374659.56 rows=3223924 width=8)
   Filter: (foo.aid > 999000)
   ->  Append  (cost=0.00..253762.42 rows=9671771 width=8)
         ->  Result  (cost=0.00..253762.40 rows=9671770 width=8)
               ->  Append  (cost=0.00..253762.40 rows=9671770 width=8)
                     ->  Result  (cost=0.00..144079.70 rows=8801770 width=8)
                           ->  Append  (cost=0.00..144079.70
rows=8801770 width=8)
                                 ->  Seq Scan on skel
(cost=0.00..27.70 rows=1770 width=8)
                                 ->  Seq Scan on one skel
(cost=0.00..32739.00 rows=2000000 width=8)
                                 ->  Seq Scan on two skel
(cost=0.00..3274.00 rows=200000 width=8)
                                 ->  Seq Scan on three skel
(cost=0.00..36013.00 rows=2200000 width=8)
                                 ->  Seq Scan on four skel
(cost=0.00..36013.00 rows=2200000 width=8)
                                 ->  Seq Scan on five skel
(cost=0.00..36013.00 rows=2200000 width=8)
                     ->  Seq Scan on numbers  (cost=0.00..12965.00
rows=870000 width=8)
         ->  Subquery Scan "*SELECT* 3"  (cost=0.00..0.02 rows=1 width=0)
               ->  Result  (cost=0.00..0.01 rows=1 width=0)
(16 rows)

Time: 1.502 ms

now the question is, how my test db's query:
  select aid from (select aid from skel union all select aid_foo as
aid from numbers union all select 1 aid) AS foo where  aid > 999000;

differ from original:
  select updateid from ( select updateid from r.skel union all select
updateid from r.history ) as foo where  updateid > 1232634919168805;


Oh, and the value N doesn't change the plan here either :/

tested on both 8.3 and 8.4, same results..
ideas welcomed

--
GJ

Re: inheritance, and plans

From

Robert Haas

Date:

06 February 2009, 14:15:23

Just guessing here, but what values are you using for
join_collapse_limit and from_collapse_limit, and what happens if you
make them much bigger (like 100)?

...Robert

Re: inheritance, and plans

From

Grzegorz Jaśkiewicz

Date:

06 February 2009, 14:27:09

On Fri, Feb 6, 2009 at 6:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Just guessing here, but what values are you using for
> join_collapse_limit and from_collapse_limit, and what happens if you
> make them much bigger (like 100)?
both default values, afair = 8.

Re: inheritance, and plans

From

Grzegorz Jaśkiewicz

Date:

06 February 2009, 14:27:42

On Fri, Feb 6, 2009 at 6:20 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> On Fri, Feb 6, 2009 at 6:15 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Just guessing here, but what values are you using for
>> join_collapse_limit and from_collapse_limit, and what happens if you
>> make them much bigger (like 100)?

changing them to 100 doesn't help a bit.



--
GJ

Re: inheritance, and plans

From

Tom Lane

Date:

06 February 2009, 17:50:27

=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes:
> so my question is, why isn't postgres use index on some tables , and
> search for the X > N individually ?

The UNION arms have to all be the same data type in order to have
restrictions pushed down through the UNION.  You did not show us
the table declarations for your first example, but I bet that updateid
isn't the same type in both.  (And yes, a domain is different from its
underlying type for this purpose.)

In the second example, "1" isn't even the right base type let alone
the same domain.

            regards, tom lane

Re: inheritance, and plans

From

Grzegorz Jaśkiewicz

Date:

07 February 2009, 06:59:53

On Fri, Feb 6, 2009 at 9:50 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The UNION arms have to all be the same data type in order to have
> restrictions pushed down through the UNION.  You did not show us
> the table declarations for your first example, but I bet that updateid
> isn't the same type in both.  (And yes, a domain is different from its
> underlying type for this purpose.)
I think you're right. The domain's in both cases (updateid and uri)
are bigints default nextval('something') not null;

and the r.history table's ones are just bigints not null. Same
underlying type, but not a domain. I'll try to alter it to domain
type, and see.

thanks.


--
GJ

Re: inheritance, and plans

From

Grzegorz Jaśkiewicz

Date:

07 February 2009, 07:03:25

that helped, thanks a lot Tom.

Looks like additional thing on 'pet peeves' list (from -general).
:P

Re: inheritance, and plans

From

Grzegorz Jaśkiewicz

Date:

07 February 2009, 13:23:46

so Tom,
with a little help on Irc from Andrew (RhodiumToad) I got it 'fixed',
but I know this is just a hack:

Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.258
diff -u -r1.258 selfuncs.c
--- src/backend/utils/adt/selfuncs.c    1 Jan 2009 17:23:50 -0000    1.258
+++ src/backend/utils/adt/selfuncs.c    7 Feb 2009 17:20:21 -0000
@@ -3392,7 +3392,7 @@
 static double
 convert_numeric_to_scalar(Datum value, Oid typid)
 {
-    switch (typid)
+    switch (getBaseType(typid))
     {
         case BOOLOID:
             return (double) DatumGetBool(value);
ndex: src/backend/optimizer/path/allpaths.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/optimizer/path/allpaths.c,v
retrieving revision 1.179
diff -u -r1.179 allpaths.c
--- src/backend/optimizer/path/allpaths.c    1 Jan 2009 17:23:43 -0000    1.179
+++ src/backend/optimizer/path/allpaths.c    7 Feb 2009 17:21:25 -0000
@@ -33,6 +33,9 @@
 #include "parser/parse_clause.h"
 #include "parser/parsetree.h"
 #include "rewrite/rewriteManip.h"
+#include "utils/lsyscache.h"
+#include "utils/syscache.h"
+


 /* These parameters are set by GUC */
@@ -1042,7 +1045,7 @@
             continue;            /* ignore resjunk columns */
         if (colType == NULL)
             elog(ERROR, "wrong number of tlist entries");
-        if (exprType((Node *) tle->expr) != lfirst_oid(colType))
+        if (exprType((Node *) tle->expr) != lfirst_oid(colType) &&
getBaseType(exprType((Node *) tle->expr)) != lfirst_oid(colType))
             differentTypes[tle->resno] = true;
         colType = lnext(colType);
     }




This is just a hack, and I know it won't work. But the question is, is
it possible to fix it ?
I think something should convert types to base type somewhere up the
call level. Perhaps we should pass on oid of type+baseType ?

I also noticed that the  convert_numeric_to_scalar() family of
functions is considered a future change too, atm everything there is
hardcoded pretty much.

Re: inheritance, and plans

From

Tom Lane

Date:

07 February 2009, 13:39:30

=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?= <gryzman@gmail.com> writes:
> with a little help on Irc from Andrew (RhodiumToad) I got it 'fixed',
> but I know this is just a hack:

You seem to be laboring under the delusion that this is considered a
bug.  It's a necessary semantic restriction, because the pushed-down
expression could mean different things when applied to different
data types.

            regards, tom lane

Re: inheritance, and plans

From

Grzegorz Jaśkiewicz

Date:

07 February 2009, 13:43:51

On Sat, Feb 7, 2009 at 5:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> You seem to be laboring under the delusion that this is considered a
> bug.  It's a necessary semantic restriction, because the pushed-down
> expression could mean different things when applied to different
> data types.

Very true Tom, still I was hoping this could be seen as an improvement
- because in fact in my case it would be .


Thanks.


--
GJ

Re: inheritance, and plans

From

Grzegorz Jaśkiewicz

Date:

07 February 2009, 13:53:00

and frankly I still (and few others) think it is a defect, for domain
with some base type should be treated as such. It is after all treated
that way when you create index.

Re: inheritance, and plans

From

Andrew Gierth

Date:

07 February 2009, 14:18:47

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 [domain -> base type conversion interfering with optimization]

 Tom> You seem to be laboring under the delusion that this is
 Tom> considered a bug.

Of course it's a bug, or at least a missing feature - there is no
justification for putting performance deathtraps in the way of using
domains.

 Tom> It's a necessary semantic restriction, because the pushed-down
 Tom> expression could mean different things when applied to different
 Tom> data types.

How?

Type-dependent selection of operators has already been done as part of
parse analysis, no? And the domain -> base conversion is purely a
relabelling, no? So what semantic change is possible as a result?

--
Andrew (irc:RhodiumToad)

Re: inheritance, and plans

From

Tom Lane

Date:

08 February 2009, 14:34:17

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Type-dependent selection of operators has already been done as part of
> parse analysis, no? And the domain -> base conversion is purely a
> relabelling, no? So what semantic change is possible as a result?

Domain conversions are *not* simply relabellings.  It's possible now to
have domain-specific functions/operators, and the likely future
evolution of the system is to make that work better rather than remove it.

It's possible that there are specific cases where the UNION optimization
checks could allow domains to be treated as their base types, but
blindly smashing both sides of the check to base is going to break more
cases than it fixes.

            regards, tom lane

Re: inheritance, and plans

From

Andrew Gierth

Date:

08 February 2009, 14:59:00

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 > Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
 >> Type-dependent selection of operators has already been done as
 >> part of parse analysis, no? And the domain -> base conversion is
 >> purely a relabelling, no? So what semantic change is possible as a
 >> result?

 Tom> Domain conversions are *not* simply relabellings.  It's possible
 Tom> now to have domain-specific functions/operators,

Right, but that's irrelevent to the planner in this case because the
resolution of precisely which operator is being called has _already
happened_ (in parse analysis), no?

 Tom> It's possible that there are specific cases where the UNION
 Tom> optimization checks could allow domains to be treated as their
 Tom> base types,

The domain -> base conversion is an important one (to anyone who uses
domains) because it happens implicitly in a wide range of contexts,
and so it's unsatisfactory for it to have major performance impacts
such as interfering with important optimizations.

 Tom> but blindly smashing both sides of the check to base is going to
 Tom> break more cases than it fixes.

The posted code was only looking up the base type for one side, not
both (though I don't know that code well enough to know whether it was
the correct side); the case of interest is when the subquery has the
domain type but the outer query is seeing the base type, _not_ the
reverse.

--
Andrew.

Re: inheritance, and plans

From

Grzegorz Jaśkiewicz

Date:

08 February 2009, 15:37:38

On Sun, Feb 8, 2009 at 6:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> It's possible that there are specific cases where the UNION optimization
> checks could allow domains to be treated as their base types, but
> blindly smashing both sides of the check to base is going to break more
> cases than it fixes.

What my little patch was trying to proof, is whether, that part of
planner could construct the plan better, if it had a notion of base
type in domains.
Which I still believe it should do, based on my simple test.

Again, I am not saying, that postgresql should treat domains just as
an alias to base type, but I do believe that it should at least
construct plan better - based on the base type, not the domain's oid.

If you know what it might possibly break, can you give some examples please ?

--
GJ