Thread: Re: Order by (for 15 rows) adds 30 seconds to query time

Re: Order by (for 15 rows) adds 30 seconds to query time

From
"Kevin Grittner"
Date:
Tom Lane wrote:

> That does look weird. Do we have a self-contained test case?

I've been tinkering with this and I now have a self-contained test
case (SQL statements and run results attached). I've debugged through
it and things don't seem right in set_append_rel_pathlist, since
childrel->rows seems to contain the total rows in each table rather
than the number which meet the join conditions. That is reflected in
the output from OPTIMIZER_DEBUG logging, but not in the query plan
from ANALYZE?

I'm afraid I'm a bit stuck on getting farther.  Hopefully this much
will be of use to someone.  If you could point out where I should
have looked next, I'd be grateful.  :-)

-Kevin


Attachment

Re: Order by (for 15 rows) adds 30 seconds to query time

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane wrote:
>> That does look weird. Do we have a self-contained test case?

> I've been tinkering with this and I now have a self-contained test
> case (SQL statements and run results attached). I've debugged through
> it and things don't seem right in set_append_rel_pathlist, since
> childrel->rows seems to contain the total rows in each table rather
> than the number which meet the join conditions.

Yeah, that is expected.  Nestloop inner indexscans have a rowcount
estimate that is different from that of the parent table --- the
parent's rowcount is what would be applicable for another type of
join, such as merge or hash, where the join condition is applied at
the join node not in the relation scan.

The problem here boils down to the fact that examine_variable punts on
appendrel variables:

        else if (rte->inh)
        {
            /*
             * XXX This means the Var represents a column of an append
             * relation. Later add code to look at the member relations and
             * try to derive some kind of combined statistics?
             */
        }

This means you get a default estimate for the selectivity of the join
condition, so the joinrel size estimate ends up being 0.005 * 1 * 40000.
That's set long before we ever generate indexscan plans, and I don't
think there's any clean way to correct the size estimate when we do.

Fixing this has been on the to-do list since forever.  I don't think
we'll make much progress on it until we have an explicit notion of
partitioned tables.  The approach contemplated in the comment, of
assembling some stats on-the-fly from the stats for individual child
tables, doesn't seem real practical from a planning-time standpoint.
The thing that you really want to know here is that there will be only
one matching id value in the whole partitioned table; and that would be
something trivial to know if we understood about partitioning keys,
but it's difficult to extract from independent sets of stats.

            regards, tom lane

Re: Order by (for 15 rows) adds 30 seconds to query time

From
"Kevin Grittner"
Date:
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Yeah, that is expected.  Nestloop inner indexscans have a rowcount
> estimate that is different from that of the parent table --- the
> parent's rowcount is what would be applicable for another type of
> join, such as merge or hash, where the join condition is applied
> at the join node not in the relation scan.
>
> The problem here boils down to the fact that examine_variable
> punts on appendrel variables:

>  * XXX This means the Var represents a column of an append
>  * relation. Later add code to look at the member relations and
>  * try to derive some kind of combined statistics?

> This means you get a default estimate for the selectivity of the
> join condition, so the joinrel size estimate ends up being 0.005 *
> 1 * 40000.  That's set long before we ever generate indexscan
> plans, and I don't think there's any clean way to correct the size
> estimate when we do.

Thanks for the explanation.

> Fixing this has been on the to-do list since forever.

Which item?  I looked and couldn't find one which seems to fit.
(I was hoping to find a reference to a discussion thread.)

> I don't think we'll make much progress on it until we have an
> explicit notion of partitioned tables.

I'm not clear that a generalized solution for partitioned tables
would solve the production query from the OP.  The OP was using
table extension to model the properties of the data.  In the actual
production problem, the table structure involved, for example,
materials -- some of which were containers (which had all the
properties of other materials, plus some unique to containers); so
the containers table extends the materials table to model that.  In
the problem query, he wanted to find all the materials related to
some item.  He was also joining to location, which might be (among
other things) a waypoint or a container (both extending location).
Note that a container is both a location and a material.

> The approach contemplated in the comment, of assembling some stats
> on-the-fly from the stats for individual child tables, doesn't
> seem real practical from a planning-time standpoint.

Can you give a thumbnail sketch of why that is?

> The thing that you really want to know here is that there will be
> only one matching id value in the whole partitioned table

It would seem to matter nearly as much if statistics indicated you
would get five rows out of twenty million.

> it's difficult to extract from independent sets of stats.

Since we make the attempt for most intermediate results, it's not
immediately clear to me why it's so hard here.  Not that I'm
doubting your assertion that it *is* hard; I'm just trying to see
*why* it is.  Perhaps the code which generates such estimates for
everything else could be made available here?

The usefulness of inheritance to model data would seem to be rather
limited without better optimizer support.

-Kevin

Re: Order by (for 15 rows) adds 30 seconds to query time

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The approach contemplated in the comment, of assembling some stats
>> on-the-fly from the stats for individual child tables, doesn't
>> seem real practical from a planning-time standpoint.

> Can you give a thumbnail sketch of why that is?

Well, it would be expensive, and it's not even clear you can do it at
all (merging histograms with overlapping bins seems like a mess for
instance).

I think we have previously discussed the idea of generating and storing
ANALYZE stats for a whole inheritance tree, which'd solve the problem
nicely from the planner's standpoint.  I'm a bit worried about the
locking implications, but if it took just a SELECT lock on the child
tables it probably wouldn't be too bad --- no worse than any other
SELECT on the inheritance tree.  Another thing that's hard to figure out
is how autovacuum would know when to redo the stats.  In a lot of common
situations, the inheritance parent table is empty and never changes, so
no autovac or autoanalyze would ever get launched against it.

            regards, tom lane