Thread: 7k records into Sort node, 4.5m out?

7k records into Sort node, 4.5m out?

From: Christophe Pettus
Consider this EXPLAIN ANALYZE output:

    http://explain.depesz.com/s/TCi

Note the Bitmap Heap Scan at the bottom claims to be producing 7094 rows, and the
Sort above it expects to be processing 7330 rows (the same number the Bitmap Heap
Scan expected to produce)... but the sort is actually producing 4512231 rows, which
the sort time would indicate is what really happened.  How can this be?

--
-- Christophe Pettus
   xof@thebuild.com



Re: 7k records into Sort node, 4.5m out?

From: "Tomas Vondra"
On 14 August 2012, 3:15, Christophe Pettus wrote:
> Consider this EXPLAIN ANALYZE output:
>
>     http://explain.depesz.com/s/TCi
>
> Note the Bitmap Heap Scan at the bottom claims to be producing 7094 rows,
> and the Sort above it expects to be processing 7330 rows (the same number
> the Bitmap Heap Scan expected to produce)... but the sort is actually
> producing 4512231 rows, which the sort time would indicate is what really
> happened.  How can this be?

Hi,

notice there's a merge join right above the sort. If there are duplicate
values in the first table (charlie in the explain plans), the matching
rows from the sort will be read repeatedly (re-scanned) and thus counted
multiple times.
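
A minimal self-contained sketch of the effect (made-up temp tables, not the tables
from your plan): every duplicate key on the outer side makes the merge join restore
to its mark and re-read the matching inner rows, and each re-read row is counted
again in the ANALYZE numbers.

    CREATE TEMP TABLE outer_side AS
      SELECT (i % 10) AS k FROM generate_series(1, 10000) AS i;  -- 1000 rows per key
    CREATE TEMP TABLE inner_side AS
      SELECT (i % 10) AS k FROM generate_series(1, 100) AS i;    -- 10 rows per key

    SET enable_hashjoin = off;   -- force a merge join for the demo
    SET enable_nestloop = off;
    EXPLAIN ANALYZE
    SELECT count(*) FROM outer_side o JOIN inner_side i USING (k);
    -- Whichever input ends up as the inner side of the Merge Join, its Sort
    -- (or Materialize, if the planner adds one) reports roughly 100000 actual
    -- rows, far more than it was actually fed.
    RESET enable_hashjoin;
    RESET enable_nestloop;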

Tomas



Re: 7k records into Sort node, 4.5m out?

From: Christophe Pettus
On Aug 13, 2012, at 6:35 PM, Tomas Vondra wrote:

> On 14 August 2012, 3:15, Christophe Pettus wrote:
>> Consider this EXPLAIN ANALYZE output:
>>
>>     http://explain.depesz.com/s/TCi
>>
>> Note the Bitmap Heap Scan at the bottom claims to be producing 7094 rows,
>> and the Sort above it expects to be processing 7330 rows (the same number
>> the Bitmap Heap Scan expected to produce)... but the sort is actually
>> producing 4512231 rows, which the sort time would indicate is what really
>> happened.  How can this be?
>
> Hi,
>
> notice there's a merge join right above the sort. If there are duplicate
> values in the first table (charlie in the explain plans), the matching
> rows from the sort will be read repeatedly (re-scanned) and thus counted
> multiple times.

Thanks, that makes sense.  Something a colleague of mine just noticed is that the
estimated cost of the Index Scan node isn't being included in the cost of the Merge
Join above it, which makes the Merge Join seem much cheaper than it really is.
Could this be a planner bug?

--
-- Christophe Pettus
   xof@thebuild.com



Re: 7k records into Sort node, 4.5m out?

From: Tom Lane
Christophe Pettus <xof@thebuild.com> writes:
> Thanks, that makes sense.  Something a colleague of mine just noticed is that
> the estimated cost of the Index Scan node isn't being included in the cost of
> the Merge Join above it, which makes the Merge Join seem much cheaper than it
> really is.  Could this be a planner bug?

No, that looks sane.  It's probably expecting that the range of keys on
the right-hand side is a lot less than the range of keys on the left,
and thus the merge won't have to read all of the left side.  Since the
output shows an estimated total number of rows in the LHS of 84 million,
but the join stopped after reading 20 million of them, it looks like
that effect did in fact occur.  If the planner had that fraction dead
on, it would only have charged the mergejoin with a quarter of the
indexscan's total estimated cost.  It's hard to tell though exactly what
it did think.

The whole thing looks a bit weird to me --- why did it not use a
nestloop join with inner indexscan on charlie?  With 7000 rows on the
other side, the estimated cost for that shouldn't have been more than
about 30000 ...

            regards, tom lane


Re: 7k records into Sort node, 4.5m out?

From: Christophe Pettus
On Aug 13, 2012, at 7:11 PM, Tom Lane wrote:
> The whole thing looks a bit weird to me --- why did it not use a
> nestloop join with inner indexscan on charlie?  With 7000 rows on the
> other side, the estimated cost for that shouldn't have been more than
> about 30000 ...

Here's the same query with set enable_mergejoin = off.  All of the other query tuning parameters are default.

    http://explain.depesz.com/s/dqO
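
(For the archives: a convenient way to run this kind of comparison without leaking
the setting into the rest of the session is to scope it to a single transaction.
The query below is only a trivial stand-in for the real one.)

    BEGIN;
    SET LOCAL enable_mergejoin = off;   -- discourages merge joins for this transaction only
    EXPLAIN ANALYZE
    SELECT count(*)
    FROM generate_series(1, 1000) AS a(x)
    JOIN generate_series(1, 1000) AS b(x) USING (x);
    ROLLBACK;                           -- SET LOCAL reverts here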

--
-- Christophe Pettus
   xof@thebuild.com



Re: 7k records into Sort node, 4.5m out?

From: Tom Lane
Christophe Pettus <xof@thebuild.com> writes:
> On Aug 13, 2012, at 7:11 PM, Tom Lane wrote:
>> The whole thing looks a bit weird to me --- why did it not use a
>> nestloop join with inner indexscan on charlie?  With 7000 rows on the
>> other side, the estimated cost for that shouldn't have been more than
>> about 30000 ...

> Here's the same query with set enable_mergejoin = off.  All of the other query tuning parameters are default.

>     http://explain.depesz.com/s/dqO

Maybe you had better show us the actual query, and the table/index
definitions.  Because it's sure making odd choices here.  This seems
like the wrong join order altogether ...

            regards, tom lane


Re: 7k records into Sort node, 4.5m out?

From: Josh Berkus
On 8/15/12 1:51 PM, Tom Lane wrote:
> Christophe Pettus <xof@thebuild.com> writes:
>> On Aug 13, 2012, at 7:11 PM, Tom Lane wrote:
>>> The whole thing looks a bit weird to me --- why did it not use a
>>> nestloop join with inner indexscan on charlie?  With 7000 rows on the
>>> other side, the estimated cost for that shouldn't have been more than
>>> about 30000 ...
>
>> Here's the same query with set enable_mergejoin = off.  All of the other query tuning parameters are default.
>
>>     http://explain.depesz.com/s/dqO
>
> Maybe you had better show us the actual query, and the table/index
> definitions.  Because it's sure making odd choices here.  This seems
> like the wrong join order altogether ...

We'll need to do that off-list for confidentiality reasons.


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: 7k records into Sort node, 4.5m out?

From: Tom Lane
Josh Berkus <josh@agliodbs.com> writes:
> On 8/15/12 1:51 PM, Tom Lane wrote:
>> Maybe you had better show us the actual query, and the table/index
>> definitions.  Because it's sure making odd choices here.  This seems
>> like the wrong join order altogether ...

> We'll need to do that off-list for confidentiality reasons.

If you can show us the anonymized query plan, why not the anonymized query?
It doesn't look like it could be all that complicated.

            regards, tom lane


Re: 7k records into Sort node, 4.5m out?

From: Josh Berkus
> If you can show us the anonymized query plan, why not the anonymized query?
> It doesn't look like it could be all that complicated.

It's not:

SELECT COUNT(*)
FROM "user"
INNER JOIN "house"
      ON ("user"."house_id" = "house"."id")
LEFT OUTER JOIN "district"
      ON ("house"."district_id" = "district"."id")
WHERE ("user"."status" = 0
     AND ("district"."update_status" = 2
                OR "district"."update_status" = 3 )
     AND ("user"."valid" = 1
                OR "user"."valid" = 3 )
     AND "district"."is_test" = false );

However, since the anonymization above doesn't quite match that used in
the EXPLAIN plan, I'm not sure what you'll get out of it.  And yes, we
know that the outer join is being invalidated.
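
That is, because the WHERE clause demands non-NULL "district" values, the LEFT JOIN
is effectively reduced to an inner join, and the query above is equivalent to:

    SELECT COUNT(*)
    FROM "user"
    INNER JOIN "house"    ON "user"."house_id" = "house"."id"
    INNER JOIN "district" ON "house"."district_id" = "district"."id"
    WHERE "user"."status" = 0
      AND "district"."update_status" IN (2, 3)
      AND "user"."valid" IN (1, 3)
      AND "district"."is_test" = false;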

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: 7k records into Sort node, 4.5m out?

From: Tom Lane
Josh Berkus <josh@agliodbs.com> writes:
> SELECT COUNT(*)
> FROM "user"
> INNER JOIN "house"
>       ON ("user"."house_id" = "house"."id")
> LEFT OUTER JOIN "district"
>       ON ("house"."district_id" = "district"."id")
> WHERE ("user"."status" = 0
>      AND ("district"."update_status" = 2
>                 OR "district"."update_status" = 3 )
>      AND ("user"."valid" = 1
>                 OR "user"."valid" = 3 )
>      AND "district"."is_test" = false );

> However, since the anonymization above doesn't quite match that used in
> the EXPLAIN plan, I'm not sure what you'll get out of it.  And yes, we
> know that the outer join is being invalidated.

Ah, I see where I was confused: in the original query plan I'd been
imagining that charlie.sierra was a unique column, but your gloss on
that as being house.district_id implies that it's highly non-unique.
And looking at the rowcounts in the original plan backs that up:
there are about 600 house rows per district row.  So my thought of
having district as the outer side of a nestloop scanning the index
on house.district_id would not really work very well --- maybe it
would end up cheaper than the mergejoin plan, but it's far from a
clear-cut win.

On the whole I'm thinking the code is operating as designed here.

            regards, tom lane


Re: 7k records into Sort node, 4.5m out?

From: Josh Berkus
> Ah, I see where I was confused: in the original query plan I'd been
> imagining that charlie.sierra was a unique column, but your gloss on
> that as being house.district_id implies that it's highly non-unique.
> And looking at the rowcounts in the original plan backs that up:
> there are about 600 house rows per district row.  So my thought of
> having district as the outer side of a nestloop scanning the index
> on house.district_id would not really work very well --- maybe it
> would end up cheaper than the mergejoin plan, but it's far from a
> clear-cut win.
>
> On the whole I'm thinking the code is operating as designed here.

Well, except for the part where it's choosing a plan which takes 486
seconds over a plan which takes 4 seconds.

I guess what I'm really not understanding is why it's calculating a cost
of 3.7m for the index scan, and then discarding that *entire* cost and
not including it in the total cost of the query?  This seems wrong,
especially since that index scan, in fact, ends up being 85% of the
execution time of the query:

 Merge Join  (cost=7457.670..991613.190 rows=1092168 width=4)
             (actual time=57.854..481062.706 rows=4514968 loops=1)
   Merge Cond: (charlie.sierra = four.quebec_seven)

 Index Scan using whiskey_delta on charlie
             (cost=0.000..3775171.860 rows=84904088 width=8)
             (actual time=0.006..459501.341 rows=20759070 loops=1)

If the cost of the index scan were included in the total cost of the
query plan, then the planner *would* choose the nestloop plan instead.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: 7k records into Sort node, 4.5m out?

From: Tom Lane
Josh Berkus <josh@agliodbs.com> writes:
> I guess what I'm really not understanding is why it's calculating a cost
> of 3.7m for the index scan, and then discarding that *entire* cost and
> not including it in the total cost of the query?

It isn't ... or at least, you've offered no evidence that it is.
It's discounting some fraction of the cost on the (apparently correct)
basis that the merge won't read that input all the way to the end.
Whether it's discounted by an appropriate fraction is hard to tell
from the information given.  The actual rows count is about a quarter
the whole-scan estimate, so a multiplier of 0.25 seems right in
hindsight, and that seems to match up roughly right with the mergejoin
cost estimate --- but not knowing the actual table size, there's a lot
of uncertainty here.
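
A back-of-the-envelope check against the numbers in the plan excerpt upthread,
taking ~0.245 as the hindsight fraction:

    SELECT 20759070.0 / 84904088   AS fraction_of_lhs_actually_read,  -- ~0.245
           0.245 * 3775171.86      AS indexscan_cost_discounted;      -- ~925k, in the same
                                                                      -- ballpark as the 991613
                                                                      -- charged to the mergejoin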

            regards, tom lane


Re: 7k records into Sort node, 4.5m out?

From: Josh Berkus
> It isn't ... or at least, you've offered no evidence that it is.

Sorry, I thought Christophe had sent you the details offlist.  Checking ...


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com