Re: -HEAD planner issue wrt hash_joins on dbt3 ? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: -HEAD planner issue wrt hash_joins on dbt3 ?
Date
Msg-id 570.1159307608@sss.pgh.pa.us
Whole thread Raw
In response to Re: -HEAD planner issue wrt hash_joins on dbt3 ?  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
Responses Re: -HEAD planner issue wrt hash_joins on dbt3 ?  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List pgsql-hackers
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Tom Lane wrote:
>> It evidently thinks that most of the rows in the join of part and
>> partsupp won't have any matching rows in lineitem, whereas on average
>> there are about 7 matching rows apiece.  So that's totally wacko, and
>> it's not immediately obvious why.  Could we see the pg_stats entries for
>> part.p_partkey, partsupp.ps_partkey, partsupp.ps_suppkey,
>> lineitem.l_partkey, lineitem.l_suppkey?

> http://www.kaltenbrunner.cc/files/dbt3_beta1_statistics.txt

OK, so we have 2 million parts and 100000 suppliers, and ANALYZE doesn't
seem to have been too far off at estimating either of those numbers.
I think the problem is that there are not very many suppliers for any
particular part, and thus the condition "part match AND supplier match"
is really not much more selective than "part match" alone.  The planner
is supposing that their selectivities are independent, which they
aren't.

Offhand I don't see any good way to fix this without multi-column
statistics, which is something that's certainly not happening for 8.2 :-(
        regards, tom lane


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Constant changes (Re-Build)
Next
From: Tom Lane
Date:
Subject: Isn't strdup.h useless code?