Re: Abysmal hash join - Mailing list pgsql-performance

From: Tom Lane
Subject: Re: Abysmal hash join
Msg-id: 3204.1157989198@sss.pgh.pa.us
In response to: Re: Abysmal hash join (Florian Weimer <fweimer@bfk.de>)
Responses: Re: Abysmal hash join (Florian Weimer <fweimer@bfk.de>)
List: pgsql-performance
Florian Weimer <fweimer@bfk.de> writes:
>> Maybe larger stats targets would help.

> I've set default_statistics_target to 100 and rerun ANALYZE on that
> table.  The estimate went down to 43108 (and the hash join is still
> the preferred plan).  ANALYZE with default_statistics_target = 200
> (which seems pretty large to me) is down to 26050 and the bitmap scan
> plan is chosen.

> PostgreSQL seems to think that there are only very few distinct values
> for that column (with default_statistics_target = 100 and 200):
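(For reference, the adjustment described above amounts to something like the
following in a psql session; "big_table" and "dst_col" are only placeholders
for the actual table and column:)

    SET default_statistics_target = 200;   -- session-local, not persistent
    ANALYZE big_table;                     -- re-sample with the larger target
    SELECT n_distinct
      FROM pg_stats
     WHERE tablename = 'big_table' AND attname = 'dst_col';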

Yeah, n_distinct estimation from a sample is inherently hard :-(.  Given
that you have such a long tail on the distribution, it might be worth
your while to crank the stats target for that column all the way to the
maximum (1000).  Also you need to experiment with extending the stats
for the smaller table.
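For the per-column adjustment, something along these lines should do the
trick (table and column names here are placeholders):

    ALTER TABLE big_table   ALTER COLUMN join_col SET STATISTICS 1000;
    ALTER TABLE small_table ALTER COLUMN join_col SET STATISTICS 1000;
    ANALYZE big_table;
    ANALYZE small_table;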

I believe what's happening here is that the smaller table joins only to
less-frequent entries in the big table (correct?).  The hash join would
be appropriate if there were many rows joining to the very-frequent
entries, and the problem for the planner is to determine that that's not
so.  Given enough stats on the two joining columns, it should be able to
determine that.
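One way to see what the planner currently believes about the two joining
columns is to look at pg_stats (again with placeholder names):

    SELECT tablename, attname, n_distinct,
           most_common_vals, most_common_freqs
      FROM pg_stats
     WHERE tablename IN ('big_table', 'small_table')
       AND attname = 'join_col';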

Of course, large stats targets will slow down planning to some extent,
so you should also keep an eye on how long it takes to plan the query.
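A rough way to watch that is to time a bare EXPLAIN in psql, since EXPLAIN
plans the query without executing it (the query below is only a placeholder):

    \timing
    EXPLAIN SELECT *
      FROM big_table b JOIN small_table s ON b.join_col = s.join_col;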

            regards, tom lane
