Re: Abysmal hash join - Mailing list pgsql-performance

From Florian Weimer
Subject Re: Abysmal hash join
Date
Msg-id 82sliyweny.fsf@mid.bfk.de
In response to Re: Abysmal hash join  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
* Tom Lane:

> Yeah, n_distinct estimation from a sample is inherently hard :-(.  Given
> that you have such a long tail on the distribution, it might be worth
> your while to crank the stats target for that column all the way to the
> maximum (1000).

I've done that.  Fortunately, ANALYZE time didn't increase that much
compared to the default (just a factor of ten).  The bitmap scan row
estimate is still way off (around 8000), but let's hope it won't
matter in practice.
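For reference, this is roughly what raising the per-column statistics
target and re-analyzing looks like (table and column names here are
placeholders, not the actual ones from this thread):

```sql
-- Hypothetical names; substitute the real table and join column.
ALTER TABLE big_table ALTER COLUMN join_col SET STATISTICS 1000;
ANALYZE big_table;

-- Inspect the resulting n_distinct estimate the planner will use:
SELECT n_distinct
FROM pg_stats
WHERE tablename = 'big_table' AND attname = 'join_col';
```

A negative n_distinct value means the estimate is expressed as a
fraction of the row count rather than an absolute number.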

> Also you need to experiment with extending the stats for the smaller
> table.

Yeah, the situation is quite similar, but on a much smaller scale.

> I believe what's happening here is that the smaller table joins only to
> less-frequent entries in the big table (correct?).

Almost.  We won't select the rows based on these values, at least not
in queries of that type.  The reason is simply that the result set is
too large to be useful.

> Of course, large stats targets will slow down planning to some extent,
> so you should also keep an eye on how long it takes to plan the query.

These queries are mostly ad-hoc, so a delay of a couple of seconds
doesn't matter.  It would be a different story only if you had to
wait five minutes.
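One simple way to keep an eye on planning cost, assuming you work from
psql, is to time a bare EXPLAIN, which plans the query without
executing it (the query below is a stand-in, not one from this thread):

```sql
-- \timing reports elapsed time for each statement in psql.
\timing on

-- A bare EXPLAIN only plans; the reported time is dominated by planning.
EXPLAIN SELECT * FROM big_table b JOIN small_table s ON b.join_col = s.join_col;
```

If the timed EXPLAIN grows noticeably after raising the statistics
target, that is the planning overhead Tom mentions.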

It seems that the situation is under control now.  Thanks.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Durlacher Allee 47            tel: +49-721-96201-1
D-76131 Karlsruhe             fax: +49-721-96201-99
