On Sat, Apr 20, 2019 at 04:26:34PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> Considering how rare this issue likely is, we need to be looking for a
>> solution that does not break the common case.
>
>Agreed. What I think we need to focus on next is why the code keeps
>increasing the number of batches. It seems like there must be an undue
>amount of data all falling into the same bucket ... but if it were simply
>a matter of a lot of duplicate hash keys, the growEnabled shutoff
>heuristic ought to trigger.
>
I think it's really a matter of an underestimate, which convinces the planner
to hash the larger table. In this case the table is 42GB, so it's possible
the code actually behaves as expected. With work_mem = 4MB I've seen 32k
batches, and that's not that far off, I'd say. Maybe there are some more
common values, but it does not seem like a particularly contrived data set.
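As a rough sanity check on those numbers (a minimal sketch in C; it only
mirrors the divide-and-round-to-a-power-of-two part of the initial batch
estimate, ignoring tuple overhead, buckets and so on):

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t inner_bytes = 42ULL * 1024 * 1024 * 1024; /* ~42GB inner side */
        uint64_t work_mem    = 4ULL * 1024 * 1024;         /* work_mem = 4MB */

        /* batches needed so each one fits in work_mem ... */
        uint64_t raw = (inner_bytes + work_mem - 1) / work_mem; /* = 10752 */

        /* ... rounded up to the next power of two */
        uint64_t nbatch = 1;
        while (nbatch < raw)
            nbatch <<= 1;                                       /* = 16384 */

        printf("raw = %llu, nbatch = %llu\n",
               (unsigned long long) raw, (unsigned long long) nbatch);
        return 0;
    }

So the observed 32k is just one doubling past that initial estimate, which
seems consistent with a modest underestimate rather than anything
pathological.
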
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services