Re: Default setting for enable_hashagg_disk - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Default setting for enable_hashagg_disk |
Date | |
Msg-id | 20200409114855.h4hh7e2wt77p56sv@development Whole thread Raw |
In response to | Re: Default setting for enable_hashagg_disk (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: Default setting for enable_hashagg_disk
Re: Default setting for enable_hashagg_disk |
List | pgsql-hackers |
On Tue, Apr 07, 2020 at 05:39:01PM -0500, Justin Pryzby wrote: >On Tue, Apr 07, 2020 at 11:20:46AM -0700, Jeff Davis wrote: >> The enable_hashagg_disk GUC, if set to true, chooses HashAgg based on >> costing. If false, it only generates a HashAgg path if it thinks it will fit >> in work_mem, similar to the old behavior (though it wlil now spill to disk if >> the planner was wrong about it fitting in work_mem). The current default is >> true. > >Are there any other GUCs that behave like that ? It's confusing to me when I >see "Disk Usage: ... kB", despite setting it to "disable", and without the >usual disable_cost. I realize that postgres chose the plan on the hypothesis >that it would *not* exceed work_mem, and that spilling to disk is considered >preferable to ignoring the setting, and that "going back" to planning phase >isn't a possibility. > It it really any different from our enable_* GUCs? Even if you do e.g. enable_sort=off, we may still do a sort. Same for enable_groupagg etc. >template1=# explain (analyze, costs off, summary off) SELECT a, COUNT(1) FROM generate_series(1,999999) a GROUP BY 1 ; > HashAggregate (actual time=1370.945..2877.250 rows=999999 loops=1) > Group Key: a > Peak Memory Usage: 5017 kB > Disk Usage: 22992 kB > HashAgg Batches: 84 > -> Function Scan on generate_series a (actual time=314.507..741.517 rows=999999 loops=1) > >A previous version of the docs said this, which I thought was confusing, and you removed it. >But I guess this is the behavior it was trying to .. explain. > >+ <term><varname>enable_hashagg_disk</varname> (<type>boolean</type>) >+ ... This only affects the planner choice; >+ execution time may still require using disk-based hash >+ aggregation. The default is <literal>on</literal>. > >I suggest that should be reworded and then re-introduced, unless there's some >further behavior change allowing the previous behavior of >might-exceed-work-mem. > Yeah, it would be good to mention this is a best-effort setting. >"This setting determines whether the planner will elect to use a hash plan >which it expects will exceed work_mem and spill to disk. During execution, >hash nodes which exceed work_mem will spill to disk even if this setting is >disabled. To avoid spilling to disk, either increase work_mem (or set >enable_hashagg=off)." > >For sure the release notes should recommend re-calibrating work_mem. > I don't follow. Why would the recalibrating be needed? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: