Thread: Suspicious call of initial_cost_hashjoin()
try_partial_hashjoin_path() passes constant true to for the parallel_hash argument of initial_cost_hashjoin(). Shouldn't it instead pass the parallel_hash argument that it receives? This is related to commit 1804284042e659e7d16904e7bbb0ad546394b6a3. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
On Fri, Dec 22, 2017 at 10:45 PM, Antonin Houska <ah@cybertec.at> wrote: > try_partial_hashjoin_path() passes constant true to for the parallel_hash > argument of initial_cost_hashjoin(). Shouldn't it instead pass the > parallel_hash argument that it receives? Thanks. Yeah. When initial_cost_hashjoin() calls get_parallel_divisor() on a non-partial inner path I think it would return 1.0, so no damage was done there, but when ExecChooseHashTableSize() receives try_combined_work_mem == true it might underestimate the number of batches required for a partial hash join without parallel hash, because it would incorrectly assume that a single batch join could use the combined work_mem budget. This was quite well hidden because ExecHashTableCreate() calls ExecChooseHashTableSize() again (rather than reusing the results from planning time), so the bad nbatch estimate doesn't show up anywhere. -- Thomas Munro http://www.enterprisedb.com
Attachment
Hi Antonin, On 12/22/17 6:13 AM, Thomas Munro wrote: > On Fri, Dec 22, 2017 at 10:45 PM, Antonin Houska <ah@cybertec.at> wrote: >> try_partial_hashjoin_path() passes constant true to for the parallel_hash >> argument of initial_cost_hashjoin(). Shouldn't it instead pass the >> parallel_hash argument that it receives? > > Thanks. Yeah. When initial_cost_hashjoin() calls > get_parallel_divisor() on a non-partial inner path I think it would > return 1.0, so no damage was done there, but when > ExecChooseHashTableSize() receives try_combined_work_mem == true it > might underestimate the number of batches required for a partial hash > join without parallel hash, because it would incorrectly assume that a > single batch join could use the combined work_mem budget. This was > quite well hidden because ExecHashTableCreate() calls > ExecChooseHashTableSize() again (rather than reusing the results from > planning time), so the bad nbatch estimate doesn't show up anywhere. Does this look right to you? If so, can you sign up as a reviewer and mark it Ready for Committer? Thanks, -- -David david@pgmasters.net
David Steele <david@pgmasters.net> wrote: > On 12/22/17 6:13 AM, Thomas Munro wrote: > > On Fri, Dec 22, 2017 at 10:45 PM, Antonin Houska <ah@cybertec.at> wrote: > >> try_partial_hashjoin_path() passes constant true to for the parallel_hash > >> argument of initial_cost_hashjoin(). Shouldn't it instead pass the > >> parallel_hash argument that it receives? > > > > Thanks. Yeah. When initial_cost_hashjoin() calls > > get_parallel_divisor() on a non-partial inner path I think it would > > return 1.0, so no damage was done there, but when > > ExecChooseHashTableSize() receives try_combined_work_mem == true it > > might underestimate the number of batches required for a partial hash > > join without parallel hash, because it would incorrectly assume that a > > single batch join could use the combined work_mem budget. This was > > quite well hidden because ExecHashTableCreate() calls > > ExecChooseHashTableSize() again (rather than reusing the results from > > planning time), so the bad nbatch estimate doesn't show up anywhere. > Does this look right to you? Yes, this is what I meant. The patch applies cleanly and the code compiles well. > If so, can you sign up as a reviewer and mark it Ready for Committer? Done. Actually I think it'd be nice if the "parallel_hash" argument was mentioned in the header comment of initial_cost_hashjoin() function, but not sure this is worth returning the patch to the author. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26, A-2700 Wiener Neustadt Web: https://www.cybertec-postgresql.com
On Fri, Mar 2, 2018 at 9:06 PM, Antonin Houska <ah@cybertec.at> wrote: > David Steele <david@pgmasters.net> wrote: >> Does this look right to you? > > Yes, this is what I meant. The patch applies cleanly and the code compiles > well. > >> If so, can you sign up as a reviewer and mark it Ready for Committer? > > Done. Thanks. > Actually I think it'd be nice if the "parallel_hash" argument was mentioned in > the header comment of initial_cost_hashjoin() function, but not sure this is > worth returning the patch to the author. Done. -- Thomas Munro http://www.enterprisedb.com
Attachment
On 3/2/18 05:01, Thomas Munro wrote: > On Fri, Mar 2, 2018 at 9:06 PM, Antonin Houska <ah@cybertec.at> wrote: >> David Steele <david@pgmasters.net> wrote: >>> Does this look right to you? >> >> Yes, this is what I meant. The patch applies cleanly and the code compiles >> well. >> >>> If so, can you sign up as a reviewer and mark it Ready for Committer? >> >> Done. > > Thanks. > >> Actually I think it'd be nice if the "parallel_hash" argument was mentioned in >> the header comment of initial_cost_hashjoin() function, but not sure this is >> worth returning the patch to the author. > > Done. committed -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services