Re: accounting for memory used for BufFile during hash joins - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: accounting for memory used for BufFile during hash joins
Msg-id: 20190529145517.sj2poqmb3cr4cg6w@development
In response to: Re: accounting for memory used for BufFile during hash joins (Hubert Zhang <hzhang@pivotal.io>)
Responses: Re: accounting for memory used for BufFile during hash joins
List: pgsql-hackers
On Tue, May 28, 2019 at 03:40:01PM +0800, Hubert Zhang wrote:
>On Sat, May 4, 2019 at 8:34 AM Tomas Vondra <tomas.vondra@2ndquadrant.com>
>wrote:
>
>Hi Tomas
>
>I read your second patch, which uses overflow buf files to reduce the total
>number of batches.
>It would solve the hash join OOM problem that you discussed above: 8KB per
>batch leads to the batch bloating problem.
>
>I mentioned in another thread:
>
>https://www.postgresql.org/message-id/flat/CAB0yrekv%3D6_T_eUe2kOEvWUMwufcvfd15SFmCABtYFOkxCFdfA%40mail.gmail.com
>
>There is another hashjoin OOM problem, which disables splitting batches too
>early. PG uses a flag hashtable->growEnable to determine whether to split
>batches. Once one split fails (all the tuples are assigned to only one
>batch of the two split ones), the growEnable flag is turned off forever.
>
>This is the opposite of the batch bloating problem: there are too few
>batches, and the in-memory hash table becomes too large to fit into memory.
>

Yes. There are definitely multiple separate issues in the hashjoin code,
and the various improvements discussed in this (and other) threads usually
address just a subset of them. We need to figure out how to combine them,
or maybe devise some more generic solution.

So I think we need to take a step back and figure out how to combine these
improvements - otherwise we might commit a fix for one issue, making it
much harder or impossible to fix the other issues.

The other important question is whether we see these cases as outliers
(and the solutions as a last-resort-attempt-to-survive kind of fix) or as
more widely applicable optimizations. I've seen some interesting speedups
with the overflow-batches patch, but my feeling is we should really treat
it as a last resort to survive.

I had a chat about this with Thomas Munro yesterday. Unfortunately, some
beer was involved, but I do vaguely remember he more or less convinced me
the BNL (block nested loop join) might be the right approach here.

We don't have any patch for that yet, though :-(

>Here is the tradeoff: one batch takes more than 8KB (8KB makes sense, due
>to performance), the in-memory hash table takes memory as well, and
>splitting batches may (not must) reduce the in-memory hash table size, but
>it introduces more batches (and thus more memory usage, 8KB * #batches).
>Can we conclude that splitting is worthwhile if:
>(the reduced memory of the in-memory hash table) - (8KB * number of new
>batches) > 0
>

Something like that, yes.

>So I'm considering combining our patch with your patch to fix the join OOM
>problem, no matter whether the OOM is introduced by (the memory usage of
>the in-memory hash table) or by (8KB * number of batches).
>
>nbatch_inmemory in your patch could also be redefined using the rule above.
>
>What's your opinion?
>

One of the issues with my "overflow batches" patch, pointed out to me by
Thomas yesterday, is that it only works with non-parallel hash join. And
we don't know how to make it work in parallel mode :-(

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
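For illustration, the split-cost rule Hubert proposes above could be sketched like this. This is a toy model, not PostgreSQL code: the 8KB-per-batch BufFile buffer cost comes from the discussion, while the function name and parameters are hypothetical.

```python
# Toy sketch of the proposed "is splitting worthwhile" heuristic.
# Assumption: each batch costs roughly one 8KB BufFile buffer, as
# discussed in the thread; everything else here is illustrative.

BUFFILE_PER_BATCH = 8 * 1024  # ~8KB of BufFile buffer per batch


def splitting_worthwhile(expected_reduction_bytes, new_batches):
    """Split only if the hash-table memory we expect to free exceeds
    the BufFile memory the extra batches will cost."""
    return expected_reduction_bytes - BUFFILE_PER_BATCH * new_batches > 0


# Doubling nbatch from 1024 to 2048 adds 1024 batches (~8MB of BufFile
# buffers), so the split only pays off if it frees more than that:
print(splitting_worthwhile(32 * 1024 * 1024, 1024))  # True  (frees ~32MB)
print(splitting_worthwhile(4 * 1024 * 1024, 1024))   # False (frees only ~4MB)
```

Note this would replace the current all-or-nothing growEnable behavior: instead of permanently disabling growth after one failed split, each split attempt would be judged by whether it actually reduces total memory use.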