Re: Out of Memory errors are frustrating as heck! - Mailing list pgsql-performance
From | Tomas Vondra |
---|---|
Subject | Re: Out of Memory errors are frustrating as heck! |
Date | |
Msg-id | 20190422030927.3huxq7gghms4kmf4@development |
In response to | Re: Out of Memory errors are frustrating as heck! (Justin Pryzby <pryzby@telsasoft.com>) |
Responses | Re: Out of Memory errors are frustrating as heck!; Re: Out of Memory errors are frustrating as heck! |
List | pgsql-performance |
On Sun, Apr 21, 2019 at 11:40:22AM -0500, Justin Pryzby wrote:
>On Sun, Apr 21, 2019 at 10:36:43AM -0400, Tom Lane wrote:
>> Jeff Janes <jeff.janes@gmail.com> writes:
>> > The growEnabled stuff only prevents infinite loops. It doesn't prevent
>> > extreme silliness.
>>
>> > If a single 32 bit hash value has enough tuples by itself to not fit in
>> > work_mem, then it will keep splitting until that value is in a batch by
>> > itself before shutting off
>>
>> I suspect, however, that we might be better off just taking the existence
>> of the I/O buffers into account somehow while deciding whether it's worth
>> growing further. That is, I'm imagining adding a second independent
>> reason for shutting off growEnabled, along the lines of "increasing
>> nbatch any further will require an unreasonable amount of buffer memory".
>> The question then becomes how to define "unreasonable".
>
>On Sun, Apr 21, 2019 at 06:15:25PM +0200, Tomas Vondra wrote:
>> I think the question the code needs to be asking is "If we double the
>> number of batches, does the amount of memory we need drop?" And the
>> memory needs to account both for the buffers and per-batch data.
>>
>> I don't think we can just stop increasing the number of batches when the
>> memory for BufFile exceeds work_mem, because that entirely ignores the
>> fact that by doing that we force the system to keep the per-batch stuff
>> in memory (and that can be almost arbitrary amount).
>...
>> Of course, this just stops enforcing work_mem at some point, but it at
>> least attempts to minimize the amount of memory used.
>
>This patch defines reasonable as "additional BatchFiles will not themselves
>exceed work_mem; OR, exceeded work_mem already but additional BatchFiles are
>going to save us RAM"...
>

OK.

>I think the first condition is insensitive and not too important to get right,
>it only allows work_mem to be exceeded by 2x, which maybe already happens for
>multiple reasons, related to this thread and otherwise. It'd be fine to slap
>on a factor of /2 or /4 or /8 there too.
>

TBH I'm not quite sure I understand all the conditions in the patch - it
seems unnecessarily complicated. And I don't think it actually minimizes
the amount of memory used for hash table + buffers, because it keeps the
same spaceAllowed (which pushes nbatches up). At some point it actually
makes sense to bump spaceAllowed and make larger batches instead of adding
more batches, and the patch does not seem to do that.

Also, the patch does this:

    if (hashtable->nbatch * sizeof(PGAlignedBlock) < hashtable->spaceAllowed)
    {
        ExecHashIncreaseNumBatches(hashtable);
    }
    else if (hashtable->spaceUsed / 2 >= hashtable->spaceAllowed)
    {
        /* Exceeded spaceAllowed by 2x, so we'll save RAM by allowing nbatches to increase */
        /* I think this branch would be hit almost same as below branch */
        ExecHashIncreaseNumBatches(hashtable);
    }

... but the reasoning for the second branch seems wrong, because

    (spaceUsed/2 >= spaceAllowed)

is not enough to guarantee that we actually save memory by doubling the
number of batches. To do that, we need to make sure that

    (spaceUsed/2 >= hashtable->nbatch * sizeof(PGAlignedBlock))

But that may not be true - it certainly is not guaranteed by not getting
into the first branch.
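To spell out the arithmetic, here is a tiny standalone sketch of the check
I have in mind. The function name and the BUFFILE_SIZE constant are just
illustrative stand-ins (BUFFILE_SIZE playing the role of
sizeof(PGAlignedBlock)), not code from either patch:

    /*
     * Illustrative sketch only: doubling nbatch is expected to halve the
     * in-memory hash table, but it also adds nbatch more BufFile buffers.
     * It only saves memory when the amount freed exceeds the buffers added.
     */
    #include <stdbool.h>
    #include <stddef.h>

    #define BUFFILE_SIZE 8192   /* stand-in for sizeof(PGAlignedBlock), ~BLCKSZ */

    static bool
    doubling_nbatch_saves_memory(size_t spaceUsed, int nbatch)
    {
        size_t freed = spaceUsed / 2;                    /* half the hash table */
        size_t added = (size_t) nbatch * BUFFILE_SIZE;   /* nbatch new buffers */

        return freed >= added;
    }

For example, at the nbatch=256 step in the log further down, freed would be
~1MB while added is 2MB, so the doubling does not actually pay off even
though spaceUsed/2 already exceeds spaceAllowed.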
Consider an ideal example with uniform distribution:

    create table small (id bigint, val text);
    create table large (id bigint, val text);

    insert into large select 1000000000 * random(), md5(i::text)
      from generate_series(1, 700000000) s(i);

    insert into small select 1000000000 * random(), md5(i::text)
      from generate_series(1, 10000) s(i);

    vacuum analyze large;
    vacuum analyze small;

    update pg_class set (relpages, reltuples) = (1000000, 1)
     where relname = 'large';

    update pg_class set (relpages, reltuples) = (1, 1000000)
     where relname = 'small';

    set work_mem = '1MB';

    explain analyze select * from small join large using (id);

A log after each call to ExecHashIncreaseNumBatches says this:

    nbatch=2      spaceUsed=463200   spaceAllowed=1048576  BufFile=16384
    nbatch=4      spaceUsed=463120   spaceAllowed=1048576  BufFile=32768
    nbatch=8      spaceUsed=457120   spaceAllowed=1048576  BufFile=65536
    nbatch=16     spaceUsed=458320   spaceAllowed=1048576  BufFile=131072
    nbatch=32     spaceUsed=457120   spaceAllowed=1048576  BufFile=262144
    nbatch=64     spaceUsed=459200   spaceAllowed=1048576  BufFile=524288
    nbatch=128    spaceUsed=455600   spaceAllowed=1048576  BufFile=1048576
    nbatch=256    spaceUsed=525120   spaceAllowed=1048576  BufFile=2097152
    nbatch=256    spaceUsed=2097200  spaceAllowed=1048576  BufFile=2097152
    nbatch=512    spaceUsed=2097200  spaceAllowed=1048576  BufFile=4194304
    nbatch=1024   spaceUsed=2097200  spaceAllowed=1048576  BufFile=8388608
    nbatch=2048   spaceUsed=2097200  spaceAllowed=1048576  BufFile=16777216
    nbatch=4096   spaceUsed=2097200  spaceAllowed=1048576  BufFile=33554432
    nbatch=8192   spaceUsed=2097200  spaceAllowed=1048576  BufFile=67108864
    nbatch=16384  spaceUsed=2097200  spaceAllowed=1048576  BufFile=134217728

So we've succeeded in keeping spaceUsed below 2*spaceAllowed (which seems
rather confusing, BTW), but we've allocated 128MB for BufFile. So about
130MB in total. With 16k batches.

What I think might work better is the attached v2 of the patch, with a
single top-level condition, comparing the combined memory usage
(spaceUsed + BufFile) against spaceAllowed. But it also tweaks spaceAllowed
once the size needed for BufFile gets over work_mem/3. (A rough sketch of
this combined check is included at the end of this mail.) And it behaves
like this:

    nbatch=2     spaceUsed=458640   spaceAllowed=1048576   BufFile=16384
    nbatch=4     spaceUsed=455040   spaceAllowed=1048576   BufFile=32768
    nbatch=8     spaceUsed=440160   spaceAllowed=1048576   BufFile=65536
    nbatch=16    spaceUsed=426560   spaceAllowed=1048576   BufFile=131072
    nbatch=32    spaceUsed=393200   spaceAllowed=1048576   BufFile=262144
    nbatch=64    spaceUsed=329120   spaceAllowed=1572864   BufFile=524288
    nbatch=128   spaceUsed=455600   spaceAllowed=3145728   BufFile=1048576
    nbatch=256   spaceUsed=987440   spaceAllowed=6291456   BufFile=2097152
    nbatch=512   spaceUsed=2040560  spaceAllowed=12582912  BufFile=4194304
    nbatch=1024  spaceUsed=4114640  spaceAllowed=25165824  BufFile=8388608
    nbatch=2048  spaceUsed=8302880  spaceAllowed=50331648  BufFile=16777216

So we end up with just 2k batches, using ~24MB of memory in total. That's
because the spaceAllowed limit was bumped up instead of adding more and
more batches.

>The current patch doesn't unset growEnabled, since there's no point at which
>the hashtable should grow without bound: if hash tables are *already* exceeding
>work_mem by 2x as big, nbatches should be doubled.
>

Not sure. I guess it might be useful to re-evaluate the flag after a while
- not necessarily by actually enabling it right away, but just checking if
it'd move any tuples. Just disabling it once may be an issue when the input
data is somehow correlated, which seems to be one of the issues with the
data set discussed in this thread.
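And here is the promised rough sketch of the combined check - it is a
reconstruction from the description and the logged numbers above, not the
attached v2 patch itself, and all the names (HashTableSketch, BUFFILE_SIZE
standing in for sizeof(PGAlignedBlock)) are illustrative:

    /*
     * Rough standalone sketch of the combined check, reconstructed from
     * the description and the logged numbers - not the attached v2 patch.
     */
    #include <stddef.h>

    #define BUFFILE_SIZE 8192           /* per-batch buffer, ~BLCKSZ */

    typedef struct HashTableSketch
    {
        int     nbatch;                 /* current number of batches */
        size_t  spaceUsed;              /* memory used by the hash table */
        size_t  spaceAllowed;           /* current memory budget */
        size_t  work_mem;               /* the original work_mem budget */
    } HashTableSketch;

    static void
    consider_more_batches(HashTableSketch *ht)
    {
        size_t  buffile = (size_t) ht->nbatch * BUFFILE_SIZE;

        /* single top-level condition: hash table + BufFile buffers vs. budget */
        if (ht->spaceUsed + buffile <= ht->spaceAllowed)
            return;                     /* still within budget, nothing to do */

        /* double the number of batches; ideally this halves the hash table */
        ht->nbatch *= 2;
        ht->spaceUsed /= 2;             /* idealized effect, ignores skew */
        buffile = (size_t) ht->nbatch * BUFFILE_SIZE;

        /*
         * Once the BufFile buffers need more than work_mem/3, bump the
         * budget so that larger batches are preferred over more batches.
         * Setting it to 3x the BufFile size matches the logged spaceAllowed
         * values above (e.g. 1572864 = 3 * 524288 at nbatch=64), though the
         * real v2 patch may compute the bump differently.
         */
        if (buffile > ht->work_mem / 3)
            ht->spaceAllowed = 3 * buffile;
    }

With spaceAllowed pegged at 3x the BufFile size, the next doubling is only
triggered once the hash table reaches roughly twice the BufFile footprint,
which lines up with how the v2 run above stops at 2k batches.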
regards

--
Tomas Vondra
http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services