Sort performance cliff with small work_mem - Mailing list pgsql-hackers
From:      Heikki Linnakangas
Subject:   Sort performance cliff with small work_mem
Date:
Msg-id:    c5462706-ae0c-222c-4983-66584ee6b68f@iki.fi
Responses: Re: Sort performance cliff with small work_mem (Robert Haas <robertmhaas@gmail.com>)
           Re: Sort performance cliff with small work_mem (Peter Geoghegan <pg@bowt.ie>)
List:      pgsql-hackers
Hi,

I spent some time performance testing sorting, and spotted a funny phenomenon with very small work_mem settings. This example demonstrates it well: I sorted about 1 GB worth of pre-sorted integers, with different settings of work_mem between 64 kB (the minimum) and 100 kB. I also added some logging to print out the number of "runs" performed:

work_mem   elapsed time      runs
--------   ------------      ----
  64 kB       20167 ms     35301
  65 kB       19769 ms     34454
  66 kB       19844 ms     33646
  67 kB       20318 ms     32840
  68 kB       20022 ms     32105
  69 kB       19455 ms     31403
  70 kB       19509 ms     30700
  71 kB       19200 ms     30057
  72 kB       19248 ms     29441
  73 kB       19809 ms     29071
  74 kB       19840 ms     28657
  75 kB       19864 ms     28281
  76 kB       19634 ms     27914
  77 kB       19599 ms     27557
  78 kB       19429 ms     27184
  79 kB       19425 ms     26845
  80 kB       20196 ms     26515
  81 kB       21781 ms     26192
  82 kB       18883 ms     25877
  83 kB       20460 ms     25548
  84 kB       18910 ms     25249
  85 kB       29245 ms   2009692
  86 kB       26532 ms   1039496
  87 kB       25126 ms    701056
  88 kB       25241 ms    528867
  89 kB       24235 ms    418686
  90 kB       24038 ms    350528
  91 kB       24803 ms    301454
  92 kB       23192 ms    264434
  93 kB       23111 ms    233686
  94 kB       23295 ms    210807
  95 kB       26602 ms    192009
  96 kB       22990 ms    176289
  97 kB       22700 ms    162948
  98 kB       22370 ms    150727
  99 kB       22686 ms    140867
 100 kB       22413 ms    132217

Huh, something funny happened going from 84 kB to 85 kB. I traced it to this piece of code in inittapes():

> /*
>  * Decrease availMem to reflect the space needed for tape buffers; but
>  * don't decrease it to the point that we have no room for tuples. (That
>  * case is only likely to occur if sorting pass-by-value Datums; in all
>  * other scenarios the memtuples[] array is unlikely to occupy more than
>  * half of allowedMem.  In the pass-by-value case it's not important to
>  * account for tuple space, so we don't care if LACKMEM becomes
>  * inaccurate.)
>  */
> tapeSpace = (int64) maxTapes * TAPE_BUFFER_OVERHEAD;
>
> if (tapeSpace + GetMemoryChunkSpace(state->memtuples) < state->allowedMem)
>     USEMEM(state, tapeSpace);

With small work_mem values, maxTapes is always 6, so tapeSpace is 48 kB. With a small enough work_mem, 84 kB or below in this test case, there is not enough memory left at this point, so we don't subtract tapeSpace. However, with a suitably evil test case, you can get arbitrarily close to the edge, so that we subtract away almost all of the remaining memory, leaving only a few bytes for the tuples. In this example, with work_mem='85 kB', each quicksorted run consists of only 15 tuples on average, which explains the explosion in the number of runs.

To fix, I propose that we change the above so that we always subtract tapeSpace, but if less than e.g. 32 kB of memory remains after that (or if availMem went below 0), we bump availMem back up to 32 kB. So we would always reserve 32 kB to hold the tuples, even if that means that we exceed 'work_mem' slightly. (A rough sketch of the idea is in the P.S. below.)

Memory accounting isn't very accurate in general. Even as the code stands, we will exceed the requested work_mem if it's small enough, and we don't account for every small allocation and overhead anyway. So I think that would be acceptable, even though it's kind of cheating.

Of course, sorting large data sets with a tiny work_mem setting isn't a very good idea to begin with. But I'd nevertheless like to avoid non-intuitive performance cliffs like this one; it threw me off while performance testing another patch.

- Heikki
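P.S. To make the proposal concrete, here is a rough, untested sketch of the change, replacing the excerpt quoted above. The 32 kB floor and the MIN_TUPLE_MEM name are placeholders for illustration, not from an actual patch (a real patch would put the #define next to the file's other constants):

    #define MIN_TUPLE_MEM	(32 * 1024)		/* placeholder name and value */

    /*
     * Decrease availMem to reflect the space needed for tape buffers.
     * Unlike before, do this unconditionally, but guarantee a minimum
     * amount of memory to hold the tuples, even if that means exceeding
     * work_mem slightly.  Memory accounting is approximate anyway.
     */
    tapeSpace = (int64) maxTapes * TAPE_BUFFER_OVERHEAD;
    USEMEM(state, tapeSpace);

    /*
     * With work_mem = 85 kB, allowedMem is 87040 bytes and maxTapes is 6,
     * so tapeSpace is 6 * 8 kB = 49152 bytes.  If the memtuples[] array
     * and other allocations have already consumed nearly all of the rest,
     * availMem can land just above zero here -- enough for only ~15
     * tuples per quicksorted run.  The floor below prevents that.
     */
    if (state->availMem < MIN_TUPLE_MEM)
        state->availMem = MIN_TUPLE_MEM;

The exact floor is arbitrary; anything comfortably larger than a handful of tuples would flatten the cliff, since the point is only to keep the quicksorted runs from degenerating.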