Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoCCP=N9JjEo89r7rb6RiDJRub=C+Wbfmnr-QOO_6AiDXA@mail.gmail.com |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
On Fri, Jul 9, 2021 at 2:37 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-07-08 20:53:32 -0700, Andres Freund wrote:
> > On 2021-07-07 20:46:38 +0900, Masahiko Sawada wrote:
> > > 1. Don't allocate more than 1GB. There was a discussion to eliminate
> > > this limitation by using MemoryContextAllocHuge() but there were
> > > concerns about point 2[1].
> > >
> > > 2. Allocate the whole memory space at once.
> > >
> > > 3. Slow lookup performance (O(logN)).
> > >
> > > I’ve done some experiments in this area and would like to share the
> > > results and discuss ideas.
> >
> > Yea, this is a serious issue.
> >
> >
> > 3) could possibly be addressed to a decent degree without changing the
> > fundamental datastructure too much. There's some sizable and trivial
> > wins by just changing vac_cmp_itemptr() to compare int64s and by using
> > an open coded bsearch().
>
> Just using itemptr_encode() makes array in test #1 go from 8s to 6.5s on my
> machine.
>
> Another thing I just noticed is that you didn't include the build times for the
> datastructures. They are lower than the lookups currently, but it does seem
> like a relevant thing to measure as well. E.g. for #1 I see the following build
> times
>
> array    24.943 ms
> tbm     206.456 ms
> intset   93.575 ms
> vtbm    134.315 ms
> rtbm    145.964 ms
>
> that's a significant range...

Good point. I got similar results when measuring on my machine:

array    57.987 ms
tbm     297.720 ms
intset  113.796 ms
vtbm    165.268 ms
rtbm    199.658 ms

>
> Randomizing the lookup order (using a random shuffle in
> generate_index_tuples()) changes the benchmark results for #1 significantly:
>
>          shuffled time     unshuffled time
> array     6551.726 ms        6478.554 ms
> intset   67590.879 ms       10815.810 ms
> rtbm     17992.487 ms        2518.492 ms
> tbm        364.917 ms         360.128 ms
> vtbm     12227.884 ms        1288.123 ms

I believe that in your test, tbm_reaped() actually always returned
true. That could explain why tbm was very fast in both cases. Since
TIDBitmap in core doesn't support an existence check, tbm_reaped() in
bdbench.c always returns true. I added a patch in the repository to
add existence check support to TIDBitmap, although it assumes the
bitmap is never lossy.

That being said, I'm surprised that rtbm is slower than array even in
the unshuffled case. I've also measured the shuffle cases and got
different results. To be clear, I used the prepare() SQL function to
prepare both virtual dead tuples and index tuples, loaded them with
the attach_dead_tuples() SQL function, and executed the bench() SQL
function for each data structure. Here are the results:

         shuffled time     unshuffled time
array    88899.513 ms       12616.521 ms
intset   73476.055 ms       10063.405 ms
rtbm     22264.671 ms        2073.171 ms
tbm      10285.092 ms        1417.312 ms
vtbm     14488.581 ms        1240.666 ms

>
> FWIW, I get an assertion failure when using an assertion build:
>
> #2  0x0000561800ea02e0 in ExceptionalCondition (conditionName=0x7f9115a88e91 "found", errorType=0x7f9115a88d11 "FailedAssertion",
>     fileName=0x7f9115a88e8a "rtbm.c", lineNumber=242) at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
> #3  0x00007f9115a87645 in rtbm_add_tuples (rtbm=0x561806293280, blkno=0, offnums=0x7fffdccabb00, nitems=10) at rtbm.c:242
> #4  0x00007f9115a8363d in load_rtbm (rtbm=0x561806293280, itemptrs=0x7f908a203050, nitems=10000000) at bdbench.c:618
> #5  0x00007f9115a834b9 in rtbm_attach (lvtt=0x7f9115a8c300 <LVTestSubjects+352>, nitems=10000000, minblk=2139062143, maxblk=2139062143,maxoff=32639)
>     at bdbench.c:587
> #6  0x00007f9115a83837 in attach (lvtt=0x7f9115a8c300 <LVTestSubjects+352>, nitems=10000000, minblk=2139062143, maxblk=2139062143,maxoff=32639)
>     at bdbench.c:658
> #7  0x00007f9115a84190 in attach_dead_tuples (fcinfo=0x56180322d690) at bdbench.c:873
>
> I assume you just inverted the Assert(found) assertion?

Right. Fixed it.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
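
The suggestion quoted in this message, comparing encoded integers instead of calling vac_cmp_itemptr() and open-coding the binary search, could be sketched roughly as below. This is a standalone illustration under stated assumptions, not code from the patch, from bdbench.c, or from PostgreSQL itself: demo_encode() and demo_lookup() are hypothetical names, and the 16-bit shift is simply one layout that makes integer order agree with (block, offset) order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL's itemptr_encode()): pack a block
 * number and an offset number into one integer, block in the high bits,
 * so that plain integer comparison orders TIDs by (block, offset).
 */
static inline uint64_t
demo_encode(uint32_t block, uint16_t offset)
{
    return ((uint64_t) block << 16) | offset;
}

/*
 * Open-coded binary search over a sorted array of encoded TIDs.
 * Returns true if key is present.  The search window is [lo, hi).
 */
static bool
demo_lookup(const uint64_t *tids, size_t ntids, uint64_t key)
{
    size_t      lo = 0;
    size_t      hi = ntids;

    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (tids[mid] < key)
            lo = mid + 1;       /* key, if present, is to the right of mid */
        else
            hi = mid;           /* key, if present, is at mid or to its left */
    }

    /* lo is now the first position whose value is >= key */
    return lo < ntids && tids[lo] == key;
}
```

Compared with bsearch() and a comparator callback, a loop like this does a single integer comparison per step and no indirect function call, which is presumably where the "sizable and trivial wins" mentioned above would come from. The lookup is still O(logN), so it does not address points 1 and 2.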