Re: Optimising compactify_tuples() - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Optimising compactify_tuples() |
Date | |
Msg-id | CA+hUKG+SkE2vtS_owMNJyM0FbEHOdaUcOi4wHvkoq09HwGjBcg@mail.gmail.com |
In response to | Re: Optimising compactify_tuples() (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
On Fri, Sep 11, 2020 at 1:45 AM David Rowley <dgrowleyml@gmail.com> wrote:
> On Thu, 10 Sep 2020 at 10:40, Thomas Munro <thomas.munro@gmail.com> wrote:
> > I wonder if we could also identify a range at the high end that is
> > already correctly sorted and maximally compacted so it doesn't even
> > need to be copied out.
>
> I've experimented quite a bit with this patch today. I think I've
> tested every idea you've mentioned here, so there's quite a lot of
> information to share.
>
> I did write code to skip the copy to the separate buffer for tuples
> that are already in the correct place and with a version of the patch
> which keeps tuples in their traditional insert order (later lineitem's
> tuple located earlier in the page) I generally see a very large
> number of tuples being skipped with this method. See attached
> v4b_skipped_tuples.png. The vertical axis is the number of
> compactify_tuples() calls during the benchmark where we were able to
> skip that number of tuples. The average skipped tuples over all calls
> during recovery was 81 tuples, so we get to skip about half the tuples
> in the page doing this on this benchmark.

Excellent.

> > So one question is whether we want to do the order-reversing as part
> > of this patch, or wait for a more joined-up project to make lots of
> > code paths collude on making scan order match memory order
> > (correlation = 1). Most or all of the gain from your patch would
> > presumably still apply if we did the exact opposite and forced offset
> > order to match reverse-item ID order (correlation = -1), which also
> > happens to be the initial state when you insert tuples today; you'd
> > still tend towards a state that allows nice big memmove/memcpy calls
> > during page compaction. IIUC currently we start with correlation -1
> > and then tend towards correlation = 0 after many random updates
> > because we can't change the order, so it gets scrambled over time.
>
> I'm not sure what I think about that.
>
> So I did lots of benchmarking with both methods and my conclusion is
> that I think we should stick to the traditional INSERT order with this
> patch. But we should come back and revisit that more generally one
> day. The main reason that I'm put off flipping the tuple order is that
> it significantly reduces the number of times we hit the preordered
> case. We go to all the trouble of reversing the order only to have it
> broken again when we add 1 more tuple to the page. If we keep this
> the traditional way, then it's much more likely that we'll maintain
> that pre-order and hit the more optimal memmove code path.

Right, that makes sense. Thanks for looking into it!

> I've also attached another tiny patch that I think is pretty useful
> separate from this. It basically changes:
>
> LOG: redo done at 0/D518FFD0
>
> into:
>
> LOG: redo done at 0/D518FFD0 system usage: CPU: user: 58.93 s,
> system: 0.74 s, elapsed: 62.31 s

+1
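
For readers following the thread, here is a minimal sketch of the "skip tuples that are already in place" idea discussed above. It is not the code from the patch: SketchItem, count_in_place() and compact_presorted() are hypothetical names, the page is modelled as a plain byte array, and tuple lengths are assumed to be already aligned. It assumes the traditional insert order, meaning the items are listed by descending offset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a line pointer: where a tuple starts in the
 * page and how many (already aligned) bytes it occupies. */
typedef struct
{
    size_t off;
    size_t len;
} SketchItem;

/*
 * Count the leading items that already sit flush against the end of the
 * page, each one ending exactly where the previous one starts.  Such
 * tuples are sorted and maximally compacted, so they need no copying.
 * Items must be ordered by descending offset.
 */
size_t
count_in_place(const SketchItem *items, size_t nitems, size_t page_end)
{
    size_t upper = page_end;
    size_t i;

    for (i = 0; i < nitems; i++)
    {
        if (items[i].off + items[i].len != upper)
            break;              /* first gap found: stop skipping */
        upper = items[i].off;
    }
    return i;
}

/*
 * Compact a page whose items are in descending offset order.  Tuples
 * counted by count_in_place() are left untouched; the rest are slid
 * towards the end of the page with memmove().  Returns the new lowest
 * used offset.
 */
size_t
compact_presorted(char *page, SketchItem *items, size_t nitems,
                  size_t page_end)
{
    size_t skipped = count_in_place(items, nitems, page_end);
    size_t upper = (skipped > 0) ? items[skipped - 1].off : page_end;
    size_t i;

    for (i = skipped; i < nitems; i++)
    {
        /* Presorted input guarantees a tuple only ever moves upwards. */
        assert(items[i].off + items[i].len <= upper);
        upper -= items[i].len;
        memmove(page + upper, page + items[i].off, items[i].len);
        items[i].off = upper;
    }
    return upper;
}
```

With a 64-byte page holding 8-byte tuples at offsets 56, 48 and 20, the first two are skipped and only the third is moved, ending up at offset 40. Because tuples are processed from the highest offset down and each one only moves to a higher address, a moved tuple cannot overwrite one that has not been processed yet, which is why no separate buffer is needed in this case.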