Thread: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
My ongoing project to make VACUUM more predictable over time by proactive freezing [1] will increase the overall number of tuples frozen by VACUUM significantly (at least in larger tables). It's important that we avoid any new user-visible impact from extra freezing, though. I recently spent a lot of time on adding high-level techniques that aim to avoid extra freezing (e.g. by being lazy about freezing) when that makes sense. Low-level techniques aimed at making the mechanical process of freezing cheaper might also help. (In any case it's well worth optimizing.)

I'd like to talk about one such technique on this thread. The attached WIP patch reduces the size of xl_heap_freeze_page records by applying a simple deduplication process. This can be treated as independent work (I think it can, at least). The patch doesn't change anything about the conceptual model used by VACUUM's lazy_scan_prune function to build xl_heap_freeze_page records for a page, and yet still manages to make the WAL records for freezing over 5x smaller in many important cases. They'll be ~4x-5x smaller with *most* workloads, even.

Each individual tuple entry (each xl_heap_freeze_tuple) adds a full 12 bytes to the WAL record right now -- no matter what. So the existing approach is rather space inefficient by any standard (perhaps because it was developed under time pressure while fixing bugs in Postgres 9.3). More importantly, there is a lot of natural redundancy among the xl_heap_freeze_tuple entries -- each tuple's xl_heap_freeze_tuple details tend to match. We can usually get away with storing each unique combination of values from xl_heap_freeze_tuple once per xl_heap_freeze_page record, while storing associated page offset numbers in a separate area, grouped by their canonical freeze plan (which is a normalized version of the information currently stored in xl_heap_freeze_tuple).

In practice most individual tuples that undergo any kind of freezing only need to have their xmin field frozen. And when xmax is affected at all, it'll usually just get set to InvalidTransactionId. And so the actual low-level processing steps for xmax have a high chance of being shared by other tuples on the page, even in ostensibly tricky cases. While there are quite a few paths that lead to VACUUM setting a tuple's xmax to InvalidTransactionId, they all end up with the same instructional state in the final xl_heap_freeze_tuple entry.

Note that there is a small chance that the patch will be less space efficient by up to 2 bytes per tuple frozen per page in cases where we're allocating new Multis during VACUUM. I think that this should be acceptable on its own -- even in rare bad cases we'll usually still come out ahead -- what are the chances that we won't make up the difference on the same page? Or at least within the same VACUUM? And that's before we talk about a future world in which freezing will batch tuples together at the page level (you don't have to bring the other VACUUM work into this discussion, I think, but it's not *completely* unrelated either).

[1] https://postgr.es/m/CAH2-WzkFok_6EAHuK39GaW4FjEFQsY=3J0AAd6FXk93u-Xq3Fg@mail.gmail.com

--
Peter Geoghegan
Attachment
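To make the deduplication idea above concrete, here is a minimal, self-contained C sketch. All type and function names are invented for illustration; the real patch works in terms of the WAL structs in heapam_xlog.h and the freeze plans that lazy_scan_prune builds, so treat this only as a picture of the grouping step, not as the patch's code.

#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified stand-ins for the per-tuple freeze information.
 * These are NOT the structs from heapam_xlog.h; they only illustrate the
 * shape of the problem.
 */
typedef struct TupleFreezeS
{
    uint16_t offset;        /* tuple's line pointer offset number */
    uint32_t xmax;          /* xmax to set (often just "invalid") */
    uint16_t t_infomask;    /* infomask bits that freezing will set */
    uint16_t t_infomask2;
    uint8_t  frzflags;
} TupleFreezeS;

typedef struct FreezePlanS
{
    uint32_t xmax;
    uint16_t t_infomask;
    uint16_t t_infomask2;
    uint8_t  frzflags;
    uint16_t ntuples;       /* how many offsets follow for this plan */
} FreezePlanS;

/* Compare everything except the offset, so identical plans sort together. */
static int
plan_cmp(const void *a, const void *b)
{
    const TupleFreezeS *ta = (const TupleFreezeS *) a;
    const TupleFreezeS *tb = (const TupleFreezeS *) b;

    if (ta->xmax != tb->xmax)
        return ta->xmax < tb->xmax ? -1 : 1;
    if (ta->t_infomask != tb->t_infomask)
        return ta->t_infomask < tb->t_infomask ? -1 : 1;
    if (ta->t_infomask2 != tb->t_infomask2)
        return ta->t_infomask2 < tb->t_infomask2 ? -1 : 1;
    if (ta->frzflags != tb->frzflags)
        return ta->frzflags < tb->frzflags ? -1 : 1;
    return 0;
}

/*
 * Sort the scratch array so that equal plans become adjacent, then emit one
 * FreezePlanS per distinct plan plus a flat offset array grouped by plan
 * (offsets within a group end up in arbitrary order in this sketch).
 * Returns the number of distinct plans.
 */
static int
deduplicate_freeze_plans(TupleFreezeS *tuples, int ntuples,
                         FreezePlanS *plans, uint16_t *offsets)
{
    int nplans = 0;

    qsort(tuples, ntuples, sizeof(TupleFreezeS), plan_cmp);

    for (int i = 0; i < ntuples; i++)
    {
        if (i == 0 || plan_cmp(&tuples[i], &tuples[i - 1]) != 0)
        {
            /* start a new group for a plan we haven't seen yet */
            plans[nplans].xmax = tuples[i].xmax;
            plans[nplans].t_infomask = tuples[i].t_infomask;
            plans[nplans].t_infomask2 = tuples[i].t_infomask2;
            plans[nplans].frzflags = tuples[i].frzflags;
            plans[nplans].ntuples = 0;
            nplans++;
        }
        plans[nplans - 1].ntuples++;
        offsets[i] = tuples[i].offset;
    }

    return nplans;
}

The key point is that the per-tuple cost in the record drops to a single offset number, while each distinct plan is stored just once per page.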
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Masahiko Sawada
Date:
On Tue, Sep 13, 2022 at 6:02 AM Peter Geoghegan <pg@bowt.ie> wrote:
> My ongoing project to make VACUUM more predictable over time by proactive freezing [1] will increase the overall number of tuples frozen by VACUUM significantly (at least in larger tables). It's important that we avoid any new user-visible impact from extra freezing, though. I recently spent a lot of time on adding high-level techniques that aim to avoid extra freezing (e.g. by being lazy about freezing) when that makes sense. Low-level techniques aimed at making the mechanical process of freezing cheaper might also help. (In any case it's well worth optimizing.)
>
> I'd like to talk about one such technique on this thread. The attached WIP patch reduces the size of xl_heap_freeze_page records by applying a simple deduplication process. This can be treated as independent work (I think it can, at least).

+1

> The patch doesn't change anything about the conceptual model used by VACUUM's lazy_scan_prune function to build xl_heap_freeze_page records for a page, and yet still manages to make the WAL records for freezing over 5x smaller in many important cases. They'll be ~4x-5x smaller with *most* workloads, even.

After a quick benchmark, I've confirmed that the amount of WAL for freezing 1 million tuples was reduced to about one-fifth (1.2GB vs 250MB). Great.

> Each individual tuple entry (each xl_heap_freeze_tuple) adds a full 12 bytes to the WAL record right now -- no matter what. So the existing approach is rather space inefficient by any standard (perhaps because it was developed under time pressure while fixing bugs in Postgres 9.3). More importantly, there is a lot of natural redundancy among the xl_heap_freeze_tuple entries -- each tuple's xl_heap_freeze_tuple details tend to match. We can usually get away with storing each unique combination of values from xl_heap_freeze_tuple once per xl_heap_freeze_page record, while storing associated page offset numbers in a separate area, grouped by their canonical freeze plan (which is a normalized version of the information currently stored in xl_heap_freeze_tuple).

True. I've not looked at the patch in depth yet but I think we need regression tests for this.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Fri, Sep 16, 2022 at 12:30 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> After a quick benchmark, I've confirmed that the amount of WAL for freezing 1 million tuples was reduced to about one-fifth (1.2GB vs 250MB). Great.

I think that the really interesting thing about the patch is how this changes the way we should think about freezing costs. It makes page-level batching seem very natural. The minimum possible size of a Heap2/FREEZE_PAGE record is 64 bytes, once alignment and so on is taken into account (without the patch). Once we already know that we have to freeze *some* tuples on a given heap page, it becomes very reasonable to freeze as many as possible, in batch, just because we know that it'll be much cheaper if we do it now versus doing it later on instead.

Even if this extra freezing ends up "going to waste" due to updates against the same tuples that happen a little later on, the *added* cost of freezing "extra" tuples will have been so small that it's unlikely to matter. On the other hand, if it's not wasted we'll be *much* better off. It's very hard to predict the future, which is kinda what the current FreezeLimit-based approach to freezing does. It's actually quite easy to understand the cost of freezing now versus freezing later, though. At a high level, it makes sense for VACUUM to focus on freezing costs (including the risk that comes with *not* freezing with larger tables), and not worry so much about making accurate predictions. Making accurate predictions about freezing/workload characteristics is overrated.

> True. I've not looked at the patch in depth yet but I think we need regression tests for this.

What did you have in mind? I think that the best way to test something like this is with wal_consistency_checking. That mostly works fine. However, note that heap_mask() won't always be able to preserve the state of a tuple's xmax when modified by freezing. We sometimes need "hint bits" to actually reliably be set in REDO, when replaying the records for freezing. At other times they really are just hints. We have to conservatively assume that it's just a hint when masking. Not sure if I can do much about that.

Note that this optimization is one level below lazy_scan_prune(), and one level above heap_execute_freeze_tuple(). Neither function really changes at all. This seems useful because there are rare pg_upgrade-only paths where xvac fields need to be frozen. That's not tested either.

--
Peter Geoghegan
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Mon, Sep 12, 2022 at 2:01 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I'd like to talk about one such technique on this thread. The attached WIP patch reduces the size of xl_heap_freeze_page records by applying a simple deduplication process.

Attached is v2, which I'm just posting to keep CFTester happy. No real changes here.

--
Peter Geoghegan
Attachment
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Nathan Bossart
Date:
On Tue, Sep 20, 2022 at 03:12:00PM -0700, Peter Geoghegan wrote:
> On Mon, Sep 12, 2022 at 2:01 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> I'd like to talk about one such technique on this thread. The attached WIP patch reduces the size of xl_heap_freeze_page records by applying a simple deduplication process.
>
> Attached is v2, which I'm just posting to keep CFTester happy. No real changes here.

This idea seems promising. I see that you called this patch a work-in-progress, so I'm curious what else you are planning to do with it.

As I'm reading this thread and the patch, I'm finding myself wondering if it's worth exploring using wal_compression for these records instead. I think you've essentially created an efficient compression mechanism for this one type of record, but I'm assuming that lz4/zstd would also yield some rather substantial improvements for this kind of data. Presumably a generic WAL record compression mechanism could be reused for other large records, too. That could be much easier than devising a deduplication strategy for every record type.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Wed, Sep 21, 2022 at 1:14 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> This idea seems promising. I see that you called this patch a work-in-progress, so I'm curious what else you are planning to do with it.

I really just meant that the patch wasn't completely finished at that point. I hadn't yet convinced myself that I mostly had it right. I'm more confident now.

> As I'm reading this thread and the patch, I'm finding myself wondering if it's worth exploring using wal_compression for these records instead.

The term deduplication works better than compression here because we're not actually decompressing anything in the REDO routine. Rather, the REDO routine processes each freeze plan by processing all affected tuples in order. To me this seems like the natural way to structure things -- the WAL records are much smaller, but in a way that's kind of incidental. The approach taken by the patch just seems like the natural approach, given the specifics of how freezing works at a high level.

> I think you've essentially created an efficient compression mechanism for this one type of record, but I'm assuming that lz4/zstd would also yield some rather substantial improvements for this kind of data.

I don't think of it that way. I've used the term "deduplication" to advertise the patch, but that's mostly just a description of what we're doing in the patch relative to what we do on HEAD today. There is nothing truly clever in the patch. We see a huge amount of redundancy among tuples from the same page in practically all cases, for reasons that have everything to do with what freezing is, and how it works at a high level. The thought process that led to my writing this patch was more high level than appearances suggest. (I often write patches that combine high level and low level insights in some way or other, actually.)

Theoretically there might not be very much redundancy within each xl_heap_freeze_page record, with the right workload, but in practice a decrease of 4x or more is all but guaranteed once you have more than a few tuples to freeze on each page. If there are other WAL records that are as space inefficient as xl_heap_freeze_page is, then I'd be surprised -- it is *unusually* space inefficient (like I said, I suspect that this may have something to do with the fact that it was originally designed under time pressure). So I don't expect that this patch tells us much about what we should do for any other WAL record. I certainly *hope* that it doesn't, at least.

> Presumably a generic WAL record compression mechanism could be reused for other large records, too. That could be much easier than devising a deduplication strategy for every record type.

It's quite possible that that's a good idea, but that should probably work as an additive thing. That's something that I think of as a "clever technique", whereas I'm focussed on just not being naive in how we represent this one specific WAL record type.

--
Peter Geoghegan
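As a rough illustration of the replay shape described above, here is a small self-contained sketch, reusing the same invented FreezePlanS type as the earlier sketch. This is not the actual REDO routine: the point is only that the record stores each distinct plan once, followed by the offsets grouped by plan, so replay is two nested loops with no decompression step.

#include <stdint.h>
#include <stdio.h>

/* Same invented struct as in the earlier sketch; not a heapam_xlog.h type. */
typedef struct FreezePlanS
{
    uint32_t xmax;
    uint16_t t_infomask;
    uint16_t t_infomask2;
    uint8_t  frzflags;
    uint16_t ntuples;       /* offsets belonging to this plan */
} FreezePlanS;

static void
replay_freeze_page(const FreezePlanS *plans, int nplans,
                   const uint16_t *offsets)
{
    const uint16_t *curoff = offsets;

    for (int p = 0; p < nplans; p++)
    {
        for (uint16_t i = 0; i < plans[p].ntuples; i++)
        {
            uint16_t offnum = *curoff++;

            /* the real routine would freeze the tuple at offnum here */
            printf("plan %d applies to offset %u\n", p, (unsigned) offnum);
        }
    }
}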
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Wed, Sep 21, 2022 at 2:11 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Presumably a generic WAL record compression mechanism could be reused for other large records, too. That could be much easier than devising a deduplication strategy for every record type.
>
> It's quite possible that that's a good idea, but that should probably work as an additive thing. That's something that I think of as a "clever technique", whereas I'm focussed on just not being naive in how we represent this one specific WAL record type.

BTW, if you wanted to pursue something like this, that would work with many different types of WAL record, ISTM that a "medium level" (not low level) approach might be the best place to start. In particular, the way that page offset numbers are represented in many WAL records is quite space inefficient. A domain-specific approach built with some understanding of how page offset numbers tend to look in practice seems promising.

The representation of page offset numbers in PRUNE and VACUUM heapam WAL records (and in index WAL records) always just stores an array of 2 byte OffsetNumber elements. It probably wouldn't be all that difficult to come up with a simple scheme for compressing an array of OffsetNumbers in WAL records. It certainly doesn't seem like it would be all that difficult to get it down to 1 byte per offset number in most cases (even greater improvements seem doable).

That could also be used for the xl_heap_freeze_page record type -- though only after this patch is committed. The patch makes the WAL record use a simple array of page offset numbers, just like in PRUNE/VACUUM records. That's another reason why the approach implemented by the patch seems like "the natural approach" to me. It's much closer to how heapam PRUNE records work (we have a variable number of arrays of page offset numbers in both cases).

--
Peter Geoghegan
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Nathan Bossart
Date:
On Wed, Sep 21, 2022 at 02:41:28PM -0700, Peter Geoghegan wrote:
> On Wed, Sep 21, 2022 at 2:11 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> > Presumably a generic WAL record compression mechanism could be reused for other large records, too. That could be much easier than devising a deduplication strategy for every record type.
>>
>> It's quite possible that that's a good idea, but that should probably work as an additive thing. That's something that I think of as a "clever technique", whereas I'm focussed on just not being naive in how we represent this one specific WAL record type.
>
> BTW, if you wanted to pursue something like this, that would work with many different types of WAL record, ISTM that a "medium level" (not low level) approach might be the best place to start. In particular, the way that page offset numbers are represented in many WAL records is quite space inefficient. A domain-specific approach built with some understanding of how page offset numbers tend to look in practice seems promising.

I wouldn't mind giving this a try.

> The representation of page offset numbers in PRUNE and VACUUM heapam WAL records (and in index WAL records) always just stores an array of 2 byte OffsetNumber elements. It probably wouldn't be all that difficult to come up with a simple scheme for compressing an array of OffsetNumbers in WAL records. It certainly doesn't seem like it would be all that difficult to get it down to 1 byte per offset number in most cases (even greater improvements seem doable).
>
> That could also be used for the xl_heap_freeze_page record type -- though only after this patch is committed. The patch makes the WAL record use a simple array of page offset numbers, just like in PRUNE/VACUUM records. That's another reason why the approach implemented by the patch seems like "the natural approach" to me. It's much closer to how heapam PRUNE records work (we have a variable number of arrays of page offset numbers in both cases).

Yeah, it seems likely that we could pack offsets in single bytes in many cases. A more sophisticated approach could even choose how many bits to use per offset based on the maximum in the array. Furthermore, we might be able to make use of SIMD instructions to mitigate any performance penalty. I'm tempted to start by just using single-byte offsets when possible since that should be relatively simple while still yielding a decent improvement for many workloads. What do you think?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
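A minimal sketch of the single-byte idea, assuming the offset array is sorted so the last element is the maximum. The names are invented and this is not code from any posted patch; a flag bit in the record would presumably indicate which representation was used.

#include <stdbool.h>
#include <stdint.h>

/*
 * Pack a sorted array of 2-byte offset numbers into single bytes whenever
 * every value fits in 8 bits, i.e. whenever the highest offset on the page
 * is <= 255.  Returns false when single-byte encoding isn't possible, in
 * which case the caller would fall back to the plain 2-byte array.
 */
static bool
pack_offsets_1byte(const uint16_t *offsets, int noffsets, uint8_t *dst)
{
    if (noffsets <= 0 || offsets[noffsets - 1] > UINT8_MAX)
        return false;           /* fall back to the 2-byte representation */

    for (int i = 0; i < noffsets; i++)
        dst[i] = (uint8_t) offsets[i];

    return true;
}

/* Replay-side counterpart: widen the packed bytes back to offset numbers. */
static void
unpack_offsets_1byte(const uint8_t *src, int noffsets, uint16_t *offsets)
{
    for (int i = 0; i < noffsets; i++)
        offsets[i] = src[i];
}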
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Nathan Bossart
Date:
On Wed, Sep 21, 2022 at 02:11:36PM -0700, Peter Geoghegan wrote:
> On Wed, Sep 21, 2022 at 1:14 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> Presumably a generic WAL record compression mechanism could be reused for other large records, too. That could be much easier than devising a deduplication strategy for every record type.
>
> It's quite possible that that's a good idea, but that should probably work as an additive thing. That's something that I think of as a "clever technique", whereas I'm focussed on just not being naive in how we represent this one specific WAL record type.

Got it. I think that's a fair point.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Wed, Sep 21, 2022 at 9:21 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I wouldn't mind giving this a try.

Definitely seems worth experimenting with. Many WAL records generated during VACUUM (and during opportunistic pruning/index tuple deletion) have offset number arrays that can be assumed to be both sorted and unique. My guess is that these WAL record types are the most amenable to compression at the WAL record level.

> Yeah, it seems likely that we could pack offsets in single bytes in many cases. A more sophisticated approach could even choose how many bits to use per offset based on the maximum in the array. Furthermore, we might be able to make use of SIMD instructions to mitigate any performance penalty.

I guess I'd start with some kind of differential compression that relies on the arrays being both sorted and unique. While it might be important to be able to compose together multiple different techniques (something that is more than the sum of its parts can be very useful), it seems most important to quickly validate the basic idea first.

One obvious thing that still seems worth pointing out: you may not need to decompress anything. All that you really need to do is to get the logic from some routine like PageIndexMultiDelete() to be executed by a REDO routine. Perhaps it'll make sense to come up with a representation that can just be passed to the routine directly. (I really don't know how likely this is, but it's something to consider.)

> I'm tempted to start by just using single-byte offsets when possible since that should be relatively simple while still yielding a decent improvement for many workloads. What do you think?

The really big wins for WAL size will come from applying high-level insights about what is really needed, in all likelihood. The main overhead is very often generic WAL record header overhead. When it is, it'll be hard to really move the needle just by compressing the payload. To do that you'd need to find a way to use fewer WAL records to do the same amount of work.

The thing that I think will really make the biggest difference is merging PRUNE records with FREEZE records (and maybe even making the merged record do the work of a VISIBLE record when that's possible). Just because now you have 1 WAL record instead of 2 (or even 3) WAL records. Obviously that's a complicated project, but it's another case where it feels like the more efficient approach might also be simpler. We often write a PRUNE record with only one or two items in the array, in which case it's practically free to do some freezing, at least in terms of WAL space overhead (as long as you can do it with the same WAL record). Plus freezing is already inescapably tied to pruning -- we always prune a page that we're going to try to freeze in VACUUM (we can't safely freeze dead tuples, so there is more or less a dependency already).

Not that you shouldn't pursue compressing the payload from WAL records as a project -- maybe that'll work very well. I'm just pointing out that there is a bigger picture, that may or may not end up mattering here. For the patch on this thread there certainly is a bigger picture about costs over time. Something like that could be true for this other patch too. It's definitely worth considering the size of the WAL records when there are only one or two items, how common that may be in each individual case, etc.

For example, FREEZE records have a minimum size of 64 bytes in practice, due to WAL record alignment overhead (the payload itself doesn't usually have to be aligned, but the WAL header still does). It may be possible to avoid going over the critical threshold that makes the WAL size one MAXALIGN() quantum larger in the event of having only a few tuples to freeze, a scenario where negative compression is likely. Negative compression is always a potential problem, but maybe you can deal with it very effectively by thinking about the problem holistically. If you're "wasting" space that was just going to be alignment padding anyway, does it really matter at all? (Maybe there is some reason to care, but offhand I can't think of one.)

--
Peter Geoghegan
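For illustration, one possible shape for that kind of differential scheme, assuming a sorted, unique offset array; the names are invented and this is only a sketch of the idea, not a proposal. Dense offset runs then cost one byte each, and even a sparse array should never cost more than the plain 2-byte representation for heap pages.

#include <stdint.h>

/*
 * Delta-encode a sorted, unique array of 2-byte offset numbers.  Sorted and
 * unique means every gap is at least 1, so storing (gap - 1) keeps dense
 * offsets in a single byte; gaps of 128 or more spill into two bytes via a
 * simple continuation bit.  Two bytes cover any delta below 32768, far more
 * than a heap page can hold.  Returns the number of encoded bytes.
 */
static int
encode_offset_deltas(const uint16_t *offsets, int noffsets, uint8_t *dst)
{
    uint16_t prev = 0;
    int      nbytes = 0;

    for (int i = 0; i < noffsets; i++)
    {
        /* offsets start at 1 and strictly increase, so delta >= 0 */
        uint16_t delta = (uint16_t) (offsets[i] - prev - 1);

        if (delta < 0x80)
            dst[nbytes++] = (uint8_t) delta;
        else
        {
            dst[nbytes++] = (uint8_t) ((delta & 0x7F) | 0x80);
            dst[nbytes++] = (uint8_t) (delta >> 7);
        }
        prev = offsets[i];
    }
    return nbytes;
}

/* Replay-side counterpart: rebuild the offset array from the deltas. */
static int
decode_offset_deltas(const uint8_t *src, int nbytes, uint16_t *offsets)
{
    uint16_t prev = 0;
    int      noffsets = 0;

    for (int pos = 0; pos < nbytes; )
    {
        uint16_t delta = (uint16_t) (src[pos] & 0x7F);

        if (src[pos++] & 0x80)
            delta |= (uint16_t) (src[pos++] << 7);

        prev = (uint16_t) (prev + delta + 1);
        offsets[noffsets++] = prev;
    }
    return noffsets;
}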
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Tue, Sep 20, 2022 at 3:12 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is v2, which I'm just posting to keep CFTester happy. No real changes here.

Attached is v3. I'd like to move forward with commit soon. I'll do so in the next few days, barring objections.

v3 has vacuumlazy.c pass NewRelfrozenXid instead of FreezeLimit for the purposes of generating recovery conflicts during subsequent REDO of the resulting xl_heap_freeze_page WAL record. This more general approach is preparation for my patch to add page-level freezing [1]. It might theoretically lead to more recovery conflicts, but in practice the impact should be negligible. For one thing VACUUM must freeze *something* before any recovery conflict can happen during subsequent REDO on a replica in hot standby. It's far more likely that any disruptive recovery conflicts come from pruning.

It also makes the cutoff_xid field from the xl_heap_freeze_page WAL record into a "standard latestRemovedXid format" field. In other words it backs up an XID passed by the vacuumlazy.c caller during original execution (not in the REDO routine, as on HEAD). To make things clearer, the patch also renames the nearby xl_heap_visible.cutoff_xid field to xl_heap_visible.latestRemovedXid. Now there are no WAL records with a field called "cutoff_xid" (they're all called "latestRemovedXid" now). This matches PRUNE records, and B-Tree DELETE records.

The overall picture is that all REDO routines (for both heapam and index AMs) now advertise that they have a field that they use to generate recovery conflicts that follows a standard format. All latestRemovedXid XIDs are applied in a standard way during REDO: by passing them to ResolveRecoveryConflictWithSnapshot(). Users can grep the output of tools like pg_waldump to find latestRemovedXid fields, without necessarily needing to give any thought to which kind of WAL record is involved, or even the rmgr. Presenting this information precisely and uniformly seems useful to me. (Perhaps we should have a truly generic name, which latestRemovedXid isn't, but that can be handled separately.)

[1] https://commitfest.postgresql.org/39/3843/

--
Peter Geoghegan
Attachment
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Justin Pryzby
Date:
On Thu, Nov 10, 2022 at 04:48:17PM -0800, Peter Geoghegan wrote:
> On Tue, Sep 20, 2022 at 3:12 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Attached is v2, which I'm just posting to keep CFTester happy. No real changes here.
>
> Attached is v3. I'd like to move forward with commit soon. I'll do so in the next few days, barring objections.

Note that this comment is dangling in your patch:

+{
+	Page		page = BufferGetPage(buffer);
+
+	/* nor when there are no tuples to freeze */
...
-	/* Caller should not call me on a non-WAL-logged relation */
-	Assert(RelationNeedsWAL(reln));
-	/* nor when there are no tuples to freeze */
-	Assert(ntuples > 0);
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Thu, Nov 10, 2022 at 7:00 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Note that this comment is dangling in your patch:

Attached is v4, which removes the old comments you pointed out were now out of place (they weren't adding much anyway). Also fixed bitrot against HEAD from today's visibility map commit from Jeff Davis.

There is a more substantive change here, too. Like v3, v4 refactors the *mechanical* details of how the XID-based cutoff is handed down. However, unlike v3, v4 goes back to using vacuumlazy.c's FreezeLimit as the starting point for generating a latestRemovedXid. It seemed better to deal with the recovery conflict issues created by my big page-level freezing/freezing strategy patch set in the patch set itself.

Will commit this early next week barring any objections.

Thanks

--
Peter Geoghegan
Attachment
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Fri, Nov 11, 2022 at 10:38 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Attached is v4, which removes the old comments you pointed out were now out of place (they weren't adding much anyway). Also fixed bitrot against HEAD from today's visibility map commit from Jeff Davis.

Pushed something like this earlier today, though without any changes to VISIBLE records.

I just started a new thread to discuss standardizing the symbol name for recovery conflict XID cutoffs reported by tools like pg_waldump. Seemed better to deal with VISIBLE records in the scope of that new refactoring patch.

Thanks

--
Peter Geoghegan
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Andres Freund
Date:
Hi,

On 2022-11-15 10:26:05 -0800, Peter Geoghegan wrote:
> Pushed something like this earlier today, though without any changes to VISIBLE records.

While updating a patch to log various offsets in pg_waldump, I noticed a few minor issues in this patch:

ISTM that some of the page-level freezing functions are misnamed. In heapam.c the heap_xlog* routines are for replay, afaict. However heap_xlog_freeze_plan() is used to WAL-log the freeze plan. heap_xlog_freeze_page() is used to replay that WAL record. Probably your brain is too used to nbtree/ :).

I think s/heap_xlog_freeze/heap_log_freeze/ would mostly do the trick, except that heap_xlog_new_freeze_plan() doesn't quite fit in the scheme.

The routines then also should be moved a bit up, because right now they're in between other routines doing WAL replay, adding to the confusion.

The memcpy in heap_xlog_freeze_page() seems a tad odd. I assume that the alignment guarantees for xl_heap_freeze_plan are too weak? But imo it's failure prone (and I'm not sure strictly speaking legal from an undefined behaviour POV) to form a pointer to a misaligned array, even if we then later just memcpy() from those pointers.

Greetings,

Andres Freund
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Mon, Jan 9, 2023 at 1:43 PM Andres Freund <andres@anarazel.de> wrote:
> ISTM that some of the page-level freezing functions are misnamed. In heapam.c the heap_xlog* routines are for replay, afaict. However heap_xlog_freeze_plan() is used to WAL-log the freeze plan. heap_xlog_freeze_page() is used to replay that WAL record. Probably your brain is too used to nbtree/ :).

Sometimes I wonder why other people stubbornly insist on not starting every function name with an underscore. :-)

> I think s/heap_xlog_freeze/heap_log_freeze/ would mostly do the trick, except that heap_xlog_new_freeze_plan() doesn't quite fit in the scheme.
>
> The routines then also should be moved a bit up, because right now they're in between other routines doing WAL replay, adding to the confusion.

I believe that I used this scheme because of the fact that the new functions were conceptually related to REDO routines, even though they run during original execution. I'm quite happy to revise the code based on your suggestions, though.

> The memcpy in heap_xlog_freeze_page() seems a tad odd. I assume that the alignment guarantees for xl_heap_freeze_plan are too weak?

They're not too weak. I'm not sure why the memcpy() was used. I see your point; it makes you wonder if it must be necessary, which then seems to call into question why it's okay to access the main array as an array. I can change this detail, too.

I'll try to get back to it this week.

--
Peter Geoghegan
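For what it's worth, the generally safe pattern for the genuinely-unaligned case looks something like the following sketch (invented, simplified struct; not the committed code): copy from the raw byte pointer into an aligned local, rather than dereferencing a pointer obtained by casting the byte pointer.

#include <stdint.h>
#include <string.h>

/* Invented, simplified struct standing in for a WAL "freeze plan" entry. */
typedef struct FreezePlanS
{
    uint32_t xmax;
    uint16_t t_infomask;
    uint16_t t_infomask2;
    uint8_t  frzflags;
    uint16_t ntuples;
} FreezePlanS;

/*
 * Read the idx'th plan out of a raw payload whose alignment is not
 * guaranteed.  memcpy() into an aligned local is always well defined,
 * whereas casting the byte pointer to FreezePlanS * and indexing it relies
 * on the payload actually being suitably aligned.
 */
static FreezePlanS
read_plan(const char *payload, int idx)
{
    FreezePlanS plan;

    memcpy(&plan, payload + (size_t) idx * sizeof(FreezePlanS),
           sizeof(FreezePlanS));
    return plan;
}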
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Mon, Jan 9, 2023 at 2:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I'll try to get back to it this week.

Attached patch fixes up these issues. It's almost totally mechanical.

(Ended up using "git diff --color-moved=dimmed-zebra --color-moved-ws=ignore-all-space" with this, per your recent tip, which did help.)

--
Peter Geoghegan
Attachment
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Andres Freund
Date:
Hi,

On 2023-01-11 16:06:31 -0800, Peter Geoghegan wrote:
> On Mon, Jan 9, 2023 at 2:18 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > I'll try to get back to it this week.
>
> Attached patch fixes up these issues. It's almost totally mechanical.

Looks better, thanks!

> (Ended up using "git diff --color-moved=dimmed-zebra --color-moved-ws=ignore-all-space" with this, per your recent tip, which did help.)

It's a really useful feature. I configured git to always use --color-moved=dimmed-zebra, but haven't quite dared to enable --color-moved-ws=ignore-all-space by default.

Greetings,

Andres Freund
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
From
Peter Geoghegan
Date:
On Wed, Jan 11, 2023 at 4:44 PM Andres Freund <andres@anarazel.de> wrote:
> > Attached patch fixes up these issues. It's almost totally mechanical.
>
> Looks better, thanks!

Pushed that just now.

Thanks

--
Peter Geoghegan