Re: decoupling table and index vacuum - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: decoupling table and index vacuum
Date:
Msg-id: CA+TgmoYf4PZ-zu4jYmgbQcSj5HRjs5briGuCBSWbAv9-woXySA@mail.gmail.com
In response to: Re: decoupling table and index vacuum (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Wed, Apr 21, 2021 at 5:38 PM Andres Freund <andres@anarazel.de> wrote:
> I'm not sure that's the only way to deal with this. While some form of
> generic "conveyor belt" infrastructure would be a useful building
> block, and it'd be sensible to use it here if it existed, it seems
> feasible to [store] dead tids in a different way here. You could e.g.
> have per-heap-vacuum files with a header containing LSNs that indicate
> the age of the contents.

That's true, but I have some reservations about being overly reliant
on the filesystem to provide structure here. There are good reasons to
be worried about bloating the number of files in the data directory.
Hmm, but maybe we could mitigate that. First, we could skip this for
small relations. If you can vacuum the table and all of its indexes
using the naive algorithm in <10 seconds, you probably shouldn't do
anything fancy. That would *greatly* reduce the number of additional
files generated. Second, we could forget about treating them as
separate relation forks and make them some other kind of thing
entirely, in a separate directory, especially if we adopted
Sawada-san's proposal to skip WAL logging. I don't know if that
proposal is actually a good idea, because it effectively adds a
performance penalty when you crash or fail over, and that sort of
thing can be an unpleasant surprise. But it's something to think about.

> > This scheme adds a lot of complexity, which is a concern, but it
> > seems [...]
> > It's not completely independent: if you need to set some dead TIDs
> > in the table to unused, you may have to force index vacuuming that
> > isn't needed for bloat control. However, you only need to force it
> > for indexes that haven't been vacuumed recently enough for some
> > other reason, rather than every index.
>
> Hm - how would we know how recently that TID has been marked dead? We
> don't even have xids for dead ItemIds... Maybe you're intending to
> answer that in your next paragraph, but it's not obvious to me that'd
> be sufficient...

You wouldn't know anything about when things were added in terms of
wall clock time, but the idea was that TIDs get added in order and
stay in that order, so you know which ones were added first. Imagine a
conceptually infinite array of TIDs:

(17,5) (332,6) (5,1) (2153,92) ....

Each index keeps a pointer into this array. Initially it points to the
start of the array, here (17,5). If an index vacuum starts after
(17,5) and (332,6) have been added to the array but before (5,1) is
added, then upon completion it updates its pointer to point to (5,1).
If every index is pointing to (5,1) or some later element, then you
know that (17,5) and (332,6) can be set LP_UNUSED. If not, and you
want to get to a state where you CAN set (17,5) and (332,6) to
LP_UNUSED, you just need to force index vacuum on indexes that are
pointing to something prior to (5,1) -- and keep forcing it until
those pointers reach (5,1) or later.
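To make that bookkeeping concrete, here's a rough sketch in C. All of
the names and types are made up for illustration -- this is just the
pointer arithmetic described above, not a proposed interface:

#include <stdint.h>

typedef struct DeadTid
{
    uint32_t    block;          /* heap block number */
    uint16_t    offset;         /* line pointer offset in the block */
} DeadTid;

typedef struct DeadTidQueue
{
    DeadTid    *tids;           /* append-only, conceptually infinite */
    uint64_t    ntids;          /* entries appended so far */
    uint64_t   *index_cursor;   /* per index: first entry it has not
                                 * yet removed from the index */
    int         nindexes;
} DeadTidQueue;

/*
 * An index vacuum that began when the queue held pos_at_start entries
 * has, on completion, removed every earlier entry from its index, so
 * its cursor advances to that position.
 */
static void
index_vacuum_done(DeadTidQueue *q, int idx, uint64_t pos_at_start)
{
    if (pos_at_start > q->index_cursor[idx])
        q->index_cursor[idx] = pos_at_start;
}

/*
 * Entries before the minimum cursor across all indexes are no longer
 * referenced by any index, so those heap line pointers can be set
 * LP_UNUSED and the queue truncated up to this point.
 */
static uint64_t
first_unremovable_entry(const DeadTidQueue *q)
{
    uint64_t    min = q->ntids;

    for (int i = 0; i < q->nindexes; i++)
        if (q->index_cursor[i] < min)
            min = q->index_cursor[i];
    return min;
}

Forcing index vacuum on whichever indexes have the smallest cursors,
until the minimum reaches the position you care about, is exactly the
"catch up" operation described above.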
> One thing that you didn't mention so far is that this'd allow us to
> add dead TIDs to the "dead tid" file outside of vacuum too. In some
> workloads most of the dead tuple removal happens as part of on-access
> HOT pruning. While some indexes are likely to see that via the
> killtuples logic, others may not. Being able to have more aggressive
> index vacuum for the one or two bloated indexes, without needing to
> rescan the heap, seems like it'd be a significant improvement.

Oh, that's a very interesting idea. It does impose some additional
requirements on any such system, though, because it means you have to
be able to efficiently add single TIDs. For example, you mention a
per-heap-VACUUM file above, but you can't get away with creating a new
file per HOT prune, no matter how you arrange things at the FS level.
Actually, though, I think the big problem here is deduplication. A
full-blown VACUUM can perhaps read all the already-known-to-be-dead
TIDs into some kind of data structure and avoid re-adding them, but
that's impractical for a HOT prune.
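Continuing the sketch from above, with the same made-up names, the
HOT-prune side pretty much has to be a blind append, which is exactly
what creates the duplicate problem:

/*
 * Hypothetical append path for on-access HOT pruning. A HOT prune
 * can't afford to search the existing entries for a match, so the
 * same TID may end up recorded more than once; consumers have to
 * tolerate duplicates. A full-blown VACUUM, by contrast, could build
 * an in-memory set of known-dead TIDs first and skip the re-adds.
 * (Growing the array is elided here.)
 */
static void
dead_tid_append(DeadTidQueue *q, uint32_t block, uint16_t offset)
{
    q->tids[q->ntids].block = block;
    q->tids[q->ntids].offset = offset;
    q->ntids++;
}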
> Have you thought about how we would do the scheduling of vacuums for
> the different indexes? We don't really have useful stats for the
> number of dead index entries to be expected in an index. It'd not be
> hard to track how many entries are removed in an index via
> killtuples, but e.g. estimating how many dead entries there are in a
> partial index seems quite hard (at least without introducing
> significant overhead).

No, I don't have any good ideas about that, really. Partial indexes
seem like a hard problem, and so do GIN indexes or other kinds of
things where you may have multiple index entries per heap tuple. We
might have to accept some known-to-be-wrong approximations in such
cases.

> > One rather serious objection to this whole line of attack is that
> > we'd ideally like VACUUM to reclaim disk space without using any
> > more, in case the motivation for running VACUUM in the first place
> > [is that we're short on disk space].
>
> I suspect we'd need a global limit of space used for this data. If
> above that limit we'd switch to immediately performing the work
> required to remove some of that space.

I think that's entirely the wrong approach. On the one hand, it
doesn't prevent you from running out of disk space during emergency
maintenance, because the disk overall can be full even though you're
below your quota of space for this particular purpose. On the other
hand, it subjects you to random breakage when your database gets big
enough that the critical information can't be stored within the
configured quota. I think we'd end up with pathological cases very
much like what used to happen with the fixed-size free space map.
What happened there was that your database got big enough that you
couldn't track all the free space any more, and it just started
bloating out the wazoo. What would happen here is that you'd silently
lose the well-optimized version of VACUUM when your database gets too
big. That does not seem like something anybody wants.

--
Robert Haas
EDB: http://www.enterprisedb.com