Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoATLuGOk7mEXXfXXqr7cq+1vWG4bh+YKrrFgukpbyjGeQ@mail.gmail.com |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <john.naylor@enterprisedb.com>) |
Responses | Re: [PoC] Improve dead tuple storage for lazy vacuum |
List | pgsql-hackers |
On Fri, Jul 8, 2022 at 3:43 PM John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Fri, Jul 8, 2022 at 9:10 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > I guess that the tree height is affected by where garbages are, right?
> > For example, even if all garbage in the table is concentrated in
> > 0.5GB, if they exist between 2^17 and 2^18 block, we use the first
> > byte of blockhi. If the table is larger than 128GB, the second byte of
> > the blockhi could be used depending on where the garbage exists.
>
> Right.
>
> > Another variation of how to store TID would be that we use the block
> > number as a key and store a bitmap of the offset as a value. We can
> > use Bitmapset for example,
>
> I like the idea of using existing code to set/check a bitmap if it's
> convenient. But (in case that was implied here) I'd really like to
> stay away from variable-length values, which would require
> "Single-value leaves" (slow). I also think it's fine to treat the
> key/value as just bits, and not care where exactly they came from, as
> we've been talking about.
>
> > or an approach like Roaring bitmap.
>
> This would require two new data structures instead of one. That
> doesn't seem like a path to success.

Agreed.

>
> > I think that at this stage it's better to define the design first. For
> > example, key size and value size, and these sizes are fixed or can be
> > set the arbitrary size?
>
> I don't think we need to start over. Andres' prototype had certain
> design decisions built in for the intended use case (although maybe
> not clearly documented as such). Subsequent patches in this thread
> substantially changed many design aspects. If there were any changes
> that made things wonderful for vacuum, it wasn't explained, but Andres
> did explain how some of these changes were not good for other uses.
> Going to fixed 64-bit keys and values should still allow many future
> applications, so let's do that if there's no reason not to.

I thought Andres pointed out that, given that we store BufferTag (or
part of that) into the key, the fixed 64-bit keys might not be enough
for buffer mapping use cases. If we want to use keys wider than 64
bits, we would need to consider it.

>
> > For value size, if we support
> > different value sizes specified by the user, we can either embed
> > multiple values in the leaf node (called Multi-value leaves in ART
> > paper)
>
> I don't think "Multi-value leaves" allow for variable-length values,
> FWIW. And now I see I also used this term wrong in my earlier review
> comment -- v3/4 don't actually use "multi-value leaves", but Andres'
> does (going by the multiple leaf types). From the paper: "Multi-value
> leaves: The values are stored in one of four different leaf node
> types, which mirror the structure of inner nodes, but contain values
> instead of pointers."

Right, but sorry, I meant that the user specifies an arbitrary
fixed-size value length on creation, like we do in dynahash.c.

>
> (It seems v3/v4 could be called a variation of "Combined pointer/value
> slots: If values fit into pointers, no separate node types are
> necessary. Instead, each pointer storage location in an inner node can
> either store a pointer or a value." But without the advantage of
> variable length keys).

Agreed.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
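
For illustration only, here is a rough sketch of how the "block number as key, offset bitmap as value" idea could fit into fixed 64-bit keys and values. It is not taken from any patch in this thread. A 64-bit value can only cover 64 offsets, and a heap page can hold more line pointers than that, so this sketch assumes the upper bits of the offset are folded into the key. The names tid_slot, tid_encode, tid_decode and the bit widths are all hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical layout, assumed for this sketch only:
 *   - offsets fit in 9 bits (0..511)
 *   - the low 6 bits of the offset pick a bit inside the 64-bit value
 *   - the remaining 3 bits of the offset are appended to the block number
 *     to form the 64-bit key
 */
#define OFFSET_BITS      9
#define VALUE_SHIFT      6
#define KEY_OFFSET_BITS  (OFFSET_BITS - VALUE_SHIFT)

typedef struct tid_slot
{
    uint64_t key;   /* key to look up in the tree */
    uint64_t bit;   /* single bit to OR into the 64-bit value */
} tid_slot;

/* Map (block, offset) to a fixed-size key and a bit within the value. */
static tid_slot
tid_encode(uint32_t block, uint16_t offset)
{
    tid_slot s;

    s.key = ((uint64_t) block << KEY_OFFSET_BITS) |
            (uint64_t) (offset >> VALUE_SHIFT);
    s.bit = UINT64_C(1) << (offset & ((1u << VALUE_SHIFT) - 1));
    return s;
}

/* Inverse mapping: recover (block, offset) from a key and a bit position. */
static void
tid_decode(uint64_t key, unsigned bitno, uint32_t *block, uint16_t *offset)
{
    *block = (uint32_t) (key >> KEY_OFFSET_BITS);
    *offset = (uint16_t) (((key & ((1u << KEY_OFFSET_BITS) - 1)) << VALUE_SHIFT) |
                          bitno);
}
```

With this layout, TIDs on the same block whose offsets fall in the same group of 64 share one key, so marking a TID dead amounts to OR-ing s.bit into the value stored under s.key. Both key and value stay fixed at 64 bits, which avoids the variable-length values discussed above.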