Re: [PoC] Improve dead tuple storage for lazy vacuum - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] Improve dead tuple storage for lazy vacuum |
Date | |
Msg-id | CAD21AoA_v8UN08dm==jRRD7cN3ObuowxUkXUbkN1eOK6486yMw@mail.gmail.com Whole thread Raw |
In response to | Re: [PoC] Improve dead tuple storage for lazy vacuum (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: [PoC] Improve dead tuple storage for lazy vacuum
|
List | pgsql-hackers |
On Sun, Mar 17, 2024 at 11:46 AM John Naylor <johncnaylorls@gmail.com> wrote: > > On Fri, Mar 15, 2024 at 9:17 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Fri, Mar 15, 2024 at 4:36 PM John Naylor <johncnaylorls@gmail.com> wrote: > > > > > > On Thu, Mar 14, 2024 at 7:04 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > Given TidStoreSetBlockOffsets() is designed to always set (i.e. > > > > overwrite) the value, I think we should not expect that found is > > > > always false. > > > > > > I find that a puzzling statement, since 1) it was designed for > > > insert-only workloads, not actual overwrite IIRC and 2) the tests will > > > now fail if the same block is set twice, since we just switched the > > > tests to use a remnant of vacuum's old array. Having said that, I > > > don't object to removing artificial barriers to using it for purposes > > > not yet imagined, as long as test_tidstore.sql warns against that. > > > > I think that if it supports only insert-only workload and expects the > > same block is set only once, it should raise an error rather than an > > assertion. It's odd to me that the function fails only with an > > assertion build assertions even though it actually works fine even in > > that case. > > After thinking some more, I think you're right -- it's too > heavy-handed to throw an error/assert and a public function shouldn't > make assumptions about the caller. It's probably just a matter of > documenting the function (and it's lack of generality), and the tests > (which are based on the thing we're replacing). Removed 'found' in 0003 patch. > > > As for test_tidstore you're right that the test code doesn't handle > > the case where setting the same block twice. I think that there is no > > problem in the fixed-TIDs tests, but we would need something for > > random-TIDs tests so that we don't set the same block twice. I guess > > it could be trivial since we can use SQL queries to generate TIDs. I'm > > not sure how the random-TIDs tests would be like, but I think we can > > use SELECT DISTINCT to eliminate the duplicates of block numbers to > > use. > > Also, I don't think we need random blocks, since the radix tree tests > excercise that heavily already. > > Random offsets is what I was thinking of (if made distinct and > ordered), but even there the code is fairy trivial, so I don't have a > strong feeling about it. Agreed. > > > > Given the above two things, I think this function's comment needs > > > stronger language about its limitations. Perhaps even mention that > > > it's intended for, and optimized for, vacuum. You and I have long > > > known that tidstore would need a separate, more complex, function to > > > add or remove individual tids from existing entries, but it might be > > > good to have that documented. > > > > Agreed. > > How about this: > > /* > - * Set the given TIDs on the blkno to TidStore. > + * Create or replace an entry for the given block and array of offsets > * > - * NB: the offset numbers in offsets must be sorted in ascending order. > + * NB: This function is designed and optimized for vacuum's heap scanning > + * phase, so has some limitations: > + * - The offset numbers in "offsets" must be sorted in ascending order. > + * - If the block number already exists, the entry will be replaced -- > + * there is no way to add or remove offsets from an entry. > */ > void > TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets, Looks good. > > I think we can stop including the debug-tid-store patch for CI now. > That would allow getting rid of some unnecessary variables. Agreed. > > + * Prepare to iterate through a TidStore. Since the radix tree is locked during > + * the iteration, TidStoreEndIterate() needs to be called when finished. > > + * Concurrent updates during the iteration will be blocked when inserting a > + * key-value to the radix tree. > > This is outdated. Locking is optional. The remaining real reason now > is that TidStoreEndIterate needs to free memory. We probably need to > say something about locking, too, but not this. Fixed. > > + * Scan the TidStore and return a pointer to TidStoreIterResult that has TIDs > + * in one block. We return the block numbers in ascending order and the offset > + * numbers in each result is also sorted in ascending order. > + */ > +TidStoreIterResult * > +TidStoreIterateNext(TidStoreIter *iter) > > The wording is a bit awkward. Fixed. > > +/* > + * Finish an iteration over TidStore. This needs to be called after finishing > + * or when existing an iteration. > + */ > > s/existing/exiting/ ? > > It seems to say we need to finish after finishing. Maybe more precise wording. Fixed. > > +/* Extract TIDs from the given key-value pair */ > +static void > +tidstore_iter_extract_tids(TidStoreIter *iter, uint64 key, > BlocktableEntry *page) > > This is a leftover from the old encoding scheme. This should really > take a "BlockNumber blockno" not a "key", and the only call site > should probably cast the uint64 to BlockNumber. Fixed. > > + * tidstore.h > + * Tid storage. > + * > + * > + * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group > > Update year. Updated. > > +typedef struct BlocktableEntry > +{ > + uint16 nwords; > + bitmapword words[FLEXIBLE_ARRAY_MEMBER]; > +} BlocktableEntry; > > In my WIP for runtime-embeddable offsets, nwords needs to be one byte. > That doesn't have any real-world affect on the largest offset > encountered, and only in 32-bit builds with 32kB block size would the > theoretical max change at all. To be precise, we could use in the > MaxBlocktableEntrySize calculation: > > Min(MaxOffsetNumber, BITS_PER_BITMAPWORD * PG_INT8_MAX - 1); I don't get this expression. Making the nwords one byte works well? With 8kB blocks, MaxOffsetNumber is 2048 and it requires 256 bitmapword entries on 64-bit OS or 512 bitmapword entries on 32-bit OS, respectively. One byte nwrods variable seems not to be sufficient for both cases. Also, where does the expression "BITS_PER_BITMAPWORD * PG_INT8_MAX - 1" come from? > > Tests: I never got rid of maxblkno and maxoffset, in case you wanted > to do that. And as discussed above, maybe > > -- Note: The test code use an array of TIDs for verification similar > -- to vacuum's dead item array pre-PG17. To avoid adding duplicates, > -- each call to do_set_block_offsets() should use different block > -- numbers. I've added this comment on top of the .sql file. I've attached the new patch sets. The summary of updates is: - Squashed all updates of v72 - 0004 and 0005 are updates for test_tidstore.sql. Particularly the 0005 patch adds randomized TID tests. - 0006 addresses review comments above. - 0007 and 0008 patches are pgindent stuff. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: