Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date:
Msg-id: CAH2-WzkrDmfJiinW=T5Ve7=vfjvP0sgc7K1M5PaTFCOPH60CBg@mail.gmail.com
In response to: Re: Deleting older versions in unique indexes to avoid page splits (Victor Yegorov <vyegorov@gmail.com>)
List: pgsql-hackers
On Thu, Oct 29, 2020 at 3:05 PM Victor Yegorov <vyegorov@gmail.com> wrote:
> And some more comments after another round of reading the patch.
>
> 1. Looks like UNIQUE_CHECK_NO_WITH_UNCHANGED is used for HOT updates,
> should we use UNIQUE_CHECK_NO_HOT here? It is better understood like this.

This would probably get me arrested by the tableam police, though. FWIW the
way that that works is still kind of a hack. I think that I actually need a
new boolean flag, rather than overloading the enum like this.

> 2. You're modifying the table_tuple_update() function on line 1311 of include/access/tableam.h,
> adding modified_attrs_hint. There's a large comment right before it describing parameters,
> I think there should be a note about the modified_attrs_hint parameter in that comment, 'cos
> it is referenced from other places in tableam.h and also from backend/access/heap/heapam.c

Okay, makes sense.

> 3. Can you elaborate on the scoring model you're using?
> Why do we expect a score of 25, what's the rationale behind this number?
> And should it be #define-d?

See my remarks on this from the earlier e-mail.

> 4. heap_compute_xid_horizon_for_tuples contains duplicate logic. Is it possible to avoid this?

Maybe? I think that duplicating code is sometimes the lesser evil. Like in
tuplesort.c, for example. I'm not sure if that's true here, but it certainly
can be true. This is the kind of thing that I usually only make my mind up
about at the last minute. It's a matter of taste.

> 5. In this comment
>
> + * heap_index_batch_check() helper function. Sorts deltids array in the
> + * order needed for useful processing.
>
> perhaps it is better to replace "useful" with more details? Or point to the place
> where "useful processing" is described.

Okay.

> + else if (_bt_keep_natts_fast(rel, state->base, itup) > nkeyatts &&
> +          _bt_dedup_save_htid(state, itup))
> + {
> +
> + }
>
> I would rather add a comment, explaining that the empty body of the clause is actually expected.

Okay. Makes sense.

> 7. In the _bt_dedup_delete_finish_pending() you're setting ispromising to false for both
> posting and non-posting tuples. This contradicts the comments before the function.

The idea is that we can have plain tuples (non-posting list tuples) that are
non-promising when they're duplicates. Because why not? Somebody might have
deleted them (rather than updating them). It is fine to have an open mind
about this possibility, despite the fact that the chances are close to zero
(in the general case). Including these TIDs doesn't increase the amount of
work we do in heapam.

Even when we don't succeed in finding any of the non-dup TIDs as dead (which
is very much the common case), telling heapam about their existence could
help indirectly (which is somewhat more common). This factor alone could
influence which heap pages heapam visits when there is no concentration of
promising tuples on heap pages (since the total number of TIDs on each block
is the tie-breaker condition when comparing heap blocks with an equal number
of promising tuples during the block group sort in heapam.c). I believe that
this approach tends to result in heapam going after older TIDs when it
wouldn't otherwise, at least in some cases.
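To make that tie-breaker concrete, here is a minimal sketch of how such a
block comparison could look, assuming each candidate heap block carries a
count of promising TIDs and a count of all TIDs pointing at it. The struct
and function names are hypothetical, for illustration only, and are not taken
from the patch:

/*
 * Sketch only (not the patch's actual code): order candidate heap blocks so
 * that blocks with more promising TIDs come first, using the total number of
 * TIDs on the block as the tie-breaker.
 */
#include <stdlib.h>

typedef struct BlockGroupSketch
{
    unsigned int blkno;           /* heap block number */
    int          npromisingtids;  /* TIDs from promising (duplicate) index tuples */
    int          ntids;           /* all TIDs pointing at this block, promising or not */
} BlockGroupSketch;

static int
block_group_cmp(const void *arg1, const void *arg2)
{
    const BlockGroupSketch *b1 = (const BlockGroupSketch *) arg1;
    const BlockGroupSketch *b2 = (const BlockGroupSketch *) arg2;

    /* Primary key: more promising TIDs sorts earlier */
    if (b1->npromisingtids != b2->npromisingtids)
        return (b1->npromisingtids < b2->npromisingtids) ? 1 : -1;

    /*
     * Tie-breaker: more TIDs in total sorts earlier.  This is where the
     * non-promising duplicates can still influence which blocks get visited.
     */
    if (b1->ntids != b2->ntids)
        return (b1->ntids < b2->ntids) ? 1 : -1;

    return 0;
}

/* e.g. qsort(groups, ngroups, sizeof(BlockGroupSketch), block_group_cmp); */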
You're right, though -- this is still unclear. Actually, I think that I should
move the handling of promising/duplicate tuples into
_bt_dedup_delete_finish_pending(), too (move it from _bt_dedup_delete_pass()).
That would allow me to talk about all of the TIDs that get added to the
deltids array (promising and non-promising) in one central function. I'll do
it that way soon.

--
Peter Geoghegan