Re: recovering from "found xmin ... from before relfrozenxid ..." - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: recovering from "found xmin ... from before relfrozenxid ..." |
Date | |
Msg-id | CA+fd4k7mgHGHa2UxPsAONR9WaMNJo-7ybB5iuc63J77nNzF=Mw@mail.gmail.com Whole thread Raw |
In response to | Re: recovering from "found xmin ... from before relfrozenxid ..." (Ashutosh Sharma <ashu.coek88@gmail.com>) |
Responses |
Re: recovering from "found xmin ... from before relfrozenxid ..."
|
List | pgsql-hackers |
On Wed, 12 Aug 2020 at 22:27, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > Thanks Robert for the review. Please find my comments inline below: > > On Fri, Aug 7, 2020 at 9:21 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Thu, Aug 6, 2020 at 9:23 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > > > Attached v4 patch fixes the latest comments from Robert and Masahiko-san. > > > > Compiler warning: > > > > heap_surgery.c:136:13: error: comparison of unsigned expression < 0 is > > always false [-Werror,-Wtautological-compare] > > if (blkno < 0 || blkno >= nblocks) > > ~~~~~ ^ ~ > > > > Fixed. > > > There's a certain inconsistency to these messages: > > > > rhaas=# create table foo (a int); > > CREATE TABLE > > rhaas=# insert into foo values (1); > > INSERT 0 1 > > rhaas=# select heap_force_kill('foo'::regclass, array['(0,2)'::tid]); > > NOTICE: skipping tid (0, 2) because it contains an invalid offset > > heap_force_kill > > ----------------- > > > > (1 row) > > > > rhaas=# select heap_force_kill('foo'::regclass, array['(1,0)'::tid]); > > ERROR: invalid item pointer > > LOCATION: tids_same_page_fetch_offnums, heap_surgery.c:347 > > rhaas=# select heap_force_kill('foo'::regclass, array['(1,1)'::tid]); > > ERROR: block number 1 is out of range for relation "foo" > > > > From a user perspective it seems like I've made three very similar > > mistakes: in the first case, the offset is too high, in the second > > case it's too low, and in the third case the block number is out of > > range. But in one case I get a NOTICE and in the other two cases I get > > an ERROR. In one case I get the relation name and in the other two > > cases I don't. The two complaints about an invalid offset are phrased > > completely differently from each other. For example, suppose you do > > this: > > > > ERROR: tid (%u, %u) is invalid for relation "%s" because the block > > number is out of range (%u..%u) > > ERROR: tid (%u, %u) is invalid for relation "%s" because the item > > number is out of range for this block (%u..%u) > > ERROR: tid (%u, %u) is invalid for relation "%s" because the item is unused > > ERROR: tid (%u, %u) is invalid for relation "%s" because the item is dead > > > > Corrected. > > > I think I misled you when I said to use pg_class_aclcheck. I think it > > should actually be pg_class_ownercheck. > > > > okay, I've changed it to pg_class_ownercheck. > > > I think the relkind sanity check should permit RELKIND_MATVIEW also. > > > > Yeah, actually we should allow MATVIEW, don't know why I thought of > blocking it earlier. > > > It's unclear to me why the freeze logic here shouldn't do this part > > what heap_prepare_freeze_tuple() does when freezing xmax: > > > > frz->t_infomask2 &= ~HEAP_HOT_UPDATED; > > frz->t_infomask2 &= ~HEAP_KEYS_UPDATED; > > > > Yeah, we should have these changes when freezing the xmax. > > > Likewise, why should we not freeze or invalidate xvac in the case > > where tuple->t_infomask & HEAP_MOVED, as heap_prepare_freeze_tuple() > > would do? > > > > Again, we should have this as well. > > Apart from above, this time I've also added the documentation on > pg_surgery module and added a few more test-cases. > > Attached patch with above changes. > Thank you for updating the patch! Here are my comments on v5 patch: --- a/contrib/Makefile +++ b/contrib/Makefile @@ -35,6 +35,7 @@ SUBDIRS = \ pg_standby \ pg_stat_statements \ pg_trgm \ + pg_surgery \ pgcrypto \ I guess we use alphabetical order here. So pg_surgery should be placed before pg_trgm. --- + if (heap_force_opt == HEAP_FORCE_KILL) + ItemIdSetDead(itemid); I think that if the page is an all-visible page, we should clear an all-visible bit on the visibility map corresponding to the page and PD_ALL_VISIBLE on the page header. Otherwise, index only scan would return the wrong results. --- + /* + * We do not mark the buffer dirty or do WAL logging for unmodifed + * pages. + */ + if (!did_modify_page) + goto skip_wal; + + /* Mark buffer dirty before we write WAL. */ + MarkBufferDirty(buf); + + /* XLOG stuff */ + if (RelationNeedsWAL(rel)) + log_newpage_buffer(buf, true); + +skip_wal: + END_CRIT_SECTION(); + s/unmodifed/unmodified/ Do we really need to use goto? I think we can modify it like follows: if (did_modity_page) { /* Mark buffer dirty before we write WAL. */ MarkBufferDirty(buf); /* XLOG stuff */ if (RelationNeedsWAL(rel)) log_newpage_buffer(buf, true); } END_CRIT_SECTION(); --- pg_force_freeze() can revival a tuple that is already deleted but not vacuumed yet. Therefore, the user might need to reindex indexes after using that function. For instance, with the following script, the last two queries: index scan and seq scan, will return different results. set enable_seqscan to off; set enable_bitmapscan to off; set enable_indexonlyscan to off; create table tbl (a int primary key); insert into tbl values (1); update tbl set a = a + 100 where a = 1; explain analyze select * from tbl where a < 200; -- revive deleted tuple on heap select heap_force_freeze('tbl', array['(0,1)'::tid]); -- index scan returns 2 tuples explain analyze select * from tbl where a < 200; -- seq scan returns 1 tuple set enable_seqscan to on; explain analyze select * from tbl; Also, if a tuple updated and moved to another partition is revived by heap_force_freeze(), its ctid still has special values: MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I don't see a problem yet caused by a visible tuple having the special ctid value, but it might be worth considering either to reset ctid value as well or to not freezing already-deleted tuple. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: