Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6 - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6 |
Date | |
Msg-id | 22317.1556206341@sss.pgh.pa.us |
In response to | Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6 (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6 |
List | pgsql-hackers |
Michael Paquier <michael@paquier.xyz> writes:
> On Tue, Apr 23, 2019 at 08:03:37PM -0400, Tom Lane wrote:
>> Oh! One gets you ten it "works" as long as the pg_class update is a
>> HOT update, so that we don't actually end up touching the indexes.

> I have been able to spend a bit more time testing and looking at the
> root of the problem, and I have found two things:
> 1) The problem is reproducible with REL9_5_STABLE.

Actually, as far as I can tell, this has been broken since day 1.
I can reproduce the assertion failure back to 9.1, and I think the
only reason it doesn't happen in older branches is that they lack
the ReindexIsProcessingIndex() check in RELATION_CHECKS :-(.

What you have to do to get it to crash is to ensure that
RelationSetNewRelfilenode's update of pg_class will be a non-HOT
update. You can try to set that up with "vacuum full pg_class"
but it turns out that that tends to leave the pg_class entries
for pg_class's indexes in the last page of the relation, which
is usually not totally full, so that a HOT update works and the
bug doesn't manifest.

A recipe like the following breaks every branch, by ensuring that
the page containing pg_class_relname_nsp_index's entry is full:

regression=# vacuum full pg_class;
VACUUM
regression=# do $$ begin for i in 100 .. 150 loop execute 'create table dummy'||i||'(f1 int)'; end loop; end $$;
DO
regression=# reindex index pg_class_relname_nsp_index;
psql: server closed the connection unexpectedly

As for an actual fix, I tried just moving reindex_index's
SetReindexProcessing call from where it is down to after
RelationSetNewRelfilenode, but that isn't sufficient:

regression=# reindex index pg_class_relname_nsp_index;
psql: ERROR: could not read block 3 in file "base/16384/41119": read only 0 of 8192 bytes

#0  errfinish (dummy=0) at elog.c:411
#1  0x00000000007a9453 in mdread (reln=<value optimized out>,
    forknum=<value optimized out>, blocknum=<value optimized out>,
    buffer=0x7f608e6a7d00 "") at md.c:633
#2  0x000000000077a9af in ReadBuffer_common (smgr=<value optimized out>,
    relpersistence=112 'p', forkNum=MAIN_FORKNUM, blockNum=3,
    mode=RBM_NORMAL, strategy=0x0, hit=0x7fff6a7452ef) at bufmgr.c:896
#3  0x000000000077b67e in ReadBufferExtended (reln=0x7f608db5d670,
    forkNum=MAIN_FORKNUM, blockNum=3, mode=<value optimized out>,
    strategy=<value optimized out>) at bufmgr.c:664
#4  0x00000000004ea95a in _bt_getbuf (rel=0x7f608db5d670,
    blkno=<value optimized out>, access=1) at nbtpage.c:805
#5  0x00000000004eb67a in _bt_getroot (rel=0x7f608db5d670, access=2)
    at nbtpage.c:323
#6  0x00000000004f2237 in _bt_search (rel=0x7f608db5d670, key=0x1d5a0c0,
    bufP=0x7fff6a7456a8, access=2, snapshot=0x0) at nbtsearch.c:99
#7  0x00000000004e8caf in _bt_doinsert (rel=0x7f608db5d670,
    itup=0x1c85e58, checkUnique=UNIQUE_CHECK_YES, heapRel=0x1ccb8d0)
    at nbtinsert.c:219
#8  0x00000000004efc17 in btinsert (rel=0x7f608db5d670,
    values=<value optimized out>, isnull=<value optimized out>,
    ht_ctid=0x1d12dc4, heapRel=0x1ccb8d0, checkUnique=UNIQUE_CHECK_YES,
    indexInfo=0x1c857f8) at nbtree.c:205
#9  0x000000000054c320 in CatalogIndexInsert (indstate=<value optimized out>,
    heapTuple=0x1d12dc0) at indexing.c:140
#10 0x000000000054c502 in CatalogTupleUpdate (heapRel=0x1ccb8d0,
    otid=0x1d12dc4, tup=0x1d12dc0) at indexing.c:215
#11 0x00000000008bcba7 in RelationSetNewRelfilenode (relation=0x7f608db5d670,
    persistence=112 'p') at relcache.c:3531
#12 0x0000000000548b16 in reindex_index (indexId=2663,
    skip_constraint_checks=false, persistence=112 'p', options=0)
    at index.c:3336
#13 0x00000000005ed129 in ReindexIndex (indexRelation=<value optimized out>,
    options=0, concurrent=false) at indexcmds.c:2304
#14 0x00000000007b5a45 in standard_ProcessUtility (pstmt=0x1c66d70,
    queryString=0x1c65f68 "reindex index pg_class_relname_nsp_index;",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x1c66e68, completionTag=0x7fff6a745e40 "") at utility.c:787

The problem here is that RelationSetNewRelfilenode is aggressively
changing the index's relcache entry before it's written out the
updated tuple, so that the tuple update tries to make an index
entry in the new storage which isn't filled yet. I think we can
fix it by *not* doing that, but leaving it to the relcache inval
during the CommandCounterIncrement call to update the relcache
entry. However, it looks like that will take some API refactoring,
because the storage-creation functions expect to get the new
relfilenode out of the relcache entry, and they'll have to be
changed to not do it that way. I'll work on a patch ...

			regards, tom lane
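
The ordering problem described in the message can be modelled in a few lines. The sketch below is a toy in Python, not PostgreSQL code: the class ToyIndex, its fields, and the two set_new_filenode_* functions are hypothetical names invented for illustration. It assumes only what the message states, namely that a non-HOT pg_class update must insert into the index being reindexed, and that the insert goes to whatever storage the cached entry currently points at. The "deferred" variant is one reading of the proposed direction, not the eventual patch.

```python
class ToyIndex:
    """Toy stand-in for a catalog index on its own catalog (hypothetical)."""

    def __init__(self, filenode, nblocks):
        # storage maps a file node number to its list of blocks
        self.storage = {filenode: [[] for _ in range(nblocks)]}
        self.cached_filenode = filenode    # plays the role of the relcache entry
        self.catalog_filenode = filenode   # plays the role of the pg_class row

    def insert_entry(self, block, key):
        """Insert via the cached filenode, as an index insert would."""
        blocks = self.storage[self.cached_filenode]
        if block >= len(blocks):
            raise RuntimeError(
                f"could not read block {block} in file {self.cached_filenode}"
            )
        blocks[block].append(key)

    def update_catalog_row(self, new_filenode):
        """Non-HOT catalog update: it must also insert into this index."""
        self.insert_entry(block=3, key=("catalog row", new_filenode))
        self.catalog_filenode = new_filenode

    def command_counter_increment(self):
        """Cache invalidation: reload the cached filenode from the catalog."""
        self.cached_filenode = self.catalog_filenode


def set_new_filenode_eager(index, new_filenode):
    """Switch the cache first, then write the catalog row (the failing order)."""
    index.storage[new_filenode] = []          # new storage exists but is empty
    index.cached_filenode = new_filenode      # cache switched too early
    index.update_catalog_row(new_filenode)    # insert hits the empty new file
    index.command_counter_increment()


def set_new_filenode_deferred(index, new_filenode):
    """Write the catalog row first and let invalidation update the cache."""
    index.storage[new_filenode] = []
    index.update_catalog_row(new_filenode)    # insert still goes to old storage
    index.command_counter_increment()         # cache catches up afterwards
```

In this model, set_new_filenode_eager(ToyIndex(100, 4), 200) raises an error about block 3 in the new file, mirroring the mdread failure in the backtrace, while set_new_filenode_deferred completes and leaves the cached file node pointing at the new storage.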