Thread: BUG #16162: create index using gist_trgm_ops leads to panic
The following bug has been logged on the website:

Bug reference:      16162
Logged by:          Andreas Kunert
Email address:      kunert@cms.hu-berlin.de
PostgreSQL version: 12.1
Operating system:   Ubuntu 16.04
Description:

Hello,

while testing an upcoming migration of our database server to new hardware, including an update to a new Postgres version, I observed the following behaviour on the new server running a self-compiled Postgres 12.1 under Ubuntu 16.04:

When importing a complete pg_dump, about one in three times a panic occurs while trying to create a pg_trgm GiST index, and then the cluster restarts. The relevant excerpt from the log:

2019-12-12 10:32:03.886 CET [36999] LOG:  statement: CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);
2019-12-12 10:32:04.370 CET [36999] PANIC:  invalid index offnum: 4
2019-12-12 10:32:04.370 CET [36999] STATEMENT:  CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);
2019-12-12 10:32:04.914 CET [31336] LOG:  server process (PID 36999) was terminated by signal 6: Aborted
2019-12-12 10:32:04.914 CET [31336] DETAIL:  Failed process was running: CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);
2019-12-12 10:32:04.914 CET [31336] LOG:  terminating any other active server processes

I could successfully recreate the behaviour on another server (running a default-configured packaged version of Postgres 12 for Ubuntu 16.04 with a German locale, so PANIK/ANWEISUNG below correspond to PANIC/STATEMENT) - the log looks like this:

2019-12-11 17:15:00.735 CET [14959] test_gist_index@test_gist_index PANIK:  invalid index offnum: 5
2019-12-11 17:15:00.735 CET [14959] test_gist_index@test_gist_index ANWEISUNG:  CREATE INDEX gist_idx ON public.mytable USING gist(am_description public.gist_trgm_ops);
2019-12-11 17:15:07.501 CET [7436] LOG:  Serverprozess (PID 14959) wurde von Signal 6 beendet: Abgebrochen
2019-12-11 17:15:07.501 CET [7436] DETAIL:  Der fehlgeschlagene Prozess führte aus: CREATE INDEX gist_idx ON public.mytable USING gist(am_description public.gist_trgm_ops);
2019-12-11 17:15:07.501 CET [7436] LOG:  aktive Serverprozesse werden abgebrochen
2019-12-11 17:15:07.515 CET [7436] LOG:  alle Serverprozesse beendet; initialisiere neu

Unfortunately the database contains sensitive data, but after a couple of days and endless tests I managed to create a similarly built database containing only random data which provokes the same behaviour (the log messages above came from my test database). It consists of only one table with two columns (serial, text) containing 200000 rows filled with random strings of lengths varying from 1 to 1000.

A Perl script creating the database, creating and filling the table, and finally iteratively creating and dropping the index is at the end of this mail. Since the error only occurs once in a while, it can take a couple of iterations (in my tests usually at least 7 attempts).

I hope I did not make a fundamental mistake, and moreover I hope you can reproduce the behaviour.
Regards,
Andreas

----------------------------------------
Database creation
----------------------------------------
CREATE ROLE "test_gist_index" LOGIN PASSWORD 'XXXXX' NOINHERIT;
CREATE DATABASE "test_gist_index" OWNER "test_gist_index";
\c test_gist_index
CREATE EXTENSION IF NOT EXISTS pg_trgm;

---------------------------------------------------------------
Perl script creating and filling the table and running the test
---------------------------------------------------------------
#!/usr/bin/perl

use strict;
use warnings;
use DBI;

STDOUT->autoflush(1);

# generate alphabet similar to the one in the
# original error-prone database
my @alphabetstring;
for (my $j=0; $j<3; $j++) {
    for (my $i=65; $i<65+26; $i++) {
        push(@alphabetstring, chr($i));        # A-Z
        push(@alphabetstring, chr($i+97-65));  # a-z
    }
    for (my $i=0; $i<10; $i++) {
        push(@alphabetstring, chr($i+48));     # 0-9
        push(@alphabetstring, ' ');
    }
}
push(@alphabetstring, qw(ä ö ü Ä Ö Ü , ; : [ ] \( \) / - =));

# database connection
my $dbh = DBI->connect("DBI:Pg:dbname=test_gist_index;host=XXXXX",
                       "test_gist_index", "XXXXX",
                       {'RaiseError' => 1});
$dbh->{pg_enable_utf8} = 0;

# (re)create test table
my $sth1 = $dbh->prepare("DROP TABLE IF EXISTS mytable; CREATE TABLE mytable (am_id serial, am_description text);");
$sth1->execute();

# fill table with random strings of arbitrary length
my $sth2 = $dbh->prepare("INSERT INTO mytable (am_description) VALUES (?)");
for (my $i=1; $i<=200000; $i++) {
    my $stringlength = int(rand(1000));
    my $mystring = '';
    for (my $j=0; $j<=$stringlength; $j++) {
        my $randomchar = $alphabetstring[int(rand(scalar @alphabetstring))];
        $mystring .= $randomchar;
    }
    $sth2->execute($mystring);
    if ($i % 1000 == 0)  { print "."; }
    if ($i % 10000 == 0) { print " - $i\n"; }
}

# test gist index creation
for (my $i=1; $i<=20; $i++) {
    print "\nbuilding gist $i\n";
    my $sth3 = $dbh->prepare("DROP INDEX IF EXISTS gist_idx;");
    print $sth3->execute();
    my $sth4 = $dbh->prepare("CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);");
    print $sth4->execute();
}
print "\n";

$dbh->disconnect();
On Thu, Dec 12, 2019 at 10:22:21AM +0000, PG Bug reporting form wrote:
>The following bug has been logged on the website:
>
>Bug reference:      16162
>Logged by:          Andreas Kunert
>Email address:      kunert@cms.hu-berlin.de
>PostgreSQL version: 12.1
>Operating system:   Ubuntu 16.04
>Description:
>
>Hello,
>
>while testing an upcoming migration of our database server to a new hardware
>including an update to a new postgres version I observed the following
>behavior on the new server running a self-compiled Postgres 12.1 under
>Ubuntu 16.04:
>
>When importing a complete pg_dump, about one in three times a panic occurs
>while trying to create a pg_trgm gist index and then the cluster restarts.
>The according excerpt from the log:
>
>2019-12-12 10:32:03.886 CET [36999] LOG:  statement: CREATE INDEX gist_idx
>ON public.mytable USING gist (am_description public.gist_trgm_ops);
>2019-12-12 10:32:04.370 CET [36999] PANIC:  invalid index offnum: 4
>2019-12-12 10:32:04.370 CET [36999] STATEMENT:  CREATE INDEX gist_idx ON
>public.mytable USING gist (am_description public.gist_trgm_ops);
>2019-12-12 10:32:04.914 CET [31336] LOG:  server process (PID 36999) was
>terminated by signal 6: Aborted
>2019-12-12 10:32:04.914 CET [31336] DETAIL:  Failed process was running:
>CREATE INDEX gist_idx ON public.mytable USING gist (am_description
>public.gist_trgm_ops);
>2019-12-12 10:32:04.914 CET [31336] LOG:  terminating any other active
>server processes
>
>I could successfully recreate the behaviour on another server (running a
>default-configured packaged version of postgres 12 for ubuntu 16.04 on a
>german locale) - the log looks like this:
>
>2019-12-11 17:15:00.735 CET [14959] test_gist_index@test_gist_index PANIK:
>invalid index offnum: 5
>2019-12-11 17:15:00.735 CET [14959] test_gist_index@test_gist_index
>ANWEISUNG:  CREATE INDEX gist_idx ON public.mytable USING
>gist(am_description public.gist_trgm_ops);
>2019-12-11 17:15:07.501 CET [7436] LOG:  Serverprozess (PID 14959) wurde von
>Signal 6 beendet: Abgebrochen
>2019-12-11 17:15:07.501 CET [7436] DETAIL:  Der fehlgeschlagene Prozess
>führte aus: CREATE INDEX gist_idx ON public.mytable USING
>gist(am_description public.gist_trgm_ops);
>2019-12-11 17:15:07.501 CET [7436] LOG:  aktive Serverprozesse werden
>abgebrochen
>2019-12-11 17:15:07.515 CET [7436] LOG:  alle Serverprozesse beendet;
>initialisiere neu
>
>Unfortunately the database contains sensitive data but after a couple of
>days and endless tests I managed to create a similarly built database
>containing only random data which provokes the same behaviour (the log
>messages above came from my test database). It consists of only one table
>with two columns (serial, text) containing 200000 rows filled with random
>strings of lengths varying from 1 to 1000.
>
>A perl script creating the database, creating and filling the table and
>finally iteratively creating and dropping the index is at the end of this
>mail.
>

Thanks for the report. The reproducer Perl script is very useful; I've managed to reproduce the crash pretty fast. With asserts enabled, the backtrace from master looks like this:

Core was generated by `postgres: user test ::1(44878) CREATE INDEX   '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007dc91a786e35 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.29-27.fc30.x86_64 libgcc-9.2.1-1.fc30.x86_64
(gdb) bt
#0  0x00007dc91a786e35 in raise () from /lib64/libc.so.6
#1  0x00007dc91a771895 in abort () from /lib64/libc.so.6
#2  0x0000000000a8b375 in ExceptionalCondition (conditionName=0xb08e78 "ItemIdHasStorage(((ItemId) (&((PageHeader) (page))->pd_linp[(todelete) - 1])))", errorType=0xb08d2a "FailedAssertion", fileName=0xb08e21 "gistutil.c", lineNumber=69) at assert.c:67
#3  0x00000000004c6685 in gistnospace (page=0x7dc919d73d00 "", itvec=0x7ffcbf3cacb0, len=2, todelete=8, freespace=819) at gistutil.c:69
#4  0x00000000004b64b5 in gistplacetopage (rel=0x7dc911705a70, freespace=819, giststate=0x1dc9400, buffer=16377, itup=0x7ffcbf3cacb0, ntup=2, oldoffnum=8, newblkno=0x0, leftchildbuf=16299, splitinfo=0x7ffcbf3cac40, markfollowright=true, heapRel=0x7dc9116f6970, is_build=true) at gist.c:256
#5  0x00000000004b9868 in gistinserttuples (state=0x7ffcbf3cadd0, stack=0x7ffcbf3cae00, giststate=0x1dc9400, tuples=0x7ffcbf3cacb0, ntup=2, oldoffnum=8, leftchild=16299, rightchild=15979, unlockbuf=true, unlockleftchild=false) at gist.c:1266
#6  0x00000000004b9af0 in gistfinishsplit (state=0x7ffcbf3cadd0, stack=0x1def6a8, giststate=0x1dc9400, splitinfo=0x1df4568, unlockbuf=false) at gist.c:1363
#7  0x00000000004b9922 in gistinserttuples (state=0x7ffcbf3cadd0, stack=0x1def6a8, giststate=0x1dc9400, tuples=0x7ffcbf3cad70, ntup=1, oldoffnum=0, leftchild=0, rightchild=0, unlockbuf=false, unlockleftchild=false) at gist.c:1293
#8  0x00000000004b97b3 in gistinserttuple (state=0x7ffcbf3cadd0, stack=0x1def6a8, giststate=0x1dc9400, tuple=0x1ded6f8, oldoffnum=0) at gist.c:1219
#9  0x00000000004b8531 in gistdoinsert (r=0x7dc911705a70, itup=0x1ded6f8, freespace=819, giststate=0x1dc9400, heapRel=0x7dc9116f6970, is_build=true) at gist.c:875
#10 0x00000000004bb3a9 in gistBuildCallback (index=0x7dc911705a70, tid=0x1de09bc, values=0x7ffcbf3cb1a0, isnull=0x7ffcbf3cb180, tupleIsAlive=true, state=0x7ffcbf3cb3f0) at gistbuild.c:470
#11 0x00000000004f71b7 in heapam_index_build_range_scan (heapRelation=0x7dc9116f6970, indexRelation=0x7dc911705a70, indexInfo=0x1ddfac0, allow_sync=true, anyvisible=false, progress=true, start_blockno=0, numblocks=4294967295, callback=0x4bb2da <gistBuildCallback>, callback_state=0x7ffcbf3cb3f0, scan=0x1de0968) at heapam_handler.c:1659
#12 0x00000000004bab42 in table_index_build_scan (table_rel=0x7dc9116f6970, index_rel=0x7dc911705a70, index_info=0x1ddfac0, allow_sync=true, progress=true, callback=0x4bb2da <gistBuildCallback>, callback_state=0x7ffcbf3cb3f0, scan=0x0) at ../../../../src/include/access/tableam.h:1508
#13 0x00000000004bae2a in gistbuild (heap=0x7dc9116f6970, index=0x7dc911705a70, indexInfo=0x1ddfac0) at gistbuild.c:196
#14 0x00000000005a0019 in index_build (heapRelation=0x7dc9116f6970, indexRelation=0x7dc911705a70, indexInfo=0x1ddfac0, isreindex=false, parallel=true) at index.c:2871
#15 0x000000000059d510 in index_create (heapRelation=0x7dc9116f6970, indexRelationName=0x1d8a670 "gist_idx", indexRelationId=16512, parentIndexRelid=0, parentConstraintId=0, relFileNode=0, indexInfo=0x1ddfac0, indexColNames=0x1dbbef8, accessMethodObjectId=783, tableSpaceId=0, collationObjectId=0x1dbbf70, classObjectId=0x1dbbf90, coloptions=0x1dbbfb0, reloptions=0, flags=0, constr_flags=0, allow_system_table_mods=false, is_internal=false, constraintId=0x7ffcbf3cb734) at index.c:1221
#16 0x0000000000680605 in DefineIndex (relationId=16485, stmt=0x1d8a5d8, indexRelationId=0, parentIndexId=0, parentConstraintId=0, is_alter_table=false, check_rights=true, check_not_in_use=true, skip_build=false, quiet=false) at indexcmds.c:987
#17 0x000000000091679b in ProcessUtilitySlow (pstate=0x1d8a4c0, pstmt=0x1cf9678, queryString=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at utility.c:1368
#18 0x000000000091579c in standard_ProcessUtility (pstmt=0x1cf9678, queryString=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at utility.c:922
#19 0x0000000000914894 in ProcessUtility (pstmt=0x1cf9678, queryString=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at utility.c:359
#20 0x000000000091377b in PortalRunUtility (portal=0x1d603e0, pstmt=0x1cf9678, isTopLevel=true, setHoldSnapshot=false, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at pquery.c:1175
#21 0x0000000000913997 in PortalRunMulti (portal=0x1d603e0, isTopLevel=true, setHoldSnapshot=false, dest=0x1cf9958, altdest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at pquery.c:1321
#22 0x0000000000912ec1 in PortalRun (portal=0x1d603e0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1cf9958, altdest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at pquery.c:796
#23 0x000000000090cbf9 in exec_simple_query (query_string=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);") at postgres.c:1226
#24 0x0000000000910ff4 in PostgresMain (argc=1, argv=0x1d243f0, dbname=0x1d243d0 "test", username=0x1d243b0 "user") at postgres.c:4306
#25 0x0000000000864a19 in BackendRun (port=0x1d1d310) at postmaster.c:4498
#26 0x00000000008641ea in BackendStartup (port=0x1d1d310) at postmaster.c:4189
#27 0x0000000000860661 in ServerLoop () at postmaster.c:1727
#28 0x000000000085ff0d in PostmasterMain (argc=3, argv=0x1cf23b0) at postmaster.c:1400
#29 0x000000000077037e in main (argc=3, argv=0x1cf23b0) at main.c:210

This only happens on some runs (in my case the CREATE INDEX passed 8x without any issue and failed on the 9th iteration), so there has to be some random element affecting this.

FWIW the first place where the offnum=8 happens to be used is this call in gistfinishsplit() - it's the stack->downlinkoffnum:

    gistinserttuples(state, stack->parent, giststate,
                     tuples, 2,
                     stack->downlinkoffnum,
                     left->buf, right->buf,
                     true,       /* Unlock parent */
                     unlockbuf   /* Unlock stack->buffer if caller wants that */
        );

so maybe the GiST stack is stale, not updated, or something like that. I think I've seen an issue mentioning that recently ... yep, found it [1]. And I see Tom came to about the same conclusion, pointing to the same place in gistfinishsplit. It's only a couple of days old; I don't think I've seen any proposed fixes yet.

[1] https://www.postgresql.org/message-id/flat/16134-0423f729671dec64%40postgresql.org

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
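For readers following the discussion without the source at hand: the "stack" Tomas refers to is the chain of pages the GiST insertion code remembers while descending from the root towards the target leaf; each entry caches where that page's downlink sits on its parent, plus the parent's LSN at the time. A rough, standalone sketch of the idea (paraphrasing GISTInsertStack from src/include/access/gist_private.h; the typedefs are stand-ins and the buffer/lock bookkeeping is omitted):

    #include <stdint.h>

    /* Stand-ins for the real PostgreSQL types. */
    typedef uint32_t BlockNumber;
    typedef uint16_t OffsetNumber;
    typedef uint64_t GistNSN;        /* an LSN used as a "has this page changed?" cookie */

    typedef struct GISTInsertStack
    {
        BlockNumber  blkno;          /* block visited on the way down */
        GistNSN      lsn;            /* page LSN as seen during the descent */
        OffsetNumber downlinkoffnum; /* offset of this page's downlink on the parent page */
        struct GISTInsertStack *parent;  /* one level up; NULL at the root */
    } GISTInsertStack;

If a parent page is split after the descent, the cached downlinkoffnum can end up pointing at the wrong item - which is exactly the "invalid index offnum" complaint in the backtrace above.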
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> This only happens on some runs (in my case the CREATE INDEX passed 8x
> without any issue and failed on 9th iteration), so there has to be some
> random element affecting this.

As far as that goes, gistchoose() intentionally makes random choices when two insertion paths seem equally good. So failing only sometimes isn't very surprising.

> so maybe the GIST stack is stale, not updated, or something like that. I
> think I've seen an issue mentioning that recently ... yep, found it [1].
> And I see Tom came to about the same conclusion, pointing to the same
> place in gistfinishsplit. It's only a couple days old, I don't think
> I've seen any proposed fixes yet.

Good that we seem to have just one bug there, not two :-(. I've not looked any further than my previous message, though.

I wonder if this could be a recently-introduced bug? I do not recall seeing complaints like this before v12.

			regards, tom lane
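To illustrate why the failure is intermittent: when several subtrees would be penalized equally little by the new tuple, gistchoose() breaks the tie randomly rather than always picking the first candidate, so the same data can build a differently shaped tree on every run. The following is a standalone illustration of that kind of tie-breaking, not the actual PostgreSQL code; the function name and data are made up for the example.

    #include <stdio.h>
    #include <stdlib.h>

    /*
     * Pick one candidate out of those whose penalty ties with the best
     * penalty, giving each tie an equal chance, in a single pass
     * (reservoir sampling with a reservoir of one).
     */
    static int
    choose_among_ties(const double *penalty, int n)
    {
        int best = 0;
        int ties = 1;               /* equally good candidates seen so far */

        for (int i = 1; i < n; i++)
        {
            if (penalty[i] < penalty[best])
            {
                best = i;
                ties = 1;
            }
            else if (penalty[i] == penalty[best])
            {
                /* replace the current pick with probability 1/ties */
                ties++;
                if (rand() % ties == 0)
                    best = i;
            }
        }
        return best;
    }

    int
    main(void)
    {
        const double penalty[] = { 0.5, 0.25, 0.25, 0.25, 1.0 };

        srand(42);
        /* Candidates 1, 2 and 3 tie on the best penalty; any of them may win. */
        printf("chose candidate %d\n", choose_among_ties(penalty, 5));
        return 0;
    }

With a different seed (or, in the backend, a different random() state per run), a different tie is chosen, which is consistent with the reproducer needing several iterations before it trips the assertion.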
On Thu, Dec 12, 2019 at 8:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wonder if this could be a recently-introduced bug? I do not
> recall seeing complaints like this before v12.
Looks like the culprit is:
commit 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed Apr 3 17:03:15 2019 +0300
Generate less WAL during GiST, GIN and SP-GiST index build.
The commit before went 150 iterations with no failures, while this one failed on the 15th iteration, then again on the 4th.
Cheers,
Jeff
On Thu, Dec 12, 2019 at 07:05:53PM -0500, Jeff Janes wrote:
>On Thu, Dec 12, 2019 at 8:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>>
>> I wonder if this could be a recently-introduced bug? I do not
>> recall seeing complaints like this before v12.
>>
>
>Looks like the culprit is:
>
>commit 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b
>Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>Date:   Wed Apr 3 17:03:15 2019 +0300
>
>    Generate less WAL during GiST, GIN and SP-GiST index build.
>
>The commit before went 150 iterations with no failures, while this one
>failed on the 15th iteration, then again on 4th.
>

Thanks for the bisect. I had planned to try reproducing this on 11, but this commit is only in 12, so Tom is right that it's a recent bug.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 13/12/2019 02:16, Tomas Vondra wrote:
> On Thu, Dec 12, 2019 at 07:05:53PM -0500, Jeff Janes wrote:
>> On Thu, Dec 12, 2019 at 8:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I wonder if this could be a recently-introduced bug? I do not
>>> recall seeing complaints like this before v12.
>>
>> Looks like the culprit is:
>>
>> commit 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b
>> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date:   Wed Apr 3 17:03:15 2019 +0300
>>
>>     Generate less WAL during GiST, GIN and SP-GiST index build.
>>
>> The commit before went 150 iterations with no failures, while this one
>> failed on the 15th iteration, then again on 4th.
>
> Thanks for the bisect. I've planned to try reproducing this on 11, but
> this commit only in 12, so Tom is right it's a recent bug.

Thanks, I'll take a look...

- Heikki
On 13/12/2019 02:16, Tomas Vondra wrote:
> On Thu, Dec 12, 2019 at 07:05:53PM -0500, Jeff Janes wrote:
>> On Thu, Dec 12, 2019 at 8:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I wonder if this could be a recently-introduced bug? I do not
>>> recall seeing complaints like this before v12.
>>
>> Looks like the culprit is:
>>
>> commit 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b
>> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date:   Wed Apr 3 17:03:15 2019 +0300
>>
>>     Generate less WAL during GiST, GIN and SP-GiST index build.
>>
>> The commit before went 150 iterations with no failures, while this one
>> failed on the 15th iteration, then again on 4th.
>
> Thanks for the bisect. I've planned to try reproducing this on 11, but
> this commit only in 12, so Tom is right it's a recent bug.

Like Tom suspected at [1], the bug was that the parent page got split, and the stacked information wasn't updated. The code called gistFindCorrectParent(), which should've updated the stack, but that function checked the LSN of the page and did nothing if it matched. To trigger the bug, you needed to have an insertion that split a page into three (or more) pages, and inserting the downlink caused the parent page to split, too.

Committed a fix. Many thanks for the reproducer scripts, Andreas and Alexander!

[1] https://www.postgresql.org/message-id/9409.1574617130%40sss.pgh.pa.us

- Heikki
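A standalone illustration of the failure mode Heikki describes may help here. My reading of the bisect result is that, since commit 9155580f, pages written during an index build are no longer WAL-logged and therefore keep a constant fake LSN, so an "unchanged LSN means unchanged page" test can no longer notice that the parent was split. The sketch below is not the PostgreSQL source; the types, FAKE_BUILD_LSN, ParentHint and hint_still_valid are stand-ins invented for the example.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Stand-ins for the real PostgreSQL types. */
    typedef uint32_t BlockNumber;
    typedef uint16_t OffsetNumber;
    typedef uint64_t XLogRecPtr;

    /* During a WAL-skipped index build every page keeps the same fake LSN
     * instead of a monotonically advancing one. */
    #define FAKE_BUILD_LSN ((XLogRecPtr) 1)

    typedef struct
    {
        BlockNumber  parent_blkno;    /* parent block remembered on the descent */
        OffsetNumber downlink_off;    /* offset of our downlink on that page */
        XLogRecPtr   parent_lsn;      /* parent page LSN seen at that time */
    } ParentHint;

    /* The insufficient validity test: "unchanged LSN => unchanged page". */
    static bool
    hint_still_valid(XLogRecPtr current_parent_lsn, const ParentHint *hint)
    {
        return current_parent_lsn == hint->parent_lsn;
    }

    int
    main(void)
    {
        ParentHint hint = { .parent_blkno = 10, .downlink_off = 4,
                            .parent_lsn = FAKE_BUILD_LSN };

        /* The parent page is split and our downlink moves, but since the
         * build never WAL-logs the page, its LSN is still FAKE_BUILD_LSN ... */
        XLogRecPtr lsn_after_split = FAKE_BUILD_LSN;

        /* ... so the stale hint is still accepted, and the caller later trips
         * over an offset that no longer points at the expected downlink
         * ("invalid index offnum").  The committed fix instead forces a
         * re-lookup by explicitly invalidating the cached offset whenever the
         * parent may have been split. */
        printf("hint considered valid: %s\n",
               hint_still_valid(lsn_after_split, &hint) ? "yes (stale!)" : "no");
        return 0;
    }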
On Fri, Dec 13, 2019 at 5:02 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Committed a fix.
Thank you. I ran it out to 600 iterations with no problems.
Cheers,
Jeff
Hello Heikki,

14.12.2019 1:02, Heikki Linnakangas wrote:
> Like Tom suspected at [1], the bug was that the parent page got split,
> and the stacked information wasn't updated. The code called
> gistFindCorrectParent(), which should've updated the stack, but that
> function checked the LSN of the page and did nothing if it matched. To
> trigger the bug, you needed to have an insertion that split a page
> into three (or more) pages, and inserting the downlink caused the
> parent page to split, too.
>
> Committed a fix. Many thanks for the reproducer scripts, Andreas and
> Alexander!

The script that I presented in [0] still reproduces an assertion failure for me on REL_12_STABLE (fd005e1a):

iteration 6
psql:/tmp/test_gist_intarray.sql:2: NOTICE:  extension "intarray" already exists, skipping
CREATE EXTENSION
DROP TABLE
CREATE TABLE
COPY 7000
INSERT 0 7000
INSERT 0 14000
INSERT 0 28000
INSERT 0 56000
psql:/tmp/test_gist_intarray.sql:10: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
psql:/tmp/test_gist_intarray.sql:10: fatal: connection to server was lost

Core was generated by `postgres: law regression [local] CREATE INDEX '.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f10c0ed4801 in __GI_abort () at abort.c:79
#2  0x000055cfe1d476a6 in ExceptionalCondition (
    conditionName=conditionName@entry=0x55cfe1db42a8 "!(((((ItemId) (&((PageHeader) (page))->pd_linp[(todelete) - 1])))->lp_len != 0))",
    errorType=errorType@entry=0x55cfe1d9e788 "FailedAssertion",
    fileName=fileName@entry=0x55cfe1db4250 "gistutil.c",
    lineNumber=lineNumber@entry=70) at assert.c:54
#3  0x000055cfe18e3dc0 in gistnospace (page=page@entry=0x7f10b8705d00 "", itvec=itvec@entry=0x7ffec1b676e0, len=len@entry=2,
    todelete=todelete@entry=5, freespace=freespace@entry=819) at gistutil.c:70
#4  0x000055cfe18e1301 in gistplacetopage (rel=0x7f10c2006500, freespace=819, giststate=giststate@entry=0x55cfe3934ec0, buffer=920,
    itup=itup@entry=0x7ffec1b676e0, ntup=ntup@entry=2, oldoffnum=5, newblkno=0x0, leftchildbuf=897, splitinfo=0x7ffec1b67670,
    markfollowright=true, heapRel=0x7f10c20107a0, is_build=true) at gist.c:257
...

[0] https://www.postgresql.org/message-id/16134-0423f729671dec64%40postgresql.org

>
> [1] https://www.postgresql.org/message-id/9409.1574617130%40sss.pgh.pa.us
>
> - Heikki

Best regards,
Alexander
On Sat, Dec 14, 2019 at 2:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> Hello Heikki,
> 14.12.2019 1:02, Heikki Linnakangas wrote:
>>
>> Committed a fix. Many thanks for the reproducer scripts, Andreas and
>> Alexander!
> The script that I presented in [0] still reproduces assertion failure
> for me on REL_12_STABLE (fd005e1a):
I've verified that this was introduced by 9155580f and that Heikki's latest commit a7ee7c851 has not fixed it. I've reproduced it without cassert.
Cheers,
Jeff
On 15 December 2019 19:57:37 EET, Jeff Janes <jeff.janes@gmail.com> wrote:
>On Sat, Dec 14, 2019 at 2:00 PM Alexander Lakhin <exclusion@gmail.com>
>wrote:
>
>> Hello Heikki,
>> 14.12.2019 1:02, Heikki Linnakangas wrote:
>>> Committed a fix. Many thanks for the reproducer scripts, Andreas and
>>> Alexander!
>> The script that I presented in [0] still reproduces assertion failure
>> for me on REL_12_STABLE (fd005e1a):
>>
>
>I've verified that this was introduced by 9155580f and that Heikki's
>latest commit a7ee7c851 has not fixed it. I've reproduced it without
>cassert.

Yeah, I was able to reproduce it too. *sigh*, thanks, I'll investigate tomorrow.

- Heikki
On 15/12/2019 19:57, Jeff Janes wrote:
> On Sat, Dec 14, 2019 at 2:00 PM Alexander Lakhin <exclusion@gmail.com
> <mailto:exclusion@gmail.com>> wrote:
>
>     14.12.2019 1:02, Heikki Linnakangas wrote:
>     >
>     > Committed a fix. Many thanks for the reproducer scripts, Andreas and
>     > Alexander!
>     The script that I presented in [0] still reproduces assertion failure
>     for me on REL_12_STABLE (fd005e1a):
>
> I've verified that this was introduced by 9155580f and that Heikki's
> latest commit a7ee7c851 has not fixed it. I've reproduced it without
> cassert.

So there was still one instance of basically the same bug. In the same function, there are two calls to gistinserttuples():

> 	/*
> 	 * insert downlinks for the siblings from right to left, until there are
> 	 * only two siblings left.
> 	 */
> 	for (int pos = list_length(splitinfo) - 1; pos > 1; pos--)
> 	{
> 		right = (GISTPageSplitInfo *) list_nth(splitinfo, pos);
> 		left = (GISTPageSplitInfo *) list_nth(splitinfo, pos - 1);
>
> 		if (gistinserttuples(state, stack->parent, giststate,
> 							 &right->downlink, 1,
> 							 InvalidOffsetNumber,
> 							 left->buf, right->buf, false, false))
> 		{
> 			/*
> 			 * If the parent page was split, need to relocate the original
> 			 * parent pointer.
> 			 */
> 			stack->downlinkoffnum = InvalidOffsetNumber;
> 			gistFindCorrectParent(state->r, stack);
> 		}
> 		/* gistinserttuples() released the lock on right->buf. */
> 	}
>
> 	right = (GISTPageSplitInfo *) lsecond(splitinfo);
> 	left = (GISTPageSplitInfo *) linitial(splitinfo);
>
> 	/*
> 	 * Finally insert downlink for the remaining right page and update the
> 	 * downlink for the original page to not contain the tuples that were
> 	 * moved to the new pages.
> 	 */
> 	tuples[0] = left->downlink;
> 	tuples[1] = right->downlink;
> 	gistinserttuples(state, stack->parent, giststate,
> 					 tuples, 2,
> 					 stack->downlinkoffnum,
> 					 left->buf, right->buf,
> 					 true,		/* Unlock parent */
> 					 unlockbuf	/* Unlock stack->buffer if caller wants that */
> 		);
> 	Assert(left->buf == stack->buffer);

I added the "stack->downlinkoffnum = InvalidOffsetNumber" to the first call on Friday. But the second call needs the same treatment, because it's possible that we reuse the same stack on a later call to gistfinishsplit(), if a grandchild page is split twice for the same leaf insert. I feel pretty dumb, because I looked at that second call on Friday too, wondering whether it also needs to reset 'downlinkoffnum', but somehow I concluded it does not. Clearly it does, as this test case demonstrates.

Fix committed. Thanks again for the report!

- Heikki
Hello Heikki,

16.12.2019 15:02, Heikki Linnakangas wrote:
>
> I added the "stack->downlinkoffnum = InvalidOffsetNumber" to the first
> call on Friday. But the second call needs the same treatment, because
> it's possible that we reuse the same stack on a later call to
> gistfinishsplit(), if a grandchild page is split twice for the same
> leaf insert. I feel pretty dumb, because I looked at that second call
> on Friday too, wondering whether it also needs to reset 'downlinkoffnum',
> but somehow I concluded it does not. Clearly it does, as this test
> case demonstrates.
>
> Fix committed. Thanks again for the report!

I can confirm that the issue is now gone. Thank you!

Best regards,
Alexander