Thread: BUG #16162: create index using gist_trgm_ops leads to panic
The following bug has been logged on the website:

Bug reference:      16162
Logged by:          Andreas Kunert
Email address:      kunert@cms.hu-berlin.de
PostgreSQL version: 12.1
Operating system:   Ubuntu 16.04
Description:

Hello,

while testing an upcoming migration of our database server to new hardware, including an update to a new Postgres version, I observed the following behaviour on the new server running a self-compiled Postgres 12.1 under Ubuntu 16.04:

When importing a complete pg_dump, about one in three times a panic occurs while trying to create a pg_trgm GiST index, and then the cluster restarts. The relevant excerpt from the log:

2019-12-12 10:32:03.886 CET [36999] LOG:  statement: CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);
2019-12-12 10:32:04.370 CET [36999] PANIC:  invalid index offnum: 4
2019-12-12 10:32:04.370 CET [36999] STATEMENT:  CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);
2019-12-12 10:32:04.914 CET [31336] LOG:  server process (PID 36999) was terminated by signal 6: Aborted
2019-12-12 10:32:04.914 CET [31336] DETAIL:  Failed process was running: CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);
2019-12-12 10:32:04.914 CET [31336] LOG:  terminating any other active server processes

I could successfully recreate the behaviour on another server (running a default-configured packaged version of Postgres 12 for Ubuntu 16.04 with a German locale, so PANIK/ANWEISUNG below correspond to PANIC/STATEMENT) - the log looks like this:

2019-12-11 17:15:00.735 CET [14959] test_gist_index@test_gist_index PANIK:  invalid index offnum: 5
2019-12-11 17:15:00.735 CET [14959] test_gist_index@test_gist_index ANWEISUNG:  CREATE INDEX gist_idx ON public.mytable USING gist(am_description public.gist_trgm_ops);
2019-12-11 17:15:07.501 CET [7436] LOG:  Serverprozess (PID 14959) wurde von Signal 6 beendet: Abgebrochen
2019-12-11 17:15:07.501 CET [7436] DETAIL:  Der fehlgeschlagene Prozess führte aus: CREATE INDEX gist_idx ON public.mytable USING gist(am_description public.gist_trgm_ops);
2019-12-11 17:15:07.501 CET [7436] LOG:  aktive Serverprozesse werden abgebrochen
2019-12-11 17:15:07.515 CET [7436] LOG:  alle Serverprozesse beendet; initialisiere neu

Unfortunately the database contains sensitive data, but after a couple of days and endless tests I managed to create a similarly built database containing only random data which provokes the same behaviour (the log messages above came from my test database). It consists of only one table with two columns (serial, text) containing 200000 rows filled with random strings of lengths varying from 1 to 1000.

A Perl script creating the database, creating and filling the table, and finally iteratively creating and dropping the index is at the end of this mail. Since the error only occurs once in a while, it can take a couple of iterations (in my tests usually at least 7 attempts).

I hope I did not make a fundamental mistake, and moreover I hope you can reproduce the behaviour.
Regards,
Andreas

----------------------------------------
Database creation
----------------------------------------
CREATE ROLE "test_gist_index" LOGIN PASSWORD 'XXXXX' NOINHERIT;
CREATE DATABASE "test_gist_index" OWNER "test_gist_index";
\c test_gist_index
CREATE EXTENSION IF NOT EXISTS pg_trgm;

---------------------------------------------------------------
Perl script creating and filling the table and running the test
---------------------------------------------------------------
#!/usr/bin/perl

use strict;
use warnings;
use DBI;

STDOUT->autoflush(1);

# generate alphabet similar to the one in the
# original error-prone database
my @alphabetstring;
for (my $j=0; $j<3; $j++) {
    for (my $i=65; $i<65+26; $i++) {
        push(@alphabetstring, chr($i));        # A-Z
        push(@alphabetstring, chr($i+97-65));  # a-z
    }
    for (my $i=0; $i<10; $i++) {
        push(@alphabetstring, chr($i+48));     # 0-9
        push(@alphabetstring, ' ');
    }
}
push(@alphabetstring, qw(ä ö ü Ä Ö Ü , ; : [ ] \( \) / - =));

# database connection
my $dbh = DBI->connect("DBI:Pg:dbname=test_gist_index;host=XXXXX",
                       "test_gist_index", "XXXXX",
                       {'RaiseError' => 1});
$dbh->{pg_enable_utf8} = 0;

# (re)create test table
my $sth1 = $dbh->prepare("DROP TABLE IF EXISTS mytable; CREATE TABLE mytable (am_id serial, am_description text);");
$sth1->execute();

# fill table with random strings of arbitrary length
my $sth2 = $dbh->prepare("INSERT INTO mytable (am_description) VALUES (?)");
for (my $i=1; $i<=200000; $i++) {
    my $stringlength = int(rand(1000));
    my $mystring = '';
    for (my $j=0; $j<=$stringlength; $j++) {
        my $randomchar = $alphabetstring[int(rand(scalar @alphabetstring))];
        $mystring .= $randomchar;
    }
    $sth2->execute($mystring);
    if ($i % 1000 == 0)  { print "."; }
    if ($i % 10000 == 0) { print " - $i\n"; }
}

# test gist index creation
for (my $i=1; $i<=20; $i++) {
    print "\nbuilding gist $i\n";
    my $sth3 = $dbh->prepare("DROP INDEX IF EXISTS gist_idx;");
    print $sth3->execute();
    my $sth4 = $dbh->prepare("CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);");
    print $sth4->execute();
}
print "\n";

$dbh->disconnect();
On Thu, Dec 12, 2019 at 10:22:21AM +0000, PG Bug reporting form wrote:
>The following bug has been logged on the website:
>
>Bug reference:      16162
>Logged by:          Andreas Kunert
>Email address:      kunert@cms.hu-berlin.de
>PostgreSQL version: 12.1
>Operating system:   Ubuntu 16.04
>Description:
>
>Hello,
>
>while testing an upcoming migration of our database server to a new hardware
>including an update to a new postgres version I observed the following
>behavior on the new server running a self-compiled Postgres 12.1 under
>Ubuntu 16.04:
>
>When importing a complete pg_dump, about one in three times a panic occurs
>while trying to create a pg_trgm gist index and then the cluster restarts.
>The according excerpt from the log:
>
>2019-12-12 10:32:03.886 CET [36999] LOG:  statement: CREATE INDEX gist_idx
>ON public.mytable USING gist (am_description public.gist_trgm_ops);
>2019-12-12 10:32:04.370 CET [36999] PANIC:  invalid index offnum: 4
>2019-12-12 10:32:04.370 CET [36999] STATEMENT:  CREATE INDEX gist_idx ON
>public.mytable USING gist (am_description public.gist_trgm_ops);
>2019-12-12 10:32:04.914 CET [31336] LOG:  server process (PID 36999) was
>terminated by signal 6: Aborted
>2019-12-12 10:32:04.914 CET [31336] DETAIL:  Failed process was running:
>CREATE INDEX gist_idx ON public.mytable USING gist (am_description
>public.gist_trgm_ops);
>2019-12-12 10:32:04.914 CET [31336] LOG:  terminating any other active
>server processes
>
>I could successfully recreate the behaviour on another server (running a
>default-configured packaged version of postgres 12 for ubuntu 16.04 on a
>german locale) - the log looks like this:
>
>2019-12-11 17:15:00.735 CET [14959] test_gist_index@test_gist_index PANIK:
>invalid index offnum: 5
>2019-12-11 17:15:00.735 CET [14959] test_gist_index@test_gist_index
>ANWEISUNG:  CREATE INDEX gist_idx ON public.mytable USING
>gist(am_description public.gist_trgm_ops);
>2019-12-11 17:15:07.501 CET [7436] LOG:  Serverprozess (PID 14959) wurde von
>Signal 6 beendet: Abgebrochen
>2019-12-11 17:15:07.501 CET [7436] DETAIL:  Der fehlgeschlagene Prozess
>führte aus: CREATE INDEX gist_idx ON public.mytable USING
>gist(am_description public.gist_trgm_ops);
>2019-12-11 17:15:07.501 CET [7436] LOG:  aktive Serverprozesse werden
>abgebrochen
>2019-12-11 17:15:07.515 CET [7436] LOG:  alle Serverprozesse beendet;
>initialisiere neu
>
>Unfortunately the database contains sensitive data but after a couple of
>days and endless tests I managed to create a similarly built database
>containing only random data which provokes the same behaviour (the log
>messages above came from my test database). It consists of only one table
>with two columns (serial, text) containing 200000 rows filled with random
>strings of lengths varying from 1 to 1000.
>
>A perl script creating the database, creating and filling the table and
>finally iteratively creating and dropping the index is at the end of this
>mail.
>

Thanks for the report. The reproducer Perl script is very useful; I've managed to reproduce the crash pretty fast. With asserts enabled, the backtrace from master looks like this:

Core was generated by `postgres: user test ::1(44878) CREATE INDEX   '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007dc91a786e35 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.29-27.fc30.x86_64 libgcc-9.2.1-1.fc30.x86_64
(gdb) bt
#0  0x00007dc91a786e35 in raise () from /lib64/libc.so.6
#1  0x00007dc91a771895 in abort () from /lib64/libc.so.6
#2  0x0000000000a8b375 in ExceptionalCondition (conditionName=0xb08e78 "ItemIdHasStorage(((ItemId) (&((PageHeader) (page))->pd_linp[(todelete) - 1])))", errorType=0xb08d2a "FailedAssertion", fileName=0xb08e21 "gistutil.c", lineNumber=69) at assert.c:67
#3  0x00000000004c6685 in gistnospace (page=0x7dc919d73d00 "", itvec=0x7ffcbf3cacb0, len=2, todelete=8, freespace=819) at gistutil.c:69
#4  0x00000000004b64b5 in gistplacetopage (rel=0x7dc911705a70, freespace=819, giststate=0x1dc9400, buffer=16377, itup=0x7ffcbf3cacb0, ntup=2, oldoffnum=8, newblkno=0x0, leftchildbuf=16299, splitinfo=0x7ffcbf3cac40, markfollowright=true, heapRel=0x7dc9116f6970, is_build=true) at gist.c:256
#5  0x00000000004b9868 in gistinserttuples (state=0x7ffcbf3cadd0, stack=0x7ffcbf3cae00, giststate=0x1dc9400, tuples=0x7ffcbf3cacb0, ntup=2, oldoffnum=8, leftchild=16299, rightchild=15979, unlockbuf=true, unlockleftchild=false) at gist.c:1266
#6  0x00000000004b9af0 in gistfinishsplit (state=0x7ffcbf3cadd0, stack=0x1def6a8, giststate=0x1dc9400, splitinfo=0x1df4568, unlockbuf=false) at gist.c:1363
#7  0x00000000004b9922 in gistinserttuples (state=0x7ffcbf3cadd0, stack=0x1def6a8, giststate=0x1dc9400, tuples=0x7ffcbf3cad70, ntup=1, oldoffnum=0, leftchild=0, rightchild=0, unlockbuf=false, unlockleftchild=false) at gist.c:1293
#8  0x00000000004b97b3 in gistinserttuple (state=0x7ffcbf3cadd0, stack=0x1def6a8, giststate=0x1dc9400, tuple=0x1ded6f8, oldoffnum=0) at gist.c:1219
#9  0x00000000004b8531 in gistdoinsert (r=0x7dc911705a70, itup=0x1ded6f8, freespace=819, giststate=0x1dc9400, heapRel=0x7dc9116f6970, is_build=true) at gist.c:875
#10 0x00000000004bb3a9 in gistBuildCallback (index=0x7dc911705a70, tid=0x1de09bc, values=0x7ffcbf3cb1a0, isnull=0x7ffcbf3cb180, tupleIsAlive=true, state=0x7ffcbf3cb3f0) at gistbuild.c:470
#11 0x00000000004f71b7 in heapam_index_build_range_scan (heapRelation=0x7dc9116f6970, indexRelation=0x7dc911705a70, indexInfo=0x1ddfac0, allow_sync=true, anyvisible=false, progress=true, start_blockno=0, numblocks=4294967295, callback=0x4bb2da <gistBuildCallback>, callback_state=0x7ffcbf3cb3f0, scan=0x1de0968) at heapam_handler.c:1659
#12 0x00000000004bab42 in table_index_build_scan (table_rel=0x7dc9116f6970, index_rel=0x7dc911705a70, index_info=0x1ddfac0, allow_sync=true, progress=true, callback=0x4bb2da <gistBuildCallback>, callback_state=0x7ffcbf3cb3f0, scan=0x0) at ../../../../src/include/access/tableam.h:1508
#13 0x00000000004bae2a in gistbuild (heap=0x7dc9116f6970, index=0x7dc911705a70, indexInfo=0x1ddfac0) at gistbuild.c:196
#14 0x00000000005a0019 in index_build (heapRelation=0x7dc9116f6970, indexRelation=0x7dc911705a70, indexInfo=0x1ddfac0, isreindex=false, parallel=true) at index.c:2871
#15 0x000000000059d510 in index_create (heapRelation=0x7dc9116f6970, indexRelationName=0x1d8a670 "gist_idx", indexRelationId=16512, parentIndexRelid=0, parentConstraintId=0, relFileNode=0, indexInfo=0x1ddfac0, indexColNames=0x1dbbef8, accessMethodObjectId=783, tableSpaceId=0, collationObjectId=0x1dbbf70, classObjectId=0x1dbbf90, coloptions=0x1dbbfb0, reloptions=0, flags=0, constr_flags=0, allow_system_table_mods=false, is_internal=false, constraintId=0x7ffcbf3cb734) at index.c:1221
#16 0x0000000000680605 in DefineIndex (relationId=16485, stmt=0x1d8a5d8, indexRelationId=0, parentIndexId=0, parentConstraintId=0, is_alter_table=false, check_rights=true, check_not_in_use=true, skip_build=false, quiet=false) at indexcmds.c:987
#17 0x000000000091679b in ProcessUtilitySlow (pstate=0x1d8a4c0, pstmt=0x1cf9678, queryString=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at utility.c:1368
#18 0x000000000091579c in standard_ProcessUtility (pstmt=0x1cf9678, queryString=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at utility.c:922
#19 0x0000000000914894 in ProcessUtility (pstmt=0x1cf9678, queryString=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at utility.c:359
#20 0x000000000091377b in PortalRunUtility (portal=0x1d603e0, pstmt=0x1cf9678, isTopLevel=true, setHoldSnapshot=false, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at pquery.c:1175
#21 0x0000000000913997 in PortalRunMulti (portal=0x1d603e0, isTopLevel=true, setHoldSnapshot=false, dest=0x1cf9958, altdest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at pquery.c:1321
#22 0x0000000000912ec1 in PortalRun (portal=0x1d603e0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1cf9958, altdest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at pquery.c:796
#23 0x000000000090cbf9 in exec_simple_query (query_string=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);") at postgres.c:1226
#24 0x0000000000910ff4 in PostgresMain (argc=1, argv=0x1d243f0, dbname=0x1d243d0 "test", username=0x1d243b0 "user") at postgres.c:4306
#25 0x0000000000864a19 in BackendRun (port=0x1d1d310) at postmaster.c:4498
#26 0x00000000008641ea in BackendStartup (port=0x1d1d310) at postmaster.c:4189
#27 0x0000000000860661 in ServerLoop () at postmaster.c:1727
#28 0x000000000085ff0d in PostmasterMain (argc=3, argv=0x1cf23b0) at postmaster.c:1400
#29 0x000000000077037e in main (argc=3, argv=0x1cf23b0) at main.c:210

This only happens on some runs (in my case the CREATE INDEX passed 8x without any issue and failed on the 9th iteration), so there has to be some random element affecting this.

FWIW the first place where the offnum=8 happens to be used is this call in gistfinishsplit() - it's the stack->downlinkoffnum:

    gistinserttuples(state, stack->parent, giststate,
                     tuples, 2,
                     stack->downlinkoffnum,
                     left->buf, right->buf,
                     true,       /* Unlock parent */
                     unlockbuf   /* Unlock stack->buffer if caller wants that */
        );

so maybe the GiST stack is stale, not updated, or something like that. I think I've seen an issue mentioning that recently ... yep, found it [1]. And I see Tom came to about the same conclusion, pointing to the same place in gistfinishsplit. It's only a couple of days old; I don't think I've seen any proposed fixes yet.

[1] https://www.postgresql.org/message-id/flat/16134-0423f729671dec64%40postgresql.org

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
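For readers following the discussion without the source at hand: the "stack" Tomas refers to is the chain of pages the GiST insertion code remembers while descending from the root towards the target leaf; each entry caches where that page's downlink sits on its parent, plus the parent's LSN at the time. A rough, standalone sketch of the idea (paraphrasing GISTInsertStack from src/include/access/gist_private.h; the typedefs are stand-ins and the buffer/lock bookkeeping is omitted):

    #include <stdint.h>

    /* Stand-ins for the real PostgreSQL types. */
    typedef uint32_t BlockNumber;
    typedef uint16_t OffsetNumber;
    typedef uint64_t GistNSN;        /* an LSN used as a "has this page changed?" cookie */

    typedef struct GISTInsertStack
    {
        BlockNumber  blkno;          /* block visited on the way down */
        GistNSN      lsn;            /* page LSN as seen during the descent */
        OffsetNumber downlinkoffnum; /* offset of this page's downlink on the parent page */
        struct GISTInsertStack *parent;  /* one level up; NULL at the root */
    } GISTInsertStack;

If a parent page is split after the descent, the cached downlinkoffnum can end up pointing at the wrong item - which is exactly the "invalid index offnum" complaint in the backtrace above.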
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> This only happens on some runs (in my case the CREATE INDEX passed 8x
> without any issue and failed on 9th iteration), so there has to be some
> random element affecting this.

As far as that goes, gistchoose() intentionally makes random choices when two insertion paths seem equally good. So failing only sometimes isn't very surprising.

> so maybe the GIST stack is stale, not updated, or something like that. I
> think I've seen an issue mentioning that recently ... yep, found it [1].
> And I see Tom came to about the same conclusion, pointing to the same
> place in gistfinishsplit. It's only a couple days old, I don't think
> I've seen any proposed fixes yet.

Good that we seem to have just one bug there, not two :-(. I've not looked any further than my previous message, though.

I wonder if this could be a recently-introduced bug? I do not recall seeing complaints like this before v12.

			regards, tom lane
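To illustrate why the failure is intermittent: when several subtrees would be penalized equally little by the new tuple, gistchoose() breaks the tie randomly rather than always picking the first candidate, so the same data can build a differently shaped tree on every run. The following is a standalone illustration of that kind of tie-breaking, not the actual PostgreSQL code; the function name and data are made up for the example.

    #include <stdio.h>
    #include <stdlib.h>

    /*
     * Pick one candidate out of those whose penalty ties with the best
     * penalty, giving each tie an equal chance, in a single pass
     * (reservoir sampling with a reservoir of one).
     */
    static int
    choose_among_ties(const double *penalty, int n)
    {
        int best = 0;
        int ties = 1;               /* equally good candidates seen so far */

        for (int i = 1; i < n; i++)
        {
            if (penalty[i] < penalty[best])
            {
                best = i;
                ties = 1;
            }
            else if (penalty[i] == penalty[best])
            {
                /* replace the current pick with probability 1/ties */
                ties++;
                if (rand() % ties == 0)
                    best = i;
            }
        }
        return best;
    }

    int
    main(void)
    {
        const double penalty[] = { 0.5, 0.25, 0.25, 0.25, 1.0 };

        srand(42);
        /* Candidates 1, 2 and 3 tie on the best penalty; any of them may win. */
        printf("chose candidate %d\n", choose_among_ties(penalty, 5));
        return 0;
    }

With a different seed (or, in the backend, a different random() state per run), a different tie is chosen, which is consistent with the reproducer needing several iterations before it trips the assertion.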
On Thu, Dec 12, 2019 at 8:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wonder if this could be a recently-introduced bug? I do not
> recall seeing complaints like this before v12.
Looks like the culprit is:
commit 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed Apr 3 17:03:15 2019 +0300
Generate less WAL during GiST, GIN and SP-GiST index build.
The commit before went 150 iterations with no failures, while this one failed on the 15th iteration, then again on the 4th.
Cheers,
Jeff
On Thu, Dec 12, 2019 at 07:05:53PM -0500, Jeff Janes wrote:
>On Thu, Dec 12, 2019 at 8:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>>
>> I wonder if this could be a recently-introduced bug? I do not
>> recall seeing complaints like this before v12.
>>
>
>Looks like the culprit is:
>
>commit 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b
>Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>Date:   Wed Apr 3 17:03:15 2019 +0300
>
>    Generate less WAL during GiST, GIN and SP-GiST index build.
>
>The commit before went 150 iterations with no failures, while this one
>failed on the 15th iteration, then again on 4th.
>

Thanks for the bisect. I had planned to try reproducing this on 11, but this commit is only in 12, so Tom is right that it's a recent bug.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 13/12/2019 02:16, Tomas Vondra wrote:
> On Thu, Dec 12, 2019 at 07:05:53PM -0500, Jeff Janes wrote:
>> On Thu, Dec 12, 2019 at 8:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I wonder if this could be a recently-introduced bug? I do not
>>> recall seeing complaints like this before v12.
>>
>> Looks like the culprit is:
>>
>> commit 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b
>> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date:   Wed Apr 3 17:03:15 2019 +0300
>>
>>     Generate less WAL during GiST, GIN and SP-GiST index build.
>>
>> The commit before went 150 iterations with no failures, while this one
>> failed on the 15th iteration, then again on 4th.
>
> Thanks for the bisect. I've planned to try reproducing this on 11, but
> this commit only in 12, so Tom is right it's a recent bug.

Thanks, I'll take a look...

- Heikki
On 13/12/2019 02:16, Tomas Vondra wrote:
> On Thu, Dec 12, 2019 at 07:05:53PM -0500, Jeff Janes wrote:
>> On Thu, Dec 12, 2019 at 8:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I wonder if this could be a recently-introduced bug? I do not
>>> recall seeing complaints like this before v12.
>>
>> Looks like the culprit is:
>>
>> commit 9155580fd5fc2a0cbb23376dfca7cd21f59c2c7b
>> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date:   Wed Apr 3 17:03:15 2019 +0300
>>
>>     Generate less WAL during GiST, GIN and SP-GiST index build.
>>
>> The commit before went 150 iterations with no failures, while this one
>> failed on the 15th iteration, then again on 4th.
>
> Thanks for the bisect. I've planned to try reproducing this on 11, but
> this commit only in 12, so Tom is right it's a recent bug.

Like Tom suspected at [1], the bug was that the parent page got split, and the stacked information wasn't updated. The code called gistFindCorrectParent(), which should've updated the stack, but that function checked the LSN of the page and did nothing if it matched. To trigger the bug, you needed to have an insertion that split a page into three (or more) pages, and inserting the downlink caused the parent page to split, too.

Committed a fix. Many thanks for the reproducer scripts, Andreas and Alexander!

[1] https://www.postgresql.org/message-id/9409.1574617130%40sss.pgh.pa.us

- Heikki
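A standalone illustration of the failure mode Heikki describes may help here. My reading of the bisect result is that, since commit 9155580f, pages written during an index build are no longer WAL-logged and therefore keep a constant fake LSN, so an "unchanged LSN means unchanged page" test can no longer notice that the parent was split. The sketch below is not the PostgreSQL source; the types, FAKE_BUILD_LSN, ParentHint and hint_still_valid are stand-ins invented for the example.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Stand-ins for the real PostgreSQL types. */
    typedef uint32_t BlockNumber;
    typedef uint16_t OffsetNumber;
    typedef uint64_t XLogRecPtr;

    /* During a WAL-skipped index build every page keeps the same fake LSN
     * instead of a monotonically advancing one. */
    #define FAKE_BUILD_LSN ((XLogRecPtr) 1)

    typedef struct
    {
        BlockNumber  parent_blkno;    /* parent block remembered on the descent */
        OffsetNumber downlink_off;    /* offset of our downlink on that page */
        XLogRecPtr   parent_lsn;      /* parent page LSN seen at that time */
    } ParentHint;

    /* The insufficient validity test: "unchanged LSN => unchanged page". */
    static bool
    hint_still_valid(XLogRecPtr current_parent_lsn, const ParentHint *hint)
    {
        return current_parent_lsn == hint->parent_lsn;
    }

    int
    main(void)
    {
        ParentHint hint = { .parent_blkno = 10, .downlink_off = 4,
                            .parent_lsn = FAKE_BUILD_LSN };

        /* The parent page is split and our downlink moves, but since the
         * build never WAL-logs the page, its LSN is still FAKE_BUILD_LSN ... */
        XLogRecPtr lsn_after_split = FAKE_BUILD_LSN;

        /* ... so the stale hint is still accepted, and the caller later trips
         * over an offset that no longer points at the expected downlink
         * ("invalid index offnum").  The committed fix instead forces a
         * re-lookup by explicitly invalidating the cached offset whenever the
         * parent may have been split. */
        printf("hint considered valid: %s\n",
               hint_still_valid(lsn_after_split, &hint) ? "yes (stale!)" : "no");
        return 0;
    }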
On Fri, Dec 13, 2019 at 5:02 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Committed a fix.
Thank you. I ran it out to 600 iterations with no problems.
Cheers,
Jeff
Hello Heikki,

14.12.2019 1:02, Heikki Linnakangas wrote:
> Like Tom suspected at [1], the bug was that the parent page got split,
> and the stacked information wasn't updated. The code called
> gistFindCorrectParent(), which should've updated the stack, but that
> function checked the LSN of the page and did nothing if it matched. To
> trigger the bug, you needed to have an insertion that split a page
> into three (or more) pages, and inserting the downlink caused the
> parent page to split, too.
>
> Committed a fix. Many thanks for the reproducer scripts, Andreas and
> Alexander!

The script that I presented in [0] still reproduces an assertion failure for me on REL_12_STABLE (fd005e1a):

iteration 6
psql:/tmp/test_gist_intarray.sql:2: NOTICE:  extension "intarray" already exists, skipping
CREATE EXTENSION
DROP TABLE
CREATE TABLE
COPY 7000
INSERT 0 7000
INSERT 0 14000
INSERT 0 28000
INSERT 0 56000
psql:/tmp/test_gist_intarray.sql:10: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
psql:/tmp/test_gist_intarray.sql:10: fatal: connection to server was lost

Core was generated by `postgres: law regression [local] CREATE INDEX '.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f10c0ed4801 in __GI_abort () at abort.c:79
#2  0x000055cfe1d476a6 in ExceptionalCondition (
    conditionName=conditionName@entry=0x55cfe1db42a8 "!(((((ItemId) (&((PageHeader) (page))->pd_linp[(todelete) - 1])))->lp_len != 0))",
    errorType=errorType@entry=0x55cfe1d9e788 "FailedAssertion",
    fileName=fileName@entry=0x55cfe1db4250 "gistutil.c",
    lineNumber=lineNumber@entry=70) at assert.c:54
#3  0x000055cfe18e3dc0 in gistnospace (page=page@entry=0x7f10b8705d00 "", itvec=itvec@entry=0x7ffec1b676e0, len=len@entry=2,
    todelete=todelete@entry=5, freespace=freespace@entry=819) at gistutil.c:70
#4  0x000055cfe18e1301 in gistplacetopage (rel=0x7f10c2006500, freespace=819, giststate=giststate@entry=0x55cfe3934ec0, buffer=920,
    itup=itup@entry=0x7ffec1b676e0, ntup=ntup@entry=2, oldoffnum=5, newblkno=0x0, leftchildbuf=897, splitinfo=0x7ffec1b67670,
    markfollowright=true, heapRel=0x7f10c20107a0, is_build=true) at gist.c:257
...

[0] https://www.postgresql.org/message-id/16134-0423f729671dec64%40postgresql.org

>
> [1] https://www.postgresql.org/message-id/9409.1574617130%40sss.pgh.pa.us
>
> - Heikki

Best regards,
Alexander
On Sat, Dec 14, 2019 at 2:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> Hello Heikki,
> 14.12.2019 1:02, Heikki Linnakangas wrote:
>>
>> Committed a fix. Many thanks for the reproducer scripts, Andreas and
>> Alexander!
> The script that I presented in [0] still reproduces assertion failure
> for me on REL_12_STABLE (fd005e1a):
I've verified that this was introduced by 9155580f and that Heikki's latest commit a7ee7c851 has not fixed it. I've reproduced it without cassert.
Cheers,
Jeff
On 15 December 2019 19:57:37 EET, Jeff Janes <jeff.janes@gmail.com> wrote:
>On Sat, Dec 14, 2019 at 2:00 PM Alexander Lakhin <exclusion@gmail.com>
>wrote:
>
>> Hello Heikki,
>> 14.12.2019 1:02, Heikki Linnakangas wrote:
>>> Committed a fix. Many thanks for the reproducer scripts, Andreas and
>>> Alexander!
>> The script that I presented in [0] still reproduces assertion failure
>> for me on REL_12_STABLE (fd005e1a):
>>
>
>I've verified that this was introduced by 9155580f and that Heikki's
>latest commit a7ee7c851 has not fixed it. I've reproduced it without
>cassert.

Yeah, I was able to reproduce it too. *sigh*, thanks, I'll investigate tomorrow.

- Heikki
On 15/12/2019 19:57, Jeff Janes wrote:
> On Sat, Dec 14, 2019 at 2:00 PM Alexander Lakhin <exclusion@gmail.com
> <mailto:exclusion@gmail.com>> wrote:
>
>     14.12.2019 1:02, Heikki Linnakangas wrote:
>     >
>     > Committed a fix. Many thanks for the reproducer scripts, Andreas and
>     > Alexander!
>     The script that I presented in [0] still reproduces assertion failure
>     for me on REL_12_STABLE (fd005e1a):
>
> I've verified that this was introduced by 9155580f and that Heikki's
> latest commit a7ee7c851 has not fixed it. I've reproduced it without
> cassert.

So there was still one instance of basically the same bug. In the same function, there are two calls to gistinserttuples():

> 	/*
> 	 * insert downlinks for the siblings from right to left, until there are
> 	 * only two siblings left.
> 	 */
> 	for (int pos = list_length(splitinfo) - 1; pos > 1; pos--)
> 	{
> 		right = (GISTPageSplitInfo *) list_nth(splitinfo, pos);
> 		left = (GISTPageSplitInfo *) list_nth(splitinfo, pos - 1);
>
> 		if (gistinserttuples(state, stack->parent, giststate,
> 							 &right->downlink, 1,
> 							 InvalidOffsetNumber,
> 							 left->buf, right->buf, false, false))
> 		{
> 			/*
> 			 * If the parent page was split, need to relocate the original
> 			 * parent pointer.
> 			 */
> 			stack->downlinkoffnum = InvalidOffsetNumber;
> 			gistFindCorrectParent(state->r, stack);
> 		}
> 		/* gistinserttuples() released the lock on right->buf. */
> 	}
>
> 	right = (GISTPageSplitInfo *) lsecond(splitinfo);
> 	left = (GISTPageSplitInfo *) linitial(splitinfo);
>
> 	/*
> 	 * Finally insert downlink for the remaining right page and update the
> 	 * downlink for the original page to not contain the tuples that were
> 	 * moved to the new pages.
> 	 */
> 	tuples[0] = left->downlink;
> 	tuples[1] = right->downlink;
> 	gistinserttuples(state, stack->parent, giststate,
> 					 tuples, 2,
> 					 stack->downlinkoffnum,
> 					 left->buf, right->buf,
> 					 true,		/* Unlock parent */
> 					 unlockbuf	/* Unlock stack->buffer if caller wants that */
> 		);
> 	Assert(left->buf == stack->buffer);

I added the "stack->downlinkoffnum = InvalidOffsetNumber" to the first call on Friday. But the second call needs the same treatment, because it's possible that we reuse the same stack on a later call to gistfinishsplit(), if a grandchild page is split twice for the same leaf insert. I feel pretty dumb, because I looked at that second call on Friday too, wondering whether it also needs to reset 'downlinkoffnum', but somehow I concluded it does not. Clearly it does, as this test case demonstrates.

Fix committed. Thanks again for the report!

- Heikki
Hello Heikki,

16.12.2019 15:02, Heikki Linnakangas wrote:
>
> I added the "stack->downlinkoffnum = InvalidOffsetNumber" to the first
> call on Friday. But the second call needs the same treatment, because
> it's possible that we reuse the same stack on a later call to
> gistfinishsplit(), if a grandchild page is split twice for the same
> leaf insert. I feel pretty dumb, because I looked at that second call
> on Friday too, wondering whether it also needs to reset 'downlinkoffnum',
> but somehow I concluded it does not. Clearly it does, as this test
> case demonstrates.
>
> Fix committed. Thanks again for the report!

I can confirm that the issue is now gone. Thank you!

Best regards,
Alexander