Thread: Segfault on ANALYZE in SERIALIZABLE isolation

Segfault on ANALYZE in SERIALIZABLE isolation

From
Joe Wildish
Date:
Hackers,

Head of master is giving me a segfault on running ANALYZE when isolation mode is SERIALIZABLE.

My configure:

export CFLAGS="-g"
export LDFLAGS="-L/usr/local/opt/readline/lib"
export CPPFLAGS="-I/usr/local/opt/readline/include"

./configure \
   --prefix=/Users/joe/Development/tmp/pg \
   --enable-cassert \
   --enable-debug \
   --with-readline

To reproduce:

[joe@oberon pg]$ ./bin/initdb -D $(pwd)/data
The files belonging to this database system will be owned by user "joe".
This user must also own the server process.

The database cluster will be initialized with locale "en_GB.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /Users/joe/Development/tmp/pg/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Europe/London
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    ./bin/pg_ctl -D /Users/joe/Development/tmp/pg/data -l logfile start

[joe@oberon pg]$ ./bin/pg_ctl -D /Users/joe/Development/tmp/pg/data -l logfile start && ./bin/psql -d postgres
waiting for server to start.... done
server started
psql (12devel)
Type "help" for help.

[local] joe@postgres=# ALTER SYSTEM SET DEFAULT_TRANSACTION_ISOLATION TO 'serializable';
ALTER SYSTEM
[local] joe@postgres=# \q
[joe@oberon pg]$ ./bin/pg_ctl -D /Users/joe/Development/tmp/pg/data -l logfile restart && ./bin/psql -d postgres
waiting for server to shut down.... done
server stopped
waiting for server to start.... done
server started
psql (12devel)
Type "help" for help.

[local] joe@postgres=# ANALYZE;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
 @!>


Logline:
2019-05-18 15:10:06.831 BST [45177] LOG:  server process (PID 45186) was terminated by signal 11: Segmentation fault:
11


Cheers,
-Joe





Re: Segfault on ANALYZE in SERIALIZABLE isolation

From
Sergei Kornilov
Date:
Hi

I can reproduce with:

set default_transaction_isolation TO serializable ;
analyze ;

Here is backtrace:

#0  SerializationNeededForRead (snapshot=0x0, relation=0x7f53e9a525f8) at predicate.c:530
#1  PredicateLockRelation (relation=relation@entry=0x7f53e9a525f8, snapshot=snapshot@entry=0x0) at predicate.c:2507
#2  0x0000562395b78a14 in heap_beginscan (relation=0x7f53e9a525f8, snapshot=0x0, nkeys=0, key=0x0, parallel_scan=0x0,
allow_strat=<optimizedout>, 
 
    allow_sync=false, allow_pagemode=true, is_bitmapscan=false, is_samplescan=true, temp_snap=false) at heapam.c:1180
#3  0x0000562395c782d7 in table_beginscan_analyze (rel=0x7f53e9a525f8) at ../../../src/include/access/tableam.h:786
#4  acquire_sample_rows (onerel=onerel@entry=0x7f53e9a525f8, elevel=elevel@entry=13, rows=rows@entry=0x562396f01dd0,
targrows=targrows@entry=30000,
 
    totalrows=totalrows@entry=0x7ffd0603e498, totaldeadrows=totaldeadrows@entry=0x7ffd0603e490) at analyze.c:1032
#5  0x0000562395c790f2 in do_analyze_rel (onerel=onerel@entry=0x7f53e9a525f8, params=params@entry=0x7ffd0603e6a0,
va_cols=va_cols@entry=0x0,
 
    acquirefunc=0x562395c781fa <acquire_sample_rows>, relpages=0, inh=inh@entry=false, in_outer_xact=false, elevel=13)
atanalyze.c:502
 
#6  0x0000562395c79930 in analyze_rel (relid=<optimized out>, relation=0x0, params=params@entry=0x7ffd0603e6a0,
va_cols=0x0,
 
    in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:260
#7  0x0000562395cf6f90 in vacuum (relations=0x562396ecbf80, params=params@entry=0x7ffd0603e6a0, bstrategy=<optimized
out>,bstrategy@entry=0x0, 
 
    isTopLevel=isTopLevel@entry=true) at vacuum.c:413
#8  0x0000562395cf759d in ExecVacuum (pstate=pstate@entry=0x562396df69f8, vacstmt=vacstmt@entry=0x562396dd54c0,
isTopLevel=isTopLevel@entry=true)
    at vacuum.c:199
#9  0x0000562395e84863 in standard_ProcessUtility (pstmt=0x562396dd5820, queryString=0x562396dd4ad8 "analyze ;",
context=PROCESS_UTILITY_TOPLEVEL,
 
    params=0x0, queryEnv=0x0, dest=0x562396dd5918, completionTag=0x7ffd0603ea10 "") at utility.c:670
#10 0x0000562395e84dba in ProcessUtility (pstmt=pstmt@entry=0x562396dd5820, queryString=<optimized out>,
context=<optimizedout>, 
 
    params=<optimized out>, queryEnv=<optimized out>, dest=dest@entry=0x562396dd5918, completionTag=0x7ffd0603ea10 "")
atutility.c:360
 
#11 0x0000562395e811a1 in PortalRunUtility (portal=portal@entry=0x562396e3a178, pstmt=pstmt@entry=0x562396dd5820,
isTopLevel=isTopLevel@entry=true,
 
    setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x562396dd5918,
completionTag=completionTag@entry=0x7ffd0603ea10"") at pquery.c:1175
 
#12 0x0000562395e81e0e in PortalRunMulti (portal=portal@entry=0x562396e3a178, isTopLevel=isTopLevel@entry=true, 
    setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x562396dd5918, altdest=altdest@entry=0x562396dd5918,

    completionTag=completionTag@entry=0x7ffd0603ea10 "") at pquery.c:1321
#13 0x0000562395e82b99 in PortalRun (portal=portal@entry=0x562396e3a178, count=count@entry=9223372036854775807,
isTopLevel=isTopLevel@entry=true,
 
    run_once=run_once@entry=true, dest=dest@entry=0x562396dd5918, altdest=altdest@entry=0x562396dd5918,
completionTag=0x7ffd0603ea10"")
 
    at pquery.c:796
#14 0x0000562395e7ee14 in exec_simple_query (query_string=query_string@entry=0x562396dd4ad8 "analyze ;") at
postgres.c:1215
#15 0x0000562395e80cfc in PostgresMain (argc=<optimized out>, argv=argv@entry=0x562396e00320, dbname=<optimized out>,
username=<optimizedout>)
 
    at postgres.c:4249
#16 0x0000562395df6358 in BackendRun (port=port@entry=0x562396df7d30) at postmaster.c:4431
#17 0x0000562395df9477 in BackendStartup (port=port@entry=0x562396df7d30) at postmaster.c:4122
#18 0x0000562395df969a in ServerLoop () at postmaster.c:1704
#19 0x0000562395dfabdb in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377
#20 0x0000562395d59083 in main (argc=3, argv=0x562396dcf200) at main.c:228

regards, Sergei



Re: Segfault on ANALYZE in SERIALIZABLE isolation

From
Sergei Kornilov
Date:
Hello

Seems table_beginscan_analyze (src/include/access/tableam.h) should not pass second argument as NULL.
CC'ing Andres Freund

PS: also I noticed in src/include/utils/snapshot.h exactly same comment for both SNAPSHOT_SELF and SNAPSHOT_DIRTY -
theyhave no difference?
 

regards, Sergei



Re: Segfault on ANALYZE in SERIALIZABLE isolation

From
Tom Lane
Date:
Sergei Kornilov <sk@zsrv.org> writes:
> I can reproduce with:

> set default_transaction_isolation TO serializable ;
> analyze ;

> Here is backtrace:

> #0  SerializationNeededForRead (snapshot=0x0, relation=0x7f53e9a525f8) at predicate.c:530
> #1  PredicateLockRelation (relation=relation@entry=0x7f53e9a525f8, snapshot=snapshot@entry=0x0) at predicate.c:2507
> #2  0x0000562395b78a14 in heap_beginscan (relation=0x7f53e9a525f8, snapshot=0x0, nkeys=0, key=0x0, parallel_scan=0x0,
allow_strat=<optimizedout>,  
>     allow_sync=false, allow_pagemode=true, is_bitmapscan=false, is_samplescan=true, temp_snap=false) at heapam.c:1180
> #3  0x0000562395c782d7 in table_beginscan_analyze (rel=0x7f53e9a525f8) at ../../../src/include/access/tableam.h:786
> #4  acquire_sample_rows (onerel=onerel@entry=0x7f53e9a525f8, elevel=elevel@entry=13, rows=rows@entry=0x562396f01dd0,
targrows=targrows@entry=30000, 
>     totalrows=totalrows@entry=0x7ffd0603e498, totaldeadrows=totaldeadrows@entry=0x7ffd0603e490) at analyze.c:1032

So the problem is that something is passing a null snapshot to something
that isn't expecting that.  This seems closely related to the tableam
API issue that was being debated a day or two back about whether it's
ever valid to hand a null snapshot to an AM.  Andres, which layer do
you think is at fault here?

            regards, tom lane



Re: Segfault on ANALYZE in SERIALIZABLE isolation

From
Tom Lane
Date:
I wrote:
> Sergei Kornilov <sk@zsrv.org> writes:
>> I can reproduce with:
>> set default_transaction_isolation TO serializable ;
>> analyze ;

> So the problem is that something is passing a null snapshot to something
> that isn't expecting that.  This seems closely related to the tableam
> API issue that was being debated a day or two back about whether it's
> ever valid to hand a null snapshot to an AM.  Andres, which layer do
> you think is at fault here?

Bisecting confirms that this broke at

commit 737a292b5de296615a715ddce2b2d83d1ee245c5
Author: Andres Freund <andres@anarazel.de>
Date:   Sat Mar 30 16:21:09 2019 -0700

    tableam: VACUUM and ANALYZE support.

I'd thought possibly this had something to do with bb16aba50 (Enable
parallel query with SERIALIZABLE isolation) but the bisection result
makes it pretty clear that it's just a tableam API screwup.

            regards, tom lane



Re: Segfault on ANALYZE in SERIALIZABLE isolation

From
Andres Freund
Date:
Hi,

On May 18, 2019 11:55:01 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>I wrote:
>> Sergei Kornilov <sk@zsrv.org> writes:
>>> I can reproduce with:
>>> set default_transaction_isolation TO serializable ;
>>> analyze ;
>
>> So the problem is that something is passing a null snapshot to
>something
>> that isn't expecting that.  This seems closely related to the tableam
>> API issue that was being debated a day or two back about whether it's
>> ever valid to hand a null snapshot to an AM.  Andres, which layer do
>> you think is at fault here?

Not quite - that was about the DML callbacks, this is about the scan itself. And while we have a snapshot allocated,
theanalyze version of the beginscan intentionally doesn't take a snapshot. 

>Bisecting confirms that this broke at
>
>commit 737a292b5de296615a715ddce2b2d83d1ee245c5
>Author: Andres Freund <andres@anarazel.de>
>Date:   Sat Mar 30 16:21:09 2019 -0700
>
>    tableam: VACUUM and ANALYZE support.
>
>I'd thought possibly this had something to do with bb16aba50 (Enable
>parallel query with SERIALIZABLE isolation) but the bisection result
>makes it pretty clear that it's just a tableam API screwup.

I'm not yet at my computer, but I think all that's needed is to expand the check that prevents the predicate lock to be
acquiredfor heap type scans to the analyze case. I'll check it in a few. 

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Segfault on ANALYZE in SERIALIZABLE isolation

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Not quite - that was about the DML callbacks, this is about the scan itself. And while we have a snapshot allocated,
theanalyze version of the beginscan intentionally doesn't take a snapshot. 

Uh, what?  That's a *huge* regression.  See, eg, 7170268ef.  We
really want ANALYZE to act as though it's reading a normal MVCC
snapshot.

            regards, tom lane



Re: Segfault on ANALYZE in SERIALIZABLE isolation

From
Andres Freund
Date:
Hi,

On 2019-05-18 15:48:47 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Not quite - that was about the DML callbacks, this is about the scan itself. And while we have a snapshot
allocated,the analyze version of the beginscan intentionally doesn't take a snapshot.
 
> 
> Uh, what?  That's a *huge* regression.  See, eg, 7170268ef.  We
> really want ANALYZE to act as though it's reading a normal MVCC
> snapshot.

Hm? That's not new at all? In 11 we just do:

            switch (HeapTupleSatisfiesVacuum(&targtuple,
                                             OldestXmin,
                                             targbuffer))

I.e. unrelated to the tableam changes there's no mvcc snapshot in for
visibility determinations. And that's not changed, heap's implementation
still uses HTSV.  We do *hold* a snapshot, but that's all outside of
analyze.c afaik.

I've complained before that we're using a snapshot for analyze's
sampling scan in a lot of cases where it's not necessary, and that's a
very significant problem in production (where autvacuum doesn't cause
slowdowns, but autoanalyze does quite substantially).  But I'd not
change it just as an aside.

Greetings,

Andres Freund



Re: Segfault on ANALYZE in SERIALIZABLE isolation

From
Andres Freund
Date:
Hi,

Thanks for the report Joe!

I've pushed a fix for this.

I ended up going down the path of making scan_begin's arguments a
bitmask. Given that several people expressed desire for that, and that
recognizing analyze scans would have required a new argument, that
seemed the most reasonable course.

I think the code handling sync/strat in heapam's initscan() could be
simplified so we don't set/unset the flags (and previously the booleans)
multiple times. But that seems like it ought to be done separately.

I'd normally have asked for a round of feedback for the changes, but it
seems more urgent to get something out for beta1. As the changes are all
below tableam, we can adjust this later without causing much trouble.

Regards,

Andres

On 2019-05-18 13:12:41 -0700, Andres Freund wrote:
> Hi,
> 
> On 2019-05-18 15:48:47 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > Not quite - that was about the DML callbacks, this is about the scan itself. And while we have a snapshot
allocated,the analyze version of the beginscan intentionally doesn't take a snapshot.
 
> > 
> > Uh, what?  That's a *huge* regression.  See, eg, 7170268ef.  We
> > really want ANALYZE to act as though it's reading a normal MVCC
> > snapshot.
> 
> Hm? That's not new at all? In 11 we just do:
> 
>             switch (HeapTupleSatisfiesVacuum(&targtuple,
>                                              OldestXmin,
>                                              targbuffer))
> 
> I.e. unrelated to the tableam changes there's no mvcc snapshot in for
> visibility determinations. And that's not changed, heap's implementation
> still uses HTSV.  We do *hold* a snapshot, but that's all outside of
> analyze.c afaik.

Greetings,

Andres Freund



Re: Segfault on ANALYZE in SERIALIZABLE isolation

From
Andres Freund
Date:
Hi,

On 2019-05-18 18:12:31 +0300, Sergei Kornilov wrote:
> Seems table_beginscan_analyze (src/include/access/tableam.h) should not pass second argument as NULL.

As hopefully explained downthread, and in the commit message, that's not
really the concern. We shouldn't use the snapshot in the first place.

> PS: also I noticed in src/include/utils/snapshot.h exactly same
> comment for both SNAPSHOT_SELF and SNAPSHOT_DIRTY - they have no
> difference?

That was copy & paste mistake. Fixed. Also expanded the comments a
bit. Thanks for noticing!

Greetings,

Andres Freund