Thread: Re: [COMMITTERS] pgsql: Avoid SnapshotResetXmin() during AtEOXact_Snapshot()

On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
>
> For normal commits and aborts we already reset PgXact->xmin.
> Avoiding touching highly contended shmem improves concurrent
> performance.
>
> Simon Riggs

I'm getting occasional crashes with backtraces that look like this:

#0  0x00007fff9679c286 in __pthread_kill ()
#1  0x00007fff94e1a9f9 in pthread_kill ()
#2  0x00007fff9253a9a3 in abort ()
#3  0x0000000107e0659e in ExceptionalCondition (conditionName=<value
temporarily unavailable, due to optimizations>, errorType=0x6 <Address
0x6 out of bounds>, fileName=<value temporarily unavailable, due to
optimizations>, lineNumber=<value temporarily unavailable, due to
optimizations>) at assert.c:54
#4  0x0000000107e4be2b in AtEOXact_Snapshot (isCommit=<value
temporarily unavailable, due to optimizations>, isPrepare=0 '\0') at
snapmgr.c:1154
#5  0x0000000107a76c06 in CleanupTransaction () at xact.c:2643
#6  0x0000000107a76267 in CommitTransactionCommand () at xact.c:2818
#7  0x0000000107cecfc2 in exec_simple_query
(query_string=0x7f975481e640 "ABORT TRANSACTION") at postgres.c:2461
#8  0x0000000107ceabb7 in PostgresMain (argc=<value temporarily
unavailable, due to optimizations>, argv=<value temporarily
unavailable, due to optimizations>, dbname=<value temporarily
unavailable, due to optimizations>, username=<value temporarily
unavailable, due to optimizations>) at postgres.c:4071
#9  0x0000000107c6bb58 in PostmasterMain (argc=<value temporarily
unavailable, due to optimizations>, argv=<value temporarily
unavailable, due to optimizations>) at postmaster.c:4317
#10 0x0000000107be5cdd in main (argc=<value temporarily unavailable,
due to optimizations>, argv=<value temporarily unavailable, due to
optimizations>) at main.c:228

I suspect that is the fault of this patch.  Please fix or revert.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Fri, Mar 24, 2017 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
>>
>> For normal commits and aborts we already reset PgXact->xmin.
>> Avoiding touching highly contended shmem improves concurrent
>> performance.
>>
>> Simon Riggs
>
> I'm getting occasional crashes with backtraces that look like this:
>
> #4  0x0000000107e4be2b in AtEOXact_Snapshot (isCommit=<value
> temporarily unavailable, due to optimizations>, isPrepare=0 '\0') at
> snapmgr.c:1154
> #5  0x0000000107a76c06 in CleanupTransaction () at xact.c:2643
>
> I suspect that is the fault of this patch.  Please fix or revert.

Also, the entire buildfarm is turning red.

longfin, spurfowl, and magpie all show this assertion failure in the
log.  I haven't checked the others.

TRAP: FailedAssertion("!(MyPgXact->xmin == 0)", File: "snapmgr.c", Line: 1154)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On Fri, Mar 24, 2017 at 12:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 24, 2017 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
>>>
>>> For normal commits and aborts we already reset PgXact->xmin.
>>> Avoiding touching highly contended shmem improves concurrent
>>> performance.
>>>
>>> Simon Riggs
>>
>> I'm getting occasional crashes with backtraces that look like this:
>>
>> #4  0x0000000107e4be2b in AtEOXact_Snapshot (isCommit=<value
>> temporarily unavailable, due to optimizations>, isPrepare=0 '\0') at
>> snapmgr.c:1154
>> #5  0x0000000107a76c06 in CleanupTransaction () at xact.c:2643
>>
>> I suspect that is the fault of this patch.  Please fix or revert.
>
> Also, the entire buildfarm is turning red.
>
> longfin, spurfowl, and magpie all show this assertion failure in the
> log.  I haven't checked the others.
>
> TRAP: FailedAssertion("!(MyPgXact->xmin == 0)", File: "snapmgr.c", Line: 1154)

Another thing that is interesting is that when I run make -j8
check-world, the overall tests appear to succeed even though there are
failures mid-way through:

test tablefunc                ... FAILED (test process exited with exit code 2)

...but then later we end with:

ok
All tests successful.
Files=11, Tests=80, 251 wallclock secs ( 0.07 usr  0.02 sys + 19.77
cusr 14.45 csys = 34.31 CPU)
Result: PASS

real    4m27.421s
user    3m50.047s
sys    1m31.937s

That's unrelated to the current problem of course, but it seems to
suggest that make's -j option doesn't entirely do what you'd expect
when used with make check-world.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



On 24 March 2017 at 16:14, Robert Haas <robertmhaas@gmail.com> wrote:

> I suspect that is the fault of this patch.  Please fix or revert.

Will revert then fix.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



From: Andres Freund
On 2017-03-24 13:50:54 -0400, Robert Haas wrote:
> On Fri, Mar 24, 2017 at 12:27 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Fri, Mar 24, 2017 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> On Fri, Mar 24, 2017 at 10:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >>> Avoid SnapshotResetXmin() during AtEOXact_Snapshot()
> >>>
> >>> For normal commits and aborts we already reset PgXact->xmin.
> >>> Avoiding touching highly contended shmem improves concurrent
> >>> performance.
> >>>
> >>> Simon Riggs
> >>
> >> I'm getting occasional crashes with backtraces that look like this:
> >>
> >> #4  0x0000000107e4be2b in AtEOXact_Snapshot (isCommit=<value
> >> temporarily unavailable, due to optimizations>, isPrepare=0 '\0') at
> >> snapmgr.c:1154
> >> #5  0x0000000107a76c06 in CleanupTransaction () at xact.c:2643
> >>
> >> I suspect that is the fault of this patch.  Please fix or revert.
> >
> > Also, the entire buildfarm is turning red.
> >
> > longfin, spurfowl, and magpie all show this assertion failure in the
> > log.  I haven't checked the others.
> >
> > TRAP: FailedAssertion("!(MyPgXact->xmin == 0)", File: "snapmgr.c", Line: 1154)
> 
> Another thing that is interesting is that when I run make -j8
> check-world, the overall tests appear to succeed even though there are
> failures mid-way through:
> 
> test tablefunc                ... FAILED (test process exited with exit code 2)
> 
> ...but then later we end with:
> 
> ok
> All tests successful.
> Files=11, Tests=80, 251 wallclock secs ( 0.07 usr  0.02 sys + 19.77
> cusr 14.45 csys = 34.31 CPU)
> Result: PASS
> 
> real    4m27.421s
> user    3m50.047s
> sys    1m31.937s
> 
> That's unrelated to the current problem of course, but it seems to
> suggest that make's -j option doesn't entirely do what you'd expect
> when used with make check-world.
> 

That's likely the output of a different test from the one that failed.
It's a lot easier to see the overall result if you append
&& echo success || echo failure
to the make invocation.

- Andres
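
The idiom Andres suggests can be sketched as follows. This is only an illustration, assuming a POSIX shell: `report` is a hypothetical helper introduced here, and `sh -c 'exit 2'` merely stands in for a command that exits non-zero, as a failing make run would. Nothing below comes from the PostgreSQL tree.

```shell
# Hypothetical helper: run a command, hide its output, and print one
# unambiguous line based on the command's exit status.
report() {
    "$@" >/dev/null 2>&1 && echo success || echo failure
}

# Stand-ins for a passing and a failing run.
report true
report sh -c 'exit 2'

# The same idiom applied directly to the command from this thread
# (not executed here):
#   make -j8 check-world && echo success || echo failure
```

Because the final line depends only on the exit status of the whole command, it is not affected by whichever sub-test happened to print its summary last in interleaved -j output.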