
From: Andres Freund
Subject: Re: Improving connection scalability: GetSnapshotData()
Msg-id: 20200816181604.l54m6kss5ntd6xow@alap3.anarazel.de
In response to: Re: Improving connection scalability: GetSnapshotData() (Andres Freund <andres@anarazel.de>)
Responses: Re: Improving connection scalability: GetSnapshotData() (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Hi,

On 2020-08-15 09:42:00 -0700, Andres Freund wrote:
> On 2020-08-15 11:10:51 -0400, Tom Lane wrote:
> > We have two essentially identical buildfarm failures since these patches
> > went in:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=damselfly&dt=2020-08-15%2011%3A27%3A32
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2020-08-15%2003%3A09%3A14
> >
> > They're both in the same place in the freeze-the-dead isolation test:
> 
> > TRAP: FailedAssertion("!TransactionIdPrecedes(members[i].xid, cutoff_xid)", File: "heapam.c", Line: 6051)
> > 0x9613eb <ExceptionalCondition+0x5b> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> > 0x52d586 <heap_prepare_freeze_tuple+0x926> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> > 0x53bc7e <heap_vacuum_rel+0x100e> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> > 0x6949bb <vacuum_rel+0x25b> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> > 0x694532 <vacuum+0x602> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> > 0x693d1c <ExecVacuum+0x37c> at /home/pgbuildfarm/buildroot/HEAD/inst/bin/postgres
> > 0x8324b3
> > ...
> > 2020-08-14 22:16:41.783 CDT [78410:4] LOG:  server process (PID 80395) was terminated by signal 6: Abort trap
> > 2020-08-14 22:16:41.783 CDT [78410:5] DETAIL:  Failed process was running: VACUUM FREEZE tab_freeze;
> >
> > peripatus has successes since this failure, so it's not fully reproducible
> > on that machine.  I'm suspicious of a timing problem in computing vacuum's
> > cutoff_xid.
> 
> Hm, maybe it's something around what I observed in
> https://www.postgresql.org/message-id/20200723181018.neey2jd3u7rfrfrn%40alap3.anarazel.de
> 
> I.e. that somehow we end up with hot pruning and freezing coming to a
> different determination, and trying to freeze a hot tuple.
> 
> I'll try to add a few additional asserts here, and burn some cpu tests
> trying to trigger the issue.
> 
> I gotta escape the heat in the house for a few hours though (no AC
> here), so I'll not look at the results till later this afternoon, unless
> it triggers soon.
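
(For anyone following along without the code at hand: the Assert that fired
checks, with wraparound-aware xid comparison, that a multixact member xid is
not older than vacuum's freeze cutoff. The sketch below is a simplified,
self-contained stand-in for that comparison, only meant to illustrate the
semantics; the real TransactionIdPrecedes() lives in
src/backend/access/transam/transam.c.)

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Simplified stand-in for TransactionIdPrecedes(): the permanent xids
 * (invalid/bootstrap/frozen) compare linearly, normal xids compare
 * modulo 2^32.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    if (id1 < FirstNormalTransactionId || id2 < FirstNormalTransactionId)
        return id1 < id2;
    return (int32_t) (id1 - id2) < 0;
}

int
main(void)
{
    TransactionId cutoff_xid = 1000;

    /* a member xid even slightly behind the cutoff trips the assert's condition */
    printf("%d\n", xid_precedes(998, cutoff_xid));   /* prints 1 */
    printf("%d\n", xid_precedes(1000, cutoff_xid));  /* prints 0 */
    return 0;
}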

690 successful runs later, it didn't trigger for me :(. Seems pretty clear
that there's some variable at play other than pure chance; otherwise that
number of runs should have hit the issue, given the ratio of buildfarm hits
to buildfarm runs.
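
Back-of-the-envelope, with a purely hypothetical per-run failure probability p:

    P(no hit in 690 runs) = (1 - p)^690, e.g. p = 1% gives 0.99^690 ≈ 0.001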

My current plan is to push a bit of additional instrumentation to help
narrow down the issue. We can decide afterwards which of it we'd like to
keep longer term, and which not.
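
For illustration, the sort of thing I have in mind (a sketch only, assuming
members[i].xid and cutoff_xid from the assertion message are in scope at that
point, and not necessarily what will end up getting pushed) is to turn the
failing Assert into an error that reports the xids involved, so a buildfarm
hit would show how the member xid relates to the cutoff:

    /* sketch only: report the offending xids instead of just aborting */
    if (TransactionIdPrecedes(members[i].xid, cutoff_xid))
        elog(ERROR, "multixact member xid %u unexpectedly precedes freeze cutoff xid %u",
             members[i].xid, cutoff_xid);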

Greetings,

Andres Freund


