Re: Errors on missing pg_subtrans/ files with 9.3 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Errors on missing pg_subtrans/ files with 9.3
Date
Msg-id 20131125114712.GI28145@alap2.anarazel.de
Whole thread Raw
In response to Re: Errors on missing pg_subtrans/ files with 9.3  (J Smith <dark.panda+lists@gmail.com>)
Responses Re: Errors on missing pg_subtrans/ files with 9.3  (J Smith <dark.panda+lists@gmail.com>)
Re: Errors on missing pg_subtrans/ files with 9.3  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 2013-11-24 16:56:26 -0500, J Smith wrote:
> coredumper worked like a charm. Useful tool, that is... although as a
> bit of advice, I'd try not to run it on Postgres if your various
> memory settings are tweaked towards production use -- the core dump
> that was captured on my server weighed in at 16 GB.

> Nov 23 14:38:32 dev postgres[23810]: [4-1] user=dev,db=dev ERROR:  could not access status of transaction 13514992
> Nov 23 14:38:32 dev postgres[23810]: [4-2] user=dev,db=dev DETAIL:  Could not open file "pg_subtrans/00CE": Success.
> Nov 23 14:38:32 dev postgres[23810]: [4-3] user=dev,db=dev CONTEXT:  SQL statement "SELECT 1 FROM ONLY
"dev"."collection_batches"x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
 


Ok, this is helpful. Do you rather longrunning transactions? The
transaction that does foreign key checks has an xid of 10260613, while
the row that's getting checked has 13514992.

#4  0x0000000000635dc7 in XactLockTableWait (xid=13514992) at lmgr.c:501       tag = {locktag_field1 = 13514992,
locktag_field2= 0, locktag_field3 = 0, locktag_field4 = 0, locktag_type = 4 '\004', locktag_lockmethodid = 1 '\001'}
 
#5  0x0000000000482223 in heap_lock_updated_tuple_rec (rel=0x2b20f050a8d0, tuple=<value optimized out>, ctid=<value
optimizedout>, xid=10260613, mode=LockTupleKeyShare) at heapam.c:4847
 

I am not sure whether that's the origin of the problem but at the very
least it seems to me that heap_lock_updated_tuple_rec() is missing
several important pieces:
a) do the priorXmax==xmin dance to check we're still following the same  ctid chain. Currently we could easily stumble
acrosscompletely  unrelated tuples if a chain element aborted and got vacuumed.
 
b) Check whether a chain element actually aborted - currently we're  only doing that in the HEAP_KEYS_UPDATED updated
case,but that seems  wrong (we can't check for committed tho!).
 
c) (reported separately as well) cope with failure of heap_fetch() to  return anything.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Sequence Access Method WIP
Next
From: Sawada Masahiko
Date:
Subject: Re: Logging WAL when updating hintbit