Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", - Mailing list pgsql-hackers

From Tom Lane
Subject Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",
Msg-id 5407.1130533136@sss.pgh.pa.us
In response to Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-hackers
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> All of them have in common that the slotno being passed ($3 below) is in
> SLRU_PAGE_READ_IN_PROGRESS state ... could it be a problem with lock
> reordering?  Maybe somebody is trying to read in a page, and somebody
> else steals the buffer from under them.  Not sure how likely is that.

It's even more interesting than that: in all three cases,
SlruSelectLRUPage has selected a "least recently used" page that is
still in READ_IN_PROGRESS state (i.e., we haven't finished faulting it
in) and is recursively calling SimpleLruReadPage to wait for that
condition to terminate.
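
To make the path described above concrete, here is a minimal single-threaded
sketch of a victim-selection loop that can land on a read-busy slot. It is a
toy model, not the PostgreSQL source: ToySlru, toy_select_lru_slot,
toy_wait_for_io and the SLOT_* states are hypothetical names, and "waiting"
is simulated by simply marking the I/O complete. The real code waits on
another backend under locks, which is exactly where a bug could hide and
which this sketch does not model.

```c
#include <assert.h>

/* Hypothetical, simplified slot states; the names only echo the thread. */
typedef enum { SLOT_EMPTY, SLOT_READ_IN_PROGRESS, SLOT_CLEAN, SLOT_DIRTY } SlotStatus;

#define NUM_SLOTS 4

typedef struct {
    int        page_number[NUM_SLOTS];
    SlotStatus status[NUM_SLOTS];
    unsigned   lru_count[NUM_SLOTS];  /* larger value = less recently used */
    int        read_busy_victims;     /* times the chosen victim was still being read */
} ToySlru;

static void toy_init(ToySlru *s)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        s->page_number[i] = -1;
        s->status[i] = SLOT_EMPTY;
        s->lru_count[i] = 0;
    }
    s->read_busy_victims = 0;
}

/* Stand-in for waiting on someone else's I/O: in this toy it just completes. */
static void toy_wait_for_io(ToySlru *s, int slot)
{
    s->status[slot] = SLOT_CLEAN;
}

/* Return a slot usable for pageno, looping until the chosen victim is clean. */
static int toy_select_lru_slot(ToySlru *s, int pageno)
{
    for (;;) {
        int      best = -1;
        unsigned bestcount = 0;

        /* The page may already have a slot assigned. */
        for (int i = 0; i < NUM_SLOTS; i++)
            if (s->status[i] != SLOT_EMPTY && s->page_number[i] == pageno)
                return i;

        /* Prefer any empty slot; otherwise remember the least recently used. */
        for (int i = 0; i < NUM_SLOTS; i++) {
            if (s->status[i] == SLOT_EMPTY)
                return i;
            if (best < 0 || s->lru_count[i] > bestcount) {
                best = i;
                bestcount = s->lru_count[i];
            }
        }
        assert(best >= 0);

        if (s->status[best] == SLOT_CLEAN)
            return best;

        /* The supposedly never-happen case: the victim is still being read in. */
        if (s->status[best] == SLOT_READ_IN_PROGRESS)
            s->read_busy_victims++;

        toy_wait_for_io(s, best);
        /* Loop back and re-check everything, since state may have changed. */
    }
}
```

Filling all four slots and marking the oldest one SLOT_READ_IN_PROGRESS
before asking for a new page makes the loop take the read-busy branch once,
wait, and then return that slot on the second pass.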

Apparently, Jim's setup could desperately do with a larger SLRU arena
for pg_subtrans, because this is supposed to be a never-happen path ---
if you can't finish loading a page before you need its slot for
something else, you are thrashing with a capital T.
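
As a rough illustration of why a larger arena helps, the sketch below counts
misses for a plain LRU cache over an access trace. It is a generic LRU model
under my own assumptions, not the SLRU code: count_lru_misses and MAX_SLOTS
are hypothetical names, and the cyclic trace is a worst case chosen only to
show the effect.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 64

/*
 * Count misses for a strict-LRU cache with nslots slots over an access trace.
 * Returns -1 if nslots is outside 1..MAX_SLOTS.
 */
static int count_lru_misses(int nslots, const int *trace, size_t ntrace)
{
    int  page[MAX_SLOTS];
    long last_used[MAX_SLOTS];
    int  used = 0;
    int  misses = 0;

    if (nslots < 1 || nslots > MAX_SLOTS)
        return -1;

    for (size_t t = 0; t < ntrace; t++) {
        int slot = -1;

        /* Look for the page among the occupied slots. */
        for (int i = 0; i < used; i++) {
            if (page[i] == trace[t]) {
                slot = i;
                break;
            }
        }

        if (slot < 0) {
            misses++;
            if (used < nslots) {
                slot = used++;          /* still have a free slot */
            } else {
                slot = 0;               /* evict the least recently used */
                for (int i = 1; i < used; i++)
                    if (last_used[i] < last_used[slot])
                        slot = i;
            }
            page[slot] = trace[t];
        }
        last_used[slot] = (long) t;     /* record the access time */
    }
    return misses;
}
```

With a trace that cycles over five pages, four slots miss on every single
access, while five slots miss only on the first touch of each page. One
extra slot is the difference between constant eviction and none.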

I suppose there's a bug in this path, but I'm darned if I can see what
it is.  There are a number of obvious inefficiencies, but those
shouldn't be important given that this isn't supposed to happen much.
But how's it getting to the Assert failure?
        regards, tom lane

