Re: PostgreSQL12 crash bug report - Mailing list pgsql-bugs

From Andres Freund
Subject Re: PostgreSQL12 crash bug report
Date
Msg-id 20190905195911.cxqf4r3umf5fgph4@alap3.anarazel.de
Whole thread Raw
In response to Re: PostgreSQL12 crash bug report  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: PostgreSQL12 crash bug report
Re: PostgreSQL12 crash bug report
List pgsql-bugs
Hi,

On 2019-08-27 13:06:37 -0400, Tom Lane wrote:
> Seems reasonable to store a pointer in the EPQState rather than pass
> that as a separate argument, but I think you need a bit more documentation
> work to keep the two EState pointers from being impossibly confusing.
> It might help to write something like "estate is an executor state node
> instantiated for the same rangetable as parentestate (though it might
> be running only a sub-tree of the main plan).  While parentestate
> represents the "real" plan execution, estate runs EPQ rechecks in
> which the table scan nodes are changed to return only the current tuple
> of interest for each table.  Those current tuples are found using the
> rowmark mechanisms."

I tried that for a while, but was unsatisfied with putting it to the
EState(s) - felt like too general a comment. Moved it to the top of the
struct instead.  This basically ended up reordering all of struct
EState, so it'd be good to have a second look.


> Maybe we should change the "estate" field name to something else, too
> --- "childestate" is the first thing that comes to mind but it's
> rather generic.  "recheckestate", perhaps?

I went with recheckestate, and then also renamed the PlanState to follow
recheck* pattern, to make clearer it's not the main plan's PlanState
(sub-)tree.


> Also I dislike the field name "es_active_epq", as what that suggests
> to me is exactly backwards from the way you apparently are using it.
> Maybe "es_parent_epq" instead?

I went with es_epq_active, as I suggested in my earlier email, which
still seems accurate to me.

I really want to move consideration of es_ep_active to ExecInitNode()
time, rather than execution time. If we add an execScan helper, we can
have it set the corresponding executor node's ExecProcNode to
a) a function that performs qual checks and projection
b) a function that performs projection
c) the fetch method from the scan node
d) basically the current ExecScan, when es_epq_active


> The original intent of EvalPlanQualInit was to do an absolutely
> minimal amount of work, since we don't know yet whether EPQ will
> ever be needed (and, generally, the way to bet is that it won't be).
> You've added some pallocs and list-to-array conversion, which
> while not terribly expensive still seem like they don't belong
> there.  Can't we move most of that to EvalPlanQualBegin?  I see that
> we want the substitute_slot array to exist so that EvalPlanQualSlot
> can work without EvalPlanQualBegin, but if we're postponing
> non-locking-rowmark work then we don't need the rest.

Did that.


> In ExecIndexOnlyMarkPos and ExecIndexOnlyRestrPos, you seem to have
> incorrectly changed the function names given in comments and elog messages
> (copy&pasto?)

Fixed.

Did a good bit more comment polishing, renamed a few more variables.

I also added tests for things that I thought were clearly missing
(including a test that errors out before the code changes in the
patch).

For one of the new tests I added a noisy_oper(comment, a, oper, b)
function, to see which EPQ tests are performed with which
values. Without that I found it to be really hard to be confident that
we actually are using the right column values. I think it might make
sense to expand it's use a bit more - but that seems like something for
later.


I tried for a while to develop one for mark/restore of IndexOnlyScans,
but I concluded that that code is basically dead right now. Every scan
node of a normal that gets modified or needs a rowmark implies having
ctid as part of the targetlist. And we neither allow ctid to be part of
index definitions, nor understand that we actually kinda know the ctid
from within the index scan (HOT would make using the tid hard).  So the
relevant code in nodeIndexOnly.c seems dead?

Greetings,

Andres Freund

Attachment

pgsql-bugs by date:

Previous
From: Jeremy Schneider
Date:
Subject: Re: ERROR: multixact X from before cutoff Y found to be still running
Next
From: Andres Freund
Date:
Subject: Re: PostgreSQL12 crash bug report