Re: initial pruning in parallel append - Mailing list pgsql-hackers

From Robert Haas
Subject Re: initial pruning in parallel append
Date
Msg-id CA+TgmoY1KFTWtDOoxZvQV3q9SzJB370qM_FOggDZz+3qM+ZHtA@mail.gmail.com
Whole thread Raw
In response to initial pruning in parallel append  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: initial pruning in parallel append
List pgsql-hackers
On Tue, Jun 27, 2023 at 9:23 AM Amit Langote <amitlangote09@gmail.com> wrote:
> Maybe that stuff could be resurrected, though I was wondering if the
> risk of the same initial pruning steps returning different results
> when performed repeatedly in *one query lifetime* aren't pretty
> minimal or maybe rather non-existent?  I think that's because
> performing initial pruning steps entails computing constant and/or
> stable expressions and comparing them with an unchanging set of
> partition bound values, with comparison functions whose result is also
> presumed to be stable. Then there's also the step of mapping the
> partition indexes as they appear in the PartitionDesc to the indexes
> of their subplans under Append/MergeAppend using the information
> contained in PartitionPruneInfo (subplan_map) and the result of
> mapping should be immutable too.
>
> I considered that the comparison functions that
> match_clause_to_partition_key() obtains by calling get_opfamily_proc()
> may in fact not be stable, though that doesn't seem to be a worry at
> least with the out-of-the-box pg_amproc collection:

I think it could be acceptable if a stable function not actually being
stable results in some kind of internal error message, hopefully one
that in some way hints to the user what the problem was. But crashing
because some expression was supposed to be stable and wasn't is a
bridge too far, especially if, as I think would be the case here, the
crash happens in a part of the code that is far removed from where the
problem was introduced.

The real issue here is about how much trust you can place in a given
invariant. If, in a single function, we initialize a value to 0 and
thereafter only ever increment it, we can logically reason that if we
ever see a value less than zero, there must have been an overflow. It
is true that if our function calls some other function, that other
function could access data through a garbage pointer and possibly
corrupt the value of our function's local variable, but that's
extremely unlikely, and we can basically decide that we're not going
to are about it, because such code is likely to crash anyway before
too long.

But now consider an invariant that implicates a larger amount of code
e.g. you must always hold a buffer pin before accessing the buffer
contents. In many cases, it's fairly easy to verify that this must be
so in any given piece of code, but there are problems: some code that
does buffer access is complicated enough that it's hard to fully
verify, especially when buffer pins are held across long periods, and
what is probably worse, there are tons of different places in the code
that access buffers. Hence, we've had bugs in this area, and likely
will have bugs in this area again. In theory, with a sufficient amount
of really careful work, you can find all of the problems, but in
practice it's pretty difficult. Nonetheless, we just have to just
accept the risk that we're going to crash if a bug in this area does
exist, because there's no real way to cope with the contents of the
buffer that you're accessing being swapped out while you're in the
middle of looking at it, or even modifying it.

But the present case is different in a couple of ways. First, there's
probably even more code involved, including a good bit of it that's
not in core but is user-defined. Second, we've generally made a
decision up until now that we don't want to have a hard dependency on
stable functions actually being stable. If they aren't, and for
example you're using index expressions, your queries may return wrong
answers, but you won't get weird internal error messages, and you
won't get a crash. I think the bar for this feature is the same.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Tristan Partin"
Date:
Subject: Re: Minor configure/meson cleanup
Next
From: Ashutosh Bapat
Date:
Subject: Re: Oversight in reparameterize_path_by_child leading to executor crash