Re: initial pruning in parallel append - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: initial pruning in parallel append |
Date | |
Msg-id | CA+TgmoY1KFTWtDOoxZvQV3q9SzJB370qM_FOggDZz+3qM+ZHtA@mail.gmail.com Whole thread Raw |
In response to | initial pruning in parallel append (Amit Langote <amitlangote09@gmail.com>) |
Responses |
Re: initial pruning in parallel append
|
List | pgsql-hackers |
On Tue, Jun 27, 2023 at 9:23 AM Amit Langote <amitlangote09@gmail.com> wrote: > Maybe that stuff could be resurrected, though I was wondering if the > risk of the same initial pruning steps returning different results > when performed repeatedly in *one query lifetime* aren't pretty > minimal or maybe rather non-existent? I think that's because > performing initial pruning steps entails computing constant and/or > stable expressions and comparing them with an unchanging set of > partition bound values, with comparison functions whose result is also > presumed to be stable. Then there's also the step of mapping the > partition indexes as they appear in the PartitionDesc to the indexes > of their subplans under Append/MergeAppend using the information > contained in PartitionPruneInfo (subplan_map) and the result of > mapping should be immutable too. > > I considered that the comparison functions that > match_clause_to_partition_key() obtains by calling get_opfamily_proc() > may in fact not be stable, though that doesn't seem to be a worry at > least with the out-of-the-box pg_amproc collection: I think it could be acceptable if a stable function not actually being stable results in some kind of internal error message, hopefully one that in some way hints to the user what the problem was. But crashing because some expression was supposed to be stable and wasn't is a bridge too far, especially if, as I think would be the case here, the crash happens in a part of the code that is far removed from where the problem was introduced. The real issue here is about how much trust you can place in a given invariant. If, in a single function, we initialize a value to 0 and thereafter only ever increment it, we can logically reason that if we ever see a value less than zero, there must have been an overflow. It is true that if our function calls some other function, that other function could access data through a garbage pointer and possibly corrupt the value of our function's local variable, but that's extremely unlikely, and we can basically decide that we're not going to are about it, because such code is likely to crash anyway before too long. But now consider an invariant that implicates a larger amount of code e.g. you must always hold a buffer pin before accessing the buffer contents. In many cases, it's fairly easy to verify that this must be so in any given piece of code, but there are problems: some code that does buffer access is complicated enough that it's hard to fully verify, especially when buffer pins are held across long periods, and what is probably worse, there are tons of different places in the code that access buffers. Hence, we've had bugs in this area, and likely will have bugs in this area again. In theory, with a sufficient amount of really careful work, you can find all of the problems, but in practice it's pretty difficult. Nonetheless, we just have to just accept the risk that we're going to crash if a bug in this area does exist, because there's no real way to cope with the contents of the buffer that you're accessing being swapped out while you're in the middle of looking at it, or even modifying it. But the present case is different in a couple of ways. First, there's probably even more code involved, including a good bit of it that's not in core but is user-defined. Second, we've generally made a decision up until now that we don't want to have a hard dependency on stable functions actually being stable. If they aren't, and for example you're using index expressions, your queries may return wrong answers, but you won't get weird internal error messages, and you won't get a crash. I think the bar for this feature is the same. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: