Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: [HACKERS] MERGE SQL Statement for PG11 |
Date | |
Msg-id | CAH2-WzmhvUA7Mcv1JuGM3rjmkLQmVC-_VtS6ZiZi2CpHb4F9uQ@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] MERGE SQL Statement for PG11 (Pavan Deolasee <pavan.deolasee@gmail.com>) |
Responses |
Re: [HACKERS] MERGE SQL Statement for PG11
|
List | pgsql-hackers |
On Tue, Feb 27, 2018 at 10:19 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: >> I attach a rough example of this, that uses plpgsql. > > Thanks for writing the sample code. I understand you probably don't mean to > suggest that we need to mimic the behaviour of the plpgsql code and the > semantics offered by MERGE would most likely be different than what the > plpgsql sample does. Because there are several problems with the plpgsql > code: > > - It would never turn a MATCHED case into a NOT MATCHED case because of > concurrent UPDATE/DELETE > - The WHERE clauses attached to the UPDATE/DELETE statement should be using > the quals attached to the WHEN clauses to ensure they are evaluated on the > new version of the row, if needed. It's definitely possible to poke holes in my plpgsql example, most (or all) of which are not fixable within the confines of what plpgsql can do. I still think that it's really useful to frame the discussion with examples of the kind of procedural code MERGE replaces, though, because: * That's the explicit purpose of MERGE, according to the SQL standard. Everyone is very clear that the join outputs rows that are separately inserted, updated, or deleted (additional "WHEN ... AND" quals are evaluated separately). We should be clear on that, too. We're quite specifically replacing procedural code that follows a general pattern. We might even give an example of such procedural code in the docs, as SQL Server does. * It shows that in some ways, the INSERT/UPDATE/DELETE parts are separate "zero or one row" statements. They could do their own snapshot acquisitions within RC mode, for example. Also, you don't get the ON CONFLICT behavior with excluded being affected by BEFORE ROW triggers within UPDATE expression evaluation for ON CONFLICT DO UPDATE. * My example is buggy, but seemingly only in a way that is just about unavoidable -- all the bugs are in RC mode with concurrent writes. Poking holes in what I came up with is actually useful, and may be less confusing than discussing the same issues some other way. There are very few users in the world that would understand these issues. Moreover, if all affected users that have this kind of code in the wild were to somehow magically develop a strong understanding of this stuff overnight, even then I'm not sure that much would change. They still use RC mode for all the usual reasons, and they mostly won't have any way of fixing concurrency issues that is actually acceptable to them. In short, I'm not sure that I can fix the problems with my plpgsql code, so what chance do they have of fixing theirs? >> >> Some novel new behavior -- "EPQ with a twist"-- is clearly necessary. >> >> I feel a bit uneasy about it because anything that anybody suggests is >> >> likely to be at least a bit arbitrary (EPQ itself is kind of >> >> arbitrary). We only get to make a decision on how "EPQ with a twist" >> >> will work once, and that should be a decision that is made following >> >> careful deliberation. Ambiguity is much more likely to kill a patch >> >> than a specific technical defect, at least in my experience. Somebody >> >> can usually just fix a technical defect. > > > TBH that's one reason why I like Simon's proposed behaviour of throwing > errors in case of corner cases. I am not suggesting that's what we do at the > end, but it's definitely worth considering. I now feel like Simon's suggestion of throwing an error in corner cases isn't so bad. It still seems like we could do better, but the more I think about it, the less that seems like a cop-out. My reasons are: * As I already said, my plpgsql example, while buggy, might actually be the safest way to get the intended behavior in RC mode today. I can definitely imagine a way of dealing with concurrency that is both safer and less prone to throwing weird errors, but the fact remains that my example is the state of the art here, in a way. * Simon has already introduced something that looks like "EPQ with a twist" to me -- the steps that happen before he even raises this error. IOW, he does something extra that is EPQ-like. He likely does a lot better than my plpgsql example manages to, I think. * I suspect that the kind of users that really like the ON CONFLICT DO UPDATE's simplicity (in terms of what it guarantees them) are unlikely to care about MERGE at all. The kind of user that cares about MERGE is likely to have at least heard of the isolation levels. * I do see an important difference between making likely-unexpected errors in RC mode very unlikely, and making them *impossible*. This patch is not ON CONFLICT DO UPDATE, though, and that strong guarantee simply isn't on the table. * We can all agree that *not* raising an error in the specific way Simon proposes is possible, somehow or other. We also all agree that avoiding the broader category of RC errors can only be taken so far (e.g. in any event duplicate violations errors are entirely possible, in RC mode, when a MERGE inserts a row). So this is a question of what exact middle ground to take. Neither of the two extremes (throwing an error on the first sign of a RC conflict, and magically preventing concurrency anomalies) are actually on the table. I will not block this patch because it merely makes throwing likely-unexpected errors in RC mode "very unlikely", rather than "very very unlikely". Not least because I have a hard time imagining a single user caring about the difference that still exists with Simon's less ambitious (though not entirely unambitious) version of "EPQ with a twist". > Here are some observations from Rahila's analysis so far. I must say, > Oracle's handling seems quite inconsistent, especially the conditions under > which it sometimes re-evaluates the join and sometimes don't. > I am curiously surprised by it's behaviour of re-evaluating join only when > certain columns are updated. It looks to me irrespective of what we choose, > our implementation would be much superior to what Oracle offers. I'm not that surprised that it's generally kind of arbitrary, though the fact that you can update something while the join qual no longer passes does seem particularly poor. BTW, one thing that I remember very clearly from my research on MERGE years ago is this: all of the major implementations were in some way or other quite buggy. At least in RC mode, MERGE kind of promises something that it really can't quite deliver. Implementations are seemingly left to pick and choose how to paper over those cracks. The description of what Oracle allows here does make me feel better about our direction. Having the least-worst semantics on RC conflict handling certainly seems good enough to me. Especially because we have a "true UPSERT" already, unlike both Oracle and SQL Server. > BTW I've sent v17a of the patch, which is very close to being complete from > my perspective (except some documentation fixes/improvements). The only > thing pending is the decision to accept or change the currently implemented > concurrency semantics. I need to go look at that. I'll try to take a firmer position on this. I know that I've been saying that for a quite a while now, but my failure to take a firmer position for so long is not because I didn't try. It's because there is no really good answer. -- Peter Geoghegan
pgsql-hackers by date: