Re: More efficient RI checks - take 2 - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: More efficient RI checks - take 2
Msg-id: 20200428222142.vqfetg7m32rxnjtj@alap3.anarazel.de
In response to: Re: More efficient RI checks - take 2 (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Hi,

On 2020-04-28 10:44:58 -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Robert Haas (robertmhaas@gmail.com) wrote:
> >> As you say, perhaps there's room for both things, but also as you say,
> >> it's not obvious how to decide intelligently between them.
>
> > The single-row case seems pretty clear and also seems common enough that
> > it'd be worth paying the cost to figure out if it's a single-row
> > statement or not.

It's not that obvious to me that it's going to be beneficial to go down a planned path in all that many cases. If all the RI check does is an index_rescan() and an index_getnext_slot(), there aren't that many realistic types of plans that are going to be better.

IIUC a query to check a transition table would, simplified, boil down to:

    SELECT * FROM transition_table tt
    WHERE
        -- for an insert into / update of the referencing table
        (NOT EXISTS (SELECT * FROM referenced_table rt
                     WHERE rt.referenced_column = tt.referencing_column)
         [AND ...])
        -- for an update / delete of the referenced table
        OR EXISTS (SELECT * FROM referencing_table rt1
                   WHERE rt1.referencing_column = tt.referenced_column1)
        [OR ...]
    LIMIT 1;

where a returned row would signal an error. But it would also need to handle row locking, CASCADE / SET NULL / SET DEFAULT etc. More on that below.

While it's tempting to want to write the latter check as

    -- for an update / delete of the referenced table
    SELECT * FROM referencing_table rt
    WHERE referencing_column IN (SELECT referenced_column FROM transition_table tt)
    LIMIT 1;

that'd make it harder to know which row is the violating one.

As the transition table isn't ordered, it seems there aren't that many realistic ways to execute this:

1) A nestloop semi/anti-join with an inner index scan.

2) Sort the transition table, then do a merge semi/anti-join between the sort output and an ordered index scan on the referenced / referencing table(s).
3) A hash semi/anti-join, requiring a full scan of the tables.

I think 1) would be worse than just doing the index scan manually. 2) would probably be beneficial if there are a lot of rows on the inner side, due to the ordered access and the deduplication. 3) would sometimes be beneficial because it'd avoid an index scan for each tuple in the transition table.

The cases in which it is clear to me that a bulk check could theoretically be significantly better than a fast per-row check are:

1) There are a *lot* of rows in the transition table relative to comparatively small referenced / referencing tables. As those tables can cheaply be hashed, a hash join will be more efficient than doing an index lookup for each transition table row.

2) There's a lot of duplicate content in the transition table, e.g. because there are a lot of references to the same row.

Did I forget important ones?

With regard to the row locking etc. that I elided above: I think that will actually prevent most, if not all, interesting optimizations. Because of the FOR KEY SHARE lock that's required, the planner's plan will pretty much always degrade to a per-row subplan check anyway. Is there any formulation of the necessary queries that doesn't suffer from this problem?

> That seems hard to do in advance ... but it would be easy to code
> a statement-level AFTER trigger along the lines of
>
>     if (transition table contains one row)
>         // fast special case here
>     else
>         // slow general case here.

I suspect we'd need something more complicated than that for it to be beneficial. My gut feeling is that the case where a transition-table style check would most commonly be beneficial is when you have a very small referenced table and a *lot* of rows get inserted. But clearly we wouldn't want bulk insertion to suddenly also store all rows in a transition table. Nor would we want a bulk UPDATE to cause all the updated rows to be stored in the transition table even though none of the relevant columns changed (i.e. the RI_FKey_[f]pk_upd_check_required logic in AfterTriggerSaveEvent()).

I still don't quite see how shunting RI checks through triggers saves us more than it costs. Especially for the checks we run as AFTER triggers: most of the time we could do the work we currently defer until query end immediately, rather than building up an in-memory queue. Besides saving memory, in a lot of cases that'd also make it unnecessary to refetch the row at a later time, possibly having to chase updated row versions. But even for the BEFORE checks, largely going through generic trigger code means it's much harder to batch checks without suddenly requiring memory proportional to the number of inserted rows.

There obviously are cases where it's not possible to check just after each row: deferrable constraints, as well as CASCADE / SET NULL / SET DEFAULT on tables with user-defined triggers, for example. But it'd likely be sensible to handle those the way we already handle deferred uniqueness checks, i.e. to only queue something if there's a potential problem.

> I think the question really comes down to this: is the per-row overhead of
> the transition-table mechanism comparable to that of the AFTER trigger
> queue?  Or if not, can we make it so?

It's probably more expensive, in some ways, at the moment. The biggest difference is that the transition table stores complete rows, valid as of the moment they were inserted/updated/deleted, whereas the trigger queue only stores enough information to fetch the tuple again during trigger execution. Several RI checks, however, re-check visibility before executing, so that's another fetch that likely wouldn't be elided by a simple switch to transition tables.

Both have significant downsides, obviously. Storing complete rows can take a lot more memory, and refetching rows is expensive, especially if it happens much later (with the row potentially pushed out of shared_buffers).

I think it was a mistake to have these two different systems in trigger.c.
When transition tables were added, we shouldn't have kept per-tuple state both in the queue and in the transition tuplestore. Instead we should have used only the tuplestore, optimizing what information we store in it depending on the needs of the various AFTER triggers.

Greetings,

Andres Freund
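[For concreteness, the statement-level "one row → fast path, else general path" trigger discussed in this thread could be sketched roughly as below. This is a hypothetical illustration against a made-up pk/fk schema, not the real ri_triggers.c machinery, and it deliberately omits the FOR KEY SHARE row locking that, as argued above, a real RI check requires.]

```sql
-- Hypothetical schema for illustration only.
CREATE TABLE pk (id int PRIMARY KEY);
CREATE TABLE fk (ref_id int);  -- "reference" enforced by the sketch below

CREATE FUNCTION fk_bulk_check() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    IF (SELECT count(*) FROM new_rows) = 1 THEN
        -- fast special case: equivalent to today's per-row probe,
        -- a single index lookup on pk
        PERFORM 1 FROM new_rows tt
        WHERE tt.ref_id IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM pk WHERE pk.id = tt.ref_id);
    ELSE
        -- slow general case: one bulk anti-join over the whole
        -- transition table, which the planner can execute as a
        -- nestloop, merge, or hash anti-join
        PERFORM 1 FROM new_rows tt
        WHERE tt.ref_id IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM pk WHERE pk.id = tt.ref_id)
        LIMIT 1;
    END IF;

    IF FOUND THEN
        RAISE foreign_key_violation
            USING MESSAGE = 'insert on fk violates reference to pk';
    END IF;
    RETURN NULL;
END;
$$;

CREATE TRIGGER fk_bulk_check_trg
    AFTER INSERT ON fk
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE FUNCTION fk_bulk_check();
```

[Transition tables and FOR EACH STATEMENT triggers with REFERENCING clauses require PostgreSQL 10+; EXECUTE FUNCTION requires 11+. Without FOR KEY SHARE on the matched pk rows, this sketch is subject to exactly the concurrency problems the per-row RI checks avoid.]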