Thread: More efficient RI checks - take 2
After having reviewed [1] more than a year ago (the problem I found was that
the transient table is not available for deferred constraints), I've tried to
implement the same in an alternative way. The RI triggers still work as row
level triggers, but if multiple events of the same kind appear in the queue,
they are all passed to the trigger function at once. Thus the check query does
not have to be executed that frequently.

Some performance comparisons are below. (Besides the execution time, please
note the difference in the number of trigger function executions.) In general,
the checks are significantly faster if there are many rows to process, and a
bit slower when we only need to check a single row. However I'm not sure about
the accuracy if only a single row is measured (if a single row check is
performed several times, the execution time appears to fluctuate).

Comments are welcome.

Setup
=====

CREATE TABLE p(i int primary key);
INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
CREATE TABLE f(i int REFERENCES p);

Insert many rows into the FK table
==================================

master:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Insert on f (cost=0.00..163.84 rows=16384 width=4) (actual time=32.741..32.741 rows=0 loops=1)
-> Function Scan on generate_series g (cost=0.00..163.84 rows=16384 width=4) (actual time=2.403..4.802 rows=16384 loops=1)
Planning Time: 0.050 ms
Trigger for constraint f_i_fkey: time=448.986 calls=16384
Execution Time: 485.444 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Insert on f (cost=0.00..163.84 rows=16384 width=4) (actual time=34.053..34.053 rows=0 loops=1)
-> Function Scan on generate_series g (cost=0.00..163.84 rows=16384 width=4) (actual time=2.223..4.448 rows=16384 loops=1)
Planning Time: 0.047 ms
Trigger for constraint f_i_fkey: time=105.164 calls=8
Execution Time: 141.201 ms

Insert a single row into the FK table
=====================================

master:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);

QUERY PLAN
------------------------------------------------------------------------------------------
Insert on f (cost=0.00..0.01 rows=1 width=4) (actual time=0.060..0.060 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.026 ms
Trigger for constraint f_i_fkey: time=0.435 calls=1
Execution Time: 0.517 ms
(5 rows)

patched:

EXPLAIN ANALYZE INSERT INTO f VALUES (1);

QUERY PLAN
------------------------------------------------------------------------------------------
Insert on f (cost=0.00..0.01 rows=1 width=4) (actual time=0.066..0.066 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.025 ms
Trigger for constraint f_i_fkey: time=0.578 calls=1
Execution Time: 0.670 ms

Check if FK row exists during deletion from the PK
==================================================

master:

DELETE FROM p WHERE i=16384;
ERROR: update or delete on table "p" violates foreign key constraint "f_i_fkey" on table "f"
DETAIL: Key (i)=(16384) is still referenced from table "f".
Time: 3.381 ms

patched:

DELETE FROM p WHERE i=16384;
ERROR: update or delete on table "p" violates foreign key constraint "f_i_fkey" on table "f"
DETAIL: Key (i)=(16384) is still referenced from table "f".
Time: 5.561 ms

Cascaded DELETE --- many PK rows
================================

DROP TABLE f;
CREATE TABLE f(i int REFERENCES p ON UPDATE CASCADE ON DELETE CASCADE);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

master:

EXPLAIN ANALYZE DELETE FROM p;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete on p (cost=0.00..236.84 rows=16384 width=6) (actual time=38.334..38.334 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..236.84 rows=16384 width=6) (actual time=0.019..3.925 rows=16384 loops=1)
Planning Time: 0.049 ms
Trigger for constraint f_i_fkey: time=31348.756 calls=16384
Execution Time: 31390.784 ms

patched:

EXPLAIN ANALYZE DELETE FROM p;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete on p (cost=0.00..236.84 rows=16384 width=6) (actual time=33.360..33.360 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..236.84 rows=16384 width=6) (actual time=0.012..3.183 rows=16384 loops=1)
Planning Time: 0.094 ms
Trigger for constraint f_i_fkey: time=9.580 calls=8
Execution Time: 43.941 ms

Cascaded DELETE --- a single PK row
===================================

INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);

master:

DELETE FROM p WHERE i=16384;
DELETE 1
Time: 5.754 ms

patched:

DELETE FROM p WHERE i=16384;
DELETE 1
Time: 8.098 ms

Cascaded UPDATE - many rows
===========================

master:

EXPLAIN ANALYZE UPDATE p SET i = i + 16384;

QUERY PLAN
------------------------------------------------------------------------------------------------------------
Update on p (cost=0.00..277.80 rows=16384 width=10) (actual time=166.954..166.954 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..277.80 rows=16384 width=10) (actual time=0.013..7.780 rows=16384 loops=1)
Planning Time: 0.177 ms
Trigger for constraint f_i_fkey on p: time=60405.362 calls=16384
Trigger for constraint f_i_fkey on f: time=455.874 calls=16384
Execution Time: 61036.996 ms

patched:

EXPLAIN ANALYZE UPDATE p SET i = i + 16384;

QUERY PLAN
------------------------------------------------------------------------------------------------------------
Update on p (cost=0.00..277.77 rows=16382 width=10) (actual time=159.512..159.512 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..277.77 rows=16382 width=10) (actual time=0.014..7.783 rows=16382 loops=1)
Planning Time: 0.146 ms
Trigger for constraint f_i_fkey on p: time=169.628 calls=9
Trigger for constraint f_i_fkey on f: time=124.079 calls=2
Execution Time: 456.072 ms

Cascaded UPDATE - a single row
==============================

master:

UPDATE p SET i = i - 16384 WHERE i=32767;
UPDATE 1
Time: 4.858 ms

patched:

UPDATE p SET i = i - 16384 WHERE i=32767;
UPDATE 1
Time: 11.955 ms

[1] https://commitfest.postgresql.org/22/1975/

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
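(A note on the "Trigger for constraint" lines above: on master the RI trigger fires once per affected row and each call runs a single-row check query through SPI, hence calls=16384 versus calls=8 in the patched runs. The query below is a close paraphrase of the check that master issues on insert/update of the FK table; the exact text generated by ri_triggers.c may differ slightly.)

-- Per-row check run by master's RI trigger for each inserted FK row;
-- $1 carries the new value of f.i:
SELECT 1 FROM ONLY public.p x
WHERE x.i OPERATOR(pg_catalog.=) $1
FOR KEY SHARE OF x;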
On Wed, Apr 8, 2020 at 6:36 PM Antonin Houska <ah@cybertec.at> wrote:
After having reviewed [1] more than a year ago (the problem I found was that
the transient table is not available for deferred constraints), I've tried to
implement the same in an alternative way. The RI triggers still work as row
level triggers, but if multiple events of the same kind appear in the queue,
they are all passed to the trigger function at once. Thus the check query does
not have to be executed that frequently.
Some performance comparisons are below. (Besides the execution time, please
note the difference in the number of trigger function executions.) In general,
the checks are significantly faster if there are many rows to process, and a
bit slower when we only need to check a single row. However I'm not sure about
the accuracy if only a single row is measured (if a single row check is
performed several times, the execution time appears to fluctuate).
It is a hard task to choose a good strategy for immediate constraints, but for deferred constraints you know how many rows should be checked, and then you can choose a better strategy.
Is it possible to use estimation for choosing the method of RI checks?
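(To make the deferred case concrete, here is a minimal sketch; the table name f2 and the claim about how a batching implementation would use the queue size are illustrative assumptions, not taken from the thread:)

-- With a deferrable FK, the check events accumulate in the trigger queue
-- and all fire at COMMIT, so the exact number of rows to check is known:
CREATE TABLE f2(i int REFERENCES p DEFERRABLE INITIALLY DEFERRED);
BEGIN;
INSERT INTO f2 SELECT i FROM generate_series(1, 16384) g(i);
COMMIT;  -- all 16384 checks run here; a batching implementation could
         -- pick its strategy knowing the full queue size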
Comments are welcome.
Setup
=====
CREATE TABLE p(i int primary key);
INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
CREATE TABLE f(i int REFERENCES p);
Insert many rows into the FK table
==================================
master:
EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Insert on f (cost=0.00..163.84 rows=16384 width=4) (actual time=32.741..32.741 rows=0 loops=1)
-> Function Scan on generate_series g (cost=0.00..163.84 rows=16384 width=4) (actual time=2.403..4.802 rows=16384 loops=1)
Planning Time: 0.050 ms
Trigger for constraint f_i_fkey: time=448.986 calls=16384
Execution Time: 485.444 ms
(5 rows)
patched:
EXPLAIN ANALYZE INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Insert on f (cost=0.00..163.84 rows=16384 width=4) (actual time=34.053..34.053 rows=0 loops=1)
-> Function Scan on generate_series g (cost=0.00..163.84 rows=16384 width=4) (actual time=2.223..4.448 rows=16384 loops=1)
Planning Time: 0.047 ms
Trigger for constraint f_i_fkey: time=105.164 calls=8
Execution Time: 141.201 ms
Insert a single row into the FK table
=====================================
master:
EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN
------------------------------------------------------------------------------------------
Insert on f (cost=0.00..0.01 rows=1 width=4) (actual time=0.060..0.060 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.026 ms
Trigger for constraint f_i_fkey: time=0.435 calls=1
Execution Time: 0.517 ms
(5 rows)
patched:
EXPLAIN ANALYZE INSERT INTO f VALUES (1);
QUERY PLAN
------------------------------------------------------------------------------------------
Insert on f (cost=0.00..0.01 rows=1 width=4) (actual time=0.066..0.066 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.025 ms
Trigger for constraint f_i_fkey: time=0.578 calls=1
Execution Time: 0.670 ms
Check if FK row exists during deletion from the PK
==================================================
master:
DELETE FROM p WHERE i=16384;
ERROR: update or delete on table "p" violates foreign key constraint "f_i_fkey" on table "f"
DETAIL: Key (i)=(16384) is still referenced from table "f".
Time: 3.381 ms
patched:
DELETE FROM p WHERE i=16384;
ERROR: update or delete on table "p" violates foreign key constraint "f_i_fkey" on table "f"
DETAIL: Key (i)=(16384) is still referenced from table "f".
Time: 5.561 ms
Cascaded DELETE --- many PK rows
================================
DROP TABLE f;
CREATE TABLE f(i int REFERENCES p ON UPDATE CASCADE ON DELETE CASCADE);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
master:
EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete on p (cost=0.00..236.84 rows=16384 width=6) (actual time=38.334..38.334 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..236.84 rows=16384 width=6) (actual time=0.019..3.925 rows=16384 loops=1)
Planning Time: 0.049 ms
Trigger for constraint f_i_fkey: time=31348.756 calls=16384
Execution Time: 31390.784 ms
patched:
EXPLAIN ANALYZE DELETE FROM p;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Delete on p (cost=0.00..236.84 rows=16384 width=6) (actual time=33.360..33.360 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..236.84 rows=16384 width=6) (actual time=0.012..3.183 rows=16384 loops=1)
Planning Time: 0.094 ms
Trigger for constraint f_i_fkey: time=9.580 calls=8
Execution Time: 43.941 ms
Cascaded DELETE --- a single PK row
===================================
INSERT INTO p SELECT x FROM generate_series(1, 16384) g(x);
INSERT INTO f SELECT i FROM generate_series(1, 16384) g(i);
master:
DELETE FROM p WHERE i=16384;
DELETE 1
Time: 5.754 ms
patched:
DELETE FROM p WHERE i=16384;
DELETE 1
Time: 8.098 ms
Cascaded UPDATE - many rows
===========================
master:
EXPLAIN ANALYZE UPDATE p SET i = i + 16384;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Update on p (cost=0.00..277.80 rows=16384 width=10) (actual time=166.954..166.954 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..277.80 rows=16384 width=10) (actual time=0.013..7.780 rows=16384 loops=1)
Planning Time: 0.177 ms
Trigger for constraint f_i_fkey on p: time=60405.362 calls=16384
Trigger for constraint f_i_fkey on f: time=455.874 calls=16384
Execution Time: 61036.996 ms
patched:
EXPLAIN ANALYZE UPDATE p SET i = i + 16384;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Update on p (cost=0.00..277.77 rows=16382 width=10) (actual time=159.512..159.512 rows=0 loops=1)
-> Seq Scan on p (cost=0.00..277.77 rows=16382 width=10) (actual time=0.014..7.783 rows=16382 loops=1)
Planning Time: 0.146 ms
Trigger for constraint f_i_fkey on p: time=169.628 calls=9
Trigger for constraint f_i_fkey on f: time=124.079 calls=2
Execution Time: 456.072 ms
Cascaded UPDATE - a single row
==============================
master:
UPDATE p SET i = i - 16384 WHERE i=32767;
UPDATE 1
Time: 4.858 ms
patched:
UPDATE p SET i = i - 16384 WHERE i=32767;
UPDATE 1
Time: 11.955 ms
[1] https://commitfest.postgresql.org/22/1975/
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Wed, Apr 8, 2020 at 1:06 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
On Wed, Apr 8, 2020 at 6:36 PM Antonin Houska <ah@cybertec.at> wrote:
After having reviewed [1] more than a year ago (the problem I found was that
the transient table is not available for deferred constraints), I've tried to
implement the same in an alternative way. The RI triggers still work as row
level triggers, but if multiple events of the same kind appear in the queue,
they are all passed to the trigger function at once. Thus the check query does
not have to be executed that frequently.
I'm excited that you picked this up!
Some performance comparisons are below. (Besides the execution time, please
note the difference in the number of trigger function executions.) In general,
the checks are significantly faster if there are many rows to process, and a
bit slower when we only need to check a single row. However I'm not sure about
the accuracy if only a single row is measured (if a single row check is
performed several times, the execution time appears to fluctuate).
These numbers are very promising, and much more in line with my initial expectations. Obviously the impact on single-row DML is of major concern, though.
It is a hard task to choose a good strategy for immediate constraints, but for deferred constraints you know how many rows should be checked, and then you can choose a better strategy.
Is it possible to use estimation for choosing the method of RI checks?
In doing my initial attempt, the feedback I was getting was that the people who truly understood the RI checks fell into the following groups:
1. people who wanted to remove the SPI calls from the triggers
2. people who wanted to completely refactor RI to not use triggers
3. people who wanted to completely refactor triggers
While #3 is clearly beyond the scope for an endeavor like this, #1 seems like it would nearly eliminate the 1-row penalty (we'd still have the TupleStore initialization penalty, but it would just be a handy queue structure, and maybe that cost would be offset by removing the SPI overhead), and once that is done, we could see about step #2.
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> On Wed, Apr 8, 2020 at 6:36 PM Antonin Houska <ah@cybertec.at> wrote:
>
> > Some performance comparisons are below. (Besides the execution time, please
> > note the difference in the number of trigger function executions.) In general,
> > the checks are significantly faster if there are many rows to process, and a
> > bit slower when we only need to check a single row. However I'm not sure about
> > the accuracy if only a single row is measured (if a single row check is
> > performed several times, the execution time appears to fluctuate).
>
> It is a hard task to choose a good strategy for immediate constraints, but
> for deferred constraints you know how many rows should be checked, and then
> you can choose a better strategy.
>
> Is it possible to use estimation for choosing the method of RI checks?

The exact number of rows ("batch size") is always known before the query is
executed. So one problem to solve is that, when only one row is affected, we
need to convince the planner that the "transient table" really contains a
single row. Otherwise it can, for example, produce a hash join where the hash
eventually contains a single row.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Corey Huinker <corey.huinker@gmail.com> wrote:

> These numbers are very promising, and much more in line with my initial
> expectations. Obviously the impact on single-row DML is of major concern,
> though.

Yes, I agree.

> In doing my initial attempt, the feedback I was getting was that the people
> who truly understood the RI checks fell into the following groups:
>
> 1. people who wanted to remove the SPI calls from the triggers
> 2. people who wanted to completely refactor RI to not use triggers
> 3. people who wanted to completely refactor triggers
>
> While #3 is clearly beyond the scope for an endeavor like this, #1 seems
> like it would nearly eliminate the 1-row penalty (we'd still have the
> TupleStore initialization penalty, but it would just be a handy queue
> structure, and maybe that cost would be offset by removing the SPI
> overhead),

I can imagine removal of the SPI from the current implementation (and
constructing the plans "manually"), but note that the queries I use in my
patch are no longer that trivial. So the SPI makes sense to me because it
ensures regular query planning.

As for the tuplestore, I'm not sure the startup cost is a problem: if you're
concerned about the 1-row case, the row should usually be stored in memory.

> and once that is done, we could see about step #2.

As I said during my review of your patch last year, I think the RI semantics
has too much in common with that of triggers. I'd need more info to imagine
such a change.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
I can imagine removal of the SPI from the current implementation (and
constructing the plans "manually"), but note that the queries I use in my
patch are no longer that trivial. So the SPI makes sense to me because it
ensures regular query planning.
As an intermediate step, in the case where we have one row, it should be simple enough to extract that row manually, and do an SPI call with fixed values rather than the join to the ephemeral table, yes?
As for the tuplestore, I'm not sure the startup cost is a problem: if you're
concerned about the 1-row case, the row should usually be stored in memory.
> and once that is done, we could see about step #2.
As I said during my review of your patch last year, I think the RI semantics
has too much in common with that of triggers. I'd need more info to imagine
such a change.
As a general outline, I think that DML would iterate over the 2 sets of potentially relevant RI definitions rather than iterating over the triggers.
The similarities between RI and general triggers are obvious, which explains why they went that route initially, but they're also a crutch. All RI operations boil down to either an iteration over a tuplestore to do lookups in an index (when checking for referenced rows), or a hash join of the transient data against the un-indexed table (when checking for referencing rows), and people who know this stuff far better than me seem to think that SPI overhead is best avoided when possible. I'm looking forward to having more time to spend on this.
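(In SQL terms, the two shapes described above look roughly like the following sketch; transient_keys is a hypothetical stand-in for the transient data, and the row locking that the real checks take is elided:)

-- Referenced-row check (rows inserted/updated in the FK table f):
-- one index probe into the PK table per key.
SELECT 1 FROM p WHERE p.i = $1;

-- Referencing-row check (rows deleted/updated in the PK table p):
-- effectively a join of the transient data against the FK table,
-- which may have no usable index on f.i.
SELECT 1 FROM f JOIN transient_keys k ON f.i = k.i LIMIT 1;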
On 2020-Apr-20, Corey Huinker wrote:

> > I can imagine removal of the SPI from the current implementation (and
> > constructing the plans "manually"), but note that the queries I use in my
> > patch are no longer that trivial. So the SPI makes sense to me because it
> > ensures regular query planning.
>
> As an intermediate step, in the case where we have one row, it should be
> simple enough to extract that row manually, and do an SPI call with fixed
> values rather than the join to the ephemeral table, yes?

I do wonder if the RI stuff would actually end up being faster without SPI.
If not, we'd only end up writing more code to do the same thing. Now that
tables can be partitioned, it is much more of a pain than when only regular
tables could be supported. Obviously without SPI you wouldn't *have* to go
through the planner, which might be a win in itself if the execution tree to
use were always perfectly clear ... but now that the queries get more complex
per partitioning and this optimization, is it?

You could remove the crosscheck_snapshot feature from SPI, I suppose, but
that's not that much code.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:

> I do wonder if the RI stuff would actually end up being faster without
> SPI. If not, we'd only end up writing more code to do the same thing.
> Now that tables can be partitioned, it is much more of a pain than when
> only regular tables could be supported. Obviously without SPI you
> wouldn't *have* to go through the planner, which might be a win in
> itself if the execution tree to use were always perfectly clear ... but
> now that the queries get more complex per partitioning and this
> optimization, is it?

AFAIK, we do not have any code besides the planner that is capable of
building a plan tree at all, and I'd be pretty hesitant to try to create
such; those things are complicated. It'd likely only make sense to bypass
the planner if the required work is predictable enough that you don't need a
plan tree at all, but can just hard-wire what has to be done. That seems a
bit unlikely in the presence of partitioning.

Instead of a plan tree, you could build a parse tree to pass through the
planner, rather than building a SQL statement string that has to be parsed.
The code jumps through enough hoops to make sure the string will be parsed
"just so" that this might net out to about an equal amount of code in
ri_triggers.c, and it'd save a nontrivial amount of parsing work. But you'd
have to abandon SPI, probably, or at least it's not clear how much that'd be
doing for you anymore.

regards, tom lane
Hi,

On 2020-04-21 11:34:54 -0400, Alvaro Herrera wrote:
> On 2020-Apr-20, Corey Huinker wrote:
> > > I can imagine removal of the SPI from the current implementation (and
> > > constructing the plans "manually"), but note that the queries I use in my
> > > patch are no longer that trivial. So the SPI makes sense to me because it
> > > ensures regular query planning.
> >
> > As an intermediate step, in the case where we have one row, it should be
> > simple enough to extract that row manually, and do an SPI call with fixed
> > values rather than the join to the ephemeral table, yes?
>
> I do wonder if the RI stuff would actually end up being faster without
> SPI.

I would suspect so. How much is another question.

I assume that with constructing plans "manually" you don't mean to create a
plan tree, but to invoke parser/planner directly? I think that'd likely be
better than going through SPI, and there's precedent too.

But honestly, my gut feeling is that for a lot of cases it'd be best to just
bypass parser, planner *and* executor, and just do manual
systable_beginscan() style checks. For most cases we know exactly what plan
shape we expect, and going through the overhead of creating a query string,
parsing, planning, caching the previous steps, and creating an executor tree
for every check is a lot. Even just the amount of memory for caching the
plans can be substantial.

Side note: I for one would appreciate a setting that just made all RI
actions requiring a seqscan error out...

> If not, we'd only end up writing more code to do the same thing. Now
> that tables can be partitioned, it is much more of a pain than when
> only regular tables could be supported. Obviously without SPI you
> wouldn't *have* to go through the planner, which might be a win in
> itself if the execution tree to use were always perfectly clear
> ... but now that the queries get more complex per partitioning and
> this optimization, is it?

I think it's actually a good case where we will commonly be able to do
*better* than generic planning. The infrastructure for efficient partition
pruning exists (for COPY etc) - but isn't easily applicable to generic
plans.

Greetings,

Andres Freund
Hi,

On 2020-04-21 16:14:53 -0400, Tom Lane wrote:
> AFAIK, we do not have any code besides the planner that is capable of
> building a plan tree at all, and I'd be pretty hesitant to try to create
> such; those things are complicated.

I suspect what was meant was not to create the plan tree directly, but to
bypass SPI when creating the plan / executing the query.

IMO SPI for most uses in core PG really adds more complication and overhead
than warranted. The whole concept of having a global tuptable, a stack and
xact.c integration to repair that design deficiency... The hiding of what
happens behind a pretty random set of different abstractions. That all makes
it appear as if SPI did something super complicated, but it really doesn't.
It just is a bad and over-complicated abstraction layer.

Greetings,

Andres Freund
On 2020-Apr-22, Andres Freund wrote:

> I assume that with constructing plans "manually" you don't mean to
> create a plan tree, but to invoke parser/planner directly? I think
> that'd likely be better than going through SPI, and there's precedent
> too.

Well, I was actually thinking in building ready-made execution trees,
bypassing the planner altogether. But apparently no one thinks that this is
a good idea, and we don't have any code that does that already, so maybe
it's not a great idea. However:

> But honestly, my gut feeling is that for a lot of cases it'd be best to
> just bypass parser, planner *and* executor, and just do manual
> systable_beginscan() style checks. For most cases we know exactly what
> plan shape we expect, and going through the overhead of creating a query
> string, parsing, planning, caching the previous steps, and creating an
> executor tree for every check is a lot. Even just the amount of memory
> for caching the plans can be substantial.

Avoiding the executor altogether scares me, but I can't say exactly why.
For example, you couldn't use foreign tables at either side of the FK -- but
we don't allow FKs on those tables and we'd have to use some specific
executor node for such a thing anyway. So this is not a real argument
against going that route.

> Side note: I for one would appreciate a setting that just made all RI
> actions requiring a seqscan error out...

Hmm, interesting thought. I guess there are actual cases where it's not
strictly necessary, for example where the referencing table is really tiny
-- not the *referenced* table, note, since you need the UNIQUE index on that
side in any case. I suppose that's not a really interesting case. I don't
think this is implementable when going through SPI.

> I think it's actually a good case where we will commonly be able to do
> *better* than generic planning. The infrastructure for efficient
> partition pruning exists (for COPY etc) - but isn't easily applicable to
> generic plans.

True.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 22, 2020 at 1:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Well, I was actually thinking in building ready-made execution trees,
> bypassing the planner altogether. But apparently no one thinks that
> this is a good idea, and we don't have any code that does that already,
> so maybe it's not a great idea.

If it's any consolation, I had the same idea very recently while chatting
with Amit Langote. Maybe it's a bad idea, but you're not the only one who
had it. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2020-04-22 13:46:22 -0400, Robert Haas wrote:
> On Wed, Apr 22, 2020 at 1:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Well, I was actually thinking in building ready-made execution trees,
> > bypassing the planner altogether. But apparently no one thinks that
> > this is a good idea, and we don't have any code that does that already,
> > so maybe it's not a great idea.

I was commenting on what I understood Corey to say, but was fairly unclear
about it. But I'm also far from sure that I understood Corey correctly...

> If it's any consolation, I had the same idea very recently while
> chatting with Amit Langote. Maybe it's a bad idea, but you're not the
> only one who had it. :-)

That seems extremely hard, given our current infrastructure. I think
there's actually a good case to be made for the idea in the abstract, but
... The amount of logic the ExecInit* routines have is substantial, and the
state they set up is complicated. A lot of nodes have state that is private
to their .c files. All executor nodes reference the corresponding Plan
nodes, so you also need to mock up those.

Greetings,

Andres Freund
Hi,

On 2020-04-08 13:55:55 -0400, Corey Huinker wrote:
> In doing my initial attempt, the feedback I was getting was that the people
> who truly understood the RI checks fell into the following groups:
> 1. people who wanted to remove the SPI calls from the triggers
> 2. people who wanted to completely refactor RI to not use triggers
> 3. people who wanted to completely refactor triggers

FWIW, for me these three are largely independent avenues:

WRT 1: There's a lot of benefit in reducing the per-call overhead of RI. Not
going through SPI is one way to do that. Even if RI were not to use
triggers, we'd still want to reduce the per-statement costs.

WRT 2: Not using the generic per-row trigger framework for RI has
significant benefits too - checking multiple rows at once, deduplicating
repeated checks, reducing the per-row storage overhead ...

WRT 3: Fairly obviously improving the generic trigger code (more efficient
fetching of tuple versions, spilling etc) would have benefits entirely
independent of other RI improvements.

Greetings,

Andres Freund
On Wed, Apr 22, 2020 at 2:36 PM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2020-04-22 13:46:22 -0400, Robert Haas wrote:
> On Wed, Apr 22, 2020 at 1:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Well, I was actually thinking in building ready-made execution trees,
> > bypassing the planner altogether. But apparently no one thinks that
> > this is a good idea, and we don't have any code that does that already,
> > so maybe it's not a great idea.
I was commenting on what I understood Corey to say, but was fairly
unclear about it. But I'm also far from sure that I understood Corey
correctly...
I was unclear because, even after my failed foray into statement level triggers for RI checks, I'm still pretty inexperienced in this area.
I'm just happy that it's being discussed.
On Wed, Apr 22, 2020 at 2:36 PM Andres Freund <andres@anarazel.de> wrote:
> > If it's any consolation, I had the same idea very recently while
> > chatting with Amit Langote. Maybe it's a bad idea, but you're not the
> > only one who had it. :-)
>
> That seems extremely hard, given our current infrastructure. I think
> there's actually a good case to be made for the idea in the abstract, but
> ... The amount of logic the ExecInit* routines have is substantial, and
> the state they set up is complicated. A lot of nodes have state that is
> private to their .c files. All executor nodes reference the corresponding
> Plan nodes, so you also need to mock up those.

Right -- the idea I was talking about was to create a Plan tree without
using the main planner. So it wouldn't bother costing an index scan on each
index, and a sequential scan, on the target table - it would just make an
index scan plan, or maybe an index path that it would then convert to an
index plan. Or something like that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> Right -- the idea I was talking about was to create a Plan tree
> without using the main planner. So it wouldn't bother costing an index
> scan on each index, and a sequential scan, on the target table - it
> would just make an index scan plan, or maybe an index path that it
> would then convert to an index plan. Or something like that.

Consing up a Path tree and then letting create_plan() make it into an
executable plan might not be a terrible idea. There's a whole boatload of
finicky details that you could avoid that way, like everything in setrefs.c.

But it's not entirely clear to me that we know the best plan for a
statement-level RI action with sufficient certainty to go that way. Is it
really the case that the plan would not vary based on how many tuples there
are to check, for example? If we *do* know exactly what should happen, I'd
tend to lean towards Andres' idea that we shouldn't be using the executor at
all, but just hard-wiring stuff at the level of "do these table scans".

Also, it's definitely not the case that create_plan() has an API that's so
clean that you would be able to use it without major hassles. You'd still
have to generate a pile of lookalike planner data structures, and make sure
that expression subtrees have been fed through eval_const_expressions(),
etc etc.

On the whole I still think that generating a Query tree and then letting the
planner do its thing might be the best approach.

regards, tom lane
On Wed, Apr 22, 2020 at 6:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But it's not entirely clear to me that we know the best plan for a
> statement-level RI action with sufficient certainty to go that way.
> Is it really the case that the plan would not vary based on how
> many tuples there are to check, for example? If we *do* know
> exactly what should happen, I'd tend to lean towards Andres'
> idea that we shouldn't be using the executor at all, but just
> hard-wiring stuff at the level of "do these table scans".

Well, I guess I'd naively think we want an index scan on a plain table. It
is barely possible that in some corner case a sequential scan would be
faster, but could it be enough faster to save the cost of planning? I doubt
it, but I just work here.

On a partitioning hierarchy we want to figure out which partition is
relevant for the value we're trying to find, and then scan that one.

I'm not sure there are any other cases. We have to have a UNIQUE constraint
or we can't be referencing this target table. So it can't be a plain
inheritance hierarchy, nor (I think) a foreign table.

> Also, it's definitely not the case that create_plan() has an API
> that's so clean that you would be able to use it without major
> hassles. You'd still have to generate a pile of lookalike planner
> data structures, and make sure that expression subtrees have been
> fed through eval_const_expressions(), etc etc.

Yeah, that's annoying.

> On the whole I still think that generating a Query tree and then
> letting the planner do its thing might be the best approach.

Maybe, but it seems awfully heavy-weight. Once you go into the planner it's
pretty hard to avoid considering indexes we don't care about, bitmap scans
we don't care about, a sequential scan we don't care about, etc. You'd
certainly save something just from avoiding parsing, but planning's pretty
expensive too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Haas <robertmhaas@gmail.com> writes:
> > Right -- the idea I was talking about was to create a Plan tree
> > without using the main planner. So it wouldn't bother costing an index
> > scan on each index, and a sequential scan, on the target table - it
> > would just make an index scan plan, or maybe an index path that it
> > would then convert to an index plan. Or something like that.
>
> Consing up a Path tree and then letting create_plan() make it into
> an executable plan might not be a terrible idea. There's a whole
> boatload of finicky details that you could avoid that way, like
> everything in setrefs.c.
>
> But it's not entirely clear to me that we know the best plan for a
> statement-level RI action with sufficient certainty to go that way.
> Is it really the case that the plan would not vary based on how
> many tuples there are to check, for example?

I'm concerned about that too. With my patch the checks become a bit slower
if only a single row is processed. The problem seems to be that the planner
is not entirely convinced about the number of input rows, so it can still
build a plan that expects many rows. For example (as I mentioned elsewhere
in the thread), a hash join where the hash table only contains one tuple.
Or similarly a sort node for a single input tuple.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Thu, Apr 23, 2020 at 7:06 AM Antonin Houska <ah@cybertec.at> wrote:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Right -- the idea I was talking about was to create a Plan tree
> > without using the main planner. So it wouldn't bother costing an index
> > scan on each index, and a sequential scan, on the target table - it
> > would just make an index scan plan, or maybe an index path that it
> > would then convert to an index plan. Or something like that.
>
> Consing up a Path tree and then letting create_plan() make it into
> an executable plan might not be a terrible idea. There's a whole
> boatload of finicky details that you could avoid that way, like
> everything in setrefs.c.
>
> But it's not entirely clear to me that we know the best plan for a
> statement-level RI action with sufficient certainty to go that way.
> Is it really the case that the plan would not vary based on how
> many tuples there are to check, for example?
I'm concerned about that too. With my patch the checks become a bit slower if
only a single row is processed. The problem seems to be that the planner is
not entirely convinced about the number of input rows, so it can still
build a plan that expects many rows. For example (as I mentioned elsewhere in
the thread), a hash join where the hash table only contains one tuple. Or
similarly a sort node for a single input tuple.
Without statistics the planner expects about 2000 rows in the table, no?
Pavel
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Pavel Stehule <pavel.stehule@gmail.com> wrote:

> On Thu, Apr 23, 2020 at 7:06 AM Antonin Houska <ah@cybertec.at> wrote:
>
> > Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > > But it's not entirely clear to me that we know the best plan for a
> > > statement-level RI action with sufficient certainty to go that way.
> > > Is it really the case that the plan would not vary based on how
> > > many tuples there are to check, for example?
> >
> > I'm concerned about that too. With my patch the checks become a bit
> > slower if only a single row is processed. The problem seems to be that
> > the planner is not entirely convinced about the number of input rows, so
> > it can still build a plan that expects many rows. For example (as I
> > mentioned elsewhere in the thread), a hash join where the hash table
> > only contains one tuple. Or similarly a sort node for a single input
> > tuple.
>
> Without statistics the planner expects about 2000 rows in the table, no?

I think that at some point it estimates the number of rows from the number
of table pages, but I don't remember details.

I wanted to say that if we constructed the plan "manually", we'd need at
least two substantially different variants: one to check many rows and the
other to check a single row.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
On Thu, Apr 23, 2020 at 8:28 AM Antonin Houska <ah@cybertec.at> wrote:
Pavel Stehule <pavel.stehule@gmail.com> wrote:
> On Thu, Apr 23, 2020 at 7:06 AM Antonin Houska <ah@cybertec.at> wrote:
>
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > But it's not entirely clear to me that we know the best plan for a
> > statement-level RI action with sufficient certainty to go that way.
> > Is it really the case that the plan would not vary based on how
> > many tuples there are to check, for example?
>
> I'm concerned about that too. With my patch the checks become a bit slower if
> only a single row is processed. The problem seems to be that the planner is
> not entirely convinced about the number of input rows, so it can still
> build a plan that expects many rows. For example (as I mentioned elsewhere in
> the thread), a hash join where the hash table only contains one tuple. Or
> similarly a sort node for a single input tuple.
>
> Without statistics the planner expects about 2000 rows in the table, no?
I think that at some point it estimates the number of rows from the number of
table pages, but I don't remember details.
I wanted to say that if we constructed the plan "manually", we'd need at least
two substantially different variants: one to check many rows and the other to
check a single row.
There can be more variants - a hash join may not be good enough for bigger data.
The overhead of RI is too big, so I think any solution that is faster than the current one and can be in Postgres 14 would be perfect.
But when you know the input is only one row, you can build a query without a join.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Apr 22, 2020 at 6:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > But it's not entirely clear to me that we know the best plan for a
> > statement-level RI action with sufficient certainty to go that way.
> > Is it really the case that the plan would not vary based on how
> > many tuples there are to check, for example? If we *do* know
> > exactly what should happen, I'd tend to lean towards Andres'
> > idea that we shouldn't be using the executor at all, but just
> > hard-wiring stuff at the level of "do these table scans".
>
> Well, I guess I'd naively think we want an index scan on a plain
> table. It is barely possible that in some corner case a sequential
> scan would be faster, but could it be enough faster to save the cost
> of planning? I doubt it, but I just work here.
>
> On a partitioning hierarchy we want to figure out which partition is
> relevant for the value we're trying to find, and then scan that one.
>
> I'm not sure there are any other cases. We have to have a UNIQUE
> constraint or we can't be referencing this target table. So it can't
> be a plain inheritance hierarchy, nor (I think) a foreign table.

In the cases where we have a UNIQUE constraint, and therefore a clear index
to use, I tend to agree that we should just be getting to it and avoiding
the planner/executor, as Andres suggests.

I'm not super thrilled about the idea of throwing an ERROR when we haven't
got an index to use though, and we don't require an index on the referring
side, meaning that, with such a change, a DELETE or UPDATE on the referred
table with an ON CASCADE FK will just start throwing errors. That's not
terribly friendly, even if it's not really best practice to not have an
index to help with those cases. I'd hope that we would at least teach
pg_upgrade to look for such cases and throw errors (though maybe that could
be downgraded to a WARNING with a flag..?) if it finds any when upgrading,
so that users don't upgrade and then suddenly start getting errors for
simple statements that used to work just fine.

> > On the whole I still think that generating a Query tree and then
> > letting the planner do its thing might be the best approach.
>
> Maybe, but it seems awfully heavy-weight. Once you go into the planner
> it's pretty hard to avoid considering indexes we don't care about,
> bitmap scans we don't care about, a sequential scan we don't care
> about, etc. You'd certainly save something just from avoiding
> parsing, but planning's pretty expensive too.

Agreed.

Thanks,

Stephen
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Apr 22, 2020 at 6:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > But it's not entirely clear to me that we know the best plan for a
> > statement-level RI action with sufficient certainty to go that way.

> Well, I guess I'd naively think we want an index scan on a plain
> table. It is barely possible that in some corner case a sequential
> scan would be faster, but could it be enough faster to save the cost
> of planning? I doubt it, but I just work here.

I think we're failing to communicate here. I agree that if the goal is
simply to re-implement what the RI triggers currently do --- that is, retail
one-row-at-a-time checks --- then we could probably dispense with all the
parser/planner/executor overhead and directly implement an indexscan using
an API at about the level genam.c provides. (The issue of whether it's okay
to require an index to be available is annoying, but we could always fall
back to the old ways if one is not.)

However, what I thought this thread was about was switching to
statement-level RI checking. At that point, what we're talking about is
performing a join involving a not-known-in-advance number of tuples on each
side. If you think you can hard-wire the choice of join technology and have
it work well all the time, I'm going to say with complete confidence that
you are wrong. The planner spends huge amounts of effort on that and still
doesn't always get it right ... but it does better than a hard-wired choice
would do.

Maybe there's room to pursue both things --- you could imagine, perhaps,
looking at the planner's estimate of number of affected rows at executor
startup and deciding from that whether to fire per-row or per-statement RI
triggers. But we're really going to want different implementations within
those two types of triggers.

regards, tom lane
On Thu, Apr 23, 2020 at 2:18 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2020-Apr-22, Andres Freund wrote:
> > I assume that with constructing plans "manually" you don't mean to
> > create a plan tree, but to invoke parser/planner directly? I think
> > that'd likely be better than going through SPI, and there's precedent
> > too.
>
> Well, I was actually thinking in building ready-made execution trees,
> bypassing the planner altogether. But apparently no one thinks that
> this is a good idea, and we don't have any code that does that already,
> so maybe it's not a great idea.

We do have an instance in validateForeignKeyConstraint() of "manually"
enforcing RI: if RI_Initial_Check() (a relatively complex query) cannot be
performed, the referencing table is scanned manually and each tuple thus
found is looked up in the referenced table by using RI_FKey_check_ins(), a
simpler query. Ironically though, RI_Initial_Check() is there to
short-circuit the manual algorithm.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 23, 2020 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think we're failing to communicate here. I agree that if the goal
> is simply to re-implement what the RI triggers currently do --- that
> is, retail one-row-at-a-time checks --- then we could probably dispense
> with all the parser/planner/executor overhead and directly implement
> an indexscan using an API at about the level genam.c provides.
> (The issue of whether it's okay to require an index to be available is
> annoying, but we could always fall back to the old ways if one is not.)
>
> However, what I thought this thread was about was switching to
> statement-level RI checking. At that point, what we're talking
> about is performing a join involving a not-known-in-advance number
> of tuples on each side. If you think you can hard-wire the choice
> of join technology and have it work well all the time, I'm going to
> say with complete confidence that you are wrong. The planner spends
> huge amounts of effort on that and still doesn't always get it right
> ... but it does better than a hard-wired choice would do.

Oh, yeah. If we're talking about that, then getting by without using the
planner doesn't seem feasible. Sorry, I guess I didn't read the thread
carefully enough.

As you say, perhaps there's room for both things, but also as you say, it's
not obvious how to decide intelligently between them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Thu, Apr 23, 2020 at 10:35 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I think we're failing to communicate here. I agree that if the goal
> > is simply to re-implement what the RI triggers currently do --- that
> > is, retail one-row-at-a-time checks --- then we could probably dispense
> > with all the parser/planner/executor overhead and directly implement
> > an indexscan using an API at about the level genam.c provides.
> > (The issue of whether it's okay to require an index to be available is
> > annoying, but we could always fall back to the old ways if one is not.)
> >
> > However, what I thought this thread was about was switching to
> > statement-level RI checking. At that point, what we're talking
> > about is performing a join involving a not-known-in-advance number
> > of tuples on each side. If you think you can hard-wire the choice
> > of join technology and have it work well all the time, I'm going to
> > say with complete confidence that you are wrong. The planner spends
> > huge amounts of effort on that and still doesn't always get it right
> > ... but it does better than a hard-wired choice would do.
>
> Oh, yeah. If we're talking about that, then getting by without using
> the planner doesn't seem feasible. Sorry, I guess I didn't read the
> thread carefully enough.

Yeah, I had been thinking about what we might do with the existing
row-level RI checks too. If we're able to get statement-level without much
impact on the single-row-statement case then that's certainly interesting,
although it sure feels like we're ending up with a lot left on the table.

> As you say, perhaps there's room for both things, but also as you say,
> it's not obvious how to decide intelligently between them.

The single-row case seems pretty clear and also seems common enough that
it'd be worth paying the cost to figure out if it's a single-row statement
or not. Perhaps we start with row-level for the first row, implemented
directly using an index lookup, and when we hit some threshold (maybe even
just "more than one") switch to using the transient table and queueing the
rest to check at the end.

What bothers me the most about this approach (though, to be clear, I think
we should still pursue it) is the risk that we might end up picking a
spectacularly bad plan that ends up taking a great deal more time than the
index-probe based approach we almost always have today. If we limit that
impact to only cases where >1 row is involved, then that's certainly better
(though maybe we'll need a GUC for this anyway..? If we had the single-row
approach + the statement-level one, presumably the GUC would just make us
always take the single-row method, so it hopefully wouldn't be too grotty
to have).

Thanks,

Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Robert Haas (robertmhaas@gmail.com) wrote:
> > As you say, perhaps there's room for both things, but also as you say,
> > it's not obvious how to decide intelligently between them.

> The single-row case seems pretty clear and also seems common enough that
> it'd be worth paying the cost to figure out if it's a single-row
> statement or not.

That seems hard to do in advance ... but it would be easy to code a
statement-level AFTER trigger along the lines of

    if (transition table contains one row)
        // fast special case here
    else
        // slow general case here.

I think the question really comes down to this: is the per-row overhead of
the transition-table mechanism comparable to that of the AFTER trigger
queue? Or if not, can we make it so?

regards, tom lane
Hi,

On 2020-04-28 10:44:58 -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Robert Haas (robertmhaas@gmail.com) wrote:
> > > As you say, perhaps there's room for both things, but also as you say,
> > > it's not obvious how to decide intelligently between them.
>
> > The single-row case seems pretty clear and also seems common enough that
> > it'd be worth paying the cost to figure out if it's a single-row
> > statement or not.

It's not that obvious to me that it's going to be beneficial to go down a
planned path in all that many cases. If all that the RI check does is an
index_rescan() && index_getnext_slot(), there are not that many realistic
types of plans that are going to be better.

IIUC a query to check a transition table would, simplified, boil down to
either:

SELECT * FROM transition_table tt
WHERE
    -- for an insertion/update into the referencing table
    (NOT EXISTS (SELECT * FROM referenced_table rt
                 WHERE rt.referenced_column = tt.referencing_column)
     [AND ... , ])
    -- for update / delete of referenced table
    OR EXISTS (SELECT * FROM referencing_table rt1
               WHERE rt1.referencing_column = tt.referenced_column1)
    [OR ... , ]
LIMIT 1;

where a returned row would signal an error. But it would need to handle row
locking, CASCADE/SET NULL/SET DEFAULT etc. More on that below.

While it's tempting to want to write the latter check as

-- for update / delete of referenced table
SELECT * FROM referencing_table rt
WHERE referencing_column IN (SELECT referenced_column FROM transition_table tt)
LIMIT 1;

that'd make it harder to know the violating row.

As the transition table isn't ordered, it seems like there are not that many
realistic ways to execute this:

1) A nestloop semi/anti-join with an inner index scan
2) Sort the transition table, do a merge semi/anti-join between the sort and
   an ordered index scan on the referenced / referencing table(s).
3) Hash semi/anti-join, requiring a full table scan of the tables

I think 1) would be worse than just doing the indexscan manually. 2) would
probably be beneficial if there's a lot of rows on the inner side, due to
the ordered access and deduplication. 3) would sometimes be beneficial
because it'd avoid an index scan for each tuple in the transition table.

The cases in which it is clear to me that a bulk check could theoretically
be significantly better than a fast per-row check are:

1) There's a *lot* of rows in the transition table compared to small
   referenced / referencing tables. As those tables can cheaply be hashed,
   a hash join will be more efficient than doing an index lookup for each
   transition table row.
2) There's a lot of duplicate content in the transition table, e.g. because
   there's a lot of references to the same row.

Did I forget important ones?

With regard to the row locking etc. that I elided above: I think that
actually will prevent most if not all interesting optimizations. Because of
the FOR KEY SHARE that's required, the planner plan will pretty much always
degrade to a per-row subplan check anyway. Is there any formulation of the
necessary queries that doesn't suffer from this problem?

> That seems hard to do in advance ... but it would be easy to code
> a statement-level AFTER trigger along the lines of
>
>     if (transition table contains one row)
>         // fast special case here
>     else
>         // slow general case here.

I suspect we'd need something more complicated than this for it to be
beneficial.

My gut feeling would be that the case where a transition-table style check
would be most commonly beneficial is when you have a very small referenced
table, and a *lot* of rows get inserted. But clearly we wouldn't want to
have bulk insertion suddenly also store all rows in a transition table. Nor
would we want to have a bulk UPDATE cause all the updated rows to be stored
in the transition table, even though none of the relevant columns changed
(i.e. the RI_FKey_[f]pk_upd_check_required logic in AfterTriggerSaveEvent()).

I still don't quite see how shunting RI checks through triggers saves us
more than it costs. Especially for the stuff we do as AFTER: most of the
time we could do the work we defer till query end immediately, rather than
building up an in-memory queue. Besides saving memory, in a lot of cases
that'd also make it unnecessary to refetch the row at a later time, possibly
needing to chase updated row versions.

But even for the BEFORE checks, largely going through generic trigger code
means it's much harder to batch checks without suddenly requiring memory
proportional to the number of inserted rows.

There obviously are cases where it's not possible to check just after each
row. Deferrable constraints, as well as CASCADING / SET NULL / SET DEFAULT
on tables with user defined triggers, for example. But it'd likely be
sensible to handle that in the way we already handle deferred uniqueness
checks, i.e. we only queue something if there's a potential for a problem.

> I think the question really comes down to this: is the per-row overhead of
> the transition-table mechanism comparable to that of the AFTER trigger
> queue? Or if not, can we make it so?

It's probably more expensive, in some ways, at the moment. The biggest
difference is that the transition table stores complete rows, valid as of
the moment they've been inserted/updated/deleted, whereas the trigger queue
only stores enough information to fetch the tuple again during trigger
execution. Several RI checks however re-check visibility before executing,
so that's another fetch that'd likely not be elided by a simple change to
using transition tables.

Both have significant downsides, obviously. Storing complete rows can take
a lot more memory, and refetching rows is expensive, especially if it
happens much later (with the row pushed out of shared_buffers potentially).

I think it was a mistake to have these two different systems in trigger.c.
When transition tables were added we shouldn't have kept per-tuple state
both in the queue and in the transition tuplestore. Instead we should have
only used the tuplestore, and optimized what information we store inside
depending on the need of the various after triggers.

Greetings,

Andres Freund
Antonin Houska <ah@cybertec.at> wrote:

> In general, the checks are significantly faster if there are many rows to
> process, and a bit slower when we only need to check a single row.

Attached is a new version that uses the existing simple queries if there's
only one row to check. SPI is used for both single-row and bulk checks - as
discussed in this thread, it can perhaps be replaced with a different
approach if that appears to be beneficial, at least for the single-row
checks.

I think using a separate query for the single-row check is more practicable
than convincing the planner that the bulk-check query should only check a
single row. So this patch version tries to show what it'd look like.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
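For reference, the existing single-row query that this version keeps reusing
is roughly what ri_triggers.c has long generated for the f(i) REFERENCES p(i)
constraint, executed through SPI once per modified row with $1 bound to the
new FK value (a sketch of the generated SQL, not copied from the patch):

    SELECT 1 FROM ONLY "public"."p" x
        WHERE "i" OPERATOR(pg_catalog.=) $1
        FOR KEY SHARE OF x;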
Attachment
- v02-0001-Check-for-RI-violation-outside-ri_PerformCheck.patch
- v02-0002-Changed-ri_GenerateQual-so-it-generates-the-whole-qu.patch
- v02-0003-Return-early-from-ri_NullCheck-if-possible.patch
- v02-0004-Introduce-infrastructure-for-batch-processing-RI-eve.patch
- v02-0005-Process-multiple-RI-trigger-events-at-a-time.patch
Hi,

I was looking at this patch with Corey during a patch-review session, so
these are basically our "combined" comments.

On 2020-06-05 17:16:43 +0200, Antonin Houska wrote:
> From 6c1cb8ae7fbf0a8122d8c6637c61b9915bc57223 Mon Sep 17 00:00:00 2001
> From: Antonin Houska <ah@cybertec.at>
> Date: Fri, 5 Jun 2020 16:42:34 +0200
> Subject: [PATCH 1/5] Check for RI violation outside ri_PerformCheck().

Probably good to add a short comment to the commit explaining why you're
doing this. The change makes sense to me. Unless somebody protests, I think
we should just apply it regardless of the rest of the series - the code seems
clearer afterwards.

> From 6b09e5598553c8e57b4ef9342912f51adb48f8af Mon Sep 17 00:00:00 2001
> From: Antonin Houska <ah@cybertec.at>
> Date: Fri, 5 Jun 2020 16:42:34 +0200
> Subject: [PATCH 2/5] Changed ri_GenerateQual() so it generates the whole
>  qualifier.
>
> This way we can use the function to reduce the amount of copy&pasted code a
> bit.

> /*
> - * ri_GenerateQual --- generate a WHERE clause equating two variables
> + * ri_GenerateQual --- generate WHERE/ON clause.
> + *
> + * Note: to avoid unnecessary explicit casts, make sure that the left and
> + * right operands match eq_oprs expect (ie don't swap the left and right
> + * operands accidentally).
> + */
> +static void
> +ri_GenerateQual(StringInfo buf, char *sep, int nkeys,
> +                const char *ltabname, Relation lrel,
> +                const int16 *lattnums,
> +                const char *rtabname, Relation rrel,
> +                const int16 *rattnums,
> +                const Oid *eq_oprs,
> +                GenQualParams params,
> +                Oid *paramtypes)
> +{
> +    for (int i = 0; i < nkeys; i++)
> +    {
> +        Oid        ltype = RIAttType(lrel, lattnums[i]);
> +        Oid        rtype = RIAttType(rrel, rattnums[i]);
> +        Oid        lcoll = RIAttCollation(lrel, lattnums[i]);
> +        Oid        rcoll = RIAttCollation(rrel, rattnums[i]);
> +        char       paramname[16];
> +        char      *latt,
> +                  *ratt;
> +        char      *sep_current = i > 0 ? sep : NULL;
> +
> +        if (params != GQ_PARAMS_NONE)
> +            sprintf(paramname, "$%d", i + 1);
> +
> +        if (params == GQ_PARAMS_LEFT)
> +        {
> +            latt = paramname;
> +            paramtypes[i] = ltype;
> +        }
> +        else
> +            latt = ri_ColNameQuoted(ltabname, RIAttName(lrel, lattnums[i]));
> +
> +        if (params == GQ_PARAMS_RIGHT)
> +        {
> +            ratt = paramname;
> +            paramtypes[i] = rtype;
> +        }
> +        else
> +            ratt = ri_ColNameQuoted(rtabname, RIAttName(rrel, rattnums[i]));

Why do we need support for having params on the left or right side, instead
of just having them on one side?

> +        ri_GenerateQualComponent(buf, sep_current, latt, ltype, eq_oprs[i],
> +                                 ratt, rtype);
> +
> +        if (lcoll != rcoll)
> +            ri_GenerateQualCollation(buf, lcoll);
> +    }
> +}
> +
> +/*
> + * ri_GenerateQual --- generate a component of WHERE/ON clause equating two
> + * variables, to be AND-ed to the other components.
>   *
>   * This basically appends " sep leftop op rightop" to buf, adding casts
>   * and schema qualification as needed to ensure that the parser will select
> @@ -1828,17 +1802,86 @@ quoteRelationName(char *buffer, Relation rel)
>   * if they aren't variables or parameters.
>   */
>  static void
> -ri_GenerateQual(StringInfo buf,
> -                const char *sep,
> -                const char *leftop, Oid leftoptype,
> -                Oid opoid,
> -                const char *rightop, Oid rightoptype)
> +ri_GenerateQualComponent(StringInfo buf,
> +                         const char *sep,
> +                         const char *leftop, Oid leftoptype,
> +                         Oid opoid,
> +                         const char *rightop, Oid rightoptype)
>  {
> -    appendStringInfo(buf, " %s ", sep);
> +    if (sep)
> +        appendStringInfo(buf, " %s ", sep);
>      generate_operator_clause(buf, leftop, leftoptype, opoid,
>                               rightop, rightoptype);
>  }

Why is this handled inside ri_GenerateQualComponent() instead of
ri_GenerateQual()? Especially because the latter now has code to pass a
different sep into ri_GenerateQualComponent().

> +/*
> + * ri_ColNameQuoted() --- return column name, with both table and column name
> + * quoted.
> + */
> +static char *
> +ri_ColNameQuoted(const char *tabname, const char *attname)
> +{
> +    char        quoted[MAX_QUOTED_NAME_LEN];
> +    StringInfo  result = makeStringInfo();
> +
> +    if (tabname && strlen(tabname) > 0)
> +    {
> +        quoteOneName(quoted, tabname);
> +        appendStringInfo(result, "%s.", quoted);
> +    }
> +
> +    quoteOneName(quoted, attname);
> +    appendStringInfoString(result, quoted);
> +
> +    return result->data;
> +}

Why does this new function accept a NULL / zero-length string? I guess that's
because we currently don't qualify in all places?

> +/*
> + * Check that RI trigger function was called in expected context
> + */
> +static void
> +ri_CheckTrigger(FunctionCallInfo fcinfo, const char *funcname, int tgkind)
> +{
> +    TriggerData *trigdata = (TriggerData *) fcinfo->context;
> +
> +    if (!CALLED_AS_TRIGGER(fcinfo))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
> +                 errmsg("function \"%s\" was not called by trigger manager", funcname)));
> +
> +    /*
> +     * Check proper event
> +     */
> +    if (!TRIGGER_FIRED_AFTER(trigdata->tg_event) ||
> +        !TRIGGER_FIRED_FOR_ROW(trigdata->tg_event))
> +        ereport(ERROR,
> +                (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
> +                 errmsg("function \"%s\" must be fired AFTER ROW", funcname)));
> +
> +    switch (tgkind)
> +    {
> +        case RI_TRIGTYPE_INSERT:
> +            if (!TRIGGER_FIRED_BY_INSERT(trigdata->tg_event))
> +                ereport(ERROR,
> +                        (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
> +                         errmsg("function \"%s\" must be fired for INSERT", funcname)));
> +            break;
> +        case RI_TRIGTYPE_UPDATE:
> +            if (!TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))
> +                ereport(ERROR,
> +                        (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
> +                         errmsg("function \"%s\" must be fired for UPDATE", funcname)));
> +            break;
> +        case RI_TRIGTYPE_DELETE:
> +            if (!TRIGGER_FIRED_BY_DELETE(trigdata->tg_event))
> +                ereport(ERROR,
> +                        (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED),
> +                         errmsg("function \"%s\" must be fired for DELETE", funcname)));
> +            break;
> +    }
> +}

Why did you move this around as part of this commit?

> From 208c733d759592402901599446b4f7e7197c1777 Mon Sep 17 00:00:00 2001
> From: Antonin Houska <ah@cybertec.at>
> Date: Fri, 5 Jun 2020 16:42:34 +0200
> Subject: [PATCH 4/5] Introduce infrastructure for batch processing RI events.
>
> Separate storage is used for the RI trigger events because the "transient
> table" that we provide to statement triggers would not be available for
> deferred constraints. Also, the regular statement level trigger is not ideal
> for the RI checks because it requires the query execution to complete before
> the RI checks even start. On the other hand, if we use batches of row trigger
> events, we only need to tune the batch size so that user gets RI violation
> error rather soon.
>
> This patch only introduces the infrastructure, however the trigger function is
> still called per event. This is just to reduce the size of the diffs.
> ---
>  src/backend/commands/tablecmds.c    |   68 +-
>  src/backend/commands/trigger.c      |  406 ++++++--
>  src/backend/executor/spi.c          |   16 +-
>  src/backend/utils/adt/ri_triggers.c | 1385 +++++++++++++++++++--------
>  src/include/commands/trigger.h      |   25 +
>  5 files changed, 1381 insertions(+), 519 deletions(-)

My first comment here is that this is too large a change and should be broken
up. I think there's also not enough explanation in here of what the new
design is. I can infer some of that from the code, but that's imo shifting
work to the reviewer / reader unnecessarily.

> +static void AfterTriggerExecuteRI(EState *estate,
> +                                  ResultRelInfo *relInfo,
> +                                  FmgrInfo *finfo,
> +                                  Instrumentation *instr,
> +                                  TriggerData *trig_last,
> +                                  MemoryContext batch_context);
>  static AfterTriggersTableData *GetAfterTriggersTableData(Oid relid,
>                                                           CmdType cmdType);
>  static void AfterTriggerFreeQuery(AfterTriggersQueryData *qs);
> @@ -3807,13 +3821,16 @@ afterTriggerDeleteHeadEventChunk(AfterTriggersQueryData *qs)
>   * fmgr lookup cache space at the caller level. (For triggers fired at
>   * the end of a query, we can even piggyback on the executor's state.)
>   *
> - * event: event currently being fired.
> + * event: event currently being fired. Pass NULL if the current batch of RI
> + * trigger events should be processed.
>   * rel: open relation for event.
>   * trigdesc: working copy of rel's trigger info.
>   * finfo: array of fmgr lookup cache entries (one per trigger in trigdesc).
>   * instr: array of EXPLAIN ANALYZE instrumentation nodes (one per trigger),
>   * or NULL if no instrumentation is wanted.
> + * trig_last: trigger info used for the last trigger execution.
>   * per_tuple_context: memory context to call trigger function in.
> + * batch_context: memory context to store tuples for RI triggers.
>   * trig_tuple_slot1: scratch slot for tg_trigtuple (foreign tables only)
>   * trig_tuple_slot2: scratch slot for tg_newtuple (foreign tables only)
>   * ----------
> @@ -3824,39 +3841,55 @@ AfterTriggerExecute(EState *estate,
>                      ResultRelInfo *relInfo,
>                      TriggerDesc *trigdesc,
>                      FmgrInfo *finfo, Instrumentation *instr,
> +                    TriggerData *trig_last,
>                      MemoryContext per_tuple_context,
> +                    MemoryContext batch_context,
>                      TupleTableSlot *trig_tuple_slot1,
>                      TupleTableSlot *trig_tuple_slot2)
>  {
>      Relation    rel = relInfo->ri_RelationDesc;
>      AfterTriggerShared evtshared = GetTriggerSharedData(event);
>      Oid         tgoid = evtshared->ats_tgoid;
> -    TriggerData LocTriggerData = {0};
>      HeapTuple   rettuple;
> -    int         tgindx;
>      bool        should_free_trig = false;
>      bool        should_free_new = false;
> +    bool        is_new = false;
>
> -    /*
> -     * Locate trigger in trigdesc.
> -     */
> -    for (tgindx = 0; tgindx < trigdesc->numtriggers; tgindx++)
> +    if (trig_last->tg_trigger == NULL)
>      {
> -        if (trigdesc->triggers[tgindx].tgoid == tgoid)
> +        int     tgindx;
> +
> +        /*
> +         * Locate trigger in trigdesc.
> +         */
> +        for (tgindx = 0; tgindx < trigdesc->numtriggers; tgindx++)
>          {
> -            LocTriggerData.tg_trigger = &(trigdesc->triggers[tgindx]);
> -            break;
> +            if (trigdesc->triggers[tgindx].tgoid == tgoid)
> +            {
> +                trig_last->tg_trigger = &(trigdesc->triggers[tgindx]);
> +                trig_last->tgindx = tgindx;
> +                break;
> +            }
>          }
> +        if (trig_last->tg_trigger == NULL)
> +            elog(ERROR, "could not find trigger %u", tgoid);
> +
> +        if (RI_FKey_trigger_type(trig_last->tg_trigger->tgfoid) !=
> +            RI_TRIGGER_NONE)
> +            trig_last->is_ri_trigger = true;
> +
> +        is_new = true;
>      }
> -    if (LocTriggerData.tg_trigger == NULL)
> -        elog(ERROR, "could not find trigger %u", tgoid);
> +
> +    /* trig_last for non-RI trigger should always be initialized again. */
> +    Assert(trig_last->is_ri_trigger || is_new);
>
>      /*
>       * If doing EXPLAIN ANALYZE, start charging time to this trigger. We want
>       * to include time spent re-fetching tuples in the trigger cost.
>       */
> -    if (instr)
> -        InstrStartNode(instr + tgindx);
> +    if (instr && !trig_last->is_ri_trigger)
> +        InstrStartNode(instr + trig_last->tgindx);

I'm pretty unhappy about the amount of new infrastructure this adds to
trigger.c. We're now going to have a third copy of the tuples (for a time).
trigger.c is already a pretty complicated / archaic piece of infrastructure,
and this patchset seems to make that even worse. We'll grow yet another
separate representation of tuples, there are a lot of new branches (less
concerned about the runtime costs, more about the code complexity), etc.

> +/* ----------
> + * Construct the query to check inserted/updated rows of the FK table.
> + *
> + * If "insert" is true, the rows are inserted, otherwise they are updated.
> + *
> + * The query string built is
> + *    SELECT t.fkatt1 [, ...]
> + *    FROM <tgtable> t LEFT JOIN LATERAL
> + *        (SELECT t.fkatt1 [, ...]
> + *         FROM [ONLY] <pktable> p
> + *         WHERE t.fkatt1 = p.pkatt1 [AND ...]
> + *         FOR KEY SHARE OF p) AS m
> + *        ON t.fkatt1 = m.fkatt1 [AND ...]
> + *    WHERE m.fkatt1 ISNULL
> + *    LIMIT 1
> + *

Why do we need the lateral query here?

Greetings,

Andres Freund
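For concreteness, that template instantiated for the f(i) REFERENCES p(i)
setup from the top of the thread would read roughly as follows (a sketch,
with tgtable standing in for the patch's internal storage of queued FK
tuples). Any row it returns is a violating key; the LATERAL subquery is
presumably what keeps FOR KEY SHARE applied to each matched PK row:

    SELECT t.i
    FROM tgtable t LEFT JOIN LATERAL
        (SELECT t.i
         FROM ONLY p
         WHERE t.i = p.i
         FOR KEY SHARE OF p) AS m
        ON t.i = m.i
    WHERE m.i IS NULL
    LIMIT 1;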
On Fri, Jun 05, 2020 at 05:16:43PM +0200, Antonin Houska wrote:
> Antonin Houska <ah@cybertec.at> wrote:
>
> > In general, the checks are significantly faster if there are many rows to
> > process, and a bit slower when we only need to check a single row.
>
> Attached is a new version that uses the existing simple queries if there's
> only one row to check. SPI is used for both single-row and bulk checks - as
> discussed in this thread, it can perhaps be replaced with a different approach
> if appears to be beneficial, at least for the single-row checks.
>
> I think using a separate query for the single-row check is more practicable
> than convincing the planner that the bulk-check query should only check a
> single row. So this patch version tries to show what it'd look like.

I'm interested in testing this patch; however, there's a lot of internals to
digest. Are there any documentation updates or regression tests to add?

If FKs support "bulk" validation, users should know when that applies, and
be able to check that it's working as intended. Even if the test cases are
overly verbose or not stable, and not intended for commit, that would be a
useful temporary addition.

I think that calls=4 indicates this is using bulk validation.

postgres=# begin; explain(analyze, timing off, costs off, summary off, verbose) DELETE FROM t WHERE i<999; rollback;
BEGIN
                               QUERY PLAN
-----------------------------------------------------------------------
 Delete on public.t (actual rows=0 loops=1)
   ->  Index Scan using t_pkey on public.t (actual rows=998 loops=1)
         Output: ctid
         Index Cond: (t.i < 999)
 Trigger RI_ConstraintTrigger_a_16399 for constraint t_i_fkey: calls=4

I started thinking about this 1+ years ago, wondering if a BRIN index could
be used for (bulk) FK validation. So I would like to be able to see the
*plan* for the query. I was able to show the plan and see that BRIN can be
used, like so:

    SET auto_explain.log_nested_statements = on;
    SET client_min_messages = debug;
    SET auto_explain.log_min_duration = 0;

Should the plan be visible in explain (not auto-explain)?

BTW, did you see this older thread?
https://www.postgresql.org/message-id/flat/CA%2BU5nMLM1DaHBC6JXtUMfcG6f7FgV5mPSpufO7GRnbFKkF2f7g%40mail.gmail.com

--
Justin
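The BRIN experiment Justin describes needs nothing beyond an ordinary index
definition (a hypothetical index name, shown here on the thread's FK table f).
Presumably a per-row check cannot use it effectively - one lossy block-range
lookup per row - whereas a bulk anti-join over a batch of keys could amortize
a single bitmap scan:

    CREATE INDEX f_i_brin ON f USING brin (i);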
On Sat, Sep 26, 2020 at 09:59:17PM -0500, Justin Pryzby wrote:
> I'm interested in testing this patch, however there's a lot of internals to
> digest.

Even with that, the thread has been waiting on author for a couple of weeks
now, so I have marked the entry as RwF (Returned with Feedback).

--
Michael
Justin Pryzby <pryzby@telsasoft.com> wrote:

> I'm interested in testing this patch, however there's a lot of internals to
> digest.
>
> Are there any documentation updates or regression tests to add?

I'm not sure the user documentation should be changed unless a new GUC or
statistics information is added. As for regression tests, perhaps in the next
version of the patch. But right now I don't know how to implement the feature
in a less invasive way (see the complaint by Andres in [1]), nor do I have
enough time to work on the patch.

> If FKs support "bulk" validation, users should know when that applies, and
> be able to check that it's working as intended. Even if the test cases are
> overly verbose or not stable, and not intended for commit, that would be a
> useful temporary addition.
>
> I think that calls=4 indicates this is using bulk validation.
>
> postgres=# begin; explain(analyze, timing off, costs off, summary off, verbose) DELETE FROM t WHERE i<999; rollback;
> BEGIN
>                                QUERY PLAN
> -----------------------------------------------------------------------
>  Delete on public.t (actual rows=0 loops=1)
>    ->  Index Scan using t_pkey on public.t (actual rows=998 loops=1)
>          Output: ctid
>          Index Cond: (t.i < 999)
>  Trigger RI_ConstraintTrigger_a_16399 for constraint t_i_fkey: calls=4
>
> I started thinking about this 1+ years ago wondering if a BRIN index could be
> used for (bulk) FK validation.
>
> So I would like to be able to see the *plan* for the query.
> I was able to show the plan and see that BRIN can be used like so:
> SET auto_explain.log_nested_statements=on; SET client_min_messages=debug; SET auto_explain.log_min_duration=0;
>
> Should the plan be visible in explain (not auto-explain)?

For development purposes, I think I could get the plan this way:

    SET debug_print_plan TO on;
    SET client_min_messages TO debug;

(The plan is cached, so I think the query will only be displayed during the
first execution in the session.)

Do you think that the documentation should advise the user to create a BRIN
index on the FK table?

> BTW did you see this older thread?
> https://www.postgresql.org/message-id/flat/CA%2BU5nMLM1DaHBC6JXtUMfcG6f7FgV5mPSpufO7GRnbFKkF2f7g%40mail.gmail.com

Not yet. Thanks.

[1] https://www.postgresql.org/message-id/20200630011729.mr25bmmbvsattxe2%40alap3.anarazel.de

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
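Putting that recipe together for the f/p test tables from the start of the
thread (a sketch): the internal check query's plan is emitted to the client
only on the first statement in the session that fires the FK trigger, since
the plan is cached afterwards.

    SET debug_print_plan TO on;
    SET client_min_messages TO debug;
    INSERT INTO f VALUES (1);   -- plan of the FK check query is logged here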