Thread: Tips on troubleshooting slow DELETE (suspect cascades)

Tips on troubleshooting slow DELETE (suspect cascades)

From
Jim Vanns
Date:
Hi,

I have a slow (CPU bound) DELETE statement I'm attempting to debug and I suspect that its actually the ON DELETE CASCADE on the foreign key thats causing it. I suspect this because the dry-run mode of the same query (a SELECT instead of DELETE) doesn't suffer the same fate. The statement is effectively;

# Dry mode
SELECT prune_function(timestamp);
# Destructive mode
DELETE FROM foobar p USING prune_function(timestamp) AS e WHERE p.id = e.prune_id

The logs seem to hold no information on the progress of the statement but the CPU is pegged at 100% for hours. The SELECT equivalent runs in under a minute.

What I need is a way to see into this statement as it executes to confirm my suspicion - does anyone have any tips on that?

Cheers

Jim

--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London

Re: Tips on troubleshooting slow DELETE (suspect cascades)

From
Adrian Klaver
Date:
On 1/16/24 09:45, Jim Vanns wrote:
> Hi,
> 
> I have a slow (CPU bound) DELETE statement I'm attempting to debug and I 
> suspect that its actually the ON DELETE CASCADE on the foreign key thats 
> causing it. I suspect this because the dry-run mode of the same query (a 
> SELECT instead of DELETE) doesn't suffer the same fate. The statement is 
> effectively;
> 
> # Dry mode
> SELECT prune_function(timestamp);
> # Destructive mode
> DELETE FROM foobar p USING prune_function(timestamp) AS e WHERE p.id 
> <http://p.id> = e.prune_id
> 
> The logs seem to hold no information on the progress of the statement 
> but the CPU is pegged at 100% for hours. The SELECT equivalent runs in 
> under a minute.
> 
> What I need is a way to see into this statement as it executes to 
> confirm my suspicion - does anyone have any tips on that?

Explain:

https://www.postgresql.org/docs/current/sql-explain.html

It would also be helpful to reply with the table definitions for the 
tables. If that is not possible then at least whether there is an index 
on the FK reference in the child table(s)?

> 
> Cheers
> 
> Jim
> 
> -- 
> Jim Vanns
> Principal Production Engineer
> Industrial Light & Magic, London

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Tips on troubleshooting slow DELETE (suspect cascades)

From
Tom Lane
Date:
Adrian Klaver <adrian.klaver@aklaver.com> writes:
> On 1/16/24 09:45, Jim Vanns wrote:
>> I have a slow (CPU bound) DELETE statement I'm attempting to debug and I 
>> suspect that its actually the ON DELETE CASCADE on the foreign key thats 
>> causing it.

99% of the time, the cause is lack of an index on the foreign key's
referencing columns.  We make you have a unique index on the
referenced columns, because otherwise the FK constraint's semantics
are unclear.  But you're not required to make one on the other side.

>> What I need is a way to see into this statement as it executes to 
>> confirm my suspicion - does anyone have any tips on that?

> Explain:
> https://www.postgresql.org/docs/current/sql-explain.html

Specifically, if EXPLAIN ANALYZE shows a lot of time spent in the
enforcement trigger for the FK, this is likely what's happening.

            regards, tom lane



Re: Tips on troubleshooting slow DELETE (suspect cascades)

From
Jim Vanns
Date:
Hi Tom/Adrian.

I should have already stated I did begin with EXPLAIN but given they
don't easily work with (the internals) stored/procedures, it wasn't
useful in this case. Also, I keep having to terminate the statement
because it never runs to completion and produces the plan (at least in
ANALYZE VERBOSE mode anyway).

I have, however, pulled the function body code out and produced an
isolated case that can be EXPLAINED. The table in question is a
curious one since it models a hierarchy as an adjacency list and so
the fkey reference is back to itself (to a primary key - so is an
additional index required?):

CREATE TABLE tree (
   ts TIMESTAMPTZ NOT NULL
   tree_id BIGINT NOT NULL,
   parent_id BIGINT NULL,
   --
   CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
   FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
      ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
);
CREATE INDEX ON tree USING BRIN (ts);

The tree table has 95,915,630 rows.

I've not yet got a complete or reliable plan :( I have made a DB copy
and will be dropping the constraint to see what effect that has.

Cheers,

Jim

On Tue, 16 Jan 2024 at 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Adrian Klaver <adrian.klaver@aklaver.com> writes:
> > On 1/16/24 09:45, Jim Vanns wrote:
> >> I have a slow (CPU bound) DELETE statement I'm attempting to debug and I
> >> suspect that its actually the ON DELETE CASCADE on the foreign key thats
> >> causing it.
>
> 99% of the time, the cause is lack of an index on the foreign key's
> referencing columns.  We make you have a unique index on the
> referenced columns, because otherwise the FK constraint's semantics
> are unclear.  But you're not required to make one on the other side.
>
> >> What I need is a way to see into this statement as it executes to
> >> confirm my suspicion - does anyone have any tips on that?
>
> > Explain:
> > https://www.postgresql.org/docs/current/sql-explain.html
>
> Specifically, if EXPLAIN ANALYZE shows a lot of time spent in the
> enforcement trigger for the FK, this is likely what's happening.
>
>                         regards, tom lane



-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London



Re: Tips on troubleshooting slow DELETE (suspect cascades)

From
Adrian Klaver
Date:
On 1/18/24 08:37, Jim Vanns wrote:
> Hi Tom/Adrian.
> 
> I should have already stated I did begin with EXPLAIN but given they
> don't easily work with (the internals) stored/procedures, it wasn't
> useful in this case. Also, I keep having to terminate the statement
> because it never runs to completion and produces the plan (at least in
> ANALYZE VERBOSE mode anyway).
> 
> I have, however, pulled the function body code out and produced an
> isolated case that can be EXPLAINED. The table in question is a

I'm guessing the function is prune_function(timestamp)?

What is the function body code?

> curious one since it models a hierarchy as an adjacency list and so
> the fkey reference is back to itself (to a primary key - so is an
> additional index required?):
> 
> CREATE TABLE tree (
>     ts TIMESTAMPTZ NOT NULL
>     tree_id BIGINT NOT NULL,
>     parent_id BIGINT NULL,
>     --
>     CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
>     FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
>        ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
> );
> CREATE INDEX ON tree USING BRIN (ts);
> 
> The tree table has 95,915,630 rows.
> 
> I've not yet got a complete or reliable plan :( I have made a DB copy
> and will be dropping the constraint to see what effect that has.
> 
> Cheers,
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Tips on troubleshooting slow DELETE (suspect cascades)

From
Rob Sargent
Date:

> On Jan 18, 2024, at 9:46 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 1/18/24 08:37, Jim Vanns wrote:
>> Hi Tom/Adrian.
>> I should have already stated I did begin with EXPLAIN but given they
>> don't easily work with (the internals) stored/procedures, it wasn't
>> useful in this case. Also, I keep having to terminate the statement
>> because it never runs to completion and produces the plan (at least in
>> ANALYZE VERBOSE mode anyway).
>> I have, however, pulled the function body code out and produced an
>> isolated case that can be EXPLAINED. The table in question is a
>
> I'm guessing the function is prune_function(timestamp)?
>
> What is the function body code?
>
>> curious one since it models a hierarchy as an adjacency list and so
>> the fkey reference is back to itself (to a primary key - so is an
>> additional index required?):
>> CREATE TABLE tree (
>>    ts TIMESTAMPTZ NOT NULL
>>    tree_id BIGINT NOT NULL,
>>    parent_id BIGINT NULL,
>>    --
>>    CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
>>    FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
>>       ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
>> );
>> CREATE INDEX ON tree USING BRIN (ts);
>> The tree table has 95,915,630 rows.
>> I've not yet got a complete or reliable plan :( I have made a DB copy
>> and will be dropping the constraint to see what effect that has.
>> Cheers,
>

Have you tried making a table from prune_function() and using that in the delete statement?
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
>
>



Re: Tips on troubleshooting slow DELETE (suspect cascades)

From
Tom Lane
Date:
Jim Vanns <jvanns@ilm.com> writes:
> I should have already stated I did begin with EXPLAIN but given they
> don't easily work with (the internals) stored/procedures, it wasn't
> useful in this case.

auto_explain with nested statements enabled might help?

            regards, tom lane



Re: Tips on troubleshooting slow DELETE (suspect cascades)

From
Jim Vanns
Date:
After dropping the constraint entirely the DELETE completes in 4
minutes (the same time as the dry-run using SELECT against the
function instead of a DELETE). A marked improvement on 3 hours
followed by a pg_cancel_backend()!

Jim

On Thu, 18 Jan 2024 at 16:37, Jim Vanns <jvanns@ilm.com> wrote:
>
> Hi Tom/Adrian.
>
> I should have already stated I did begin with EXPLAIN but given they
> don't easily work with (the internals) stored/procedures, it wasn't
> useful in this case. Also, I keep having to terminate the statement
> because it never runs to completion and produces the plan (at least in
> ANALYZE VERBOSE mode anyway).
>
> I have, however, pulled the function body code out and produced an
> isolated case that can be EXPLAINED. The table in question is a
> curious one since it models a hierarchy as an adjacency list and so
> the fkey reference is back to itself (to a primary key - so is an
> additional index required?):
>
> CREATE TABLE tree (
>    ts TIMESTAMPTZ NOT NULL
>    tree_id BIGINT NOT NULL,
>    parent_id BIGINT NULL,
>    --
>    CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
>    FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
>       ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
> );
> CREATE INDEX ON tree USING BRIN (ts);
>
> The tree table has 95,915,630 rows.
>
> I've not yet got a complete or reliable plan :( I have made a DB copy
> and will be dropping the constraint to see what effect that has.
>
> Cheers,
>
> Jim
>
> On Tue, 16 Jan 2024 at 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Adrian Klaver <adrian.klaver@aklaver.com> writes:
> > > On 1/16/24 09:45, Jim Vanns wrote:
> > >> I have a slow (CPU bound) DELETE statement I'm attempting to debug and I
> > >> suspect that its actually the ON DELETE CASCADE on the foreign key thats
> > >> causing it.
> >
> > 99% of the time, the cause is lack of an index on the foreign key's
> > referencing columns.  We make you have a unique index on the
> > referenced columns, because otherwise the FK constraint's semantics
> > are unclear.  But you're not required to make one on the other side.
> >
> > >> What I need is a way to see into this statement as it executes to
> > >> confirm my suspicion - does anyone have any tips on that?
> >
> > > Explain:
> > > https://www.postgresql.org/docs/current/sql-explain.html
> >
> > Specifically, if EXPLAIN ANALYZE shows a lot of time spent in the
> > enforcement trigger for the FK, this is likely what's happening.
> >
> >                         regards, tom lane
>
>
>
> --
> Jim Vanns
> Principal Production Engineer
> Industrial Light & Magic, London



-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London



Re: Tips on troubleshooting slow DELETE (suspect cascades)

From
Martin Ritchie
Date:
Add an index to parent_id. What is likely happening is each time a row is deleted, it has to scan the entire table to make sure it is not referenced by any parent_id records.


On Thu, Jan 18, 2024 at 12:04 PM Jim Vanns <jvanns@ilm.com> wrote:
After dropping the constraint entirely the DELETE completes in 4
minutes (the same time as the dry-run using SELECT against the
function instead of a DELETE). A marked improvement on 3 hours
followed by a pg_cancel_backend()!

Jim

On Thu, 18 Jan 2024 at 16:37, Jim Vanns <jvanns@ilm.com> wrote:
>
> Hi Tom/Adrian.
>
> I should have already stated I did begin with EXPLAIN but given they
> don't easily work with (the internals) stored/procedures, it wasn't
> useful in this case. Also, I keep having to terminate the statement
> because it never runs to completion and produces the plan (at least in
> ANALYZE VERBOSE mode anyway).
>
> I have, however, pulled the function body code out and produced an
> isolated case that can be EXPLAINED. The table in question is a
> curious one since it models a hierarchy as an adjacency list and so
> the fkey reference is back to itself (to a primary key - so is an
> additional index required?):
>
> CREATE TABLE tree (
>    ts TIMESTAMPTZ NOT NULL
>    tree_id BIGINT NOT NULL,
>    parent_id BIGINT NULL,
>    --
>    CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
>    FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
>       ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
> );
> CREATE INDEX ON tree USING BRIN (ts);
>
> The tree table has 95,915,630 rows.
>
> I've not yet got a complete or reliable plan :( I have made a DB copy
> and will be dropping the constraint to see what effect that has.
>
> Cheers,
>
> Jim
>
> On Tue, 16 Jan 2024 at 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Adrian Klaver <adrian.klaver@aklaver.com> writes:
> > > On 1/16/24 09:45, Jim Vanns wrote:
> > >> I have a slow (CPU bound) DELETE statement I'm attempting to debug and I
> > >> suspect that its actually the ON DELETE CASCADE on the foreign key thats
> > >> causing it.
> >
> > 99% of the time, the cause is lack of an index on the foreign key's
> > referencing columns.  We make you have a unique index on the
> > referenced columns, because otherwise the FK constraint's semantics
> > are unclear.  But you're not required to make one on the other side.
> >
> > >> What I need is a way to see into this statement as it executes to
> > >> confirm my suspicion - does anyone have any tips on that?
> >
> > > Explain:
> > > https://www.postgresql.org/docs/current/sql-explain.html
> >
> > Specifically, if EXPLAIN ANALYZE shows a lot of time spent in the
> > enforcement trigger for the FK, this is likely what's happening.
> >
> >                         regards, tom lane
>
>
>
> --
> Jim Vanns
> Principal Production Engineer
> Industrial Light & Magic, London



--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London