RE: [EXT] Re: Troubleshooting a long running delete statement - Mailing list pgsql-performance
From | Dirschel, Steve |
---|---|
Subject | RE: [EXT] Re: Troubleshooting a long running delete statement |
Date | |
Msg-id | DM6PR03MB4332A5461B801B904DA275F6FAB09@DM6PR03MB4332.namprd03.prod.outlook.com Whole thread Raw |
In response to | Re: Troubleshooting a long running delete statement (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: [EXT] Re: Troubleshooting a long running delete statement
Re: [EXT] Re: Troubleshooting a long running delete statement |
List | pgsql-performance |
On Wed, Oct 06, 2021 at 06:00:07PM +0000, Dirschel, Steve wrote:
> • When I did an explain on the delete I could see it was full scanning the table. I did a full scan of the table interactively in less than 1 second so the long runtime was not due to the full tablescan.
> I started looking at table definitions (indexes, FK's, etc.) and comparing to Oracle and noticed some indexes missing. I then could see the table being deleted from was a child table with a FK pointing to a parent table. Finally I was able to see that the parent table was missing an index on the FK column so for every row being deleted from the child it was full scanning the parent. All makes sense after the fact but I'm looking for a more methodical way to come to that conclusion by looking at database statistics.
>
> Are there other statistics in Postgres I may have looked at to methodically come to the conclusion that the problem was the missing index on the parent FK column?
I think explain (analyze on) would've helped you.
If I understand your scenario, it'd look like this:
|postgres=# explain (analyze) delete from t; Delete on t
|(cost=0.00..145.00 rows=10000 width=6) (actual time=10.124..10.136 rows=0 loops=1)
| -> Seq Scan on t (cost=0.00..145.00 rows=10000 width=6) (actual
|time=0.141..2.578 rows=10000 loops=1) Planning Time: 0.484 ms Trigger
|for constraint u_i_fkey: time=4075.123 calls=10000 Execution Time:
|4087.764 ms
You can see the query plan used for the FK trigger with autoexplain.
postgres=*# SET auto_explain.log_min_duration='0s'; SET client_min_messages=debug; SET auto_explain.log_nested_statements=on;
postgres=*# explain (analyze) delete from t;
|...
|Query Text: DELETE FROM ONLY "public"."u" WHERE $1 OPERATOR(pg_catalog.=) "i"
|Delete on u (cost=0.00..214.00 rows=1 width=6) (actual rows=0 loops=1)
| Buffers: shared hit=90
| -> Seq Scan on u (cost=0.00..214.00 rows=1 width=6) (actual rows=1 loops=1)
| Filter: ($1 = i)
| Rows Removed by Filter: 8616
| Buffers: shared hit=89
|...
Thanks for the reply and the info above. My question was more directed at how can you troubleshoot the active session running the query. In the examples above you are actually executing the query. If an application is executing the query I can't go in an re-execute it. I also found by trial and error if I do execute it interactively and CTRL C out if it a message is returned which would give hints where to look next:
ERROR: canceling statement due to user request
CONTEXT: SQL statement "SELECT 1 FROM ONLY "cf4"."category_page" x WHERE $1::pg_catalog.text OPERATOR(pg_catalog.=) "prism_guid"::pg_catalog.text FOR KEY SHARE OF x"
That query above is hitting the parent table and my delete is against the child so that info would be very helpful, but it was only available if I interactively ran it. I’m mostly interested in what stats/info is available while it’s running to troubleshoot it.
Regards
Steve
pgsql-performance by date: