Thread: Troubleshooting a long-running delete statement
On 10/6/21 14:00, Dirschel, Steve wrote:

Question: How would one troubleshoot this issue in Postgres as to why the delete was running so long? My background is Oracle, and there are various statistics I might look at there:

- One could estimate the number of logical reads the delete should do based on the expected number of rows to delete, the expected logical reads against the table per row, and the expected logical reads against each index per row.
- One could look in V$SQL to see how many logical reads the query was actually doing.
- One could look at V$SESS_IO to see how many logical reads the session was doing.

In this case you would see the query was doing far more logical reads than expected and then try to think of scenarios that would cause that.

Here is what I could see in Postgres:

- When I ran an explain on the delete I could see it was full scanning the table. I ran a full scan of the table interactively in less than 1 second, so the long runtime was not due to the full table scan.
- I could not find the query in pg_stat_statements to check whether its shared block reads/hits were extremely high. Based on the documentation, queries do not show up in pg_stat_statements until after they complete.
- pg_stat_activity showed wait_event_type and wait_event were NULL for the session every time I looked, so the session was continually using CPU.

I started looking at table definitions (indexes, FKs, etc.), comparing them to Oracle, and noticed some indexes were missing. I could then see that the table being deleted from was a child table with a FK pointing to a parent table. Finally I was able to see that the parent table was missing an index on the FK column, so for every row being deleted from the child it was full scanning the parent. It all makes sense after the fact, but I'm looking for a more methodical way to come to that conclusion by looking at database statistics.

Are there other statistics in Postgres I might have looked at to methodically come to the conclusion that the problem was the missing index on the parent FK column?

Thanks
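A minimal sketch of the pg_stat_activity check described above (stock PostgreSQL columns; the 'active' filter is just one way to narrow it down):

  -- Active sessions, longest-running first. A DELETE that stays 'active'
  -- with NULL wait_event_type/wait_event is spending its time on CPU
  -- rather than waiting on locks or I/O.
  SELECT pid, state, wait_event_type, wait_event,
         now() - query_start AS runtime, query
  FROM   pg_stat_activity
  WHERE  state = 'active'
  ORDER  BY runtime DESC;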
Hi Steve,
First, check whether you have any triggers on the table. The best way of doing that is to use information_schema.triggers. I have seen triggers introduce some "mysterious" functionality in Oracle as well. Second, check constraints. Is the table you're deleting from the parent table of one or more foreign key constraints? If the constraints are defined with "ON DELETE CASCADE", you may be deleting more than you think. If they are not defined with "ON DELETE CASCADE" or "ON DELETE SET NULL", you would get an error. If that passes muster, then check which processes are doing the most I/O using "iotop" or "atop". I like the latter. You can then check what the busy processes are doing using "strace -e trace=file" and, for good measure, "perf top".
Regards
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
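A rough sketch of those two catalog checks ('my_table' is a placeholder for the table being deleted from):

  -- User triggers defined on the table (the internal FK-enforcement
  -- triggers are not listed in information_schema.triggers).
  SELECT trigger_name, action_timing, event_manipulation
  FROM   information_schema.triggers
  WHERE  event_object_table = 'my_table';

  -- Foreign keys for which this table is the referenced (parent) table,
  -- with their ON DELETE behaviour:
  -- 'a' = no action, 'r' = restrict, 'c' = cascade, 'n' = set null, 'd' = set default
  SELECT conname,
         conrelid::regclass AS referencing_table,
         confdeltype        AS on_delete
  FROM   pg_constraint
  WHERE  contype = 'f'
  AND    confrelid = 'my_table'::regclass;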
On Wed, Oct 06, 2021 at 06:00:07PM +0000, Dirschel, Steve wrote:
> • When I did an explain on the delete I could see it was full scanning the table. I did a full scan of the table interactively in less than 1 second so the long runtime was not due to the full tablescan.
>
> I started looking at table definitions (indexes, FK's, etc.) and comparing to Oracle and noticed some indexes missing. I then could see the table being deleted from was a child table with a FK pointing to a parent table. Finally I was able to see that the parent table was missing an index on the FK column so for every row being deleted from the child it was full scanning the parent. All makes sense after the fact but I'm looking for a more methodical way to come to that conclusion by looking at database statistics.
>
> Are there other statistics in Postgres I may have looked at to methodically come to the conclusion that the problem was the missing index on the parent FK column?

I think explain (analyze on) would've helped you. If I understand your scenario, it'd look like this:

|postgres=# explain (analyze) delete from t;
| Delete on t  (cost=0.00..145.00 rows=10000 width=6) (actual time=10.124..10.136 rows=0 loops=1)
|   ->  Seq Scan on t  (cost=0.00..145.00 rows=10000 width=6) (actual time=0.141..2.578 rows=10000 loops=1)
| Planning Time: 0.484 ms
| Trigger for constraint u_i_fkey: time=4075.123 calls=10000
| Execution Time: 4087.764 ms

You can see the query plan used for the FK trigger with auto_explain:

postgres=*# SET auto_explain.log_min_duration='0s';
postgres=*# SET client_min_messages=debug;
postgres=*# SET auto_explain.log_nested_statements=on;
postgres=*# explain (analyze) delete from t;
|...
|Query Text: DELETE FROM ONLY "public"."u" WHERE $1 OPERATOR(pg_catalog.=) "i"
|Delete on u  (cost=0.00..214.00 rows=1 width=6) (actual rows=0 loops=1)
|  Buffers: shared hit=90
|  ->  Seq Scan on u  (cost=0.00..214.00 rows=1 width=6) (actual rows=1 loops=1)
|        Filter: ($1 = i)
|        Rows Removed by Filter: 8616
|        Buffers: shared hit=89
|...
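For completeness, sticking with the hypothetical tables from the plans above (t referenced by u.i): the trigger line points straight at the fix, an index on the referencing column.

  -- Hypothetical fix for the example above: index the referencing column so
  -- the FK trigger's lookup on u.i becomes an index scan instead of a seq scan.
  CREATE INDEX CONCURRENTLY ON u (i);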
On Wed, 2021-10-06 at 18:00 +0000, Dirschel, Steve wrote:
> Are there other statistics in Postgres I may have looked at to methodically come to the conclusion that the problem was the missing index on the parent FK column?

You could use the query from my article to find the missing indexes:
https://www.cybertec-postgresql.com/en/index-your-foreign-key/

Yours,
Laurenz Albe

--
Cybertec | https://www.cybertec-postgresql.com
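In the same spirit (not the article's query, just a simplified sketch that only covers single-column foreign keys), something like this lists referencing tables with no index led by the FK column:

  -- Single-column foreign keys whose referencing table has no index with
  -- the FK column as its leading column. conkey is 1-based; indkey (an
  -- int2vector) is 0-based, so indkey[0] is an index's leading column.
  SELECT c.conrelid::regclass AS referencing_table,
         c.conname            AS fk_name,
         a.attname            AS fk_column
  FROM   pg_constraint c
  JOIN   pg_attribute a ON a.attrelid = c.conrelid
                       AND a.attnum   = c.conkey[1]
  WHERE  c.contype = 'f'
  AND    cardinality(c.conkey) = 1
  AND    NOT EXISTS (
           SELECT 1
           FROM   pg_index i
           WHERE  i.indrelid = c.conrelid
           AND    i.indkey[0] = c.conkey[1]
         );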
Thanks for the reply and I hope I'm replying to this e-mail correctly at the bottom of the chain. We are running on AWS Aurora PostgreSQL. I assume strace -e isn't an option given we don't have access to the server, or are you aware of a method I could still do that without server access?
Regards
Steve
> Thanks for the reply and I hope I'm replying to this e-mail correctly at the bottom of the chain.

Hey, it's not me, it's rules and regulations. And that's incredibly important on this group, or so I was led to believe :)

> We are running on AWS Aurora PostgreSQL. I assume strace -e isn't an option given we don't have access to the server, or are you aware of a method I could still do that without server access?

No, access to the OS is not an option with RDS. However, RDS comes with support. You will have to contact support; they may run strace for you.

--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
> postgres=# explain (analyze) delete from t;
>  Delete on t
I would try explain (analyze, timing, buffers). That would also give you the timing of each step, so you can figure out which one takes the longest.
Regards
--
Mladen Gogala
Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
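A hedged illustration of that suggestion, reusing the placeholder table t from the earlier example; running it inside a transaction that is rolled back gives the per-node timings, buffer counts and FK-trigger time without actually deleting anything:

  BEGIN;
  EXPLAIN (ANALYZE, TIMING, BUFFERS) DELETE FROM t;  -- 't' is a placeholder
  ROLLBACK;  -- undo the delete; the timing and buffer numbers have already been printed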
> Here is what I could see in Postgres:
> - When I did an explain on the delete I could see it was full scanning the table. I did a full scan of the table interactively in less than 1 second so the long runtime was not due to the full tablescan.
Thanks for the reply and the info above. My question was more directed at how to troubleshoot the active session running the query. In the examples above you are actually executing the query; if an application is executing the query, I can't go in and re-execute it.
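One more statistic worth mentioning here, with the caveat that per-table counters are normally only flushed to the statistics views at transaction end, so for a single long-running DELETE the jump may only become visible once the statement finishes or is cancelled: the sequential-scan counters in pg_stat_user_tables. A referenced table whose seq_scan count grows by roughly the number of deleted rows is a strong hint that a foreign-key check was full scanning it once per row.

  -- Compare snapshots taken before the DELETE and after it finishes (or is
  -- cancelled); a large jump in seq_scan on one table points at repeated
  -- full scans, e.g. from an unindexed foreign-key lookup.
  SELECT relname, seq_scan, seq_tup_read, idx_scan
  FROM   pg_stat_user_tables
  ORDER  BY seq_scan DESC
  LIMIT  10;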
This won't be immediately useful, but there's been a patch proposed for Postgres 15 to allow logging the plan of a running query [1]. Progress seems to have stalled a bit, but it seems like there was a fair amount of interest, so I wouldn't count it out yet. If you have thoughts on the proposed functionality, I'm sure thoughtful feedback would be appreciated.

Maciek