Thread: Lengthy deletion
Hi.
I was instructed to delete old records from one of the tables in our production system. The deletion took hours and I had to stop it in mid-operation and reschedule it as a night job. But then I had to do the same when I got up in the morning and it was still running.
The odd thing about it: There are 4720965 records in the table, of which I have to delete 3203485. This should not have taken too long, and the EXPLAIN estimate for it seemed to agree with me:
bcentral=> explain delete
from subscriptions s
where (date_registered < '2011-11-13' and operator <> 'P') and service_id not in ( select id from alerts_services )
;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Delete (cost=38885.86..155212.37 rows=1630425 width=6)
-> Bitmap Heap Scan on subscriptions s (cost=38885.86..155212.37 rows=1630425 width=6)
Filter: ((date_registered < '2011-11-13 00:00:00'::timestamp without time zone) AND (operator <> 'P'::bpchar) AND (NOT (hashed SubPlan 1)))
-> Bitmap Index Scan on t_ind (cost=0.00..38473.03 rows=2361115 width=0)
Index Cond: ((date_registered < '2011-11-13 00:00:00'::timestamp without time zone) = true)
SubPlan 1
-> Seq Scan on alerts_services (cost=0.00..4.58 rows=258 width=4)
(7 rows)
I got an interesting clue, though, when I canceled the deletion the second time around. I got the following error message:
Cancel request sent
ERROR: canceling statement due to user request
CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."sent_messages" x WHERE $1 OPERATOR(pg_catalog.=) "subscription_id" FOR SHARE OF x"
As you can see in the EXPLAIN sentence, I'm trying to delete from a table called "subscriptions", and this context is in another table called "sent_messages" which is related to it by foreign key. Now, I'd say that it was waiting to get a lock on the "sent_messages" table (from which I duly removed the related records before running my delete), and that I should have known that. Only, I was using another connection to monitor pg_stat_activity while the delete is done, and the delete process had "false" in the "waiting" column!
bcentral=# SELECT usename, procpid, query_start, client_addr, client_port, current_query, waiting
FROM pg_stat_activity
WHERE query_start < now() - interval '3 seconds'
AND xact_start is not null order by xact_start;
-[ RECORD 1 ]-+------------------------------------------------------------------------------------------------
--------------------
usename | bcentral
procpid | 20047
query_start | 2011-11-29 02:01:28.968161+02
client_addr | 192.168.34.34
client_port | 55709
current_query | delete
: from subscriptions s
: where (date_registered < '2011-11-13' and operator <> 'P') and service_id not in ( select id fr
om alerts_services )
: ;
waiting | f
Um... so what gives? What's happening here? The server is PostgreSQL 9.0.4.
TIA,
Herouth
"Herouth Maoz" <herouth@unicell.co.il> writes: > I was instructed to delete old records from one of the tables in our production system. The deletion took hours and I hadto stop it in mid-operation and reschedule it as a night job. But then I had to do the same when I got up in the morningand it was still running. > I got an interesting clue, though, when I canceled the deletion the second time around. I got the following error message: > Cancel request sent > ERROR: canceling statement due to user request > CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."sent_messages" x WHERE $1 OPERATOR(pg_catalog.=) "subscription_id"FOR SHARE OF x" Yup, that's a clue all right. I'll bet a nickel that you don't have an index on the foreign key's referencing column (ie, sent_messages.subscription_id). That means each delete in the referenced table has to seqscan the referencing table to see if the delete would result in an FK violation. regards, tom lane
"Herouth Maoz" <herouth@unicell.co.il> writes:I was instructed to delete old records from one of the tables in our production system. The deletion took hours and I had to stop it in mid-operation and reschedule it as a night job. But then I had to do the same when I got up in the morning and it was still running.I got an interesting clue, though, when I canceled the deletion the second time around. I got the following error message:Cancel request sentERROR: canceling statement due to user requestCONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."sent_messages" x WHERE $1 OPERATOR(pg_catalog.=) "subscription_id" FOR SHARE OF x"
Yup, that's a clue all right. I'll bet a nickel that you don't
have an index on the foreign key's referencing column (ie,
sent_messages.subscription_id). That means each delete in
the referenced table has to seqscan the referencing table to
see if the delete would result in an FK violation.
--
Herouth Maoz <herouth@unicell.co.il> writes: > On 29/11/2011, at 09:13, Tom Lane wrote: >> Yup, that's a clue all right. I'll bet a nickel that you don't >> have an index on the foreign key's referencing column (ie, >> sent_messages.subscription_id). That means each delete in >> the referenced table has to seqscan the referencing table to >> see if the delete would result in an FK violation. > Makes sense. But shouldn't that be figured into the EXPLAIN plan? If you run EXPLAIN ANALYZE, on a reasonably recent release, you'll see the foreign-key trigger eating a lot of time. Plain EXPLAIN doesn't show triggers because it has no idea how much time they'll take. regards, tom lane