Thread: slow delete
I have a table with 29K rows total and I need to delete about 80K out of it.
I have a b-tree index on column cola (varchar(255)) for my where clause to use.
My "select count(*) from test where cola = 'abc'" runs very fast,
but my actual "delete from test where cola = 'abc';" takes forever; it never finishes and I haven't figured out why....
In my explain output, what is that "Bitmap Heap Scan on table"? Is it a table scan? Is my index being used?
How does delete work? To delete the 80K rows that meet my condition, does Postgres find them all and delete them together, or one at a time?
By the way, there is a foreign key on another table that references the primary key col0 on table test.
Could someone help me out here?
Thanks a lot,
Jessica
testdb=# select count(*) from test;
count
--------
295793 --total 295,793 rows
(1 row)
Time: 155.079 ms
testdb=# select count(*) from test where cola = 'abc';
count
-------
80998 -- need to delete 80,998 rows
(1 row)
testdb=# explain delete from test where cola = 'abc';
QUERY PLAN
----------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=2110.49..10491.57 rows=79766 width=6)
Recheck Cond: ((cola)::text = 'abc'::text)
-> Bitmap Index Scan on test_cola_idx (cost=0.00..2090.55 rows=79766 width=0)
Index Cond: ((cola)::text = 'abc'::text)
(4 rows)
Jessica Richard wrote:
> I have a table with 29K rows total and I need to delete about 80K out of it.

I assume you meant 290K or something.

> I have a b-tree index on column cola (varchar(255)) for my where clause
> to use.
>
> my "select count(*) from test where cola = 'abc'" runs very fast,
>
> but my actual "delete from test where cola = 'abc';" takes forever,
> never can finish and I haven't figured why....

When you delete, the database server must:

- check all foreign keys referencing the data being deleted
- update all indexes on the data being deleted
- and actually flag the tuples as deleted by your transaction

All of which takes time. It's a much slower operation than a query that just has to find out how many tuples match the search criteria, like your SELECT does.

How many indexes do you have on the table you're deleting from? How many foreign key constraints reference the table you're deleting from?

If you find that it just takes too long, you could drop the indexes and foreign key constraints, do the delete, then recreate the indexes and foreign key constraints. This can sometimes be faster, depending on just what proportion of the table must be deleted.

Additionally, remember to VACUUM ANALYZE the table after that sort of big change. AFAIK you shouldn't really have to if autovacuum is doing its job, but it's not a bad idea anyway.

--
Craig Ringer
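For illustration, the drop-and-recreate sequence could look roughly like this. This is only a sketch: "child" and the constraint name are hypothetical (only test_cola_idx is named in the EXPLAIN output), so check the real names with \d test and \d child first:

```sql
-- Sketch only: "child" and "child_col0_fkey" are hypothetical names.
BEGIN;
ALTER TABLE child DROP CONSTRAINT child_col0_fkey;
DROP INDEX test_cola_idx;                 -- named in the EXPLAIN output
DELETE FROM test WHERE cola = 'abc';      -- bulk delete via seq scan
CREATE INDEX test_cola_idx ON test (cola);
ALTER TABLE child ADD CONSTRAINT child_col0_fkey
    FOREIGN KEY (col0) REFERENCES test (col0);
COMMIT;

-- Afterwards (VACUUM cannot run inside a transaction block):
VACUUM ANALYZE test;
```

Note that re-adding the foreign key revalidates it with a scan of the child table, so the total time saved depends on how much of the table is being deleted.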
> by the way, there is a foreign key on another table that references the
> primary key col0 on table test.

Is there an index on the referencing field in the other table? Postgres must find the rows referencing the deleted rows, so if you forget to index the referencing column, this can take forever.
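For example, if the referencing column in the other table were also named col0 and the table were called child (both names hypothetical here), the missing index could be added with:

```sql
-- Hypothetical table/column names; adjust to the real schema.
CREATE INDEX child_col0_idx ON child (col0);
ANALYZE child;  -- refresh planner statistics so the new index gets used
```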
Thanks so much for your help.
I can select the 80K rows very fast, but when I delete them, it always just hangs there (> 4 hours without finishing), not deleting anything at all. Finally, I selected pk_col where cola = 'abc', redirected the output to a file with the list of pk_col values, and put them into a script with 80K individual deletes. That ran fine; slow, but actually doing the delete work:
delete from test where pk_col = n1;
delete from test where pk_col = n2;
...
My next question is: what is the difference between "select" and "delete"? There is another table that has a foreign key referencing the test (parent) table I am deleting from, and this foreign key does not have an index on it (a 330K-row table).
Deleting one row at a time is fine: delete from test where pk_col = n1;
but deleting the big chunk all together (with 80K rows to delete) always hangs: delete from test where cola = 'abc';
I am wondering if I don't have enough memory to carry out the 80K-row delete all at once...
But how come I can select those 80K rows very fast? What is the difference between select and delete?
Maybe the foreign key without an index plays a big role here: a 330K-row table referencing the parent table means a lot of table scans on the referencing table to check whether each row can be deleted from the parent table... Maybe a select from the parent table does not have to check the child table?
Thank you for pointing out about dropping the constraint first, I can imagine that it will be a lot faster.
But what if it is a memory issue that prevents me from deleting the 80K rows all at once? Where do I check for memory issues (buffer pool), and how do I tune things on the memory side?
Thanks a lot,
Jessica
----- Original Message ----
From: Craig Ringer <craig@postnewspapers.com.au>
To: Jessica Richard <rjessil@yahoo.com>
Cc: pgsql-performance@postgresql.org
Sent: Friday, July 4, 2008 1:16:31 AM
Subject: Re: [PERFORM] slow delete
> My next question is: what is the difference between "select" and "delete"?
> There is another table that has one foreign key to reference the test
> (parent) table that I am deleting from and this foreign key does not have
> an index on it (a 330K row table).

The difference is that with SELECT you're not performing any modifications to the data, while with DELETE you are. That means that with DELETE you may have a lot of overhead due to FK checks etc.

Someone already pointed out that if you reference a table A from table B (using a foreign key), then you have to check the FK in case of DELETE, and that may knock the server down if table B is huge and does not have an index on the FK column.

> Deleting one row at a time is fine: delete from test where pk_col = n1;
>
> but deleting the big chunk all together (with 80K rows to delete) always
> hangs: delete from test where cola = 'abc';
>
> I am wondering if I don't have enough memory to hold and carry on the
> 80k-row delete.....
> but how come I can select those 80k-row very fast? what is the difference
> between select and delete?
>
> Maybe the foreign key without an index does play a big role here, a
> 330K-row table references a 29K-row table will get a lot of table scan on
> the foreign table to check if each row can be deleted from the parent
> table... Maybe select from the parent table does not have to check the
> child table?

Yes, and PFC already pointed this out.

Tomas
On Friday 04 July 2008, tv@fuzzy.cz wrote:
> > My next question is: what is the difference between "select" and
> > "delete"? There is another table that has one foreign key to reference
> > the test (parent) table that I am deleting from and this foreign key
> > does not have an index on it (a 330K row table).

Yeah, you need to fix that. You're doing 80,000 sequential scans of that table to do your delete. That's a whole lot of memory access...

I don't let people here create foreign key relationships without matching indexes - they always cause problems otherwise.

--
Alan
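As a rough way of checking for that situation, a catalog query along these lines can list foreign keys with no supporting index on the referencing side. This is only a sketch: it covers single-column foreign keys and only looks at the leading column of each index.

```sql
-- Sketch: find single-column foreign keys whose referencing column
-- is not the leading column of any index on the child table.
SELECT c.conrelid::regclass AS child_table,
       a.attname            AS fk_column
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum   = c.conkey[1]            -- first (only) FK column
WHERE c.contype = 'f'                    -- foreign key constraints
  AND NOT EXISTS (
        SELECT 1
        FROM pg_index i
        WHERE i.indrelid  = c.conrelid
          AND i.indkey[0] = a.attnum     -- FK column leads some index
      );
```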