Thread: delete is getting hung when there is a huge data in table
Hi,

I am facing an issue with the deletion of huge data. We have a cron script which is used to delete the last 3 months of data from one of the tables. The table is large (8872597 rows, as you can see from the count below) since it holds the last 3 months of data.

fm_db_Server3=# select count(*) from audittraillogentry ;
  count
---------
 8872597
(1 row)

Now the issue is that when this script for the deletion of data is launched, it takes more than 7 days and does nothing, i.e. not a single row has been deleted.

Then we stopped the script, terminated the database sessions using SELECT pg_terminate_backend(procpid), and ran the following command:

delete from audittraillogentry where intime <= to_timestamp('2015-01-30 23:59:59.999', 'YYYY/MM/DD-HH24:MI:SS.FF3') OR outtime <= to_timestamp('2015-01-30 23:59:59.999', 'YYYY/MM/DD-HH24:MI:SS.FF3');

Still this delete operation is not working and not a single row has been deleted from the table.

Now I have the following questions:

1. Does PostgreSQL have some limitations for deletion of large data?
2. Should I run VACUUM after stopping the cron script, perhaps to get a "smaller" table?
3. Would dropping the indexes help here? Not sure.
4. Should I think about partitioning? Is there any limitation when dealing with large data in PostgreSQL?

regards
Mitu
Mitu Verma <mitu.verma@ericsson.com> writes:
> 1. Does PostgreSQL have some limitations for deletion of large data?

Not as such, but you've not given us any details that would permit comment. A reasonably likely bet is that this table is referenced by a foreign key in some other table, and that other table has no index on the referencing column. That would make the FK is-it-ok-to-delete checks very slow.

			regards, tom lane
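[A minimal sketch of the fix Tom is describing, with made-up table and column names since the real ones had not been posted yet: index the referencing column in the other table.]

-- Hypothetical example: child_table.parent_id references parent_table(id).
-- Every row deleted from parent_table forces a lookup in child_table to check
-- (or cascade) the FK; without an index on the referencing column that lookup
-- is a sequential scan, once per deleted row.
CREATE INDEX child_table_parent_id_idx ON child_table (parent_id);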
On Saturday, May 2, 2015, Mitu Verma <mitu.verma@ericsson.com> wrote:
> still this delete operation is not working and not a single row has been deleted from the table.
Because of MVCC, other sessions are not able to see partial deletions... and, as you alluded to knowing, the data itself is not actually removed by a delete, though a vacuum will eventually allow the same space to be reused, thus deferring the increase in size from future insertions.
David J.
This delete runs in a single transaction. That means the entire transaction has to complete before you will see anything deleted. Interrupting the transaction simply rolls it back, so nothing is deleted.

Tom already pointed out the potential foreign key slowdown; another slowdown may simply be drive speed.

My recommendation: cut the delete into chunks, for example delete the data one week at a time (see the sketch below the quoted message). That way the transaction is smaller, the data set to delete is smaller, and it will finish quicker.

Uwe

On Sun, May 03, 2015 03:24:25 AM Mitu Verma wrote:
> Now the issue is that when this script for the deletion of data is launched, it takes
> more than 7 days and does nothing, i.e. not a single row has been deleted.
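[A minimal sketch of the week-at-a-time approach Uwe describes, using the intime/outtime columns from the original post; the week boundaries below are illustrative and would normally be generated by the cron script.]

-- Each statement is its own transaction, so progress is committed week by week.
DELETE FROM audittraillogentry
 WHERE intime  < TIMESTAMP '2014-11-08 00:00:00'
    OR outtime < TIMESTAMP '2014-11-08 00:00:00';

DELETE FROM audittraillogentry
 WHERE intime  < TIMESTAMP '2014-11-15 00:00:00'
    OR outtime < TIMESTAMP '2014-11-15 00:00:00';

-- ...and so on, one week per statement, up to the final cutoff
-- (2015-01-30 23:59:59.999 in the original post).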
> Now the issue is that when this script for the deletion of data is launched, it takes more than 7 days and does nothing, i.e. not a single row has been deleted.

Deleting a large number of rows can take a long time. Often it's quicker to delete smaller chunks. The LIMIT clause is not supported by DELETE, so you need some kind of subquery.

We use something like:

do $_$
declare
    num_rows bigint;
begin
    loop
        -- delete the next chunk of at most 100 matching rows
        delete from YourTable
        where id in (select id from YourTable where id < 500 limit 100);
        get diagnostics num_rows = row_count;
        raise notice 'deleted % rows', num_rows;
        exit when num_rows = 0;  -- stop once nothing is left to delete
    end loop;
end;$_$;

This deletes rows with an id smaller than 500 in chunks of 100.

Kind regards,
Andomar
> still this delete operation is not working and not a single row has been deleted from the table.

This is an intended effect of ACID compliance: transactions are guaranteed to atomically show their effects upon completion, or do nothing at all. This also applies to multiple data-changing statements if they're wrapped in a transaction. http://en.wikipedia.org/wiki/ACID
Postgres just fully implements that through http://en.wikipedia.org/wiki/Multiversion_concurrency_control

As to why it takes forever, you're not providing enough information for a solid hypothesis. Speculation (8 million rows is not that big):

- very slow and database-unfriendly storage stack (RAID 5 perhaps?)
- missing indexes to match your clauses (on modern hardware a full table scan over 8 million rows should complete in the order of minutes; try running a SELECT with the same clauses)
- have you checked whether other processes are holding exclusive locks on the table or on some of the records you're trying to delete? pg_locks and pg_stat_activity are your friends here (see the example query below)
- are there triggers on the table that could be slow or hitting locks in other tables?

Regardless, you might want to split your statement into smaller chunks through less greedy clauses and see what happens.

Regards
F

On 03/05/15 15:24, Mitu Verma wrote:
> I am facing an issue with the deletion of huge data.
> We have a cron script which is used to delete the last 3 months of data from one of the tables.
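[A minimal example of that lock check, assuming the table name from the original post; it lists every session holding or waiting for a lock on the table.]

-- granted = false means the session is still waiting for the lock.
SELECT l.pid, l.mode, l.granted, a.state, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
 WHERE l.relation = 'audittraillogentry'::regclass;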
Thank you so much, all of you.

Table audittraillogentry has a PRIMARY KEY and FOREIGN KEYs defined; below is the detail of the existing table audittraillogentry.

As you can see, it is referenced by 2 tables, "cdrdetails" and "cdrlogentry". The "cdrdetails" table does not have an index, whereas "cdrlogentry" has one.
Now after creating the index on "cdrdetails", deletion has become fast: 12 lakh (1.2 million) records are deleted in 16 minutes, which is a drastic improvement in performance.
Before indexing, deletion of 500 records was taking ~2 minutes.

fm_db_Server1-> \d+ audittraillogentry
                                Table "mmsuper.audittraillogentry"
          Column          |            Type             | Modifiers | Storage  | Description
--------------------------+-----------------------------+-----------+----------+-------------
 event                    | smallint                    |           | plain    |
 innodeid                 | character varying(80)       |           | extended |
 innodename               | character varying(80)       |           | extended |
 sourceid                 | character varying(300)      |           | extended |
 intime                   | timestamp without time zone |           | plain    |
 outnodeid                | character varying(80)       |           | extended |
 outnodename              | character varying(80)       |           | extended |
 destinationid            | character varying(300)      |           | extended |
 outtime                  | timestamp without time zone |           | plain    |
 bytes                    | bigint                      |           | plain    |
 cdrs                     | bigint                      |           | plain    |
 tableindex               | bigint                      | not null  | plain    |
 noofsubfilesinfile       | bigint                      |           | plain    |
 recordsequencenumberlist | character varying(1000)     |           | extended |
Indexes:
    "audittraillogentry_pkey" PRIMARY KEY, btree (tableindex), tablespace "mmdata"
    "audit_destid_index" btree (destinationid), tablespace "mmindex"
    "audit_intime_index" btree (intime DESC), tablespace "mmindex"
    "audit_outtime_index" btree (outtime DESC), tablespace "mmindex"
    "audit_sourceid_index" btree (sourceid), tablespace "mmindex"
Referenced by:
    TABLE "cdrdetails" CONSTRAINT "audittableindex_fkey" FOREIGN KEY (audittableindex) REFERENCES audittraillogentry(tableindex) ON DELETE CASCADE
    TABLE "cdrlogentry" CONSTRAINT "cdrlogentry_audittableindex_fkey" FOREIGN KEY (audittableindex) REFERENCES audittraillogentry(tableindex)
Has OIDs: no
Tablespace: "mmdata"

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: May 03, 2015 9:43 AM
To: Mitu Verma
Cc: 'pgsql-general@postgresql.org' (pgsql-general@postgresql.org)
Subject: Re: [GENERAL] delete is getting hung when there is a huge data in table

> A reasonably likely bet is that this table is referenced by a foreign key in some other
> table, and that other table has no index on the referencing column. That would make
> the FK is-it-ok-to-delete checks very slow.
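[For reference, the fix amounts to a single statement along these lines; the referencing column name comes from the FK definition above, the index name is made up.]

-- Index the referencing column so the ON DELETE CASCADE lookups stop scanning cdrdetails.
-- CONCURRENTLY avoids blocking writes while the index builds (it cannot run inside a
-- transaction block).
CREATE INDEX CONCURRENTLY cdrdetails_audittableindex_idx
    ON cdrdetails (audittableindex);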
On Wed, May 6, 2015 at 1:56 AM, Mitu Verma <mitu.verma@ericsson.com> wrote:
> Thank you so much, all of you.
>
> Table audittraillogentry has a PRIMARY KEY and FOREIGN KEYs defined; below is the detail of the existing table audittraillogentry.
>
> As you can see, it is referenced by 2 tables, "cdrdetails" and "cdrlogentry". The "cdrdetails" table does not have an index, whereas "cdrlogentry" has one.
> Now after creating the index on "cdrdetails", deletion has become fast: 12 lakh (1.2 million) records are deleted in 16 minutes, which is a drastic improvement in performance.
> Before indexing, deletion of 500 records was taking ~2 minutes.

Yeah, this (unindexed foreign key causing slow deletes) is probably the #1 performance gotcha in SQL. If you're often doing very large deletes, sometimes it can help to work out a better strategy, perhaps one of (a partitioning sketch follows below):

*) using TRUNCATE ... CASCADE
*) table partitioning organized such that you can drop a partition to delete rows
*) temporarily disabling RI during large deletes (dangerous, but in certain limited cases it can be useful)

merlin
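[A minimal sketch of the partition-and-drop idea, using the inheritance-based partitioning available in PostgreSQL at the time; the child-table layout is illustrative, not a drop-in design.]

-- Illustrative month partition of audittraillogentry via inheritance.
CREATE TABLE audittraillogentry_2015_01 (
    CHECK (intime >= DATE '2015-01-01' AND intime < DATE '2015-02-01')
) INHERITS (audittraillogentry);

-- New rows go into the child for their month (inserted directly, or routed by an
-- INSERT trigger on the parent). Retiring a month then becomes a cheap metadata
-- operation instead of a multi-day DELETE:
DROP TABLE audittraillogentry_2015_01;

-- Caveat: foreign keys referencing audittraillogentry(tableindex) do not see rows
-- stored in child tables, so the cdrdetails/cdrlogentry references would need to be
-- reworked before adopting this layout.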