From: James Klo
Subject: make bulk deletes faster?
Msg-id: jklo-C7336F.21104017122005@news.hub.org
List: pgsql-performance

I have the following table:

CREATE TABLE timeblock
(
  timeblockid int8 NOT NULL,
  starttime timestamp,
  endtime timestamp,
  duration int4,
  blocktypeid int8,
  domain_id int8,
  create_date timestamp,
  revision_date timestamp,
  scheduleid int8,
  CONSTRAINT timeblock_pkey PRIMARY KEY (timeblockid),
  CONSTRAINT fk25629e03312570b FOREIGN KEY (blocktypeid)
      REFERENCES blocktype (blocktypeid) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT fk25629e09be84177 FOREIGN KEY (domain_id)
      REFERENCES wa_common_domain (domain_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH OIDS;

CREATE INDEX timeblock_blocktype_idx
  ON timeblock
  USING btree
  (blocktypeid);

CREATE INDEX timeblock_date_idx
  ON timeblock
  USING btree
  (starttime, endtime);

CREATE INDEX timeblockepoch_idx
  ON timeblock
  USING btree
  (date_trunc('minute'::text, starttime),
   (date_part('epoch'::text, date_trunc('minute'::text, starttime)) * 1000::double precision),
   date_trunc('minute'::text, endtime),
   (date_part('epoch'::text, date_trunc('minute'::text, endtime)) * 1000::double precision));

CREATE INDEX timeblockhourmin_idx
  ON timeblock
  USING btree
  (date_part('hour'::text, starttime),
   date_part('minute'::text, starttime),
   date_part('hour'::text, endtime),
   date_part('minute'::text, endtime));

CREATE INDEX timeblockid_idx
  ON timeblock
  USING btree
  (timeblockid);


There are also indexes on the primary keys of wa_common_domain and blocktype.

explain analyze delete from timeblock where timeblockid = 666666

Index Scan using timeblockid_idx on timeblock  (cost=0.00..5.28 rows=1
width=6) (actual time=0.022..0.022 rows=0 loops=1)
  Index Cond: (timeblockid = 666666)
Total runtime: 0.069 ms


I need to routinely move data from the timeblock table to an archive
table with the same schema named timeblock_archive.  I really need this
to happen as quickly as possible, as the archive operation appears to
really tax the db server...
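
For context, the whole operation boils down to a copy-then-delete (the
real version runs inside a pl/pgsql function; tmp_timeblock is a temp
table holding the ids to move -- more on that below). Simplified:

insert into timeblock_archive
   select * from timeblock
    where timeblockid in (select timeblockid from tmp_timeblock);

delete from timeblock
 where timeblockid in (select timeblockid from tmp_timeblock);

Both statements are driven off the same temp table, so the set being
archived and the set being deleted always match.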

I'd like some suggestions on how to get the deletes to happen faster:
while deleting an individual row appears to be extremely fast, deleting
lots of rows takes an extremely long time to complete (5,000 rows takes
about 3 minutes; 1,000,000 rows takes close to 4 hours or more depending
on server load; wall time, btw).

I've tried several different approaches to doing the delete and I can't
seem to make it much faster... anyone have any ideas?

The approaches I've taken all use a temp table to define the set that
needs to be deleted.
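
The temp table is populated with something along these lines (I've
simplified the predicate here for illustration; the real one picks out
the day being archived, driven by the timestart parameter you'll see in
the snippets below):

-- simplified predicate, for illustration only
create temp table tmp_timeblock as
   select timeblockid
     from timeblock
    where starttime >= timestart
      and starttime < timestart + interval '1 day';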

Here's what I've tried:

Attempt 1:
----------
delete from timeblock where timeblockid in
   (select timeblockid from tmp_timeblock)


Attempt 2:
----------
num_to_delete := (select count(1) from tmp_timeblock);
RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% rows]', timestart, num_to_delete;
cur_offset := 0;
while cur_offset < num_to_delete loop
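   -- grab the next window of 100 ids from the temp table and delete them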
   delete from timeblock where timeblockid in
      (select timeblockid from tmp_timeblock
        limit 100 offset cur_offset);
   get diagnostics num_affected = ROW_COUNT;
   RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% rows] cur_offset = %', timestart, num_affected, cur_offset;
   cur_offset := cur_offset + 100;
end loop;


Attempt 3:
----------
   num_to_delete := (select count(1) from tmp_timeblock);
   cur_offset := num_to_delete;
   RAISE DEBUG 'archiveDailyData(%): need to delete from timeblock [% rows]', timestart, num_to_delete;
   open del_cursor for select timeblockid from tmp_timeblock;
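   -- walk the cursor one id at a time, deleting each row individually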
   loop
      fetch del_cursor into del_pkey;
      if not found then
         exit;
      else
         delete from timeblock where timeblockid = del_pkey;
         get diagnostics num_affected = ROW_COUNT;
         cur_offset := cur_offset - num_affected;
         if cur_offset % 1000 = 0 then
            RAISE DEBUG 'archiveDailyData(%): delete from timeblock [% left]', timestart, cur_offset;
         end if;
      end if;
   end loop;
   close del_cursor;


I've considered using min(starttime) and max(starttime) from the temp
table and doing something like:

delete from timeblock
 where starttime between (select min(starttime) from tmp_timeblock)
                     and (select max(starttime) from tmp_timeblock);

however, I'm concerned about leaving orphan data, deleting too much
data, running into foreign key conflicts, etc.

Dropping the indexes on timeblock could be bad, as this table receives a
high volume of reads, inserts & updates.

Anyone have any suggestions?

Thanks,

Jim K
