Re: Delete/update with limit - Mailing list pgsql-general

From Chris Browne
Subject Re: Delete/update with limit
Msg-id 60644bymua.fsf@dba2.int.libertyrms.com
In response to Delete/update with limit  (Csaba Nagy <nagy@ecircle-ag.com>)
List pgsql-general
nagy@ecircle-ag.com (Csaba Nagy) writes:
> In postgres we're currently not chunking, due to the fact that the code
> to do it is simply overly contorted and inefficient compared to the
> other DBs we use. At least all the solutions we could think of to do the
> chunking in a safe way while the inserts are running in parallel,
> without disturbing them, have invariably resulted in overly complicated
> code compared to the simple delete with limit + delete trigger solution
> we have for the other DBs.
>
> Now I don't put too much hope I can convince anybody that the limit on
> the delete/update commands has valid usage scenarios, but then can
> anybody help me find a good solution to chunk-wise process such a buffer
> table where insert speed is the highest priority (thus no indexes, the
> minimum of fields), and batch processing should still work fine with big
> table size, while not impacting at all the inserts, and finish in short
> time to avoid long running transactions ? Cause I can't really think of
> one... other than our scheme with the delete with limit + trigger +
> private temp table thing.

All that comes to mind is to put a SERIAL primary key on the table,
which shouldn't add *too* terrible an overhead, assuming there is
reasonably complex processing going on for each row; you would then do
something like:

- select ID from the incoming table, order by ID, limit 500, to grab a
  list of IDs;

- delete from the table for that set of IDs.
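
For concreteness, a rough sketch (incoming_table and its payload column
are just stand-ins for whatever the real buffer table holds):

  -- the SERIAL key would be the only index on the buffer table
  CREATE TABLE incoming_table (
      id      SERIAL PRIMARY KEY,
      payload text NOT NULL
  );

  -- grab the 500 oldest rows for processing
  SELECT id, payload FROM incoming_table ORDER BY id LIMIT 500;

  -- once processed, delete that same chunk
  DELETE FROM incoming_table
   WHERE id IN (SELECT id FROM incoming_table ORDER BY id LIMIT 500);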

Actually, is there any particular reason why you couldn't simply have
your "batch processing" loop look like:

Loop Forever
  DELETE from incoming_table;
  VACUUM incoming_table;
End Loop;

???

The alternative that I suggested amounts to:

Loop Forever
  DELETE from incoming_table where id in (select id from incoming_table order by id limit 500);
  VACUUM incoming_table;
End Loop;
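
One wrinkle is that VACUUM cannot run inside a transaction block, so the
loop has to issue it as its own top-level command; a crude way to drive
this would be a little shell script along these lines (database name,
table, and batch size are just placeholders):

  #!/bin/sh
  # each psql -c runs as its own transaction, so VACUUM is happy here
  while true; do
      psql -d mydb -c "DELETE FROM incoming_table
          WHERE id IN (SELECT id FROM incoming_table ORDER BY id LIMIT 500)"
      psql -d mydb -c "VACUUM incoming_table"
      sleep 1
  done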

I realize you're concerned that maintaining the index will be too
costly; without actual benchmarking, though, I don't think it is
obvious that it *in fact* is.

I'm pretty sure of one countervailing consideration: VACUUMing the
table has a cost of its own, and it is quite possible that the cost of
maintaining the index would be noise against that.
--
"cbbrowne","@","acm.org"
http://cbbrowne.com/info/lisp.html
When a man talks dirty to a woman, it's sexual harassment. When a woman
talks dirty to a man, it's 3.95 per minute.
