Joseph Shraibman <jks@selectacast.net> writes:
> Doug McNaught wrote:
> >
> > If you don't, every update is its own transaction, and Postgres will
> > sync the disks (and wait for the sync to complete) after every one.
> > Doing N updates in one transaction will only sync after the whole
> > transaction is complete. Trust me; it's *way* faster.
>
> I thought WAL did away with most of the syncing.
Most but not all. It still makes a big difference. Transactions have
other overhead, too (housekeeping and such).
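To make the autocommit-vs-transaction distinction concrete, here is a
minimal sketch. It uses Python's sqlite3 in autocommit mode as a
stand-in for a Postgres client library (the table, column names, and
row count are made up for illustration); the pattern carries over
directly to a Postgres driver:

```python
import sqlite3  # stand-in for a Postgres client library; same pattern applies

# isolation_level=None puts the connection in autocommit mode, so each
# statement commits (and syncs) on its own unless we say otherwise.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, flag INTEGER DEFAULT 0)")

# Autocommit: every INSERT is its own transaction, synced individually.
for i in range(1000):
    conn.execute("INSERT INTO items (id) VALUES (?)", (i,))

# One explicit transaction: all 1000 updates share a single commit/sync.
conn.execute("BEGIN")
for i in range(1000):
    conn.execute("UPDATE items SET flag = 1 WHERE id = ?", (i,))
conn.execute("COMMIT")
```

The loops do the same number of statements; the difference is how many
times the database has to flush to disk before returning control.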
> Do you really think I should do 1000 updates in a transaction instead of
> an IN with 1000 items? I can do my buffer flush any way I want, but I'd
> have to think the overhead of making 1000 calls to the backend would
> more than overwhelm the cost of the big OR statement (especially if the
> server and client aren't on the same machine).
Perhaps.
The issue for me would be: OK, 1000 entries in an IN works fine.
Maybe 2000 works fine. At some point (as you've seen) you hit a
limit, whether it's query length, recursion depth or whatever. Then
you have to go rewrite your code. I like to do it right the first
time. ;)
If you know you will never ever have more than N items in the IN
clause, and N is demonstrably less than the limit, use IN. "Never
ever" is a phrase that often turns out to be false in software
development...
If you're doing the updates in batches (say, 1000 at a time using IN)
you still might want to consider wrapping the whole thing in a
transaction. That way, if the client or the network craps out in the
middle of the run, you don't have a half-complete set of updates to
clean up.
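A sketch of that batching-plus-transaction idea, again using sqlite3
as a stand-in (table name, batch size, and row counts are invented for
the example): the IN list stays bounded at 1000 items per statement,
while the enclosing transaction guarantees that either every batch
lands or none do.

```python
import sqlite3  # stand-in for a Postgres client library

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, flag INTEGER DEFAULT 0)")
conn.execute("BEGIN")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(5000)])
conn.execute("COMMIT")

BATCH = 1000  # keep each IN list comfortably under any query-size limit
ids = list(range(5000))

conn.execute("BEGIN")
try:
    for start in range(0, len(ids), BATCH):
        batch = ids[start:start + BATCH]
        placeholders = ",".join("?" * len(batch))
        conn.execute(
            f"UPDATE items SET flag = 1 WHERE id IN ({placeholders})", batch)
    conn.execute("COMMIT")  # all batches become visible together
except Exception:
    conn.execute("ROLLBACK")  # a failure mid-run leaves nothing to clean up
    raise
```

If the client or network dies partway through, the open transaction is
simply never committed, so the table is left exactly as it was.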
-Doug
--
The rain man gave me two cures; he said jump right in,
The first was Texas medicine--the second was just railroad gin,
And like a fool I mixed them, and it strangled up my mind,
Now people just get uglier, and I got no sense of time... --Dylan