Thread: Bunching "transactions"

Bunching "transactions"

From
Jean-David Beyer
Date:
I have just changed around some programs that ran too slowly (too much time
in io-wait) and they speeded up greatly. This was not unexpected, but I
wonder about the limitations.

By transaction, I mean a single INSERT or a few related INSERTs.

What I used to do is roughly like this:

for each file {
   for each record {
      BEGIN WORK;
      INSERT stuff in table(s);
      if error {
    ROLLBACK WORK
      }
      else {
         COMMIT WORK;
      }
   }
}

The speedup was the obvious one:

for each file {
   BEGIN WORK;
   for each record {
      INSERT stuff in table(s);
   }
   if error {
      ROLLBACK WORK
   }
   else {
      COMMIT WORK;
   }
}

This means, of course, that the things I think of as transactions have been
bunched into a much smaller number of what postgreSQL thinks of as large
transactions, since there is only one per file rather than one per record.
Now if a file has several thousand records, this seems to work out just great.

But what is the limitation on such a thing? In this case, I am just
populating the database and there are no other users at such a time. I am
willing to lose the whole insert of a file if something goes wrong -- I
would fix whatever went wrong and start over anyway.

But at some point, disk IO would have to be done. Is this just a function of
how big /pgsql/data/postgresql.conf's shared_buffers is set to? Or does it
have to do with wal_buffers and checkpoint_segments?

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 11:10:01 up 2 days, 3:28, 4 users, load average: 5.76, 5.70, 5.53

Re: Bunching "transactions"

From
Erik Jones
Date:
On Oct 25, 2007, at 10:30 AM, Jean-David Beyer wrote:

> I have just changed around some programs that ran too slowly (too
> much time
> in io-wait) and they speeded up greatly. This was not unexpected,
> but I
> wonder about the limitations.
>
> By transaction, I mean a single INSERT or a few related INSERTs.
>
> What I used to do is roughly like this:
>
> for each file {
>    for each record {
>       BEGIN WORK;
>       INSERT stuff in table(s);
>       if error {
>     ROLLBACK WORK
>       }
>       else {
>          COMMIT WORK;
>       }
>    }
> }
>
> The speedup was the obvious one:
>
> for each file {
>    BEGIN WORK;
>    for each record {
>       INSERT stuff in table(s);
>    }
>    if error {
>       ROLLBACK WORK
>    }
>    else {
>       COMMIT WORK;
>    }
> }
>
> This means, of course, that the things I think of as transactions
> have been
> bunched into a much smaller number of what postgreSQL thinks of as
> large
> transactions, since there is only one per file rather than one per
> record.
> Now if a file has several thousand records, this seems to work out
> just great.
>
> But what is the limitation on such a thing? In this case, I am just
> populating the database and there are no other users at such a
> time. I am
> willing to lose the whole insert of a file if something goes wrong
> -- I
> would fix whatever went wrong and start over anyway.
>
> But at some point, disk IO would have to be done. Is this just a
> function of
> how big /pgsql/data/postgresql.conf's shared_buffers is set to? Or
> does it
> have to do with wal_buffers and checkpoint_segments?

You're reading data from a file and generating inserts?  Can you not
use COPY?  That would be the most performant.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: Bunching "transactions"

From
Heikki Linnakangas
Date:
Jean-David Beyer wrote:
> This means, of course, that the things I think of as transactions have been
> bunched into a much smaller number of what postgreSQL thinks of as large
> transactions, since there is only one per file rather than one per record.
> Now if a file has several thousand records, this seems to work out just great.

Using the small transactions, you were limited by the speed your hard
disk flush the commit WAL records to the disk. With small transactions
like that, it's not about the bandwidth, but latency of the hard drive.
Using larger transactions helps because you get more work done on each
disk operation.

Upcoming 8.3 release will have a feature called "asynchronous commit",
which should speed up those small transactions dramatically, if you
don't want to batch them into larger transactions like you did:

http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT

> But what is the limitation on such a thing? In this case, I am just
> populating the database and there are no other users at such a time. I am
> willing to lose the whole insert of a file if something goes wrong -- I
> would fix whatever went wrong and start over anyway.
>
> But at some point, disk IO would have to be done. Is this just a function of
> how big /pgsql/data/postgresql.conf's shared_buffers is set to? Or does it
> have to do with wal_buffers and checkpoint_segments?

Well, you have to do the I/O eventually, regardless of shared_buffers.
Common wisdom is that increasing wal_buffers from the default helps with
bulk loading like that, up to a point. Increasing checkpoint_segments
helps as well. After you've done all that, you're going to be limited by
either the bandwidth of your I/O system, or the speed of your CPU,
depending on your hardware. Using COPY instead of INSERTs will help if
it's CPU.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

Re: Bunching "transactions"

From
Tom Lane
Date:
Jean-David Beyer <jeandavid8@verizon.net> writes:
> But what is the limitation on such a thing?

AFAIR, the only limit on the size of a transaction is 2^32 commands
(due to CommandCounter being 32 bits).

            regards, tom lane

Re: Bunching "transactions"

From
Chris Browne
Date:
jeandavid8@verizon.net (Jean-David Beyer) writes:
> But what is the limitation on such a thing? In this case, I am just
> populating the database and there are no other users at such a time. I am
> willing to lose the whole insert of a file if something goes wrong -- I
> would fix whatever went wrong and start over anyway.
>
> But at some point, disk IO would have to be done. Is this just a function of
> how big /pgsql/data/postgresql.conf's shared_buffers is set to? Or does it
> have to do with wal_buffers and checkpoint_segments?

I have done bulk data loads where I was typically loading hundreds of
thousands of rows in as a single transaction, and it is worth
observing that loading in data from a pg_dump will do exactly the same
thing, where, in general, each table's data is loaded as a single
transaction.

It has tended to be the case that increasing the number of checkpoint
segments is helpful, though it's less obvious that this is the case in
8.2 and later versions, what with the ongoing changes to checkpoint
flushing.

In general, this isn't something that typically needs to get tuned
really finely; if you tune your DB, in general, "pretty big
transactions" should generally work fine, up to rather large sizes of
"pretty big."
--
"cbbrowne","@","acm.org"
http://linuxdatabases.info/info/languages.html
"Why use Windows, since there is a door?"
-- <fachat@galileo.rhein-neckar.de> Andre Fachat

Re: Bunching "transactions"

From
Jean-David Beyer
Date:
Chris Browne wrote:
> jeandavid8@verizon.net (Jean-David Beyer) writes:
>> But what is the limitation on such a thing? In this case, I am just
>> populating the database and there are no other users at such a time. I am
>> willing to lose the whole insert of a file if something goes wrong -- I
>> would fix whatever went wrong and start over anyway.
>>
>> But at some point, disk IO would have to be done. Is this just a function of
>> how big /pgsql/data/postgresql.conf's shared_buffers is set to? Or does it
>> have to do with wal_buffers and checkpoint_segments?
>
> I have done bulk data loads where I was typically loading hundreds of
> thousands of rows in as a single transaction, and it is worth
> observing that loading in data from a pg_dump will do exactly the same
> thing, where, in general, each table's data is loaded as a single
> transaction.

I guess a reasonable standard of performance would be that if my initial
population of the database takes only a little longer than a restore of the
database using pg_restore, I am pretty close, and that is good enough. Of
course, the restore depends on how fast my tape drive can pull the tape --
it claims up to 12 MB/sec transfer rate, so it looks as though it will be
tape-limited rather than postgreSQL-limited.
>
> It has tended to be the case that increasing the number of checkpoint
> segments is helpful, though it's less obvious that this is the case in
> 8.2 and later versions, what with the ongoing changes to checkpoint
> flushing.

I am running postgresql-8.1.9-1.el5 because that is what comes with RHEL5.
I probably will not upgrade until a little while after RHEL7 comes out,
since I hate upgrading.
>
> In general, this isn't something that typically needs to get tuned
> really finely; if you tune your DB, in general, "pretty big
> transactions" should generally work fine, up to rather large sizes of
> "pretty big."


--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 15:05:01 up 2 days, 7:23, 5 users, load average: 4.11, 4.22, 4.16

Re: Bunching "transactions"

From
Chris Browne
Date:
jeandavid8@verizon.net (Jean-David Beyer) writes:
> Chris Browne wrote:
>> jeandavid8@verizon.net (Jean-David Beyer) writes:
>>> But what is the limitation on such a thing? In this case, I am just
>>> populating the database and there are no other users at such a time. I am
>>> willing to lose the whole insert of a file if something goes wrong -- I
>>> would fix whatever went wrong and start over anyway.
>>>
>>> But at some point, disk IO would have to be done. Is this just a function of
>>> how big /pgsql/data/postgresql.conf's shared_buffers is set to? Or does it
>>> have to do with wal_buffers and checkpoint_segments?
>>
>> I have done bulk data loads where I was typically loading hundreds of
>> thousands of rows in as a single transaction, and it is worth
>> observing that loading in data from a pg_dump will do exactly the same
>> thing, where, in general, each table's data is loaded as a single
>> transaction.
>
> I guess a reasonable standard of performance would be that if my initial
> population of the database takes only a little longer than a restore of the
> database using pg_restore, I am pretty close, and that is good enough. Of
> course, the restore depends on how fast my tape drive can pull the tape --
> it claims up to 12 MB/sec transfer rate, so it looks as though it will be
> tape-limited rather than postgreSQL-limited.

That's quite possible.

There is a further factor, which is that grouping things into larger
transactions has very clearly diminishing returns.

Supposing you have a stream of 50,000 operations updating one tuple
(those could be UPDATE, DELETE, or INSERT; it is not, at first order,
material what sort they are), then the effects of grouping are thus...

- With none...

 Cost = cost of doing 50,000 updates
        + cost of doing 50,000 COMMITs

- If you COMMIT after every 2 updates

 Cost = cost of doing 50,000 updates
        + cost of doing 25,000 COMMITs

- If you COMMIT after every 10 updates

 Cost = cost of doing 50,000 updates
        + cost of doing 5,000 COMMITs

- If you COMMIT after every 100 updates

 Cost = cost of doing 50,000 updates
        + cost of doing 500 COMMITs

The amount of work that COMMIT does is fairly much constant,
regardless of the number of updates in the transaction, so that the
cost, in that equation, of COMMITs pretty quickly evaporates to
irrelevancy.

And increasing the sizes of the transactions does not give you
*increasing* performance improvements; the improvements will tend to
decline.

I wouldn't worry about trying to strictly minimize the number of
transactions COMMITted; once you have grouped "enough" data into one
transaction, that should be good enough.

Further, the Right Thing is to group related data together, and come
up with a policy that is driven primarily by the need for data
consistency.  If things work well enough, then don't go off trying to
optimize something that doesn't really need optimization, and perhaps
break the logic of the application.
--
output = ("cbbrowne" "@" "acm.org")
http://cbbrowne.com/info/unix.html
Users should cultivate an ability to make the simplest molehill into a
mountain   by   finding   controversial interpretations   of innocuous
sounding statements that the sender never intended or imagined.
-- from the Symbolics Guidelines for Sending Mail

Re: Bunching "transactions"

From
Jean-David Beyer
Date:
Heikki Linnakangas wrote:
> Jean-David Beyer wrote:
>
>> My IO system has two Ultra/320 LVD SCSI controllers and 6 10,000rpm SCSI
>> hard drives. The dual SCSI controller is on its own PCI-X bus (the machine
>> has 5 independent PCI-X busses). Two hard drives are on one SCSI controller
>> and the other four are on the other. The WAL is on the first controller,
>> most of the rest is on the other controller. Once in a while, I get 144
>> Megabytes/sec transfers for a few seconds at a time to the hard drive
>> system, where I have an advertizing-maximum of 640 Megabytes/second. Each
>> hard drive claims to take a sustained data rate of about 80
>> Megabytes/second. When I test it, I can get 55 and sometimes a little more
>> for a single drive.
>
> You might find that you get better performance by just putting all the
> drives on a single RAID array. Or not :-). I'm not a hardware guy
> myself, but having read this mailing list for some time, I've seen
> different people come to different conclusions on that one. I guess it
> depends on the hardware and the application.

In the old days, I was a "hardware guy." But not in the last 15 years or so
(although I did put this machine together from parts). Right now, I do not
think I would get more performance with a single RAID array. Certainly not
if it were software RAID. Right now, I have the WAL on one drive that is not
heavily used when doing bulk loading of the database, and the main data on
the other 4 drives on a different SCSI controller. Measurements revealed
that THE bottleneck was the writing to the WAL.

The indices for any one table are on a different drive from the data itself
to minimize seek contention (and IO transmission contention, too, but that
does not seem to be an issue). Note that now the machine is only in IO-WAIT
state less than 1% of the time, and I no longer notice the main postgres
server process in D state. It used to be in D state a lot of the time before
I started bunching transactions. The IO to the drive with the WAL dropped
from a little over 3000 sectors per second to about 700 sectors per second,
for one thing. And the IO bandwidth consumed dropped, IIRC, about 50%.
>
>> Likewise, I seemto have enough processing power.
>>
>> top - 12:47:22 up 2 days,  5:06,  4 users,  load average: 1.40, 3.13, 4.20
>> Tasks: 168 total,   3 running, 165 sleeping,   0 stopped,   0 zombie
>> Cpu0  : 29.5%us,  3.3%sy,  0.0%ni, 67.0%id,  0.2%wa,  0.0%hi,  0.0%si,
>> Cpu1  : 21.8%us,  3.1%sy,  0.0%ni, 73.7%id,  1.4%wa,  0.0%hi,  0.0%si,
>> Cpu2  : 24.6%us,  3.6%sy,  0.0%ni, 71.7%id,  0.1%wa,  0.0%hi,  0.0%si,
>> Cpu3  : 23.1%us,  2.7%sy,  0.0%ni, 74.0%id,  0.1%wa,  0.1%hi,  0.0%si,
>> Mem:   8185340k total,  5112656k used,  3072684k free,    32916k buffers
>> Swap:  4096496k total,      384k used,  4096112k free,  4549536k cached
>>
 > Actually it looks like you've saturated the CPU.

How do you figure that? There are two or four (depending on how you count
them) CPUs. The CPUs seem to be running at 75% idle. If I let BOINC
processes run (nice 19), I can soak up most of this idle time. I turned them
off for the purpose of these measurements because they hide the io-wait times.

> Postgres backends are
> single-threaded, so a single bulk load like that won't use more than one
> CPU at a time. If you add up the usr percentages above, it's ~100%.

If you add up the idle percentages, it is about 300%. Recall that there are
two hyperthreaded processors here. That is more than two processors (but
less than four). If I examine the postgres processes, one of them used to
get to 100% once in a while when I did things like DELETE FROM tablename;
but I do a TRUNCATE now and it is much faster. Now any single process peaks
at 80% of a CPU and usually runs at less than 50%. The postgres processes
run on multiple CPUS. Looking at the top command, normally my client runs at
around 20% on one CPU, the main postgres server runs on a second at between
30% and 80% (depends on which tables I am loading), and the writer runs on
yet another. The two loggers wander around more. But these last three run at
around 1% each. In fact, the writer is idle much of the time.
>
> You should switch to using COPY if you can.
>
Someone else posted that I should not get neurotic about squeezing the last
little bit out of this (not his exact words), and I agree. This is only for
doing an initial load of the database after all. And as long as the total
time is acceptable, that is good enough. When I first started this (using
DB2), one of the loads used to take something like 10 hours. Redesigning my
basic approach got that time down to about 2 hours without too much
programming effort. As the amount of data has increased, that started
creeping up, and one of the tables, that has about 6,000,000 entries at the
moment, took overnight to load. That is why I looked into bunching these
transactions, with gratifying results.

To use COPY, I would have to write a bunch of special purpose programs to
convert the data as I get them into a form that COPY could handle them. (I
imagine pg_dump and pg_restore use COPY). And running those would take time
too. There ought to be a law against making huge spreadsheets for data, but
people who should be using a relational database for things seem more
comfortable with spreadsheets. So that is the form in which I get these data.

From a programming point of view, I hate spreadsheets because the
calculations and the data are intermixed, and you cannot see what the
calculations are unless you specifically look for them. And the way people
design (if that is the proper term) a spreadsheet results in something that
could not be considered normalized in any sense of that term.
One of these tables has columns from A all the way to KC (I guess that is
over 300 columns), and I would not be able to use such a table even if
postgres would accept one. IIRC, DB2 would not take such a wide one, but I
am not sure about that anymore. Anyhow, I believe in separation of concerns,
and mixing programs and data as in spreadsheets is a step in the wrong
direction.

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 08:15:01 up 3 days, 33 min, 0 users, load average: 4.09, 4.15, 4.20

Re: Bunching "transactions"

From
Jean-David Beyer
Date:
Chris Browne wrote:

> Further, the Right Thing is to group related data together, and come
> up with a policy that is driven primarily by the need for data
> consistency.  If things work well enough, then don't go off trying to
> optimize something that doesn't really need optimization, and perhaps
> break the logic of the application.

Right. I think it was Jon Louis Bently who wrote (in his book, "Writing
Efficient Programs") something to the effect, "Premature optimization is the
root of all evil." Just because so much of it broke the logic of the
application (and did not help anyway). (Gotta profile first, for one thing.)

I had a boss once who insisted we write everyting in assembly language for
efficiency. We did not even know what algorithms we needed for the
application. And at the time (System 360 days), IBM did not even publish the
execution times for the instruction set of the machine we were using because
so many executed in zero-time -- overlapped with other instructions, local
caching in the processor, locality of memory reference, and so on. To get
efficiency, you must first get your algorithms right, including getting the
best ones for the problem at hand.

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 10:05:01 up 3 days, 2:23, 1 user, load average: 4.10, 4.24, 4.18

Re: Bunching "transactions"

From
Greg Smith
Date:
On Fri, 26 Oct 2007, Jean-David Beyer wrote:

> I think it was Jon Louis Bently who wrote (in his book, "Writing
> Efficient Programs") something to the effect, "Premature optimization is
> the root of all evil."

That quote originally comes from Tony Hoare, popularized by a paper
written by Donald Knuth in 1974.  The full statement is "We should forget
about small efficiencies, say about 97% of the time: premature
optimization is the root of all evil.  Yet we should not pass up our
opportunities in that critical 3%."

My favorite sound-bite on this topic is from William Wulf:  "More
computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason - including
blind stupidity."  That was back in 1972.  Both his and Knuth's papers
centered on abusing GOTO, which typically justified at the time via
performance concerns.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD