Thread: Bunching "transactions"
I have just changed around some programs that ran too slowly (too much time
in io-wait) and they sped up greatly. This was not unexpected, but I wonder
about the limitations.

By transaction, I mean a single INSERT or a few related INSERTs.

What I used to do is roughly like this:

    for each file {
        for each record {
            BEGIN WORK;
            INSERT stuff in table(s);
            if error {
                ROLLBACK WORK
            }
            else {
                COMMIT WORK;
            }
        }
    }

The speedup was the obvious one:

    for each file {
        BEGIN WORK;
        for each record {
            INSERT stuff in table(s);
        }
        if error {
            ROLLBACK WORK
        }
        else {
            COMMIT WORK;
        }
    }

This means, of course, that the things I think of as transactions have been
bunched into a much smaller number of what PostgreSQL thinks of as large
transactions, since there is only one per file rather than one per record.
Now if a file has several thousand records, this seems to work out just
great.

But what is the limitation on such a thing? In this case, I am just
populating the database and there are no other users at such a time. I am
willing to lose the whole insert of a file if something goes wrong -- I
would fix whatever went wrong and start over anyway.

But at some point, disk IO would have to be done. Is this just a function
of how big shared_buffers is set to in /pgsql/data/postgresql.conf? Or does
it have to do with wal_buffers and checkpoint_segments?

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 11:10:01 up 2 days, 3:28, 4 users, load average: 5.76, 5.70, 5.53
On Oct 25, 2007, at 10:30 AM, Jean-David Beyer wrote:

> I have just changed around some programs that ran too slowly (too much
> time in io-wait) and they sped up greatly. This was not unexpected, but I
> wonder about the limitations.
>
> By transaction, I mean a single INSERT or a few related INSERTs.
>
> What I used to do is roughly like this:
>
>     for each file {
>         for each record {
>             BEGIN WORK;
>             INSERT stuff in table(s);
>             if error {
>                 ROLLBACK WORK
>             }
>             else {
>                 COMMIT WORK;
>             }
>         }
>     }
>
> The speedup was the obvious one:
>
>     for each file {
>         BEGIN WORK;
>         for each record {
>             INSERT stuff in table(s);
>         }
>         if error {
>             ROLLBACK WORK
>         }
>         else {
>             COMMIT WORK;
>         }
>     }
>
> This means, of course, that the things I think of as transactions have
> been bunched into a much smaller number of what PostgreSQL thinks of as
> large transactions, since there is only one per file rather than one per
> record. Now if a file has several thousand records, this seems to work
> out just great.
>
> But what is the limitation on such a thing? In this case, I am just
> populating the database and there are no other users at such a time. I am
> willing to lose the whole insert of a file if something goes wrong -- I
> would fix whatever went wrong and start over anyway.
>
> But at some point, disk IO would have to be done. Is this just a function
> of how big shared_buffers is set to in /pgsql/data/postgresql.conf? Or
> does it have to do with wal_buffers and checkpoint_segments?

You're reading data from a file and generating inserts? Can you not use
COPY? That would be the most performant.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
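A minimal sketch of the COPY approach Erik is suggesting, using a
hypothetical table and file name (the real column list, file location, and
delimiter depend on the data being loaded):

    -- Server-side load of a tab-delimited file (table and path are made up):
    COPY stock_prices (symbol, trade_date, close_price)
        FROM '/tmp/prices.dat';

    -- Or from psql, reading the file on the client machine instead:
    \copy stock_prices from 'prices.dat'

A single COPY is one statement and therefore one implicit transaction, so
it gets the same batching benefit as wrapping many INSERTs in one
BEGIN/COMMIT, with much less per-row parsing overhead.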
Jean-David Beyer wrote:
> This means, of course, that the things I think of as transactions have
> been bunched into a much smaller number of what PostgreSQL thinks of as
> large transactions, since there is only one per file rather than one per
> record. Now if a file has several thousand records, this seems to work
> out just great.

Using the small transactions, you were limited by the speed at which your
hard disk can flush the commit WAL records to disk. With small transactions
like that, it's not about bandwidth but about the latency of the hard
drive. Using larger transactions helps because you get more work done on
each disk operation.

The upcoming 8.3 release will have a feature called "asynchronous commit",
which should speed up those small transactions dramatically if you don't
want to batch them into larger transactions like you did:

http://www.postgresql.org/docs/8.3/static/runtime-config-wal.html#GUC-SYNCHRONOUS-COMMIT

> But what is the limitation on such a thing? In this case, I am just
> populating the database and there are no other users at such a time. I am
> willing to lose the whole insert of a file if something goes wrong -- I
> would fix whatever went wrong and start over anyway.
>
> But at some point, disk IO would have to be done. Is this just a function
> of how big shared_buffers is set to in /pgsql/data/postgresql.conf? Or
> does it have to do with wal_buffers and checkpoint_segments?

Well, you have to do the I/O eventually, regardless of shared_buffers.
Common wisdom is that increasing wal_buffers from the default helps with
bulk loading like that, up to a point. Increasing checkpoint_segments helps
as well. After you've done all that, you're going to be limited by either
the bandwidth of your I/O system or the speed of your CPU, depending on
your hardware. Using COPY instead of INSERTs will help if it's the CPU.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
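For concreteness, the knobs Heikki mentions all live in postgresql.conf; a
rough sketch with illustrative values only, not recommendations (and
synchronous_commit exists only from 8.3 onward):

    # postgresql.conf -- illustrative values for a bulk load, not tuned advice
    wal_buffers = 128            # counted in 8 kB pages on older releases (~1 MB);
                                 # the default is much smaller
    checkpoint_segments = 32     # allow more WAL between checkpoints during the load
    #synchronous_commit = off    # 8.3+ only: the asynchronous-commit feature above

wal_buffers only takes effect after a server restart, and the best values
depend on the hardware, so treat these as a starting point for
experimentation rather than settled numbers.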
Jean-David Beyer <jeandavid8@verizon.net> writes:
> But what is the limitation on such a thing?

AFAIR, the only limit on the size of a transaction is 2^32 commands
(due to CommandCounter being 32 bits).

			regards, tom lane
jeandavid8@verizon.net (Jean-David Beyer) writes:
> But what is the limitation on such a thing? In this case, I am just
> populating the database and there are no other users at such a time. I am
> willing to lose the whole insert of a file if something goes wrong -- I
> would fix whatever went wrong and start over anyway.
>
> But at some point, disk IO would have to be done. Is this just a function
> of how big shared_buffers is set to in /pgsql/data/postgresql.conf? Or
> does it have to do with wal_buffers and checkpoint_segments?

I have done bulk data loads where I was typically loading hundreds of
thousands of rows in as a single transaction, and it is worth observing
that loading in data from a pg_dump will do exactly the same thing, where,
in general, each table's data is loaded as a single transaction.

It has tended to be the case that increasing the number of checkpoint
segments is helpful, though it's less obvious that this is the case in 8.2
and later versions, what with the ongoing changes to checkpoint flushing.

In general, this isn't something that typically needs to get tuned really
finely; if you tune your DB, in general, "pretty big transactions" should
generally work fine, up to rather large sizes of "pretty big."
--
"cbbrowne","@","acm.org"  http://linuxdatabases.info/info/languages.html
"Why use Windows, since there is a door?"
  -- <fachat@galileo.rhein-neckar.de> Andre Fachat
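As an illustration of what Chris describes, the data section of a
plain-text pg_dump consists of one COPY per table, and each COPY runs as a
single statement and hence a single transaction. The snippet below is a
hand-written sketch of that shape; the table and rows are made up, and the
columns in the data lines are tab-separated:

    COPY mytable (id, name) FROM stdin;
    1	alpha
    2	beta
    \.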
Chris Browne wrote:
> jeandavid8@verizon.net (Jean-David Beyer) writes:
>> But what is the limitation on such a thing? In this case, I am just
>> populating the database and there are no other users at such a time. I
>> am willing to lose the whole insert of a file if something goes wrong --
>> I would fix whatever went wrong and start over anyway.
>>
>> But at some point, disk IO would have to be done. Is this just a
>> function of how big shared_buffers is set to in
>> /pgsql/data/postgresql.conf? Or does it have to do with wal_buffers and
>> checkpoint_segments?
>
> I have done bulk data loads where I was typically loading hundreds of
> thousands of rows in as a single transaction, and it is worth observing
> that loading in data from a pg_dump will do exactly the same thing,
> where, in general, each table's data is loaded as a single transaction.

I guess a reasonable standard of performance would be that if my initial
population of the database takes only a little longer than a restore of the
database using pg_restore, I am pretty close, and that is good enough. Of
course, the restore depends on how fast my tape drive can pull the tape --
it claims up to 12 MB/sec transfer rate, so it looks as though it will be
tape-limited rather than PostgreSQL-limited.

> It has tended to be the case that increasing the number of checkpoint
> segments is helpful, though it's less obvious that this is the case in
> 8.2 and later versions, what with the ongoing changes to checkpoint
> flushing.

I am running postgresql-8.1.9-1.el5 because that is what comes with RHEL5.
I probably will not upgrade until a little while after RHEL7 comes out,
since I hate upgrading.

> In general, this isn't something that typically needs to get tuned really
> finely; if you tune your DB, in general, "pretty big transactions" should
> generally work fine, up to rather large sizes of "pretty big."

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 15:05:01 up 2 days, 7:23, 5 users, load average: 4.11, 4.22, 4.16
jeandavid8@verizon.net (Jean-David Beyer) writes:
> Chris Browne wrote:
>> jeandavid8@verizon.net (Jean-David Beyer) writes:
>>> But what is the limitation on such a thing? In this case, I am just
>>> populating the database and there are no other users at such a time. I
>>> am willing to lose the whole insert of a file if something goes wrong
>>> -- I would fix whatever went wrong and start over anyway.
>>>
>>> But at some point, disk IO would have to be done. Is this just a
>>> function of how big shared_buffers is set to in
>>> /pgsql/data/postgresql.conf? Or does it have to do with wal_buffers and
>>> checkpoint_segments?
>>
>> I have done bulk data loads where I was typically loading hundreds of
>> thousands of rows in as a single transaction, and it is worth observing
>> that loading in data from a pg_dump will do exactly the same thing,
>> where, in general, each table's data is loaded as a single transaction.
>
> I guess a reasonable standard of performance would be that if my initial
> population of the database takes only a little longer than a restore of
> the database using pg_restore, I am pretty close, and that is good
> enough. Of course, the restore depends on how fast my tape drive can pull
> the tape -- it claims up to 12 MB/sec transfer rate, so it looks as
> though it will be tape-limited rather than PostgreSQL-limited.

That's quite possible.

There is a further factor, which is that grouping things into larger
transactions has very clearly diminishing returns. Supposing you have a
stream of 50,000 operations updating one tuple (those could be UPDATE,
DELETE, or INSERT; it is not, at first order, material what sort they are),
then the effects of grouping are thus...

- With none...
  Cost = cost of doing 50,000 updates + cost of doing 50,000 COMMITs

- If you COMMIT after every 2 updates
  Cost = cost of doing 50,000 updates + cost of doing 25,000 COMMITs

- If you COMMIT after every 10 updates
  Cost = cost of doing 50,000 updates + cost of doing 5,000 COMMITs

- If you COMMIT after every 100 updates
  Cost = cost of doing 50,000 updates + cost of doing 500 COMMITs

The amount of work that COMMIT does is fairly much constant, regardless of
the number of updates in the transaction, so that the cost, in that
equation, of COMMITs pretty quickly evaporates to irrelevancy. And
increasing the sizes of the transactions does not give you *increasing*
performance improvements; the improvements will tend to decline.

I wouldn't worry about trying to strictly minimize the number of
transactions COMMITted; once you have grouped "enough" data into one
transaction, that should be good enough.

Further, the Right Thing is to group related data together, and come up
with a policy that is driven primarily by the need for data consistency.
If things work well enough, then don't go off trying to optimize something
that doesn't really need optimization, and perhaps break the logic of the
application.
--
output = ("cbbrowne" "@" "acm.org")
http://cbbrowne.com/info/unix.html
Users should cultivate an ability to make the simplest molehill into a
mountain by finding controversial interpretations of innocuous sounding
statements that the sender never intended or imagined.
  -- from the Symbolics Guidelines for Sending Mail
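If one did want a middle ground between per-record and per-file commits,
the original pseudocode could commit every N records instead. A sketch, in
the same pseudocode style, with N = 1000 chosen arbitrarily (note that on
an error the batches already committed for the current file would remain,
unlike in the one-transaction-per-file version):

    for each file {
        BEGIN WORK;
        count = 0;
        for each record {
            INSERT stuff in table(s);
            count = count + 1;
            if (count % 1000 == 0) {   /* commit every 1000 records */
                COMMIT WORK;
                BEGIN WORK;
            }
        }
        if error {
            ROLLBACK WORK;             /* only the current batch is lost */
        }
        else {
            COMMIT WORK;
        }
    }

As Chris's arithmetic shows, once N is in the hundreds or thousands the
per-COMMIT cost is already a small fraction of the total, so pushing N
higher buys little.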
Heikki Linnakangas wrote:
> Jean-David Beyer wrote:
>
>> My IO system has two Ultra/320 LVD SCSI controllers and 6 10,000rpm SCSI
>> hard drives. The dual SCSI controller is on its own PCI-X bus (the
>> machine has 5 independent PCI-X busses). Two hard drives are on one SCSI
>> controller and the other four are on the other. The WAL is on the first
>> controller, most of the rest is on the other controller. Once in a
>> while, I get 144 Megabytes/sec transfers for a few seconds at a time to
>> the hard drive system, where I have an advertised maximum of 640
>> Megabytes/second. Each hard drive claims to take a sustained data rate
>> of about 80 Megabytes/second. When I test it, I can get 55 and sometimes
>> a little more for a single drive.
>
> You might find that you get better performance by just putting all the
> drives on a single RAID array. Or not :-). I'm not a hardware guy myself,
> but having read this mailing list for some time, I've seen different
> people come to different conclusions on that one. I guess it depends on
> the hardware and the application.

In the old days, I was a "hardware guy." But not in the last 15 years or so
(although I did put this machine together from parts). Right now, I do not
think I would get more performance with a single RAID array. Certainly not
if it were software RAID. Right now, I have the WAL on one drive that is
not heavily used when doing bulk loading of the database, and the main data
on the other 4 drives on a different SCSI controller. Measurements revealed
that THE bottleneck was the writing to the WAL. The indices for any one
table are on a different drive from the data itself to minimize seek
contention (and IO transmission contention, too, but that does not seem to
be an issue).

Note that now the machine is only in IO-WAIT state less than 1% of the
time, and I no longer notice the main postgres server process in D state.
It used to be in D state a lot of the time before I started bunching
transactions. The IO to the drive with the WAL dropped from a little over
3000 sectors per second to about 700 sectors per second, for one thing. And
the IO bandwidth consumed dropped, IIRC, about 50%.

>> Likewise, I seem to have enough processing power.
>>
>> top - 12:47:22 up 2 days, 5:06, 4 users, load average: 1.40, 3.13, 4.20
>> Tasks: 168 total, 3 running, 165 sleeping, 0 stopped, 0 zombie
>> Cpu0 : 29.5%us, 3.3%sy, 0.0%ni, 67.0%id, 0.2%wa, 0.0%hi, 0.0%si,
>> Cpu1 : 21.8%us, 3.1%sy, 0.0%ni, 73.7%id, 1.4%wa, 0.0%hi, 0.0%si,
>> Cpu2 : 24.6%us, 3.6%sy, 0.0%ni, 71.7%id, 0.1%wa, 0.0%hi, 0.0%si,
>> Cpu3 : 23.1%us, 2.7%sy, 0.0%ni, 74.0%id, 0.1%wa, 0.1%hi, 0.0%si,
>> Mem:  8185340k total, 5112656k used, 3072684k free,   32916k buffers
>> Swap: 4096496k total,     384k used, 4096112k free, 4549536k cached
>
> Actually it looks like you've saturated the CPU.

How do you figure that? There are two or four (depending on how you count
them) CPUs. The CPUs seem to be running at 75% idle. If I let BOINC
processes run (nice 19), I can soak up most of this idle time. I turned
them off for the purpose of these measurements because they hide the
io-wait times.

> Postgres backends are single-threaded, so a single bulk load like that
> won't use more than one CPU at a time. If you add up the usr percentages
> above, it's ~100%.

If you add up the idle percentages, it is about 300%. Recall that there are
two hyperthreaded processors here. That is more than two processors (but
less than four).
If I examine the postgres processes, one of them used to get to 100% once
in a while when I did things like DELETE FROM tablename; but I do a
TRUNCATE now and it is much faster. Now any single process peaks at 80% of
a CPU and usually runs at less than 50%. The postgres processes run on
multiple CPUs. Looking at the top command, normally my client runs at
around 20% on one CPU, the main postgres server runs on a second at between
30% and 80% (depends on which tables I am loading), and the writer runs on
yet another. The two loggers wander around more. But these last three run
at around 1% each. In fact, the writer is idle much of the time.

> You should switch to using COPY if you can.

Someone else posted that I should not get neurotic about squeezing the last
little bit out of this (not his exact words), and I agree. This is only for
doing an initial load of the database, after all. And as long as the total
time is acceptable, that is good enough.

When I first started this (using DB2), one of the loads used to take
something like 10 hours. Redesigning my basic approach got that time down
to about 2 hours without too much programming effort. As the amount of data
has increased, that started creeping up, and one of the tables, which has
about 6,000,000 entries at the moment, took overnight to load. That is why
I looked into bunching these transactions, with gratifying results.

To use COPY, I would have to write a bunch of special-purpose programs to
convert the data as I get them into a form that COPY could handle. (I
imagine pg_dump and pg_restore use COPY.) And running those would take time
too. There ought to be a law against making huge spreadsheets for data, but
people who should be using a relational database for things seem more
comfortable with spreadsheets. So that is the form in which I get these
data.

From a programming point of view, I hate spreadsheets because the
calculations and the data are intermixed, and you cannot see what the
calculations are unless you specifically look for them. And the way people
design (if that is the proper term) a spreadsheet results in something that
could not be considered normalized in any sense of that term. One of these
tables has columns from A all the way to KC (I guess that is nearly 300
columns), and I would not be able to use such a table even if postgres
would accept one. IIRC, DB2 would not take such a wide one, but I am not
sure about that anymore. Anyhow, I believe in separation of concerns, and
mixing programs and data as in spreadsheets is a step in the wrong
direction.

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 08:15:01 up 3 days, 33 min, 0 users, load average: 4.09, 4.15, 4.20
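For what it's worth, if those spreadsheets can be exported to CSV files,
COPY's CSV mode (available since 8.0) can often read such an export
directly, so the conversion step may not need a special-purpose program. A
hypothetical psql example, with made-up table and file names:

    \copy wide_table from 'export.csv' with csv header

Whether that is worth doing for a load that is already fast enough is, as
noted above, a judgment call.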
Chris Browne wrote:

> Further, the Right Thing is to group related data together, and come up
> with a policy that is driven primarily by the need for data consistency.
> If things work well enough, then don't go off trying to optimize
> something that doesn't really need optimization, and perhaps break the
> logic of the application.

Right. I think it was Jon Louis Bentley who wrote (in his book, "Writing
Efficient Programs") something to the effect of "premature optimization is
the root of all evil" -- largely because so much of it broke the logic of
the application and did not help anyway. (You have to profile first, for
one thing.)

I had a boss once who insisted we write everything in assembly language for
efficiency. We did not even know what algorithms we needed for the
application. And at the time (System 360 days), IBM did not even publish
the execution times for the instruction set of the machine we were using,
because so many instructions executed in zero time -- overlapped with other
instructions, local caching in the processor, locality of memory reference,
and so on. To get efficiency, you must first get your algorithms right,
including choosing the best ones for the problem at hand.

--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 10:05:01 up 3 days, 2:23, 1 user, load average: 4.10, 4.24, 4.18
On Fri, 26 Oct 2007, Jean-David Beyer wrote:

> I think it was Jon Louis Bentley who wrote (in his book, "Writing
> Efficient Programs") something to the effect of "premature optimization
> is the root of all evil."

That quote originally comes from Tony Hoare, popularized by a paper written
by Donald Knuth in 1974. The full statement is "We should forget about
small efficiencies, say about 97% of the time: premature optimization is
the root of all evil. Yet we should not pass up our opportunities in that
critical 3%."

My favorite sound-bite on this topic is from William Wulf: "More computing
sins are committed in the name of efficiency (without necessarily achieving
it) than for any other single reason - including blind stupidity." That was
back in 1972. Both his and Knuth's papers centered on abusing GOTO, which
was typically justified at the time by performance concerns.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD