Thread: performance of insert/delete/update
There has been a great deal of discussion on this list about how to improve the performance of selects and sorting; what about insert/delete/update? Are there any rules of thumb we need to follow? What are the parameters we should tweak to whip the horse to go faster? Thanks -- Wei Weng Network Software Engineer KenCast Inc.
Wei, > There had been a great deal of discussion of how to improve the > performance of select/sorting on this list, what about > insert/delete/update? > > Is there any rules of thumb we need to follow? What are the > parameters > we should tweak to whip the horse to go faster? yes, lots of rules. Wanna be more specific? You wondering about query structure, hardware, memory config, what? -Josh Berkus
On Thu, 2002-11-21 at 16:23, Josh Berkus wrote: > Wei, > > > There had been a great deal of discussion of how to improve the > > performance of select/sorting on this list, what about > > insert/delete/update? > > > > Is there any rules of thumb we need to follow? What are the > > parameters > > we should tweak to whip the horse to go faster? > > yes, lots of rules. Wanna be more specific? You wondering about > query structure, hardware, memory config, what? I am most concerned about the software side, that is query structures and postgresql config. Thanks -- Wei Weng Network Software Engineer KenCast Inc.
On 21 Nov 2002, Wei Weng wrote: > On Thu, 2002-11-21 at 16:23, Josh Berkus wrote: > > Wei, > > > > > There had been a great deal of discussion of how to improve the > > > performance of select/sorting on this list, what about > > > insert/delete/update? > > > > > > Is there any rules of thumb we need to follow? What are the > > > parameters > > > we should tweak to whip the horse to go faster? > > > > yes, lots of rules. Wanna be more specific? You wondering about > > query structure, hardware, memory config, what? > I am most concerned about the software side, that is query structures > and postgresql config. The absolutely most important thing to do to speed up inserts and updates is to squeeze as many as you can into one transaction. Within reason, of course. There's no great gain in putting more than a few thousand together at a time. If your application is only doing one or two updates in a transaction, it's going to be slower in terms of records written per second than an application that is updating 100 rows in a transaction. Reducing triggers and foreign keys on the inserted tables to a minimum helps. Inserting into temporary holding tables and then having a regular process that migrates the data into the main tables is sometimes necessary if you're putting a lot of smaller inserts into a very large dataset. Then use a unioned view to show the two tables as one. Putting WAL (i.e. the $PGDATA/pg_xlog directory) on its own drive(s). Putting indexes that have to be updated during inserts onto their own drive(s). Performing regular vacuums on heavily updated tables. Also, if your hardware is reliable, you can turn off fsync in postgresql.conf. That can increase performance by anywhere from 2 to 10 times, depending on your application.
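A minimal sketch of the two ideas here that come up most often -- batching statements into one transaction, and the holding table presented through a unioned view. All table, column, and view names below are invented for illustration; treat it as a sketch rather than tested code:

    CREATE TABLE packets (station integer, payload text);

    -- Autocommit: each INSERT is its own transaction and (with fsync on)
    -- waits for its own WAL flush.
    INSERT INTO packets (station, payload) VALUES (1, '...');
    INSERT INTO packets (station, payload) VALUES (2, '...');

    -- Batched: one COMMIT, one WAL flush for the whole group.
    BEGIN;
    INSERT INTO packets (station, payload) VALUES (1, '...');
    INSERT INTO packets (station, payload) VALUES (2, '...');
    -- ... up to a few thousand rows ...
    COMMIT;

    -- The holding-table idea: a small, lightly indexed side table takes the
    -- stream of small inserts, and a view presents both tables as one.
    CREATE TABLE packets_incoming AS SELECT * FROM packets WHERE false;
    CREATE VIEW packets_all AS
        SELECT * FROM packets
        UNION ALL
        SELECT * FROM packets_incoming;
    -- A periodic job then moves rows from packets_incoming into packets
    -- and vacuums the small table.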
Scott, > The absolutely most important thing to do to speed up inserts and > updates > is to squeeze as many as you can into one transaction. Within > reason, of > course. There's no great gain in putting more than a few thousand > together at a time. If your application is only doing one or two > updates > in a transaction, it's going to be slower in terms of records written > per > second than an application that is updating 100 rows in a > transaction. This only works up to the limit of the memory you have available for Postgres. If the updates in one transaction exceed your available memory, you'll see a lot of swapping to disk that will slow things down by a factor of 10-50 times. > Reducing triggers and foreign keys on the inserted tables to a > minimum > helps. ... provided that this will not jeopardize your data integrity. If you have indispensable triggers in PL/pgSQL, rewriting them in C will make them, and thus updates on their tables, faster. Also, for foreign keys, it speeds up inserts and updates on parent tables with many child records if the foreign key column in the child table is indexed. > Putting WAL (e.g. $PGDATA/pg_xlog directory) on it's own drive(s). > > Putting indexes that have to be updated during inserts onto their own > > drive(s). > > Performing regular vacuums on heavily updated tables. > > Also, if your hardware is reliable, you can turn off fsync in > postgresql.conf. That can increase performance by anywhere from 2 to > 10 > times, depending on your application. It can be dangerous though ... in the event of a power outage, for example, your database could be corrupted and difficult to recover. So ... "at your own risk". I've found that switching from fsync to fdatasync on Linux yields a marginal performance gain ... about 10-20%. Also, if you are doing large updates (many records at once) you may want to increase WAL_FILES and CHECKPOINT_BUFFER in postgresql.conf to allow for large transactions. Finally, you want to structure your queries so that you do the minimum number of update or insert writes possible. For example, a procedure that inserts a row, does some calculations, and then modifies several fields in that row is going to slow stuff down significantly compared to doing the calculations as variables and only a single insert. Certainly don't hit a table with 8 updates, each updating one field, instead of a single update statement. -Josh Berkus
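To make the foreign-key point concrete, a sketch with invented table names: declaring REFERENCES does not by itself create an index on the referencing column, so the constraint checks fired by updates and deletes on the parent have to scan the child table until you add one.

    CREATE TABLE customers (
        id   integer PRIMARY KEY,
        name text
    );

    CREATE TABLE orders (
        id          integer PRIMARY KEY,
        customer_id integer REFERENCES customers (id),
        total       numeric
    );

    -- Without this, every UPDATE or DELETE on customers does a sequential
    -- scan of orders to verify the constraint.
    CREATE INDEX orders_customer_id_idx ON orders (customer_id);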
On Thu, 21 Nov 2002, Josh Berkus wrote: > Scott, > > > The absolutely most important thing to do to speed up inserts and > > updates > > is to squeeze as many as you can into one transaction. Within > > reason, of > > course. There's no great gain in putting more than a few thousand > > together at a time. If your application is only doing one or two > > updates > > in a transaction, it's going to be slower in terms of records written > > per > > second than an application that is updating 100 rows in a > > transaction. > > This only works up to the limit of the memory you have available for > Postgres. If the updates in one transaction exceed your available > memory, you'll see a lot of swaps to disk log that will slow things > down by a factor of 10-50 times. Sorry, but that isn't true. MVCC means we don't have to hold all the data in memory; we can have multiple versions of the same tuples on disk and use memory for what it's meant for: buffering. The performance gain comes from the fact that postgresql doesn't have to perform the data consistency checks needed during an insert until after all the rows are inserted, and it can "gang check" them. > > Reducing triggers and foreign keys on the inserted tables to a > > minimum > > helps. > > ... provided that this will not jeapordize your data integrity. If you > have indispensable triggers in PL/pgSQL, re-qriting them in C will make > them, and thus updates on their tables, faster. Agreed. But you've probably seen the occasional "I wasn't sure if we needed that check or not, so I threw it in just in case" kind of database design. :-) I definitely don't advocate just tossing all your FKs to make it run faster. Also note that many folks have replaced foreign keys with triggers and gained in performance, as fks in pgsql still have some deadlock issues to be worked out. > Also, for foriegn keys, it speeds up inserts and updates on parent > tables with many child records if the foriegn key column in the child > table is indexed. Absolutely. > > Putting WAL (e.g. $PGDATA/pg_xlog directory) on it's own drive(s). > > > > Putting indexes that have to be updated during inserts onto their own > > > > drive(s). > > > > Performing regular vacuums on heavily updated tables. > > > > Also, if your hardware is reliable, you can turn off fsync in > > postgresql.conf. That can increase performance by anywhere from 2 to > > 10 > > times, depending on your application. > > It can be dangerous though ... in the event of a power outage, for > example, your database could be corrupted and difficult to recover. So > ... "at your own risk". No, the database will not be corrupted, at least not in my experience. However, you MAY lose data from transactions that you thought were committed. I think Tom posted something about this a few days back. > I've found that switching from fsync to fdatasync on Linux yields > marginal performance gain ... about 10-20%. I'll have to try that. > Also, if you are doing large updates (many records at once) you may > want to increase WAL_FILES and CHECKPOINT_BUFFER in postgresql.conf to > allow for large transactions. Actually, postgresql will create more WAL files if it needs to in order to handle the size of a transaction. BUT, it won't create extra ones for heavier parallel load without being told to. I've inserted 100,000 rows at a time with no problem on a machine with only 1 WAL file specified, and it didn't burp. It does run faster with multiple WAL files under parallel load.
> Finally, you want to structure your queries so that you do the minimum > number of update writes possible, or insert writes. For example, a > procedure that inserts a row, does some calculations, and then modifies > several fields in that row is going to slow stuff down significantly > compared to doing the calculations as variables and only a single > insert. Certainly don't hit a table with 8 updates, each updating one > field instead of a single update statement. This is critical, and bites many people coming from a row-level-locking database to an MVCC database. In MVCC every update creates a new on-disk tuple. I think someone on the list a while back was updating their database something like this:

update table set field1='abc' where id=1;
update table set field2='def' where id=1;
update table set field3='ghi' where id=1;
update table set field4='jkl' where id=1;
update table set field5='mno' where id=1;
update table set field6='pqr' where id=1;

and they had to vacuum something like every 5 minutes. Also, things like:

update table set field1=field1+1

are killers in an MVCC database as well.
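For contrast, the single-statement version of that series (writing some_table for the placeholder table name): one new on-disk tuple instead of six, and far less for VACUUM to clean up afterwards.

    UPDATE some_table
       SET field1 = 'abc',
           field2 = 'def',
           field3 = 'ghi',
           field4 = 'jkl',
           field5 = 'mno',
           field6 = 'pqr'
     WHERE id = 1;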
Scott, > > This only works up to the limit of the memory you have available for > > Postgres. If the updates in one transaction exceed your available > > memory, you'll see a lot of swaps to disk log that will slow things > > down by a factor of 10-50 times. > > Sorry, but that isn't true. MVCC means we don't have to hold all the data > in memory, we can have multiple versions of the same tuples on disk, and > use memory for what it's meant for, buffering. Sorry, you're absolutely correct. I don't know what I was thinking of; that's the problem with an off-the-cuff response. Please disregard the previous quote. Instead: Doing several large updates in a single transaction can lower performance if the number of updates is sufficient to affect index usability and a VACUUM is really needed between them. For example, a series of large data transformation statements on a single table or set of related tables should have VACUUM statements between them, thus preventing you from putting them in a single transaction. For example, the series:

1. INSERT 10,000 ROWS INTO table_a;
2. UPDATE 100,000 ROWS IN table_a WHERE table_b;
3. UPDATE 100,000 ROWS IN table_c WHERE table_a;

will almost certainly need a VACUUM or even VACUUM FULL table_a after 2), requiring you to split the update series into 2 transactions. Otherwise, the "where table_a" condition in step 3) will be extremely slow. > Also note that many folks have replaced foreign keys with triggers and > gained in performance, as fks in pgsql still have some deadlock issues to > be worked out. Yeah. I think Neil Conway is overhauling FKs, which everyone considers a bit of a hack in the current implementation, including Jan who wrote it. > > It can be dangerous though ... in the event of a power outage, for > > example, your database could be corrupted and difficult to recover. So > > ... "at your own risk". > > No, the database will not be corrupted, at least not in my experience. > however, you MAY lose data from transactions that you thought were > committed. I think Tom posted something about this a few days back. Hmmm ... have you done this? I'd like the performance gain, but I don't want to risk my data integrity. I've seen some awful things in databases (such as duplicate primary keys) from yanking a power cord repeatedly. > update table set field1=field1+1 > > are killers in an MVCC database as well. Yeah -- don't I know it. -- -Josh Berkus Aglio Database Solutions San Francisco
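Spelled out as a rough sketch -- the steps above aren't literal SQL, and the table and column names below are invented; the only point is where the transaction boundaries and the VACUUM go (VACUUM cannot run inside a transaction block):

    BEGIN;
    INSERT INTO table_a SELECT * FROM staging_a;                 -- the ~10,000-row load
    UPDATE table_a SET flag = true
      FROM table_b WHERE table_a.id = table_b.a_id;              -- the ~100,000-row update
    COMMIT;

    -- Reclaims the dead table_a tuples and refreshes statistics before the
    -- next big statement has to read table_a.
    VACUUM ANALYZE table_a;

    BEGIN;
    UPDATE table_c SET flag = true
      FROM table_a WHERE table_c.a_id = table_a.id;
    COMMIT;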
On Thu, 21 Nov 2002, Josh Berkus wrote: > Doing several large updates in a single transaction can lower performance if > the number of updates is sufficient to affect index usability and a VACUUM is > really needed between them. For example, a series of large data > transformation statements on a single table or set of related tables should > have VACCUUM statements between them, thus preventing you from putting them > in a single transaction. > > Example, the series: > 1. INSERT 10,000 ROWS INTO table_a; > 2. UPDATE 100,000 ROWS IN table_a WHERE table_b; > 3. UPDATE 100,000 ROWS IN table_c WHERE table_a; > > WIll almost certainly need a VACUUM or even VACUUM FULL table_a after 2), > requiring you to split the update series into 2 transactions. Otherwise, the > "where table_a" condition in step 3) will be extremely slow. Very good point. One that points out the different mind set one needs when dealing with pgsql. > > > It can be dangerous though ... in the event of a power outage, for > > > example, your database could be corrupted and difficult to recover. So > > > ... "at your own risk". > > > > No, the database will not be corrupted, at least not in my experience. > > however, you MAY lose data from transactions that you thought were > > committed. I think Tom posted something about this a few days back. > > Hmmm ... have you done this? I'd like the performance gain, but I don't want > to risk my data integrity. I've seen some awful things in databases (such as > duplicate primary keys) from yanking a power cord repeatedly. I have, with killall -9 postmaster, on several occasions during testing under heavy parallel load. I've never had 7.2.x fail because of this.
Scott, > > > The absolutely most important thing to do to speed up inserts and > > > updates > > > is to squeeze as many as you can into one transaction. I was discussing this on IRC, and nobody could verify this assertion. Do you have an example of bundling multiple writes into a transaction giving a performance gain? -Josh
On Fri, 2002-11-22 at 22:18, Josh Berkus wrote: > Scott, > > > > > The absolutely most important thing to do to speed up inserts and > > > > updates > > > > is to squeeze as many as you can into one transaction. > > I was discussing this on IRC, and nobody could verify this assertion. > Do you have an example of bunlding multiple writes into a transaction > giving a performance gain? Unfortunately, I missed the beginning of this thread, but I do know that eliminating as many indexes as possible is the answer. If I'm going to insert "lots" of rows in an off-line situation, then I'll drop *all* of the indexes, load the data, then re-index. If deleting "lots", then I'll drop all but the 1 relevant index, then re-index afterwards. As for bundling multiple statements into a transaction to increase performance, I think the questions are: - how much disk IO does one BEGIN TRANSACTION do? If it *does* do disk IO, then "bundling" *will* be more efficient, since less disk IO will be performed. - are, for example, 500 COMMITs of small amounts of data more or less efficient than 1 COMMIT of a large chunk of data? On the proprietary database that I use at work, efficiency goes up, then levels off at ~100 inserts per transaction. Ron -- +------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "they love our milk and honey, but preach about another | | way of living" | | Merle Haggard, "The Fighting Side Of Me" | +------------------------------------------------------------+
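A sketch of the drop-and-rebuild pattern Ron describes, with invented table and index names (for the load itself, COPY is usually faster still than batched INSERTs):

    -- Before the off-line load:
    DROP INDEX big_table_col_a_idx;
    DROP INDEX big_table_col_b_idx;

    BEGIN;
    -- ... thousands of INSERTs into big_table, or a single COPY ...
    COMMIT;

    -- Rebuild each index in one pass (much cheaper than maintaining them
    -- row by row during the load), then refresh planner statistics:
    CREATE INDEX big_table_col_a_idx ON big_table (col_a);
    CREATE INDEX big_table_col_b_idx ON big_table (col_b);
    ANALYZE big_table;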
Ron, > As for bundling multiple statements into a transaction to increase > performance, I think the questions are: > - how much disk IO does one BEGIN TRANSACTION do? If it *does* > do disk IO, then "bundling" *will* be more efficient, since > less disk IO will be performed. > - are, for example, 500 COMMITs of small amounts of data more or > less efficient than 1 COMMIT of a large chunk of data? On the > proprietary database that I use at work, efficiency goes up, > then levels off at ~100 inserts per transaction. That's because some commercial databases (MS SQL, Sybase) use an "unwinding transaction log" method of updating. That is, during a transaction, changes are written only to the transaction log, and those changes are "played" to the database only on a COMMIT. It's an approach that is more efficient for large transactions, but has the unfortunate side effect of *requiring* read and write row locks for the duration of the transaction. In Postgres, with MVCC, changes are written to the database immediately with a new transaction ID and the new rows are "activated" on COMMIT. So the changes are written to the database as the statements are executed, regardless. This is less efficient for large transactions than the "unwinding log" method, but has the advantage of eliminating read locks entirely and most deadlock situations. Under MVCC, then, I am not convinced that bundling a bunch of writes into one transaction is faster until I see it demonstrated. I certainly see no performance gain on my system. -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > Under MVCC, then, I am not convinced that bundling a bunch of writes into one > transaction is faster until I see it demonstrated. I certainly see no > performance gain on my system. Are you running with fsync off? The main reason for bundling updates into larger transactions is that each transaction commit requires an fsync on the WAL log. If you have fsync enabled, it is physically impossible to commit transactions faster than one per revolution of the WAL disk, no matter how small the transactions. (*) So it pays to make the transactions larger, not smaller. On my machine I see a sizable difference (more than 2x) in the rate at which simple INSERT statements are processed as separate transactions and as large batches --- if I have fsync on. With fsync off, nearly no difference. regards, tom lane (*) See recent proposals from Curtis Faith in pgsql-hackers about how we might circumvent that limit ... but it's there today.
Tom, > On my machine I see a sizable difference (more than 2x) in the rate at > which simple INSERT statements are processed as separate transactions > and as large batches --- if I have fsync on. With fsync off, nearly no > difference. I'm using fdatasync, which *does* perform faster than fsync on my system. Could this make the difference? -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: >> On my machine I see a sizable difference (more than 2x) in the rate at >> which simple INSERT statements are processed as separate transactions >> and as large batches --- if I have fsync on. With fsync off, nearly no >> difference. > I'm using fdatasych, which *does* perform faster than fsych on my system. > Could this make the difference? No; you still have to write the data and wait for the disk to spin. (FWIW, PG defaults to wal_sync_method = open_datasync on my system, and that's what I used in checking the speed just now. So I wasn't actually executing any fsync() calls either.) On lots of PC hardware, the disks are configured to lie and report write complete as soon as they've accepted the data into their internal buffers. If you see very little difference between fsync on and fsync off, or if you are able to benchmark transaction rates in excess of your disk's RPM, you should suspect that your disk drive is lying to you. As an example: in testing INSERT speed on my old HP box just now, I got measured rates of about 16000 inserts/minute with fsync off, and 5700/min with fsync on (for 1 INSERT per transaction). Knowing that my disk drive is 6000 RPM, the latter number is credible. On my PC I get numbers way higher than the disk rotation rate :-( regards, tom lane
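For anyone wanting to sanity-check numbers like these against their own hardware, the arithmetic is simple:

    6000 RPM / 60 = 100 revolutions of the WAL disk per second
    => a ceiling of roughly 100 commits/sec, or about 6000 single-INSERT transactions per minute

The measured 5700/min with fsync on sits just under that ceiling, as expected; the 16000/min with fsync off shows the rate once rotation stops being the limit.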
Tom, > As an example: in testing INSERT speed on my old HP box just now, > I got measured rates of about 16000 inserts/minute with fsync off, and > 5700/min with fsync on (for 1 INSERT per transaction). Knowing that my > disk drive is 6000 RPM, the latter number is credible. On my PC I get > numbers way higher than the disk rotation rate :-( Thanks for the info. As long as I have your ear, what's your opinion on the risk level of running with fsync off on a production system? I've seen a lot of posts on this list claiming there is little danger, but I'm a bit paranoid. -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > Thanks for the info. As long as I have your ear, what's your opinion on the > risk level of running with fsynch off on a production system? Depends on how much you trust your hardware, kernel, and power source. Fsync off does not introduce any danger from Postgres crashes --- we always write data out of userspace to the kernel before committing. The question is whether writes can be relied on to get to disk once the kernel has 'em. There is a definite risk of data corruption (not just lost transactions, but actively inconsistent database contents) if you suffer a system-level crash while running with fsync off. The theory of WAL (which remember means write *ahead* log) is that it protects you from data corruption as long as WAL records always hit disk before the associated changes in database data files do. Then after a crash you can replay the WAL to make sure you have actually done all the changes described by each readable WAL record, and presto you're consistent up to the end of the readable WAL. But if data file writes can get to disk in advance of their WAL record, you could have a situation where some but not all changes described by a WAL record are in the database after a system crash and recovery. This could mean incompletely applied transactions, broken indexes, or who knows what. When you get right down to it, what we use fsync for is to force write ordering --- Unix kernels do not guarantee write ordering any other way. We use it to ensure WAL records hit disk before data file changes do. Bottom line: I wouldn't run with fsync off in a mission-critical database. If you're prepared to accept a risk of having to restore from your last backup after a system crash, maybe it's okay. regards, tom lane
Tom, > When you get right down to it, what we use fsync for is to force write > ordering --- Unix kernels do not guarantee write ordering any other way. > We use it to ensure WAL records hit disk before data file changes do. > > Bottom line: I wouldn't run with fsync off in a mission-critical > database. If you're prepared to accept a risk of having to restore from > your last backup after a system crash, maybe it's okay. Thanks for that overview. Sadly, even with fsync on, I was forced to restore from backup because the data needs to be 100% reliable and the crash was due to a disk lockup on a checkpoint ... beyond the ability of WAL to deal with, I think. One last, last question: I was just asked a question on IRC, and I can't find docs defining fsync, fdatasync, open_sync, and open_datasync beyond section 11.3, which just says that they are all sync methods. Are there docs? -- -Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > One last, last question: I was just asked a question on IRC, and I > can't find docs defining fsynch, fdatasynch, opensynch, and > opendatasynch beyond section 11.3 which just says that they are all > synch methods. Are there docs? Section 11.3 of what? The only mention of open_datasync that I see in the docs is in the Admin Guide chapter 3: http://developer.postgresql.org/docs/postgres/runtime-config.html#RUNTIME-CONFIG-WAL which saith WAL_SYNC_METHOD (string) Method used for forcing WAL updates out to disk. Possible values are FSYNC (call fsync() at each commit), FDATASYNC (call fdatasync() at each commit), OPEN_SYNC (write WAL files with open() option O_SYNC), or OPEN_DATASYNC (write WAL files with open() option O_DSYNC). Not all of these choices are available on all platforms. This option can only be set at server start or in the postgresql.conf file. This may not help you much to decide which to use :-(, but it does tell you what they are. regards, tom lane
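For anyone checking their own server, a small sketch (SHOW should work on servers of this vintage; which values are accepted depends on the platform):

    -- From psql, see what the server is currently using:
    SHOW wal_sync_method;

    -- To change it, edit postgresql.conf and restart the postmaster, e.g.:
    --   wal_sync_method = fdatasync   -- only where fdatasync() is available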
On Fri, 22 Nov 2002, Josh Berkus wrote: > Scott, > > > > > The absolutely most important thing to do to speed up inserts and > > > > updates > > > > is to squeeze as many as you can into one transaction. > > I was discussing this on IRC, and nobody could verify this assertion. > Do you have an example of bunlding multiple writes into a transaction > giving a performance gain? Yes, my own experience. It's quite easy to test if you have a database with a large table to play with, use pg_dump to dump a table with the -d switch (makes the dump use insert statements.) Then, make two versions of the dump, one which has a begin;end; pair around all the inserts and one that doesn't, then use psql -e to restore both dumps. The difference is HUGE. Around 10 to 20 times faster with the begin end pairs. I'd think that anyone who's used postgresql for more than a few months could corroborate my experience.
Scott, > It's quite easy to test if you have a database with a large table to play > with, use pg_dump to dump a table with the -d switch (makes the dump use > insert statements.) Then, make two versions of the dump, one which has a > begin;end; pair around all the inserts and one that doesn't, then use psql > -e to restore both dumps. The difference is HUGE. Around 10 to 20 times > faster with the begin end pairs. > > I'd think that anyone who's used postgresql for more than a few months > could corroborate my experience. Ouch! No need to get testy about it. Your test works as you said; the way I tried testing it before was different. Good to know. However, this approach is only useful if you are doing rapidfire updates or inserts coming off a single connection. But then it is *very* useful. -- -Josh Berkus Aglio Database Solutions San Francisco
On Mon, 25 Nov 2002, Josh Berkus wrote: > Scott, > > > It's quite easy to test if you have a database with a large table to play > > with, use pg_dump to dump a table with the -d switch (makes the dump use > > insert statements.) Then, make two versions of the dump, one which has a > > begin;end; pair around all the inserts and one that doesn't, then use psql > > -e to restore both dumps. The difference is HUGE. Around 10 to 20 times > > faster with the begin end pairs. > > > > I'd think that anyone who's used postgresql for more than a few months > > could corroborate my experience. > > Ouch! > > No need to get testy about it. > > Your test works as you said; the way I tried testing it before was different. > Good to know. However, this approach is only useful if you are doing > rapidfire updates or inserts coming off a single connection. But then it is > *very* useful. I didn't mean that in a testy way, it's just that after you've sat through a fifteen-minute wait while 1000 records are inserted, you pretty quickly switch to the method of inserting them all in one big transaction. That's all. Note that the opposite is what really gets people in trouble. I've seen folks inserting rather large amounts of data, say into ten or 15 tables, and their web servers were crawling under parallel load. Then, they put them into a single transaction and they just flew. The funny thing is, they've often avoided transactions because they figured they'd be slower than just inserting the rows, and you kinda have to make them sit down first before you show them the performance increase from putting all those inserts into a single transaction. No offense meant, really. It's just that you seemed to really doubt that putting things into one transaction helped, and putting things into one big transaction is like the very first postgresql lesson a lot of newcomers learn. :-)
>The funny thing it, they've often avoided transactions because they >figured they'd be slower than just inserting the rows, and you kinda have >to make them sit down first before you show them the performance increase >from putting all those inserts into a single transaction. > >No offense meant, really. It's just that you seemed to really doubt that >putting things into one transaction helped, and putting things into one >big transaction if like the very first postgresql lesson a lot of >newcomers learn. :-) Scott, I'm new to postgresql, and as you suggested, this is counter-intuitive to me. I would have thought that having to store all the inserts to be able to roll them back would take longer. Is my thinking wrong or not relevant? Why is this not the case? Thanks, Tim
> I'm new to postgresql, and as you suggested, this is > counter-intuitive to me. I would have thought that having to store > all the inserts to be able to roll them back would take longer. Is > my thinking wrong or not relevant? Why is this not the case? Typically that is the case. But Postgresql switches it around a little bit. Different trade-offs. No rollback log, but other processes are forced to go through your leftover garbage (hence 'vacuum'). It's still kinda slow with hundreds of connections (as compared to Oracle) -- but close enough that a license fee -> hardware purchase funds transfer more than makes up for it. Get yourself a 1GB battery-backed ramdisk on its own SCSI chain for WAL and it'll fly no matter what size of transaction you use ;) -- Rod Taylor <rbt@rbt.ca>
On Mon, 25 Nov 2002, Tim Gardner wrote: > >The funny thing it, they've often avoided transactions because they > >figured they'd be slower than just inserting the rows, and you kinda have > >to make them sit down first before you show them the performance increase > >from putting all those inserts into a single transaction. > > > >No offense meant, really. It's just that you seemed to really doubt that > >putting things into one transaction helped, and putting things into one > >big transaction if like the very first postgresql lesson a lot of > >newcomers learn. :-) > > Scott, > > I'm new to postgresql, and as you suggested, this is > counter-intuitive to me. I would have thought that having to store > all the inserts to be able to roll them back would take longer. Is > my thinking wrong or not relevant? Why is this not the case? Your thinking on this is wrong, and it is counter-intuitive to think that a transaction would speed things up. Postgresql is very different from other databases. Postgresql was designed from day one as a transactional database. Which is why it was so bothersome that an Oracle marketroid recently was telling the .org folks why they shouldn't use Postgresql because it didn't have transactions. Postgresql may have a few warts here and there, but not supporting transactions has NEVER been a problem for it. There are two factors that make Postgresql so weird in regards to transactions. One is that everything happens in a transaction (we won't mention truncate for a while, it's the only exception I know of.) The next factor that makes for fast inserts of large amounts of data in a transaction is MVCC. With Oracle and many other databases, transactions are written into a separate log file, and when you commit, they are inserted into the database as one big group. This means you write your data twice, once into the transaction log, and once into the database. With Postgresql's implementation of MVCC, all your data are inserted in real time, with a transaction date that makes the other clients ignore them (mostly, other read committed transactions may or may not see them.) If there are indexes to update, they are updated in the same "invisible until committed" way. All this means that your inserts don't block anyone else's reads as well. This means that when you commit, all postgresql does is make them visible. In the event you roll back a transaction, the tuples are all just marked as dead and they get ignored. It's interesting when you work with folks who came from other databases. My coworker, who's been using Postgresql for about 2 years now, had an interesting experience when he first started here. He was inserting something like 10,000 rows. He comes over and tells me there must be something wrong with the database, as his inserts have been running for 10 minutes, and he's not even halfway through. So I had him stop the inserts, clean out the rows (it was a new table for a new project) and wrap all 10,000 inserts into a transaction. What had been running for 10 minutes now ran in about 30 seconds. He was floored. Well, good luck on using postgresql, and definitely keep in touch with the performance and general mailing lists. They're a wealth of useful info.
On 25 Nov 2002, Rod Taylor wrote: > > I'm new to postgresql, and as you suggested, this is > > counter-intuitive to me. I would have thought that having to store > > all the inserts to be able to roll them back would take longer. Is > > my thinking wrong or not relevant? Why is this not the case? > > Typically that is the case. But Postgresql switches it around a little > bit. Different trade-offs. No rollback log, but other processes are > forced to go through you're left over garbage (hence 'vacuum'). Yeah, which means you always need to do a vacuum on a table after a lot of updates/deletes. And analyze after a lot of inserts/updates/deletes. > It's still kinda slow with hundreds of connections (as compared to > Oracle) -- but close enough that a license fee -> hardware purchase > funds transfer more than makes up for it. Ain't it amazing how much hardware a typical Oracle license can buy? ;^) Heck, even the license cost MS-SQL server is enough to buy a nice quad Xeon with all the trimmings nowadays. Then you can probably still have enough left over for one of the pgsql developers to fly out and train your folks on it.
>With Postgresql's implementation of MVCC, all your data are inserted in >real time, with a transaction date that makes the other clients ignore >them (mostly, other read committed transactions may or may not see them.) > >If there are indexes to update, they are updated in the same "invisible >until committed" way. > >All this means that your inserts don't block anyone else's reads as well. > >This means that when you commit, all postgresql does is make them visible. scott, Exactly the kind of explanation/understanding I was hoping for! Thank you! Tim
On Mon, 25 Nov 2002, scott.marlowe wrote: > On Mon, 25 Nov 2002, Tim Gardner wrote: > > > I'm new to postgresql, and as you suggested, this is > > counter-intuitive to me. I would have thought that having to store > > all the inserts to be able to roll them back would take longer. Is > > my thinking wrong or not relevant? Why is this not the case? > > Your thinking on this is wrong, and it is counter-intuitive to think that > a transaction would speed things up. Postgresql is very different from > other databases. Sorry that came out like that, I meant to write: I meant to add in there that I thought the same way at first, and only after a little trial and much error did I realize that I was thinking in terms of how other databases did things. I.e. most people make the same mistake when starting out with pgsql.
I have seen a number of real-world situations where bundling inserts into transactions made a considerable difference - sometimes as much as a 100x speed-up, and not just in Postgresql databases, but also commercial systems (my experience is in Oracle & Sybase). I've often used an idiom of building up rows until I hit some high-water mark, and then insert those rows in one fell swoop - it's almost always measurably faster than one-at-a-time. Sidebar: a number of years ago, while contracting at a regional telephone company, I became peripherally enmired in a gigantic billing-system-makeover fiasco. Upon initial deployment, the system was so slow at processing that it was taking about 30 hours for each day of billing data. After a week or so, when it became apparent that fundamental Cash Flow was threatened, there were multi-hour conference calls, in which various VPs called for massive h/w upgrades and/or lawsuits against Oracle. An astute cohort of mine asked to see some of the code, and found out that the original developers (at the telco) had created a bloated and slow control system in C++, using semaphores or somesuch, to *serialize* inserts/updates/deletes, and so they had gigantic home-built queues of insert jobs. Not only were they not bundling updates in transactions, they were only ever doing one transaction at a time. (Apparently, they never learned RDBMS fundamentals.) He told them to rip out all that code, and let Oracle (like any other decent RDBMS) handle the update ordering. The resultant speed-up factor was several hundred times. -R On Mon, 25 Nov 2002 15:59:16 -0700 (MST), "scott.marlowe" <scott.marlowe@ihs.com> said: > On Mon, 25 Nov 2002, Josh Berkus wrote: > > > Scott, > > > > > It's quite easy to test if you have a database with a large table to play > > > with, use pg_dump to dump a table with the -d switch (makes the dump use > > > insert statements.) Then, make two versions of the dump, one which has a > > > begin;end; pair around all the inserts and one that doesn't, then use psql > > > -e to restore both dumps. The difference is HUGE. Around 10 to 20 times > > > faster with the begin end pairs. > > > > > > I'd think that anyone who's used postgresql for more than a few months > > > could corroborate my experience. > > > > Ouch! > > > > No need to get testy about it. > > > > Your test works as you said; the way I tried testing it before was different. > > Good to know. However, this approach is only useful if you are doing > > rapidfire updates or inserts coming off a single connection. But then it is > > *very* useful. > > I didn't mean that in a testy way, it's just that after you've sat > through > a fifteen minute wait while a 1000 records are inserted, you pretty > quickly switch to the method of inserting them all in one big > transaction. That's all. > > Note that the opposite is what really gets people in trouble. I've seen > folks inserting rather large amounts of data, say into ten or 15 tables, > and their web servers were crawling under parallel load. Then, they put > them into a single transaction and they just flew. > > The funny thing it, they've often avoided transactions because they > figured they'd be slower than just inserting the rows, and you kinda have > to make them sit down first before you show them the performance increase > from putting all those inserts into a single transaction. > > No offense meant, really. 
It's just that you seemed to really doubt that > putting things into one transaction helped, and putting things into one > big transaction if like the very first postgresql lesson a lot of > newcomers learn. :-)
Scott, > No offense meant, really. It's just that you seemed to really doubt that > putting things into one transaction helped, and putting things into one > big transaction if like the very first postgresql lesson a lot of > newcomers learn. :-) Not so odd, if you think about it. After all, this approach is only useful for a series of small update/insert statements on a single connection. Thinking about it, I frankly never do this except as part of a stored procedure ... which, in Postgres, is automatically a transaction. I'm lucky enough that my data loads have all been adaptable to COPY statements, which bypasses this issue completely. -- -Josh Berkus Aglio Database Solutions San Francisco
On Mon, 2002-11-25 at 18:23, scott.marlowe wrote: > On Mon, 25 Nov 2002, Tim Gardner wrote: > [snip] > > There are two factors that make Postgresql so weird in regards to > transactions. One it that everything happens in a transaction (we won't > mention truncate for a while, it's the only exception I know of.) Why is this so weird? Do I use the /other/ weird RDBMS? (Rdb/VMS) > The next factor that makes for fast inserts of large amounts of data in a > transaction is MVCC. With Oracle and many other databases, transactions > are written into a seperate log file, and when you commit, they are > inserted into the database as one big group. This means you write your > data twice, once into the transaction log, and once into the database. You are just deferring the pain. Whereas others must flush from log to "database files", they do not have to VACUUM or VACUUM ANALYZE. > With Postgresql's implementation of MVCC, all your data are inserted in > real time, with a transaction date that makes the other clients ignore > them (mostly, other read committed transactions may or may not see them.) Is this unusual? (Except that Rdb/VMS uses a 64-bit integer (a Transaction Sequence Number) instead of a timestamp, because Rdb, coming from VAX/VMS, is natively cluster-aware, and it's not guaranteed that all nodes have the exact same timestamp.) [snip] > In the event you roll back a transaction, the tuples are all just marked > as dead and they get ignored. What if you are in a 24x365 environment? Doing a VACUUM ANALYZE would really slow down the nightly operations. > It's interesting when you work with folks who came from other databases. > My coworker, who's been using Postgresql for about 2 years now, had an > interesting experience when he first started here. He was inserting > something like 10,000 rows. He comes over and tells me there must be > something wrong with the database, as his inserts have been running for 10 > minutes, and he's not even halfway through. So I had him stop the > inserts, clean out the rows (it was a new table for a new project) and > wrap all 10,000 inserts into a transaction. What had been running for 10 > minutes now ran in about 30 seconds. Again, why is this so unusual????? -- +------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "they love our milk and honey, but preach about another | | way of living" | | Merle Haggard, "The Fighting Side Of Me" | +------------------------------------------------------------+
On Mon, 2002-11-25 at 18:43, Rich Scott wrote: [snip] > upgrades and/or lawsuits against Oracle. An astute cohort of mine asked > to > see some of the code, and found out that the original developers (at the > telco) > had created a bloated and slow control system in C++, using semaphores or > somesuch, > to *serialize* inserts/updates/deletes, and so they had gigantic > home-built > queues of insert jobs. Not only were they not bundling updates in > transactions, > they were only ever doing one transaction at a time. (Apparently, they > never > learned RDBMS fundamentals.) He told them to rip out all that code, and > let > Oracle (like any other decent RDBMS) handle the update ordering. The > resultant > speed-up factor was several hundred times. Just goes to show the difference between RDBMSs. Even with the biggest Alphas possible at the time, our Customer Service Center app was crawling because of lock conflicts (each transaction affected ~10-12 tables). Rdb/VMS works best with a TP monitor (specifically ACMS). We weren't using one, and weren't handling lock conflicts in the best way. Thus, the slowdowns. The developers finally wrote a home-grown dedicated TP-like server, using DEC Message Queue and /serialized/ data flow. Without all of the lock conflicts, performance warped forward. The reason for this, I think, is that Oracle serializes things itself, whereas Rdb/VMS expects ACMS to do the serialization. (Since Rdb & ACMS were both written by DEC, that's understandable, I think.) -- +------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "they love our milk and honey, but preach about another | | way of living" | | Merle Haggard, "The Fighting Side Of Me" | +------------------------------------------------------------+
Ron Johnson <ron.l.johnson@cox.net> writes: > On Mon, 2002-11-25 at 18:23, scott.marlowe wrote: >> The next factor that makes for fast inserts of large amounts of data in a >> transaction is MVCC. With Oracle and many other databases, transactions >> are written into a seperate log file, and when you commit, they are >> inserted into the database as one big group. This means you write your >> data twice, once into the transaction log, and once into the database. > You are just deferring the pain. Whereas others must flush from log > to "database files", they do not have to VACUUM or VACUUM ANALYZE. Sure, it's just shuffling the housekeeping work from one place to another. The thing that I like about Postgres' approach is that we put the housekeeping in a background task (VACUUM) rather than in the critical path of foreground transaction commit. regards, tom lane
Tim Gardner <tgardner@codeHorse.com> writes: >> All this means that your inserts don't block anyone else's reads as well. >> This means that when you commit, all postgresql does is make them visible. > Exactly the kind of explanation/understanding I was hoping for! There's another point worth making. What Scott was pointing out is that whether you commit or roll back a transaction costs about the same, in Postgres, as far as tuple update processing is concerned. At the end of a transaction, we have both new (inserted/updated) and old (deleted/replaced) tuples laying about in the database. Commit marks the transaction committed in pg_clog; abort marks it aborted instead; neither one lifts a finger to touch the tuples. (Subsequent visitors to the tuples will mark them "good" or "dead" based on consulting pg_clog, but we don't try to do that during transaction commit.) But having said all that, transaction commit is more expensive than transaction abort, because we have to flush the transaction commit WAL record to disk before we can report "transaction successfully committed". That means waiting for the disk to spin. Transaction abort doesn't have to wait --- that's because if there's a crash and the abort record never makes it to disk, the default assumption on restart will be that the transaction aborted, anyway. So the basic reason that it's worth batching multiple updates into one transaction is that you only wait for the commit record flush once, not once per update. This makes no difference worth mentioning if your updates are big, but on modern hardware you can update quite a few individual rows in the time it takes the disk to spin once. (BTW, if you set fsync = off, then the performance difference goes away, because we don't wait for the commit record to flush to disk ... but then you become vulnerable to problems after a system crash.) regards, tom lane
On Mon, 2002-11-25 at 19:30, scott.marlowe wrote: > On 25 Nov 2002, Rod Taylor wrote: > > > I'm new to postgresql, and as you suggested, this is > > > counter-intuitive to me. I would have thought that having to store > > > all the inserts to be able to roll them back would take longer. Is > > > my thinking wrong or not relevant? Why is this not the case? > > > > Typically that is the case. But Postgresql switches it around a little > > bit. Different trade-offs. No rollback log, but other processes are > > forced to go through you're left over garbage (hence 'vacuum'). > > Yeah, which means you always need to do a vacuum on a table after a lot of > updates/deletes. And analyze after a lot of inserts/updates/deletes. A good auto-vacuum daemon will help that out :) Not really any different than an OO db's garbage collection process -- except PG's vacuum is several orders of magnitude faster. -- Rod Taylor <rbt@rbt.ca>
On Mon, 2002-11-25 at 21:30, Tom Lane wrote: > Ron Johnson <ron.l.johnson@cox.net> writes: > > On Mon, 2002-11-25 at 18:23, scott.marlowe wrote: > >> The next factor that makes for fast inserts of large amounts of data in a > >> transaction is MVCC. With Oracle and many other databases, transactions > >> are written into a seperate log file, and when you commit, they are > >> inserted into the database as one big group. This means you write your > >> data twice, once into the transaction log, and once into the database. > > > You are just deferring the pain. Whereas others must flush from log > > to "database files", they do not have to VACUUM or VACUUM ANALYZE. > > Sure, it's just shuffling the housekeeping work from one place to > another. The thing that I like about Postgres' approach is that we > put the housekeeping in a background task (VACUUM) rather than in the > critical path of foreground transaction commit. If you have a quiescent point somewhere in the middle of the night... It's all about differing philosophies, though, and there's no way that Oracle will re-write Rdb/VMS (they bought it from DEC in 1997 for its high-volume OLTP technologies) and you all won't re-write Postgres... -- +------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "they love our milk and honey, but preach about another | | way of living" | | Merle Haggard, "The Fighting Side Of Me" | +------------------------------------------------------------+
tom lane wrote: > Sure, it's just shuffling the housekeeping work from one place to > another. The thing that I like about Postgres' approach is that we > put the housekeeping in a background task (VACUUM) rather than in the > critical path of foreground transaction commit. Thinking with my marketing hat on, MVCC would be a much bigger win if VACUUM was not required (or was done automagically). The need for periodic VACUUM just gives ammunition to the PostgreSQL opponents who can claim we are deferring work but that it amounts to the same thing. A fully automatic background VACUUM will significantly reduce but will not eliminate this perceived weakness. However, it always seemed to me there should be some way to reuse the space more dynamically and quickly than a background VACUUM thereby reducing the percentage of tuples that are expired in heavy update cases. If only a very tiny number of tuples on the disk are expired this will reduce the aggregate performance/space penalty of MVCC into insignificance for the majority of uses. Couldn't we reuse tuple and index space as soon as there are no transactions that depend on the old tuple or index values. I have imagined that this was always part of the long-term master plan. Couldn't we keep a list of dead tuples in shared memory and look in the list first when deciding where to place new values for inserts or updates so we don't have to rely on VACUUM (even a background one)? If there are expired tuple slots in the list these would be used before allocating a new slot from the tuple heap. The only issue is determining the lowest transaction ID for in-process transactions which seems relatively easy to do (if it's not already done somewhere). In the normal shutdown and startup case, a tuple VACUUM could be performed automatically. This would normally be very fast since there would not be many tuples in the list. Index slots would be handled differently since these cannot be substituted one for another. However, these could be recovered as part of every index page update. Pages would be scanned before being written and any expired slots that had transaction ID's lower than the lowest active slot would be removed. This could be done for non-leaf pages as well and would result in only reorganizing a page that is already going to be written thereby not adding much to the overall work. I don't think that internal pages that contain pointers to values in nodes further down the tree that are no longer in the leaf nodes because of this partial expired entry elimination will cause a problem since searches and scans will still work fine. Does VACUUM do something that could not be handled in this realtime manner? - Curtis
On Mon, Nov 25, 2002 at 05:30:00PM -0700, scott.marlowe wrote: > > Ain't it amazing how much hardware a typical Oracle license can buy? ;^) Not to mention the hardware budget for a typical Oracle installation. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
On Mon, Nov 25, 2002 at 07:41:03PM -0600, Ron Johnson wrote: > > What if you are in a 24x365 environment? Doing a VACUUM ANALYZE would > really slow down the nightly operations. Why? After upgrading to 7.2, we find it a good idea to do frequent vacuum analyse on frequently-changed tables. It doesn't block, and if you vacuum frequently enough, it goes real fast. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
On Tue, 26 Nov 2002, Andrew Sullivan wrote: > On Mon, Nov 25, 2002 at 07:41:03PM -0600, Ron Johnson wrote: > > > > What if you are in a 24x365 environment? Doing a VACUUM ANALYZE would > > really slow down the nightly operations. > > Why? After upgrading to 7.2, we find it a good idea to do frequent > vacuum analyse on frequently-changed tables. It doesn't block, and > if you vacuum frequently enough, it goes real fast. For example, I just ran pgbench -c 20 -t 200 (20 concurrent clients) with a script in the background that looked like this:

#!/bin/bash
for ((a=0;a=1;a=0))
do
  vacuumdb -z postgres
done

(i.e. run vacuumdb in analyze mode against the database continuously.) Output of top:

71 processes: 63 sleeping, 8 running, 0 zombie, 0 stopped
CPU0 states: 66.2% user, 25.1% system, 0.0% nice, 8.1% idle
CPU1 states: 79.4% user, 18.3% system, 0.0% nice, 1.2% idle
Mem: 254660K av, 249304K used, 5356K free, 26736K shrd, 21720K buff
Swap: 3084272K av, 1300K used, 3082972K free 142396K cached

  PID USER      PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM  TIME COMMAND
21381 postgres   11   0  1304 1304   868 S    10.8  0.5  0:00 pgbench
21393 postgres   14   0  4832 4832  4116 R     8.4  1.8  0:00 postmaster
21390 postgres    9   0  4880 4880  4164 S     7.8  1.9  0:00 postmaster
21385 postgres   14   0  4884 4884  4168 R     6.7  1.9  0:00 postmaster
21399 postgres    9   0  4768 4768  4076 S     6.3  1.8  0:00 postmaster
21402 postgres    9   0  4776 4776  4076 S     6.1  1.8  0:00 postmaster
21383 postgres   14   0  4828 4828  4112 R     5.9  1.8  0:00 postmaster
21386 postgres   14   0  4872 4872  4156 R     5.9  1.9  0:00 postmaster
21392 postgres    9   0  4820 4820  4104 S     5.9  1.8  0:00 postmaster
21409 postgres   11   0  4600 4600  3544 R     5.8  1.8  0:00 postmaster
21387 postgres    9   0  4824 4824  4108 S     5.4  1.8  0:00 postmaster
21394 postgres    9   0  4808 4808  4092 S     5.4  1.8  0:00 postmaster
21391 postgres    9   0  4816 4816  4100 S     5.0  1.8  0:00 postmaster
21398 postgres    9   0  4796 4796  4088 S     5.0  1.8  0:00 postmaster
21384 postgres    9   0  4756 4756  4040 R     4.8  1.8  0:00 postmaster
21389 postgres    9   0  4788 4788  4072 S     4.8  1.8  0:00 postmaster
21397 postgres    9   0  4772 4772  4056 S     4.6  1.8  0:00 postmaster
21388 postgres    9   0  4780 4780  4064 S     4.4  1.8  0:00 postmaster
21396 postgres    9   0  4756 4756  4040 S     4.3  1.8  0:00 postmaster
21395 postgres   14   0  4760 4760  4044 S     4.1  1.8  0:00 postmaster
21401 postgres   14   0  4736 4736  4036 R     4.1  1.8  0:00 postmaster
21400 postgres    9   0  4732 4732  4028 S     2.9  1.8  0:00 postmaster
21403 postgres    9   0  1000 1000   820 S     2.4  0.3  0:00 vacuumdb
21036 postgres    9   0  1056 1056   828 R     2.0  0.4  0:27 top
18615 postgres    9   0  1912 1912  1820 S     1.1  0.7  0:01 postmaster
21408 postgres    9   0   988  988   804 S     0.7  0.3  0:00 psql

So, pgbench is the big eater of CPU at 10%, each postmaster using about 5%, and vacuumdb using 2.4%. Note that after a second, the vacuumdb use drops off to 0% until it finishes and runs again. The output of pgbench without vacuumdb running (but with top, to be fair) was:

number of clients: 20
number of transactions per client: 200
number of transactions actually processed: 4000/4000
tps = 54.428632 (including connections establishing)
tps = 54.847276 (excluding connections establishing)

While the output with vacuumdb running continuously was:

number of clients: 20
number of transactions per client: 200
number of transactions actually processed: 4000/4000
tps = 52.114343 (including connections establishing)
tps = 52.873435 (excluding connections establishing)

So, the difference in performance was around 4% slower. I'd hardly consider that a big hit against the database.
Note that in every test I've made up and run, the difference is at most 5%
with vacuumdb -z running continuously in the background.  Big text fields,
lots of math, lots of fks, etc...

Yes, vacuum WAS a problem long ago, but since 7.2 came out it's only a
"problem" in terms of remembering to run it.
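A rough way to reproduce this kind of comparison yourself, sketched below;
the scratch database name and the pgbench scale factor are assumptions, not
values scott reported:

#!/bin/bash
# Scratch database for the benchmark; adjust the name and scale to taste.
DB=pgbench_scratch
createdb "$DB"
pgbench -i -s 10 "$DB"            # create and populate the pgbench tables

pgbench -c 20 -t 200 "$DB"        # baseline: 20 clients, 200 transactions each

# Same run again, with VACUUM ANALYZE looping in the background.
while true; do vacuumdb -z "$DB"; done &
VAC=$!
pgbench -c 20 -t 200 "$DB"
kill "$VAC"                       # stop the vacuum loop (a vacuum already in
                                  # progress will simply run to completion)

Comparing the two tps figures gives the same kind of before/after numbers
shown above.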
On Tue, Nov 26, 2002 at 11:06:47AM -0700, scott.marlowe wrote:
> So, the difference in performance was around 4% slower.
>
> I'd hardly consider that a big hit against the database.
>
> Note that in every test I've made up and run, the difference is at most 5%
> with vacuumdb -z running continuously in the background.  Big text fields,
> lots of math, lots of fks, etc...

Also, it's important to remember that you may see a considerable
improvement in the efficiency of some queries if you vacuum often (it's
partly dependent on the turnover in your database -- if it never changes,
you don't need to vacuum often).  So a 5% hit in regular performance may be
worth it over the long haul if certain queries become much cheaper to run.
(That is, while you may get 4% slower performance overall, if the really
slow queries get much faster, the slight slowdown of the fast queries may
well be worth it.  In my case, certainly, I think it is.)

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
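One way to see this effect on your own data is to compare a query before and
after vacuuming.  This is only a sketch; the database and table names are
placeholders, and EXPLAIN ANALYZE requires 7.2 or later:

    psql -d mydb -c "EXPLAIN ANALYZE SELECT count(*) FROM hot_table;"
    psql -d mydb -c "VACUUM ANALYZE hot_table;"
    psql -d mydb -c "EXPLAIN ANALYZE SELECT count(*) FROM hot_table;"

If the table has accumulated many dead tuples since the last vacuum, the
second run is usually noticeably faster, which is the long-haul payoff
Andrew is describing.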
On Tue, 26 Nov 2002, Andrew Sullivan wrote:
> On Tue, Nov 26, 2002 at 11:06:47AM -0700, scott.marlowe wrote:
> > So, the difference in performance was around 4% slower.
> >
> > I'd hardly consider that a big hit against the database.
> >
> > Note that in every test I've made up and run, the difference is at most 5%
> > with vacuumdb -z running continuously in the background.  Big text fields,
> > lots of math, lots of fks, etc...
>
> Also, it's important to remember that you may see a considerable
> improvement in the efficiency of some queries if you vacuum often (it's
> partly dependent on the turnover in your database -- if it never
> changes, you don't need to vacuum often).  So a 5% hit in regular
> performance may be worth it over the long haul if certain queries
> become much cheaper to run.  (That is, while you may get 4% slower
> performance overall, if the really slow queries get much faster, the
> slight slowdown of the fast queries may well be worth it.  In my case,
> certainly, I think it is.)

Agreed.  We used to run vacuumdb at night only when we were running 7.1,
and we had a script to detect if it had hung or anything; i.e., vacuuming
was still a semi-dangerous activity.  I now have it set to run every hour
(-z -a switches to vacuumdb).  I'd run it more often, but we just don't
have enough load to warrant it.
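The hourly schedule scott mentions can be driven by cron.  Something like
the following crontab line would do it; the output redirect and the
assumption that vacuumdb can reach the server through the usual PG*
environment variables are mine, not scott's:

    0 * * * *  vacuumdb -a -z >/dev/null 2>&1

Here -a vacuums every database in the cluster and -z adds the ANALYZE,
matching the switches described above.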
Good ideas.  I think the master solution is to hook the statistics
daemon information into an automatic vacuum that could _know_ which
tables need attention.

---------------------------------------------------------------------------

Curtis Faith wrote:
> tom lane wrote:
> > Sure, it's just shuffling the housekeeping work from one place to
> > another.  The thing that I like about Postgres' approach is that we
> > put the housekeeping in a background task (VACUUM) rather than in the
> > critical path of foreground transaction commit.
>
> Thinking with my marketing hat on, MVCC would be a much bigger win if VACUUM
> was not required (or was done automagically).  The need for periodic VACUUM
> just gives ammunition to the PostgreSQL opponents, who can claim we are
> deferring work but that it amounts to the same thing.
>
> A fully automatic background VACUUM will significantly reduce but will not
> eliminate this perceived weakness.
>
> However, it always seemed to me there should be some way to reuse the space
> more dynamically and quickly than a background VACUUM, thereby reducing the
> percentage of tuples that are expired in heavy-update cases.  If only a very
> tiny number of tuples on the disk are expired, this will reduce the aggregate
> performance/space penalty of MVCC into insignificance for the majority of
> uses.
>
> Couldn't we reuse tuple and index space as soon as there are no transactions
> that depend on the old tuple or index values?  I have imagined that this was
> always part of the long-term master plan.
>
> Couldn't we keep a list of dead tuples in shared memory and look in the list
> first when deciding where to place new values for inserts or updates, so we
> don't have to rely on VACUUM (even a background one)?  If there are expired
> tuple slots in the list, these would be used before allocating a new slot
> from the tuple heap.
>
> The only issue is determining the lowest transaction ID for in-process
> transactions, which seems relatively easy to do (if it's not already done
> somewhere).
>
> In the normal shutdown and startup case, a tuple VACUUM could be performed
> automatically.  This would normally be very fast since there would not be
> many tuples in the list.
>
> Index slots would be handled differently, since these cannot be substituted
> one for another.  However, these could be recovered as part of every index
> page update.  Pages would be scanned before being written, and any expired
> slots that had transaction IDs lower than the lowest active slot would be
> removed.  This could be done for non-leaf pages as well, and would result in
> only reorganizing a page that is already going to be written, thereby not
> adding much to the overall work.
>
> I don't think that internal pages containing pointers to values in nodes
> further down the tree that are no longer in the leaf nodes (because of this
> partial expired-entry elimination) will cause a problem, since searches and
> scans will still work fine.
>
> Does VACUUM do something that could not be handled in this realtime manner?
>
> - Curtis

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Mon, 2002-11-25 at 23:27, Ron Johnson wrote:
> On Mon, 2002-11-25 at 21:30, Tom Lane wrote:
> > Ron Johnson <ron.l.johnson@cox.net> writes:
> > > On Mon, 2002-11-25 at 18:23, scott.marlowe wrote:
> > >> The next factor that makes for fast inserts of large amounts of data in a
> > >> transaction is MVCC.  With Oracle and many other databases, transactions
> > >> are written into a separate log file, and when you commit, they are
> > >> inserted into the database as one big group.  This means you write your
> > >> data twice, once into the transaction log, and once into the database.
> >
> > > You are just deferring the pain.  Whereas others must flush from log
> > > to "database files", they do not have to VACUUM or VACUUM ANALYZE.
> >
> > Sure, it's just shuffling the housekeeping work from one place to
> > another.  The thing that I like about Postgres' approach is that we
> > put the housekeeping in a background task (VACUUM) rather than in the
> > critical path of foreground transaction commit.
>
> If you have a quiescent point somewhere in the middle of the night...
>

You seem to be implying that running vacuum analyze causes some large
performance issues, but it's just not the case.  I run a 24x7 operation,
and I have a few tables that "turn over" within 15 minutes.  On these
tables I run vacuum analyze every 5-10 minutes, and really there is
little/no performance penalty.

Robert Treat
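For the kind of fast-turnover tables Robert describes, the same idea can be
applied per table from cron.  These crontab entries are purely illustrative
(the database and table names are made up, and the */5 step syntax assumes
Vixie cron):

    */5 * * * *  psql -d mydb -c "VACUUM ANALYZE orders;"   >/dev/null 2>&1
    */5 * * * *  psql -d mydb -c "VACUUM ANALYZE sessions;" >/dev/null 2>&1

Vacuuming just the hot tables every few minutes keeps them small and their
statistics fresh without paying for a database-wide vacuum each time.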
"Curtis Faith" <curtis@galtair.com> writes: > tom lane wrote: >> Sure, it's just shuffling the housekeeping work from one place to >> another. The thing that I like about Postgres' approach is that we >> put the housekeeping in a background task (VACUUM) rather than in the >> critical path of foreground transaction commit. > Couldn't we reuse tuple and index space as soon as there are no transactions > that depend on the old tuple or index values. I have imagined that this was > always part of the long-term master plan. > Couldn't we keep a list of dead tuples in shared memory and look in the list > first when deciding where to place new values for inserts or updates so we > don't have to rely on VACUUM (even a background one)? ISTM that either of these ideas would lead to pushing VACUUM overhead into the foreground transactions, which is exactly what we don't want to do. Keep in mind also that shared memory is finite ... *very* finite. It's bad enough trying to keep per-page status in there (cf FSM) --- per-tuple status is right out. I agree that automatic background VACUUMing would go a long way towards reducing operational problems. regards, tom lane
I always wondered if VACUUM is the right name for the process.  Now, when
PostgreSQL is actively challenging in the Enterprise space, it might be a
good idea to give it a more enterprise-like name.  Think of how it looks to
an outside person to see us, database professionals, holding lengthy
discussions about the ways we vacuum a database.  Why should you need to
vacuum a database?  Is it dirty?  In my personal opinion, something like
"space reclaiming daemon", "free-list organizer", "tuple recycle job" or
"segment coalesce process" would sound more business-like.

Regards,
Nick

----- Original Message -----
From: "Bruce Momjian" <pgman@candle.pha.pa.us>
To: "Curtis Faith" <curtis@galtair.com>
Cc: "Tom Lane" <tgl@sss.pgh.pa.us>; "Ron Johnson" <ron.l.johnson@cox.net>;
"PgSQL Performance ML" <pgsql-performance@postgresql.org>;
<pgsql-hackers@postgresql.org>
Sent: Tuesday, November 26, 2002 9:09 PM
Subject: Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of
insert/delete/update

> Good ideas.  I think the master solution is to hook the statistics
> daemon information into an automatic vacuum that could _know_ which
> tables need attention.
Just for the humor of it, as well as to confirm Nick's perspective, years
ago on our in-house-developed Burroughs mainframe DBMS, we had a process
called "garbage collect".

Nicolai Tufar wrote:
> I always wondered if VACUUM is the right name for the process.  Now, when
> PostgreSQL is actively challenging in the Enterprise space, it might be a
> good idea to give it a more enterprise-like name.  Think of how it looks
> to an outside person to see us, database professionals, holding lengthy
> discussions about the ways we vacuum a database.  Why should you need to
> vacuum a database?  Is it dirty?  In my personal opinion, something like
> "space reclaiming daemon", "free-list organizer", "tuple recycle job" or
> "segment coalesce process" would sound more business-like.
>
> Regards,
> Nick
In a similar vein, setting the wayback machine to the mid-80s, when I was in
the USAF and teaching the computer subsystem of the A-10 INS test station,
we had old reclaimed Sperry 1650 computers (the precursor to the 1750) that
had come out of the 1960-era fire control systems on battleships like the
Missouri and whatnot.

When the OS went south, it would put up a message that said "System Crash at
address XXXXXXX" or something very similar.  A colonel saw that and insisted
that the folks who wrote the OS change the word "crash", since in the Air
Force a crash (as in a plane crash) had such bad connotations.  So, it got
changed to "System Fault at address xxxxxxxxx".

For the first month or two after that happened, folks would ask what a
system fault was and what to do about it.  They knew that a crash meant the
machine needed to be power cycled, but they didn't know what to do with a
system fault.  Shortly after that, the manual for the test station had a
little section added to it that basically said a system fault was a crash.
:-)

On Wed, 27 Nov 2002, Jim Beckstrom wrote:

> Just for the humor of it, as well as to confirm Nick's perspective, years
> ago on our in-house-developed Burroughs mainframe DBMS, we had a process
> called "garbage collect".