Thread: A long-running transaction
I have a Java (Java 1.1) program that I wrote some years ago to read records from a text file and insert them into a PostgreSQL database. One of the assumptions I made was that one file contained one day's data, maybe as many as 1500 records, and I coded it to do the whole lot as one transaction, so either a single file was loaded in its entirety, or none of its data was.

I lost the Java code, but revived the idea, and I've collected about two years' data using (Linux) shell scripts, loading the data with psql. Then I found the Java code on a disused hard disk :-) I made the necessary changes for it to build in Java 1.5, and used psql to extract data from my new database in the correct format for the old program.

This time, I have a little more data than I ever loaded at once before:

summer@Bandicoot:~$ wc -l testdata
6242217 testdata
summer@Bandicoot:~$ \ls -hl testdata
-rw-r--r-- 1 summer summer 285M 2007-03-28 22:32 testdata
summer@Bandicoot:~$

Now, it wouldn't surprise me if PostgreSQL used lots of memory - but how much could it possibly need? My laptop, where I first tried this, has 1.25 Gbytes, so I could allow it some. It wouldn't surprise me a lot if it used lots of memory and caused all sorts of problems, but no, it's chugging away, still using no more RAM than it could have had on my old Pentium 133 all those years ago.

In the process of checking it out, I've set it running on a machine with an AMD Sempron(tm) 2400+ running Kubuntu 6.10 (kernel 2.6.17-6-server-xen0) and 512 Mbytes of RAM. This is the Java program :-)

summer   pts/6    :0.0    Thu20   5days  1:07   1:07  /usr/bin/gij-4.1 -cp /usr/s

It's been running five days so far, and I can see where it's up to by attaching strace. It's reading 2k of the input file every few seconds. Okay, clearly something's wrong, and I don't think it's all my cruddy code.

No probs swapping:

summer@Bandicoot:~$ free
             total       used       free     shared    buffers     cached
Mem:        460800     456472       4328          0        860     262164
-/+ buffers/cache:     193448     267352
Swap:      1461872        284    1461588
summer@Bandicoot:~$

It is hitting the disk pretty hard now on this machine, but the laptop's still going too, and the disk seems to run about half the time, part of a second running, part idle (but the intervals are getting shorter).

It struck me as fairly curious that neither postgresql nor the application was hogging the CPU.

Perhaps the laptop is more interesting: look at the size of the buffer pool:

summer@Echidna:~> free
             total       used       free     shared    buffers     cached
Mem:       1295528    1268548      26980          0       3976     392388
-/+ buffers/cache:     872184     423344
Swap:      1941496      32656    1908840
summer@Echidna:~>

Again, no problem with over-use of RAM, and I'm logged on using KDE too, and that's running fine. It's been running a little longer here:

summer   pts/23   28Mar07  5days 25:12  25:11  java -cp /home/summer/Classes/:/usr/share/p

This is Sun's Java 1.5 on openSUSE 10.2. This is what suggested I should write:

summer@Echidna:~> procinfo
Linux 2.6.18.8-0.1-default (geeko@buildhost) (gcc 4.1.2 20061115) #1 1CPU [Echidna.]
Memory:      Total        Used        Free      Shared     Buffers
Mem:       1295528     1271720       23808           0        3716
Swap:      1941496       32656     1908840

Bootup: Tue Mar 27 18:50:19 2007    Load average: 2.21 2.65 2.69 2/243 19305

user  :       1d  3:17:04.03  16.0%  page in : 131097310  disk 1: 3079516r20087664w
nice  :           0:05:39.64   0.1%  page out: 197016649
system:       2d 20:38:37.13  40.1%  page act:  87906251
IOwait:       2d  0:46:37.33  28.5%  page dea:  16218135
hw irq:           0:44:46.71   0.4%  page flt: 306255213
sw irq:           0:50:04.69   0.5%  swap in :      4026
idle  :       1d  0:36:29.73  14.4%  swap out:      9552
uptime:       7d  2:59:20.97         context : 702502442

irq  0: 153880209 timer                 irq  7:         0 parport0
irq  1:     69402 i8042                 irq  8:         2 rtc
irq  2:         0 cascade [4]           irq  9:   1696942 acpi
irq  3:         4                       irq 10:         1
irq  4:         4                       irq 11:  71842329 ehci_hcd:usb1, uhci_
irq  5:  28545863 Intel 82801DB-ICH4    irq 12:    467432 i8042
irq  6:         1                       irq 14:  25021586 ide0

summer@Echidna:~>

Look at that line beginning "system:" - two days twenty hours in the Linux kernel. It's my guess that the Linux kernel is spending a great deal of time manipulating that buffer pool.

This shows postgresql taking 60% CPU:

summer@Echidna:~> ps xaru
USER       PID %CPU %MEM    VSZ   RSS TTY    STAT START    TIME COMMAND
summer   20237  0.2  1.2 263716 15988 pts/23 Rl+  Mar28   25:11 java -cp /home/summer/Class
postgres 19321  0.1  0.3  19844  3984 ?      D    21:50    0:00 postgres: summer stocksshar
summer   19344  0.0  0.0   2484   852 pts/25 R+   21:52    0:00 ps xaru
summer@Echidna:~> ps xaru
USER       PID %CPU %MEM    VSZ   RSS TTY    STAT START    TIME COMMAND
postgres 20248 60.0  0.8  20760 11536 ?      D    Mar28 5167:27 postgres: summer sharetrad
postgres 19321  0.1  0.3  19844  4008 ?      D    21:50    0:00 postgres: summer stocksshar
summer   19348  0.0  0.0   2480   848 pts/25 R+   21:52    0:00 ps xaru
summer@Echidna:~>

Note that pid 20248 is the one; the other has nothing to do with the problem.

This is the SQL I used to create the table:

    String createTableStatement = "create table TJS0102_trades ("
        + " trade date"
        + ", ttime int2"
        + ", ASX varchar(7)"
        + ", thigh int4"
        + ", tlow int4"
        + ", tclose int4"
        + ", topen int4"
        + ", tvolume int4"
        + ", tvalue int4"
//      + ")"
        + ", unique(ASX,trade,ttime)"
        + ");";

For each record, I update a non-key field in another table; the source data for that other table is less than a megabyte.

I have changed the program so as to load each day's data as a single transaction; it ran on a Pentium IV 2.8 GHz with HT, 512 Mbytes, in about three and a half hours. This is more or less what I expected.

If you agree with me that PostgreSQL should do better, now is a good time to ask for this as a test case. Note, you could also get suitable test data from float.com.au - it's not where mine came from, but it should be substantially the same.
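For reference, here is a minimal sketch of the kind of load loop being described, assuming JDBC. It is not the original program (which isn't shown in the thread); the connection URL, the summary table name "latest_trade", and the mapping of input fields to columns are guesses made purely for illustration.

    // Minimal sketch (not the original program) of the load described above:
    // read the whole file and commit it as one transaction over JDBC.
    // The connection URL, the summary table "latest_trade", and the mapping
    // of input fields to columns are assumptions for illustration only.
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class LoadTrades {
        public static void main(String[] args) throws Exception {
            Class.forName("org.postgresql.Driver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "summer", "");
            conn.setAutoCommit(false);              // everything below is one transaction

            PreparedStatement insert = conn.prepareStatement(
                    "insert into TJS0102_trades"
                    + " (trade, ttime, ASX, thigh, tlow, tclose, topen, tvolume, tvalue)"
                    + " values (?, 0, ?, ?, ?, ?, ?, ?, ?)");   // ttime hard-coded: assumption
            PreparedStatement update = conn.prepareStatement(
                    "update latest_trade set last_trade = ? where ASX = ?");  // hypothetical table

            BufferedReader in = new BufferedReader(new FileReader(args[0]));
            String line;
            while ((line = in.readLine()) != null) {
                // e.g. 2005-02-03,AAC,1.900,1.800,1.850,1.820,328984,0
                String[] f = line.split(",");
                insert.setDate(1, java.sql.Date.valueOf(f[0]));
                insert.setString(2, f[1]);
                for (int i = 2; i <= 5; i++) {      // four prices; int4 columns, so scale (assumed)
                    insert.setInt(i + 1, (int) Math.round(Double.parseDouble(f[i]) * 1000));
                }
                insert.setInt(7, Integer.parseInt(f[6]));   // tvolume
                insert.setInt(8, Integer.parseInt(f[7]));   // tvalue
                insert.executeUpdate();

                update.setDate(1, java.sql.Date.valueOf(f[0]));
                update.setString(2, f[1]);
                update.executeUpdate();             // the per-record update discussed below
            }
            in.close();

            conn.commit();                          // all 6.2 million rows, or none of them
            conn.close();
        }
    }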
On Tue, Apr 03, 2007 at 10:16:13PM +0800, John Summerfield wrote:
> It is hitting the disk pretty hard now on this machine, but the laptop's
> still going too, and the disk seems to run about half the time, part of
> a second running, part idle (but the intervals are getting shorter).
>
> It struck me as fairly curious that neither postgresql nor the
> application was hogging the CPU.

Why? Nothing about this seems likely CPU bound. It's probably I/O. I note this number:

> IOwait:       2d  0:46:37.33  28.5%  page dea:  16218135

which is pretty awful. Also

> For each record, I update a non-key field in another table; the source
> data for that other table is less than a megabyte.

this is a real issue. Basically, you're constrained by the rotation speed of your disk, because for each record, you have to first find and then update one row somewhere else.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
The plural of anecdote is not data.
		--Roger Brinner
Andrew Sullivan wrote:
> On Tue, Apr 03, 2007 at 10:16:13PM +0800, John Summerfield wrote:
>> It is hitting the disk pretty hard now on this machine, but the laptop's
>> still going too, and the disk seems to run about half the time, part of
>> a second running, part idle (but the intervals are getting shorter).
>>
>> It struck me as fairly curious that neither postgresql nor the
>> application was hogging the CPU.
>
> Why? Nothing about this seems likely CPU bound. It's probably I/O.
> I note this number:

For the first day or so, the disk light was switching off about half the time.

>> IOwait:       2d  0:46:37.33  28.5%  page dea:  16218135
>
> which is pretty awful. Also

It was some days in before it became so. Eventually, the machines (I kept it running on the laptop because of the amount of RAM, plus another) did begin to thrash, but that certainly wasn't the case at first.

Note that the proportion of system time is fairly unusual:

user  :       1d  3:17:04.03  16.0%  page in : 131097310  disk 1: 3079516r20087664w
nice  :           0:05:39.64   0.1%  page out: 197016649
system:       2d 20:38:37.13  40.1%  page act:  87906251

The Linux kernel's clearly doing a lot of work, and the disk supports DMA and DMA is turned on, so it's not using a PIO mode. According to hdparm, it's using udma5.

>> For each record, I update a non-key field in another table; the source
>> data for that other table is less than a megabyte.
>
> this is a real issue. Basically, you're constrained by the rotation
> speed of your disk, because for each record, you have to first find
> and then update one row somewhere else.

It should be in cache: it's all one transaction, and on the laptop, there's over a Gbyte of RAM. Indeed, I would expect postgresql itself to cache it (except the fact it uses so little RAM suggests it doesn't do that at all).

> A
On Wed, Apr 11, 2007 at 12:41:23AM +0800, John Summerfield wrote:
> The Linux kernel's clearly doing a lot of work, and the disk supports

You might also be in context-switch hell. What processor, which kernel, and which Postgres version again?

>>> For each record, I update a non-key field in another table; the source
>>> data for that other table is less than a megabyte.
>>
>> this is a real issue. Basically, you're constrained by the rotation
>> speed of your disk, because for each record, you have to first find
>> and then update one row somewhere else.
>
> It should be in cache: it's all one transaction, and on the laptop,

It's not in cache if you're updating -- you have to write it.

> there's over a Gbyte of RAM. Indeed, I would expect postgresql itself to
> cache it (except the fact it uses so little RAM suggests it doesn't do
> that at all).

What do you have configured as your shared buffers? If you haven't given very much, there won't be much in the way of buffers used, of course. Note that there's a much earlier diminishing return on the size of shared buffers in Postgres than in many systems.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.
		--George Orwell
Andrew Sullivan wrote:
> On Wed, Apr 11, 2007 at 12:41:23AM +0800, John Summerfield wrote:
>
>> The Linux kernel's clearly doing a lot of work, and the disk supports
>
> You might also be in context-switch hell. What processor, which
> kernel, and which Postgres version again?

On openSUSE 10.2 (the laptop):

model name      : Intel(R) Pentium(R) M processor 1500MHz
2.6.18.8-0.1-xen

Might not have been xen-enabled, I switch fairly often.

summer@Lyrebird:~> rpm -qa postg\*
postgresql-server-8.1.5-13
postgresql-libs-8.1.5-13
postgresql-devel-8.1.5-13
postgresql-jdbc-8.1-12.2
postgresql-contrib-8.1.5-13
postgresql-pl-8.1.5-15
postgresql-8.1.5-13
summer@Lyrebird:~>

Also:

model name      : AMD Sempron(tm) 2400+
2.6.17-6-server-xen0

summer@Bandicoot:~$ dpkg --list postg\* | grep ^ii
ii  postgresql-8.2    8.2.3-1~edgy1    object-relational SQL database, version 8.2

>>>> For each record, I update a non-key field in another table; the source
>>>> data for that other table is less than a megabyte.
>>>
>>> this is a real issue. Basically, you're constrained by the rotation
>>> speed of your disk, because for each record, you have to first find
>>> and then update one row somewhere else.
>>
>> It should be in cache: it's all one transaction, and on the laptop,
>
> It's not in cache if you're updating -- you have to write it.

Linux caches writes, so I don't think it should be hitting disk at all. The table being updated contains 7482 records (658K raw data), of which probably fewer than 2000 are being updated, and typically the same ones all the time: we're updating the date of the latest trade.

>> there's over a Gbyte of RAM. Indeed, I would expect postgresql itself to
>> cache it (except the fact it uses so little RAM suggests it doesn't do
>> that at all).
>
> What do you have configured as your shared buffers? If you haven't
> given very much, there won't be much in the way of buffers used, of
> course. Note that there's a much earlier diminishing return on the
> size of shared buffers in Postgres than in many systems.

Laptop (1.25 Gbytes):
shared_buffers = 1000          # min 16 or max_connections*2, 8KB each

AMD (512 Mbytes less some for video):
shared_buffers = 24MB          # min 128kB or max_connections*16kB

Note we're only running one connexion here, and data transfers (I believe) are tens of bytes; here is a typical data record:

2005-02-03,AAC,1.900,1.800,1.850,1.820,328984,0

> A
On Wed, Apr 11, 2007 at 05:54:45AM +0800, John Summerfield wrote:
> Linux caches writes, so I don't think it should be hitting disk at all. The

I _sure hope_ you don't care about this data, then. That's not a real safe way to work. But. . .

> table being updated contains 7482 records (658K raw data), of which
> probably fewer than 2000 are being updated, and typically the same ones
> all the time: we're updating the date of the latest trade.

. . . this is likely your problem. The updates probably get slower and slower. What's happening is that you're expiring a row _for each update_, which means it _isn't_ the same row every time. This is approximately the worst use model for PostgreSQL's MVCC approach. Worse, though it's not the same row, you have to grovel through all the dead rows to find the actually live one. So that's probably what's killing you.

> Laptop (1.25 Gbytes):
> shared_buffers = 1000          # min 16 or max_connections*2, 8KB each

So you have 8000K configured as your shared buffers there. That's as much as you'll ever use for shared memory by Postgres. You can probably bump it a little in this case. Your other config seems ok to me. But I don't think this is your problem -- the update pattern is.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
If they don't do anything, we don't need their acronym.
		--Josh Hamilton, on the US FEMA
Andrew Sullivan wrote:
> On Wed, Apr 11, 2007 at 05:54:45AM +0800, John Summerfield wrote:
>> Linux caches writes, so I don't think it should be hitting disk at all. The
>
> I _sure hope_ you don't care about this data, then. That's not a
> real safe way to work. But. . .

As I said in the first place, this is all one transaction. No doubt that contributes to the problem.

>> table being updated contains 7482 records (658K raw data), of which
>> probably fewer than 2000 are being updated, and typically the same ones
>> all the time: we're updating the date of the latest trade.
>
> . . . this is likely your problem. The updates probably get slower
> and slower. What's happening is that you're expiring a row _for each
> update_, which means it _isn't_ the same row every time. This is
> approximately the worst use model for PostgreSQL's MVCC approach.
> Worse, though it's not the same row, you have to grovel through all
> the dead rows to find the actually live one. So that's probably
> what's killing you.

I don't understand what you're telling me. Your surmise correctly reflects my experience, but I don't understand the explanation. Each row's identified by a key, and the key doesn't change.

I don't know how PostgreSQL works, but 25 or so years ago I had a good understanding of how ADABAS* (now MAXDB I believe, but doubtless rewritten several times) worked then. The OS then was IBM's OS/VS or a close relative from Fujitsu, OSIV/F4, and those worked very differently from how Linux works today.

ADABAS would put the updated record right back where it came from; it maintained some free space in each block in case an update increased the size of a record. Whether PostgreSQL does that or something different I don't know, but the ADABAS technique would handle this process well.

I'll point out this line again, in case your explanation means you think postgresql is spending lots of CPU:

user  :       1d  3:17:04.03  16.0%  page in : 131097310  disk 1: 3079516r20087664w
nice  :           0:05:39.64   0.1%  page out: 197016649
system:       2d 20:38:37.13  40.1%  page act:  87906251

That "system" line above. The way I figure it, the kernel's doing an enormous amount of work handling its buffers. System time is overhead; it's like a government tax - you gotta pay it, but you try really hard not to pay too much. User time is time doing real work (perhaps not when it's in postgresql, but in general), and one does not expect to see it dwarfed by the system time. Here, we have the kernel-space code spending twice as much time doing something as the user-space code. It's not normal Linux kernel behaviour, and it's not normal on my laptop.

The best idea I have is that postgresql is doing something that causes this bad behaviour. If you think that updating records as I do causes this, I won't argue the point, because I simply do not know.

I don't need a workaround; I have two: one is to commit the updates when the date field in the input records changes, and the other is to create the input data in separate files, one for each date. Both give reasonable performance: the first took around three and a half hours (on a different system, which currently declines to boot :-(( - it's a Pentium IV 3.0 GHz with HT), and the other ran a little longer on the laptop. Roughly what I expected.

Since all the work is within one transaction, there's no real need for postgresql to write data at all, unless it's running out of memory.

* ADABAS then didn't do SQL, and wasn't relational. It was about when IBM's DB2 came out, and Cincom had Supra.
According to Cincom, DB2 wasn't relational either.

>> Laptop (1.25 Gbytes):
>> shared_buffers = 1000          # min 16 or max_connections*2, 8KB each
>
> So you have 8000K configured as your shared buffers there. That's
> as much as you'll ever use for shared memory by Postgres. You can
> probably bump it a little in this case. Your other config seems ok to
> me. But I don't think this is your problem -- the update pattern is.

In both cases, I am using default values. Unless someone thinks there's clearly something to be gained by changing them, I won't.

> A
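For illustration, the first workaround John mentions (committing whenever the date field in the input changes) might look roughly like this. It is only a sketch reusing the hypothetical statements from the earlier sketch, not code from the thread.

    // Sketch of the per-day commit workaround: commit whenever the trade date
    // (field 0 of the input record) changes, so each trading day becomes its
    // own transaction instead of one 6.2-million-row transaction.
    import java.io.BufferedReader;
    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public class LoadTradesPerDay {
        static void load(Connection conn, BufferedReader in,
                         PreparedStatement insert, PreparedStatement update) throws Exception {
            String previousDate = null;
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split(",");
                if (previousDate != null && !previousDate.equals(f[0])) {
                    conn.commit();          // close off the previous day's work
                }
                previousDate = f[0];
                // bind and execute the same insert and update statements as in
                // the earlier sketch (omitted here for brevity)
            }
            conn.commit();                  // commit the final day
        }
    }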
On Wed, Apr 11, 2007 at 10:23:34PM +0800, John Summerfield wrote:
> Each row's identified by a key, and the key doesn't change.

That doesn't matter.

> ADABAS would put the updated record right back where it came from; it

That isn't how PostgreSQL works.

I'm having a hard time laying my hands on the bits of the docs that explain this (anyone?), but MVCC stands for Multi-Version Concurrency Control. When you write a row, it has a version number. That version number tells other transactions what version of the row they look at.

Here's a _really over-simplified_ explanation, at a conceptual level. [Note to experts: please don't correct details of how this is wrong. I know that. But John's problem is that he doesn't have this conceptually yet. The following explanation has worked for me in the past in getting the idea across first, so that the details can later make sense.]

Suppose you have two transactions, t1 and t2. They start at approximately the same time. I'm assuming they're READ COMMITTED.

Suppose t1 updates row R. That version of R has a number R(t1).

Suppose at the same time, t2 is looking at R. The _version_ of R that t2 sees is R(tn), where n<1. This way, t2 does not have to wait on t1 in order for t2 to proceed (unless t2 actually modifies R. Ignore that for the sake of this explanation, please).

The way this works is that, in marking R(t1), the system says "R(t1) is valid for transactions that committed after t1". Since t2 hasn't committed, it sees the old version. So when t1 commits, there are _two_ versions of R -- R(t1) and R(tn), n<1. When all transactions have committed such that there is no (tn), n<1, then the row is marked dead, and can be recovered by VACUUM (this is one of the main things VACUUM does). The idea is to move the housekeeping of concurrency control out of the main transaction. (Oracle does something conceptually similar, except using rollback segments, which is why long-running transactions can exhaust the supply of rollback segments on Oracle.)

Now, you're thinking, "But this is all in one transaction!" Yes, but the implementation details make it impossible for t1 to rewrite the same row over and over again, so your rewriting of the same row again and again actually is creating huge numbers of dead rows. The unfortunate thing is that you have to follow the whole dead-row chain to find the currently live version of your row.

The upshot of this is that updating the same row over and over in one transaction will make your transaction go slower on each round, until you are nearly stopped. That's why a lot of updates of the sort you describe are in fact the worst case under Postgres. EnterpriseDB seems to have a proposal (which I believe is going to make 8.3) that will mitigate but not completely solve some of this.

> user  :       1d  3:17:04.03  16.0%  page in : 131097310  disk 1: 3079516r20087664w
> nice  :           0:05:39.64   0.1%  page out: 197016649
> system:       2d 20:38:37.13  40.1%  page act:  87906251
>
> That "system" line above. The way I figure it, the kernel's doing an
> enormous amount of work handling its buffers.

No, I think it's doing an enormous amount of I/O, because it has to keep looking for these new rows (and scanning over the old areas while it's at it). This is system time, if I recall correctly through the haze that is now my understanding of Linux I/O, because it's down at the I/O subsystem and can't actually be done by the user program. (Feel free to correct me on this. But I still bet it's dead rows.)
> Since all the work is within one transaction, there's no real need for
> postgresql to write data at all, unless it's running out of memory.

No, I'm sure this has to do with the multi-version row writing. Your alternatives are ok, but I'd be more tempted to commit a range of data in a timestamped series, and then join on a subselect picking the latest insert (and add another daemon to knock old values off the bottom of the list).

> In both cases, I am using default values. Unless someone thinks there's
> clearly something to be gained by changing them, I won't.

The PostgreSQL install defaults are there to get things started, not to perform well. There is a long-standing debate in the community about whether that's a good idea (and the defaults are way more sensible than they used to be), but in general you won't get really good performance from the default configuration of a PostgreSQL installation.

Here is a possibly useful bit of reading to start with on tuning, although it's getting long in the tooth:

http://www.powerpostgresql.com/Downloads/annotated_conf_80.html

Also

http://varlena.com/varlena/GeneralBits/Tidbits/perf.html

And the -performance list will probably help you more than this one.

Hope this helps.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
Users never remark, "Wow, this software may be buggy and hard
to use, but at least there is a lot of code underneath."
		--Damien Katz
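A rough sketch of the alternative Andrew suggests here, with invented table and column names: instead of updating a single row per code, each load appends a timestamped row, and readers join on a subselect that picks the newest row per code. This is only one interpretation of the suggestion, not code from the thread.

    // Append-only variant of the "latest trade" table: the loader inserts a new
    // timestamped row instead of updating one in place, and a query picks the
    // newest row per code. All names here are invented for illustration.
    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class LatestTradeAsSeries {
        static void example(Connection conn) throws Exception {
            Statement st = conn.createStatement();

            // done by the loader instead of an UPDATE per input record
            st.executeUpdate(
                "insert into trade_history (ASX, last_trade, loaded_at)"
                + " values ('AAC', date '2005-02-03', now())");

            // readers join on a subselect that picks the latest insert per code
            ResultSet rs = st.executeQuery(
                "select h.ASX, h.last_trade"
                + "  from trade_history h"
                + "  join (select ASX, max(loaded_at) as loaded_at"
                + "          from trade_history group by ASX) newest"
                + "    on h.ASX = newest.ASX and h.loaded_at = newest.loaded_at");
            while (rs.next()) {
                System.out.println(rs.getString(1) + " " + rs.getDate(2));
            }
            // a separate housekeeping job would delete superseded rows
            // ("knock old values off the bottom of the list")
        }
    }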
Andrew Sullivan wrote:
> On Wed, Apr 11, 2007 at 10:23:34PM +0800, John Summerfield wrote:
>
>> Each row's identified by a key, and the key doesn't change.
>
> That doesn't matter.
>
>> ADABAS would put the updated record right back where it came from; it
>
> That isn't how PostgreSQL works.
>
> I'm having a hard time laying my hands on the bits of the docs that
> explain this (anyone?), but MVCC stands for Multi-Version Concurrency
> Control. When you write a row, it has a version number. That
> version number tells other transactions what version of the row they
> look at.
>
> Here's a _really over-simplified_ explanation, at a conceptual level.
> [Note to experts: please don't correct details of how this is wrong.
> I know that. But John's problem is that he doesn't have this
> conceptually yet. The following explanation has worked for me in the
> past in getting the idea across first, so that the details can later
> make sense.]
>
> Suppose you have two transactions, t1 and t2. They start at
> approximately the same time. I'm assuming they're READ COMMITTED.
>
> Suppose t1 updates row R. That version of R has a number R(t1).
>
> Suppose at the same time, t2 is looking at R. The _version_ of R
> that t2 sees is R(tn), where n<1. This way, t2 does not have to
> wait on t1 in order for t2 to proceed (unless t2 actually modifies R.
> Ignore that for the sake of this explanation, please).
>
> The way this works is that, in marking R(t1), the system says "R(t1) is
> valid for transactions that committed after t1". Since t2 hasn't
> committed, it sees the old version. So when t1 commits, there are
> _two_ versions of R -- R(t1) and R(tn), n<1. When all transactions
> have committed such that there is no (tn), n<1, then the row is marked
> dead, and can be recovered by VACUUM (this is one of the main things
> VACUUM does). The idea is to move the housekeeping of concurrency
> control out of the main transaction. (Oracle does something
> conceptually similar, except using rollback segments, which is why
> long-running transactions can exhaust the supply of rollback segments
> on Oracle.)
>
> Now, you're thinking, "But this is all in one transaction!" Yes, but
> the implementation details make it impossible for t1 to rewrite the
> same row over and over again, so your rewriting of the same row again
> and again actually is creating huge numbers of dead rows. The
> unfortunate thing is that you have to follow the whole dead-row chain
> to find the currently live version of your row.

Why is this implementation preferable to not doing that?

> The upshot of this is that updating the same row over and over in one
> transaction will make your transaction go slower on each round, until
> you are nearly stopped. That's why a lot of updates of the sort you
> describe are in fact the worst case under Postgres. EnterpriseDB

Where is EnterpriseDB? enterprisedb.com seems to be held by a squatter in Kingston - about where they're playing an important cricket competition about now.

> seems to have a proposal (which I believe is going to make 8.3) that
> will mitigate but not completely solve some of this.

I can see we need before and after, but why all those intermediate rows? How can they possibly be used? If I roll back the transaction, they're all discarded, and if I commit then only the last is required.

Also, I don't see why (apparently) a sequential search is used; surely if all of these rows might be required, still a pointer to the last would be right?
Or the last two, if you need the ability to check the order.

Is there a development version I can try, with this improvement in it?

>> user  :       1d  3:17:04.03  16.0%  page in : 131097310  disk 1: 3079516r20087664w
>> nice  :           0:05:39.64   0.1%  page out: 197016649
>> system:       2d 20:38:37.13  40.1%  page act:  87906251
>>
>> That "system" line above. The way I figure it, the kernel's doing an
>> enormous amount of work handling its buffers.
>
> No, I think it's doing an enormous amount of I/O, because it has to
> keep looking for these new rows (and scanning over the old areas
> while it's at it). This is system time, if I recall correctly
> through the haze that is now my understanding of Linux I/O, because
> it's down at the I/O subsystem and can't actually be done by the user
> program. (Feel free to correct me on this. But I still bet it's
> dead rows.)

Your argument is fairly convincing, and it does reflect that the postmaster seems always to be doing an update, but I am still dubious.

1. For the first day or so, my observation was that the disk was not particularly busy. It seemed to be running about half the time. It is fairly noticeable because, for reasons I don't understand at all, it (on the laptop) generates EMI and upsets the radio in the next room. Unplugging the power cord stops the EMI - go figure, but of course I only have a few hours before the battery runs out.

2. System time, unless I am mistaken (and that is possible), is time the CPU is active in the kernel. It is separate from IOwait. I'm reposting the whole lot:

Bootup: Tue Mar 27 18:50:19 2007    Load average: 2.21 2.65 2.69 2/243 19305

user  :       1d  3:17:04.03  16.0%  page in : 131097310  disk 1: 3079516r20087664w
nice  :           0:05:39.64   0.1%  page out: 197016649
system:       2d 20:38:37.13  40.1%  page act:  87906251
IOwait:       2d  0:46:37.33  28.5%  page dea:  16218135
hw irq:           0:44:46.71   0.4%  page flt: 306255213
sw irq:           0:50:04.69   0.5%  swap in :      4026
idle  :       1d  0:36:29.73  14.4%  swap out:      9552
uptime:       7d  2:59:20.97         context : 702502442

If I add user + system + IOwait + idle, I get to about the seven days and some of total uptime; I expect if I did the arithmetic more precisely, the result would match even better.

>> Since all the work is within one transaction, there's no real need for
>> postgresql to write data at all, unless it's running out of memory.
>
> No, I'm sure this has to do with the multi-version row writing. Your
> alternatives are ok, but I'd be more tempted to commit a range of
> data in a timestamped series, and then join on a subselect picking
> the latest insert (and add another daemon to knock old values off the
> bottom of the list).

I don't plan on programming round this, because it's a conversion problem, not ongoing, and in any event dumping the source database into separate files for each date works well enough. But if I did, I'd simply use a vector in Java and cache the updates myself, and write the updates before committing. In the usual case, there's no significant difference.

>> In both cases, I am using default values. Unless someone thinks there's
>> clearly something to be gained by changing them, I won't.
>
> The PostgreSQL install defaults are there to get things started, not
> to perform well. There is a long-standing debate in the community
> about whether that's a good idea (and the defaults are way more
> sensible than they used to be), but in general you won't get really
> good performance from the default configuration of a PostgreSQL
> installation.

And the distribution's defaults are not necessarily the same as yours.
That said, I didn't see anything in postgresql.conf that obviously needed changing; with only one connexion I couldn't see a need to increase shared memory, and I didn't see anything at all about buffer pools.

> Here is a possibly useful bit of reading to start with on tuning,
> although it's getting long in the tooth:
>
> http://www.powerpostgresql.com/Downloads/annotated_conf_80.html
>
> Also
>
> http://varlena.com/varlena/GeneralBits/Tidbits/perf.html

There's nothing that leaps out at me; the WAL stuff doesn't apply until I commit, and I haven't. Work memory, maybe, but I'm not doing complex queries. Effective cache size might.

> And the -performance list will probably help you more than this one.

At present I'm trying to clarify in my mind the nature of the problem. What I'm trying to do seems to me reasonable. I have some data, and I want it all in or none of it, so it fits the idea of a single transaction.

It might be that my demands exceed PostgreSQL's current capabilities, but by itself that doesn't make what I'm trying to do unreasonable.

> Hope this helps.

I appreciate the time you're spending on this. I will be especially pleased if the final result is an improvement in postgresql.

> A
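For what it's worth, the caching idea John mentions above (collect the updates in memory and write them just before committing) might look something like the sketch below. A HashMap stands in for the Vector he mentions, and the summary table name is again hypothetical.

    // Cache the newest trade date per code in memory while loading, then issue
    // at most one UPDATE per code just before the commit, instead of one UPDATE
    // per input record. HashMap is used here in place of the Vector mentioned
    // above; the table "latest_trade" is a hypothetical name.
    import java.sql.Connection;
    import java.sql.Date;
    import java.sql.PreparedStatement;
    import java.util.HashMap;
    import java.util.Map;

    public class CachedLatestTrade {
        private final Map<String, Date> latest = new HashMap<String, Date>();

        // called once per input record instead of executing an UPDATE
        void noteTrade(String asxCode, Date tradeDate) {
            Date seen = latest.get(asxCode);
            if (seen == null || tradeDate.after(seen)) {
                latest.put(asxCode, tradeDate);
            }
        }

        // called once, just before conn.commit(); at most ~2000 rows to update
        void flush(Connection conn) throws Exception {
            PreparedStatement update = conn.prepareStatement(
                    "update latest_trade set last_trade = ? where ASX = ?");
            for (Map.Entry<String, Date> e : latest.entrySet()) {
                update.setDate(1, e.getValue());
                update.setString(2, e.getKey());
                update.executeUpdate();
            }
            update.close();
        }
    }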
On Fri, Apr 13, 2007 at 09:02:58AM +0800, John Summerfield wrote:
>
> Why is this implementation preferable to not doing that?

Because this way, readers never wait for writers. On most database systems, that pending UPDATE would block anyone reading the row too, even if they weren't going to write on it. It's a trade-off. It happens to mean that the trade is bad for the thing you happen to be doing, but since most database operations aren't bulk ones in the way you're trying to achieve this, that's quite possibly a reasonable trade-off.

> Where is EnterpriseDB? enterprisedb.com seems to be held by a squatter
> in Kingston - about where they're playing an important cricket
> competition about now.

Google took me here in one: http://www.enterprisedb.com

But anyway, their stuff is built on top of Pg, so going to their site won't help you.

> I can see we need before and after, but why all those intermediate rows?

Because in the general case, you need the intermediate rows. The problem is that you'd have to write a special piece of code to catch the case where nobody else can see the row that you're about to expire, and that test isn't free. Therefore, you do it the same way any other row gets expired.

> Also, I don't see why (apparently) a sequential search is used; surely
> if all of these rows might be required, still a pointer to the last
> would be right? Or the last two, if you need the ability to check the order.

No, it's not a seqscan. It's following the chain of versions of the row. You don't know until you look at the actual row whether the version of it you are looking at is valid for you. There isn't any other place to store that information. (If you want someone who really understands how all this works to explain it to you in more accurate terms, you should ask the question on -hackers. They'll point you to the developers' docs that I can't seem to find right now.)

> Is there a development version I can try, with this improvement in it?

Well, you could try using the current CVS HEAD, which is close to feature freeze. It'd sure be a service to the community, at least, because we'd learn whether the proposed change fixes this sort of case. (But I wouldn't trust my data to the HEAD for real. You said you're not actually in production yet, though.)

> 1. For the first day or so, my observation was that the disk was not
> particularly busy.

That's completely consistent with the theory I have. As the number of dead tuples goes up, your disk activity will slowly get worse.

> At present I'm trying to clarify in my mind the nature of the problem.
> What I'm trying to do seems to me reasonable. I have some data, and I
> want it all in or none of it, so it fits the idea of a single transaction.
>
> It might be that my demands exceed PostgreSQL's current capabilities,
> but by itself that doesn't make what I'm trying to do unreasonable.

No, it's not unreasonable, but it happens to be a pessimal case under Postgres.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
"The year's penultimate month" is not in truth a good way of saying
November.
		--H.W. Fowler
On Fri, Apr 13, 2007 at 07:49:19AM -0400, Andrew Sullivan wrote:
> On Fri, Apr 13, 2007 at 09:02:58AM +0800, John Summerfield wrote:
>> 1. For the first day or so, my observation was that the disk was not
>> particularly busy.
>
> That's completely consistent with the theory I have. As the number
> of dead tuples goes up, your disk activity will slowly get worse.

This simple demonstration occurred to me on the subway on the way here, to show that the dead tuples really will mount.

testing=# \d testtab
                            Table "public.testtab"
 Column |  Type   |                      Modifiers
--------+---------+------------------------------------------------------
 id     | integer | not null default nextval('testtab_id_seq'::regclass)
 col1   | text    |
 col2   | text    |
Indexes:
    "testtab_pkey" PRIMARY KEY, btree (id)

testing=# SELECT * from testtab;
 id | col1 | col2
----+------+------
  1 | one  |
(1 row)

Now, we check how many dead tuples we have:

testing=# VACUUM VERBOSE testtab ;
INFO:  vacuuming "public.testtab"
INFO:  index "testtab_pkey" now contains 1 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "testtab": found 0 removable, 1 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 3 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_90325"
INFO:  index "pg_toast_90325_index" now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_90325": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

Notice the lines:

INFO:  "testtab": found 0 removable, 1 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.

So, let's do an update:

testing=# begin;
BEGIN
testing=# UPDATE testtab set col2='1';
UPDATE 1
testing=# commit;
COMMIT

This time, when we run vacuum, we get this (snipped out for brevity):

INFO:  "testtab": found 1 removable, 1 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.

So, several updates in one transaction:

testing=# begin;
BEGIN
testing=# UPDATE testtab set col2='2';
UPDATE 1
testing=# UPDATE testtab set col2='3';
UPDATE 1
testing=# UPDATE testtab set col2='4';
UPDATE 1
testing=# UPDATE testtab set col2='5';
UPDATE 1
testing=# commit;
COMMIT

This time, VACUUM VERBOSE tells us that all of those were dead:

INFO:  "testtab": found 4 removable, 1 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.

Ok, what if we just update without actually changing anything?

testing=# begin;
BEGIN
testing=# UPDATE testtab set col2='5';
UPDATE 1
testing=# UPDATE testtab set col2='5';
UPDATE 1
testing=# UPDATE testtab set col2='5';
UPDATE 1
testing=# UPDATE testtab set col2='5';
UPDATE 1
testing=# UPDATE testtab set col2='5';
UPDATE 1
testing=# commit;
COMMIT

We get the same result:

INFO:  "testtab": found 5 removable, 1 nonremovable row versions in 1 pages
DETAIL:  0 dead row versions cannot be removed yet.

A

>> At present I'm trying to clarify in my mind the nature of the problem.
>> What I'm trying to do seems to me reasonable. I have some data, and I
>> want it all in or none of it, so it fits the idea of a single transaction.
>>
>> It might be that my demands exceed PostgreSQL's current capabilities,
>> but by itself that doesn't make what I'm trying to do unreasonable.
>
> No, it's not unreasonable, but it happens to be a pessimal case under
> Postgres.
>
> A
>
> --
> Andrew Sullivan | ajs@crankycanuck.ca
> "The year's penultimate month" is not in truth a good way of saying
> November.
>		--H.W. Fowler

--
Andrew Sullivan | ajs@crankycanuck.ca
Users never remark, "Wow, this software may be buggy and hard
to use, but at least there is a lot of code underneath."
		--Damien Katz
Hi Andrew,

On Fri, 2007-04-13 at 07:49 -0400, Andrew Sullivan wrote:
> Because in the general case, you need the intermediate rows. The
> problem is that you'd have to write a special piece of code to catch
> the case where nobody else can see the row that you're about to
> expire, and that test isn't free. Therefore, you do it the same way
> any other row gets expired.

Just curious: is there a way to defeat MVCC? i.e., if you can lock the database exclusively, there won't be any readers or writers?

Joe
On Fri, Apr 13, 2007 at 12:42:36PM -0400, Joe wrote:
>
> Just curious: is there a way to defeat MVCC?

No. That is,

> i.e., if you can lock the
> database exclusively, there won't be any readers or writers?

while this is true, it doesn't shut off the way writes go through the system. MVCC is rather close to the central guts of PostgreSQL.

If you want to understand more about why this is the case, and what the trade-off considerations are, &c., then I strongly advise you to troll through the -hackers archives. Approximately once every 8-10 months someone comes along with an idea for a change that disables MVCC in just this or that case. Most of the time, the actual effects of this are different than people think they will be -- concurrency is hard, and making changes so that it's impossible to get into a race is tricky. This is why the in-place storage management that EnterpriseDB contributed was both somewhat controversial and somewhat limited in its application.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
		--Dennis Ritchie