I want to see if there is a concensus of opinion out there. We've all known that data loss "could" happen if vacuum is not run and you perform more than 2b transactions. These days with faster and bigger computers and disks, it more likely that this problem can be hit in months -- not years. To me, the WORST thing a program can do is lose data. (Certainly this is bad for a database.) I don't think there is any real excuse for this. While the 2b transaction problem was always there, it seemed so remote that I never obcessed about it. Now that it seems like a real problem that more than one user has hit, I am worried. In fact, I think it is so bad, that I think we need to back-port a fix to previous versions and issue a notice of some kind. Here as my suggestions: (1) As Tom has already said, at some point start issuing warning in the log that vacuum needs to be run. (2) At some point, stop accepting transactions on anything but template1, issuing an error saying the vacuum needs to be run. (3) Either with psql on template1 or "postgres" or some "vacuumall" program, open the database in single user mode or on template1 and vacuum database. (4) This should remain even after autovacuum is in place. If for some reason auto vacuum is installed but not running, we still need to protect the data from a stupid admin. (Last time I looked, auto vacuum used various stats, and that may be something an admin disables.) (5) Vacuum could check for a wrap-around condition in the database cluster and take it upon itself to run more broadly even if it was directed only towards a table. We've been saying that mysql is ok if you don't care about your data, I would hate if people started using this issue against postgresql.
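[Editor's note: the arithmetic behind the "2B transactions" figure may be worth spelling out for readers joining the thread. XIDs are 32-bit counters compared modulo 2^32, so each XID is only ordered against the ~2 billion XIDs on either side of it. The sketch below is illustrative Python, not the actual backend code, showing how an old, unfrozen row suddenly appears to be "in the future" once the counter advances 2^31 past it:]

```python
def xid_precedes(a: int, b: int) -> bool:
    """Circular comparison: True if XID a logically precedes XID b.

    Mirrors the idea behind TransactionIdPrecedes: take the 32-bit
    difference and interpret it as signed; negative means a < b.
    (Real PostgreSQL also reserves a few special low XIDs, ignored here.)
    """
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000

# A tuple inserted by XID 1000 is visible while its xmin precedes
# the current transaction counter ...
assert xid_precedes(1000, 5000)

# ... but ~2^31 transactions later, without a vacuum to freeze it,
# the same xmin compares as "in the future" and the row vanishes.
current = (1000 + 2**31 + 1) & 0xFFFFFFFF
assert not xid_precedes(1000, current)
```

This is why vacuum must periodically visit every table: old tuples have to be marked with a frozen XID that compares as older than everything before the counter laps them.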
pgsql@mohawksoft.com writes: > In fact, I think it is so bad, that I think we need to back-port a fix to > previous versions and issue a notice of some kind. They already do issue notices --- see VACUUM. A real fix (eg the forcible stop we were talking about earlier) will not be reasonable to back-port. regards, tom lane
Tom Lane wrote:
> pgsql@mohawksoft.com writes:
>> In fact, I think it is so bad, that I think we need to back-port a fix to
>> previous versions and issue a notice of some kind.
>
> They already do issue notices --- see VACUUM.
>
> A real fix (eg the forcible stop we were talking about earlier) will not
> be reasonable to back-port.

I hope this question isn't too stupid.... Is it possible to create a "vacuum wraparound" or "vacuum xidreset" command which would do the work required to fix the wraparound problem, without being as expensive as a normal vacuum of an entire database?
More suggestions:

(1) At startup, postmaster checks for an XID; if it is close to a problem, force a vacuum.

(2) At "sig term" shutdown, can the postmaster start a vacuum?

(3) When the XID count goes past the "trip wire", can it spontaneously issue a vacuum?

NOTE: Suggestions 1 and 2 are for 8.0 and prior. 3 is for later than 8.0.1.
"Matthew T. O'Connor" <matthew@zeut.net> writes: > I hope this question isn't too stupid.... > Is it be possible to create a "vacuum wraparound" or "vacuum xidreset" > command which would do the work required to fix the wraparound problem, > without being as expensive as a normal vacuum of an entire database? I don't think it'd be worth the trouble. You could skip index cleanup if you didn't actually delete any tuples, but you'd still have to do all of the scanning work. The cases where people think they don't need to do vacuum (because the table didn't have any deleted tuples) wouldn't get any cheaper at all. regards, tom lane
Tom Lane wrote:
> pgsql@mohawksoft.com writes:
>> In fact, I think it is so bad, that I think we need to back-port a fix to
>> previous versions and issue a notice of some kind.
>
> They already do issue notices --- see VACUUM.
>
> A real fix (eg the forcible stop we were talking about earlier) will not
> be reasonable to back-port.

Would at least an automated warning mechanism be a reasonable backport?

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc., your source for PostgreSQL replication, professional support, programming, managed services, shared and dedicated hosting. Home of the Open Source Projects plPHP, plPerlNG, pgManage, and pgPHPtoolkit. Contact us now at: +1-503-667-4564 - http://www.commandprompt.com
"Joshua D. Drake" <jd@commandprompt.com> writes: > Tom Lane wrote: >> A real fix (eg the forcible stop we were talking about earlier) will not >> be reasonable to back-port. > Would at least a automated warning mechanism be a reasonable backport? No, because the hard part of the problem actually is detecting that the condition exists in a reasonably cheap way. The check in VACUUM is really extremely expensive, which is why we don't make it except after completing a database-wide vacuum. Once we have an XID limit value sitting in shared memory then the code to use it (generate warnings and/or error out) is simple; it's initializing that value during postmaster start that I consider too complicated/risky to stick into existing versions. There is another issue here, which is that I have no faith that the people who actually need this are going to be clueful enough to update to 7.4.8 or 7.3.10 or whatever they'd need... regards, tom lane
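[Editor's note: to make Tom's point concrete, here is a rough sketch, with hypothetical names and structure rather than the actual patch, of why the per-transaction check is trivial once warn/stop XID limits have been precomputed and stashed in shared memory. All the hard work Tom describes happens in the initialization that produces these two numbers:]

```python
WRAP_MASK = 0xFFFFFFFF

def xid_precedes(a: int, b: int) -> bool:
    # Circular modulo-2^32 XID comparison (signed 32-bit difference).
    return ((a - b) & WRAP_MASK) >= 0x80000000

def check_xid_limits(next_xid: int, warn_limit: int, stop_limit: int) -> str:
    """The cheap part: two comparisons per XID assignment.

    warn_limit/stop_limit stand in for values the postmaster would have
    precomputed at startup and kept in shared memory.
    """
    if not xid_precedes(next_xid, stop_limit):
        return "stop"   # refuse to assign XIDs until a vacuum runs
    if not xid_precedes(next_xid, warn_limit):
        return "warn"   # log a warning: vacuum soon
    return "ok"

# Toy limits just to show the three regimes:
assert check_xid_limits(50, warn_limit=100, stop_limit=200) == "ok"
assert check_xid_limits(150, warn_limit=100, stop_limit=200) == "warn"
assert check_xid_limits(250, warn_limit=100, stop_limit=200) == "stop"
```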
pgsql@mohawksoft.com writes: > More suggestions: > (1) At startup, postmaster checks for an XID, if it is close to a problem, > force a vacuum. Useless to a system that's run 24x7; also presumes the existence of a complete solution anyway (since getting the postmaster to find that out is the hard part). > (2) At "sig term" shutdown, can the postmaster start a vacuum? Certainly not. We have to assume that SIGTERM means we are under a short-term sentence of death from init. And if it's a manual stop it doesn't sound much better: the sort of DBA that needs this "feature" is likely to decide he should kill -9 the postmaster because it's taking too long to shut down. > (3) When the XID count goes past the "trip wire" can it spontaneously > issue a vacuum? Only in the database you're connected to, which very likely isn't where the problem is. Moreover, having N backends all decide they need to do this at once doesn't sound like a winner. Furthermore, this still presumes the existence of the hard part of the solution, which is knowing where the trip point is. regards, tom lane
> There is another issue here, which is that I have no faith that the
> people who actually need this are going to be clueful enough to update
> to 7.4.8 or 7.3.10 or whatever they'd need...

Well I can't argue with that one ;)

> regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:
>> (3) When the XID count goes past the "trip wire" can it spontaneously
>> issue a vacuum?
>
> Only in the database you're connected to, which very likely isn't where
> the problem is. Moreover, having N backends all decide they need to do
> this at once doesn't sound like a winner. Furthermore, this still
> presumes the existence of the hard part of the solution, which is
> knowing where the trip point is.

Alright, I have a suggestion. If the database kept an "oldest xid" for each table, then there wouldn't be any expensive work to scan the table looking for an oldest xid. The only time the "oldest xid" needs to be updated is when vacuum is run, which is precisely when it would be known. There could be a per-database "oldest xid" that any vacuum on any table updates (by skimming all the "oldest xid"s for the current database). If that's stored in the shared pg_database table then it's accessible regardless of what database you connect to, no? Then on every connection, and every n-thousandth transaction, you just have to check the "oldest xid" for all the databases and make sure the difference between the oldest one and the current xid is reasonable.

-- greg
Greg Stark <gsstark@mit.edu> writes: > There could be a per-database "oldest xid" that any vacuum on any table > updates (by skimming all the "oldest xid"s for the current database). If > that's stored in the shared pg_database table then it's accessible regardless > of what database you connect to, no? You mean like pg_database.datfrozenxid? The problem is not that we don't have the data. The problem is getting the data to where it's needed, which is GetNewTransactionId(). That routine cannot be responsible for finding it out for itself, because we haven't yet started a transaction at the time where we need to know if it's safe to consume an XID. regards, tom lane
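[Editor's note: as a hedged sketch of the initialization step Tom is describing, with invented names and safety margins that need not match the real patch's constants, the limits could be derived once from the oldest pg_database.datfrozenxid and only then consulted on each XID assignment:]

```python
WRAP_MASK = 0xFFFFFFFF

def compute_xid_limits(oldest_frozen_xid: int,
                       warn_margin: int = 10_000_000,
                       stop_margin: int = 1_000_000):
    """Derive (warn_limit, stop_limit) from the cluster's oldest
    datfrozenxid, modulo 2^32.

    Data loss begins roughly 2^31 XIDs past the oldest frozen XID,
    so back off from that wrap point by the given safety margins.
    """
    wrap_point = (oldest_frozen_xid + 2**31) & WRAP_MASK
    warn_limit = (wrap_point - warn_margin) & WRAP_MASK
    stop_limit = (wrap_point - stop_margin) & WRAP_MASK
    return warn_limit, stop_limit

# With the cluster's oldest frozen XID at 0, warnings would start
# 10M XIDs before the wrap point and a hard stop 1M before it.
warn, stop = compute_xid_limits(0)
assert warn == 2**31 - 10_000_000
assert stop == 2**31 - 1_000_000
```

The point of the exercise is exactly Tom's: once these two numbers exist in shared memory, GetNewTransactionId() never has to go looking for catalog data mid-assignment.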
On Sat, 19 Feb 2005 04:10 am, Tom Lane wrote: > pgsql@mohawksoft.com writes: > > In fact, I think it is so bad, that I think we need to back-port a fix to > > previous versions and issue a notice of some kind. > > They already do issue notices --- see VACUUM. > > A real fix (eg the forcible stop we were talking about earlier) will not > be reasonable to back-port. > Not to be rude, but if backporting is not an option, why do we not just focus on the job of getting autovacuum into 8.1, and not have to think about how a patch that will warn users will work? Regards Russell Smith
> On Sat, 19 Feb 2005 04:10 am, Tom Lane wrote:
>> They already do issue notices --- see VACUUM.
>>
>> A real fix (eg the forcible stop we were talking about earlier) will not
>> be reasonable to back-port.
>
> Not to be rude, but if backporting is not an option, why do we not just
> focus on the job of getting autovacuum into 8.1, and not have to think
> about how a patch that will warn users will work?

Unless I'm mistaken, even "autovacuum" may not be enough. AFAIK, autovacuum depends on the statistics daemon, and some admins may turn that off for performance. Even so, how unlikely is it that autovacuum fails to run? I think there should be a 100% no-data-loss fail-safe. Anything less is a cop-out. I can't see one successful argument that starts with data loss and ends with maintenance.
Russell Smith wrote:
> Not to be rude, but if backporting is not an option, why do we not just
> focus on the job of getting autovacuum into 8.1, and not have to think
> about how a patch that will warn users will work?

What if autovacuum is turned off for some reason? Or fails? A more graceful failure along the lines suggested would be a good thing, ISTM. I agree with Tom about not backpatching, though. The situation seems analogous to a car owner who neglects the clear instructions in the manual to perform regular oil changes and then finds to his great surprise that the car stops running. It's hardly the manufacturer's fault.

cheers

andrew
In <4216A911.9020704@dunslane.net>, on 02/18/05 at 09:48 PM, Andrew Dunstan <andrew@dunslane.net> said:
> What if autovacuum is turned off for some reason? Or fails? A more
> graceful failure along the lines suggested would be a good thing, ISTM.
> I agree with Tom about not backpatching, though. The situation seems
> analogous to a car owner who neglects the clear instructions in the
> manual to perform regular oil changes and then finds to his great
> surprise that the car stops running. It's hardly the manufacturer's
> fault.

I never did like car analogies... At least a car comes with an IDIOT light, or maybe even an oil pressure gauge. Something like that (the idiot light) is missing from PostgreSQL. The oil pressure gauge would be good to have, kind of like a gauge that lets you know when you are about to run out of fuel. Of course the best thing is the auto-fill gas tank.

--
lsunley@mb.sympatico.ca
pgsql@mohawksoft.com writes:
> I think there should be a 100% no data loss fail safe.

Possibly we need to recalibrate our expectations here. The current situation is that PostgreSQL will not lose data if:

1. Your disk drive doesn't screw up (eg, lie about write complete, or just plain die on you).
2. Your kernel and filesystem don't screw up.
3. You follow the instructions about routine vacuuming.
4. You don't hit any bugs that we don't know about.

I agree that it's a nice idea to be able to eliminate assumption #3 from our list of gotchas, but the big picture is that it's hard to believe that doing this will make for a quantum jump in the overall level of reliability. I think I listed the risks in roughly the right order of severity ...

I'm willing to fix this for 8.1 (and am already in process of drafting a patch), especially since it ties into some other known problems such as the pg_pwd/pg_group files not being properly reconstructed after PITR recovery. But I think that a "Chinese fire drill" is not called for, and backpatching a significant but poorly tested change falls into that category IMHO.

regards, tom lane
> pgsql@mohawksoft.com writes:
>> I think there should be a 100% no data loss fail safe.

OK, maybe I was overly broad in my statement, but I assumed a context that I guess you missed. Don't you think that in normal operations, i.e. with no hardware or OS failure, we should see any data loss as unacceptable? If a bug causes data loss, it is a big deal, right?

> Possibly we need to recalibrate our expectations here. The current
> situation is that PostgreSQL will not lose data if:
>
> 1. Your disk drive doesn't screw up (eg, lie about write complete,
> or just plain die on you).
> 2. Your kernel and filesystem don't screw up.
> 3. You follow the instructions about routine vacuuming.
> 4. You don't hit any bugs that we don't know about.

See, here is where I strongly disagree. Items (1) and (2) are completely out of our control and no one would blame PostgreSQL. Item (4) is an issue with all software; every now and then people hit bugs, and the bug is reported and assumed to get fixed. Item (3) is just nasty: RTFM or else, sucka! I think it is a very user-hostile stance.

> I agree that it's a nice idea to be able to eliminate assumption #3 from
> our list of gotchas, but the big picture is that it's hard to believe
> that doing this will make for a quantum jump in the overall level of
> reliability. I think I listed the risks in roughly the right order of
> severity ...

Sometimes the edge conditions of a problem are not so obscure. I think (3) is a huge issue. Imagine I'm in this meeting:

DBA: We can't use PostgreSQL; if we forget to do normal maintenance we'll lose all our data.
ME: Well, there is an amount of truth in that, but we just won't forget.
DBA: Sorry, I don't trust it.
CTO: Mark, I think Joe has some serious issues that need to be resolved before we can move on this.

Boom!! Lost.
> I'm willing to fix this for 8.1 (and am already in process of drafting a
> patch), especially since it ties into some other known problems such as
> the pg_pwd/pg_group files not being properly reconstructed after PITR
> recovery. But I think that a "Chinese fire drill" is not called for,
> and backpatching a significant but poorly tested change falls into that
> category IMHO.
>
> regards, tom lane
[ Shrugs ] and looks at other database systems ...

CA has put Ingres into Open Source last year.

Very reliable system with a replicator worth looking at.

Just a thought.
On Fri, 18 Feb 2005 22:35:31 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> pgsql@mohawksoft.com writes:
>> I think there should be a 100% no data loss fail safe.
>
> Possibly we need to recalibrate our expectations here. The current
> situation is that PostgreSQL will not lose data if:
>
> 1. Your disk drive doesn't screw up (eg, lie about write complete,
> or just plain die on you).
> 2. Your kernel and filesystem don't screw up.
> 3. You follow the instructions about routine vacuuming.
> 4. You don't hit any bugs that we don't know about.

I'm not an expert but a happy user. My opinion is:

1) There is nothing to do about #1 and #2.
2) #4 is not a big problem because of the velocity with which developers fix those when a bug is found.
3) All databases have some type of maintenance routine; in Informix, for example, we have "update statistics" (and there are others for Oracle). Of course they are for performance reasons, but vacuum is too, and additionally it handles the XID wraparound for us.

So, to have a maintenance routine in PostgreSQL is not bad. *Bad* is to have a DBA(1) with no clue about the tool he is using. Tools that do too much are an incentive to hire *no clue* people.

(1) DBA: DataBase Administrator or DataBase Annihilator???

regards,
Jaime Casanova
In <c2d9e70e050219112379204df4@mail.gmail.com>, on 02/19/05 at 02:23 PM, Jaime Casanova <systemguards@gmail.com> said:
> So, to have a maintenance routine in PostgreSQL is not bad. *Bad* is to
> have a DBA(1) with no clue about the tool he is using. Tools that do too
> much are an incentive to hire *no clue* people.
>
> (1) DBA: DataBase Administrator or DataBase Annihilator???

Bad-mouthing the people who use your software is a good way to make sure no one uses the software.

The catastrophic failure of the database because a maintenance function is not performed is a problem with the software, not with the people using it.

--
lsunley@mb.sympatico.ca
On Sat, Feb 19, 2005 at 13:35:25 -0500, lsunley@mb.sympatico.ca wrote:
> The catastrophic failure of the database because a maintenence function is
> not performed is a problem with the software, not with the people using
> it.

There doesn't seem to be disagreement that something should be done going forward. The disagreement seems to be what effort should be made in back-porting fixes to previous versions.
> On Fri, 18 Feb 2005 22:35:31 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> pgsql@mohawksoft.com writes: >> > I think there should be a 100% no data loss fail safe. >> >> Possibly we need to recalibrate our expectations here. The current >> situation is that PostgreSQL will not lose data if: >> >> 1. Your disk drive doesn't screw up (eg, lie about write >> complete, >> or just plain die on you). >> 2. Your kernel and filesystem don't screw up. >> 3. You follow the instructions about routine vacuuming. >> 4. You don't hit any bugs that we don't know about. >> > I'm not an expert but a happy user. My opinion is: > 1) there is nothing to do with #1 and #2. > 2) #4 is not a big problem because of the velocity developers fix > those when a bug is found. > > 3) All databases has some type of maintenance routine, in informix for > example we have (update statistics, and there are others for oracle) > of course they are for performance reasons, but vacuum is too for that > and additionally give us the XID wraparound. > So, to have a maintenance routine in PostgreSQL is not bad. *Bad* is > to have a DBA(1) with no clue about the tool is using. Tools that do > to much are an incentive in hire *no clue* people. > > (1) DBA: DataBase Administrator or DataBase Aniquilator??? PostgreSQL is such an awesome project. The only thing it seems to suffer from is a disregard for its users.
> On Sat, Feb 19, 2005 at 13:35:25 -0500, lsunley@mb.sympatico.ca wrote:
>> The catastrophic failure of the database because a maintenence function
>> is not performed is a problem with the software, not with the people
>> using it.
>
> There doesn't seem to be disagreement that something should be done going
> forward.
>
> The disagreement sems to be what effort should be made in back porting
> fixes to previous versions.

Now, let's imagine PostgreSQL is being developed by a large company. QA announces it has found a bug that will cause all the users' data to disappear if they don't run a maintenance program correctly. Vacuuming one or two tables is not enough; you have to vacuum all tables in all databases.

This bug would get marked as a critical error, and a full-scale effort would be made to contact previous users to upgrade or check their procedures.
pgsql@mohawksoft.com wrote: > >PostgreSQL is such an awesome project. The only thing it seems to suffer >from is a disregard for its users. > > > > Mark, This is completely untrue and very offensive. Here's a tip I've often found useful even though I have also often ignored it (and later regretted doing so): before you hit the Send button, take a few deep breaths and count to 10. There is no news in the problem you're complaining of. It's completely known and documented. You've stated before that you've been using PostgreSQL for years - why is this suddenly so urgent that we have to drop everything and backpatch old releases? Please move along, there's nothing to see here, these are not the bugs you've been looking for. cheers andrew
Andrew Dunstan wrote:
> There is no news in the problem you're complaining of. It's completely
> known and documented. You've stated before that you've been using
> PostgreSQL for years - why is this suddenly so urgent that we have to
> drop everything and backpatch old releases? Please move along, there's
> nothing to see here, these are not the bugs you've been looking for.

To be fair to Mark, there does seem to be an increasing number of reports of this issue. In spite of the in-the-works fix for 8.1, it would be a pity to see customers losing data from xid wrap-around. However, the quandary is this: even if we did back-patches for every version, said customers probably wouldn't know they needed to apply them - hmmm, not much help there. We might be better off merely announcing the need to use vacuumdb on www.postgresql.org!

regards

Mark (the other one)
> [ Shrugs ] and looks at other database systems ...
>
> CA has put Ingres into Open Source last year.
>
> Very reliable system with a replicator worth looking at.
>
> Just a thought.

The discussion on hackers is how to make PostgreSQL better. There are many different perspectives; differences are argued and consensus reached, and a better PostgreSQL emerges. Going to another database would be counter-productive to the process.
Mark Kirkwood <markir@coretech.co.nz> writes: > To be fair to Mark, there does seem to be an increasing number of > reports of this issue. In spite of the in-the-works fix for 8.1, it > would be a pity to see customers losing data from xid wrap-around. The question is whether we are willing to back-patch a fairly large amount of not-very-well-tested code into 8.0. See http://archives.postgresql.org/pgsql-patches/2005-02/msg00123.php http://archives.postgresql.org/pgsql-committers/2005-02/msg00127.php http://archives.postgresql.org/pgsql-committers/2005-02/msg00131.php I personally don't think it's worth the risk. The code works well enough to commit to development tip, but it's fundamentally alpha quality code. regards, tom lane
On Sat, Feb 19, 2005 at 18:04:42 -0500, > > Now, lets imagine PostgreSQL is being developed by a large company. QA > announces it has found a bug that will cause all the users data to > disappear if they don't run a maintenence program correctly. Vacuuming one > or two tables is not enough, you have to vacuum all tables in all > databases. Except that Postgres isn't a large company and doing the work of back patching and testing old versions will be done instead of more important work. > This bug would get marked as a critical error and a full scale effort > would be made to contact previous users to upgrade or check their > procedures. I don't think all commercial companies would do that. I doubt that even most of them would.
> On Sat, Feb 19, 2005 at 18:04:42 -0500:
>> Now, lets imagine PostgreSQL is being developed by a large company. QA
>> announces it has found a bug that will cause all the users data to
>> disappear if they don't run a maintenence program correctly. Vacuuming
>> one or two tables is not enough, you have to vacuum all tables in all
>> databases.
>
> Except that Postgres isn't a large company and doing the work of
> back patching and testing old versions will be done instead of
> more important work.

PostgreSQL is an open source project that plays with the big guys. Look at the Linux kernel. Imagine their file system guys thinking this way. Linux would still be Linus' hobby.

>> This bug would get marked as a critical error and a full scale effort
>> would be made to contact previous users to upgrade or check their
>> procedures.
>
> I don't think all commercial companies would do that. I doubt that even
> most of them would.

Database companies? You bet they would.
On Sunday 20 February 2005 00:30, Tom Lane wrote:
> Mark Kirkwood <markir@coretech.co.nz> writes:
>> To be fair to Mark, there does seem to be an increasing number of
>> reports of this issue. In spite of the in-the-works fix for 8.1, it
>> would be a pity to see customers losing data from xid wrap-around.
>
> The question is whether we are willing to back-patch a fairly large
> amount of not-very-well-tested code into 8.0. See
> http://archives.postgresql.org/pgsql-patches/2005-02/msg00123.php
> http://archives.postgresql.org/pgsql-committers/2005-02/msg00127.php
> http://archives.postgresql.org/pgsql-committers/2005-02/msg00131.php
>
> I personally don't think it's worth the risk. The code works well
> enough to commit to development tip, but it's fundamentally alpha
> quality code.

I would lean away from putting it in 8.0; however, aren't we planning an 8.0.x release that will have a beta and/or RC testing for ARC-related changes? If so, I might be open to putting it in that release (though the bits requiring initdb are a killer).

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
On Sun, 20 Feb 2005 pgsql@mohawksoft.com wrote: > > On Sat, Feb 19, 2005 at 18:04:42 -0500, > >> > >> Now, lets imagine PostgreSQL is being developed by a large company. QA > >> announces it has found a bug that will cause all the users data to > >> disappear if they don't run a maintenence program correctly. Vacuuming > >> one > >> or two tables is not enough, you have to vacuum all tables in all > >> databases. > > > > Except that Postgres isn't a large company and doing the work of > > back patching and testing old versions will be done instead of > > more important work. > > PostgreSQL is an open source project that plays with the big guys. Look at > the Linux kernel. Imagine their file system guys thinking this way. Linux > would still be Linus' hobbie. So, you are certain that every Linux file system bug has been patched all the way back to say kernel version 1.0 then? Do you have any evidence of this claim? > >> This bug would get marked as a critical error and a full scale effort > >> would be made to contact previous users to upgrade or check their > >> procedures. > > > > I don't think all commercial companies would do that. I doubt that even > > most of them would. > > Database companies? You bet they would. Do you have any evidence or are you merely spouting an opinion as fact?
> On Sun, 20 Feb 2005 pgsql@mohawksoft.com wrote:
>
>> > On Sat, Feb 19, 2005 at 18:04:42 -0500,
>> >>
>> >> Now, let's imagine PostgreSQL is being developed by a large company.
>> >> QA announces it has found a bug that will cause all the users' data to
>> >> disappear if they don't run a maintenance program correctly. Vacuuming
>> >> one or two tables is not enough; you have to vacuum all tables in all
>> >> databases.
>> >
>> > Except that Postgres isn't a large company, and doing the work of
>> > back-patching and testing old versions will be done instead of
>> > more important work.
>>
>> PostgreSQL is an open source project that plays with the big guys. Look at
>> the Linux kernel. Imagine their file system guys thinking this way. Linux
>> would still be Linus' hobby.
>
> So, you are certain that every Linux file system bug has been patched all
> the way back to, say, kernel version 1.0 then? Do you have any evidence of
> this claim?

No one is suggesting going back to version 1.0, but critical data-loss bugs
that are present and relevant in prior versions still in use do get fixed.

>> >> This bug would get marked as a critical error and a full-scale effort
>> >> would be made to contact previous users to upgrade or check their
>> >> procedures.
>> >
>> > I don't think all commercial companies would do that. I doubt that even
>> > most of them would.
>>
>> Database companies? You bet they would.
>
> Do you have any evidence, or are you merely spouting an opinion as fact?

With Oracle and DB2, yes, I have some personal experience.
On Sun, 20 Feb 2005 pgsql@mohawksoft.com wrote:

> > On Sun, 20 Feb 2005 pgsql@mohawksoft.com wrote:
> >
> >> > On Sat, Feb 19, 2005 at 18:04:42 -0500,
> >> >>
> >> >> Now, let's imagine PostgreSQL is being developed by a large company.
> >> >> QA announces it has found a bug that will cause all the users' data to
> >> >> disappear if they don't run a maintenance program correctly. Vacuuming
> >> >> one or two tables is not enough; you have to vacuum all tables in all
> >> >> databases.
> >> >
> >> > Except that Postgres isn't a large company, and doing the work of
> >> > back-patching and testing old versions will be done instead of
> >> > more important work.
> >>
> >> PostgreSQL is an open source project that plays with the big guys. Look at
> >> the Linux kernel. Imagine their file system guys thinking this way. Linux
> >> would still be Linus' hobby.
> >
> > So, you are certain that every Linux file system bug has been patched all
> > the way back to, say, kernel version 1.0 then? Do you have any evidence of
> > this claim?
>
> No one is suggesting going back to version 1.0, but critical data-loss bugs
> that are present and relevant in prior versions still in use do get fixed.

I still doubt your claim about patching, and you haven't given any
evidence, but let's just make the assumption it's true, because otherwise
even trying to hold a discussion is fruitless.

I also dispute your claim, based on the back-patching claim, that Linux
would be Linus' hobby if the file system guys thought this way. Given that
"stable" Linux branches often aren't, if there weren't aggregators who
provide upgrades that are at least supposedly tested and reasonably stable,
and who sometimes did their own back-patching, Linux wouldn't have the sort
of success it does, because people would have to do a lot more choosing
between getting bug fix X and doing huge amounts of testing to make sure
nothing else is broken. Thus, I believe you are greatly overstating the
effect that your first claim has towards your second, to the point of
making an invalid argument.

Personally, I'd in general wish that 8.0 got a fix for this, because that
way we could (after sufficient testing) push an 8.0 version that we
considered stable to suggest people move to. However, I don't have a whole
lot of time to do such a patch nor to do "sufficient" testing, and I'm not
arrogant enough to believe I can order around volunteers and companies I'm
not a customer of.

> >> >> This bug would get marked as a critical error and a full-scale effort
> >> >> would be made to contact previous users to upgrade or check their
> >> >> procedures.
> >> >
> >> > I don't think all commercial companies would do that. I doubt that even
> >> > most of them would.
> >>
> >> Database companies? You bet they would.
> >
> > Do you have any evidence, or are you merely spouting an opinion as fact?
>
> With Oracle and DB2, yes, I have some personal experience.

My last company's experience with Oracle support still leaves me
questioning that claim. They basically got "don't do that then, or move to
the newest major revision" when they had a construct which caused the
server to stop responding. It's not the same conditions (although I believe
the DBA did reload from backup because no one could guarantee that there
couldn't possibly have been data loss), but it's certainly not indicative
of the sort of "full-scale efforts" you're describing.
Tom Lane wrote:
> The question is whether we are willing to back-patch a fairly large
> amount of not-very-well-tested code into 8.0. See
> http://archives.postgresql.org/pgsql-patches/2005-02/msg00123.php
> http://archives.postgresql.org/pgsql-committers/2005-02/msg00127.php
> http://archives.postgresql.org/pgsql-committers/2005-02/msg00131.php
>
> I personally don't think it's worth the risk. The code works well
> enough to commit to development tip, but it's fundamentally alpha
> quality code.

I think this makes the most sense. If we are going to do an extended
testing period for 8.0.without-arc, then bundling it in there might be
worth considering.

regards

Mark
On Feb 20, 2005, at 11:02 AM, Stephan Szabo wrote:

> My last company's experience with Oracle support still leaves me
> questioning that claim. They basically got "don't do that then or
> move to the newest major revision" when they had a construct which
> caused the server to stop responding.

For the record, that is the kind of support I've gotten from Informix in
the past, even for issues where the db would reliably return invalid
results.

What is great is I have one Informix db that I cannot take a backup of
without causing several DAYS of downtime. Their solution: do an in-place
upgrade and hope it works. Yes, they actually said "And hopefully it will
work. You'll need to take a backup to be safe." "But I can't take a backup.
That is the point of this call." "Oh, well... umm.. it SHOULD work!" You
get the idea.

As for the xid wraparound: issuing a NOTICE / ERROR in a new version will
be good, but backpatching won't be needed. As others have said, the people
who really need this are not smart enough to upgrade / watch for patches /
RTFM.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
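For readers wondering why exceeding roughly 2 billion transactions without a vacuum is a data-loss event at all: transaction IDs are 32-bit counters compared in modulo-2^31 arithmetic, so once the XID counter advances more than 2^31 past an unfrozen row's XID, that row suddenly compares as "in the future" and becomes invisible. The following is a minimal Python sketch of that signed-difference comparison (the real logic lives in PostgreSQL's C sources; special reserved XIDs and freezing are omitted here):

```python
# Illustrative sketch (NOT PostgreSQL source) of 32-bit transaction ID
# comparison with modulo-2^31 wrap-around semantics.

MASK = 0xFFFFFFFF  # XIDs behave as 32-bit unsigned counters


def xid_precedes(a: int, b: int) -> bool:
    """True if XID a is logically older than XID b.

    Equivalent to interpreting (a - b) as a signed 32-bit value and
    testing whether it is negative, i.e. whether b is less than 2^31
    transactions "ahead" of a.
    """
    diff = (a - b) & MASK
    return diff >= 0x80000000  # high bit set <=> signed difference < 0


row_xid = 1000  # XID that inserted some row, never frozen by vacuum

# A million transactions later the row still compares as "in the past",
# so it remains visible.
now = (row_xid + 1_000_000) & MASK
assert xid_precedes(row_xid, now)

# More than 2^31 transactions later, with no vacuum having frozen the
# row, the comparison wraps: the row now compares as "in the future"
# and silently disappears from query results.
now = (row_xid + 2**31 + 1) & MASK
assert not xid_precedes(row_xid, now)
```

This is why the proposals upthread (log warnings as the counter approaches the horizon, then refuse new transactions) target the point where unfrozen rows are about to cross that 2^31 boundary.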
Jeff wrote:
> On Feb 20, 2005, at 11:02 AM, Stephan Szabo wrote:
>
> > My last company's experience with Oracle support still leaves me
> > questioning that claim. They basically got "don't do that then or
> > move to the newest major revision" when they had a construct which
> > caused the server to stop responding.
>
> For the record, that is the kind of support I've got from Informix in
> the past, even for issues where the db would reliably return invalid
> results.
>
> What is great is I have one Informix db that I cannot take a backup of
> without causing several DAYS of downtime. Their solution: do an
> in-place upgrade and hope it works. Yes, they actually said "And
> hopefully it will work. You'll need to take a backup to be safe." "But
> I can't take a backup. That is the point of this call." "Oh, well...
> umm.. it SHOULD work!"

"Uh, did we mention we are sure it would work in DB2." :-)

--
Bruce Momjian                   | http://candle.pha.pa.us
pgman@candle.pha.pa.us          | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
On Sat, Feb 19, 2005 at 04:49:03PM -0500, pgsql@mohawksoft.com wrote:
> PostgreSQL is such an awesome project. The only thing it seems to suffer
> from is a disregard for its users.

Gee. And all this time I thought that "free support from the guy who wrote
the code and gave it to you" was better regard for the users than "cryptic
support by someone who is reading a script and who's afraid of the legal
department." Silly me.

A

--
Andrew Sullivan | ajs@crankycanuck.ca
When my information changes, I alter my conclusions. What do you do, sir?
  --attr. John Maynard Keynes