Thread: Thoughts on maintaining 7.3
Hello, With the recent spate of pg_upgrade discussions and the impending release of 7.4, what do people think about having a dedicated maintenance team for 7.3? 7.3 is a pretty solid release and I think people will be hard pressed to upgrade to 7.4. Of course a lot of people will, but I have customers that are just now upgrading to 7.3 because of legacy application and migration issues. Anyway, I was considering a situation similar to how Linux works, where there is a maintainer for each release... heck, even Linux 2.0 was still getting releases until recently. The theory being that we backport "some" features and fix any bugs that we find? What are people's thoughts on this? Sincerely, Joshua Drake -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC PostgreSQL support, programming, shared hosting and dedicated hosting. +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com The most reliable support for the most reliable Open Source database.
On Tue, 30 Sep 2003, Joshua D. Drake wrote: > Hello, > > With the recent spate of pg_upgrade discussions and the impending > release of 7.4, what do people think about having a dedicated maintenance > team for 7.3? 7.3 is a pretty solid release and I think people will be > hard pressed to upgrade to 7.4. Of course a lot of people will, but I > have customers that are just now upgrading to 7.3 because of legacy > application and migration issues. > > Anyway, I was considering a situation similar to how Linux works, where > there is a maintainer for each release... heck, even Linux 2.0 was still > getting releases until recently. > > The theory being that we backport "some" features and fix > any bugs that we find? > > What are people's thoughts on this? The key issue here is that those creating the patches need to spend the time to create appropriate ones for v7.3, and not many seem willing ... Tom generally does a lot of work on back-patching where appropriate, but those patches are generally either very critical, or benign with respect to changes since v7.3 ... The main deterrent to our doing this up to this point has been, I believe, testing to make sure any back patches don't break *any* of the various OS ports, testing that generally only gets done during a beta freeze ... Not saying that if someone submitted patches to v7.3, they wouldn't get applied ... only that, to date, the work/effort has been greater than the overall benefit, and nobody has stepped up to the plate to do it ...
On Wed, 2003-10-01 at 08:36, Marc G. Fournier wrote: > On Tue, 30 Sep 2003, Joshua D. Drake wrote: > > [original proposal snipped] > > The key issue here is that those creating the patches need to spend the > time to create appropriate ones for v7.3, and not many seem willing ... > Tom generally does a lot of work on back-patching where appropriate, but > those patches are generally either very critical, or benign with respect > to changes since v7.3 ... > > The main deterrent to our doing this up to this point has been, I > believe, testing to make sure any back patches don't break *any* of the > various OS ports, testing that generally only gets done during a beta > freeze ... > > Not saying that if someone submitted patches to v7.3, they wouldn't get > applied ... only that, to date, the work/effort has been greater than the > overall benefit, and nobody has stepped up to the plate to do it ... Maybe I've misread Joshua's intentions, but I got the impression that this 7.3 maintainer would follow the patches list and backport patches whenever possible. This way folks coding for 7.4/7.5 can stay focused on that, but folks who can't upgrade to 7.4 for whatever reason can still get some features / improvements.
Several Linux distros already do this for many packages, and personally I've always been surprised, given PostgreSQL's major-release upgrade issues, that no commercial company has stepped in to offer this in the past. I think what Joshua is wondering is how much cooperation he would get from the community if he were willing to donate these efforts back into the project. While your concerns about testing are valid, there are already issues with that for minor releases, as evidenced by our need to do the quick 7.3.4 after trouble in 7.3.3. Not to mention how little testing is happening to the code that's been back-patched into 7.3 since 7.3.4... Hmm... maybe that's actually an argument against having more changes put in; OTOH, if Joshua can address the testing issues, maybe there would be an overall improvement. I personally think it's a good idea for *someone* to do this, but I'll leave it to core to decide if they want to put the project's stamp of approval on it for any official community release. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
On Tue, 30 Sep 2003, Joshua D. Drake wrote: > Hello, > > With the recent spate of pg_upgrade discussions and the impending > release of 7.4, what do people think about having a dedicated maintenance > team for 7.3? 7.3 is a pretty solid release and I think people will be > hard pressed to upgrade to 7.4. Of course a lot of people will, but I > have customers that are just now upgrading to 7.3 because of legacy > application and migration issues. > > Anyway, I was considering a situation similar to how Linux works, where > there is a maintainer for each release... heck, even Linux 2.0 was still > getting releases until recently. > > The theory being that we backport "some" features and fix any bugs that > we find? > > What are people's thoughts on this? It seems to me the upgrade from 7.2 to 7.4 is easier than an upgrade to 7.3, since at least 7.4's pg_dumpall can connect to a 7.2 database and suck in everything, whereas with 7.3 I had to dump with 7.2's pg_dumpall and then tweak the file by hand a fair bit to get it to go into 7.3. With 7.4 I'm finding upgrading to be easier. I'll likely upgrade our production servers to 7.4.0 when it comes out and wind up skipping 7.3 altogether.
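The dump-and-restore upgrade path Scott describes (dump with the *new* release's pg_dumpall, which can read the old server's catalogs, then restore into the new server) can be sketched as a short shell session. The ports, install path, and dump file name below are hypothetical, and the real commands are shown in comments so the sketch stays runnable without a live server:

```shell
# Sketch of a dump-and-restore major-version upgrade (hypothetical
# ports and paths). Key point from the post: run the NEW release's
# pg_dumpall against the OLD server.
OLD_PORT=5432                  # 7.2/7.3 server still running here
NEW_PORT=5433                  # freshly initdb'd 7.4 server
DUMP_FILE=upgrade.sql

# The actual commands, commented out for illustration:
#   /usr/local/pgsql74/bin/pg_dumpall -p "$OLD_PORT" > "$DUMP_FILE"
#   /usr/local/pgsql74/bin/psql -p "$NEW_PORT" -f "$DUMP_FILE" template1
echo "pg_dumpall -p $OLD_PORT > $DUMP_FILE; psql -p $NEW_PORT -f $DUMP_FILE template1"
```

Only after the restore is verified would the old server be retired, which is part of the downtime cost discussed later in the thread.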
On Wed, 1 Oct 2003, Robert Treat wrote: > Maybe I've misread Joshua's intentions, but I got the impression that > this 7.3 maintainer would follow the patches list and backport patches > whenever possible. This way folks coding for 7.4/7.5 can stay focused on > that, but folks who can't upgrade to 7.4 for whatever reason can still > get some features / improvements. The problem, I think (and please note that I'm not against it, just playing major devil's advocate here), is that there have always been enough major fundamental coding changes between releases that very few patches are "back-patchable" without having to do some heavy rewrites ... > Several Linux distros already do this for many packages, and personally > I've always been surprised, given PostgreSQL's major-release upgrade > issues, that no commercial company has stepped in to offer this in the > past. I think what Joshua is wondering is how much cooperation he would > get from the community if he were willing to donate these efforts back > into the project. Using Linux/FreeBSD/insert-OS-here as an example is like comparing apples to oranges ... take FreeBSD as an example, since I know it ... 5.x has had some *major* rewrites done to the kernel, getting rid of 'the Giant Lock' that SMP in 4.x uses ... those changes are not back-patchable, since then you'd have 5.x ... there are a lot of changes to the 5.x kernel that rely on those changes, and are therefore not *easily* back-patchable ... Now, userland software is a totally different case, since it is rarely "tied" to the kernel itself ... Think of PostgreSQL as the kernel, not as the distro ... how many changes from one kernel release are easily patched into an older one, without having to take a lot of other baggage back with it ... ? > I personally think it's a good idea for *someone* to do this, but I'll > leave it to core to decide if they want to put the project's stamp of > approval on it for any official community release.
I don't believe anyone would work against this, nor could I imagine that anyone would think it was "a bad idea"; I'm just curious as to how possible it is to do ...
"Marc G. Fournier" <scrappy@postgresql.org> writes: > On Tue, 30 Sep 2003, Joshua D. Drake wrote: >> Of course the theory being that we backport "some" features and fix >> any bugs that we find? > Not saying that if someone submit'd patches to v7.3, they wouldn't get > applied ... only that, to date, the work/effort has been greater then the > overall benefit, and nobody has step'd up to the plate to do it ... The idea of backporting features scares me; I really doubt that you can get enough beta-testing on a back branch to be confident that you haven't broken anything with a feature addition. In any case you'd be quite limited in what you could do without forcing an initdb. Another issue is that people expect dot-releases to be absolutely rock solid. If you start introducing new features then you considerably increase the risk of introducing new bugs. (I'm still embarrassed about 7.3.3's failure-to-start bug...) Our past practice has been to back-port only bug fixes, and only critical or low-risk ones at that. I think this could be done in a more thorough fashion, and it could be continued longer than we've done in the past, but you shouldn't set the scope of the maintenance effort any wider than that. regards, tom lane
On Wed, 2003-10-01 at 09:14, Robert Treat wrote: > Maybe I've mis-read Joshua's intentions, but I got the impression that > this 7.3 maintainer would follow the patches list and backport patches > whenever possible. This way folks coding for 7.4/7.5 can stay focused on > that, but folks who can't upgrade to 7.4 for whatever reason can still > get some features / improvements. I don't think there's a need for a formalized "7.3 maintainer" -- if individuals would like to see particular fixes backported to 7.3, they can read pgsql-patches and post backported patches themselves. If someone wants to go ahead and do that, I wouldn't complain. (Similarly, if there is enough demand for a commercial company to do something similar for their customers, that might also be a good idea). However, I think it's a bad idea to backport any features into older releases. The reason 7.3.x is really stable is precisely that it has had a lot of testing and bugfixing work done, but no new features. Furthermore, adding more features to 7.3.x reduces the incentive to upgrade to 7.4, worsening the support problem: the more people using old releases, the more demand there will be for backported features, leading to more people using 7.3, leading to more demand for ... (FWIW, I think that any energy we might spend on a 7.3 maintainer would be better directed at improving the upgrade story...) -Neil
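For concreteness, "read pgsql-patches and post backported patches yourself" in the CVS era would have looked roughly like the sketch below. The REL7_3_STABLE tag follows the project's branch-naming convention, but the anonymous-CVS repository path and the patch file name are assumptions for illustration:

```shell
# Rough sketch of preparing a backported fix for the 7.3 branch
# (CVS era; repository path and patch file name are hypothetical):
#   cvs -d :pserver:anoncvs@anoncvs.postgresql.org:/projects/cvsroot \
#       checkout -r REL7_3_STABLE pgsql
#   cd pgsql
#   patch -p0 < ../fix-from-head.patch
#   make && make check     # run the regression suite on the back branch
BRANCH=REL7_3_STABLE
echo "apply fix on $BRANCH, run 'make check', then post to pgsql-patches"
```

The `make check` step is exactly the testing gap Marc raises above: one person's regression run covers one platform, not all the OS ports.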
On Wed, 2003-10-01 at 10:49, Neil Conway wrote: > On Wed, 2003-10-01 at 09:14, Robert Treat wrote: > > Maybe I've misread Joshua's intentions, but I got the impression that > > this 7.3 maintainer would follow the patches list and backport patches > > whenever possible. This way folks coding for 7.4/7.5 can stay focused on > > that, but folks who can't upgrade to 7.4 for whatever reason can still > > get some features / improvements. > > I don't think there's a need for a formalized "7.3 maintainer" -- if > individuals would like to see particular fixes backported to 7.3, they > can read pgsql-patches and post backported patches themselves. If > someone wants to go ahead and do that, I wouldn't complain. (Similarly, > if there is enough demand for a commercial company to do something > similar for their customers, that might also be a good idea). Ok. > However, I think it's a bad idea to backport any features into older > releases. The reason 7.3.x is really stable is precisely that it has had > a lot of testing and bugfixing work done, but no new features. Eh... I could see some things, like tsearch2 or pg_autovacuum, which AFAIK are almost if not completely compatible with 7.3, which will not get backported. Also, fixes in some of the extra tools like psql could be very doable; I know I had a custom psql for 7.2 that backpatched the \timing option and some of the pager fixes. Now, whether that could be done with stuff closer to core, I don't know... BTW, personally I'm fine with these things not being backpatched, though if someone came out with a 7.3 pg_autovacuum RPM, or a 7.3 psql RPM, I'm sure a lot of people would use it. > Furthermore, adding more features to 7.3.x reduces the incentive to > upgrade to 7.4, worsening the support problem: the more people using old > releases, the more demand there will be for backported features, leading > to more people using 7.3, leading to more demand for ...
> > (FWIW, I think that any energy we might spend on a 7.3 maintainer would > be better directed at improving the upgrade story...) > <homer>mmm. in place upgrade</homer> Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
On Tue, Sep 30, 2003 at 09:37:26AM -0700, Joshua D. Drake wrote: > > The theory being that we backport "some" features and fix > any bugs that we find? I would argue _very strongly_ against backporting features. The backporting of features into the Linux kernel is an extremely good analogy in this case. Someone gets the clever idea that this or that feature from 2.1/2.3/2.5 is desperately needed in 2.0/2.2/2.4 and merrily goes about adding all sorts of new cruft to the so-called stable release. As a result, we have plenty of examples of massive filesystem corruption, modules that used to work and just plain don't any more, sudden surprise hardware incompatibilities, &c. All too frequently, releases in the "stable" series come one right atop the other. What's worse, all these additional features are bound up with the important remote-root-type patches that make it into later releases of the kernel. As a result, it's a lot of work to compile a known-safe and known-clean kernel for use on one's own machines. Patching an older release to fix critical, data-mangling bugs is one thing. But if people want the latest nifty feature backported to an old release, let 'em pay a developer to do it in their private source tree, and not force on the rest of us the job of sorting out what crucial patches we need to apply to our old, pristine source of PostgreSQL 7.3.4. If you're really going to trust your database software, you do not allow new features to be added after you have carefully tested all your applications against the system. A -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
> Eh... I could see some things, like tsearch2 or pg_autovacuum, which > AFAIK are almost if not completely compatible with 7.3, which will not > get backported. Also, fixes in some of the extra tools like psql could > be very doable; I know I had a custom psql for 7.2 that backpatched the > \timing option and some of the pager fixes. Now, whether that could be > done with stuff closer to core, I don't know... Sure, but businesses don't like to upgrade unless they have to. If we really want to attract more businesses to using PostgreSQL, then they need to feel like they don't have to upgrade every 12 months. Upgrading is expensive and it rarely goes as smoothly as a dump/restore. > > Furthermore, adding more features to 7.3.x reduces the incentive to > > upgrade to 7.4, worsening the support problem: the more people using old > > releases, the more demand there will be for backported features, leading > > to more people using 7.3, leading to more demand for ... I am considering a time-limited type of thing, not open ended. Something like 18 or 24 months (max) from release of the new version. You can't expect businesses to consider that timeframe during the development of the new release. They want to see the new release in action for a period of time. They also want time to play with the new release without sacrificing support for the previous release. > <homer>mmm. in place upgrade</homer> In reality, in-place upgrade will never work. Sure, we can build a script that will deal with PostgreSQL itself, but not user-defined data types, operators, functions, etc... Those are all things that need stable time to migrate and test. Sincerely, Joshua Drake -- Co-Founder Command Prompt, Inc. The wheel's spinning but the hamster's dead
> I would argue _very strongly_ against backporting features. For massive features, sure, but an example of a feature that works very well and easily with 7.3 is the preloading of libs. Sincerely, Joshua Drake -- Co-Founder Command Prompt, Inc. The wheel's spinning but the hamster's dead
On Wed, 2003-10-01 at 09:41, Marc G. Fournier wrote: > On Wed, 1 Oct 2003, Robert Treat wrote: > > > Several Linux distros already do this for many packages, and personally > > I've always been surprised, given PostgreSQL's major-release upgrade > > issues, that no commercial company has stepped in to offer this in the > > past. I think what Joshua is wondering is how much cooperation he would > > get from the community if he were willing to donate these efforts back > > into the project. > > Using Linux/FreeBSD/insert-OS-here as an example is like comparing apples > to oranges ... take FreeBSD as an example, since I know it ... 5.x has had > some *major* rewrites done to the kernel, getting rid of 'the Giant Lock' > that SMP in 4.x uses ... those changes are not back-patchable, since then > you'd have 5.x ... there are a lot of changes to the 5.x kernel that rely > on those changes, and are therefore not *easily* back-patchable ... > > Now, userland software is a totally different case, since it is rarely > "tied" to the kernel itself ... You missed my point: some distros (Red Hat, SuSE, Mandrake, etc.) backpatch into their distributed packages separately from the package's original source tree. This is great for folks who may want/need a new change, but can't upgrade to the latest source for some reason. > Think of PostgreSQL as the kernel, not as the distro ... how many changes > from one kernel release are easily patched into an older one, without > having to take a lot of other baggage back with it ... ? I wasn't thinking of PostgreSQL as a distro, but actually I think that view is somewhat valid, since there are enough add-ons to core that one could modify without having to make huge changes. As Tom pointed out, with the restriction of not being able to initdb, you're probably pretty limited in what you can push back, but I think there's still enough there that folks might want to look at it.
(The recent bugs in pltcl's handling of dropped columns come to mind, though maybe Tom backpatched those? Can't recall.) Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
On Wed, Oct 01, 2003 at 08:49:51AM -0700, Joshua D. Drake wrote: > > I would argue _very strongly_ against backporting features. > > For massive features sure but an example of a feature that works > very well and easily with 7.3 is the preloading of libs. Then let people patch the stable releases themselves, or pay companies to produce such mini-branches (and thereby pay the cost of the necessary testing, &c.). How does one know in advance which set of "working well and easily" features can be back ported and be sure not to break on some release of IRIX, Solaris, AIX, or SCO? Those are not platforms that get the kind of kicking that Linux and FreeBSD do, but people are still relying on the dot releases not to break anything on those platforms. I think that Postgres has a tradition that, when a release is stable, it's _stable, man_ -- a tradition that other software (commercial or not) should emulate. I'd hate to see that go overboard in an attempt to add features to the main releases. A -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
On Wed, 2003-10-01 at 11:48, Joshua D. Drake wrote: > Sure but businesses don't like to upgrade unless they have too. Granted, but maintaining old releases doesn't come at zero cost. It may benefit some users, but the relevant question is whether that benefit is worth the cost. The time someone spends backpatching changes into old releases (and thoroughly testing those changes, and fixing the regressions those changes cause) is presumably time that would otherwise be spent improving the latest release of PostgreSQL. So when the bugfix is important, has been well-tested, and is unlikely to cause regressions, backpatching the change to previous stable releases is a good idea. When this isn't the case (and even more so if it's a feature and not a bugfix), I don't think it justifies the cost (and the risk of destabilization) for most users. In summary, I think the status quo is basically okay. Perhaps we should backpatch a few more things, but we're basically in the right ballpark. > In reality in place upgrade will never work. Perhaps not, but the upgrade story can certainly be made more palatable. I think that's the actual problem here -- rather than skating around it by making it less necessary to do the upgrade in the first place, I think our time is better spent making upgrades as painless as possible. Just IMHO, of course (especially since I'm not particularly interested in doing the work on the upgrade process myself). -Neil
>With 7.4 I'm finding upgrading to be easier. I'll likely upgrade our >production servers to 7.4.0 when it comes out and wind up skipping 7.3 >altogether. > > Sure, but I'm talking about people who are running 7.3 and are happy with it. The reality is that for probably 95% of the people out there, there is no reason for 7.4. When you have an existing system that works... why upgrade? That is one of the benefits of Open Source stuff: we no longer get forced into unneeded upgrade cycles. We use PostgreSQL for everything, and I don't have any inclination to upgrade to 7.4 except that it is 7.4. I only have two customers that will see any real benefit from going to 7.4. The rest are going to stay on 7.3 because they don't want: A. The downtime B. Unknown or unexpected problems C. A brand new database D. Migration costs When you deal with the systems I do, the cost to a customer to migrate to 7.4 would be a minimum of 10,000-20,000 dollars. They start to ask why we're upgrading with those numbers. That is not to say that 7.4 is not worth it in a technical sense, but for my customers "If it ain't broke, don't fix it" is a mantra, and the reality is that 7.3 is not broken in their minds. There are limitations: pg_dump/pg_restore has some issues, there's having to reindex the database (which 7.4 doesn't fix) and vacuum (which 7.4 doesn't fix), but my customers accept them as that. Your mileage may vary, but I can only talk from my experience. Sincerely, Joshua D. Drake -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC PostgreSQL support, programming, shared hosting and dedicated hosting. +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com The most reliable support for the most reliable Open Source database.
>Maybe I've misread Joshua's intentions, but I got the impression that >this 7.3 maintainer would follow the patches list and backport patches >whenever possible. This way folks coding for 7.4/7.5 can stay focused on >that, but folks who can't upgrade to 7.4 for whatever reason can still >get some features / improvements. > > And bug fixes, but yes, that is accurate. Sincerely, Joshua Drake -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC PostgreSQL support, programming, shared hosting and dedicated hosting. +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com The most reliable support for the most reliable Open Source database.
>I don't believe anyone would work against this, nor could I imagine that >anyone would think it was "a bad idea"; I'm just curious as to how >possible it is to do ... > > For most things, probably not that possible. But for things like: simple feature enhancements (preloading of libs); fixing pl/Language bugs (and making sure they still work on 7.3); buffer overflow fixes; security problems (the fact that alter user/createuser with an encrypted password will go into the .psql_history file is horrendous); pg_dump/pg_restore enhancements -- it would entirely be possible. Sincerely, Joshua Drake -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC PostgreSQL support, programming, shared hosting and dedicated hosting. +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com The most reliable support for the most reliable Open Source database.
Joshua D. Drake wrote: > > For most things, probably not that possible. But for things like: > > simple feature enhancements (preloading of libs) How long is a piece of string? When does something stop being simple? > > fixing pl/Language bugs (and making sure they still work on 7.3) > buffer overflow fixes Everyone seems to agree that bugs should be fixed. > > security problems (the fact that alter user/createuser with an encrypted > password will go into the .psql_history file is horrendous) You can avoid this in the create case by using createuser -P instead of psql, or by using psql -c (although that might put stuff in your shell history ;-) Maybe there's a good case for an alteruser counterpart to createuser. > > pg_dump/pg_restore enhancements Which ones? If it is things known to be broken being fixed, that comes under the bug-fix category. cheers andrew
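Andrew's workaround can be made concrete with a short sketch. The user name below is made up; the flags shown (-P to prompt for the password interactively, -E to store it encrypted) are the createuser options being discussed, and the real commands stay in comments since they need a running server:

```shell
# Keeping a password out of ~/.psql_history (hypothetical user name):
#   createuser -P -E appuser
#     -P prompts for the password, so it is never typed at the psql
#     prompt and never lands in readline history.
# The psql -c variant avoids psql's history but, as Andrew notes, the
# command line itself can land in the *shell* history instead:
#   psql -c "ALTER USER appuser WITH ENCRYPTED PASSWORD 'secret'" template1
LEAK_RISK="shell history"
echo "createuser -P -E avoids psql history; psql -c may leak into $LEAK_RISK"
```

Either way, the underlying gap he points out stands: there is no prompting alteruser counterpart for changing an existing user's password.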
On Wed, 1 Oct 2003, Joshua D. Drake wrote: > Sure, but I'm talking about people who are running 7.3 and are happy with > it. The reality is that for probably 95% of the people out there, there > is no reason for 7.4. When you have an existing system that works... why > upgrade? That is one of the benefits of Open Source stuff: we no longer > get forced into unneeded upgrade cycles. Agreed, we've been on 7.2 for a while now because it just works. The regex substring introduced in 7.3 was a pretty cool feature, for instance, that makes life easy. > When you deal with the systems I do, the cost to a customer to migrate > to 7.4 would be a minimum of 10,000-20,000 dollars. They start to ask > why we're upgrading with those numbers. Then maybe they would be willing to donate some small amount each ($500 or so) to pay for backporting issues. Since mostly what I'd want on an older version would be bug / security fixes, that $500 should go a long way towards backporting. > That is not to say that 7.4 is not worth it in a technical sense, but > for my customers "If it ain't broke, don't fix it" is a mantra, and the > reality is that 7.3 is not broken in their minds. There are limitations: > pg_dump/pg_restore has some issues, there's having to reindex the > database (which 7.4 doesn't fix) and vacuum (which 7.4 doesn't fix), but > my customers accept them as that. I was under the impression that 7.4 removed the need to reindex caused by monotonically increasing index keys, no? > Your mileage may vary, but I can only talk from my experience. Yeah, I would rather have had more backporting to 7.2, because there were tons of little improvements from 7.2 to 7.3 I could have used while waiting for 7.4's improved pg_dumpall to come along. Cheers :-)
>Then maybe they would be willing to donate some small amount each ($500 or >so) to pay for backporting issues. Since mostly what I'd want on an older >version would be bug / security fixes, that $500 should go a long way >towards backporting. > > Sure. >I was under the impression that 7.4 removed the need to reindex caused by >monotonically increasing index keys, no? > > Someone else brought that up. Maybe I am misunderstanding something, but it was my understanding that 7.4 fixes a lot of the issues, but that one of them (index bloat), although improved, is not entirely fixed, and thus we would still need reindex? Tom, am I on crack? >Yeah, I would rather have had more backporting to 7.2, because there were >tons of little improvements from 7.2 to 7.3 I could have used while >waiting for 7.4's improved pg_dumpall to come along. > > Well there ya go :) Sincerely, Joshua Drake -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC PostgreSQL support, programming, shared hosting and dedicated hosting. +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com Editor-N-Chief - PostgreSQL.org - http://www.postgresql.org
On Wed, Oct 01, 2003 at 11:53:12AM -0700, Joshua D. Drake wrote: > > >Eh? In 7.4 you should not need to reindex. > > I thought Tom was saying that the index bloat was "better" in 7.4 but > that it was not gone... thus we would still need reindex, yes? The problem has been "corrected enough" for there to be no need to reindex, AFAIK. I think what Tom is concerned about is that this hasn't been tested enough with big datasets. Also, there is a little loss of index pages, but it's much less (orders of magnitude, I think) than what it was before. This is because the index won't shrink "vertically". -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "I dream about dreams about dreams", sang the nightingale under the pale moon (Sandman)
On Wed, 2003-10-01 at 15:31, Joshua D. Drake wrote: > >Then maybe they would be willing to donate some small amount each ($500 or > >so) to pay for backporting issues. Since mostly what I'd want on an older > >version would be bug / security fixes, that $500 should go a long way > >towards backporting. > > Sure. And the question I thought was being discussed (or that should be discussed) is: what is the level of interest in having this work kept in the community CVS tree vs. someone else's quasi-forked branch... Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
>And the question I thought was being discussed (or that should be >discussed) is: what is the level of interest in having this work kept in >the community CVS tree vs. someone else's quasi-forked branch... > > It is my thinking that, regardless of commercial backing, the PostgreSQL project as a whole would gain better credibility within the commercial world if we maintained releases longer. It is really irrelevant whether somebody pays me or you 500 bucks to make a patch and submit it to the tree. What is relevant, IMHO, is that the community is backing a release for longer than 12-18 months. Yes, a commercial company could just pick it up and say... hey, we will support it for X (Mammoth 7.3.4 is supported until 2005, for example), but I was looking at this more from an overall community perspective. Sincerely, Joshua D. Drake -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC PostgreSQL support, programming, shared hosting and dedicated hosting. +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com Editor-N-Chief - PostgreSQL.org - http://www.postgresql.org
"Joshua D. Drake" <jd@commandprompt.com> writes: > ... having to reindex the database (which 7.4 doesn't fix), It's supposed to fix it. What are you expecting not to be fixed? regards, tom lane
Hello, When I was reading hackers about the fixes you had made, it stated that the index bloat problems should be better. I took that as meaning that although it won't be required nearly as often, we still may need to reindex occasionally. It was later pointed out to me that this may not be the case, to wit I responded: Tom, am I on crack? Sincerely, Joshua Drake Tom Lane wrote: >"Joshua D. Drake" <jd@commandprompt.com> writes: > > >>... having to reindex the database (which 7.4 doesn't fix), >> >> > >It's supposed to fix it. What are you expecting not to be fixed? > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 4: Don't 'kill -9' the postmaster > > -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org
"Joshua D. Drake" <jd@commandprompt.com> writes: > When I was reading hackers about the fixes you had made, it stated > that the index bloat problems should be better. I took > that as meaning that although it won't be required nearly as often, we > still may need to reindex occassionaly. The critical word there is "may". The index compression code covers some cases and not others. Depending on your usage pattern you might or might not ever need to reindex. I *think* that most people won't need to reindex any more, but I'm waiting on field reports from 7.4 to find out for sure. In any case, people who aren't upgrading from 7.3 because they think 7.4 won't help them are making a self-fulfilling negative prophecy. regards, tom lane
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > I think what Tom is concerned about is that this hasn't been tested > enough with big datasets. Also there a little loss of index pages but > it's much less (orders of magnitude, I think) than what was before. > This is because the index won't shrink "vertically". The fact that we won't remove levels shouldn't be meaningful at all --- I mean, if the index was once big enough to require a dozen btree levels, and you delete everything, are you going to be upset that it drops to 13 pages rather than 2? I doubt it. The reason I'm waffling about whether the problem is completely fixed or not is that the existing code will only remove-and-recycle completely empty btree pages. As long as you have one key left on a page it will stay there. So you could end up with ridiculously low percentage-filled situations. This could be fixed by collapsing together adjacent more-than-half-empty pages, but we ran into a lot of problems trying to do that in a concurrent fashion. So I'm waiting to find out if real usage patterns have a significant issue with this or not. For example, if you have a timestamp index and you routinely clean out all entries older than N-days-ago, you won't have a problem in 7.4. If your pattern is to delete nine out of every ten entries (maybe you drop minute-by-minute entries and keep only hourly entries after awhile) then you might find the index loading getting unpleasantly low. We'll have to see whether it's a problem in practice. I'm willing to revisit the page-merging problem if it's proven to be a real practical problem, but it looked hard enough that I think it's more profitable to spend the development effort elsewhere until it's proven necessary. regards, tom lane
Robert Treat <xzilla@users.sourceforge.net> writes: > and the question as i thought was being discussed (or should be > discussed) was what is the level of interest in having this work kept in > the community cvs tree vs. someone else's quasi-forked branch... I see no reason that the maintenance shouldn't be done in the community CVS archive. The problem is where to find the people who want to do it. Of course we have to trust those people enough to give them write access to the community archive, but if they can't be trusted with that, one wonders who's going to trust their work product either. regards, tom lane
> For example, if you have a timestamp index and you routinely clean out > all entries older than N-days-ago, you won't have a problem in 7.4. > If your pattern is to delete nine out of every ten entries (maybe you > drop minute-by-minute entries and keep only hourly entries after awhile) > then you might find the index loading getting unpleasantly low. We'll > have to see whether it's a problem in practice. I'm willing to revisit > the page-merging problem if it's proven to be a real practical problem, > but it looked hard enough that I think it's more profitable to spend the > development effort elsewhere until it's proven necessary. A pattern I have on a few tables is to record daily data. After a period of time, create an entry for a week that is the sums of 7 days, after another period of time compress 4 weeks into a month. Index is on the date representing the block. It's a new insert, but would go onto the old page. Anyway, I don't have that much data (~20M rows) -- but I believe it is a real-world example of this pattern.
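[Editorial note: a rough sketch of the rollup pattern described above; the schema and names are hypothetical, not from the original post.]

```sql
-- Hypothetical schema: one row per day, keyed by block_date.
-- After a while, fold the daily rows into one monthly summary row.
-- The summary is a new insert, but it lands on an old index page,
-- so the index stays sparsely filled rather than shrinking:
INSERT INTO stats (block_date, total)
    SELECT date_trunc('month', block_date), sum(total)
      FROM stats
     WHERE block_date < CURRENT_DATE - 60
     GROUP BY date_trunc('month', block_date);

DELETE FROM stats
 WHERE block_date < CURRENT_DATE - 60
   AND block_date <> date_trunc('month', block_date);
```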
Hello, Possible scenario for maintaining 7.3: Only one or two committers using a two stage cvs... one stage for testing (not including sandbox), one stage for commit. Scheduled releases based on non-critical fixes. Quarterly? Of course critical fixes should be released as soon as possible. Separate mailing list for 7.3 issues, concerns etc... Which would help develop its own temporary community. Thoughts? Joshua D. Drake Tom Lane wrote: >Robert Treat <xzilla@users.sourceforge.net> writes: > > >>and the question as i thought was being discussed (or should be >>discussed) was what is the level of interest in having this work kept in >>the community cvs tree vs. someone else's quasi-forked branch... >> >> > >I see no reason that the maintenance shouldn't be done in the community >CVS archive. The problem is where to find the people who want to do it. >Of course we have to trust those people enough to give them write access >to the community archive, but if they can't be trusted with that, one >wonders who's going to trust their work product either. > > regards, tom lane -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org
On Thu, Oct 02, 2003 at 10:47:06 -0700, "Joshua D. Drake" <jd@commandprompt.com> wrote: > Hello, > > Possible scenario for maintaining 7.3: > > Only one or two committers using a two stage cvs... one stage for > testing (not including sandbox), one stage for commit. > Scheduled releases based on non-critical fixes. Quarterly? Of course > critical fixes should be released as soon as plausible. > > Separate mailing list for 7.3 issues, concerns etc... Which would help > develop it's own temporary community. > > Thoughts? It might be better to split into two different trees. One just gets bug fixes, the other gets bug fixes plus enhancements that won't require an initdb.
On Thu, Oct 02, 2003 at 02:15:33PM -0500, Bruno Wolff III wrote: > It might be better to split into two different trees. One just gets bug fixes, > the other gets bug fixes plus enhancements that won't require an initdb. Yes, please. Please, please do not force all users to accept new features in "stable" trees. -- ---- Andrew Sullivan 204-4141 Yonge Street Afilias Canada Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110
>Yes, please. Please, please do not force all users to accept new >features in "stable" trees. > > What if the feature does break compatibility with old features? What if it is "truly" a new feature? One example would be that we are considering reworking pg_dump/restore a bit to support batch uploads and interactive mode. It would not break compatibility with anything but would greatly enhance one's ability to actually backup and restore large volume sets. Sincerely, Joshua Drake -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC - S/JDBC Postgresql support, programming, shared hosting and dedicated hosting. +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com PostgreSQL.Org - Editor-N-Chief - http://www.postgresql.org
"Joshua D. Drake" <jd@commandprompt.com> writes: > >Yes, please. Please, please do not force all users to accept new > > features in "stable" trees. > What if the feature does break compatibility with old features? > What if it is "truly" a new feature? > > One example would be that we are considering reworking > pg_dump/restore a bit to support batch uploads and interactive mode. > It would not break compatibility with anything but would > greatly enhance one's ability to actually backup and restore > large volume sets. Well, since those are separate programs and not intimately tied to the backend, you could distribute them separately for people who need them... -Doug
On Fri, 3 Oct 2003, Joshua D. Drake wrote: > > >Yes, please. Please, please do not force all users to accept new > >features in "stable" trees. > > > > > What if the feature does break compatibility with old features? > What if it is "truly" a new feature? > > One example would be that we are considering reworking > pg_dump/restore a bit to support batch uploads and interactive mode. > It would not break compatibility with anything but would > greatly enhance one's ability to actually backup and restore > large volume sets. for stuff like this, why not just break off a gborg project for it, separate from the distros? We could pull in the changes as beta starts on a dev cycle, but then pg_dump/pg_restore could be maintained on its own release cycle, and you could easily get 'back features' in like this ...
Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > I think what Tom is concerned about is that this hasn't been tested > > enough with big datasets. Also there a little loss of index pages but > > it's much less (orders of magnitude, I think) than what was before. > > This is because the index won't shrink "vertically". > > The fact that we won't remove levels shouldn't be meaningful at all --- > I mean, if the index was once big enough to require a dozen btree > levels, and you delete everything, are you going to be upset that it > drops to 13 pages rather than 2? I doubt it. > > The reason I'm waffling about whether the problem is completely fixed or > not is that the existing code will only remove-and-recycle completely > empty btree pages. As long as you have one key left on a page it will > stay there. So you could end up with ridiculously low percentage-filled > situations. This could be fixed by collapsing together adjacent > more-than-half-empty pages, but we ran into a lot of problems trying to > do that in a concurrent fashion. So I'm waiting to find out if real > usage patterns have a significant issue with this or not. Though the new code will put empty index pages into the free-space map, will it also shrink the index file to remove those pages? For example, if I have 200M rows in a table, and I delete all of them except 100, does the index shrink, or the pages just become available for reuse. With VACUUM FULL, we have a way to shrink the heap. Do we shrink the index? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Andrew Sullivan wrote: > On Thu, Oct 02, 2003 at 02:15:33PM -0500, Bruno Wolff III wrote: > > It might be better to split into two different trees. One just gets bug fixes, > > the other gets bug fixes plus enhancements that won't require an initdb. > > Yes, please. Please, please do not force all users to accept new > features in "stable" trees. One word of warning --- PostgreSQL has grown partially because we gain people but rarely lose them, and our stable releases help that. I was talking to someone about OS/X recently and the frequent breakage in their OS releases is hurting their adoption rate --- you hit one or two buggy releases in a row, and you start thinking about using something else --- same is true for buggy Linux kernels, which Andrew described earlier. If we are going to back-patch more aggressively, we _have_ to be sure that those back-patched releases have the same quality as all our other releases. I know people already know this, but it is worth mentioning specifically --- my point is that more aggressive backpatching has risks. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Fri, 3 Oct 2003, Andrew Sullivan wrote: > On Thu, Oct 02, 2003 at 02:15:33PM -0500, Bruno Wolff III wrote: > > It might be better to split into two different trees. One just gets bug fixes, > > the other gets bug fixes plus enhancements that won't require an initdb. > > Yes, please. Please, please do not force all users to accept new > features in "stable" trees. I wanted to say something similar earlier in this thread. To me the stable branches are not for feature introduction. If features are going to be introduced it is better to not have them applied in a manner which means a pure bug fix only version can't be obtained. Obviously this means having two branches if features are going to be introduced. I agree sometimes one looks at new developments and thinks how good it would be to have that feature, imagine what it'll be like when tablespaces are introduced and you're using the previous stable version, but those features need to be kept separate from the version that fixes that particularly nasty index corruption someone only provided a fix for 12 months after the version you have based your system around was released. One could argue that what is really needed is a collection of patches providing a pick and choose facility for features, with dependencies where unavoidable of course. The patches being applicable to the latest bug patched version of the stable branch. As an example take tsearch2. If that were core code, not optional, contrib material, and one was running a 7.3 series server but wanted the nifty features of tsearch2 instead of tsearch, would you expect all people upgrading within the stable 7.3 branch for bug fixes to be forced to use tsearch2 and not tsearch? -- Nigel J. Andrews
>If we are going to back-patch more aggressively, we _have_ to be sure >that those back-patched releases have the same quality as all our other >releases. > > I know that I am probably being semantic here but I in no way want to be more aggressive with back patching. My thought for 98% of things is bugfixes within the existing tree only. Although I am sure for some things we can use (at least as a guide) code being written in 7.4. My whole purpose in bringing the idea up is to increase the adoption rate. My thought isn't to be more aggressive per se, but more responsible in our releases. Like I said, I may be being semantic. Sincerely, Joshua Drake >I know people already know this, but it is worth mentioning specifically >--- my point is that more aggressive backpatching has risks. > > > -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Though the new code will put empty index pages into the free-space map, > will it also shrink the index file to remove those pages? If there are free pages at the end, yes --- but it won't move pages around. This is about the same story as for plain VACUUM ... regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Though the new code will put empty index pages into the free-space map, > > will it also shrink the index file to remove those pages? > > If there are free pages at the end, yes --- but it won't move pages > around. This is about the same story as for plain VACUUM ... I know indexes behave the same as heap for vacuum. My point was that the vacuum full case is different. Vacuum full moves heap tuples from the end to fill slots and then frees the pages at the end via truncation. (100% compaction, guaranteed.) We can't move index tuples around like that, of course, so that leaves us with partially filled pages. Do we move empty index pages to the end before truncation during vacuum full? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Do we move empty index pages to the end before truncation during vacuum > full? No. You'd be better off using REINDEX for that, I think. IIRC we have speculated about making VAC FULL fix the indexes via REINDEX rather than indexbulkdelete. regards, tom lane
On Sat, Oct 04, 2003 at 11:41:17AM -0400, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Do we move empty index pages to the end before truncation during vacuum > > full? > > No. You'd be better off using REINDEX for that, I think. IIRC we have > speculated about making VAC FULL fix the indexes via REINDEX rather than > indexbulkdelete. I can't agree with that idea. Imagine having to VACUUM FULL a huge table. Not only will it take all the time required to do the VACUUM on the heap itself, it will also have to rebuild all indexes from scratch. I think there are scenarios where the REINDEX will be much worse, say when there are not too many deleted tuples (but in that case, why is the user doing VACUUM FULL in the first place?). Of course there are also scenarios where the opposite is true. I wonder if VACUUM FULL could choose what method to use based on some statistics. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Vivir y dejar de vivir son soluciones imaginarias. La existencia está en otra parte" (Andre Breton)
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Do we move empty index pages to the end before truncation during vacuum > > full? > > No. You'd be better off using REINDEX for that, I think. IIRC we have > speculated about making VAC FULL fix the indexes via REINDEX rather than > indexbulkdelete. I guess my point is that if you forget to run regular vacuum for a month, then realize the problem, you can just do a VACUUM FULL and the heap is back to a perfect state as if you had been running regular vacuum all along. That is not true of indexes. It would be nice if it would. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Sat, Oct 04, 2003 at 11:17:09PM -0400, Bruce Momjian wrote: > Tom Lane wrote: > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > > Do we move empty index pages to the end before truncation during vacuum > > > full? > > > > No. You'd be better off using REINDEX for that, I think. IIRC we have > > speculated about making VAC FULL fix the indexes via REINDEX rather than > > indexbulkdelete. > > I guess my point is that if you forget to run regular vacuum for a > month, then realize the problem, you can just do a VACUUM FULL and the > heap is back to a perfect state as if you had been running regular > vacuum all along. That is not true of indexes. It would be nice if it > would. In this scenario, the VACUUM FULL-does-REINDEX idea would be the perfect fit because it will probably be much faster than doing indexbulkdelete. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "Endurecerse, pero jamás perder la ternura" (E. Guevara)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > On Sat, Oct 04, 2003 at 11:41:17AM -0400, Tom Lane wrote: >> No. You'd be better off using REINDEX for that, I think. IIRC we have >> speculated about making VAC FULL fix the indexes via REINDEX rather than >> indexbulkdelete. > I can't agree with that idea. Why not? There is plenty of anecdotal evidence in the archives saying that it's faster to drop indexes, VACUUM FULL, recreate indexes than to VACUUM FULL with indexes in place. Most of those reports date from before we had the lazy-vacuum alternative, but I don't think that renders them less relevant. > Imagine having to VACUUM FULL a huge > table. Not only it will take the lot required to do the VACUUM in the > heap itself, it will also have to rebuild all indexes from scratch. A very large chunk of VACUUM FULL's runtime is spent fooling with the indexes. Have you looked at the code in any detail? It goes like this:

1. Scan heap looking for dead tuples and free space.
2. Make a pass over the indexes to delete index entries for dead tuples.
3. Copy remaining live tuples to lower-numbered pages to compact heap.
3a. Every time we copy a tuple, make new index entries pointing to its new location. (The old index entries still remain, though.)
4. Commit transaction so that new copies of moved tuples are good and old ones are not.
5. Make a pass over the indexes to delete index entries for old copies of moved tuples.

When there are only a few tuples being moved, this isn't too bad of a strategy. But when there are lots, steps 2, 3a, and 5 represent a huge amount of work. What's worse, step 3a swells the index well beyond its final size. This used to mean permanent index bloat. Nowadays step 5 will be able to recover some of that space --- but not at zero cost. I think it's entirely plausible that dropping steps 2, 3a, and 5 in favor of an index rebuild at the end could be a winner.
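[Editorial note: the drop/rebuild approach the anecdotal reports describe can already be done by hand. A sketch with hypothetical table and index names:]

```sql
-- VACUUM cannot run inside a transaction block, so the index is
-- briefly absent; queries during this window lose the index:
DROP INDEX i_big_stamp;
VACUUM FULL big;            -- pure heap compaction, no index fooling
CREATE INDEX i_big_stamp ON big (stamp);
```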
> I think there are scenarios where the REINDEX will be much worse, say when > there are not too many deleted tuples (but in that case, why is the user > doing VACUUM FULL in the first place?). Yeah, I think that's exactly the important point. These days there's not a lot of reason to do VACUUM FULL unless you have a major amount of restructuring to do. I would once have favored maintaining two code paths with two strategies, but now I doubt it's worth the trouble. (Or I should say, we have two code paths, the other being lazy VACUUM --- do we need three?) regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> No. You'd be better off using REINDEX for that, I think. > I guess my point is that if you forget to run regular vacuum for a > month, then realize the problem, you can just do a VACUUM FULL and the > heap is back to a perfect state as if you had been running regular > vacuum all along. That is not true of indexes. It would be nice if it > would. A VACUUM FULL that invoked REINDEX would accomplish that *better* than one that didn't, because of the problem of duplicate entries for moved tuples. See my response just now to Alvaro. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> No. You'd be better off using REINDEX for that, I think. > > > I guess my point is that if you forget to run regular vacuum for a > > month, then realize the problem, you can just do a VACUUM FULL and the > > heap is back to a perfect state as if you had been running regular > > vacuum all along. That is not true of indexes. It would be nice if it > > would. > > A VACUUM FULL that invoked REINDEX would accomplish that *better* than > one that didn't, because of the problem of duplicate entries for moved > tuples. See my response just now to Alvaro. Right, REINDEX is closer to what you expect VACUUM FULL to be doing --- it mimicks the heap result of full compaction. I think Alvaro's point is that if you are doing VACUUM FULL on a large table with only a few expired tuples, the REINDEX could take a while, which would seem strange considering you only have a few expired tuples --- maybe we should reindex only if +10% of the heap rows are expired, or the index contains +10% empty space, or something like that. Of course, that is very arbitrary, but only VACUUM knows how many rows it is moving --- the user typically will not know that. In an extreme case with always REINDEX, I can imagine a site that is doing only VACUUM FULL at night, but no regular vacuums, and they find they can't do VACUUM FULL at night anymore because it is taking too long. By doing REINDEX always, we alienate some folks who are happy doing VACUUM FULL at night, because very few tuples are expired. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > By doing REINDEX always, we alienate some folks who are happy > doing VACUUM FULL at night, because very few tuples are expired. But if they have very few tuples expired, why do they need VACUUM FULL? Seems to me that VACUUM FULL should be designed to cater to the case of significant updates. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > By doing REINDEX always, we eliminate some folks are are happy > > doing VACUUM FULL at night, because very few tuples are expired. > > But if they have very few tuples expired, why do they need VACUUM FULL? > Seems to me that VACUUM FULL should be designed to cater to the case > of significant updates. Right, they could just run vacuum, and my 10% idea was bad because the vacuum full would take an unpredictable amount of time to run depending on whether it does a reindex. One idea would be to allow VACUUM, VACUUM DATA (no reindex), and VACUUM FULL (reindex). However, as you said, we might not need VACUUM DATA --- I am just not sure. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Sat, Oct 04, 2003 at 11:53:49PM -0400, Tom Lane wrote: > Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > Imagine having to VACUUM FULL a huge > > table. Not only it will take the lot required to do the VACUUM in the > > heap itself, it will also have to rebuild all indexes from scratch. > > A very large chunk of VACUUM FULL's runtime is spent fooling with the > indexes. Have you looked at the code in any detail? It goes like this: Hmm. No, I haven't looked at that code too much. You are probably right, of course. Maybe the indexes could be dropped altogether and then recreated after the vacuum is over, similar to what the cluster code does. This would be similar to REINDEX, I suppose. (I haven't actually looked at the REINDEX code either.) > > I think there are scenarios where the REINDEX will be much worse, say when > > there are not too many deleted tuples (but in that case, why is the user > > doing VACUUM FULL in the first place?). > > Yeah, I think that's exactly the important point. These days there's > not a lot of reason to do VACUUM FULL unless you have a major amount of > restructuring to do. I would once have favored maintaining two code > paths with two strategies, but now I doubt it's worth the trouble. > (Or I should say, we have two code paths, the other being lazy VACUUM > --- do we need three?) There are two points that could be made here: 1. We do not want users having to think too hard about what kind of VACUUM they want. This probably botches Bruce's idea of an additional VACUUM DATA command. 2. We do not want to expose the VACUUM command family at all. The decisions about what code paths should be taken are best left to the backend-integrated vacuum daemon, which has probably much better information than users. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "You knock on that door or the sun will be shining on places inside you that the sun doesn't usually shine" (en Death: "The High Cost of Living")
Alvaro Herrera wrote: > > Yeah, I think that's exactly the important point. These days there's > > not a lot of reason to do VACUUM FULL unless you have a major amount of > > restructuring to do. I would once have favored maintaining two code > > paths with two strategies, but now I doubt it's worth the trouble. > > (Or I should say, we have two code paths, the other being lazy VACUUM > > --- do we need three?) > > There are two points that could be made here: > > 1. We do not want users having to think too hard about what kind of > VACUUM they want. This probably botches Bruce's idea of an additional > VACUUM DATA command. > > 2. We do not want to expose the VACUUM command family at all. The > decisions about what code paths should be taken are best left to the > backend-integrated vacuum daemon, which has probably much better > information than users. Agreed. We need to head in a direction where vacuum is automatic. I guess the question is whether an automatic method would ever use VACUUM DATA? I just did a simple test. I did:

test=> CREATE TABLE test (x INT, y TEXT);
CREATE TABLE
test=> INSERT INTO test VALUES (1,'lk;jasdflkjlkjawsiopfjqwerfokjasdflkj');
INSERT 17147 1
test=> INSERT INTO test SELECT * FROM test;
{ repeat until 65k rows are inserted, so there are 131k rows }
test=> INSERT INTO test SELECT 2, y FROM test;
INSERT 0 131072
test=> DELETE FROM test WHERE x=1;
DELETE 131072
test=> \timing
Timing is on.
test=> VACUUM FULL;
VACUUM
Time: 4661.82 ms
test=> INSERT INTO test SELECT 3, y FROM test;
INSERT 0 131072
Time: 7925.57 ms
test=> CREATE INDEX i ON test (x);
CREATE INDEX
Time: 3337.96 ms
test=> DELETE FROM test WHERE x=2;
DELETE 131072
Time: 3204.18 ms
test=> VACUUM FULL;
VACUUM
Time: 10523.69 ms
test=> REINDEX TABLE test;
REINDEX
Time: 2193.14 ms

Now, as I understand it, this is the worst-case for VACUUM FULL. What we have here is 4661.82 for VACUUM FULL without an index, and 10523.69 for VACUUM FULL with an index, and REINDEX takes 2193.14.
If we assume VACUUM FULL with REINDEX will equal the time of VACUUM without the index plus the REINDEX time, we have 4661.82 + 2193.14, or 6854.96 vs. 10523.69, so clearly VACUUM REINDEX is a win for this case. What I don't know is what percentage of a table has to be expired for REINDEX to be a win. I assume if only one row is expired, you get 4661.82 + 2193.14 vs. just 4661.82, roughly. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Fri, Oct 03, 2003 at 09:17:16AM -0700, Joshua D. Drake wrote: > > > What if the feature does break compatibility with old features? > What if it is "truly" a new feature? There is _no_ mechanism in the community right now for testing all these new features in the so-called stable tree. I have lately been taking the position that Linux is only a second-best choice for production use, precisely because of the constant introduction of shiny new features in the supposed stable branch. Without using something like RHAS or Debian stable, I think one is asking for trouble. One needs to do a great deal of testing on any new kernel release -- even a dot release -- just to be reasonably confident that it won't eat filesystems, introduce some new incompatibility, &c. From my point of view, this is similar to the position one is in with Windows: you need to quintuple-check every security patch and hot fix, because it is as likely as not to break something very badly. One of the things I have always liked about PostgreSQL is that a stable release really is stable. Except for mighty serious, low risk items, nothing gets backported to the stable releases. If _you_ want to back port things, go nuts: the source is there. But the main release does not get changed that way. That's a good thing. I happen to oversee one of those installations you mentioned, where it costs us lots of money to upgrade. If people start adding features to the stable tree, it will cost me almost as much to keep up with the small, important, must-be-applied fixes as it would to upgrade. Because I know those features won't receive the testing they really need, I'll have no choice but to hammer on them all myself. In the current situation, those happen infrequently enough that I can do it. But if one starts introducing all sorts of extra features, I'll have to test _all_ of it. Or start maintaining a completely separate tree into which I put only the few patches I want.
Why should everyone pay that cost for the sake of those people who want to eat their cake and have it too? If you want the new features, you gotta pay the cost of the upgrade, or pay someone else to support the new features for you.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                        +1 416 646 3304 x110
Andrew Sullivan wrote:
> On Fri, Oct 03, 2003 at 09:17:16AM -0700, Joshua D. Drake wrote:
> >
> > What if the feature does break compatibility with old features?
> > What if it is "truly" a new feature?
>
> There is _no_ mechanism in the community right now for testing all
> these new features in the so-called stable tree.
>
> I have lately been taking the position that Linux is only a
> second-best choice for production use, precisely because of the
> constant introduction of shiny new features in the supposed stable
> branch. Without using something like RHAS or Debian stable, I think
> one is asking for trouble. One needs to do a great deal of testing

Agreed. Great Bridge was going to test our releases and only distribute the good ones --- obviously they were thinking of Linux kernels and not PostgreSQL. You almost need a commercial company to do testing with Linux kernels. PostgreSQL doesn't require this, and I think Linux is popular _in_ _spite_ of their buggy backported kernels (odd numbers?), not because of it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian wrote:

>Andrew Sullivan wrote:
>
>>On Fri, Oct 03, 2003 at 09:17:16AM -0700, Joshua D. Drake wrote:
>>
>>>What if the feature does break compatibility with old features?
>>>What if it is "truly" a new feature?
>>>
>>There is _no_ mechanism in the community right now for testing all
>>these new features in the so-called stable tree.
>>
>>I have lately been taking the position that Linux is only a
>>second-best choice for production use, precisely because of the
>>constant introduction of shiny new features in the supposed stable
>>branch. Without using something like RHAS or Debian stable, I think
>>one is asking for trouble. One needs to do a great deal of testing
>
>Agreed. Great Bridge was going to test our releases and only distribute
>the good ones --- obviously they were thinking of Linux kernels and not
>PostgreSQL. You almost need a commercial company to do testing with
>Linux kernels. PostgreSQL doesn't require this, and I think Linux is
>popular _in_ _spite_ of their buggy backported kernels (odd numbers?),
>not because of it.

The reason there is a lot of backporting in Linux kernels is that there is such a lot of time (2 years or more) between major kernel releases. This is not surprising given the kernel's complexity, but it is not the case here, with releases every 6 months or so.

In general I agree that only true bug fixes should go into later versions of official releases after they are out - if anyone wants to back-patch features they can, but then they wear the risk. Do it on GBorg if you like, but not in the main tree.

cheers

andrew
Andrew Dunstan wrote:
> >Agreed. Great Bridge was going to test our releases and only distribute
> >the good ones --- obviously they were thinking of Linux kernels and not
> >PostgreSQL. You almost need a commercial company to do testing with
> >Linux kernels. PostgreSQL doesn't require this, and I think Linux is
> >popular _in_ _spite_ of their buggy backported kernels (odd numbers?),
> >not because of it.
>
> The reason there is a lot of backporting in Linux kernels is that there
> is such a lot of time (2 years or more) between major kernel releases.
> This is not surprising given the kernel's complexity, but it is not the
> case here, with releases every 6 months or so.

But the kernel goes through this reliable/unreliable cycle --- they would be better off just making the old kernel more and more reliable and focusing on the new kernel for features.

The reliable/unreliable cycle will kill your user base.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Joshua D. Drake wrote:
> > But the kernel goes through this reliable/unreliable cycle --- they
> > would be better off just making the old kernel more and more reliable
> > and focusing on the new kernel for features.
> >
> > The reliable/unreliable cycle will kill your user base.
>
> The popularity of Linux would argue that statement a great deal.

Fine, let it argue. I said _in_ _spite_ of the backpatching, not because of it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Hello,

O.k. so everyone is basically in agreement on "no new features" to be backported. How do we implement a stable-release maintainer for back releases? I assume we set a scope of what would go in: security/bug fixes only?

Sincerely,

Joshua Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC - S/JDBC
Postgresql support, programming, shared hosting and dedicated hosting.
+1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL.Org - Editor-N-Chief - http://www.postgresql.org
> But the kernel goes through this reliable/unreliable cycle --- they
> would be better off just making the old kernel more and more reliable
> and focusing on the new kernel for features.
>
> The reliable/unreliable cycle will kill your user base.

The popularity of Linux would argue with that statement a great deal.

Sincerely,

Joshua Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC - S/JDBC
Postgresql support, programming, shared hosting and dedicated hosting.
+1-503-222-2783 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL.Org - Editor-N-Chief - http://www.postgresql.org
Joshua D. Drake wrote:
>> eh.. i could see some things, like tsearch2 or pg_autovacuum, which
>> afaik are almost if not completely compatible with 7.3, which will not
>> get back ported. Also fixes in some of the extra tools like psql could
>> be very doable, I know I had a custom psql for 7.2 that back patched the
>> \timing option and some of the pager fixes. now, whether that could be
>> done with stuff closer to core, i don't know...
>
> Sure but businesses don't like to upgrade unless they have to. If we
> really want to attract more business to using PostgreSQL then they need
> to feel like they don't have to upgrade every 12 months. Upgrading is
> expensive and it rarely goes as smoothly as a dump/restore.

I have made the following experience: if a new application is deployed and it stays unchanged, 99% of all bugs in the database or the software itself will be found within a comparatively short amount of time. If a business partner decides to continue to work on his application (which means changing it), he will accept new PostgreSQL releases.

Up to now, upgrading PostgreSQL has never been a problem because we have been able to expect major releases to be stable. In addition to that, dump/restore worked nicely. I remember having slightly more work when we switched to 7.3 because somehow type casts are handled differently (fewer implicit casts - I think that was the problem), but for that purpose intelligent customers have testing environments, so that nothing evil can happen on the production system.

I don't think back-porting features is a good idea. As Marc said: PostgreSQL is the kernel and not an ordinary package. Personally I think that a database product should always be a rock-solid product. Unlike applications such as, let's say, xclock, databases are truly critical, and customers won't forget about releases eating data. However, in my opinion they can understand that maintenance is necessary.
> When you deal with the systems I do, the cost to a customer to migrate
> to 7.4 would be in the minimum of 10,000-20,000 dollars.
> They start to ask why we're upgrading with those numbers.

What did you do to cause these costs????? We have several huge and critical customers as well, but none of them would incur costs like that. If everything works nicely: why would you change the release anyway? Why would you back-port new features if you don't accept downtimes? If something has been working for months, there are not that many bugs you can expect. In case of disaster there are still options to fix bugs. That's what the commercial guys are here for. Fortunately we haven't ever seen a situation in which something really severe has been broken. Buffer overflows: usually this kind of bug can be fixed within just a few lines.

I have been working with PostgreSQL for 4 years now. All together I have encountered 3-4 bugs which caused me some headache and which I hadn't known about. I guess 1 per year is more than acceptable.

Regards,

Hans

--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at
> I have lately been taking the position that Linux is only a
> second-best choice for production use, precisely because of the
> constant introduction of shiny new features in the supposed stable
> branch.

That's what all us FreeBSD users learnt a long time ago :P

Chris
Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > I think what Tom is concerned about is that this hasn't been tested
> > enough with big datasets. Also there is a little loss of index pages, but
> > it's much less (orders of magnitude, I think) than what was before.
> > This is because the index won't shrink "vertically".
>
> The fact that we won't remove levels shouldn't be meaningful at all ---
> I mean, if the index was once big enough to require a dozen btree
> levels, and you delete everything, are you going to be upset that it
> drops to 13 pages rather than 2? I doubt it.
>
> The reason I'm waffling about whether the problem is completely fixed or
> not is that the existing code will only remove-and-recycle completely
> empty btree pages. As long as you have one key left on a page it will
> stay there. So you could end up with ridiculously low percentage-filled
> situations. This could be fixed by collapsing together adjacent
> more-than-half-empty pages, but we ran into a lot of problems trying to
> do that in a concurrent fashion. So I'm waiting to find out if real
> usage patterns have a significant issue with this or not.

If we have an exclusive lock during VACUUM FULL, should we just collapse the pages rather than REINDEX? I realize we might have lots of expired index tuples because VACUUM FULL creates new ones as part of reorganizing the heap.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
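[Editorial sketch: Tom's point about percentage-filled can be illustrated with a toy model -- entirely hypothetical code, nothing like the real btree implementation. Recycling reclaims only completely empty leaf pages, so an index left with one key per page keeps its full page count until something repacks it the way REINDEX does.]

```python
PAGE_CAPACITY = 4  # keys per leaf page in this toy model

def recycle_empty_pages(pages):
    # Current behavior: a page survives if it still holds any key at all.
    return [page for page in pages if page]

def rebuild(pages):
    # REINDEX-style rebuild: repack all surviving keys densely.
    keys = [key for page in pages for key in page]
    return [keys[i:i + PAGE_CAPACITY] for i in range(0, len(keys), PAGE_CAPACITY)]

# 100 full leaf pages, then delete every key but one on each page:
pages = [list(range(i * PAGE_CAPACITY, (i + 1) * PAGE_CAPACITY)) for i in range(100)]
pages = [page[:1] for page in pages]

print(len(recycle_empty_pages(pages)))  # 100 pages: nothing is reclaimable
print(len(rebuild(pages)))              # 25 pages: repacked densely
```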
Bruce Momjian wrote:
> > The reason I'm waffling about whether the problem is completely fixed or
> > not is that the existing code will only remove-and-recycle completely
> > empty btree pages. As long as you have one key left on a page it will
> > stay there. So you could end up with ridiculously low percentage-filled
> > situations. This could be fixed by collapsing together adjacent
> > more-than-half-empty pages, but we ran into a lot of problems trying to
> > do that in a concurrent fashion. So I'm waiting to find out if real
> > usage patterns have a significant issue with this or not.
>
> If we have an exclusive lock during VACUUM FULL, should we just collapse
> the pages rather than REINDEX? I realize we might have lots of expired
> index tuples because VACUUM FULL creates new ones as part of
> reorganizing the heap.

Never mind --- I remember now that we are going to use VACUUM for a few updates, and VACUUM FULL for big updates.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Added to TODO:

* Have VACUUM FULL use REINDEX rather than index vacuum

---------------------------------------------------------------------------

Alvaro Herrera wrote:
> On Sat, Oct 04, 2003 at 11:53:49PM -0400, Tom Lane wrote:
> > Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > > Imagine having to VACUUM FULL a huge
> > > table. Not only will it take the time required to do the VACUUM in the
> > > heap itself, it will also have to rebuild all indexes from scratch.
> >
> > A very large chunk of VACUUM FULL's runtime is spent fooling with the
> > indexes. Have you looked at the code in any detail? It goes like this:
>
> Hmm. No, I haven't looked at that code too much. You are probably
> right, of course. Maybe the indexes could be dropped altogether and
> then recreated after the vacuum is over, similar to what the cluster
> code does. This would be similar to REINDEX, I suppose. (I haven't
> actually looked at the REINDEX code either.)
>
> > > I think there are scenarios where the REINDEX will be much worse, say when
> > > there are not too many deleted tuples (but in that case, why is the user
> > > doing VACUUM FULL in the first place?).
> >
> > Yeah, I think that's exactly the important point. These days there's
> > not a lot of reason to do VACUUM FULL unless you have a major amount of
> > restructuring to do. I would once have favored maintaining two code
> > paths with two strategies, but now I doubt it's worth the trouble.
> > (Or I should say, we have two code paths, the other being lazy VACUUM
> > --- do we need three?)
>
> There are two points that could be made here:
>
> 1. We do not want users having to think too hard about what kind of
> VACUUM they want. This probably botches Bruce's idea of an additional
> VACUUM DATA command.
>
> 2. We do not want to expose the VACUUM command family at all.
> The decisions about what code paths should be taken are best left to the
> backend-integrated vacuum daemon, which probably has much better
> information than users.
>
> --
> Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
> "You knock on that door or the sun will be shining on places inside you
> that the sun doesn't usually shine" (en Death: "The High Cost of Living")

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
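[Editorial sketch: Alvaro's point that strategy selection belongs in a backend-integrated vacuum daemon rather than in the user's hands might look like the following decision sketch. The function name and thresholds are invented purely for illustration; they are not PostgreSQL code or actual tuning values.]

```python
def choose_maintenance(dead_fraction, table_is_hot):
    """Illustrative policy only: pick a maintenance action the way a
    backend-integrated vacuum daemon might, so users never have to."""
    if dead_fraction < 0.05:
        return "nothing"            # not worth touching yet
    if dead_fraction < 0.30 or table_is_hot:
        return "lazy VACUUM"        # concurrent and cheap, no exclusive lock
    return "VACUUM FULL + REINDEX"  # major restructuring: rebuild indexes too

print(choose_maintenance(0.02, False))  # nothing
print(choose_maintenance(0.15, True))   # lazy VACUUM
print(choose_maintenance(0.60, False))  # VACUUM FULL + REINDEX
```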