Thread: VACUUM, 24/7 availability and 7.2
I'm doing some work for a smallish company which conducts its business largely online. Currently they have a legacy mishmash of Oracle and MySQL databases which they wish to unify one one platform (RDBMS with client access via browser and custom serverside applications for employees and customers). PostgreSQL would be my primary candidate. However the company's operating requirments mean that the data needed for interaction with customers / website users must be available on a 24/7 basis. This is primarily a) data related to product ordering and tables for storing order data; and b) website user authentication and personalisation data (logins, user preferences etc). It is therefore not an option to have these databases offline at regular intervals for any significant length of time for VACUUMing. Replicating data to say MySQL databases is technically feasible, at least in the case of b) above, but not desirable. Are there any existing "native" PostgreSQL solutions to this problem? More importantly, what is the situation on VACUUM for release 7.2? It seems from the pgsql-hackers list that there are plans for a none-exclusively locking VACUUM, e.g.: http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us (sorry about the long URL); how far advanced are they, and is there any kind of release schedule for 7.2? Any answers (or pointers thereto, haven't found any myself :-() much appreciated Ian Barwick -- Remove SUNGLASSES to reply ;-)
Ian Barwick <SUNGLASSESbarwick@gmx.net> writes: > PostgreSQL would be my primary candidate. However the company's > operating requirments mean that the data needed for interaction > with customers / website users must be available on a 24/7 basis. > This is primarily a) data related to product ordering and > tables for storing order data; and b) website user authentication > and personalisation data (logins, user preferences etc). > > It is therefore not an option to have these databases offline > at regular intervals for any significant length of time for > VACUUMing. Replicating data to say MySQL databases is > technically feasible, at least in the case of b) above, but > not desirable. Are there any existing "native" PostgreSQL solutions > to this problem? VACUUM doesn't require taking the DB offline. It does lock tables while it's vacuuming them, which may or may not cause problems depending on how long the vacuum takes. You seem to understand this; I'm just correcting your terminology. ;) You can reduce the time taken by VACUUM by running it more often, which may make it a non-issue for you. Depends entirely on table size and user activity. > More importantly, what is the situation on VACUUM for release 7.2? > It seems from the pgsql-hackers list that there are plans for > a none-exclusively locking VACUUM, e.g.: As I understand it: * Non-intrusive VACUUM will be in 7.2 * 7.2 is about to enter beta (this week hopefully) I would say it's likely that beta will last two or three months based on my experience with the 7.1 beta period. -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
Just to keep things in perspective, how large are your current databases, and what do you or the company consider to be a signficant length of time? Right now I have a development database with just a few thousand records of test data, and vacuum takes just a very few seconds a day. I think I recall hearing on this list of it taking a minute or three for databases several gigabytes in size. For some sites this would be tolerable, for others it wouldn't. I'm also interested to hear what the future holds for vacuum. If nothing else, it couldn't hurt postgresql's public relations. :-) --Wes Sheldahl Ian Barwick <SUNGLASSESbarwick%gmx.net@interlock.lexmark.com> on 10/10/2001 07:27:56 AM To: pgsql-general%postgresql.org@interlock.lexmark.com cc: (bcc: Wesley Sheldahl/Lex/Lexmark) Subject: [GENERAL] VACUUM, 24/7 availability and 7.2 I'm doing some work for a smallish company which conducts its business largely online. Currently they have a legacy mishmash of Oracle and MySQL databases which they wish to unify one one platform (RDBMS with client access via browser and custom serverside applications for employees and customers). PostgreSQL would be my primary candidate. However the company's operating requirments mean that the data needed for interaction with customers / website users must be available on a 24/7 basis. This is primarily a) data related to product ordering and tables for storing order data; and b) website user authentication and personalisation data (logins, user preferences etc). It is therefore not an option to have these databases offline at regular intervals for any significant length of time for VACUUMing. Replicating data to say MySQL databases is technically feasible, at least in the case of b) above, but not desirable. Are there any existing "native" PostgreSQL solutions to this problem? More importantly, what is the situation on VACUUM for release 7.2? It seems from the pgsql-hackers list that there are plans for a none-exclusively locking VACUUM, e.g.: http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us (sorry about the long URL); how far advanced are they, and is there any kind of release schedule for 7.2? Any answers (or pointers thereto, haven't found any myself :-() much appreciated Ian Barwick -- Remove SUNGLASSES to reply ;-) ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org
On Wed, 10 Oct 2001 wsheldah@lexmark.com wrote: > Just to keep things in perspective, how large are your current databases, and > what do you or the company consider to be a signficant length of time? Right > now I have a development database with just a few thousand records of test data, > and vacuum takes just a very few seconds a day. I think I recall hearing on > this list of it taking a minute or three for databases several gigabytes in > size. For some sites this would be tolerable, for others it wouldn't. The runtime seems to be dependent on what hind of activity you do. If you frequently rollback inserts, vacuum seems to take longer. Say, a database with 1m records which endures 5m inserts + rollback is going to take a while to vacuum. Just my informal observations, not based on structured testing. -jwb
> The runtime seems to be dependent on what hind of activity you do. If you > frequently rollback inserts, vacuum seems to take longer. Say, a database > with 1m records which endures 5m inserts + rollback is going to take a > while to vacuum. As another data point, daily vacuums on our database take 25 to 30 minutes. This is on a dual PIII 933Mhz with 2GB RAM and the database is on a 4 Drive RAID 10 array (stripe + mirror) of 15K RPM SCSI drives. Our database is 9GB. I don't really have a good estimate as to the number of insertions and updates that occur daily. I have some mass updates that I hold until the weekend. PostgreSQL is 7.03. I see vacuum downtime as the number 1 drawback to PostgreSQL.
On Wed, Oct 10, 2001 at 04:22:07PM -0400, wsheldah@lexmark.com wrote: > > > Just to keep things in perspective, how large are your current databases, and > what do you or the company consider to be a signficant length of time? Right > now I have a development database with just a few thousand records of test data, > and vacuum takes just a very few seconds a day. I think I recall hearing on > this list of it taking a minute or three for databases several gigabytes in > size. For some sites this would be tolerable, for others it wouldn't. We are having some trouble with some tables in which we have lots of update's (and insert/delete's). "A lot" being several thousands per day (I haven't measured the exact numbers recently). VACUUM is running twice a day and locks these tables a long time where 10-15 minutes is not exceptional. This table has only approx 100k records, but these are updated very often and that seems to cause to much rubbish for vacuum > > I'm also interested to hear what the future holds for vacuum. If nothing else, > it couldn't hurt postgresql's public relations. :-) indeed. we are currently considering moving the guilty tables over til MySQL as we're not using any advanced features in it, while keeping all other data in PostgreSQL. Anything that keeps us from doing that would be good PR :) /erwin > > --Wes Sheldahl > > > > > > Ian Barwick <SUNGLASSESbarwick%gmx.net@interlock.lexmark.com> on 10/10/2001 > 07:27:56 AM > > To: pgsql-general%postgresql.org@interlock.lexmark.com > cc: (bcc: Wesley Sheldahl/Lex/Lexmark) > Subject: [GENERAL] VACUUM, 24/7 availability and 7.2 > > > I'm doing some work for a smallish company which conducts > its business largely online. Currently they have a legacy > mishmash of Oracle and MySQL databases which they wish > to unify one one platform (RDBMS with client access via > browser and custom serverside applications for employees > and customers). > > PostgreSQL would be my primary candidate. However the company's > operating requirments mean that the data needed for interaction > with customers / website users must be available on a 24/7 basis. > This is primarily a) data related to product ordering and > tables for storing order data; and b) website user authentication > and personalisation data (logins, user preferences etc). > > It is therefore not an option to have these databases offline > at regular intervals for any significant length of time for > VACUUMing. Replicating data to say MySQL databases is > technically feasible, at least in the case of b) above, but > not desirable. Are there any existing "native" PostgreSQL solutions > to this problem? > > More importantly, what is the situation on VACUUM for release 7.2? > It seems from the pgsql-hackers list that there are plans for > a none-exclusively locking VACUUM, e.g.: > > http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us > > > (sorry about the long URL); how far advanced are they, and is > there any kind of release schedule for 7.2? > > Any answers (or pointers thereto, haven't found any myself :-() > much appreciated > > > Ian Barwick > > -- > > Remove SUNGLASSES to reply ;-) > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Erwin Lansing -- http://droso.org I love deadlines. -- Douglas Adams I love the whooshing sound they make as the fly by.
On Wed, Oct 10, 2001 at 04:22:07PM -0400, wsheldah@lexmark.com wrote: > > > Just to keep things in perspective, how large are your current databases, and > what do you or the company consider to be a signficant length of time? Right > now I have a development database with just a few thousand records of test data, > and vacuum takes just a very few seconds a day. I think I recall hearing on > this list of it taking a minute or three for databases several gigabytes in > size. For some sites this would be tolerable, for others it wouldn't. We are having some trouble with some tables in which we have lots of update's (and insert/delete's). "A lot" being several thousands per day (I haven't measured the exact numbers recently). VACUUM is running twice a day and locks these tables a long time where 10-15 minutes is not exceptional. This table has only approx 100k records, but these are updated very often and that seems to cause to much rubbish for vacuum > > I'm also interested to hear what the future holds for vacuum. If nothing else, > it couldn't hurt postgresql's public relations. :-) indeed. we are currently considering moving the guilty tables over til MySQL as we're not using any advanced features in it, while keeping all other data in PostgreSQL. Anything that keeps us from doing that would be good PR :) /erwin > > --Wes Sheldahl > > > > > > Ian Barwick <SUNGLASSESbarwick%gmx.net@interlock.lexmark.com> on 10/10/2001 > 07:27:56 AM > > To: pgsql-general%postgresql.org@interlock.lexmark.com > cc: (bcc: Wesley Sheldahl/Lex/Lexmark) > Subject: [GENERAL] VACUUM, 24/7 availability and 7.2 > > > I'm doing some work for a smallish company which conducts > its business largely online. Currently they have a legacy > mishmash of Oracle and MySQL databases which they wish > to unify one one platform (RDBMS with client access via > browser and custom serverside applications for employees > and customers). > > PostgreSQL would be my primary candidate. However the company's > operating requirments mean that the data needed for interaction > with customers / website users must be available on a 24/7 basis. > This is primarily a) data related to product ordering and > tables for storing order data; and b) website user authentication > and personalisation data (logins, user preferences etc). > > It is therefore not an option to have these databases offline > at regular intervals for any significant length of time for > VACUUMing. Replicating data to say MySQL databases is > technically feasible, at least in the case of b) above, but > not desirable. Are there any existing "native" PostgreSQL solutions > to this problem? > > More importantly, what is the situation on VACUUM for release 7.2? > It seems from the pgsql-hackers list that there are plans for > a none-exclusively locking VACUUM, e.g.: > > http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us > > > (sorry about the long URL); how far advanced are they, and is > there any kind of release schedule for 7.2? > > Any answers (or pointers thereto, haven't found any myself :-() > much appreciated > > > Ian Barwick > > -- > > Remove SUNGLASSES to reply ;-) > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > > > > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Erwin Lansing -- http://droso.org I love deadlines. -- Douglas Adams I love the whooshing sound they make as the fly by.
>> More importantly, what is the situation on VACUUM for release 7.2? >> It seems from the pgsql-hackers list that there are plans for >> a none-exclusively locking VACUUM, e.g.: >> >> http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us >> >> (sorry about the long URL); how far advanced are they, It's long since done. >> and is there any kind of release schedule for 7.2? Beta release is ... um ... any day now. I think we're just waiting on Lockhart to say he's done tweaking the datetime datatypes. regards, tom lane
Erwin Lansing <erwin@lansing.dk> writes: > We are having some trouble with some tables in which we have lots of > update's (and insert/delete's). "A lot" being several thousands per day > (I haven't measured the exact numbers recently). VACUUM is running twice > a day and locks these tables a long time where 10-15 minutes is not > exceptional. This table has only approx 100k records, but these are > updated very often and that seems to cause to much rubbish for vacuum You might try running VACUUM much more often, as it'll run faster with less work to do. If running it every hour takes the locked time down to 30 seconds, you might find that acceptable (depending on the application of course). Worth a try until 7.2 is solid... -Doug -- Let us cross over the river, and rest under the shade of the trees. --T. J. Jackson, 1863
> >> More importantly, what is the situation on VACUUM for release 7.2? > >> It seems from the pgsql-hackers list that there are plans for > >> a none-exclusively locking VACUUM, e.g.: > >> > >> > http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us > >> > >> (sorry about the long URL); how far advanced are they, > >It's long since done. ==>> This means that it will not be included in 7.2? I've read 7.2 documentation on line and i've seen that the VACUUM command is changed: now, when run in normal mode (giving to the backend the VACUUM command without any parameter), the tables don't need to be locked and also that the command does not minimize the space of the database (as instead the actual 7.1.3 VACUUM does). From the documentation: ----- Plain VACUUM (without FULL) simply reclaims space and makes it available for re-use. This form of the command can operate in parallel with normal reading and writing of the table. VACUUM FULL does more extensive processing, including moving of tuples across blocks to try to compact the table to the minimum number of disk blocks. This form is much slower and requires an exclusive lock on each table while it is being processed. ----- This way to do is similar (but not equal) to the LAZY VACUUM specified by Tom Lane in the above link... In conclusion, the new VACUUM command as described above will be include in the 7.2 version of Postgresql? Denis Gasparin: denis@edistar.com --------------------------- Programmer & System Administrator - Edistar srl
> >> More importantly, what is the situation on VACUUM for release 7.2? > >> It seems from the pgsql-hackers list that there are plans for > >> a none-exclusively locking VACUUM, e.g.: > >> > >> > http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us > >> > >> (sorry about the long URL); how far advanced are they, > >It's long since done. ==>> This means that it will not be included in 7.2? I've read 7.2 documentation on line and i've seen that the VACUUM command is changed: now, when run in normal mode (giving to the backend the VACUUM command without any parameter), the tables don't need to be locked and also that the command does not minimize the space of the database (as instead the actual 7.1.3 VACUUM does). From the documentation: ----- Plain VACUUM (without FULL) simply reclaims space and makes it available for re-use. This form of the command can operate in parallel with normal reading and writing of the table. VACUUM FULL does more extensive processing, including moving of tuples across blocks to try to compact the table to the minimum number of disk blocks. This form is much slower and requires an exclusive lock on each table while it is being processed. ----- This way to do is similar (but not equal) to the LAZY VACUUM specified by Tom Lane in the above link... In conclusion, the new VACUUM command as described above will be include in the 7.2 version of Postgresql? Denis Gasparin: denis@edistar.com --------------------------- Programmer & System Administrator - Edistar srl
On Wed, 10 Oct 2001, Ian Barwick wrote: > I'm doing some work for a smallish company which conducts > its business largely online. Currently they have a legacy > mishmash of Oracle and MySQL databases which they wish > to unify one one platform (RDBMS with client access via > browser and custom serverside applications for employees > and customers). > > PostgreSQL would be my primary candidate. However the company's > operating requirments mean that the data needed for interaction > with customers / website users must be available on a 24/7 basis. > This is primarily a) data related to product ordering and > tables for storing order data; and b) website user authentication > and personalisation data (logins, user preferences etc). > > It is therefore not an option to have these databases offline > at regular intervals for any significant length of time for > VACUUMing. Replicating data to say MySQL databases is > technically feasible, at least in the case of b) above, but > not desirable. Are there any existing "native" PostgreSQL solutions > to this problem? > > More importantly, what is the situation on VACUUM for release 7.2? > It seems from the pgsql-hackers list that there are plans for > a none-exclusively locking VACUUM, e.g.: > > http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us > > (sorry about the long URL); how far advanced are they, and is > there any kind of release schedule for 7.2? > > Any answers (or pointers thereto, haven't found any myself :-() > much appreciated There is a faster, non-exclusive-locking VACUUM in the CVS now; this should become part of 7.2. You can download the nightly snapsot and build it to test it with your application. HTH.
where in the postgres dox can we find info abt the vacuum feature? thx...
On Mon, Oct 15, 2001 at 05:06:05PM +0200, Denis Gasparin wrote: > Ok, little language qui pro quo... > > I'm sorry for the error... No need to be sorry! Just thought I'd clarify. -- ---- Andrew Sullivan 87 Mowat Avenue Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.com> M6K 3E3 +1 416 646 3304 x110
Denis Gasparin <denis@edinet.it> writes: > This way to do is similar (but not equal) to the LAZY VACUUM specified by > Tom Lane in the above link... > In conclusion, the new VACUUM command as described above will be include in > the 7.2 version of Postgresql? The on-line docs are accurate (at least the VACUUM reference page is; I don't think I've updated all of the Admin Guide discussions yet). The link quoted earlier was just an early proposal about what it should look like, not a final spec. regards, tom lane
On Fri, 12 Oct 2001, Joel Burton wrote: > You can download the nightly snapsot and build Sorry, I havn´t found the nightly snapshot. Any URL for people like me who have trouble to use CVS due to firewall constraints? Kind regards Andreas.
Try: http://www.ca.postgresql.org/ftpsite/dev/ > -----Original Message----- > From: Tille, Andreas [SMTP:TilleA@rki.de] > Sent: Tuesday, October 16, 2001 10:45 AM > Cc: PostgreSQL General > Subject: Re: [GENERAL] VACUUM, 24/7 availability and 7.2 > > On Fri, 12 Oct 2001, Joel Burton wrote: > > > You can download the nightly snapsot and build > Sorry, I havn´t found the nightly snapshot. Any URL for people like > me who have trouble to use CVS due to firewall constraints? > > Kind regards > > Andreas. > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html
Ok, little language qui pro quo... I'm sorry for the error... Denis At 15/10/01 17.00, Andrew Sullivan wrote: >On Mon, Oct 15, 2001 at 10:40:17AM +0200, Denis Gasparin wrote: > > > > > >It's long since done. > > > > ==>> This means that it will not be included in 7.2? I've read 7.2 > >No, it means it _will_ be included. > >A > >-- >---- >Andrew Sullivan 87 Mowat Avenue >Liberty RMS Toronto, Ontario Canada ><andrew@libertyrms.com> M6K 3E3 > +1 416 646 3304 x110
> and vacuum takes just a very few seconds a day. I think I recall hearing on > this list of it taking a minute or three for databases several gigabytes in > size. For some sites this would be tolerable, for others it wouldn't. It depends more on user activity and number/size of indexes than over all database size from what I've seen.. In one database I have 20ish tables, some have 70,000ish records but the whole database isn't very big -- VACUUM takes a while because of the amount of UPDATE'd and DELETE'd records I have every day, and the number of indexes (lots!).... > I'm also interested to hear what the future holds for vacuum. If nothing else, > it couldn't hurt postgresql's public relations. :-) > > --Wes Sheldahl