Thread: VACUUM, 24/7 availability and 7.2

VACUUM, 24/7 availability and 7.2

From
Ian Barwick
Date:
I'm doing some work for a smallish company which conducts
its business largely online. Currently they have a legacy
mishmash of Oracle and MySQL databases which they wish
to unify one one platform (RDBMS with client access via
browser and custom serverside applications for employees
and customers).

PostgreSQL would be my primary candidate. However the company's
operating requirments mean that the data needed for interaction
with customers / website users must be available on a 24/7 basis.
This is primarily a) data related to product ordering and
tables for storing order data; and b) website user authentication
and personalisation data (logins, user preferences etc).

It is therefore not an option to have these databases offline
at regular intervals for any significant length of time for
VACUUMing. Replicating data to say MySQL databases is
technically feasible, at least in the case of b) above, but
not desirable. Are there any existing "native" PostgreSQL solutions
to this problem?

More importantly, what is the situation on VACUUM for release 7.2?
It seems from the pgsql-hackers list that there are plans for
a none-exclusively locking VACUUM, e.g.:


http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us

(sorry about the long URL); how far advanced are they, and is
there any kind of release schedule for 7.2?

Any answers (or pointers thereto, haven't found any myself :-()
much appreciated


Ian Barwick

--

Remove SUNGLASSES to reply ;-)

Re: VACUUM, 24/7 availability and 7.2

From
Doug McNaught
Date:
Ian Barwick <SUNGLASSESbarwick@gmx.net> writes:

> PostgreSQL would be my primary candidate. However the company's
> operating requirments mean that the data needed for interaction
> with customers / website users must be available on a 24/7 basis.
> This is primarily a) data related to product ordering and
> tables for storing order data; and b) website user authentication
> and personalisation data (logins, user preferences etc).
>
> It is therefore not an option to have these databases offline
> at regular intervals for any significant length of time for
> VACUUMing. Replicating data to say MySQL databases is
> technically feasible, at least in the case of b) above, but
> not desirable. Are there any existing "native" PostgreSQL solutions
> to this problem?

VACUUM doesn't require taking the DB offline.  It does lock tables
while it's vacuuming them, which may or may not cause problems
depending on how long the vacuum takes.  You seem to understand this;
I'm just correcting your terminology.  ;)

You can reduce the time taken by VACUUM by running it more often,
which may make it a non-issue for you.  Depends entirely on table size
and user activity.

> More importantly, what is the situation on VACUUM for release 7.2?
> It seems from the pgsql-hackers list that there are plans for
> a none-exclusively locking VACUUM, e.g.:

As I understand it:

* Non-intrusive VACUUM will be in 7.2
* 7.2 is about to enter beta (this week hopefully)

I would say it's likely that beta will last two or three months based
on my experience with the 7.1 beta period.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863

Re: VACUUM, 24/7 availability and 7.2

From
wsheldah@lexmark.com
Date:

Just to keep things in perspective, how large are your current databases, and
what do you or the company consider to be a signficant length of time?  Right
now I have a development database with just a few thousand records of test data,
and vacuum takes just a very few seconds a day.  I think I recall hearing on
this list of it taking a minute or three for databases several gigabytes in
size.  For some sites this would be tolerable, for others it wouldn't.

I'm also interested to hear what the future holds for vacuum.  If nothing else,
it couldn't hurt postgresql's public relations.  :-)

--Wes Sheldahl





Ian Barwick <SUNGLASSESbarwick%gmx.net@interlock.lexmark.com> on 10/10/2001
07:27:56 AM

To:   pgsql-general%postgresql.org@interlock.lexmark.com
cc:    (bcc: Wesley Sheldahl/Lex/Lexmark)
Subject:  [GENERAL] VACUUM, 24/7 availability and 7.2


I'm doing some work for a smallish company which conducts
its business largely online. Currently they have a legacy
mishmash of Oracle and MySQL databases which they wish
to unify one one platform (RDBMS with client access via
browser and custom serverside applications for employees
and customers).

PostgreSQL would be my primary candidate. However the company's
operating requirments mean that the data needed for interaction
with customers / website users must be available on a 24/7 basis.
This is primarily a) data related to product ordering and
tables for storing order data; and b) website user authentication
and personalisation data (logins, user preferences etc).

It is therefore not an option to have these databases offline
at regular intervals for any significant length of time for
VACUUMing. Replicating data to say MySQL databases is
technically feasible, at least in the case of b) above, but
not desirable. Are there any existing "native" PostgreSQL solutions
to this problem?

More importantly, what is the situation on VACUUM for release 7.2?
It seems from the pgsql-hackers list that there are plans for
a none-exclusively locking VACUUM, e.g.:


http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us


(sorry about the long URL); how far advanced are they, and is
there any kind of release schedule for 7.2?

Any answers (or pointers thereto, haven't found any myself :-()
much appreciated


Ian Barwick

--

Remove SUNGLASSES to reply ;-)

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org





Re: VACUUM, 24/7 availability and 7.2

From
"Jeffrey W. Baker"
Date:

On Wed, 10 Oct 2001 wsheldah@lexmark.com wrote:

> Just to keep things in perspective, how large are your current databases, and
> what do you or the company consider to be a signficant length of time?  Right
> now I have a development database with just a few thousand records of test data,
> and vacuum takes just a very few seconds a day.  I think I recall hearing on
> this list of it taking a minute or three for databases several gigabytes in
> size.  For some sites this would be tolerable, for others it wouldn't.

The runtime seems to be dependent on what hind of activity you do.  If you
frequently rollback inserts, vacuum seems to take longer.  Say, a database
with 1m records which endures 5m inserts + rollback is going to take a
while to vacuum.

Just my informal observations, not based on structured testing.

-jwb


Re: VACUUM, 24/7 availability and 7.2

From
"Bryan White"
Date:
> The runtime seems to be dependent on what hind of activity you do.  If you
> frequently rollback inserts, vacuum seems to take longer.  Say, a database
> with 1m records which endures 5m inserts + rollback is going to take a
> while to vacuum.

As another data point, daily vacuums on our database take 25 to 30 minutes.
This is on a dual PIII 933Mhz with 2GB RAM and the database is on a 4 Drive
RAID 10 array (stripe + mirror) of 15K RPM SCSI drives.  Our database is
9GB.  I don't really have a good estimate as to the number of insertions and
updates that occur daily.  I have some mass updates that I hold until the
weekend. PostgreSQL is 7.03.  I see vacuum downtime as the number 1 drawback
to PostgreSQL.


Re: VACUUM, 24/7 availability and 7.2

From
Erwin Lansing
Date:
On Wed, Oct 10, 2001 at 04:22:07PM -0400, wsheldah@lexmark.com wrote:
>
>
> Just to keep things in perspective, how large are your current databases, and
> what do you or the company consider to be a signficant length of time?  Right
> now I have a development database with just a few thousand records of test data,
> and vacuum takes just a very few seconds a day.  I think I recall hearing on
> this list of it taking a minute or three for databases several gigabytes in
> size.  For some sites this would be tolerable, for others it wouldn't.

We are having some trouble with some tables in which we have lots of
update's (and insert/delete's). "A lot" being several thousands per day
(I haven't measured the exact numbers recently). VACUUM is running twice
a day and locks these tables a long time where 10-15 minutes is not
exceptional. This table has only approx 100k records, but these are
updated very often and that seems to cause to much rubbish for vacuum
>
> I'm also interested to hear what the future holds for vacuum.  If nothing else,
> it couldn't hurt postgresql's public relations.  :-)

indeed. we are currently considering moving the guilty tables over til
MySQL as we're not using any advanced features in it, while keeping all
other data in PostgreSQL. Anything that keeps us from doing that would
be good PR :)

/erwin

>
> --Wes Sheldahl
>
>
>
>
>
> Ian Barwick <SUNGLASSESbarwick%gmx.net@interlock.lexmark.com> on 10/10/2001
> 07:27:56 AM
>
> To:   pgsql-general%postgresql.org@interlock.lexmark.com
> cc:    (bcc: Wesley Sheldahl/Lex/Lexmark)
> Subject:  [GENERAL] VACUUM, 24/7 availability and 7.2
>
>
> I'm doing some work for a smallish company which conducts
> its business largely online. Currently they have a legacy
> mishmash of Oracle and MySQL databases which they wish
> to unify one one platform (RDBMS with client access via
> browser and custom serverside applications for employees
> and customers).
>
> PostgreSQL would be my primary candidate. However the company's
> operating requirments mean that the data needed for interaction
> with customers / website users must be available on a 24/7 basis.
> This is primarily a) data related to product ordering and
> tables for storing order data; and b) website user authentication
> and personalisation data (logins, user preferences etc).
>
> It is therefore not an option to have these databases offline
> at regular intervals for any significant length of time for
> VACUUMing. Replicating data to say MySQL databases is
> technically feasible, at least in the case of b) above, but
> not desirable. Are there any existing "native" PostgreSQL solutions
> to this problem?
>
> More importantly, what is the situation on VACUUM for release 7.2?
> It seems from the pgsql-hackers list that there are plans for
> a none-exclusively locking VACUUM, e.g.:
>
>
http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us
>
>
> (sorry about the long URL); how far advanced are they, and is
> there any kind of release schedule for 7.2?
>
> Any answers (or pointers thereto, haven't found any myself :-()
> much appreciated
>
>
> Ian Barwick
>
> --
>
> Remove SUNGLASSES to reply ;-)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
--
Erwin Lansing         --         http://droso.org

I love deadlines.                       -- Douglas Adams
I love the whooshing sound they make as the fly by.

Re: VACUUM, 24/7 availability and 7.2

From
Erwin Lansing
Date:
On Wed, Oct 10, 2001 at 04:22:07PM -0400, wsheldah@lexmark.com wrote:
>
>
> Just to keep things in perspective, how large are your current databases, and
> what do you or the company consider to be a signficant length of time?  Right
> now I have a development database with just a few thousand records of test data,
> and vacuum takes just a very few seconds a day.  I think I recall hearing on
> this list of it taking a minute or three for databases several gigabytes in
> size.  For some sites this would be tolerable, for others it wouldn't.

We are having some trouble with some tables in which we have lots of
update's (and insert/delete's). "A lot" being several thousands per day
(I haven't measured the exact numbers recently). VACUUM is running twice
a day and locks these tables a long time where 10-15 minutes is not
exceptional. This table has only approx 100k records, but these are
updated very often and that seems to cause to much rubbish for vacuum
>
> I'm also interested to hear what the future holds for vacuum.  If nothing else,
> it couldn't hurt postgresql's public relations.  :-)

indeed. we are currently considering moving the guilty tables over til
MySQL as we're not using any advanced features in it, while keeping all
other data in PostgreSQL. Anything that keeps us from doing that would
be good PR :)

/erwin

>
> --Wes Sheldahl
>
>
>
>
>
> Ian Barwick <SUNGLASSESbarwick%gmx.net@interlock.lexmark.com> on 10/10/2001
> 07:27:56 AM
>
> To:   pgsql-general%postgresql.org@interlock.lexmark.com
> cc:    (bcc: Wesley Sheldahl/Lex/Lexmark)
> Subject:  [GENERAL] VACUUM, 24/7 availability and 7.2
>
>
> I'm doing some work for a smallish company which conducts
> its business largely online. Currently they have a legacy
> mishmash of Oracle and MySQL databases which they wish
> to unify one one platform (RDBMS with client access via
> browser and custom serverside applications for employees
> and customers).
>
> PostgreSQL would be my primary candidate. However the company's
> operating requirments mean that the data needed for interaction
> with customers / website users must be available on a 24/7 basis.
> This is primarily a) data related to product ordering and
> tables for storing order data; and b) website user authentication
> and personalisation data (logins, user preferences etc).
>
> It is therefore not an option to have these databases offline
> at regular intervals for any significant length of time for
> VACUUMing. Replicating data to say MySQL databases is
> technically feasible, at least in the case of b) above, but
> not desirable. Are there any existing "native" PostgreSQL solutions
> to this problem?
>
> More importantly, what is the situation on VACUUM for release 7.2?
> It seems from the pgsql-hackers list that there are plans for
> a none-exclusively locking VACUUM, e.g.:
>
>
http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us
>
>
> (sorry about the long URL); how far advanced are they, and is
> there any kind of release schedule for 7.2?
>
> Any answers (or pointers thereto, haven't found any myself :-()
> much appreciated
>
>
> Ian Barwick
>
> --
>
> Remove SUNGLASSES to reply ;-)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
--
Erwin Lansing         --         http://droso.org

I love deadlines.                       -- Douglas Adams
I love the whooshing sound they make as the fly by.

Re: VACUUM, 24/7 availability and 7.2

From
Tom Lane
Date:
>> More importantly, what is the situation on VACUUM for release 7.2?
>> It seems from the pgsql-hackers list that there are plans for
>> a none-exclusively locking VACUUM, e.g.:
>>
>>
http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us
>>
>> (sorry about the long URL); how far advanced are they,

It's long since done.

>> and is there any kind of release schedule for 7.2?

Beta release is ... um ... any day now.  I think we're just waiting on
Lockhart to say he's done tweaking the datetime datatypes.

            regards, tom lane

Re: VACUUM, 24/7 availability and 7.2

From
Doug McNaught
Date:
Erwin Lansing <erwin@lansing.dk> writes:

> We are having some trouble with some tables in which we have lots of
> update's (and insert/delete's). "A lot" being several thousands per day
> (I haven't measured the exact numbers recently). VACUUM is running twice
> a day and locks these tables a long time where 10-15 minutes is not
> exceptional. This table has only approx 100k records, but these are
> updated very often and that seems to cause to much rubbish for vacuum

You might try running VACUUM much more often, as it'll run faster with
less work to do.  If running it every hour takes the locked time down
to 30 seconds, you might find that acceptable (depending on the
application of course).

Worth a try until 7.2 is solid...

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863

Re: VACUUM, 24/7 availability and 7.2

From
Denis Gasparin
Date:
> >> More importantly, what is the situation on VACUUM for release 7.2?
> >> It seems from the pgsql-hackers list that there are plans for
> >> a none-exclusively locking VACUUM, e.g.:
> >>
> >>
>
http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us
> >>
> >> (sorry about the long URL); how far advanced are they,
>
>It's long since done.

==>> This means that it will not be included in 7.2? I've read 7.2
documentation on line and i've seen that the VACUUM command is changed:
now, when run in normal mode (giving to the backend the VACUUM command
without any parameter), the tables don't need to be locked  and also that
the command does not minimize the space of the database (as instead the
actual 7.1.3 VACUUM does). From the documentation:

-----
Plain VACUUM (without FULL) simply reclaims space and makes it available
for re-use. This form of the command can operate in parallel with normal
reading and writing of the table. VACUUM FULL does more extensive
processing, including moving of tuples across blocks to try to compact the
table to the minimum number of disk blocks. This form is much slower and
requires an exclusive lock on each table while it is being processed.
-----

This way to do is similar (but not equal) to the LAZY VACUUM specified by
Tom Lane in the above link...

In conclusion, the new VACUUM command as described above will be include in
the 7.2 version of Postgresql?

Denis Gasparin: denis@edistar.com
---------------------------
Programmer & System Administrator - Edistar srl



Re: VACUUM, 24/7 availability and 7.2

From
Denis Gasparin
Date:
> >> More importantly, what is the situation on VACUUM for release 7.2?
> >> It seems from the pgsql-hackers list that there are plans for
> >> a none-exclusively locking VACUUM, e.g.:
> >>
> >>
>
http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us
> >>
> >> (sorry about the long URL); how far advanced are they,
>
>It's long since done.

==>> This means that it will not be included in 7.2? I've read 7.2
documentation on line and i've seen that the VACUUM command is changed:
now, when run in normal mode (giving to the backend the VACUUM command
without any parameter), the tables don't need to be locked  and also that
the command does not minimize the space of the database (as instead the
actual 7.1.3 VACUUM does). From the documentation:

-----
Plain VACUUM (without FULL) simply reclaims space and makes it available
for re-use. This form of the command can operate in parallel with normal
reading and writing of the table. VACUUM FULL does more extensive
processing, including moving of tuples across blocks to try to compact the
table to the minimum number of disk blocks. This form is much slower and
requires an exclusive lock on each table while it is being processed.
-----

This way to do is similar (but not equal) to the LAZY VACUUM specified by
Tom Lane in the above link...

In conclusion, the new VACUUM command as described above will be include in
the 7.2 version of Postgresql?

Denis Gasparin: denis@edistar.com
---------------------------
Programmer & System Administrator - Edistar srl



Re: VACUUM, 24/7 availability and 7.2

From
Joel Burton
Date:
On Wed, 10 Oct 2001, Ian Barwick wrote:

> I'm doing some work for a smallish company which conducts
> its business largely online. Currently they have a legacy
> mishmash of Oracle and MySQL databases which they wish
> to unify one one platform (RDBMS with client access via
> browser and custom serverside applications for employees
> and customers).
>
> PostgreSQL would be my primary candidate. However the company's
> operating requirments mean that the data needed for interaction
> with customers / website users must be available on a 24/7 basis.
> This is primarily a) data related to product ordering and
> tables for storing order data; and b) website user authentication
> and personalisation data (logins, user preferences etc).
>
> It is therefore not an option to have these databases offline
> at regular intervals for any significant length of time for
> VACUUMing. Replicating data to say MySQL databases is
> technically feasible, at least in the case of b) above, but
> not desirable. Are there any existing "native" PostgreSQL solutions
> to this problem?
>
> More importantly, what is the situation on VACUUM for release 7.2?
> It seems from the pgsql-hackers list that there are plans for
> a none-exclusively locking VACUUM, e.g.:
>
>
http://groups.google.com/groups?q=vacuum&hl=en&group=comp.databases.postgresql.hackers&rnum=1&selm=12833.990140724%40sss.pgh.pa.us
>
> (sorry about the long URL); how far advanced are they, and is
> there any kind of release schedule for 7.2?
>
> Any answers (or pointers thereto, haven't found any myself :-()
> much appreciated

There is a faster, non-exclusive-locking VACUUM in the CVS now; this
should become part of 7.2. You can download the nightly snapsot and build
it to test it with your application.

HTH.



Re: VACUUM, 24/7 availability and 7.2

From
tek1
Date:
where in the postgres dox can we find info abt the vacuum feature?

thx...


Re: VACUUM, 24/7 availability and 7.2

From
Andrew Sullivan
Date:
On Mon, Oct 15, 2001 at 05:06:05PM +0200, Denis Gasparin wrote:
> Ok, little language qui pro quo...
>
> I'm sorry for the error...

No need to be sorry!  Just thought I'd clarify.

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.com>                               M6K 3E3
                                         +1 416 646 3304 x110


Re: VACUUM, 24/7 availability and 7.2

From
Tom Lane
Date:
Denis Gasparin <denis@edinet.it> writes:
> This way to do is similar (but not equal) to the LAZY VACUUM specified by
> Tom Lane in the above link...

> In conclusion, the new VACUUM command as described above will be include in
> the 7.2 version of Postgresql?

The on-line docs are accurate (at least the VACUUM reference page is;
I don't think I've updated all of the Admin Guide discussions yet).
The link quoted earlier was just an early proposal about what it should
look like, not a final spec.

            regards, tom lane

Re: VACUUM, 24/7 availability and 7.2

From
"Tille, Andreas"
Date:
On Fri, 12 Oct 2001, Joel Burton wrote:

> You can download the nightly snapsot and build
Sorry, I havn´t found the nightly snapshot.  Any URL for people like
me who have trouble to use CVS due to firewall constraints?

Kind regards

          Andreas.

Re: VACUUM, 24/7 availability and 7.2

From
"Henshall, Stuart - WCP"
Date:
Try:
http://www.ca.postgresql.org/ftpsite/dev/

> -----Original Message-----
> From:    Tille, Andreas [SMTP:TilleA@rki.de]
> Sent:    Tuesday, October 16, 2001 10:45 AM
> Cc:    PostgreSQL General
> Subject:    Re: [GENERAL] VACUUM, 24/7 availability and 7.2
>
> On Fri, 12 Oct 2001, Joel Burton wrote:
>
> > You can download the nightly snapsot and build
> Sorry, I havn´t found the nightly snapshot.  Any URL for people like
> me who have trouble to use CVS due to firewall constraints?
>
> Kind regards
>
>           Andreas.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html

Re: VACUUM, 24/7 availability and 7.2

From
Denis Gasparin
Date:
Ok, little language qui pro quo...

I'm sorry for the error...

Denis

At 15/10/01 17.00, Andrew Sullivan wrote:
>On Mon, Oct 15, 2001 at 10:40:17AM +0200, Denis Gasparin wrote:
> > >
> > >It's long since done.
> >
> > ==>> This means that it will not be included in 7.2? I've read 7.2
>
>No, it means it _will_ be included.
>
>A
>
>--
>----
>Andrew Sullivan                               87 Mowat Avenue
>Liberty RMS                           Toronto, Ontario Canada
><andrew@libertyrms.com>                               M6K 3E3
>                                          +1 416 646 3304 x110


Re: VACUUM, 24/7 availability and 7.2

From
"Mitch Vincent"
Date:
> and vacuum takes just a very few seconds a day.  I think I recall hearing
on
> this list of it taking a minute or three for databases several gigabytes
in
> size.  For some sites this would be tolerable, for others it wouldn't.

It depends more on user activity and number/size of indexes than over all
database size from what I've seen.. In one database I have 20ish tables,
some have 70,000ish records but the whole database isn't very big -- VACUUM
takes a while because of the amount of UPDATE'd and DELETE'd records I have
every day, and the number of indexes (lots!)....

> I'm also interested to hear what the future holds for vacuum.  If nothing
else,
> it couldn't hurt postgresql's public relations.  :-)
>
> --Wes Sheldahl