Re: VACUUM and 24/7 database operation - Mailing list pgsql-general

From Gordan Bobic
Subject Re: VACUUM and 24/7 database operation
Date
Msg-id 001901c08562$0296cb60$8000000a@localdomain
Whole thread Raw
In response to VACUUM and 24/7 database operation  (Thomas.Favier@accelance.fr)
List pgsql-general
>    For one of our customer, we are running a PostgreSQL database on a
> dynamic PHP-driven site. This site has a minimum of 40 visitors at a
> time and must be responsive 24h a day.

And from the bandwidth and hit logs, they cannot determine a time of day
when there are hardly any hits? Possible, but it might be worth a
double-check.

>    The problem is with VACUUMing this table. It takes 2 long minutes
> everyday. Two minutes during wich no request can be done because of the
> lock on the table... (nearly every request is involving this large
> table). Our customer really dislike this fact and is planning to
> replace PostgreSQL with Oracle.

If they are sufficiently desperate over 2 minutes per day, which is, BTW,
less than 0.14% of the time, to want to replace it with Oracle at the full
cost of it (and last time I checked, Oracle in full server setup costs were
in 5 figures, just for the software), then I'd say let them. It's their
money they are throwing away. I am assuming here that they have checked and
confirmed that Oracle would no suffer a similar problem?

>    2 minutes is seen by our customer as sufficent for his customer to
> get away from his site.

That would be the 0.14%, would it? What sort of service are they running on
it?

> - Is 2 minutes a standard time for vacuuming a 500.000 rows table ?

I'd say that is pretty good. I have a 30M row table with 30K-100K
inserts/updates per day (not including selects here), and I'm quite happy
if I can get it to vacuum in less than 15 minutes on a P2/400.

> - Can it be reduced ?

I guess. If you run your database on am ultra-fast RAID stripe, put more
memory and a faster processor in the server, you could probably reduce it.

Have you considered running two servers? If you set up two replicated
servers, then you could run everything off one server. At a convenient time
of day, when there's minimum load, you could swap vacuum the backup one,
wait for the queued replicated queries to be executed, and then fail them
over. Then, you can vacuum the primary server, and make it wait as the
fail-over server until next time it's time to vacuum the database. The
fail-over should take a matter of seconds (fractions of seconds?),
depending on how you do it. A second server is likely to cost them less
than a full Oracle licence...

Incidentally, how do they handle backups? I am not sure how well a backup
of the database will work out with it still running, and records being
inserted during the backup. Running two servers will also get you around
this problem, in the same way, because you can shut down a database while
you back it up without loosing any uptime or data.


> - In a far future, what are the problems we can run into not vacuuming
> that table ? We have already seen that after a month, some transactions
> involving where id >= some_value take forever, so we supressed them.

Performance on inserts and updates will degrade, especially if there are
lots of deletes as well. Basically, the tables will get fragmented, because
deletes, AFAIK, do lazy deletion, so the tables aren't cleaned out until
you vacuum them. This also tends to mess up the index timings because there
is a lot of dead records in them.

Eventually, it will grind to a halt.

Depending on what your performance and requirements are, you could do a
vacuum once per week, perhaps?

Regards.

Gordon


pgsql-general by date:

Previous
From: "Mitch Vincent"
Date:
Subject: Re: plpgsql - cont'd
Next
From: Tom Lane
Date:
Subject: Re: VACUUM and 24/7 database operation