Re: Answering my own question - Mailing list pgsql-novice

From Francisco Reyes
Subject Re: Answering my own question
Date
Msg-id 20020517112027.O20627-100000@zoraida.natserv.net
Whole thread Raw
In response to Re: Answering my own question  (gerry.smit@lombard.ca)
List pgsql-novice
On Wed, 15 May 2002 gerry.smit@lombard.ca wrote:

> Here's the scenario. We have about 8 tables or so, in 4 regional databases.

Why do you have this "regional" databases?
Couldn't they all see one database?
Do you have a WAN?

> - two of the tables are updated via on-line transaction from our users.
> Probably a couple of hundred records a day , tops.
>       - records eligible for archiving are "SELECT INTO" and "DELETE"d
> about once a quarter. typically 1000 or so.

These records are so few that I don't even see why you
bother archiving them.


> - the rest of the tables are updated from the Mainframe batch cycle, at
> about 3:00 am local time.
>       - these records NEVER get updated, just more added each day.

Good.


> - the problem tables have about 800,000 records in them.
>       - the daily table getting 3000-5000 new records a day.

Not really much data added.

>       - the non-daily table is DROPd and rebuilt monthly.

Why do you drop it?
Why not truncate it instead?
In particular if you ever decide to use views, dropping the tables will be
a problem unless you recreate the views too.


> VACUUM is run nightly, at about 7:00 pm.

Just plain vacuum?  You may want to do a vacuum full.

>This is to make the system ready
>for UNIX back-ups, at about 9:30 pm.

What do you mean by Unix backups?  You should NOT be backing up the
database from the OS. You should do a pg_dump and then backup that file
only.

 Also, in the past, the VACUUM has
> taken hours, and seems to lock the database (not just the table). Thus
> we're reluctant to run VACUUM at 5:00 am, when daily batch finishes.

That is no longer an issue with 7.2 and higher. Run your vacuum analyze
right after your loads and before you start doing any large computations.
If those computations create additional files or modify substantial number
of records do yet another vacuum analyze.


> The primary reason for the hours of VACUUM was the non-daily table , above,
> back when we DELETEd most records, and then reloaded over two days.
> Deleting 780,000 records of an 800,000 row table creates an IMMENSELY long
> VACUUM.

Vacuum analyze is fairly quick.

>Hence our DROP and rebuild.

You can truncate instead. Near instantaneous and doesn't leave any records
behind.

> I suspect that what  I should do is :
> 1) continue nightly VACUUM's at about 7:00 pm

I think you should do a vacuum full.

> 2) add a weekly VACUUM FULL ANALYZE to be run Saturday evening, ensuring
> its ready for Sunday FULL disk back-ups.

Again you should not be backing up the database files, but the output of
pg_dump.

> Oh, we're a 16/5 operation, with some people dialing in or coming in on
> weekends. 24/7 is NOT a requirement.

Good.
It would help you a lot if you do nightly vacuum fulls. Specially if space
is an issue and/or you have many queries which do sequential scans.


pgsql-novice by date:

Previous
From: "Joel Burton"
Date:
Subject: Re: newbie: Column CHECK(col contains '@') ?
Next
From: gerry.smit@lombard.ca
Date:
Subject: Re: Answering my own question