Thread: Answering my own question

Answering my own question

From
gerry.smit@lombard.ca
Date:
I found my answer on http://www.argudo.org/postgresql/soft-tuning-10.html
(excellent site, btw). What I need to do is prefix my SELECT statements for
this table with the statement:

SET enable_seqscan='false';

as in :

SET enable_seqscan='false';SELECT * FROM pol_xref_d WHERE policy_no=1200079
ORDER BY sequence;


This change makes the response sub-second, a great improvement over the
11+ seconds we're currently experiencing.

I'd love any explanation of why the query plan was going sequential versus
indexed.
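For anyone wanting to reproduce this, the two plans can be compared with
EXPLAIN (a sketch using the same table and literal as above):

```sql
-- Plan the optimizer picks on its own (here: the slow sequential scan).
EXPLAIN SELECT * FROM pol_xref_d WHERE policy_no=1200079 ORDER BY sequence;

-- Plan with sequential scans discouraged (here: the sub-second index scan).
SET enable_seqscan='false';
EXPLAIN SELECT * FROM pol_xref_d WHERE policy_no=1200079 ORDER BY sequence;
```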

Gerry




Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
Now THAT's REALLY fascinating.  We upgraded from 6.5 to 7.1 about 4-5
months ago, and only got this problem after that. I leave "ANALYZE" out of
VACUUM due to the large volume of (I thought needless) output that gets
piped to my cron_log.

Does anyone have a feel for how "necessary" VACUUM ANALYZE is over "VACUUM"
??

Gerry




            
From: ghaverla@freenet.edmonton.ab.ca
To: gerry.smit@lombard.ca
cc: pgsql-novice@postgresql.org
Date: 15/05/2002 01:50 PM
Subject: Re: [NOVICE] Answering my own question

On Wed, 15 May 2002 gerry.smit@lombard.ca wrote:

> I found, on http://www.argudo.org/postgresql/soft-tuning-10.html  my
> answer. (Excellent site btw)  What I need to do is prefix my SELECT
> statements for this table with the statement :
>
> SET enable_seqscan='false';
>
> as in :
>
> SET enable_seqscan='false';SELECT * FROM pol_xref_d WHERE policy_no=1200079
> ORDER BY sequence;

Someone else mentioned ANALYZE (or is it ANALYSE?).  Anyway,
I seem to remember reading someplace that with newer versions
of PostgreSQL, if you used to VACUUM on a regular basis,
you were supposed to change over to VACUUM ANALYSE (ANALYZE?)
some (all?) of the time.  My memory is fuzzy about this,
so perhaps the docs will have something on this.

Gord

Matter Realisations     http://www.materialisations.com/
Gordon Haverland, B.Sc. M.Eng. President
101  9504 182 St. NW    Edmonton, AB, CA  T5T 3A7
780/481-8019            ghaverla @ freenet.edmonton.ab.ca
780/993-1274 (cell)









Re: Answering my own question

From
Francisco Reyes
Date: 15/05/2002 03:29 PM
On Wed, 15 May 2002 gerry.smit@lombard.ca wrote:

> Now THAT's REALLY fascinating.  We upgraded from 6.5 to 7.1 about 4-5
> months ago, and only got this problem after that. I leave "ANALYZE" out of
> VACUUM due to the large volume of (I thought needless) output that gets
> piped to my cron_log.

Display of activities is controllable through settings.
It is also somewhat silly not to run analyze just because it displays too
much data in a log.

> Does anyone have a feel for how "necessary" VACUUM ANALYZE is over "VACUUM"
> Gerry

Like many other things in life the answer is: it depends.
Vacuum is about being able to reuse space which was used by rows that have
been deleted.

Vacuum Full is about recovering the disk space used by deleted rows.

Analyze is about re-computing statistics that the analyzer uses to
determine best strategies.

The answer to your question will be directly related to how much and how
often new data is inserted in your database.

If you NEVER change your data or make inserts then you NEVER have to run
either vacuum or analyze.

If you have a very heavily loaded environment with hundreds or thousands of
transactions per minute, then an hourly vacuum analyze would probably
help.

Why don't you tell us more about your particular scenario?

For instance I do ONE big load every morning from a production Foxpro
system to my reporting PostgreSQL system. I do one vacuum analyze just
after the load, then I compute some stats and populate some files, then
another vacuum analyze. At 10pm I do a vacuum full. This works great for
me.
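Sketched as crontab entries (the exact times and the database name are
illustrative, not from my actual setup):

```
# Illustrative crontab for the schedule described above.
# "reports" stands in for the real database name.
0 7  * * *  psql -d reports -c 'VACUUM ANALYZE'   # after the morning load
0 10 * * *  psql -d reports -c 'VACUUM ANALYZE'   # after stats/file generation
0 22 * * *  psql -d reports -c 'VACUUM FULL'      # nightly space recovery
```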

Before, I was doing one vacuum analyze after the stats computations and it
was horrible. Something which takes less than an hour to compute if I have
done a vacuum analyze took about 4 hours without it, and that was just ONE
of the things that needed to be done. The first vacuum analyze takes about
50 minutes and the second takes about 8 minutes, a lot less than the 4+
hours saved by having run vacuum analyze rather than doing the stats
computations without it.


Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
Well, having copies of stats from the server.log obscure timing info in the
VACUUM log doesn't help. And the stats are fairly unreadable, at least to
this novice.   :-)

VACUUM FULL is new for me, and I've been wondering about how to recycle
disk space. THANKS.

VACUUM ANALYZE, producing stats for the query planner, is also something I
wasn't aware of.

Here's the scenario. We have about 8 tables or so, in 4 regional databases.

- two of the tables are updated via on-line transactions from our users.
Probably a couple of hundred records a day, tops.
      - records eligible for archiving are "SELECT INTO"d and "DELETE"d
about once a quarter, typically 1000 or so.

- the rest of the tables are updated from the Mainframe batch cycle, at
about 3:00 am local time.
      - these records NEVER get updated, just more added each day.

- the problem tables have about 800,000 records in them.
      - the daily table getting 3000-5000 new records a day.
      - once a month, records more than 12 months old are purged, no
archive required.
      - the non-daily table is DROPd and rebuilt monthly.
             Newly generated records are loaded into it from batch
processes that take about two days time , at the start of each month.
      - thus the "high water" mark is highest just before monthend.
      - daily volumes are slowly increasing over time, so the "high water
mark" is also going higher.


VACUUM is run nightly, at about 7:00 pm.  This is to make the system ready
for UNIX back-ups, at about 9:30 pm. Also, in the past, the VACUUM has
taken hours, and seems to lock the database (not just the table). Thus
we're reluctant to run VACUUM at 5:00 am, when daily batch finishes.

The primary reason for the hours of VACUUM was the non-daily table, above,
back when we DELETEd most records, and then reloaded over two days.
Deleting 780,000 records of an 800,000 row table creates an IMMENSELY long
VACUUM. Hence our DROP and rebuild.


I suspect that what I should do is:

1) continue nightly VACUUMs at about 7:00 pm
2) add a weekly VACUUM FULL ANALYZE to be run Saturday evening, ensuring
it's ready for Sunday FULL disk back-ups.
3) run a VACUUM FULL ANALYZE tonight as a one-shot.
4) consider adding the statement "SET enable_seqscan='false';" to the TWO
locations in our on-line system where the problem SELECTs are coded.
However, during my testing, SET returns the output "SET VARIABLE", which is
being interpreted as the first line of output from the accompanying SELECT
statement.  Any ideas on how to make the SET statement "quiet"?

Step 4 may not be necessary if Steps 1-3 do the job, btw.


Oh, we're a 16/5 operation, with some people dialing in or coming in on
weekends. 24/7 is NOT a requirement.


Gerry Smit,
Toronto, Ontario





        








Re: Answering my own question

From
Manuel Sugawara
Date:
gerry.smit@lombard.ca writes:

> Now THAT's REALLY fascinating.  We upgraded from 6.5 to 7.1 about
> 4-5 months ago, and only got this problem after that. I leave
> "ANALYZE" out of VACUUM due to the large volume of (I thought
> needless) output that gets piped to my cron_log.
>
> Does anyone have a feel for how "necessary" VACUUM ANALYZE is over
> "VACUUM" ??

The analyze part records statistical information about your
data. Without it the planner doesn't have the information to decide on a
better plan.

BTW if you are still in 7.1 you should upgrade. 7.2 is smarter about
the statistical information it records and is able to make better
decisions.

Regards,
Manuel.

ps. yes: mx -> México

Re: Answering my own question

From
"Balazs Wellisch"
Date: 15/05/2002 06:57 PM
Hi all,

I'm embarrassed to even ask this, but this being a novice list, here goes.

What exactly is Vacuum? Is there an equivalent of this in MS SQL Server?

Thanks,
Balazs Wellisch


Re: Answering my own question

From
Chris
Date: 15/05/2002 07:28 PM
>Now THAT's REALLY fascinating.  We upgraded from 6.5 to 7.1 about 4-5
>months ago, and only got this problem after that. I leave "ANALYZE" out of
>VACUUM due to the large volume of (I thought needless) output that gets
>piped to my cron_log.
>
>Does anyone have a feel for how "necessary" VACUUM ANALYZE is over "VACUUM"
>??

As you noticed, pretty necessary :)

A vacuum analyze updates the statistics used when the planner decides
what action to take (whether it's a sequential scan, using an index, etc.).

A plain vacuum removes dead tuples from the system, not much else.
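A concrete way to see the difference (a sketch; pg_class is the system
catalog holding the planner's gross size estimates, and the per-column
distribution stats that ANALYZE gathers live in pg_statistic):

```sql
VACUUM pol_xref_d;          -- reclaims dead tuples
VACUUM ANALYZE pol_xref_d;  -- also recomputes per-column statistics

-- The planner's row and page estimates for the table:
SELECT relname, reltuples, relpages
FROM pg_class
WHERE relname = 'pol_xref_d';
```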


-----------------
      Chris Smith
http://www.squiz.net/


Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
As in another post, VACUUM ANALYZE has fixed my index / optimizer / query
plan problem.

A follow-up question I have is about VACUUM FULL.

There's no mention of "FULL" as an argument to VACUUM in the 7.1.3
doc/html on-line reference manual.  Was FULL added in a later release?


Gerry




        








Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
Well, the VACUUM ANALYZE took 4 hours last night, but it seems to have
fixed the problem. EXPLAIN SELECT ... and SELECT from both tables are now
using the indices and get sub-second response, even with 800,000 rows of
data.

Now to see whether I make VACUUM ANALYZE a nightly thing, or weekly.
4 hours is long, but it's been a LONG time since the last one (like never).

btw I was confusing VERBOSE with ANALYZE. (Well, this IS the NOVICE list,
right?)  VERBOSE definitely makes the cron_log output too long to read.
ANALYZE produces nothing in the cron log, unless VERBOSE is given as well.

Thanks for all the help so far folks, it's been bang on.

Gerry




        








Re: Answering my own question

From
"Adam Erickson"
Date: 16/05/2002 11:56 AM
> As in another post, VACUUM ANALYZE has fixed my index / optimizer / query
> plan problem.
>
> A follow-up question I have is about VACUUM FULL.
>
> There's no mention of "FULL" as an argument to VACUUM in the 7.1.3
> doc/html on-line reference manual.  Was FULL added in a later release?

They changed vacuum in 7.[something].  Among other changes I'm not aware
of, vacuum no longer frees physical disk space when cleaning up tables.
Instead, it frees space within the file for pg to reuse.  VACUUM FULL was
then added for those who wanted vacuum to behave as it did before.

One nice thing about the new behavior is that it no longer needs to lock
the tables during vacuum.  Doing a vacuum full will lock the table, though.

-Adam


Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
Well, this is only my second day on the list, but I don't think "NOVICE"
could ever have a question to be embarrassed about.

VACUUM does 3 things (at least in 7.2):

-  VACUUM (alone) takes rows marked for deletion, and frees up the space
to be re-used by later INSERTs or UPDATEs.
      - I'm guessing, but it sure looks like DELETE doesn't physically
delete, it logically deletes.
      - similarly, UPDATE doesn't seem to physically update in place, but
logically updates by marking the old record as "deletable" and INSERTing
(in effect) a new record with your changes.
      - all of which leaves a lot of logically deleted, but physically
still present, records in your file.

- VACUUM FULL apparently goes one step further than VACUUM, and actually
frees up the disk space for the operating system.
      - FULL is NOT an option to VACUUM in 7.1.3, so I'm guessing, based on
comments from other emails.
      - In our shop we're running Postgres 7.1.3 on a Solaris 2.(6?7?8)
E3500.  After a VACUUM, the unix files in pgsql/data/base are no smaller.
      - eventually these get so big that it's worthwhile to pg_dump, DROP,
and reload a volatile table. At which point the unix files are smaller, and
response time is improved.
      - presumably in Postgres 7.2+, FULL was added as an option to VACUUM
to do this inherently. In IBM mainframe speak, it's somewhat analogous to
doing a "freespace" and "de-frag".

- VACUUM ANALYZE, as I've learned so well these past two days, provides
stats on the remaining rows in your tables, allowing the query plan
optimizer to best determine whether or not to use any indices, and if so,
which ones.

- VACUUM VERBOSE just dumps the output of the VACUUM to the calling
script or command line. A copy of said output is in your server.log file in
any case.

Interestingly enough, VACUUM VERBOSE ANALYZE doesn't get you any stats from
the analyzer, just the line "Analyzing...."  I was hoping for cryptic clues,
things like "commonly recurring key, index useless" or "too few rows, index
useless" type messages.
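The variants above, side by side (per the other replies, FULL only exists
from 7.2 onward; table name as used earlier in the thread):

```sql
VACUUM pol_xref_d;                  -- free dead rows for reuse within the file
VACUUM FULL pol_xref_d;             -- also return the space to the OS (7.2+)
VACUUM ANALYZE pol_xref_d;          -- also gather stats for the query planner
VACUUM VERBOSE ANALYZE pol_xref_d;  -- same, with per-table progress output
```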


Gerry Smit,
Toronto, Canada.





              







Re: Answering my own question

From
gerry.smit@lombard.ca
Date:
VACUUM FULL is NOT available in 7.1.3, so it must be later.

Gerry




              







Re: Answering my own question

From
Francisco Reyes
Date:
On Thu, 16 May 2002 gerry.smit@lombard.ca wrote:

> There's no mention of "FULL" as an argument to VACUUM in the 7.1.3
> doc/html on-line reference manual.  Was FULL added in a later release?

Yes.
I also recommend you upgrade to 7.2 when you can... actually it is now
7.2.1. :-)


Re: Answering my own question

From
Francisco Reyes
Date: 17/05/2002 11:30 AM
On Wed, 15 May 2002 gerry.smit@lombard.ca wrote:

> Here's the scenario. We have about 8 tables or so, in 4 regional databases.

Why do you have these "regional" databases?
Couldn't they all use one database?
Do you have a WAN?

> - two of the tables are updated via on-line transaction from our users.
> Probably a couple of hundred records a day , tops.
>       - records eligible for archiving are "SELECT INTO" and "DELETE"d
> about once a quarter. typically 1000 or so.

These records are so few that I don't even see why you
bother archiving them.


> - the rest of the tables are updated from the Mainframe batch cycle, at
> about 3:00 am local time.
>       - these records NEVER get updated, just more added each day.

Good.


> - the problem tables have about 800,000 records in them.
>       - the daily table getting 3000-5000 new records a day.

Not really much data added.

>       - the non-daily table is DROPd and rebuilt monthly.

Why do you drop it?
Why not truncate it instead?
In particular if you ever decide to use views, dropping the tables will be
a problem unless you recreate the views too.


> VACUUM is run nightly, at about 7:00 pm.

Just plain vacuum?  You may want to do a vacuum full.

>This is to make the system ready
>for UNIX back-ups, at about 9:30 pm.

What do you mean by Unix backups?  You should NOT be backing up the
database from the OS. You should do a pg_dump and then back up that file
only.

> Also, in the past, the VACUUM has
> taken hours, and seems to lock the database (not just the table). Thus
> we're reluctant to run VACUUM at 5:00 am, when daily batch finishes.

That is no longer an issue with 7.2 and higher. Run your vacuum analyze
right after your loads and before you start doing any large computations.
If those computations create additional files or modify a substantial
number of records, do yet another vacuum analyze.


> The primary reason for the hours of VACUUM was the non-daily table , above,
> back when we DELETEd most records, and then reloaded over two days.
> Deleting 780,000 records of an 800,000 row table creates an IMMENSELY long
> VACUUM.

Vacuum analyze is fairly quick.

>Hence our DROP and rebuild.

You can truncate instead. Near instantaneous, and it doesn't leave any
records behind.
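A sketch of the suggested month-end rebuild, using the table name from
earlier in the thread:

```sql
-- Instead of DROP TABLE + re-CREATE:
TRUNCATE TABLE pol_xref_b;  -- empties the table near-instantly;
                            -- leaves nothing for VACUUM to reclaim
-- ...then reload via COPY / INSERT as before. The table definition,
-- indexes, and any dependent views all survive.
```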

> I suspect that what  I should do is :
> 1) continue nightly VACUUM's at about 7:00 pm

I think you should do a vacuum full.

> 2) add a weekly VACUUM FULL ANALYZE to be run Saturday evening, ensuring
> its ready for Sunday FULL disk back-ups.

Again you should not be backing up the database files, but the output of
pg_dump.

> Oh, we're a 16/5 operation, with some people dialing in or coming in on
> weekends. 24/7 is NOT a requirement.

Good.
It would help you a lot if you do nightly vacuum fulls. Especially if space
is an issue and/or you have many queries which do sequential scans.


Re: Answering my own question

From
gerry.smit@lombard.ca
Date:


Replies embedded.

btw VACUUM ANALYZE has fixed this problem. For those of you looking for
solutions, we might be getting into "interesting background" vs "pgsql
problem".  As a newcomer to this list, should I be taking this reply
off-line?

Gerry




        
From: "Francisco Reyes" <lists@natserv.com>
To: gerry.smit@lombard.ca
cc: ghaverla@freenet.edmonton.ab.ca, "Pgsql Novice" <pgsql-novice@postgresql.org>
Date: 17/05/2002 11:30 AM
Subject: Re: [NOVICE] Answering my own question

        

        




On Wed, 15 May 2002 gerry.smit@lombard.ca wrote:

> Here's the scenario. We have about 8 tables or so, in 4 regional
> databases.

Why do you have these "regional" databases?
Couldn't they all use one database?
Do you have a WAN?

Gerry : "Regional" is a holdover from when these were on Sun workstations
in offices. Now a single server in Toronto, with 4 DBs, keeps the tables
small, controls access, and we're REALLY new to SQL. Plus the data is
"regional" in scope. Political requirements at the time of the server merge
were also involved. The WAN carries traffic to the regional users. We're
toying with ONE national DB, but the index speed problem in Central's table
(the origin of this thread) kept us thinking response time gets worse
exponentially. Thus inflicting 800,000 useless records on Atlantic slows
their response for no good reason. We will DEFINITELY be re-thinking that
now that we've got "ANALYZE" and this bug figured out. Speed is WAY UP
these past two days.


> - two of the tables are updated via on-line transaction from our users.
> Probably a couple of hundred records a day , tops.
>       - records eligible for archiving are "SELECT INTO" and "DELETE"d
> about once a quarter. typically 1000 or so.

These records are so few that I don't even see why you
bother archiving them.

Gerry : builds up over time, and a novice's understanding that "fewer
records means a faster table". Plus, occasionally, we delete 20-30 thousand
records as an entire brokerage cancels off.


> - the rest of the tables are updated from the Mainframe batch cycle, at
> about 3:00 am local time.
>       - these records NEVER get updated, just more added each day.

Good.


> - the problem tables have about 800,000 records in them.
>       - the daily table getting 3000-5000 new records a day.

Not really much data added.

>       - the non-daily table is DROPd and rebuilt monthly.

Why do you drop it?
Why not truncate it instead?
In particular if you ever decide to use views, dropping the tables will be
a problem unless you recreate the views too.

Gerry : The month-end process that updates the second large table
(pol_xref_b) gets updates of 20-90 thousand records at a time. In the 1st
implementation (running on postgreSQL 6.5) we issued a DELETE for the older
versions of these records, then loaded the new ones. By weekend's
completion, you're looking at 780,000 deleted records out of 800,000, and
then loading in another 780,000 (plus growth over time) records. The VACUUM
Sunday night would run 24 hours, locking the database Monday morning. So
the 2nd implementation (ready the next month) was to DROP the table,
rebuild it, and then just let it load. The 20,000 lost records were for
"old" clients, and not considered a primary worry, certainly less so than
locking the database during office hours.

Gerry : What is "truncate" ??
Gerry : Not what are "views", but how to use them, and install them?
That's a DB 101 question, and rhetorical. PostgreSQL for us was a freeware
start at SQL. As we learn things, and need things, we'll either find out
how Postgres does them, or jump to a commercial DB where we can hire
contractors for turn-key programming (including some knowledge transfer of
WHY they built whatever they might design).  Room for a plug here: anyone
know of commercial Postgres support?


> VACUUM is run nightly, at about 7:00 pm.

Just plain vacuum?  You may want to do a vacuum full.

Gerry : not available in 7.1.3. If all it does is free up disk space, it's
not critical at this time. And the team has more things to do. Fixing
something that isn't "broken" goes down in priority. (Sigh).

>This is to make the system ready
>for UNIX back-ups, at about 9:30 pm.

What do you mean by Unix backups?  You should NOT be backing up the
database from the OS. You should do a pg_dump and then backup that file
only.

Gerry : The current back-up / restore plan is, should something happen that
the RAID-5 can't handle, we'd restore the entire $PGDIR. Yes, it might have
changed during back-up, but we'll take that chance. Interesting that 7.1
changed the unix file names from table names to meaningless numbers, thus
hiding and preventing the ability to individually restore damaged tables.
From what I've seen, there's enough other data in other files that this is
an ABSOLUTE NO-NO.

Gerry : Unix back-ups are generic, full on Sunday, incremental other
nights. Postgres resides in a UFS. Implementing a pg_dump cron-script makes
sense, given the above comments (yours and mine).
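A minimal sketch of such a cron-script (the database names and backup path
are made up for illustration; substitute the real regional DB names):

```shell
#!/bin/sh
# Nightly logical backup: dump each database to a file, then let the
# generic filesystem backup pick up the dumps instead of $PGDIR itself.
BACKUP_DIR=/backup/pgsql                          # illustrative path
for db in central atlantic ontario western; do    # illustrative DB names
    pg_dump "$db" > "$BACKUP_DIR/${db}_$(date +%Y%m%d).sql"
done
```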



> Also, in the past, the VACUUM has
> taken hours, and seems to lock the database (not just the table). Thus
> we're reluctant to run VACUUM at 5:00 am, when daily batch finishes.

That is no longer an issue with 7.2 and higher. Run your vacuum analyze
right after your loads and before you start doing any large computations.
If those computations create additional files or modify a substantial
number of records, do yet another vacuum analyze.


Gerry : That puts 7.2.1 on the "mid-burner" for us at this point. Better
than "back-burner", like too many other things. Also, our first VACUUM
ANALYZE, run Wednesday night, took about 5 hours for all DBs. Thursday
night's VACUUM ANALYZE took the usual half hour or so. So your point of
running VACUUM ANALYZE following the 5:00 am batch updates, prior to
on-line usage, makes sense, and would help end-users. Thanks.


> The primary reason for the hours of VACUUM was the non-daily table, above,
> back when we DELETEd most records, and then reloaded over two days.
> Deleting 780,000 records of an 800,000 row table creates an IMMENSELY long
> VACUUM.

Vacuum analyze is fairly quick.

Gerry  : Noted, and now seen, even in 7.1.3  !  :-)

>Hence our DROP and rebuild.

You can truncate instead. Near instantaneous and doesn't leave any records
behind.

Gerry : I just read about TRUNCATE in the 7.1.3 on-line manual. I'll
definitely try that in lieu of DROP next month-end. The worry is that the
VACUUM of the emptied table may take a long time. The DROP-and-rebuild
PREVENTS this, as there is NOTHING to VACUUM, just a new, empty table.
Something has to "free up" those rows, and if it's VACUUM, then I'm looking
at 24 hours again. Any comments? Or is that my Postgres 6.5 experience and
not relevant to postgreSQL 7.x?

Gerry : btw, what do people LIKE to refer to Postgres as in writing?
Postgres, postgres, PostgreSQL (one s or two?), pgsql?


> I suspect that what  I should do is :
> 1) continue nightly VACUUM's at about 7:00 pm

I think you should do a vacuum full.

Gerry : Agree, but not available until we upgrade to 7.2.1


> 2) add a weekly VACUUM FULL ANALYZE to be run Saturday evening, ensuring
> its ready for Sunday FULL disk back-ups.

Again you should not be backing up the database files, but the output of
pg_dump.

Gerry : Thanks. Such a cron-script has been on my mind for a while. RAID-5
is our primary protection; restore is our secondary. Clearly, I can't count
on tape restore of $PGDIR/data/base, so I'll work on said script when I
have time (ouch, sigh).

> Oh, we're a 16/5 operation, with some people dialing in or coming in on
> weekends. 24/7 is NOT a requirement.

Good.
It would help you a lot if you do nightly vacuum fulls. Specially if space
is an issue and/or you have many queries which do sequential scans.


Gerry : We CERTAINLY have MANY MANY queries; do they create space problems?
Or are we referring to the optimizer's poor choice of sequential vs index
scans (now solved since I implemented VACUUM ANALYZE)? Disk space, whilst
cheap, is NOT unlimited. So if the QUERIES are wasting space, then a VACUUM
FULL would be useful.  Poor man's version: pg_dump, pg_ctl stop, deletedb
"x", createdb "x", pg_ctl start, reload, VACUUM ANALYZE. There's a weekend
I'd rather be camping.


Gerry Smit,
Toronto, Canada
(Go Leafs Go!)
(For gawd's sake, go! Begone!)