Re: Answering my own question - Mailing list pgsql-novice

From gerry.smit@lombard.ca
Subject Re: Answering my own question
Date
Msg-id OF59BB9496.62EBFD60-ON85256BBC.0055F734@lombard.ca
Whole thread Raw
In response to Answering my own question  (gerry.smit@lombard.ca)
List pgsql-novice


Replies imbedded.

btw VACUUM ANALYZE has fixed this problem, For those of you looking for
solutions, we might be getting into "interesting background" vs "pgsql
problem".  AS a new comer to this list, should I be taking this reply
off-line?

Gerry




        
                      "Francisco
        
                      Reyes"                   To:      gerry.smit@lombard.ca
        
                      <lists@natserv.c         cc:      ghaverla@freenet.edmonton.ab.ca, "Pgsql Novice"
        
                      om>                      <pgsql-novice@postgresql.org>
        
                                               Fax to:
        
                      17/05/2002 11:30         Subject: Re: [NOVICE] Answering my own question
        
                      AM
        

        

        




On Wed, 15 May 2002 gerry.smit@lombard.ca wrote:

> Here's the scenario. We have about 8 tables or so, in 4 regional
databases.

Why do you have this "regional" databases?
Couldn't they all see one database?
Do you have a WAN?

Gerry : "Regional" is holdover from when these were on Sun w/s in offices.
Now a Single server in Toronto, with 4 DBs, keeps the tables small,
controls access, and we're REALLY new to SQL. Plus data is "regional" in
scope. Political requirements at the time of the server merge also
involved. WAN carries traffic to the regional users. We're toying with ONE
national DB, but the index speed problem in Central's table, (the origin of
this thread,) kept us thinking response time gets worse exponentially. Thus
inflicting 800,000 useless records on Atlantic slows their response for no
good reason. We will DEFINITLY be re-thinking that now that we've got
"ANALYZE" and this bug figured out. Speed is WAY UP these past two days.


> - two of the tables are updated via on-line transaction from our users.
> Probably a couple of hundred records a day , tops.
>       - records eligible for archiving are "SELECT INTO" and "DELETE"d
> about once a quarter. typically 1000 or so.

These records are so few that I don't even see why you
bother archiving them.

Gerry : builds up over time, and a novice's understanding that "fewer
records means faster table". PLus,  occasionally, we delete 20-30 thousand
records as an entire Brokerage cancels off.


> - the rest of the tables are updated from the Mainframe batch cycle, at
> about 3:00 am local time.
>       - these records NEVER get updated, just more added each day.

Good.


> - the problem tables have about 800,000 records in them.
>       - the daily table getting 3000-5000 new records a day.

Not really much data added.

>       - the non-daily table is DROPd and rebuilt monthly.

Why do you drop it?
Why not truncate it instead?
In particular if you ever decide to use views, dropping the tables will be
a problem unless you recreate the views too.

Gerry : The monthend process that updates the second large table
(pol_xref_b) gets updates of 20-90 thousand records at a time. That
involved, in 1st implementation (running on postgeSQL 6.5) we issued a
DELETE for the older versions of these records, and loading these records.
By weekend's completion, you're looking at 780,000 deleted records of
800,000, and then loading in another 780,000 (plus grows with time)
records. The VACUUM Sunday night would run 24 hours, locking the Database
Monday morning. So 2nd implementation, (ready the next month ,) was to DROP
the table, rebuild it, and then just let it load. the 20,000 lost records
were for "old" clients, and not considered a primary worry, certainly less
so than locking the database during Office hours.

Gerry : What is "truncate" ??
Gerry :  Not what are "Views", but how to use them, and install the?
That's a DB 101 question, and retorical. PostgreSQL for us was a freeware
start at SQL. As we learn things, and need things, we'll either find out
how Postgres does them, or jump to a commercial DB where we can hire
contractor's for turn key programming (including some knowledge transfer of
WHY they built whatever they might design).   Room for a plug here, anyone
know of commercial Postrges support?


> VACUUM is run nightly, at about 7:00 pm.

Just plain vacuum?  You may want to do a vacuum full.

Gerry : not available in 7.1.3  If all it does is free up disk space, not
critical at this time. And the team has more things to do. Fixing something
that isn't "broken" goes down in priority. (Sigh).

>This is to make the system ready
>for UNIX back-ups, at about 9:30 pm.

What do you mean by Unix backups?  You should NOT be backing up the
database from the OS. You should do a pg_dump and then backup that file
only.

Gerry : Current back-up / restore plan is, should something happen that the
RAID-5 can't handle, we'ld restore the entire $PGDIR. Yes it might have
changed during back-up, but we'll take that chance.  Interesting that 7.1
changed unix file names from table names to meaningless numbers, thus
hiding and preventing the ability to individually restore damaged tables.
From what I've seen, there's enough other data in other files that this is
an ABSOLUTE NO NO.

Gerry : Unix back-ups are generic, full on Sunday, incremental other
nights.  Postgres resides in a UFS. Implementing a pg_dump cron-script
makes sense, given the above comments (yours and mine).



 Also, in the past, the VACUUM has
> taken hours, and seems to lock the database (not just the table). Thus
> we're reluctant to run VACUUM at 5:00 am, when daily batch finishes.

That is no longer an issue with 7.2 and higher. Run your vacuum analyze
right after your loads and before you start doing any large computations.
If those computations create additional files or modify substantial number
of records do yet another vacuum analyze.


Gerry  : That puts 7.2.1 on the "mid-burner" for us at this point. Better
than "back-burner", like too many other things. Also, our first VACUUM
ANALYZE, run Wednesday night took about 5 hours for all DBs. Thursday night
VACUUM ANALYZE took the usual half hour or so. So your point of VACUUM
ANALYZE following 5:00 am batch updates, prior to on-line usage makes
sense, and would help end-users. Thanks.


> The primary reason for the hours of VACUUM was the non-daily table ,
above,
> back when we DELETEd most records, and then reloaded over two days.
> Deleting 780,000 records of an 800,000 row table creates an IMMENSELY
long
> VACUUM.

Vacuum analyze is fairly quick.

Gerry  : Noted, and now seen, even in 7.1.3  !  :-)

>Hence our DROP and rebuild.

You can truncate instead. Near instantaneous and doesn't leave any records
behind.

Gerry : I just read about TRUNCATE in the 7.1.3 on-line manual. I'll
definitly try that next in lieu of DROP next month-end. The worry is that
the VACUUM of the emptied table may take a long time. The DROP, rebuild
PREVENTS this, as there is NOTHING to VACUUM, just a new, empty table.
Something has to "free up" those rows, and if it's VACUUM, then I'm looking
at 24 hours again.  Any comments?  Or is that my postgres 6.5 experience
and not relevant to postgreSQL 7.x ?

Gerry : btw what do people LIKE to refer to Postgres as in writing?
Postgres, postgres, postgreSQL (one s or two?), pgsql?


> I suspect that what  I should do is :
> 1) continue nightly VACUUM's at about 7:00 pm

I think you should do a vacuum full.

Gerry : Agree, but not available until we upgrade to 7.2.1


> 2) add a weekly VACUUM FULL ANALYZE to be run Saturday evening, ensuring
> its ready for Sunday FULL disk back-ups.

Again you should not be backing up the database files, but the output of
pg_dump.

Gerry : Thanks. Such a cron-script has been on my mind for a while. RAID-5
is our primary protection. restore is our secondary. Clearly, I can't count
on tape restore of $PGDIR/data/base, so I'll work on said script , when I
have time (ouch sigh).

> Oh, we're a 16/5 operation, with some people dialing in or coming in on
> weekends. 24/7 is NOT a requirement.

Good.
It would help you a lot if you do nightly vacuum fulls. Specially if space
is an issue and/or you have many queries which do sequential scans.


Gerry : We CERTAINLY have MANY MANY queries, do they create space problems?
Or are we referring to the Optimizer's poor choice of sequential vs index
scans (now solved since I implemented VACUUM ANALYZE)? Disk space, whilst
cheap, is NOT unlimited. SO if the QUERIES are wasting space, then a VACUUM
FULL would be useful.   Poor man's version, pg_dump. pg_ctl stop. deletedb
"x"; createdb "x", pg_ctl start, reload, VACUUM ANALYZE.  There's a weekend
I'ld rather be camping.


Gerry Smit,
Toronto, Canada
(Go Leafs Go!)
(For gawd's sake, go! Begone!)








pgsql-novice by date:

Previous
From: Francisco Reyes
Date:
Subject: Re: Answering my own question
Next
From: Lutz Horn
Date:
Subject: ADD CONSTRAINT NOT NULL, how?