Re: Answering my own question - Mailing list pgsql-novice
From: gerry.smit@lombard.ca
Subject: Re: Answering my own question
Msg-id: OF59BB9496.62EBFD60-ON85256BBC.0055F734@lombard.ca
In response to: Answering my own question (gerry.smit@lombard.ca)
List: pgsql-novice
Replies embedded. By the way, VACUUM ANALYZE has fixed this problem. For those of you looking for solutions, we might be getting into "interesting background" vs. "pgsql problem" territory. As a newcomer to this list, should I be taking this reply off-line?

Gerry

"Francisco Reyes" <lists@natserv.com>
To: gerry.smit@lombard.ca
cc: ghaverla@freenet.edmonton.ab.ca, "Pgsql Novice" <pgsql-novice@postgresql.org>
Date: 17/05/2002 11:30 AM
Subject: Re: [NOVICE] Answering my own question

On Wed, 15 May 2002 gerry.smit@lombard.ca wrote:

> Here's the scenario. We have about 8 tables or so, in 4 regional databases.

Why do you have these "regional" databases? Couldn't they all be one database? Do you have a WAN?

Gerry: "Regional" is a holdover from when these were on Sun workstations in regional offices. Now a single server in Toronto, with 4 DBs, keeps the tables small and controls access, and we're REALLY new to SQL. Plus the data is "regional" in scope. Political requirements at the time of the server merge were also involved. A WAN carries traffic to the regional users. We're toying with ONE national DB, but the index speed problem in Central's table (the origin of this thread) kept us thinking that response time gets worse exponentially. Inflicting 800,000 useless records on Atlantic would slow their response for no good reason. We will DEFINITELY be re-thinking that now that we've got ANALYZE and this bug figured out. Speed is WAY UP these past two days.

> - two of the tables are updated via on-line transactions from our users.
> Probably a couple of hundred records a day, tops.
> - records eligible for archiving are "SELECT INTO"d and "DELETE"d
> about once a quarter, typically 1000 or so.

These records are so few that I don't even see why you bother archiving them.

Gerry: It builds up over time, and a novice's understanding is that "fewer records means a faster table". Plus, occasionally, we delete 20-30 thousand records as an entire brokerage cancels off.
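For readers following along, the quarterly archive step described above might look like the sketch below. This is only an illustration: the thread never shows the actual schema, so the table name "policy", the archive table name, and the cutoff column are all made up.

```sql
-- Hypothetical sketch of the SELECT INTO + DELETE archive pattern.
-- "policy", "policy_archive", and "last_activity" are invented names;
-- the real tables in the thread are not shown.
SELECT *
  INTO policy_archive_2002q1
  FROM policy
 WHERE last_activity < '2002-01-01';

DELETE FROM policy
 WHERE last_activity < '2002-01-01';
```

Note that on pre-7.2 PostgreSQL, the rows removed by the DELETE remain on disk until a VACUUM reclaims them, which is exactly the cost discussed later in this thread.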
> - the rest of the tables are updated from the Mainframe batch cycle, at
> about 3:00 am local time.
> - these records NEVER get updated, just more added each day.

Good.

> - the problem tables have about 800,000 records in them.
> - the daily table getting 3000-5000 new records a day.

Not really much data added.

> - the non-daily table is DROPd and rebuilt monthly.

Why do you drop it? Why not truncate it instead? In particular, if you ever decide to use views, dropping the tables will be a problem unless you recreate the views too.

Gerry: The month-end process that updates the second large table (pol_xref_b) gets updates of 20-90 thousand records at a time. In the first implementation (running on PostgreSQL 6.5) we issued a DELETE for the older versions of these records, then loaded the new ones. By the weekend's completion, you're looking at 780,000 deleted records out of 800,000, and then loading in another 780,000 (and growing with time) records. The VACUUM Sunday night would run 24 hours, locking the database Monday morning. So the second implementation (ready the next month) was to DROP the table, rebuild it, and then just let it load. The 20,000 lost records were for "old" clients, and not considered a primary worry, certainly less so than locking the database during office hours.

Gerry: What is "truncate"?

Gerry: Not what are "views", but how do we use and install them? That's a DB 101 question, and rhetorical. PostgreSQL for us was a freeware start at SQL. As we learn things and need things, we'll either find out how Postgres does them, or jump to a commercial DB where we can hire contractors for turnkey programming (including some knowledge transfer of WHY they built whatever they might design). Room for a plug here: anyone know of commercial Postgres support?

> VACUUM is run nightly, at about 7:00 pm.

Just plain vacuum? You may want to do a vacuum full.

Gerry: Not available in 7.1.3. If all it does is free up disk space, it's not critical at this time.
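To answer the "what is truncate" question in place: TRUNCATE empties a table in one quick operation instead of deleting rows one by one, and unlike DROP it keeps the table definition, its indexes, and its permissions intact, so dependent objects such as views keep working. Using the table name from the thread:

```sql
-- Empty pol_xref_b for the month-end reload without dropping it.
-- TRUNCATE reclaims the table's storage directly rather than marking
-- individual rows dead, so it is near-instantaneous and leaves
-- nothing behind for VACUUM to clean up.
TRUNCATE TABLE pol_xref_b;
```

This is the alternative Francisco is suggesting to the monthly DROP-and-rebuild.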
And the team has more things to do. Fixing something that isn't "broken" goes down in priority. (Sigh.)

> This is to make the system ready
> for UNIX back-ups, at about 9:30 pm.

What do you mean by Unix backups? You should NOT be backing up the database from the OS. You should do a pg_dump and then back up that file only.

Gerry: The current back-up/restore plan is: should something happen that the RAID-5 can't handle, we'd restore the entire $PGDIR. Yes, it might have changed during back-up, but we'll take that chance. Interesting that 7.1 changed the Unix file names from table names to meaningless numbers, thus hiding and preventing the ability to individually restore damaged tables. From what I've seen, there's enough other data in other files that this is an ABSOLUTE NO-NO.

Gerry: Unix back-ups are generic: full on Sunday, incremental other nights. Postgres resides in a UFS. Implementing a pg_dump cron script makes sense, given the above comments (yours and mine).

> Also, in the past, the VACUUM has
> taken hours, and seems to lock the database (not just the table). Thus
> we're reluctant to run VACUUM at 5:00 am, when daily batch finishes.

That is no longer an issue with 7.2 and higher. Run your vacuum analyze right after your loads and before you start doing any large computations. If those computations create additional files or modify a substantial number of records, do yet another vacuum analyze.

Gerry: That puts 7.2.1 on the "mid-burner" for us at this point. Better than "back-burner", like too many other things. Also, our first VACUUM ANALYZE, run Wednesday night, took about 5 hours for all DBs. Thursday night's VACUUM ANALYZE took the usual half hour or so. So your point about running VACUUM ANALYZE after the 5:00 am batch updates, prior to on-line usage, makes sense, and would help end-users. Thanks.

> The primary reason for the hours of VACUUM was the non-daily table, above,
> back when we DELETEd most records, and then reloaded over two days.
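The post-load maintenance Francisco recommends is a one-liner per database; a minimal sketch, assuming the load has just finished (the table name "daily_load" is invented for illustration):

```sql
-- Run right after the 3:00 am batch load, once per database.
-- VACUUM reclaims space left by dead rows; ANALYZE refreshes the
-- planner statistics so it can choose index scans correctly.
VACUUM ANALYZE;

-- Or, to keep the maintenance window short, target only the table
-- the batch just loaded ("daily_load" is a hypothetical name):
VACUUM ANALYZE daily_load;
```

On 7.1 this still takes table-level locks; the point being made is that 7.2's plain VACUUM no longer blocks readers, which removes the reluctance to run it at 5:00 am.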
> Deleting 780,000 records of an 800,000-row table creates an IMMENSELY long
> VACUUM.

Vacuum analyze is fairly quick.

Gerry: Noted, and now seen, even in 7.1.3! :-)

> Hence our DROP and rebuild.

You can truncate instead. Near-instantaneous, and it doesn't leave any records behind.

Gerry: I just read about TRUNCATE in the 7.1.3 on-line manual. I'll definitely try that in lieu of DROP next month-end. The worry is that the VACUUM of the emptied table may take a long time. The DROP and rebuild PREVENTS this, as there is NOTHING to VACUUM, just a new, empty table. Something has to "free up" those rows, and if it's VACUUM, then I'm looking at 24 hours again. Any comments? Or is that my Postgres 6.5 experience and not relevant to PostgreSQL 7.x?

Gerry: By the way, what do people LIKE to refer to Postgres as in writing? Postgres, postgres, postgreSQL (one s or two?), pgsql?

> I suspect that what I should do is:
> 1) continue nightly VACUUMs at about 7:00 pm

I think you should do a vacuum full.

Gerry: Agreed, but not available until we upgrade to 7.2.1.

> 2) add a weekly VACUUM FULL ANALYZE to be run Saturday evening, ensuring
> it's ready for Sunday FULL disk back-ups.

Again, you should not be backing up the database files, but the output of pg_dump.

Gerry: Thanks. Such a cron script has been on my mind for a while. RAID-5 is our primary protection; restore is our secondary. Clearly, I can't count on tape restore of $PGDIR/data/base, so I'll work on said script when I have time (ouch, sigh).

> Oh, we're a 16/5 operation, with some people dialing in or coming in on
> weekends. 24/7 is NOT a requirement.

Good. It would help you a lot to do nightly vacuum fulls, especially if space is an issue and/or you have many queries which do sequential scans.

Gerry: We CERTAINLY have MANY MANY queries; do they create space problems? Or are we referring to the optimizer's poor choice of sequential vs. index scans (now solved since I implemented VACUUM ANALYZE)?
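A quick way to see the sequential-vs-index-scan choice discussed here is to compare plans before and after refreshing statistics. The table and column names below are hypothetical, since the real schema isn't shown in the thread:

```sql
-- With stale statistics, the planner may choose "Seq Scan on policy"
-- even when an index exists. ("policy" and "policy_no" are invented
-- names for illustration.)
EXPLAIN SELECT * FROM policy WHERE policy_no = 'AB123456';

-- Refresh row counts and value distributions for the table
-- (on 7.1, ANALYZE is available only as part of VACUUM):
VACUUM ANALYZE policy;

-- Re-check: the plan should now show an index scan if the index is
-- actually the cheaper choice.
EXPLAIN SELECT * FROM policy WHERE policy_no = 'AB123456';
```

This is precisely the "bug" the thread opened with: no statistics had ever been gathered, so the planner was guessing.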
Disk space, whilst cheap, is NOT unlimited. So if the QUERIES are wasting space, then a VACUUM FULL would be useful. The poor man's version: pg_dump, dropdb "x", createdb "x", reload the dump, VACUUM ANALYZE. There's a weekend I'd rather be camping.

Gerry Smit, Toronto, Canada
(Go Leafs Go!) (For gawd's sake, go! Begone!)