Thread: please please please PLEASE help!
Hi,

I've asked this question a couple of times before on this forum but no one seems to be nice enough to point me in the right direction or help me out with any information. Please help me out with this because it is a very serious issue for me and I need to learn more about it. Here it is again:

I've been running postgres on my server for over a year now and the tables have become huge. I have 3 tables with over 10GB of data each, and these tables are read very frequently; heavy searches on them are expected every 2 to 3 minutes. This gives very poor response times to the end user, so I'm looking at alternatives now.

Currently the postgresql installation is on a single disk, so all the tables have their data read from that one disk. Searching on different tables by multiple users at the same time results in very slow searches, since everything is dependent on a single spindle's speed. I recently gained access to another server which has 3 SCSI disks. I know there is a way to spread the tables across the three different disks, but I'm not sure if it's as easy as symlinking the files (WAL files only?) across. Can anyone tell me what to do here and how to harness the power of the three SCSI drives I have? Which files in the data directory need to be moved? Is this safe? Can backups etc. still be done easily? Any information will be greatly appreciated.

Thank you,

Steve
Steve,

I'm a little short on PostgreSQL experience, but gaining fast; I'm more of an Oracle nut. But I think you've pinpointed the problem, namely disk contention. The easiest solution I can think of would be to stripe the volume group across all three drives, thereby spreading the pain across the drives. One other item I'd look at is creating an index that specifically answers the most frequent queries. Depending on your OS, you could also mirror the data across two of the drives at the OS level, which would likewise spread the pain. Another solution, but it has a price tag on it, is to acquire an external disk array. These arrays have memory that they use as additional buffers; the cache gets populated with the most frequently accessed data, and then you're no longer limited by the drives.
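If you're on Linux and don't have a hardware controller, software RAID can do the striping. A rough sketch, assuming the three SCSI disks show up as /dev/sdb1, /dev/sdc1 and /dev/sdd1 (substitute your real devices -- and note this destroys whatever is on them):

    # build a RAID 0 stripe across the three drives
    mdadm --create /dev/md0 --level=0 --raid-devices=3 \
          /dev/sdb1 /dev/sdc1 /dev/sdd1

    # put a filesystem on the stripe and mount it where the cluster will live
    mke2fs -j /dev/md0
    mount /dev/md0 /var/lib/pgsql

RAID 0 gives you no redundancy at all, so keep your dumps current; RAID 5 across the same three disks would trade some write speed for surviving a drive failure.

Dick Goulet
Senior Oracle DBA
Oracle Certified 8i DBA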
Steve,

I'm not as much of an expert on PostgreSQL as others, but at least I have SOME answer. The way to leverage the multi-disk capability is to use RAID, either software or hardware, rather than trying to twist the PostgreSQL structure and files to do so. RAID (Redundant Array of Inexpensive Disks) is basically a number of methods to provide increased reliability and/or performance from multiple disks, while having them appear to the programs (including DB programs) as a single disk. You will want to look for RAID information elsewhere, especially since you should weigh the pros and cons of the various types of RAID, but basically what you want is your data spread out uniformly across the disks so that any writes and reads (in your case, especially the reads) are done in parallel on all three devices, significantly cutting the access time.

Another solution that MIGHT help is increasing the memory. If a heavily-used index can be cached in RAM, it will speed searches up a lot.

I would also recommend that you study ways to redistribute your data and/or add better indices. Keep in mind that an index will be quite large for a big table, and may not be used to its full extent even when it seems that it should be (at least in my experience). You can also use EXPLAIN and EXPLAIN ANALYZE to find out where your bottlenecks are and see what else can be done; there's a small example at the end of this message.

I hope this helps, and that the last paragraph didn't come across as patronizing. I'm aware that maybe you already did all that, but I can't know -- and if you didn't, then it would be a shame not to mention it!
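Here's the example I promised; the table and column names are made up, so substitute one of your own slow queries:

    -- show the plan the optimizer chose, with actual run times
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

    -- if the plan shows a sequential scan over the big table, an index
    -- on the searched column may turn it into a much cheaper index scan
    CREATE INDEX orders_customer_idx ON orders (customer_id);
    ANALYZE orders;

If EXPLAIN still shows a sequential scan after creating the index, the planner may simply believe the scan is cheaper; stale statistics (fixed by ANALYZE) are a common cause.

Andrew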
Steve,

Are the three SCSI drives raided? If so, I would move the operating database to that machine; from what I understand it can be a real hassle to set up and maintain sym-linked tables. If you don't have the option to move the database, I would explore one of these two routes:

Route one is replication. Slony-I was just released to the open source community. Replicate the data to the new machine, then distribute the database reads between the master and the slave.

Route two is to symlink the database logs/txlogs/etc. onto the other disks. That takes the extra write load off of your main spindle. Not as effective as route one, I would say, but it would definitely lessen the load.
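The classic form of route two is moving pg_xlog (the WAL directory) onto its own disk. A minimal sketch, assuming the data directory is /var/lib/pgsql/data and the second disk is mounted at /disk2 (both paths are placeholders):

    # stop the postmaster first -- never touch these files on a running cluster
    pg_ctl -D /var/lib/pgsql/data stop

    # move the WAL directory to the other disk and symlink it back
    mv /var/lib/pgsql/data/pg_xlog /disk2/pg_xlog
    ln -s /disk2/pg_xlog /var/lib/pgsql/data/pg_xlog

    pg_ctl -D /var/lib/pgsql/data start

PostgreSQL follows the symlink transparently; just make sure your filesystem backups follow symlinks too (e.g. tar -h).

Kris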
Another thought -

We had a similar issue recently. Our support guys dropped the database and then rebuilt it from a dump file. The size of the data directory went down from 12GB to less than 2GB. According to the sysadmin who did the work, postgres is not very good at reclaiming disk space after large quantities of tuples are deleted over time, so a dump and reload compacts everything.
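In case it's useful, the cycle looks roughly like this (the database name is a placeholder, and the application needs to be stopped while it runs, since the database disappears between the dropdb and the end of the reload):

    pg_dump mydb > mydb.sql
    dropdb mydb
    createdb mydb
    psql -f mydb.sql mydb

Running plain VACUUM regularly keeps the bloat from building up in the first place, which is a lot cheaper than periodic reloads.

HTH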
Hi,

I don't mean to offend you by asking, but do you run VACUUM FULL regularly? I suppose the 10GB really are full of useful data, aren't they?
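One quick way to check whether that space is data or dead-tuple bloat (the table name is just an example, and relpages counts 8K blocks as of the last VACUUM/ANALYZE):

    -- which relations occupy the most disk pages?
    SELECT relname, relpages FROM pg_class ORDER BY relpages DESC LIMIT 10;

    -- reclaim the space held by dead rows (takes an exclusive lock)
    VACUUM FULL ANALYZE bigtable;

If relpages drops dramatically after the VACUUM FULL, the tables were mostly bloat rather than data.

Luc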
"Rob Bamber" wrote: > Another thought - > > We had a similar issue recently. Our support guys dropped the > database and then rebuilt it from a dump file. The size of the data > directory went down from 12GB to less than 2GB. According to the sys > ad that did the work postgres is not very good a reclaiming disk space > after large quantities of tuples are deleted over time. And another thought, Have you tried clustering your tables on the most frequently used and/or time-important index in your joins? (remember to VACUUM ANALYZE after the cluster has completed) You might find a huge performance increase. Clustering should also fix the problem mentioned above because the table is physically re-written on disk. Use EXPLAIN ANALYZE to find out which is the most time-consuming part of your query and optimise that. Also, have you analyzed your database recently? If you've never run VACUUM ANALYZE your query planner's statistics have never been updated so the query planner might not be making the best choices. Ian -- Ian Freislich
Kris Kiger wrote:
> Steve,
>
> Are the three SCSI drives raided? If so, I would move the operating
> database to that machine; from what I understand it can be a real
> hassle to set up and maintain sym-linked tables. If you don't have the
> option to move the database, I would explore one of these two routes:
>
> Route one is replication. Slony-I was just released to the open source
> community. Replicate the data to the new machine, then distribute the
> database reads between the master and the slave.
>
> Route two is to symlink the database logs/txlogs/etc. onto the other
> disks. That takes the extra write load off of your main spindle. Not
> as effective as route one, I would say, but it would definitely lessen
> the load.
>
> Kris

Thanks for replying, Kris. Yes, the SCSI drives are raided, so I believe I don't really need to do anything other than move the database to that machine? I think the stripe size is around 12K, but I'm not sure. Moving the database to the new machine has made it a lot faster than before; the box has around 2GB of RAM and dual 3.0 GHz Intel Xeon processors. I've also run 'chattr -R +A /path/to/data/dir' on the database directory, and I think it has made a slight difference. I'll test it a bit more and see if I can do something else to improve the speed.
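The same effect as chattr +A can apparently be had filesystem-wide with the noatime mount option; I may try that next, assuming the data directory sits on its own partition (the mount point here is a placeholder):

    # stop recording access times on every read, for the whole filesystem
    mount -o remount,noatime /var/lib/pgsql

Steve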
Rob Bamber wrote:
> Another thought -
>
> We had a similar issue recently. Our support guys dropped the database
> and then rebuilt it from a dump file. The size of the data directory
> went down from 12GB to less than 2GB. According to the sysadmin who
> did the work, postgres is not very good at reclaiming disk space after
> large quantities of tuples are deleted over time.
>
> HTH

That's because you need to VACUUM the database frequently. For example, in our case there are about 2000 updates/inserts and around 50 deletes every 3 minutes. UPDATE and DELETE operations are the most expensive in terms of disk space, because the old versions of the updated/deleted tuples are kept around until VACUUM is run. I've set it up so that a daemon runs VACUUM ANALYZE every 6 minutes or so, and then a full vacuum, i.e. VACUUM FULL ANALYZE, every 30 minutes.

As far as I know, in postgresql 7.4 a plain VACUUM doesn't block reads, so occasionally running VACUUM on the most 'popular' tables is a good idea (VACUUM FULL, on the other hand, takes an exclusive lock). I'm sure in your case the right frequency could vary, but that is basically why the database shrank from 12GB to 2GB: plain VACUUM marks the space used by dead tuples as reusable, while VACUUM FULL actually compacts the files and returns the space to the OS, which in bad cases can speed up queries by orders of magnitude.
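For what it's worth, a cron-based version of that schedule would look something like this (the database name is a placeholder):

    # m   h  dom mon dow  command
    */6   *  *   *   *    vacuumdb --analyze mydb
    0,30  *  *   *   *    vacuumdb --full --analyze mydb

vacuumdb --full locks each table exclusively while it works, so if your load is bursty it's worth pinning the FULL runs to the quietest times.

Steve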