Thread: Tuning PostgreSQL
Hi guys,
I am new to PostgreSQL and have done some "extensive" research already. If you could give me some advice/confirmation, I would be really grateful.
I am going to build a PostgreSQL database server for a client. This database will contain many tables (over 100, maybe more), some of which will hold over 1 million records pretty soon. It will be accessed via a desktop application (Windows) and a web application (PHP/Apache). There may be over 100 people accessing the database at any given time (selecting, inserting and updating), but I don't think it will be much more than that. Most of the time, it will be less.
What I am thinking about is buying a server with the following specifications:
* 1 or 2 Intel Xeon processors (2.4 GHz).
* 2 Gigabytes of RAM (DDR/ECC).
* Three 36 GB Ultra160 SCSI disks (10,000 rpm) in a RAID-5 config, giving 72 GB of storage space (right?). The RAID-5 controller has a (hardware) cache of 128 MB.
* 100Mbit ethernet.
I will run Red Hat Linux 9 (kernel 2.4) with PostgreSQL 7.3.3 on this server.
What would you think of this hardware config? Would it do? Or would 4 GB of RAM be a lot better? What do you think about the need for two Xeon procs?
Finally, I have some questions about postgresql.conf (who doesn't?). After some research, I think I will go for the following settings initially. Maybe benchmarking will lead to other values, but I think these settings will be a fine starting point:
shared_buffers = 6000 (kernel.shmmax = 60000000)
sort_mem = 4096
max_connections = 150
vacuum_mem = 65536
What do you think of these settings? Do you have any other hints for optimizing PostgreSQL?
Many many thanks in advance :)
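For orientation on how those numbers relate: shared_buffers is counted in 8 KB pages, so 6000 buffers is roughly 47 MB of shared memory, and kernel.shmmax only has to sit comfortably above that. The exact overhead varies, so treat the snippet below as an illustrative sketch rather than a recipe:

    # 6000 buffers * 8 KB = ~47 MB; shmmax of 60000000 bytes leaves
    # headroom for PostgreSQL's other shared structures
    sysctl -w kernel.shmmax=60000000
    echo "kernel.shmmax = 60000000" >> /etc/sysctl.conf   # persist across reboots

    # postgresql.conf starting points, to be revisited after benchmarking
    shared_buffers  = 6000       # 8 KB pages, ~47 MB
    sort_mem        = 4096       # KB, per sort operation
    vacuum_mem      = 65536      # KB
    max_connections = 150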
On 21 Jul 2003 at 10:31, Alexander Priem wrote:

> What I am thinking about is buying a server with the following specifications:
>
> * 1 or 2 Intel Xeon processors (2.4 GHz).
> * 2 Gigabytes of RAM (DDR/ECC).
> * Three 36 GB Ultra160 SCSI disks (10,000 rpm) in a RAID-5 config, giving
>   72 GB of storage space (right?). The RAID-5 controller has a (hardware)
>   cache of 128 MB.
> * 100Mbit ethernet.
>
> I will run Red Hat Linux 9 (kernel 2.4) with PostgreSQL 7.3.3 on this server.

You might scale down a little on the hardware front if required. Of course, if you can get it, get it.

> What would you think of this hardware config? Would it do? Or would 4 GB of
> RAM be a lot better? What do you think about the need for two Xeon procs?

I would say get an SMP board with one processor in it. If required, you can upgrade. I suppose that would make a hefty difference in price.

> shared_buffers = 6000 (kernel.shmmax = 60000000)
> sort_mem = 4096
> max_connections = 150
> vacuum_mem = 65536

In addition to these, look at:

* effective_cache_size
* noatime for the data partition
* a good filesystem
* WAL on a separate drive

Now that is a good start.

Bye
Shridhar

--
QOTD: "I'm on a seafood diet -- I see food and I eat it."
Hi Alexander,

On 21 Jul 2003 at 11:23, Alexander Priem wrote:

> So the memory settings I specified are pretty much OK?

As of now, yes. You need to test with these settings and make sure that they perform as per your requirements. That tweaking will always be there...

> What would be good guidelines for setting effective_cache_size, noatime?

I suggest you look at http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html. That should help you.

> I use ext3 filesystem, which probably is not the best performer, is it?

No. You also need to check ext2, reiser and XFS. There is no agreement between users as to what works best. You need to benchmark and decide.

> I will set the WAL on a separate drive. What do I need to change in the conf
> files to achieve this?

Nothing. You need to shut down the PostgreSQL server process and symlink the WAL and clog directories in the database cluster to another place. That should do it.

HTH

Bye
Shridhar

--
Meade's Maxim: Always remember that you are absolutely unique, just like everyone else.
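To make the symlink step concrete, here is a minimal sketch; the data directory path, the /mnt/wal mount point and the init script name are assumptions and will differ per install:

    # stop the server first
    /etc/init.d/postgresql stop

    # move the WAL directory to the dedicated disk and leave a symlink behind
    mv /var/lib/pgsql/data/pg_xlog /mnt/wal/pg_xlog
    ln -s /mnt/wal/pg_xlog /var/lib/pgsql/data/pg_xlog

    # pg_clog can be relocated the same way if desired
    # mv /var/lib/pgsql/data/pg_clog /mnt/wal/pg_clog
    # ln -s /mnt/wal/pg_clog /var/lib/pgsql/data/pg_clog

    /etc/init.d/postgresql start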
Thanks, I will look at the site you sent me and purchase some hardware. Then I will run some benchmarks.

Kind regards,
Alexander.
Shridhar Daithankar wrote:
> On 21 Jul 2003 at 11:23, Alexander Priem wrote:
>> I use ext3 filesystem, which probably is not the best performer, is it?
>
> No. You also need to check ext2, reiser and XFS. There is no agreement
> between users as to what works best. You need to benchmark and decide.

Need? Maybe I'm a bit disillusioned, but do the performances of the filesystems differ so much as to warrant the additional effort? (e.g. XFS doesn't come with Red Hat 9 -- you'd have to patch the source and compile it yourself.)

Benchmarking it properly before deployment is tough: is the test load on the db/fs representative of the actual load? Is a 0.5% reduction in CPU usage worth it? Did you test for catastrophic failure by pulling the plug during write operations (ext2) to see if the fs can handle it? Is the code base for the particular fs stable enough? Obscure bugs in the fs?

For the record, we tried several filesystems, but stuck with 2.4.9's ext3 (Red Hat Advanced Server). Didn't hit a load high enough for the filesystem choice to matter after all. :(

--
Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
 5:30pm up 207 days, 8:35, 5 users, load average: 5.33, 5.16, 5.21
On 21 Jul 2003 at 18:09, Ang Chin Han wrote:

> Need? Maybe I'm a bit disillusioned, but do the performances of the
> filesystems differ so much as to warrant the additional effort?
> (e.g. XFS doesn't come with Red Hat 9 -- you'd have to patch the
> source and compile it yourself.)

Well, the benchmarking is not to prove which filesystem is fastest and most feature-rich, but to find out which one suits your needs best.

> Benchmarking it properly before deployment is tough: is the test load
> on the db/fs representative of the actual load? Is a 0.5% reduction in CPU
> usage worth it? Did you test for catastrophic failure by pulling the
> plug during write operations (ext2) to see if the fs can handle it? Is
> the code base for the particular fs stable enough? Obscure bugs in the fs?

Well, that is what that 'benchmark' is supposed to find out. Call it pre-deployment testing or whatever other fancy name one sees fit. But it is a must in almost all serious usage.

> For the record, we tried several filesystems, but stuck with 2.4.9's
> ext3 (Red Hat Advanced Server). Didn't hit a load high enough for the
> filesystem choice to matter after all. :(

Good for you. You have time at hand to find out which one suits you best. Do the testing before you have a load that needs another FS. :-)

Bye
Shridhar

--
It would be illogical to assume that all conditions remain stable.
-- Spock, "The Enterprise Incident", stardate 5027.3
Shridhar Daithankar wrote:
> Good for you. You have time at hand to find out which one suits you best. Do
> the testing before you have a load that needs another FS. :-)

Kinda my point is that when we've more load, we'd be using RAID-0 rather than RAID-5, or getting faster SCSI drives, or even turning fsync off if that's a bottleneck, because the different filesystems do not have that much performance difference[1] -- the filesystem is not a bottleneck. You just need to tweak most of them a bit, like noatime,data=writeback.

[1] That is, AFAIK, from our testing. Please, please correct me if I'm wrong: has anyone found that different filesystems produce wildly different performance for PostgreSQL, FreeBSD's filesystems not included?

--
Linux homer 2.4.18-14 #1 Wed Sep 4 13:35:50 EDT 2002 i686 i686 i386 GNU/Linux
 7:00pm up 207 days, 10:05, 5 users, load average: 5.00, 5.03, 5.06
On 21 Jul 2003 at 19:27, Ang Chin Han wrote:

> [1] That is, AFAIK, from our testing. Please, please correct me if I'm
> wrong: has anyone found that different filesystems produce wildly
> different performance for PostgreSQL, FreeBSD's filesystems not included?

Well, when PostgreSQL starts splitting table files after a gig, the filesystem sure makes a difference. IIRC, from my last test XFS was at least 10-15% faster than reiserfs for such databases. That was around a year back, with Mandrake 8.0.

Bye
Shridhar

--
modesty, n.: Being comfortable that others will discover your greatness.
So where can I set the noatime & data=writeback variables? They are not PostgreSQL settings, but rather Linux settings, right? Where can I find these?

Kind regards,
Alexander Priem.
On 21 Jul 2003 at 13:45, Alexander Priem wrote:

> So where can I set the noatime & data=writeback variables? They are not
> PostgreSQL settings, but rather Linux settings, right? Where can I find
> these?

These are typically set in /etc/fstab; they are mount options. See man mount for more details. The second setting, data=writeback, is ext3-specific, IIRC.

HTH

Bye
Shridhar

--
History tends to exaggerate. -- Col. Green, "The Savage Curtain", stardate 5906.4
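As an illustration, the fstab entry for the data partition might look like the line below. The device name and mount point are placeholders, and since the ext3 data= mode is applied at mount time, stop PostgreSQL and unmount/remount rather than relying on -o remount:

    # /etc/fstab -- example entry only; adjust device and mount point
    /dev/sda3   /var/lib/pgsql   ext3   noatime,data=writeback   1 2

    # apply it (with PostgreSQL stopped):
    umount /var/lib/pgsql && mount /var/lib/pgsql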
Thanks, I'll look further into these mount settings.

I was just thinking: the server will have a (RAID) controller containing 128 MB of battery-backed cache memory. This would really speed up inserts to the disk and would also prevent data loss in case of a power-down.

What would you guys think of not using RAID5 in that case, but just a really fast 15,000 rpm Ultra320 SCSI disk?

Kind regards,
Alexander.
> What would you guys think of not using RAID5 in that case, but just a really
> fast 15,000 rpm Ultra320 SCSI disk?

I'd say you must be able to tolerate losing all the data since your last database backup. Your battery-backed cache, rotational speed, and transfer rate aren't going to help at all when the drive itself degrades and corrupts data. If you can really only afford 3 drives, I'd have a single drive with the OS & WAL on it, and the data on a RAID-1 mirror set using the other 2 drives. If you need more space for data, or want your OS drives to be mirrored, it's going to cost more. See if you can get 2x18GB drives for the OS and 2x73GB drives for the data.

You have to consider how much headache that small amount of additional money is going to save you (and your users) down the road.

Roman
That's true, certainly, and with four disks (2x18 and 2x72 or 36), I would be able to (a) be safe and (b) split the data and WAL.

Hmmm. Seems to me that this setup would be better than one RAID5 with three 36 GB disks, wouldn't you think so? With one RAID5 array, I would still have the data and the WAL on one volume...

Thanks for all your help so far.

Kind regards,
Alexander Priem.
Alexander,

> Hmmm. Seems to me that this setup would be better than one RAID5 with three
> 36 GB disks, wouldn't you think so? With one RAID5 array, I would still have
> the data and the WAL on one volume...

Definitely. As I've said, my experience with RAID5 is that with fewer than 5 disks it performs at around 40% of the speed of a single SCSI disk for large read-write operations on Postgres.

If you have only 3 disks, I'd advocate one disk for WAL and one RAID 1 array for the database.

--
Josh Berkus
Aglio Database Solutions
San Francisco
On 2003-07-21 09:06:10 -0700, Josh Berkus wrote:

> Definitely. As I've said, my experience with RAID5 is that with fewer than 5
> disks it performs at around 40% of the speed of a single SCSI disk for large
> read-write operations on Postgres.
>
> If you have only 3 disks, I'd advocate one disk for WAL and one RAID 1 array
> for the database.

In this setup your database is still screwed if a single disk (the WAL disk) stops working. You'll have to revert to your last backup if this happens. The RAID-1 redundancy on your data disks buys you almost nothing: marginally better performance and no real redundancy should a single disk fail.

I'd use RAID-5 if you absolutely cannot use more disks, but I would use RAID-10 or two RAID-1 partitions if you can afford to use 4 disks.

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/
Hi all,

Vincent, you said that using RAID1 you don't have real redundancy. But RAID1 is mirroring, right? So if one of the two disks should fail, there should be no data lost, right?

I have been thinking some more. 18 GB drives are cheaper than 36 or 72 GB drives. I don't know if I can get the money for this, but how would the following setup sound?

Two 18 GB (15,000 rpm) disks in a RAID1 array for Operating System + WAL.
Four 18 GB (15,000 rpm) disks in a RAID5 array for data.

For the same amount of money, I could also get:

Two 36 GB (10,000 rpm) disks in a RAID1 array for Operating System + WAL.
Five/six 36 GB (10,000 rpm) disks in a RAID5 array for data.

Which would be the best of the above? The one with four 15k rpm disks or the one with five/six 10k rpm disks? Would these configs be better than all disks in one huge RAID5 array? There are so many possible configs with RAID...

Kind regards,
Alexander Priem.
On 2003-07-22 09:04:42 +0200, Alexander Priem wrote:

> Vincent, you said that using RAID1 you don't have real redundancy. But
> RAID1 is mirroring, right? So if one of the two disks should fail, there
> should be no data lost, right?

Right. But the proposal was a single disk for WAL, without redundancy, and I argued that wasn't really safe. RAID1 by itself is extremely safe, possibly even the safest RAID type there is.

> Two 18 GB (15,000 rpm) disks in a RAID1 array for Operating System + WAL.
> Four 18 GB (15,000 rpm) disks in a RAID5 array for data.

Our own testing has shown that a 6-disk RAID-10 array is faster than what you describe. Of course, this is very much dependent on how many INSERTs/UPDATEs you generate (which taxes your WAL more), so your mileage may vary.

> Two 36 GB (10,000 rpm) disks in a RAID1 array for Operating System + WAL.
> Five/six 36 GB (10,000 rpm) disks in a RAID5 array for data.

It is said that a higher RPM is particularly useful for a WAL disk. So you might consider using two 18 GB 15K rpm drives for a RAID-1 WAL disk (+ OS and swap), and using 36 GB 10K rpm disks in a RAID-5 array if you need that disk space.

> Which would be the best of the above? The one with four 15k rpm disks or the
> one with five/six 10k rpm disks? Would these configs be better than all disks
> in one huge RAID5 array? There are so many possible configs with RAID...

15K rpm disks are significantly faster than 10K rpm disks. If your only concern is performance, buy 15K rpm disks. If you want more disk space for your money, fall back to larger 10K rpm disks.

I personally think separate WAL disks are vastly overrated, since they haven't shown a big performance gain in our own tests. But as I have said, this is extremely dependent on the type of load you generate, so only your own tests can tell you what you should do in this respect.

About RAID types: the fastest RAID type by far is RAID-10. However, this will cost you a lot of usable disk space, so it isn't for everyone. You need at least 4 disks for a RAID-10 array. RAID-5 is a nice compromise if you want as much usable disk space as possible and still want to be redundant. RAID-1 is very useful for small (2-disk) arrays.

If you have the time and are settled on buying 6 disks, I'd test the following scenarios:
- 6-disk RAID-10 array (should perform best)
- 4-disk RAID-10 array containing data, 2-disk RAID-1 array for WAL, OS, etc.
- 4-disk RAID-5 array containing data, 2-disk RAID-1 array for WAL, OS, etc.
- 6-disk RAID-5 array (will probably perform worst)

Hope this helps.

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/
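One hedged way to compare those scenarios is contrib/pgbench, which ships with 7.3. The scale factor and client count below are only placeholders and should be sized to resemble the real workload:

    # build a throwaway test database (scale 100 = 10 million rows in the
    # "accounts" table, over a gigabyte on disk)
    createdb bench
    pgbench -i -s 100 bench

    # 50 concurrent clients, 1000 transactions each; rerun this on each
    # candidate array layout and compare the tps figures it reports
    pgbench -c 50 -t 1000 bench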
On Mon, 2003-07-21 at 04:33, Shridhar Daithankar wrote:
> On 21 Jul 2003 at 11:23, Alexander Priem wrote:
[snip]
> > I use ext3 filesystem, which probably is not the best performer, is it?
>
> No. You also need to check ext2, reiser and XFS. There is no agreement
> between users as to what works best. You need to benchmark and decide.

According to Jeremy Allison of SAMBA: "They used ext3, which is one of the slowest filesystems on Linux," Allison said. "In a real comparative test, you would use XFS."

http://www.linuxworld.com/story/32673.htm

--
Ron Johnson, Jr.  Home: ron.l.johnson@cox.net
Jefferson, LA USA
"I'm not a vegetarian because I love animals, I'm a vegetarian
because I hate vegetables!" -- unknown
Wow, I never figured how many different RAID configurations one could think of :)

After reading lots of material, forums and of course this mailing list, I think I am going for a RAID5 configuration of 6 disks (18 GB, 15,000 rpm each); one of those six disks will be a 'hot spare'. I will just put the OS, the WAL and the data on one volume. RAID10 is way too expensive :)

If I understand correctly, this will give great read performance, but less write performance. But since this server will be equipped with an embedded RAID controller featuring 128 MB of battery-backed cache, I figure that this controller will negate that (at least somewhat). I will need to find out whether this cache can be configured so that it will ONLY cache WRITES, not READS...

Also because of this battery-backed cache controller, I will go for the ext2 file system, mounted with 'noatime'. I will use a UPS, so I don't think I need the journaling of ext3. XFS is not natively supported by Red Hat and I will go for the easy way here :)

1 GB of RAM should be enough, I think. That is about the only point that almost everyone agrees on :) Do you think ECC is very important? The server I have in mind does not support it. Another one does, but it is about 1,000 euros more expensive :( One CPU should also be enough.

As for postgresql.conf settings, I think I will start with the following:

max_connections = 128
superuser_reserved_connections = 1
shared_buffers = 8192
max_fsm_relations = 1000
max_fsm_pages = 100000
wal_buffers = 32
sort_mem = 2048
vacuum_mem = 32768
effective_cache_size = 28672 (this one I'm not sure about, maybe it needs to be higher)
random_page_cost = 2
geqo_threshold = 20

This pretty much sums it up. What do you think about this config? It may not be the fastest, but a server like this will cost about 4750 euros, and that includes an Intel Xeon 2.4GHz CPU and a redundant power supply, WITHOUT the UPS. Seems very reasonable to me...

Kind regards,
Alexander Priem.
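On the effective_cache_size guess: it is expressed in 8 KB pages and is meant to reflect how much OS buffer cache the database can realistically count on, so one rough, hedged way to pick it is from the "cached" column of free once the box is under a normal load (the numbers below are only an example):

    free -m
    # e.g. if ~224 MB shows up as cached:  224 * 1024 / 8 = 28672 pages,
    # which matches the value above; ~400 MB cached would suggest ~51200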
On Tue, Jul 22, 2003 at 03:27:20PM +0200, Alexander Priem wrote:

> file system, mounted with 'noatime'. I will use a UPS, so I don't think I
> need the journaling of ext3. XFS is not natively supported by Red Hat and I

Just in case you're still thinking: why do you suppose that only power failures lead to system crashes? Surprise kernel panics due to bad hardware or OS upgrades with bugs in them, sudden failures because of bad memory, &c.: all these things can also lead to crashes, and though super-redundant hardware can mitigate that risk, it can't eliminate it completely.

This is not advice, of course, but for my money it's a bad idea not to use a journalled filesystem (or something similar) for production systems.

A

--
Andrew Sullivan                204-4141 Yonge Street
Liberty RMS                    Toronto, Ontario, Canada M2P 2A8
<andrew@libertyrms.info>       +1 416 646 3304 x110
On Tue, 2003-07-22 at 07:53, Alexander Priem wrote:

> Wow, I never figured how many different RAID configurations one could think
[snip]
> Also because of this battery-backed cache controller, I will go for the ext2
> file system, mounted with 'noatime'. I will use a UPS, so I don't think I
> need the journaling of ext3.

Oooooo, I don't think I'd do that!!!!! It's akin to saying, "I don't need to make backups, because I have RAID[1,5,10,1+0]."

If the power is out for 26 minutes and your UPS only lasts for 25 minutes, you could be in for a long, painful boot process if the box crashes. (For example, the UPS auto-shutdown daemon doesn't work properly, and no one can get to the console to shut it down properly before the batteries die.)

--
Ron Johnson, Jr.  Home: ron.l.johnson@cox.net
Jefferson, LA USA
"I'm not a vegetarian because I love animals, I'm a vegetarian
because I hate vegetables!" -- unknown
On Tue, Jul 22, 2003 at 03:27:20PM +0200, Alexander Priem wrote:

> If I understand correctly, this will give great read performance, but less
> write performance. But since this server will be equipped with an embedded
> RAID controller featuring 128 MB of battery-backed cache, I figure that this
> controller will negate that (at least somewhat). I will need to find out
> whether this cache can be configured so that it will ONLY cache WRITES, not
> READS...

I think the bigger issue with RAID5 write performance in a database is that it hits every spindle. The real performance bottleneck you run into is latency, especially the latency of positioning the heads. I don't have any proof of this theory, but I believe this is why moving WAL and/or temp_db to separate drives from the main database files can be a big benefit for some applications; not because of disk bandwidth, but because it drastically cuts down the amount of time the heads have to spend flying around the disk.

Of course, this is also highly dependent on how the filesystem operates. If it puts your WALs, temp_db, and database files very close to each other on the drive, splitting them out to separate spindles won't help as much.

--
Jim C. Nasby, Database Consultant    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
OK, another change of plans :)

ext2 seems to be a bad idea. So I'll stick with ext3. Better safe than sorry...

About the RAID config: maybe RAID-10 with six disks is affordable after all. I would have to take the smallest disks in this case, 18 GB per disk. So six 18 GB disks (15,000 rpm) would result in a total capacity of 54 GB, right? This volume would hold OS, WAL and data, but since RAID10 appears to deliver such great performance (according to several people), in combination with the 128 MB of battery-backed cache, this would be a good solution?

Hmmm. I keep changing my mind about this. My DB would be mostly 'selecting', but there would also be pretty much inserting and updating done. But most of the work would be selects. So would this config be OK?

Kind regards,
Alexander.
On Tue, 2003-07-22 at 10:01, Alexander Priem wrote:

> ext2 seems to be a bad idea. So I'll stick with ext3. Better safe than
> sorry...

Don't forget noatime!

> About the RAID config: maybe RAID-10 with six disks is affordable after all.
> I would have to take the smallest disks in this case, 18 GB per disk. So six
> 18 GB disks (15,000 rpm) would result in a total capacity of 54 GB, right?
[snip]
> So would this config be OK?

Others may disagree, but I'd put the OS and executables on a separate disk from the db and WAL, and make it an IDE drive, since it's so much less expensive than SCSI disks. (Make a copy of the disk, and if it craps out, pop out the old disk, stick in the new disk, and fire the box right back up...)

Thus, you'll have an OS/executables disk, and a separate DB disk, and never the twain shall meet. Theoretically, you could pick up those 6 drives and controller, move them to another machine, and the data should be just as it was on the other box.

--
Ron Johnson, Jr.  Home: ron.l.johnson@cox.net
Jefferson, LA USA
"I'm not a vegetarian because I love animals, I'm a vegetarian
because I hate vegetables!" -- unknown
>>>>> "AP" == Alexander Priem <ap@cict.nl> writes:

AP> Hmmm. I keep changing my mind about this. My DB would be mostly
AP> 'selecting', but there would also be pretty much inserting and
AP> updating done. But most of the work would be selects. So would
AP> this config be OK?

I'm about to order a new server. I haven't decided exactly how many disks I will get, but my plan is to get an 8-disk RAID10 with 15k RPM drives. I don't need the volume, just the speed and number of spindles, so I'm buying the smallest drives that meet my speed requirements, probably 18 GB each (sheesh! I remember getting my first 5 MB disk for my 8088 PC in college and thinking that was too much space).

My mix is nearly even read/write, but probably a little biased towards the reading. This machine is replacing a 5-disk box that was switched from RAID5 to a 4-disk RAID10 for data plus one system disk in January (what a pain that was to re-index, but that's another story). The switch from RAID5 to RAID10 made an enormous improvement in performance. The speedup wasn't from recreating the database: it was restored from a file-level backup, so the actual files were not compacted or secretly "improved" in any way, other than my occasional reindexing.

So I think your 6-disk RAID10 will be good.

--
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD  +1-240-453-8497
AIM: vivekkhera  Y!: vivek_khera  http://www.khera.org/~vivek/
On Tue, Jul 22, 2003 at 11:40:35 +0200, Vincent van Leeuwen <pgsql.spam@vinz.nl> wrote:

> About RAID types: the fastest RAID type by far is RAID-10. However, this will
> cost you a lot of usable disk space, so it isn't for everyone. You need at
> least 4 disks for a RAID-10 array. RAID-5 is a nice compromise if you want as
> much usable disk space as possible and still want to be redundant. RAID-1 is
> very useful for small (2-disk) arrays.

Note that while RAID 10 requires 4 disks, you get the space of 2 disks. This is the same ratio as for RAID 1.
On Tue, 22 Jul 2003, Jim C. Nasby wrote:

> I think the bigger issue with RAID5 write performance in a database is
> that it hits every spindle.

This is a common, and wrong, misconception. If you are writing 4k out to a RAID5 of 10 disks, this is what happens (assuming 64k stripes):

READ the data stripe (64k read)
READ the parity stripe (64k read)
make changes to the data stripe
XOR the new data stripe with the old parity stripe to get a new parity stripe
write the new parity stripe (64k)
write the new data stripe (64k)

So it's not as bad as you might think. No modern controller (or sw raid for Linux) hits all the spindles anymore for writes. As you add more drives to a RAID5, writes actually get faster on average, because there's less chance of contention for the same drives (remember, parity moves about in RAID5, so the parity disk isn't a choke point in RAID5 like it is in RAID4).

> The real performance bottleneck you run into
> is latency, especially the latency of positioning the heads. I don't
> have any proof of this theory, but I believe this is why moving WAL
> and/or temp_db to separate drives from the main database files can be a
> big benefit for some applications; not because of disk bandwidth, but
> because it drastically cuts down the amount of time the heads have to
> spend flying around the disk.

This is absolutely true. Moving the heads costs hugely. While most modern drives have SEEK times under 10 ms, the SETTLE times tend to be about that as well, followed by an average of about 3 ms of rotational latency to allow the proper sector to come under the head (10k rpm drives rotate once about every 6 ms).
"scott.marlowe" <scott.marlowe@ihs.com> writes: > If you are writing 4k out to a RAID5 of 10 disks, this is what happens: > > (assumiung 64k stipes...) > READ data stripe (64k read) > READ parity stripe (64k read) > make changes to data stripe > XOR new data stripe with old parity stripe to get a new parity stripe > write new parity stripe (64k) > write new data stripe (64k) > > So it's not as bad as you might think. The main negative for RAID5 is that it had to do that extra READ. If you're doing lots of tiny updates then the extra latency to have to go read the parity block before it can write the parity block out is a real killer. For that reason people prefer 0+1 for OLTP systems. But you have to actually test your setup in practice to see if it hurts. A big data warehousing system will be faster under RAID5 than under RAID1+0 because of the extra disks in the stripeset. The more disks in the stripeset the more bandwidth you get. Even for OLTP systems I've had success with RAID5 or not depending largely on the quality of the implementation. The Hitachi systems were amazing. They had enough battery backed cache that the extra latency for the parity read/write cycle really never showed up at all. But it had a lot more than 128M. I think it had 1G and could be expanded. -- greg
On Thu, 2003-07-24 at 13:29, Greg Stark wrote:

> The main negative for RAID5 is that it had to do that extra READ. If you're
> doing lots of tiny updates then the extra latency to have to go read the
> parity block before it can write the parity block out is a real killer. For
> that reason people prefer 0+1 for OLTP systems.
[snip]
> Even for OLTP systems I've had success with RAID5 or not depending largely on
> the quality of the implementation. The Hitachi systems were amazing. They had
> enough battery backed cache that the extra latency for the parity read/write
> cycle really never showed up at all. But it had a lot more than 128M. I think
> it had 1G and could be expanded.

Your last paragraph just stole the objection to the 1st paragraph right out of my mouth, since enough cache will allow it to "batch" all those tiny updates into big updates.

But those Hitachi controllers weren't plugged into x86-type boxen, were they?

--
Ron Johnson, Jr.  Home: ron.l.johnson@cox.net
Jefferson, LA USA
"I'm not a vegetarian because I love animals, I'm a vegetarian
because I hate vegetables!" -- unknown
On Wed, 2003-07-23 at 00:53, Alexander Priem wrote:

> After reading lots of material, forums and of course this mailing list, I
> think I am going for a RAID5 configuration of 6 disks (18 GB, 15,000 rpm
> each); one of those six disks will be a 'hot spare'. I will just put the OS,
> the WAL and the data on one volume. RAID10 is way too expensive :)

The general heuristic is that RAID-5 is not the way to deal with databases. Now surely someone will disagree with me, but as I understand it RAID-5 has a bottleneck on a single disk for the (checksum) information. Bottleneck is not the word you want to hear in the context of "database server".

RAID-1 (mirroring) or RAID-10 (sort-of-mirrored-RAID-5) is the best choice.

As far as FS performance goes, a year or two ago I remember someone doing an evaluation of FS performance for PostgreSQL and they found that the best performance was...

FAT

Yep: FAT

The reason is that a lot of what the database is doing, especially guaranteeing writes (WAL) and so forth, is best handled through a filesystem that does not get in the way. The fundamentals will not have changed. It is for this reason that ext2 is very much likely to be better than ext3. XFS is possibly (maybe, perhaps) OK, because there are optimisations in there for databases, but the best optimisation is to not be there at all. That's why Oracle wants direct IO to disk partitions, so they can implement their own "filesystem" (i.e. record system... table system...) on a raw partition.

Personally I don't plan to reboot my DB server more than once a year, if that (even my_laptop currently has 37 days of uptime, not including suspend). On our DB servers I use ext2 (rather than ext3) mounted with noatime, and I bite the 15 minutes to fsck (once a year) rather than screw general performance with a journalling database on top of a journalling FS. I split pg_xlog onto a separate physical disk, if performance requirements are extreme.

Catalyst's last significant project was to write the Domain Name registration system for .nz (using PostgreSQL). Currently we are developing the electoral roll for the same country (2.8 million electors living at 1.4 million addresses). We use Oracle (or Progress, or MySQL) if a client demands them, but we use PostgreSQL if we get to choose. Increasingly we get to choose. Good.

Regards,
Andrew.

--
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/    PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201  MOB: +64(21)635-694  OFFICE: +64(4)499-2267
Survey for nothing with http://survey.net.nz/
Andrew McMillan wrote:

> The general heuristic is that RAID-5 is not the way to deal with databases.
> Now surely someone will disagree with me, but as I understand it RAID-5 has
> a bottleneck on a single disk for the (checksum) information. Bottleneck is
> not the word you want to hear in the context of "database server".

That's indeed incorrect. There is no single "special" disk in a RAID-5; you might be mistaking it for RAID-3/4 (where a single disk holds the checksums). In RAID-5 the checksums are scattered around on all the hard disks.

RAID-5's problem is write performance, but a RAID-5 on a decent RAID controller outperforms a RAID-10 on a less-decent controller (with the same hard disks) on both read and write performance. With a decent RAID controller you end up with "about the same" write performance as with RAID-1, but slightly lower read performance. At least, that's what I was able to gather from some tests a colleague of mine ran with different RAID setups.

> RAID-1 (mirroring) or RAID-10 (sort-of-mirrored-RAID-5) is the best choice.

RAID-10 is _not_ similar to RAID-5; it is RAID 1+0, i.e. a mirrored set of stripes (RAID-0 is more or less a stripe). For databases, RAID-10 is supposed to be the fastest, since you have the advantage of the striping for both reading and writing, while you also have the advantage of the mirroring for reading. The main disadvantage of RAID-1 (and also of RAID-10) is the heavy waste of hard disk space. Another advantage of RAID-5 over RAID-10 is that, when you don't care about space, RAID-5 is safer with four hard drives than RAID-10 (i.e. set it up as 3 disks + 1 spare).

> As far as FS performance goes, a year or two ago I remember someone doing an
> evaluation of FS performance for PostgreSQL and they found that the best
> performance was...
>
> FAT

FAT has a few disadvantages afaik; I wouldn't use it for my database at least.

> On our DB servers I use ext2 (rather than ext3) mounted with noatime, and I
> bite the 15 minutes to fsck (once a year) rather than screw general
> performance with a journalling database on top of a journalling FS.

Well, rebooting is not a problem with ext2, but crashing might be... and normally you don't plan a system crash ;) Ext3 and XFS handle that much better.

Regards,

Arjen
"Arjen van der Meijden" <acmmailing@vulcanus.its.tudelft.nl> writes: > Well, reboting is not a problem with ext2, but crashing might be... And > normally you don't plan a systemcrash ;) > Ext3 and xfs handle that much better. A journaling filesystem is good to use if you can set it to journal metadata but not file contents. PG's WAL logic can recover lost file contents, but we have no way to help out the filesystem if it's lost metadata. regards, tom lane
Since there seem to be a lot of different opinions regarding the various different RAID configurations I thought I'd post this link to the list: http://www.storagereview.com/guide2000/ref/hdd/perf/raid/index.html This is the best resource for information on RAID and hard drive performance I found online. I hope this helps. Balazs
Balazs, > Since there seem to be a lot of different opinions regarding the various > different RAID configurations I thought I'd post this link to the list: > http://www.storagereview.com/guide2000/ref/hdd/perf/raid/index.html Yeah ... this is a really good article. Made me realize why "stripey" RAID sucks for OLTP databases, unless you throw a lot of platters at them. -- Josh Berkus Aglio Database Solutions San Francisco
>>>>> "GS" == Greg Stark <gsstark@mit.edu> writes: GS> "scott.marlowe" <scott.marlowe@ihs.com> writes: GS> But you have to actually test your setup in practice to see if it GS> hurts. A big data warehousing system will be faster under RAID5 GS> than under RAID1+0 because of the extra disks in the GS> stripeset. The more disks in the stripeset the more bandwidth you GS> get. Anyone have ideas on 14 spindles? I just ordered a disk subsystem with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell PERC3/DC controller (only 128MB cache, though). My plan was to do RAID10, but I think I'll play with RAID5 again and see which gives me the best performance. Unfortunatly, it is difficult to recreate the highly fragmented tables I have now (vacuum full takes over 14 hours on one of the tables) so I'm not sure how to best compare them. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
On Tue, 2003-07-29 at 10:14, Vivek Khera wrote: > >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes: > > GS> "scott.marlowe" <scott.marlowe@ihs.com> writes: > > GS> But you have to actually test your setup in practice to see if it > GS> hurts. A big data warehousing system will be faster under RAID5 > GS> than under RAID1+0 because of the extra disks in the > GS> stripeset. The more disks in the stripeset the more bandwidth you > GS> get. > > Anyone have ideas on 14 spindles? I just ordered a disk subsystem > with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell > PERC3/DC controller (only 128MB cache, though). 14 drives on one SCSI card, eh? I'd be worried about saturating the bus. Maybe it's an old rule of thumb, but I wouldn't fill a SCSI chain more than half full. > My plan was to do RAID10, but I think I'll play with RAID5 again and > see which gives me the best performance. Unfortunately, it is > difficult to recreate the highly fragmented tables I have now (vacuum > full takes over 14 hours on one of the tables) so I'm not sure how to > best compare them. Also IMO: if I needed something *really* high performance, I'd start with a mobo that has dual PCI buses (133MB/s can get swamped quickly by U320 devices) or PCI-X (but they're so new...). Then: - get dual U320 SCSI cards (one for each PCI bus) - plug them into dual redundant fast path external storage controllers (that have, oh, 512MB RAM cache *each*) - dual port the drives, so they plug into both storage controllers Since I wouldn't put 14 drives on one SCSI chain, double what I just said, and only plug 7 drives in each controller. If your app needs One Big Honkin' Device, use the Linux Volume Manager (LVM) to merge the 2 RAID logical devices into one "super-logical" device. Yes, that's lots of money, but is the data, and speed, important enough? -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" | | unknown | +-----------------------------------------------------------------+
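A sketch of the LVM step Ron describes, assuming the two hardware RAID logical drives show up as /dev/sdb and /dev/sdc (names, sizes and stripe width here are only examples):

    # make the two RAID logical drives into LVM physical volumes
    pvcreate /dev/sdb /dev/sdc
    vgcreate pgvg /dev/sdb /dev/sdc

    # stripe one logical volume across both RAID devices
    lvcreate -n pgdata -L 200G -i 2 -I 64 pgvg
    mkfs -t ext2 /dev/pgvg/pgdata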
On Tue, 2003-07-29 at 08:14, Vivek Khera wrote: > >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes: > > GS> "scott.marlowe" <scott.marlowe@ihs.com> writes: > > GS> But you have to actually test your setup in practice to see if it > GS> hurts. A big data warehousing system will be faster under RAID5 > GS> than under RAID1+0 because of the extra disks in the > GS> stripeset. The more disks in the stripeset the more bandwidth you > GS> get. > > Anyone have ideas on 14 spindles? I just ordered a disk subsystem > with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell > PERC3/DC controller (only 128MB cache, though). <SNIP> Hey, one comment on this. With the dell Perc3/DC you should check the megaraid-devel list to find the best BIOS settings for maximum performance. There have been many comments on it and trials to get it going really well. All told, though, I totally love the LSI Megaraid (which is what the perc3/dc is) controllers. We use the Elite 1650 with seagate cheetah drives for a nice little array. --Will
On 29 Jul 2003, Ron Johnson wrote: > On Tue, 2003-07-29 at 10:14, Vivek Khera wrote: > > >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes: > > > > GS> "scott.marlowe" <scott.marlowe@ihs.com> writes: > > > > GS> But you have to actually test your setup in practice to see if it > > GS> hurts. A big data warehousing system will be faster under RAID5 > > GS> than under RAID1+0 because of the extra disks in the > > GS> stripeset. The more disks in the stripeset the more bandwidth you > > GS> get. > > > > Anyone have ideas on 14 spindles? I just ordered a disk subsystem > > with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell > > PERC3/DC controller (only 128MB cache, though). > > 14 drives on one SCSI card, eh? I'd be worried about saturating > the bus. I'm pretty sure those PERCs are based on the megaraid cards, which can handle 3 or 4 channels each... > Maybe it's an old rule of thumb, but I would fill a SCSI chain > more than half full. It's an old rule of thumb, but it still applies, it just takes more drives to saturate the channel. Figure ~ 30 to 50 MBytes a second per drive, on a U320 port it would take 10 drives to saturate it, and considering random accesses will be much slower than the max ~30 megs a second off the platter rate, it might take more than the max 14 drives to saturate U320.
On Tue, 2003-07-29 at 11:18, scott.marlowe wrote: > On 29 Jul 2003, Ron Johnson wrote: > > > On Tue, 2003-07-29 at 10:14, Vivek Khera wrote: > > > >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes: > > > > > > GS> "scott.marlowe" <scott.marlowe@ihs.com> writes: > > > > > > GS> But you have to actually test your setup in practice to see if it > > > GS> hurts. A big data warehousing system will be faster under RAID5 > > > GS> than under RAID1+0 because of the extra disks in the > > > GS> stripeset. The more disks in the stripeset the more bandwidth you > > > GS> get. > > > > > > Anyone have ideas on 14 spindles? I just ordered a disk subsystem > > > with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell > > > PERC3/DC controller (only 128MB cache, though). > > > > 14 drives on one SCSI card, eh? I'd be worried about saturating > > the bus. > > I'm pretty sure those PERCs are based on the megaraid cards, which can > handle 3 or 4 channels each... Each with 14 devices? If so, isn't that a concentrated point of failure, even if the channels are 1/2 full? > > Maybe it's an old rule of thumb, but I would fill a SCSI chain > > more than half full. > > It's an old rule of thumb, but it still applies, it just takes more drives > to saturate the channel. Figure ~ 30 to 50 MBytes a second per drive, on > a U320 port it would take 10 drives to saturate it, and considering random > accesses will be much slower than the max ~30 megs a second off the > platter rate, it might take more than the max 14 drives to saturate U320. Ok. You'd still saturate the 133MB/s PCI bus at 133/30 = 4.4 drives. -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" | | unknown | +-----------------------------------------------------------------+
On 29 Jul 2003, Ron Johnson wrote: > On Tue, 2003-07-29 at 11:18, scott.marlowe wrote: > > On 29 Jul 2003, Ron Johnson wrote: > > > > > > On Tue, 2003-07-29 at 10:14, Vivek Khera wrote: > > > > >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes: > > > > > > > > GS> "scott.marlowe" <scott.marlowe@ihs.com> writes: > > > > > > > > GS> But you have to actually test your setup in practice to see if it > > > > GS> hurts. A big data warehousing system will be faster under RAID5 > > > > GS> than under RAID1+0 because of the extra disks in the > > > > GS> stripeset. The more disks in the stripeset the more bandwidth you > > > > GS> get. > > > > > > > > Anyone have ideas on 14 spindles? I just ordered a disk subsystem > > > > with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell > > > > PERC3/DC controller (only 128MB cache, though). > > > > > > 14 drives on one SCSI card, eh? I'd be worried about saturating > > > the bus. > > > > I'm pretty sure those PERCs are based on the megaraid cards, which can > > handle 3 or 4 channels each... > > Each with 14 devices? If so, isn't that a concentrated point of > failure, even if the channels are 1/2 full? Yep. I've built one once before when BIG hard drives were 9 gigs. :-) And it is a point of concentrated failure, which brings me to my favorite part about the LSI megaraid cards (which most / all perc3s apparently are). If you build a RAID1+0 or 0+1, you can separate it out so each sub part is on its own card, and the other cards keep acting like one big card. Assuming the bad card isn't killing your PCI bus or draining the 12V rail or something. > > > Maybe it's an old rule of thumb, but I would fill a SCSI chain > > > more than half full. > > > > It's an old rule of thumb, but it still applies, it just takes more drives > > to saturate the channel. Figure ~ 30 to 50 MBytes a second per drive, on > > a U320 port it would take 10 drives to saturate it, and considering random > > accesses will be much slower than the max ~30 megs a second off the > > platter rate, it might take more than the max 14 drives to saturate U320. > > Ok. You'd still saturate the 133MB/s PCI bus at 133/30 = 4.4 drives. But that's seq scan. For many database applications, random access performance is much more important. Imagine 200 people entering reservations of 8k or less each into a transaction processing engine. Each transaction's chance to hit an unoccupied spindle is what really counts. If there are 30 spindles, each doing a stripe's worth of access all the time, it's likely to never flood the channel. If random access is 1/4th the speed of seq scan, then you need to multiply it by 4 to get the number of drives that'd saturate the PCI bus.
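The arithmetic behind that reasoning, as a quick sketch (the MB/s figures are the rough numbers quoted in this thread, not measurements):

    awk 'BEGIN {
        pci = 133; u320 = 320;        # bus bandwidth, MB/s
        seq = 30;  rnd = seq / 4;     # per-drive throughput, sequential vs random
        printf "PCI : %4.1f drives (seq)  %4.1f drives (random)\n", pci/seq,  pci/rnd
        printf "U320: %4.1f drives (seq)  %4.1f drives (random)\n", u320/seq, u320/rnd
    }'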
On Tue, 2003-07-29 at 14:00, scott.marlowe wrote: > On 29 Jul 2003, Ron Johnson wrote: > > > On Tue, 2003-07-29 at 11:18, scott.marlowe wrote: > > > On 29 Jul 2003, Ron Johnson wrote: > > > > > > > On Tue, 2003-07-29 at 10:14, Vivek Khera wrote: > > > > > >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes: > > > > > > > > > > GS> "scott.marlowe" <scott.marlowe@ihs.com> writes: > > > > > > > > > > GS> But you have to actually test your setup in practice to see if it > > > > > GS> hurts. A big data warehousing system will be faster under RAID5 > > > > > GS> than under RAID1+0 because of the extra disks in the > > > > > GS> stripeset. The more disks in the stripeset the more bandwidth you > > > > > GS> get. > > > > > > > > > > Anyone have ideas on 14 spindles? I just ordered a disk subsystem > > > > > with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell > > > > > PERC3/DC controller (only 128MB cache, though). > > > > > > > > 14 drives on one SCSI card, eh? I'd be worried about saturating > > > > the bus. > > > > > > I'm pretty sure those PERCs are based on the megaraid cards, which can > > > handle 3 or 4 channels each... > > > > Each with 14 devices? If so, isn't that a concentrated point of > > failure, even if the channels are 1/2 full? > > Yep. I've built one once before when BIG hard drives were 9 gigs. :-) > > And it is a point of concentrated failure, which brings me to my favorite > part about the LSI megaraid cards (which most / all perc3s are > apparently.) > > If you build a RAID1+0 or 0+1, you can seperate it out so each sub part is > on it's own card, and the other cards keep acting like one big card. > Assuming the bad card isn't killing your PCI bus or draining the 12V rail > or something. Sounds like my kinda card! Is the cache battery-backed up? How much cache can you stuff in them? > > > > Maybe it's an old rule of thumb, but I would fill a SCSI chain > > > > more than half full. > > > > > > It's an old rule of thumb, but it still applies, it just takes more drives > > > to saturate the channel. Figure ~ 30 to 50 MBytes a second per drive, on > > > a U320 port it would take 10 drives to saturate it, and considering random > > > accesses will be much slower than the max ~30 megs a second off the > > > platter rate, it might take more than the max 14 drives to saturate U320. > > > > Ok. You'd still saturate the 133MB/s PCI bus at 133/30 = 4.4 drives. > > But that's seq scan. For many database applications, random access > performance is much more important. Imagine 200 people entering > reservations of 8k or less each into a transaction processing engine. > Each transactions chance to hit an unoccupied spindle is what really > counts. If there's 30 spindles, each doing a stripe's worth of access all > the time, it's likely to never flood the channel. > > If random access is 1/4th the speed of seq scan, then you need to multiply > it by 4 to get the number of drives that'd saturate the PCI bus. Maybe it's just me, but I've never seen a purely TP system. Even if roll off the daily updates to a "reporting database" each night, some yahoo manager with enough juice to have his way still wants up-to-the-minute reports... Better yet, the Access Jockey, who thinks s/he's an SQL whiz but couldn't JOIN himself out of a paper bag... -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" 
| | unknown | +-----------------------------------------------------------------+
On 29 Jul 2003, Ron Johnson wrote: > On Tue, 2003-07-29 at 14:00, scott.marlowe wrote: > > On 29 Jul 2003, Ron Johnson wrote: > > > > > On Tue, 2003-07-29 at 11:18, scott.marlowe wrote: > > > > On 29 Jul 2003, Ron Johnson wrote: > > > > > > > > > On Tue, 2003-07-29 at 10:14, Vivek Khera wrote: > > > > > > >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes: > > > > > > > > > > > > GS> "scott.marlowe" <scott.marlowe@ihs.com> writes: > > > > > > > > > > > > GS> But you have to actually test your setup in practice to see if it > > > > > > GS> hurts. A big data warehousing system will be faster under RAID5 > > > > > > GS> than under RAID1+0 because of the extra disks in the > > > > > > GS> stripeset. The more disks in the stripeset the more bandwidth you > > > > > > GS> get. > > > > > > > > > > > > Anyone have ideas on 14 spindles? I just ordered a disk subsystem > > > > > > with 14 high speed (U320 15kRPM) SCSI disks to hook up with a dell > > > > > > PERC3/DC controller (only 128MB cache, though). > > > > > > > > > > 14 drives on one SCSI card, eh? I'd be worried about saturating > > > > > the bus. > > > > > > > > I'm pretty sure those PERCs are based on the megaraid cards, which can > > > > handle 3 or 4 channels each... > > > > > > Each with 14 devices? If so, isn't that a concentrated point of > > > failure, even if the channels are 1/2 full? > > > > Yep. I've built one once before when BIG hard drives were 9 gigs. :-) > > > > And it is a point of concentrated failure, which brings me to my favorite > > part about the LSI megaraid cards (which most / all perc3s are > > apparently.) > > > > If you build a RAID1+0 or 0+1, you can seperate it out so each sub part is > > on it's own card, and the other cards keep acting like one big card. > > Assuming the bad card isn't killing your PCI bus or draining the 12V rail > > or something. > > Sounds like my kinda card! > > Is the cache battery-backed up? Yep > How much cache can you stuff in them? the old old old school MegaRAID428 could hold up to 128 Meg. I'm sure the new ones can handle 512Meg or more. > > > > > Maybe it's an old rule of thumb, but I would fill a SCSI chain > > > > > more than half full. > > > > > > > > It's an old rule of thumb, but it still applies, it just takes more drives > > > > to saturate the channel. Figure ~ 30 to 50 MBytes a second per drive, on > > > > a U320 port it would take 10 drives to saturate it, and considering random > > > > accesses will be much slower than the max ~30 megs a second off the > > > > platter rate, it might take more than the max 14 drives to saturate U320. > > > > > > Ok. You'd still saturate the 133MB/s PCI bus at 133/30 = 4.4 drives. > > > > But that's seq scan. For many database applications, random access > > performance is much more important. Imagine 200 people entering > > reservations of 8k or less each into a transaction processing engine. > > Each transactions chance to hit an unoccupied spindle is what really > > counts. If there's 30 spindles, each doing a stripe's worth of access all > > the time, it's likely to never flood the channel. > > > > If random access is 1/4th the speed of seq scan, then you need to multiply > > it by 4 to get the number of drives that'd saturate the PCI bus. > > Maybe it's just me, but I've never seen a purely TP system. I think most of them are running under TPF on a mainframe in a basement somewhere, like for airline reservations. 
I've never worked on one, but met one of the guys who runs one, and they use 12 mainframes for 6 live machines and each live machine has a failover machine behind it in sysplex mode. I kept thinking of the giant dinosaurs in Jurassic park... > Even if roll off the daily updates to a "reporting database" each > night, some yahoo manager with enough juice to have his way still > wants up-to-the-minute reports... Just because it's TP doesn't mean it doesn't have real time reporting. But expensive reports probably do get run at night. > Better yet, the Access Jockey, who thinks s/he's an SQL whiz but > couldn't JOIN himself out of a paper bag... I've seen a few who got joins and unions and what not, but explaining fks or transactions got me a glazed look... :-)
On Tue, 2003-07-29 at 15:09, scott.marlowe wrote: > On 29 Jul 2003, Ron Johnson wrote: > > > On Tue, 2003-07-29 at 14:00, scott.marlowe wrote: > > > On 29 Jul 2003, Ron Johnson wrote: > > > > > > > On Tue, 2003-07-29 at 11:18, scott.marlowe wrote: > > > > > On 29 Jul 2003, Ron Johnson wrote: > > > > > > > > > > > On Tue, 2003-07-29 at 10:14, Vivek Khera wrote: > > > > > > > >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes: > > > > > > > > > > > > > > GS> "scott.marlowe" <scott.marlowe@ihs.com> writes: [snip] > > > But that's seq scan. For many database applications, random access > > > performance is much more important. Imagine 200 people entering > > > reservations of 8k or less each into a transaction processing engine. > > > Each transactions chance to hit an unoccupied spindle is what really > > > counts. If there's 30 spindles, each doing a stripe's worth of access all > > > the time, it's likely to never flood the channel. > > > > > > If random access is 1/4th the speed of seq scan, then you need to multiply > > > it by 4 to get the number of drives that'd saturate the PCI bus. > > > > Maybe it's just me, but I've never seen a purely TP system. > > I think most of them are running under TPF on a mainframe in a basement > somewhere, like for airline reservations. I've never worked on one, but > met one of the guys who runs one, and they use 12 mainframes for 6 live > machines and each live machine has a failover machine behind it in sysplex > mode. I kept thinking of the giant dinosaurs in Jurassic park... We have something similar running on Alphas and VMS; does about 8M Txn/day. Anyone who uses E-ZPass in the northeast eventually gets stuck in our systems. (Made me fear Big Brother...) > > Even if roll off the daily updates to a "reporting database" each > > night, some yahoo manager with enough juice to have his way still > > wants up-to-the-minute reports... > > Just because it's TP doesn't mean it doesn't have real time reporting. > But expensive reports probably do get run at night. Yes, but... There's always the exception. > > Better yet, the Access Jockey, who thinks s/he's an SQL whiz but > > couldn't JOIN himself out of a paper bag... > > I've seen a few who got joins and unions and what not, but explaining fks > or transactions got me a glazed look... :-) Wow! They understood joins? You lucky dog!!! -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" | | unknown | +-----------------------------------------------------------------+
On Tue, 2003-07-29 at 15:38, Ron Johnson wrote: > On Tue, 2003-07-29 at 15:09, scott.marlowe wrote: > > On 29 Jul 2003, Ron Johnson wrote: > > > > > On Tue, 2003-07-29 at 14:00, scott.marlowe wrote: > > > > On 29 Jul 2003, Ron Johnson wrote: > > > > > > > > > On Tue, 2003-07-29 at 11:18, scott.marlowe wrote: > > > > > > On 29 Jul 2003, Ron Johnson wrote: > > > > > > > > > > > > > On Tue, 2003-07-29 at 10:14, Vivek Khera wrote: > > > > > > > > >>>>> "GS" == Greg Stark <gsstark@mit.edu> writes: > > > > > > > > > > > > > > > > GS> "scott.marlowe" <scott.marlowe@ihs.com> writes: > [snip] [snip] > > I think most of them are running under TPF on a mainframe in a basement > > somewhere, like for airline reservations. I've never worked on one, but > > met one of the guys who runs one, and they use 12 mainframes for 6 live > > machines and each live machine has a failover machine behind it in sysplex > > mode. I kept thinking of the giant dinosaurs in Jurassic park... > > We have something similar running on Alphas and VMS; does about > 8M Txn/day. Anyone who uses E-ZPass in the northeast eventually > gets stuck in our systems. Oh, forget to mention: yes, they are in a 2-deep basement. -- +-----------------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA | | | | "I'm not a vegetarian because I love animals, I'm a vegetarian | | because I hate vegetables!" | | unknown | +-----------------------------------------------------------------+
>>>>> "RJ" == Ron Johnson <ron.l.johnson@cox.net> writes: RJ> On Tue, 2003-07-29 at 14:00, scott.marlowe wrote: RJ> Sounds like my kinda card! RJ> Is the cache battery-backed up? yep RJ> How much cache can you stuff in them? as per dell, the max is 128Mb, which was a bummer. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-240-453-8497 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
Hi all,
I am wondering about something: I have a table (let's call it "Table") which will contain a lot of records. Every record has a field named "deleted" which can be either NULL or a date value. When this field is NULL, the record in question may be used by a program. If the field contains a date, the record must be considered "deleted" and cannot be used anymore.
The reason why I don't actually delete such records is that I want to be able to reference them from other locations (for history purposes).
What I am thinking about is creating two views for this table: Table_View and Table_History. Table_View would contain all records where "Deleted is null". Table_History would just contain all records (Select * From Table).
In my program most queries would need to view only the records where deleted is null.
Would " Select * from Table_View Where Name='xxx' " perform worse than " Select * from Table Where deleted is null and Name='xxx' " ?
I ask this because I would rather not have to type "where deleted is null" in almost every query in my program. But I will not use this strategy if it would mean a serious performance loss...
Thanks in Advance,
Alexander Priem.
"Alexander Priem" <ap@cict.nl> writes: > What I am thinking about is creating two views for this table: Table_View a= > nd Table_History. Table_View would contain all records where "Deleted is nu= > ll". Table_History would just contain all records (Select * From Table). > Would " Select * from Table_View Where Name=3D'xxx' " perform worse than " = > Select * from Table Where deleted is null and Name=3D'xxx' " ? They'd be exactly the same (modulo a few extra microseconds/milliseconds for the query planner to expand the view definition). regards, tom lane