Thread: questions about disk configurations
hi i have a question about best harddisk configuration for postgresql performance. of course i know that: - scsi is better than ide - 2 disks are better than 1 - 3 disks are better than 2 i know that with 3 disks one should move xlog to one drive, index files to second and tables to third. that's clear. but: will making software raid on this discs provide performance increase or decrease? which raid (0,1,5,10?) is best for postgresql? maybe it differs when it comes to different datatypes (i.e. raid "X" is best for indices, but "Y" best for tables). i'd like to know what are the options to store all this information (xlog, indices and tables). what configurations are best, what medium and what should be avoided at all cost. hope you can help me, and sorry for my english. depesz -- hubert depesz lubaczewski http://www.depesz.pl/ ------------------------------------------------------------------------ Mój Boże, spraw abym milczał, dopóki się nie upewnię, że naprawdę mam coś do powiedzenia. (c) 1998 depesz
On Mon, 2002-12-09 at 12:32, Hubert depesz Lubaczewski wrote: > hi > i have a question about best harddisk configuration for postgresql > performance. [...] Yo! A bit more data is needed before anybody can give you more help: - what is your budget? - how big will your databases be? - what's the read/write ratio? Even then you'll not get any good recipes, because there aren't any. You'll have to do benchmarks yourself. A few fundamental things that are probably true for most: - more RAM is always good. Independent from the disc architecture - if an access isn't going to the disc at all, it's always good. (if you're mostly writing this may be lessened). - always carefully tune the postgres installation (random page cost, sort mem, shared buffers, ... - all depend on your system and you application) - as you correctly said: distribute the load on many spindles. On a busy database, 4*20G is probably faster than 1*80G beyound this, experiences vary. RAID1 and RAID5 are rated differently by different people - and especially with RAID5 there are (I think) really performance differencies between the various products. RAID0 is fastest, of course, but you probably care for your data. For equally good implementations, RAID1 and RAID5 may have similar speed, especially if the RAID controller for RAID5 has enough RAM. If the active dataset on a RAID5 is bigger than the available caching RAM, write performance sucks as a single block write requires 2 reads and 2 writes. If the RAID5 controller has enough RAM (and a decent implementation), write performance can be almost equal to RAID1 (2 writes for a single block write). So far -- vbi -- this email is protected by a digital signature: http://fortytwo.ch/gpg NOTE: keyserver bugs! get my key here: https://fortytwo.ch/gpg/92082481
Attachment
On Mon, Dec 09, 2002 at 01:01:58PM +0100, Adrian 'Dagurashibanipal' von Bidder wrote: > A bit more data is needed before anybody can give you more help: > - what is your budget? > - how big will your databases be? > - what's the read/write ratio? my question as for now is purely theoretical. i'm not asking about any specific situation, but me may talk about medium sized web size. budget is irrelevant (i'd like to talk *only* about harddrives, not memory, architescure and so on). > - as you correctly said: distribute the load on many spindles. On a > busy database, 4*20G is probably faster than 1*80G as i said: i know that 3 disks are bettar than 1 (as for postgres installation, because system data and swap should be on 4th disc - but this is obvious). > beyound this, experiences vary. RAID1 and RAID5 are rated differently by > different people - and especially with RAID5 there are (I think) really > performance differencies between the various products. RAID0 is fastest, > of course, but you probably care for your data. that's exactly what i'm asking about: which raid is best suited for which data amongst out 3 sets (xlog, tables, indices). or maybe for some types of data single disc is better than raid for some strange reason? is it better to (when having 2 discs) setup raid 0/1 or to use tham separatelly as xlog/tables? depesz -- hubert depesz lubaczewski http://www.depesz.pl/ ------------------------------------------------------------------------ Mój Boże, spraw abym milczał, dopóki się nie upewnię, że naprawdę mam coś do powiedzenia. (c) 1998 depesz
Depesz, > i have a question about best harddisk configuration for postgresql > performance. > of course i know that: > - scsi is better than ide > - 2 disks are better than 1 > - 3 disks are better than 2 > > i know that with 3 disks one should move xlog to one drive, index files > to second and tables to third. > that's clear. Er, no, it's not. In fact, for a 3-disk config, I reccommend: Disk 1: OS, swap, system logs Disk 2: Data + Indexes Disk 3: Transaction Log > but: > will making software raid on this discs provide performance increase or > decrease? Hardware RAID can improve *read* performance, particilarly RAIDs 1, 01, and 10. For writing, the best you can do is having it not inhibit performance. The general testament is that *software* RAID does not improve things at all; actually, the best that can be said for Linux Software RAID 1 is that it does not harm performance much. > i'd like to know what are the options to store all this information > (xlog, indices and tables). what configurations are best, what medium > and what should be avoided at all cost. Ask specific questions. If you want the full performance tutorial, you'd have to pay a steep fee for 1-3 days of training. -- -Josh Berkus Aglio Database Solutions San Francisco
I don't know whether you have read this link but it was helpful to me. http://www.ca.postgresql.org/docs/momjian/hw_performance/0.html It discusses PostgreSQL Hardware Performance Tuning. Hope it helps! Keith Bottner -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Hubert depesz Lubaczewski Sent: Monday, December 09, 2002 7:05 AM To: Adrian 'Dagurashibanipal' von Bidder; pgsql-performance@postgresql.org Subject: Re: [PERFORM] questions about disk configurations On Mon, Dec 09, 2002 at 01:01:58PM +0100, Adrian 'Dagurashibanipal' von Bidder wrote: > A bit more data is needed before anybody can give you more help: > - what is your budget? > - how big will your databases be? > - what's the read/write ratio? my question as for now is purely theoretical. i'm not asking about any specific situation, but me may talk about medium sized web size. budget is irrelevant (i'd like to talk *only* about harddrives, not memory, architescure and so on). > - as you correctly said: distribute the load on many spindles. On a > busy database, 4*20G is probably faster than 1*80G as i said: i know that 3 disks are bettar than 1 (as for postgres installation, because system data and swap should be on 4th disc - but this is obvious). > beyound this, experiences vary. RAID1 and RAID5 are rated differently > by different people - and especially with RAID5 there are (I think) > really performance differencies between the various products. RAID0 is > fastest, of course, but you probably care for your data. that's exactly what i'm asking about: which raid is best suited for which data amongst out 3 sets (xlog, tables, indices). or maybe for some types of data single disc is better than raid for some strange reason? is it better to (when having 2 discs) setup raid 0/1 or to use tham separatelly as xlog/tables? depesz -- hubert depesz lubaczewski http://www.depesz.pl/ ------------------------------------------------------------------------ Mój Boże, spraw abym milczał, dopóki się nie upewnię, że naprawdę mam coś do powiedzenia. (c) 1998 depesz ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
On Mon, 2002-12-09 at 07:05, Hubert depesz Lubaczewski wrote: > On Mon, Dec 09, 2002 at 01:01:58PM +0100, Adrian 'Dagurashibanipal' von Bidder wrote: > > A bit more data is needed before anybody can give you more help: > > - what is your budget? > > - how big will your databases be? > > - what's the read/write ratio? > > my question as for now is purely theoretical. i'm not asking about any > specific situation, but me may talk about medium sized web size. budget > is irrelevant (i'd like to talk *only* about harddrives, not memory, > architescure and so on). What is "medium sized web"? The *system* *is* important!! Stuffing your box with RAM may, in fact, override your disks, if the RAM caches enough. > > - as you correctly said: distribute the load on many spindles. On a > > busy database, 4*20G is probably faster than 1*80G > > as i said: i know that 3 disks are bettar than 1 (as for postgres > installation, because system data and swap should be on 4th disc - but > this is obvious). > > > beyound this, experiences vary. RAID1 and RAID5 are rated differently by > > different people - and especially with RAID5 there are (I think) really > > performance differencies between the various products. RAID0 is fastest, > > of course, but you probably care for your data. > > that's exactly what i'm asking about: which raid is best suited for > which data amongst out 3 sets (xlog, tables, indices). or maybe for some > types of data single disc is better than raid for some strange reason? > is it better to (when having 2 discs) setup raid 0/1 or to use tham > separatelly as xlog/tables? These are *GENERALITIES*!!!! _All_ is dependent on which SCSI controller you choose, and how much cache it has!!!!!!!! - RAID0 does *great* at both reading and writing, but everyone knows that it is insecure. - RAID1 does better than JBOD at reading and writing, but not as good as RAID0. - RAID01 and RAID10 do just about as good as RAID0. - RAID5 does great with reads, but bad with writes, *unless* the controller has *lots* of cache. Then, write speeds are great. Slightly off topic: if I have Important Data, then I would not trust a caching controlller unless it has a battery backup. Unfortunately, the only "caching controlllers with battery backup" that I've seen are pretty expensive... -- +------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "they love our milk and honey, but preach about another | | way of living" | | Merle Haggard, "The Fighting Side Of Me" | +------------------------------------------------------------+
> > Er, no, it's not. In fact, for a 3-disk config, I reccommend: > > Disk 1: OS, swap, system logs > Disk 2: Data + Indexes > Disk 3: Transaction Log What is the accepted way of splitting the data from pg_xlog? I've been testing some configurations for low budget performance, and I haven't been able to make this help vs. one disk.(under osx, ymmv) I rsync'd the pg_xlog directory to another disk, then set up a symlink pointing from the data/pg_xlog to /other/disk/pg_xlog. I then got tps numbers that were 2/3 of the single ide drive speed. The only explanation I can come up with is that somethingis seeking to the symlink, then doing the actual write on the other drive. I'm going to try this under linux using mount points, but I need to shuffle hardware first. any ideas? eric
Eric, > I'm going to try this under linux using mount points, but I need to shuffle hardware first. This is the only way I've done it. I'm not sure what the Mac problem is. -- -Josh Berkus Aglio Database Solutions San Francisco
On Mon, 9 Dec 2002, eric soroos wrote: > > > > Er, no, it's not. In fact, for a 3-disk config, I reccommend: > > > > Disk 1: OS, swap, system logs > > Disk 2: Data + Indexes > > Disk 3: Transaction Log > > What is the accepted way of splitting the data from pg_xlog? You really can't split it so to speak. It all needs to be in one place. Or do you mean splitting the load? Maybe putting it onto a RAID0 partition, but that's chancy. > I've been testing some configurations for low budget performance, and I > haven't been able to make this help vs. one disk. (under osx, ymmv) I haven't found anything that helps much either, except for fast drives. You can, however, turn on the noatime mounting option under Linux (BSD has something similar) and it should help speed things up on any file system. You can also try turning on the async option, but I'm not sure this is a problem or not for data integrity on a transaction log file system. Comments? > I rsync'd the pg_xlog directory to another disk, then set up a symlink > pointing from the data/pg_xlog to /other/disk/pg_xlog. > > I then got tps numbers that were 2/3 of the single ide drive speed. The > only explanation I can come up with is that something is seeking to the > symlink, then doing the actual write on the other drive. rsync isn't still running is it? you can just use the cp command while the database is shut down to move the pg_xlog dir. like so: pg_ctl stop mkdir /mnt/bigdog/pg_xlog chown postgres.postgres /mnt/bigdog/pg_xlog chmod 700 /mnt/bigdog/pg_xlog cd $PGDATA cp -Rfp pg_xlog/* /mnt/bigdog/pg_xlog/ mv pg_xlog pg_xlog.old (I always keep stuff till I'm sure I really don't need it.) ln -s /mnt/bigdog/pg_xlog pg_xlog pg_ctl start Don't forget, noatime in the mount options, makes a big difference.
> > I've been testing some configurations for low budget performance, and I > > haven't been able to make this help vs. one disk. (under osx, ymmv) > > I haven't found anything that helps much either, except for fast drives. > > You can, however, turn on the noatime mounting option under Linux (BSD has > something similar) and it should help speed things up on any file system. I don't think that's an option for osx/hfs mounts, at least mount doesn't list it. (not that mount really works on 10.1.x,but whatever) > You can also try turning on the async option, but I'm not sure this is a > problem or not for data integrity on a transaction log file system. > Comments? from man mount: async All I/O to the file system should be done asynchronously. This is a dangerous flag to set, and should not be used unless you are prepared to recreate the file system should your system crash. I'd guess that this is about the same as fsync = off, except that it's your os lying to you instead of your database. > > > I rsync'd the pg_xlog directory to another disk, then set up a symlink > > pointing from the data/pg_xlog to /other/disk/pg_xlog. > > > > I then got tps numbers that were 2/3 of the single ide drive speed. The > > only explanation I can come up with is that something is seeking to the > > symlink, then doing the actual write on the other drive. > > rsync isn't still running is it? you can just use the cp command while > the database is shut down to move the pg_xlog dir. like so: rsync == copy, it's just that I remember the command line switches for it. > pg_ctl stop > mkdir /mnt/bigdog/pg_xlog > chown postgres.postgres /mnt/bigdog/pg_xlog > chmod 700 /mnt/bigdog/pg_xlog > cd $PGDATA > cp -Rfp pg_xlog/* /mnt/bigdog/pg_xlog/ > mv pg_xlog pg_xlog.old (I always keep stuff till I'm sure I really don't > need it.) > ln -s /mnt/bigdog/pg_xlog pg_xlog > pg_ctl start > This is about what I did, except that /mnt/bigdog/pg_xlog == /Volumes/scsi1. Where you can do something different is mount bigdog at data/pg_xlog, instead of using the symlinks. Given the interestingstate of filesystem tools under osx, I can't really do that. (at least under 10.1.5, looks like the laptop running10.2 has a little more info. not that the laptop has room for a 3.5" 10k rpm scsi drive & pci scsi card for testing...) eric
On Mon, 9 Dec 2002, Josh Berkus wrote: > > Depesz, > > > i have a question about best harddisk configuration for postgresql > > performance. > > of course i know that: > > - scsi is better than ide > > - 2 disks are better than 1 > > - 3 disks are better than 2 > > > > i know that with 3 disks one should move xlog to one drive, index files > > to second and tables to third. > > that's clear. > > Er, no, it's not. In fact, for a 3-disk config, I reccommend: > > Disk 1: OS, swap, system logs > Disk 2: Data + Indexes > Disk 3: Transaction Log Actually, first I'd try one big RAID 5 and see how it runs. THEN I'd spend time mucking around with different configs if that wasn't fast enough. If you need x performance and get 10x with a RAID 5 then move on to more interesting problems. > > but: > > will making software raid on this discs provide performance increase or > > decrease? > > Hardware RAID can improve *read* performance, particilarly RAIDs 1, 01, and > 10. For writing, the best you can do is having it not inhibit performance. > The general testament is that *software* RAID does not improve things at all; > actually, the best that can be said for Linux Software RAID 1 is that it does > not harm performance much. Not in my experience. I'd estimate my test box with dual 18 Gig UW scsis runs about 1.5 to 1.8 times faster with the two drives in a RAID1 as if a single one is used. Bonnie confirms this. single drive can read about 25 Megs a second, a pair in a RAID1 reads at about 48 Megs a second. But as you pointed out in your reply, it's more important to look at how he's gonna drive the database. If it has to input hundreds of short queries a second, that's a whole different problem than a data warehouse with 500 people throwing 8 way joins at the data all day.
Scott, > Actually, first I'd try one big RAID 5 and see how it runs. THEN I'd > spend time mucking around with different configs if that wasn't fast > enough. If you need x performance and get 10x with a RAID 5 then move on > to more interesting problems. Depends on how much time you have to spend re-installing. IMHO, RAID 5 is slower that straight disks for Postgres, especially with large numbers of writes. This may not be true for $1000 RAID controllers, but I have yet to use one. I have a box with a low-end RAID 5 controller, and it drives like a single IDE drive on large UPDATE queries. Slower, somethimes. > Not in my experience. I'd estimate my test box with dual 18 Gig UW scsis > runs about 1.5 to 1.8 times faster with the two drives in a RAID1 as if > a single one is used. Bonnie confirms this. single drive can read about > 25 Megs a second, a pair in a RAID1 reads at about 48 Megs a second. This is Linux software RAID? > But as you pointed out in your reply, it's more important to look at how > he's gonna drive the database. If it has to input hundreds of short > queries a second, that's a whole different problem than a data warehouse > with 500 people throwing 8 way joins at the data all day. Definitely. -- -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Josh Berkus <josh@agliodbs.com> writes: >> I'm going to try this under linux using mount points, but I need to shuffle > hardware first. > This is the only way I've done it. I'm not sure what the Mac problem is. It sounds like OS X fails to optimize repeated lookups of the same symlink. I haven't tried to do any performance measurement of this myself, but if true a gripe to Apple would be in order. Most of the designs I've seen for clean tablespace handling will depend on symlinks much more than we do today, so a performance penalty for symlinks will *really* hurt further down the road. regards, tom lane
On Mon, 9 Dec 2002, Josh Berkus wrote: > > Scott, > > > Actually, first I'd try one big RAID 5 and see how it runs. THEN I'd > > spend time mucking around with different configs if that wasn't fast > > enough. If you need x performance and get 10x with a RAID 5 then move on > > to more interesting problems. > > Depends on how much time you have to spend re-installing. IMHO, RAID 5 is > slower that straight disks for Postgres, especially with large numbers of > writes. This may not be true for $1000 RAID controllers, but I have yet to > use one. Even the fastest RAID 5 boxes aren't superfast, but a RAID5 of 15k drives with a lot of drive in it does OK, since it can 1: spread small writes around on many different drives (i.e. if you have 12 drives, and a lot of small writes, a lot of them will be on different drives.) as well as spreading out random reads, while providing good large reads, i.e. sequential scans. The key to good RAID 5 is to throw as many drives as you possibly can at a problem, preferably across several SCSI interfaces. Or FC-AL. > I have a box with a low-end RAID 5 controller, and it drives like a single IDE > drive on large UPDATE queries. Slower, somethimes. Many low end RAID 5 controllers are pretty slow. The adaptec AIC133 series (I think that's the right number) are total dogs. The older AMI Mega raids were fast for their day, but any decent 350 MHz machine with a dual channed SymBIOS card will outrun it at RAID 5. > > Not in my experience. I'd estimate my test box with dual 18 Gig UW scsis > > runs about 1.5 to 1.8 times faster with the two drives in a RAID1 as if > > a single one is used. Bonnie confirms this. single drive can read about > > 25 Megs a second, a pair in a RAID1 reads at about 48 Megs a second. > > This is Linux software RAID? Yep. The kernel level drivers are quite fast in my experience, but they don't seem to give any improvement when layered (i.e. 1+0 or 0+1) over whatever is the slowest of the two layers. I.e. setting up a RAID5 of RAID0s results in almost the exact same performance as if you'd just setup the same number of drives under RAID 5 as you had mirror sets in RAID0. Since this is the case, you get better performance just going to RAID 5 with twice the disks and twice (-1n) the space.
i'm replying to my own letter to gather all replies in one mail. i got some replies. some of them are useful. some aren't really. i'm not really sure how the usage can modify "what is best for some part of database files". can you explain me how comes that for some uses it's best (performance-wise) to keep xlog's on straight disc, and tables on raid5 with lots' of disks, and for some other uses it's better to keep xlog on raid0 and tables on raid 10? anyway: what i understood is that usually the best (performance-wise), would be to put: xlog - separate - unraid'ed disk, or raid0 tables - any raid, but not raid 1 indices - any raid, but not raid 1 thanks for all replies. depesz -- hubert depesz lubaczewski http://www.depesz.pl/ ------------------------------------------------------------------------ Mój Boże, spraw abym milczał, dopóki się nie upewnię, że naprawdę mam coś do powiedzenia. (c) 1998 depesz
Attachment
On Tue, 2002-12-10 at 15:16, Hubert depesz Lubaczewski wrote: [snip] > anyway: what i understood is that usually the best (performance-wise), > would be to put: > xlog - separate - unraid'ed disk, or raid0 Unless your data is *easily* recreatable, NEVER RAID0!! > tables - any raid, but not raid 1 > indices - any raid, but not raid 1 Why not RAID1 (mirroring)? It speeds up both reads and writes. Things are also dependent on the RAID controller, since they all have different strengths and weaknesses. -- +---------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "My advice to you is to get married: If you find a good wife, | | you will be happy; if not, you will become a philosopher." | | Socrates | +---------------------------------------------------------------+
I have two tables A and B where A is a huge table with thousands of rows, B is a small table with only a couple of entries. I want to do something like SELECT A.ID A.Name FROM A JOIN B ON (A.ID = B.ID) And on the other hand I can have something like this SELECT A.ID A.Name FROM A WHERE A.ID IN (B_Id_List) B_Id_List is a string concatenation of B.ID. (ie, 1,2,3,4,5 ...) Which one is faster, more efficient? And if you could, which one is faster/more efficient under MS SQL Server 7? I am trying to develop a cross platform query, that is why I need to concern with performance under different databases. Thanks a lot! Wei
On Tue, 2002-12-10 at 23:20, Wei Weng wrote: > I have two tables A and B where A is a huge table with thousands of rows, B > is a small table with only a couple of entries. > > I want to do something like > > SELECT > A.ID > A.Name > FROM > A JOIN B ON (A.ID = B.ID) How is this query any different from: SELECT A.ID, A.Name FROM A, B WHERE A.ID = B.ID > And on the other hand I can have something like this > > SELECT > A.ID > A.Name > FROM > A > WHERE > A.ID IN (B_Id_List) > > B_Id_List is a string concatenation of B.ID. (ie, 1,2,3,4,5 ...) > > Which one is faster, more efficient? > > And if you could, which one is faster/more efficient under MS SQL Server 7? > I am trying to develop a cross platform query, that is why I need to > concern with performance under different databases. -- +---------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "My advice to you is to get married: If you find a good wife, | | you will be happy; if not, you will become a philosopher." | | Socrates | +---------------------------------------------------------------+
I don't think there is any. It is just another way to write an outer join. On Wed, 2002-12-11 at 00:46, Ron Johnson wrote: > On Tue, 2002-12-10 at 23:20, Wei Weng wrote: > > I have two tables A and B where A is a huge table with thousands of rows, B > > is a small table with only a couple of entries. > > > > I want to do something like > > > > SELECT > > A.ID > > A.Name > > FROM > > A JOIN B ON (A.ID = B.ID) > > How is this query any different from: > SELECT > A.ID, > A.Name > FROM > A, > B > WHERE > A.ID = B.ID > > > And on the other hand I can have something like this > > > > SELECT > > A.ID > > A.Name > > FROM > > A > > WHERE > > A.ID IN (B_Id_List) > > > > B_Id_List is a string concatenation of B.ID. (ie, 1,2,3,4,5 ...) > > > > Which one is faster, more efficient? > > > > And if you could, which one is faster/more efficient under MS SQL Server 7? > > I am trying to develop a cross platform query, that is why I need to > > concern with performance under different databases. -- Wei Weng Network Software Engineer KenCast Inc.
On Wed, Dec 11, 2002 at 11:26:20AM -0500, Wei Weng wrote: > I don't think there is any. It is just another way to write an outer > join. That's not exactly true. Doing A JOIN B ON (A.ID=B.ID) constrains the planner. See the section on explicit join order in the PostgreSQL manual. The IN locution, by the way, is almost always bad in Postgres. Avoid it. A -- ---- Andrew Sullivan 204-4141 Yonge Street Liberty RMS Toronto, Ontario Canada <andrew@libertyrms.info> M2P 2A8 +1 416 646 3304 x110