Thread: What's better: Raid 0 or disk for separate pg_xlog
From http://www.powerpostgresql.com/PerfList/

"even in a two-disk server, you can put the transaction log onto the
operating system disk and reap some benefits."

Context: I have a two disk server that is about to become dedicated to
postgresql (it's a sun v40z running gentoo linux).

What's "theoretically better"?

1) OS and pg_xlog on one disk, rest of postgresql on the other? (if I
   understand the above correctly)
2) Everything striped Raid 0?
3) <some answer from someone smarter than me>

TIA,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
Karim Nassar wrote:
>Context: I have a two disk server that is about to become dedicated to
>postgresql (it's a sun v40z running gentoo linux).
>
>What's "theoretically better"?
>
>1) OS and pg_xlog on one disk, rest of postgresql on the other? (if I
>   understand the above correctly)
>2) Everything striped Raid 0?
>

How lucky are you feeling?

If you don't mind doubling your chances of data loss (a bit worse than
that because recovery is nearly impossible), go ahead and use RAID 0
(which of course is not RAID by definition).

The WAL on a separate disk is your best bet if the problem is slow
updates.

If prevention of data loss is a consideration, RAID 1 (mirroring) is the
answer.
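To put rough numbers on "doubling your chances of data loss", here is a minimal sketch. It assumes both disks fail independently with the same probability p over some period; the 3% figure is purely illustrative, not a measured failure rate, and the function names are made up for this example.

```python
def raid0_loss_probability(p: float, disks: int = 2) -> float:
    """Probability that a stripe set loses data: any single disk failing is fatal."""
    return 1.0 - (1.0 - p) ** disks


def raid1_loss_probability(p: float, disks: int = 2) -> float:
    """Probability that a mirror loses data: every disk must fail.

    Ignores the rebuild window and correlated failures, so it is optimistic.
    """
    return p ** disks


if __name__ == "__main__":
    p = 0.03  # hypothetical per-disk failure probability over the period
    print(f"single disk: {p:.4f}")
    print(f"RAID 0:      {raid0_loss_probability(p):.4f}")
    print(f"RAID 1:      {raid1_loss_probability(p):.4f}")
```

For two disks the RAID 0 figure works out to 2p - p^2, which is just under double the single-disk risk. The "a bit worse" in the reply is about the consequences (nothing on either disk is recoverable), not about the probability itself.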
Karim Nassar wrote:
>From http://www.powerpostgresql.com/PerfList/
>
>"even in a two-disk server, you can put the transaction log onto the
>operating system disk and reap some benefits."
>
>Context: I have a two disk server that is about to become dedicated to
>postgresql (it's a sun v40z running gentoo linux).
>
>What's "theoretically better"?
>
>1) OS and pg_xlog on one disk, rest of postgresql on the other? (if I
>   understand the above correctly)
>2) Everything striped Raid 0?
>3) <some answer from someone smarter than me>
>
>TIA,
>
>

With 2 disks, you have 3 options: RAID0, RAID1, and 2 independent disks.

RAID0 - Fastest read and write speed. Not redundant; if either disk
fails you lose everything on *both* disks.

RAID1 - Redundant, slow write speed, but should be fast read speed. If
one disk fails, you have a backup.

2 independent - With pg_xlog on a separate disk, writing (updates)
should stay reasonably fast. If one disk dies, you lose that disk, but
not both.

How critical is your data? How update heavy versus read heavy, etc. are
you? Do you have a way to restore the database if something fails? If
you do nightly pg_dumps, will you survive if you lose a day's worth of
transactions?

In general I would recommend RAID1, because that is the safe bet. If
your db is the bottleneck, and your data isn't all that critical, and
you are read heavy, I would probably go with RAID1, if you are write
heavy I would say 2 independent disks.

John
=:->
On Thursday, 10 March 2005 08:44, Karim Nassar wrote:
> From http://www.powerpostgresql.com/PerfList/
>
> "even in a two-disk server, you can put the transaction log onto the
> operating system disk and reap some benefits."
>
> Context: I have a two disk server that is about to become dedicated to
> postgresql (it's a sun v40z running gentoo linux).
>
> What's "theoretically better"?
>
> 1) OS and pg_xlog on one disk, rest of postgresql on the other? (if I
>    understand the above correctly)
> 2) Everything striped Raid 0?
> 3) <some answer from someone smarter than me>

Because of hard disk seek times, a separate disk for WAL will be a lot
better.

regards
Thanks to all for the tips.

On Thu, 2005-03-10 at 09:26 -0600, John A Meinel wrote:

> How critical is your data? How update heavy versus read heavy, etc are you?

Large, relatively infrequent uploads, with frequent reads. The
application is a web front-end to scientific research data. The
scientists have their own copy of the data, so if something went really
bad, we could probably get them to upload again.

> Do you have a way to restore the database if something fails? If
> you do nightly pg_dumps, will you survive if you lose a days worth of
> transactions?

For now, we have access to a terabyte backup server, and the DB is small
enough that my sysadmin lets me have hourly pg_dumps for the last 24
hours, backed up nightly. Veritas is configured to save daily pg_dumps
for the last week, a weekly dump for the last month and a monthly
version for the last 6 months.

> In general I would recommend RAID1, because that is the safe bet. If
> your db is the bottleneck, and your data isn't all that critical, and
> you are read heavy, I would probably go with RAID1, if you are write
> heavy I would say 2 independent disks.

I feel that we have enough data safety such that I want to go for speed.
Some of the queries are very large joins, and I am going for pure
throughput at this point - unless someone can find a hole in my backup
tactic.

Of course, later we will have money to throw at more spindles. But for
now, I am trying to gaze into the future and maximize my current
capabilities.

Seems to me that the "best" solution would be:

* disk 0 partition 1..n - os mounts
         partition n+1  - /var/lib/postgres/data/pg_xlog

* disk 1 partition 1    - /var/lib/postgres/data

* Further (safe) performance gains can be had by adding more spindles as
  such:
   - first disk: RAID1 to disk 1
   - next 2 disks: RAID 0 across the above

Do I grok it?

Thanks again,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221
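A layout like the one proposed above is commonly achieved by moving pg_xlog onto the other disk and leaving a symlink in the data directory. A minimal sketch follows, assuming the server is stopped first and the script runs as the operating-system user that owns the data directory. `relocate_pg_xlog` is a hypothetical helper written for this illustration, not part of PostgreSQL.

```python
import os
import shutil


def relocate_pg_xlog(data_dir: str, new_xlog_dir: str) -> None:
    """Move data_dir/pg_xlog to new_xlog_dir and leave a symlink behind.

    The PostgreSQL server must be stopped before calling this.
    """
    old_xlog = os.path.join(data_dir, "pg_xlog")
    if os.path.islink(old_xlog):
        raise RuntimeError(f"{old_xlog} is already a symlink")
    if not os.path.isdir(old_xlog):
        raise FileNotFoundError(f"{old_xlog} not found")
    if os.path.exists(new_xlog_dir):
        raise FileExistsError(f"{new_xlog_dir} already exists")

    # shutil.move falls back to copy-and-delete when the target is on
    # another filesystem, which is the case when moving to a second disk.
    shutil.move(old_xlog, new_xlog_dir)

    # The server keeps looking for data_dir/pg_xlog, so point it at the new home.
    os.symlink(os.path.abspath(new_xlog_dir), old_xlog)
```

A call might look like `relocate_pg_xlog("/var/lib/postgres/data", "/mnt/disk0/pg_xlog")`, where `/mnt/disk0` is an assumed mount point for the partition on disk 0. Check ownership and permissions of the new directory before restarting the server.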
Karim Nassar wrote:
>Thanks to all for the tips.
>
> ...
>>In general I would recommend RAID1, because that is the safe bet. If
>>your db is the bottleneck, and your data isn't all that critical, and
>>you are read heavy, I would probably go with RAID1, if you are write
>>
                                               ^^^^^ -> RAID0
>>heavy I would say 2 independent disks.
>>
>>
>
>I feel that we have enough data safety such that I want to go for speed.
>Some of the queries are very large joins, and I am going for pure
>throughput at this point - unless someone can find a hole in my backup
>tactic.
>
>Of course, later we will have money to throw at more spindles. But for
>now, I am trying gaze in to the future and maximize my current
>capabilities.
>
>
>Seems to me that the "best" solution would be:
>
>* disk 0 partition 1..n - os mounts
>         partition n+1  - /var/lib/postgres/data/pg_xlog
>
>* disk 1 partition 1    - /var/lib/postgres/data
>
>* Further (safe) performance gains can be had by adding more spindles as
>such:
>   - first disk: RAID1 to disk 1
>   - next 2 disks: RAID 0 across the above
>
>

Sounds decent to me.

I did make a mistake earlier (marked above): I meant that if you are
read heavy you might want to consider RAID0. But the performance gains
might be small, and you potentially lose everything.

But your update strategy seems dead on.

>Do I grok it?
>
>Thanks again,
>
>

John
=:->
Karim Nassar wrote:
> Thanks to all for the tips.
>
> On Thu, 2005-03-10 at 09:26 -0600, John A Meinel wrote:
>
>>How critical is your data? How update heavy versus read heavy, etc are you?
>
>
> Large, relatively infrequent uploads, with frequent reads. The
> application is a web front-end to scientific research data. The
> scientists have their own copy of the data, so if something went really
> bad, we could probably get them to upload again.

If you have very few updates and your reads aren't mostly from RAM you
could be better off with simply mirroring (assuming that gains you read
bandwidth). Failing that, use the tablespace feature to balance your
read load as far as you can.

--
Richard Huxton
Archonet Ltd
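The tablespace feature mentioned above arrived in PostgreSQL 8.0. As a rough illustration of how read load might be spread across the two disks, the helper below only builds the SQL text; it does not connect to a database. The tablespace name, directory, and table name are hypothetical, and `tablespace_sql` is an invented helper for this sketch.

```python
import re
from typing import List

# Plain lowercase identifiers only, so the generated SQL needs no quoting.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def _check_ident(name: str) -> str:
    """Return name unchanged if it is a plain identifier, else raise."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def tablespace_sql(tablespace: str, directory: str, tables: List[str]) -> List[str]:
    """Build statements that create a tablespace and move tables into it."""
    if "'" in directory:
        raise ValueError("directory must not contain a single quote")
    ts = _check_ident(tablespace)
    statements = [f"CREATE TABLESPACE {ts} LOCATION '{directory}';"]
    for table in tables:
        statements.append(f"ALTER TABLE {_check_ident(table)} SET TABLESPACE {ts};")
    return statements


if __name__ == "__main__":
    # Hypothetical names: a directory on disk 0 and one heavily read table.
    for stmt in tablespace_sql("disk0_space", "/mnt/disk0/pg_tblspc", ["measurements"]):
        print(stmt)
```

The directory has to exist already, be empty, and be owned by the PostgreSQL system user. Moving a table with `ALTER TABLE ... SET TABLESPACE` rewrites it and locks it while doing so, so it is best done outside peak read hours.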