Thread: Maximum Possible Insert Performance?
My situation is this. We have a semi-production server where we pre-process data and then upload the finished data to our production servers. We need the fastest possible write performance. Having the DB go corrupt due to power loss/OS crash is acceptable because we can always restore from last night and re-run everything that was done since then. I already have fsync off. Short of buying more hardware -- which I will probably do anyways once I figure out whether I need more CPU, memory or disk -- what else can I do to max out the speed? Operation mix is about 50% select, 40% insert, 10% update.
William Yu <wyu@talisys.com> writes:
> [ we don't care about data integrity ]
> I already have fsync off. Short of buying more hardware -- which I will
> probably do anyways once I figure out whether I need more CPU, memory or
> disk -- what else can I do to max out the speed? Operation mix is about
> 50% select, 40% insert, 10% update.

Batch operations so you commit more than one insert per transaction. (With
fsync off, this isn't such a killer consideration as it would be with fsync
on, but the per-transaction overhead is still nontrivial.) Get rid of as
many integrity constraints as you feel can reasonably be postponed to the
final upload. FK checks are particularly painful. Eliminate indexes where
possible.

Also (I hate to say this, but...) you should consider using Some Other
Database. "I don't care about data integrity, only speed" sounds like a
good fit to MySQL ...

regards, tom lane
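Concretely, Tom's batching and constraint advice might look something like
the sketch below. The table "staging", its index, and its foreign key
constraint are made-up names; adjust to the real schema.

psql mydb <<'EOF'
-- Drop what can be rebuilt after the load; FK checks and index
-- maintenance are otherwise paid on every single row.
ALTER TABLE staging DROP CONSTRAINT staging_parent_fkey;
DROP INDEX staging_foo_idx;

-- Commit many inserts per transaction instead of one commit per row.
BEGIN;
INSERT INTO staging (id, foo, parent_id) VALUES (1, 'a', 10);
INSERT INTO staging (id, foo, parent_id) VALUES (2, 'b', 10);
-- ... thousands more ...
COMMIT;

-- Rebuild once, after the data is in.
CREATE INDEX staging_foo_idx ON staging (foo);
ALTER TABLE staging ADD CONSTRAINT staging_parent_fkey
    FOREIGN KEY (parent_id) REFERENCES parent (id);
EOF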
William,

> I already have fsync off. Short of buying more hardware -- which I will
> probably do anyways once I figure out whether I need more CPU, memory or
> disk -- what else can I do to max out the speed? Operation mix is about
> 50% select, 40% insert, 10% update.

Disk. Multi-channel RAID is where it's at, and/or RAID with a great write
cache enabled. For really fast updates, I'd suggest 6-disk or even 8-disk
RAID 1+0. As soon as you have gobs of extra disk space, jack your
checkpoint_buffers way up, like a couple of gigs.

--
Josh Berkus
Aglio Database Solutions
San Francisco
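For what it's worth, the knobs being alluded to would look roughly like
the following in postgresql.conf. "checkpoint_buffers" is not a literal
parameter name, so this sketch assumes checkpoint_segments and
shared_buffers are what's meant; names, units and sensible values vary by
version, and the numbers are placeholders rather than recommendations.

cat >> "$PGDATA/postgresql.conf" <<'EOF'
fsync = false              # already off, per the original post
checkpoint_segments = 30   # allow roughly 1GB of WAL between checkpoints
shared_buffers = 32768     # counted in 8kB pages here, i.e. about 256MB
wal_buffers = 64
EOF
pg_ctl -D "$PGDATA" restart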
William Yu wrote:
> My situation is this. We have a semi-production server where we
> pre-process data and then upload the finished data to our production
> servers. We need the fastest possible write performance. Having the DB
> go corrupt due to power loss/OS crash is acceptable because we can
> always restore from last night and re-run everything that was done since
> then.

If you can, use COPY -- it is far faster than INSERT. See:
http://www.postgresql.org/docs/current/static/sql-copy.html

HTH,

Joe
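A minimal example of the difference, assuming a hypothetical table
"staging" and a tab-separated dump at /tmp/staging.tsv:

# Server-side COPY (the file must be readable by the server process):
psql mydb -c "COPY staging FROM '/tmp/staging.tsv'"
# Or client-side, if the file lives on the machine running psql:
psql mydb -c "\copy staging from '/tmp/staging.tsv'"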
William Yu wrote:
> My situation is this. We have a semi-production server where we
> pre-process data and then upload the finished data to our production
> servers. We need the fastest possible write performance. Having the DB
> go corrupt due to power loss/OS crash is acceptable because we can
> always restore from last night and re-run everything that was done since
> then.
>
> I already have fsync off. Short of buying more hardware -- which I will
> probably do anyways once I figure out whether I need more CPU, memory or
> disk -- what else can I do to max out the speed? Operation mix is about
> 50% select, 40% insert, 10% update.

Mount WAL on a RAM disk. WAL is the most heavily hit area for heavy
updates/inserts. If you speed that up, things should be considerably
faster. This is untried advice, though. Given that you can afford a crash,
I would say it is worth a try..

Shridhar
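For anyone wanting to try this, the mechanics on Linux would be roughly
the following -- an untested sketch, assuming a data directory at
/var/lib/pgsql/data and accepting that everything in pg_xlog vanishes on a
crash or reboot (tolerable for the original poster, who restores from last
night and re-runs).

# (as postgres) stop the server
pg_ctl -D /var/lib/pgsql/data stop
# (as root) create and mount the RAM disk
mkdir /mnt/walram
mount -t tmpfs -o size=512m tmpfs /mnt/walram
chown postgres:postgres /mnt/walram
# (as postgres) relocate pg_xlog onto it and symlink the old path
mv /var/lib/pgsql/data/pg_xlog /mnt/walram/pg_xlog
ln -s /mnt/walram/pg_xlog /var/lib/pgsql/data/pg_xlog
pg_ctl -D /var/lib/pgsql/data start
# Note: the tmpfs is empty after a reboot, so plan on restoring from backup.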
William Yu wrote:
> My situation is this. We have a semi-production server where we
> pre-process data and then upload the finished data to our production
> servers. We need the fastest possible write performance. Having the DB
> go corrupt due to power loss/OS crash is acceptable because we can
> always restore from last night and re-run everything that was done since
> then.
>
> I already have fsync off. Short of buying more hardware -- which I will
> probably do anyways once I figure out whether I need more CPU, memory or
> disk -- what else can I do to max out the speed? Operation mix is about
> 50% select, 40% insert, 10% update.

In line with what Tom Lane said, you may want to look at the various
memory databases available (I'm not familiar with any one to recommend,
though). If you can fit the whole database in RAM, that would work great;
if not, you may be able to split the DB up and put the most used tables
just in the memory database.

I have also seen a number of tutorials on how to put a database on a RAM
disk. This helps, but it's still not as fast as a database server that's
designed to keep all its data in RAM.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
This is an intriguing thought which leads me to think about a similar
solution for even a production server, and that's a solid state drive for
just the WAL. What's the max disk space the WAL would ever take up?
There's quite a few 512MB/1GB/2GB solid state drives available now in the
~$200-$500 range, and if you never hit those limits...

When my current job batch is done, I'll save a copy of the dir and give
the WAL on ramdrive a test. And perhaps even buy a SanDisk at the local
store and run that through the hopper.

Shridhar Daithankar wrote:
> Mount WAL on a RAM disk. WAL is the most heavily hit area for heavy
> updates/inserts. If you speed that up, things should be considerably
> faster.
William,

> When my current job batch is done, I'll save a copy of the dir and give
> the WAL on ramdrive a test. And perhaps even buy a SanDisk at the local
> store and run that through the hopper.

We'll be interested in the results. The SanDisk won't be much of a
performance test; last I checked, their access speed was about 1/2 that of
a fast SCSI drive. But it could be a feasibility test for the more
expensive RAMdrive approach.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus wrote:
> William,
>
>> When my current job batch is done, I'll save a copy of the dir and give
>> the WAL on ramdrive a test. And perhaps even buy a SanDisk at the local
>> store and run that through the hopper.
>
> We'll be interested in the results. The SanDisk won't be much of a
> performance test; last I checked, their access speed was about 1/2 that
> of a fast SCSI drive. But it could be a feasibility test for the more
> expensive RAMdrive approach.

The SanDisks do seem a bit pokey at 16MBps. On the other hand, you could
get 4 of these suckers, put them in a mega-RAID-0 stripe for 64MBps. You
shouldn't need to do mirroring with a solid state drive.

Time to Google up some more solid state drive vendors.
William,

> The SanDisks do seem a bit pokey at 16MBps. On the other hand, you could
> get 4 of these suckers, put them in a mega-RAID-0 stripe for 64MBps. You
> shouldn't need to do mirroring with a solid state drive.

I wouldn't count on RAID0 improving the speed of SanDisks much. How are
you connecting to them? USB? USB doesn't support fast parallel data
access. Now, if it turns out that 256MB ramdisks are less than 1/5 the
cost of 1GB ramdisks, then that's worth considering.

You're right, though, mirroring a solid state drive is pretty pointless;
if power fails, both mirrors are dead.

As I said before, though, we're all very interested in this test. Using a
ramdisk for WAL has been discussed on this list numerous times but not
attempted by anyone who published their results.

All that aside, though, I think you should also experiment with the
Background Writer patch recently discussed on Hackers, as it may give you
a performance boost as well.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Josh Berkus wrote:
> William,
>
>> The SanDisks do seem a bit pokey at 16MBps. On the other hand, you could
>> get 4 of these suckers, put them in a mega-RAID-0 stripe for 64MBps. You
>> shouldn't need to do mirroring with a solid state drive.
>
> I wouldn't count on RAID0 improving the speed of SanDisks much. How are
> you connecting to them? USB? USB doesn't support fast parallel data
> access.

You can get ATA SanDisks up to 2GB. Another vendor I checked out --
BitMicro -- has solid state drives for SATA, SCSI and Fibre Channel. I
definitely would not use USB SSDs -- USB performance would be so pokey as
to be useless.

> Now, if it turns out that 256MB ramdisks are less than 1/5 the cost of
> 1GB ramdisks, then that's worth considering.

Looks like they're linear with size. A SanDisk Flashdrive 1GB is about
$1000 while 256MB is $250.

> You're right, though, mirroring a solid state drive is pretty pointless;
> if power fails, both mirrors are dead.

Actually no. Solid state memory is non-volatile. They retain data even
without power.
William Yu <wyu@talisys.com> writes:

>> You're right, though, mirroring a solid state drive is pretty
>> pointless; if power fails, both mirrors are dead.
>
> Actually no. Solid state memory is non-volatile. They retain data even
> without power.

Note that flash ram only has a finite number of write cycles before it
fails.

On the other hand that might not be so bad for WAL which writes
sequentially, you can easily calculate how close you are to the maximum.
For things like heap storage or swap it's awful as you can get hot spots
that get written to thousands of times before the rest of the space is
used.

--
greg
William Yu wrote:
> This is an intriguing thought which leads me to think about a similar
> solution for even a production server, and that's a solid state drive
> for just the WAL. What's the max disk space the WAL would ever take up?
> There's quite a few 512MB/1GB/2GB solid state drives available now in
> the ~$200-$500 range, and if you never hit those limits...

The maximum number of WAL segments at any time is 2*(number of checkpoint
segments)+1, IIRC. So if you have 3 checkpoint segments, you can not have
more than 7 WAL segments at any time. Give or take 1.

Correct me if I am wrong..

Shridhar
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> William Yu wrote:
>> This is an intriguing thought which leads me to think about a similar
>> solution for even a production server, and that's a solid state drive
>> for just the WAL. What's the max disk space the WAL would ever take up?
>
> The maximum number of WAL segments at any time is 2*(number of checkpoint
> segments)+1, IIRC.
> So if you have 3 checkpoint segments, you can not have more than 7 WAL
> segments at any time. Give or take 1.

I don't believe that's a *hard* limit. The system tries to schedule
checkpoints often enough to prevent WAL from getting bigger than that, but
if you had a sufficiently big spike in update activity, it's at least
theoretically possible that more than checkpoint_segments segments could
be filled before the concurrently running checkpoint finishes and releases
some old segments.

The odds of this being a real problem are small, especially if you don't
try to fit on an undersized SSD by reducing checkpoint_segments. I'd think
that a 512MB SSD would be plenty of space for ordinary update load levels
...

regards, tom lane
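As a back-of-the-envelope check of the numbers in this subthread, with
16MB WAL segments the soft ceiling from the 2*checkpoint_segments+1 rule
works out as follows (shell arithmetic, values picked for illustration):

checkpoint_segments=3
echo "$(( (2 * checkpoint_segments + 1) * 16 )) MB"   # 7 segments, ~112 MB
checkpoint_segments=30
echo "$(( (2 * checkpoint_segments + 1) * 16 )) MB"   # 61 segments, ~976 MB

So even a 512MB flash drive comfortably covers the default settings, in
line with Tom's estimate.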
Josh Berkus wrote:
> William,
>
>> When my current job batch is done, I'll save a copy of the dir and give
>> the WAL on ramdrive a test. And perhaps even buy a SanDisk at the local
>> store and run that through the hopper.
>
> We'll be interested in the results. The SanDisk won't be much of a
> performance test; last I checked, their access speed was about 1/2 that
> of a fast SCSI drive. But it could be a feasibility test for the more
> expensive RAMdrive approach.

Some initial numbers. I simulated a CPU increase by underclocking the
processors. Most of the time, performance does not scale linearly with
clock speed, but since I also underclocked the FSB and memory bandwidth
with the CPU, it's nearly an exact match.

1.15GHz   6.14
1.53GHz   6.97
+33% CPU = +13.5% performance

I then simulated adding a heapload of extra memory by running my job a
second time. Unfortunately, to keep my 25GB DB mostly cached in memory,
the word heapload is too accurate.

Run 1   6.97
Run 2   7.99
+14%

I popped in an extra IDE hard drive to store the WAL files and that
boosted the numbers by a little. From looking at iostat, the ratio looked
like 300K/s WAL for 1MB/s data.

WAL+Data on same disk   6.97
WAL+Data separated      7.26
+4%

I then tried to put the WAL directory onto a ramdisk. I turned off
swapping, created a tmpfs mount point and copied the pg_xlog directory
over. Everything looked fine as far as I could tell but Postgres just
panic'd with a "file permissions" error. Anybody have thoughts on why
tmpfs would not work?
William Yu <wyu@talisys.com> writes:
> I then tried to put the WAL directory onto a ramdisk. I turned off
> swapping, created a tmpfs mount point and copied the pg_xlog directory
> over. Everything looked fine as far as I could tell but Postgres just
> panic'd with a "file permissions" error. Anybody have thoughts on why
> tmpfs would not work?

I'd say you got the file or directory ownership or permissions wrong.

regards, tom lane
Tom Lane wrote:
> William Yu <wyu@talisys.com> writes:
>
>> I then tried to put the WAL directory onto a ramdisk. I turned off
>> swapping, created a tmpfs mount point and copied the pg_xlog directory
>> over. Everything looked fine as far as I could tell but Postgres just
>> panic'd with a "file permissions" error. Anybody have thoughts on why
>> tmpfs would not work?
>
> I'd say you got the file or directory ownership or permissions wrong.

I did a mv instead of a cp, which duplicates ownership & permissions
exactly.
But the permissions of the base ramdisk might be wrong. I'd su to the user
that you run postgres as (probably postgres), and make sure that you can
go to the directory where the log and the database files are and make sure
you can see the files.

On Wed, Nov 26, 2003 at 10:03:47AM -0800, William Yu wrote:
> Tom Lane wrote:
>> William Yu <wyu@talisys.com> writes:
>>
>>> I then tried to put the WAL directory onto a ramdisk. I turned off
>>> swapping, created a tmpfs mount point and copied the pg_xlog directory
>>> over. Everything looked fine as far as I could tell but Postgres just
>>> panic'd with a "file permissions" error. Anybody have thoughts on why
>>> tmpfs would not work?
>>
>> I'd say you got the file or directory ownership or permissions wrong.
>
> I did a mv instead of a cp, which duplicates ownership & permissions
> exactly.

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
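A quick way to act on that suggestion -- assuming, hypothetically, that
the tmpfs is mounted at /mnt/walram and the server runs as user "postgres"
-- is below. A freshly created mount point belongs to root, so the
moved-in pg_xlog can look fine while its parent directory is the real
problem.

# Check what the postgres user can actually see and traverse:
su - postgres -c 'ls -ld /mnt/walram /mnt/walram/pg_xlog'
# If the mount point is the culprit, hand it to postgres and lock down pg_xlog:
chown postgres:postgres /mnt/walram
chmod 700 /mnt/walram/pg_xlog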
On Mon, 2003-11-24 at 19:16, Greg Stark wrote:
> William Yu <wyu@talisys.com> writes:
>
>>> You're right, though, mirroring a solid state drive is pretty
>>> pointless; if power fails, both mirrors are dead.
>>
>> Actually no. Solid state memory is non-volatile. They retain data even
>> without power.
>
> Note that flash ram only has a finite number of write cycles before it
> fails.
>
> On the other hand that might not be so bad for WAL which writes
> sequentially, you can easily calculate how close you are to the maximum.
> For things like heap storage or swap it's awful as you can get hot spots
> that get written to thousands of times before the rest of the space is
> used.

I could be wrong, but I was under the impression that most of the newer
flash disks tended to spread writes out over the drive so that hotspots
are minimized.

--
Suchandra Thapa <ssthapa@netzero.com>