Thread: MVCC for massively parallel inserts
How good is Postgres' performance for massive simultaneous insertions into the same heavily-indexed table? Are there any studies or benchmarks I can look at for that? I understand Postgres uses MVCC rather than conventional locking, which makes it easier to do parallel inserts.

In my environment, I will have so many inserts that it is unworkable to have one machine do all the inserting -- it would max out the CPU of even a very powerful machine, and in any case I'd like to avoid spending that much money. One option is to use a cluster of commodity Intel machines running Linux or one of the BSDs. In many database environments that wouldn't buy me much, because only one machine could do inserts while all the others would be used for selects. But I'm going to have tons of inserts and few selects, so I really need many of the machines in the cluster to be able to insert simultaneously. Is this practicable in a clustered environment for Postgres?

Thanks in advance for any insight or references,
steve
"Steven D.Arnold" <stevena@neosynapse.net> writes: > So I really need many of the machines in the cluster to be > able to simultaneously insert. Is this practicable in a clustered > environment for Postgres? No. You are mistaken to think that CPU is the bottleneck, anyway. It is going to be disk bandwidth that's your problem, and throwing more CPUs at it would not help. If your insert load exceeds the available disk bandwidth, I think you have little choice but to figure out a way to divide the workload among multiple independent databases. regards, tom lane
Tom Lane wrote:
> You are mistaken to think that CPU is the bottleneck, anyway. It is
> going to be disk bandwidth that's your problem, and throwing more CPUs
> at it would not help. If your insert load exceeds the available disk
> bandwidth, I think you have little choice but to figure out a way to
> divide the workload among multiple independent databases.

If you have the money to purchase multiple machines, you can throw more hard drives at it instead. Hard drives are cheap... (well, IDE/SATA anyway).

Sincerely,
Joshua D. Drake
Joshua D. Drake wrote:
> If you have the money to purchase multiple machines, you can throw more
> hard drives at it instead.
>
> Hard drives are cheap... (well, IDE/SATA anyway).

Properly set up in a RAID 1/0 array, you'll get much better "bandwidth" out of those drives. Whether you RAID in software or hardware is up to you and your budget -- but if you choose hardware, make sure you actually get a hardware RAID controller, not one of the cheapies that are just multiple IDE controllers with special drivers. We use a 3Ware 7500-series card (can't remember the exact model for the life of me), with 8 drives in a 4x2 RAID 1/0.

IIRC, if the inserts are done in a single transaction, the indexing gets done at the end of the batch rather than after each insert.

Regards
Alex Satrapa
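For concreteness, the batching Alex describes looks like this (a minimal sketch; the table and values are made up for illustration). Whether or not the index maintenance is actually deferred, one commit per batch means one WAL flush per batch instead of one per row:

    BEGIN;
    INSERT INTO measurements (sensor_id, reading) VALUES (1, 42.0);
    INSERT INTO measurements (sensor_id, reading) VALUES (2, 17.5);
    -- ... thousands more rows ...
    COMMIT;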
Alex Satrapa <alex@lintelsys.com.au> writes:
> Properly set up in a RAID 1/0 array, you'll get much better "bandwidth"
> out of those drives. Whether you RAID in software or hardware is up to
> you and your budget -- but if you choose hardware, make sure you
> actually get a hardware RAID controller, not one of the cheapies that
> are just multiple IDE controllers with special drivers.
>
> We use a 3Ware 7500-series card, with 8 drives in a 4x2 RAID 1/0.

I would agree, and if you really need the I/O bandwidth you can go to much larger stripe sets than even this. The documentation I've seen suggested there were benefits up to stripe sets as large as twelve disks across. That would be 24 drives if you're also doing mirroring.

Ideally, separating WAL, index, and heap files onto different drives is good, but you would have to experiment to see which layout works out fastest for a given number of drives.

There are also some alternative approaches that could increase your throughput. For example, you could have the multiple machines receiving the data log it to text files, then copy those files over to the database machine periodically and load them with COPY, which is much faster than individual inserts.

Also, if it fits your model, you could load the data into fresh unindexed tables and then build a new index. Building an index over an already-loaded table is quicker than maintaining it through individual inserts. That would make selects more complex, but perhaps that's not a concern.

> IIRC, if the inserts are done in a single transaction, the indexing
> gets done at the end of the batch rather than after each insert.

I believe this is wrong. The whole point of Postgres's style of MVCC is that each transaction can go ahead and make whatever modifications it needs, marking them with its transaction id; any other transaction simply ignores data marked with the transaction ids of uncommitted transactions. So the index entries are made as each insert happens, and when commit time arrives there's very little work to do beyond marking the transaction as committed.

-- 
greg
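Greg's log-then-COPY approach might look like this (a sketch only; the file path, table, and index names are assumptions, and COPY FROM a server-side file requires superuser):

    -- Bulk-load the accumulated text file in one shot; COPY avoids
    -- the per-statement overhead of individual INSERTs.
    COPY measurements FROM '/var/spool/feeds/batch-0001.txt';

    -- The load-then-index variant: fill a fresh, unindexed table,
    -- then build the index once over the whole batch.
    CREATE TABLE measurements_batch (
        sensor_id  integer,
        reading    float8,
        logged_at  timestamp
    );
    COPY measurements_batch FROM '/var/spool/feeds/batch-0002.txt';
    CREATE INDEX measurements_batch_idx
        ON measurements_batch (sensor_id);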
Greg Stark wrote:
> I would agree, and if you really need the I/O bandwidth you can go to
> much larger stripe sets than even this. The documentation I've seen
> suggested there were benefits up to stripe sets as large as twelve
> disks across. That would be 24 drives if you're also doing mirroring.

Something I have been toying with is getting two of the 12-drive 3Ware cards and running RAID 0+1 across them (with LVM). At just under $300 for the cards and only 80 bucks a drive (80 GB), that is a lot of space, and a lot of speed, for not a lot of money.

Sincerely,
Joshua D. Drake
"Joshua D. Drake" <jd@commandprompt.com> writes: > Something I have been toying with is getting two of the 12 drive 3Ware cards > and running RAID 0+1 across them (with LVM). At just under 300 for the cards > and only 80 bucks a drive (80 Gig)... that is alot of space, and a lot of > speed for not a lot of money. Unless I have things backwards, raid 0+1 means if any drive fails the whole side of the mirror fails. if two drives fail you have a 50/50 chance of the them being on opposite sides of the mirror and losing the whole thing. Even if you don't have a double failure, resyncing seems like it would be a pain in this situation. LVM wouldn't know about the stripe set so it would mean resyncing the entire 12-disk array on the failed side of the mirror. I thought it was generally preferable to do Raid 1+0 (aka "raid 10") where any two drives can fail and as long as they aren't precisely opposite each other you're still ok. And resyncing just means resyncing the one replacement drive, not the whole array. -- greg
>>>>> "GS" == Greg Stark <gsstark@mit.edu> writes: GS> I would agree and if you really need the I/O bandwidth you can go GS> to much larger stripe sets than even this. The documentation I've GS> seen before suggested there were benefits up to stripe sets as GS> large as twelve disks across. That would be 24 drives if you're GS> also doing mirroring. I did a bunch of testing with a 14 disk SCSI array. I found that RAID5 was best over RAID10 and RAID50. GS> Ideally separating WAL, index, and heap files is good, but you GS> would have to experiment to see which works out fastest for a GS> given number of drives. I found that putting the WAL on its own array (in my case a mirror on the other RAID controller channel) helped quite a bit. I don't think it is easy to split off index files to alternate locations with Postgres. Increasing the number of checkpoint segments was one of the biggest improvements I observed for mass-insert performance (as tested while doing a restore on a multi-million row database.) The combination of having the WAL on a separate disk, and letting that grow to be quite large has been very good for my performance and also for reducing disk bandwidth requirements. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-301-869-4449 x806 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/