MVCC for massively parallel inserts

From
Steven D. Arnold
Date:
How good is Postgres' performance for massive simultaneous insertions
into the same heavily-indexed table?  Are there any studies or
benchmarks I can look at for that?

I understand Postgres uses MVCC rather than conventional locking, which
makes it easier to do parallel inserts.  In my environment, I will have
so many inserts that it is unworkable to have one machine do all the
inserting -- it would max out the CPU of even a very powerful machine,
and in any case I'd like to avoid spending that much money.  One option
is to use a cluster of commodity Intel machines running Linux or one of
the BSD's.  In many database environments, that wouldn't buy me much
because only one machine could do inserts while all the others would be
used for selects.  But I'm going to have tons of inserts and few
selects.  So I really need many of the machines in the cluster to be
able to simultaneously insert.  Is this practicable in a clustered
environment for Postgres?

Thanks in advance for any insight or references,
steve


Re: MVCC for massively parallel inserts

From
Tom Lane
Date:
"Steven D.Arnold" <stevena@neosynapse.net> writes:
> So I really need many of the machines in the cluster to be
> able to simultaneously insert.  Is this practicable in a clustered
> environment for Postgres?

No.

You are mistaken to think that CPU is the bottleneck, anyway.  It is
going to be disk bandwidth that's your problem, and throwing more CPUs
at it would not help.  If your insert load exceeds the available disk
bandwidth, I think you have little choice but to figure out a way to
divide the workload among multiple independent databases.

            regards, tom lane

Re: MVCC for massively parallel inserts

From
"Joshua D. Drake"
Date:
>You are mistaken to think that CPU is the bottleneck, anyway.  It is
>going to be disk bandwidth that's your problem, and throwing more CPUs
>at it would not help.  If your insert load exceeds the available disk
>bandwidth, I think you have little choice but to figure out a way to
>divide the workload among multiple independent databases.

If you have the money to purchase multiple machines, you can throw more
hard drives at it...

Hard drives are cheap... (well ide/sata anyway).

Sincerely,

Joshua D. Drake





--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL


Re: MVCC for massively parallel inserts

From
Alex Satrapa
Date:
Joshua D. Drake wrote:
> If you have the money to purchase multiple machines, you can throw more
> hard drives at it...
>
> Hard drives are cheap... (well ide/sata anyway).

Properly set up in a RAID-1/0 array, you'll get much better "bandwidth"
out of those drives. Whether you RAID in software or hardware is up to
you and your budget - but if you choose hardware, make sure you actually
get a hardware RAID controller, not one of the cheapies which are just
multiple-IDE-controllers-with-special-drivers.

We use a 3Ware 7500 (can't remember the model name for the life of me),
with 8 drives in a 4x2 RAID 1/0.

IIRC, if the inserts are done in a transaction, the indexing gets done
at the end of the batch rather than after each insert.

Regards
Alex Satrapa


Re: MVCC for massively parallel inserts

From
Greg Stark
Date:
Alex Satrapa <alex@lintelsys.com.au> writes:

> Properly set up in a RAID-1/0 array, you'll get much better "bandwidth" out of
> those drives. Whether you RAID in software or hardware is up to you and your
> budget - but if you choose hardware, make sure you actually get a hardware RAID
> controller, not one of the cheapies which are just
> multiple-IDE-controllers-with-special-drivers.
>
> We use a 3Ware 7500 (can't remember the model name for the life of me), with 8
> drives in a 4x2 RAID 1/0.

I would agree, and if you really need the I/O bandwidth you can go to much
larger stripe sets than even this. The documentation I've seen before
suggested there were benefits up to stripe sets as large as twelve disks
across. That would be 24 drives if you're also doing mirroring.

Ideally separating WAL, index, and heap files is good, but you would have to
experiment to see which works out fastest for a given number of drives.

There are also some alternative approaches that could increase your
throughput. For example, you could have the multiple machines that receive
the data log it to text files, then periodically copy those files over to
the database machine and load them with COPY, which is far faster than
individual INSERTs.
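
For illustration, a minimal sketch of that pattern (the table, columns,
and file path here are all hypothetical):

    -- assumed target table
    CREATE TABLE events (ts timestamp, payload text);

    -- bulk-load a tab-delimited batch file accumulated by the
    -- front-end machines; one COPY is far cheaper per row than
    -- the equivalent stream of individual INSERTs
    COPY events FROM '/var/spool/batches/events.txt';

(COPY FROM a server-side file requires superuser rights, so the batch
files have to land somewhere the backend can read them.)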

Also, if it fits your model, you could load the data into fresh unindexed
tables and then build a new index. A single index build is a much quicker
operation than maintaining the index across individual inserts. That would
make selects more complex, though perhaps that's not a concern for you.
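
A rough sketch of that approach, again with made-up names:

    -- load into a fresh table with no indexes to maintain
    CREATE TABLE events_batch (ts timestamp, payload text);
    COPY events_batch FROM '/var/spool/batches/batch.txt';

    -- build the index once, after the load; a single bulk index
    -- build is much quicker than updating the index row by row
    CREATE INDEX events_batch_ts_idx ON events_batch (ts);

Selects would then have to consult every batch table (for instance
through a UNION ALL view), which is the extra complexity mentioned above.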

> IIRC, if the inserts are done in a transaction, the indexing gets done at the
> end of the batch rather than after each insert.

I believe this is wrong. Index entries are made as each row is inserted,
not at commit time. The whole point of Postgres's style of MVCC is that
each transaction can go ahead and do whatever modifications it needs,
marking them with its transaction id; other transactions simply ignore
data marked with the transaction ids of uncommitted transactions. When
commit time arrives there's very little work left to do beyond marking
the transaction as committed.
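
That said, batching many inserts into one transaction still buys you
something: the WAL only has to be fsync'd once, at commit, rather than
once per statement. A sketch, using the same hypothetical table as above:

    BEGIN;
    INSERT INTO events VALUES ('2004-01-15 10:00:00', 'row one');
    INSERT INTO events VALUES ('2004-01-15 10:00:01', 'row two');
    -- ... thousands more ...
    COMMIT;  -- a single WAL flush covers the whole batch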

--
greg

Re: MVCC for massively parallel inserts

From
"Joshua D. Drake"
Date:
>>We use a 3Ware 7500 (can't remember the model name for the life of me), with 8
>>drives in a 4x2 RAID 1/0.
>
>I would agree and if you really need the I/O bandwidth you can go to much
>larger stripe sets than even this. The documentation I've seen before
>suggested there were benefits up to stripe sets as large as twelve disks
>across. That would be 24 drives if you're also doing mirroring.

Something I have been toying with is getting two of the 12-drive 3Ware
cards and running RAID 0+1 across them (with LVM). At just under $300 for
the cards and only 80 bucks a drive (80 GB)... that is a lot of space, and
a lot of speed, for not a lot of money.

Sincerely,

Joshua D. Drake



--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL


Re: MVCC for massively parallel inserts

From
Greg Stark
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:

> Something I have been toying with is getting two of the 12 drive 3Ware cards
> and running RAID 0+1 across them (with LVM). At just under 300 for the cards
> and only 80 bucks a drive (80 Gig)... that is alot of space, and a lot of
> speed for not a lot of money.

Unless I have things backwards, RAID 0+1 means that if any drive fails,
the whole side of the mirror fails. If two drives fail, you have roughly a
50/50 chance of them being on opposite sides of the mirror and losing the
whole thing.

Even if you don't have a double failure, resyncing seems like it would be a
pain in this situation. LVM wouldn't know about the stripe set so it would
mean resyncing the entire 12-disk array on the failed side of the mirror.

I thought it was generally preferable to do RAID 1+0 (aka "RAID 10"),
where any two drives can fail and, as long as they aren't precisely
opposite each other, you're still OK. And resyncing just means resyncing
the one replacement drive, not the whole array.

--
greg

Re: MVCC for massively parallel inserts

From
Vivek Khera
Date:
>>>>> "GS" == Greg Stark <gsstark@mit.edu> writes:

GS> I would agree and if you really need the I/O bandwidth you can go
GS> to much larger stripe sets than even this. The documentation I've
GS> seen before suggested there were benefits up to stripe sets as
GS> large as twelve disks across. That would be 24 drives if you're
GS> also doing mirroring.

I did a bunch of testing with a 14-disk SCSI array.  I found that RAID5
performed best, ahead of both RAID10 and RAID50.

GS> Ideally separating WAL, index, and heap files is good, but you
GS> would have to experiment to see which works out fastest for a
GS> given number of drives.

I found that putting the WAL on its own array (in my case a mirror on
the other RAID controller channel) helped quite a bit.  I don't think
it is easy to split off index files to alternate locations with Postgres.

Increasing the number of checkpoint segments was one of the biggest
improvements I observed for mass-insert performance (as tested while
doing a restore of a multi-million row database).
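
For example, in postgresql.conf (the value is illustrative, not a
recommendation; tune it against your own workload):

    # checkpoint_segments defaults to 3; raising it spaces
    # checkpoints further apart, at the cost of more disk used
    # for WAL and a longer crash-recovery replay
    checkpoint_segments = 30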

The combination of having the WAL on a separate disk and letting the
checkpoint segments grow quite large has been very good for my
performance, and also for reducing disk bandwidth requirements.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD  +1-301-869-4449 x806
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/