Re: [HACKERS] Postgres Speed or lack thereof - Mailing list pgsql-hackers

From: Cary O'Brien
Subject: Re: [HACKERS] Postgres Speed or lack thereof
Msg-id: 199901181532.KAA13078@saltmine.radix.net
List: pgsql-hackers
Tom Lane wrote:
>Vadim Mikheev <vadim@krs.ru> writes:
>> John Holland wrote:
>>> I've been lurking on this list for a little while. I have just done a
>>> little tinkering with Postgres 6.4.2, comparing it to Oracle and mySQL on
>>> Linux. It would appear that just doing a lot of inserts (i.e. 40,000) in a
>>> loop is enormously slower in Postgres in two ways that I tried it.
>>> One - using a loop in Java that makes a JDBC call to insert.
>>> Two - using plpgsql as a comparison to PL/SQL.
>>> 
>>> I really like the idea of an open source DB and am impressed with a lot I
>>> see about PostgreSQL - however the speed difference is pretty bad -
>>> 4.5 minutes versus about 20 seconds.
>
>> Try to use BEGIN/END to run all inserts in single transaction
>> and please let us know results.
>
>I tried this myself and found that wrapping BEGIN/END around a series of
>INSERT statements didn't make much difference at all.
>
>On an HP 715 (75MHz RISC box, not very heavy iron by current standards),
>I inserted about 13000 records into an initially-empty table having 38
>columns (just because it's data I had handy...).  I timed it at:
>
>Individual INSERT statements (as generated by pg_dump -d):
>        33 inserts/sec
>Same with BEGIN/END wrapped around 'em:
>        34 inserts/sec
>Single COPY statement (as generated by pg_dump without -d):
>        1400 inserts/sec
>
>This was for a simple table with no indexes.  In reality, this table
>type has four b-tree indexes on different columns in our normal usage.
>The COPY speed dropped to 325 inserts/sec when I had the indexes in
>place.  I didn't bother trying the INSERTs that way.
>
>The conventional wisdom is that you should use COPY for bulk loading,
>and this result supports it...
>
>John didn't say what hardware he's using, so I don't know how comparable
>my result is to his 150 inserts/sec --- that might have been on a table
>with many fewer columns, or maybe his machine is just faster.
>
>As for where the time is going: "top" showed that the time for the
>INSERT ops was almost all going into backend CPU time.  My guess is
>that most of the cost is in parsing and planning the INSERT statements.
>Pre-planned statement skeletons might help, but I think the real answer
>will be to find a way to avoid constructing and parsing SQL statements
>for simple operations like INSERT.  (One reason I'm interested in the
>idea of a CORBA interface is that it might help here.)
>
>My guess is that Oracle and mySQL have APIs that allow the construction
>of an SQL INSERT command to be bypassed, and that's why they show up
>better on this operation.
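
For reference, the two approaches benchmarked above look like this in
SQL (a minimal sketch; the table and file names are invented for
illustration):

    -- One transaction around the whole batch, instead of one per INSERT:
    BEGIN;
    INSERT INTO stats VALUES (1, 'first row');
    INSERT INTO stats VALUES (2, 'second row');
    -- ... thousands more ...
    COMMIT;

    -- Bulk load from a data file with COPY:
    COPY stats FROM '/tmp/stats.dat';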
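
Tom's "pre-planned statement skeletons" idea later took shape as
PREPARE/EXECUTE (PostgreSQL 7.3 and later, not 6.4); a sketch in that
syntax, again with invented names:

    -- Parse and plan the INSERT once...
    PREPARE ins (int, text) AS INSERT INTO stats VALUES ($1, $2);
    -- ...then execute the pre-planned skeleton many times:
    EXECUTE ins(1, 'first row');
    EXECUTE ins(2, 'second row');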

Oracle has something called direct path loading.  It is very fast, but
fussy.  You need to drop the indexes, do the direct path load, and then
rebuild the indexes.  If you do a direct path load with indexes in
place, or stop halfway (it was a while ago, so I may be misremembering
the details), the table gets stuck in a funny state and you have to
drop and recreate it.  See the SQL*Loader manual for details.
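
A direct path load is driven by a control file plus the sqlldr command
line, roughly like this (a sketch only; the file, table, and column
names are made up, and the exact options varied by Oracle version):

    -- load.ctl: SQL*Loader control file
    LOAD DATA
    INFILE 'events.dat'
    INTO TABLE events
    FIELDS TERMINATED BY ','
    (id, stamp, payload)

    $ sqlldr scott/tiger control=load.ctl direct=true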

We used this, plus segmenting the database, plus ping-ponging between
two tables[1], on one big project.  It worked great, except that they
cancelled the project two weeks before we were supposed to go on-line.
Still bitter about that one.


-- cary

[1] I.e., have two tables, a and b.  Load b while queries are going to
a; then, once b is fully loaded and indexed, have the apps switch their
lookups to b and load the next batch of data into a.  I think we used
views or something similar to do the switch.
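
The switch Cary describes can be sketched with a view, in Oracle-style
SQL (names invented):

    -- Point readers at table a while b is being loaded:
    CREATE OR REPLACE VIEW current_data AS SELECT * FROM data_a;
    -- ... bulk load and index data_b ...
    -- Flip all lookups over to b in one statement:
    CREATE OR REPLACE VIEW current_data AS SELECT * FROM data_b;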

