Re: [HACKERS] Postgres Speed or lack thereof - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Postgres Speed or lack thereof
Date
Msg-id 21212.916630479@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Postgres Speed or lack thereof  (Vadim Mikheev <vadim@krs.ru>)
Responses Re: [HACKERS] Postgres Speed or lack thereof  (Tom <tom@sdf.com>)
Re: [HACKERS] Postgres Speed or lack thereof  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Vadim Mikheev <vadim@krs.ru> writes:
> John Holland wrote:
>> I've been lurking on this list for a little while. I have just done a
>> little tinkering with Postgres 6.4.2, comparing it to Oracle and mySQL on
>> Linux. It would appear that just doing a lot of inserts (ie 40000) in a
>> loop is enormously slower in Postgres in two ways that I tried it.
>> One - using a loop in Java that makes a JDBC call to insert.
>> Two- using plpgsql as a comparision to PL/SQL.
>> 
>> I really like the idea of an open source DB and am impressed with a lot I
>> see about PostgreSQL - however the speed difference is pretty bad -
>> 4.5 minutes versus about 20 seconds.

> Try to use BEGIN/END to run all inserts in single transaction
> and please let us know results.

I tried this myself and found that wrapping BEGIN/END around a series of
INSERT statements didn't make much difference at all.

On an HP 715 (75MHz RISC box, not very heavy iron by current standards),
I inserted about 13000 records into an initially-empty table having 38
columns (just because it's data I had handy...).  I timed it at:

Individual INSERT statements (as generated by pg_dump -d):33 inserts/sec
Same with BEGIN/END wrapped around 'em:34 inserts/sec
Single COPY statement (as generated by pg_dump without -d):1400 inserts/sec

This was for a simple table with no indexes.  In reality, this table
type has four b-tree indexes on different columns in our normal usage.
The COPY speed dropped to 325 inserts/sec when I had the indexes in
place.  I didn't bother trying the INSERTs that way.

The conventional wisdom is that you should use COPY for bulk loading,
and this result supports it...

John didn't say what hardware he's using, so I don't know how comparable
my result is to his 150 inserts/sec --- that might have been on a table
with many fewer columns, or maybe his machine is just faster.

As for where the time is going: "top" showed that the time for the
INSERT ops was almost all going into backend CPU time.  My guess is
that most of the cost is in parsing and planning the INSERT statements.
Pre-planned statement skeletons might help, but I think the real answer
will be to find a way to avoid constructing and parsing SQL statements
for simple operations like INSERT.  (One reason I'm interested in the
idea of a CORBA interface is that it might help here.)

My guess is that Oracle and mySQL have APIs that allow the construction
of an SQL INSERT command to be bypassed, and that's why they show up
better on this operation.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [HACKERS] doc/FAQ* files are hosed
Next
From: Tom
Date:
Subject: Re: [HACKERS] Postgres Speed or lack thereof