Thread: INSERT performance

INSERT performance

From
Razvan Surdulescu
Date:
I'm running PostgreSQL 7.3.4 on Windows 2000/Cygwin (AMD Athlon XP+
1800, 512MB RAM).

I want to insert about 500 records into a table. The table is heavily
indexed (has about 10-12 indices created on it). The insert is performed
in a transaction block.

If I keep the indices on the table, the insert takes about 12 seconds.
If I drop the indices, do the insert, and then re-create the indices,
the entire process takes about 3-4 seconds.

I am somewhat surprised at both of those performance numbers above -- I
would have expected both of them to be a lot smaller (about half of what
they are). 500 records is not that big of a number! In particular, the
transacted insert without indices should be very fast!

Are these numbers about in range to what I should expect from
PostgreSQL? I looked through the various performance FAQs and I cannot
use the COPY command since the data needs to be processed first, and
this can only be done in memory.

Any suggestions/pointers would be much appreciated.

Thanks,

Razvan.


Re: INSERT performance

From
"Dann Corbit"
Date:
> -----Original Message-----
> From: Razvan Surdulescu [mailto:surdules@yahoo.com]
> Sent: Wednesday, October 29, 2003 8:41 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] INSERT performance
>
>
> I'm running PostgreSQL 7.3.4 on Windows 2000/Cygwin (AMD Athlon XP+
> 1800, 512MB RAM).
>
> I want to insert about 500 records into a table. The table is heavily
> indexed (has about 10-12 indices created on it). The insert
> is performed
> in a transaction block.
>
> If I keep the indices on the table, the insert takes about 12
> seconds.
> If I drop the indices, do the insert, and then re-create the indices,
> the entire process takes about 3-4 seconds.
>
> I am somewhat surprised at both of those performance numbers
> above -- I
> would have expected both of them to be a lot smaller (about
> half of what
> they are). 500 records is not that big of a number! In
> particular, the
> transacted insert without indices should be very fast!
>
> Are these numbers about in range to what I should expect from
> PostgreSQL? I looked through the various performance FAQs and
> I cannot
> use the COPY command since the data needs to be processed first, and
> this can only be done in memory.
>
> Any suggestions/pointers would be much appreciated.

Cygwin?
It will be faster under Linux.

500 records in 12 seconds is about 42/second.  Hard to know if that is
good or bad.  Is the machine under heavy use?  Are the records extremely
long?

You can still use the copy command as an API.  It will be faster than
the inserts, but there are (of course) caveats with its use.
http://developer.postgresql.org/docs/postgres/libpq-copy.html

Provide the SQL that defines the table and its indexes.

Provide the cardinality of the table.

Provide the average machine load during the insert operation.

Probably, you can get better answers if you provide more information.

What kind of disk drives do you have on your machine?  (More
importantly, where does PostgreSQL data reside?)

Re: INSERT performance

From
surdules@yahoo.com (Razvan Surdulescu)
Date:
DCorbit@connx.com ("Dann Corbit") wrote in message
news:<D90A5A6C612A39408103E6ECDD77B829408C2E@voyager.corporate.connx.com>...
> Cygwin? It will be faster under Linux.

I agree, I would also expect it to be faster under Linux. Would you
expect that the performance under Linux should be an order of
magnitude faster? I know that Cygwin I/O goes through something akin
to a translation layer, but I don't know how much of a performance hit
I should expect from that.

> 500 records in 12 seconds is about 42/second.  Hard to know if that is
> good or bad.  Is the machine under heavy use?  Are the records extremely
> long?

No one else is using the machine, and the records are short (at most
around 1k each).

> You can still use the copy command as an API.  It will be faster than
> the inserts, but there are (of course) caveats with its use.
> http://developer.postgresql.org/docs/postgres/libpq-copy.html

Thanks, I'll look into it.

> Provide the SQL that defines the table and its indexes.

Here is the approximate SQL statement (I cannot provide the original
statement for intellectual property reasons):

CREATE TABLE data (
    id char(32) NOT NULL, -- auto-generated from PHP using md5(...)

    -- the fieldN fields below have different lengths
    field1 varchar(5),
    field2 varchar(50),
    field3 varchar(10),
    ...
    field 20 varchar(255),

    PRIMARY KEY (id)
);

CREATE INDEX idx_field1 ON data(field1);
CREATE INDEX idx_field2 ON data(field2);
...
CREATE INDEX idx_field20 ON data(field20);

> Provide the cardinality of the table.

The table is empty (cardinality = 0).

> Provide the average machine load during the insert operation.

I will have to measure this and get back to you with it in a future
post. Here is what I can say from memory right now:

* If I do the INSERT with the indexes enabled, the HDD thrashes
visibly (audibly?) and the operation takes about 12 seconds.

* If I drop the indices, do the INSERT, and re-create the indices, the
HDD no longer thrashes, and the operation takes about 3-4 seconds.

> Probably, you can get better answers if you provide more information.

I agree -- I hope the information above is more illuminating.

> What kind of disk drives do you have on your machine?  (More
> importantly, where does PostgreSQL data reside?)

I have an ATA-100 7200 RPM HDD. The PostgreSQL data resides on this
drive (which also contains the Cygwin installation).

Thanks again,

Razvan.

Re: INSERT performance

From
Tom Lane
Date:
surdules@yahoo.com (Razvan Surdulescu) writes:
> DCorbit@connx.com ("Dann Corbit") wrote in message
news:<D90A5A6C612A39408103E6ECDD77B829408C2E@voyager.corporate.connx.com>...
>> 500 records in 12 seconds is about 42/second.  Hard to know if that is
>> good or bad.  Is the machine under heavy use?  Are the records extremely
>> long?

>> Provide the SQL that defines the table and its indexes.

> CREATE INDEX idx_field1 ON data(field1);
> CREATE INDEX idx_field2 ON data(field2);
> ...
> CREATE INDEX idx_field20 ON data(field20);

Uh, do you actually need an index on every column?

It's obvious that the index insertions are where the time is going.
You're getting close to 900 index insertions per second, which is not
bad at all on consumer-grade ATA disk hardware, if you ask me.  It might
help to raise shared_buffers, if you didn't already do that ... but the
real solution here is to only index the columns that you are actually
intending to search on.

            regards, tom lane

Re: INSERT performance

From
Razvan Surdulescu
Date:
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> surdules@yahoo.com (Razvan Surdulescu) writes:
> > CREATE INDEX idx_field1 ON data(field1);
> > CREATE INDEX idx_field2 ON data(field2);
> > ...
> > CREATE INDEX idx_field20 ON data(field20);
>
> Uh, do you actually need an index on every column?

Yes -- I need to search on all these columns, the size
of the table is expected to get very large, and each
column contains very heterogenous data (so indexing
makes sense).

> It's obvious that the index insertions are where the
> time is going.
> You're getting close to 900 index insertions per
> second, which is not
> bad at all on consumer-grade ATA disk hardware, if
> you ask me.

That's helpful to know -- thanks.

> It might
> help to raise shared_buffers, if you didn't already
> do that ... but the
> real solution here is to only index the columns that
> you are actually
> intending to search on.

I will look into the shared_buffers setting -- I have
not done anything with it thus far.

Razvan.


__________________________________
Do you Yahoo!?
Protect your identity with Yahoo! Mail AddressGuard
http://antispam.yahoo.com/whatsnewfree