Thread: Performance
I'm having some problems achieving adequate performance from Postgres for a real-time event logging application. The way I'm interfacing to the database may be the problem: I have simplified the problem down to a single (non-indexed) table with about a half-dozen columns (int4, timestamp, varchar, etc.). I wrote a quick and dirty C program which uses the libpq interface to INSERT records into the table in real time. The best performance I could achieve was on the order of 15 inserts per second. What I need is something much closer to 100 inserts per second. (A rough sketch of the insert loop follows at the end of this message.)

I wanted to use a prepared SQL statement, but it turns out that Postgres runs the query through the parser-planner-executor cycle on each iteration. There is no way to prevent this.

The next thing I thought of doing was to "bulk load" several records in one INSERT through the use of array processing. Do any of the Postgres interfaces support this? (By arrays, I don't mean array columns in the table.)

I'm currently running Postgres 6.4.2. I've heard that 6.5 has improved performance; does anyone have any idea what the performance improvement is like?

Is it unrealistic to expect Postgres to insert on the order of 100 records per second on a Pentium 400 MHz/SCSI class machine running Linux? (Solaris on a comparable platform has about half the performance.)

Thanks in advance...

Steven Bradley
Lawrence Livermore National Laboratory
sbradley@llnl.gov
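In outline, the test program does something like the following; the table and column names here are made up for illustration, and most error handling is omitted:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn = PQconnectdb("dbname=events");
    PGresult *res;
    char      query[256];
    int       i;

    if (PQstatus(conn) == CONNECTION_BAD) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        exit(1);
    }

    /* one INSERT per PQexec() call, no explicit transaction */
    for (i = 0; i < 1000; i++) {
        sprintf(query,
                "INSERT INTO eventlog (id, logged_at, msg) "
                "VALUES (%d, 'now', 'event %d')", i, i);
        res = PQexec(conn, query);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "INSERT failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}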
Steven Bradley <sbradley@llnl.gov> writes:
> I wanted to use a prepared SQL statement, but it turns out that Postgres
> runs the query through the parser-planner-executor cycle on each iteration.
> There is no way to prevent this.

> The next thing I thought of doing was to "bulk load" several records in one
> INSERT through the use of array processing. Do any of the Postgres
> interfaces support this? (By arrays, I don't mean array columns in the
> table.)

If you're just inserting literal data, use COPY FROM STDIN instead of INSERTs. Much less per-tuple overhead, although it's uglier. (A rough libpq sketch follows this message.)

			regards, tom lane
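Concretely, the COPY path through libpq uses PQexec() to start the COPY and then PQputline()/PQendcopy() to stream the data. This is only a sketch, reusing the illustrative eventlog table from above; error handling is trimmed:

PGresult *res;
char      line[256];
int       i;

res = PQexec(conn, "COPY eventlog FROM STDIN");
if (PQresultStatus(res) != PGRES_COPY_IN)
    fprintf(stderr, "COPY failed to start: %s", PQerrorMessage(conn));
PQclear(res);

for (i = 0; i < 1000; i++) {
    /* one tab-separated line of column values per tuple */
    sprintf(line, "%d\t1999-07-01 12:00:00\tevent %d\n", i, i);
    PQputline(conn, line);
}

PQputline(conn, "\\.\n");          /* end-of-data marker */
if (PQendcopy(conn) != 0)
    fprintf(stderr, "COPY did not complete: %s", PQerrorMessage(conn));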
Steven Bradley wrote:
>
> I have simplified the problem down to a single (non-indexed) table with
> about a half-dozen columns (int4, timestamp, varchar, etc.). I wrote a
> quick and dirty C program which uses the libpq interface to INSERT records
> into the table in real time. The best performance I could achieve was on
> the order of 15 inserts per second. What I need is something much closer
> to 100 inserts per second.

Put the INSERTs inside BEGIN/END to execute them in a single transaction (a short sketch follows below).

Vadim
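In libpq terms this just means bracketing the existing loop with two extra PQexec() calls; a minimal sketch, again using the illustrative eventlog table:

res = PQexec(conn, "BEGIN");
PQclear(res);

for (i = 0; i < 1000; i++) {
    sprintf(query,
            "INSERT INTO eventlog VALUES (%d, 'now', 'event %d')", i, i);
    res = PQexec(conn, query);
    PQclear(res);
}

res = PQexec(conn, "END");   /* all of the inserts above commit here */
PQclear(res);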
> Is it unrealistic to expect Postgres to insert on the order of 100
> records per second on a Pentium 400 MHz/SCSI class machine running
> Linux? (Solaris on a comparable platform has about half the
> performance.)

I'd expect roughly equivalent performance on different platforms. I measured the insert performance a couple of years ago and found roughly 50 inserts per second on my dual-processor, 180MHz PPro system, which was the same performance as I had on an Ingres installation on a medium-speed Alpha/OSF-1 box.

Without knowing more about your memory and disk setup, I'll guess that you can use more memory, can bump the buffer size by using "-B 1024" or something similar, and can run without mandatory fsync() (use the "-F" option). However, it seems to me that your system should do better than 10 inserts per second before you invoke "-F"; otherwise you'll just be working around another underlying problem/bottleneck.

Good luck.

                      - Thomas

--
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California
Steven Bradley <sbradley@llnl.gov> writes:
> I have simplified the problem down to a single (non-indexed) table with
> about a half-dozen columns (int4, timestamp, varchar, etc.). I wrote a
> quick and dirty C program which uses the libpq interface to INSERT records
> into the table in real time. The best performance I could achieve was on
> the order of 15 inserts per second. What I need is something much closer
> to 100 inserts per second.

I actually did pretty much the same thing just as a quick test of version 6.5 the other day. On my system (PostgreSQL 6.5 under NetBSD 1.4 on a 350MHz Pentium II with 128MB of RAM, and fast SCSI disks behind an Adaptec 2940 on a PCI bus), I got about 25 inserts per second. I then wrapped the whole series of inserts in a BEGIN/END block. This increased the speed to about 1000 inserts per second. Yup. A thousand. That's 40 times the speed. Pretty cool. :-)

For the actual application I have in mind, I'll be dropping BEGIN and END statements into the command stream based on insert rate, I think.

-tih
--
Popularity is the hallmark of mediocrity.  --Niles Crane, "Frasier"
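In other words, something like the following, committing every N rows rather than once per row or once for the whole run; the batch size, table, and columns are illustrative:

#define BATCH_SIZE 100

res = PQexec(conn, "BEGIN");
PQclear(res);

for (i = 0; i < 100000; i++) {
    sprintf(query,
            "INSERT INTO eventlog VALUES (%d, 'now', 'event %d')", i, i);
    res = PQexec(conn, query);
    PQclear(res);

    if ((i + 1) % BATCH_SIZE == 0) {
        /* close the current transaction and immediately open the next */
        res = PQexec(conn, "END");
        PQclear(res);
        res = PQexec(conn, "BEGIN");
        PQclear(res);
    }
}

res = PQexec(conn, "END");   /* commit whatever is left in the final batch */
PQclear(res);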
> second. I then wrapped the whole series of inserts in a BEGIN/END
> block. This increased the speed to about 1000 inserts per second.
> Yup. A thousand. That's 40 times the speed. Pretty cool. :-)

How did you measure this? Did you do (pseudo code):

    start_time = now
    SQL BEGIN
    SQL INSERTs ...
    SQL END
    end_time = now
    print number_inserts / (end_time - start_time)

or

    SQL BEGIN
    start_time = now
    SQL INSERTs ...
    end_time = now
    SQL END
    print number_inserts / (end_time - start_time)

This makes a difference, since a lot of processing is done at commit time. The second variant might thus be misleading.

Maarten

--
Maarten Boekhold, boekhold@tibco.com
TIBCO Finance Technology Inc.
The Atrium
Strawinskylaan 3051
1077 ZX Amsterdam, The Netherlands
tel: +31 20 3012158, fax: +31 20 3012358
http://www.tibco.com
Maarten Boekhold <boekhold@tibco.com> writes:
> This makes a difference, since a lot of processing is done at commit
> time. The second variant might thus be misleading.

Yup, but I did include the time to commit in the measurement. In fact, I used the time(1) command to time the whole run of the client program.

-tih
--
Popularity is the hallmark of mediocrity.  --Niles Crane, "Frasier"
Steven,

I have not experimented much with this in PostgreSQL, but in every other DBMS I have used, the write speed is not consistent. After you run for a while there will be a pause, I guess while caches are flushed. To be certain of capturing all your data, I would think you need at least two threads: one to capture the data and put it into a queue of some sort, and another to take it off the queue and insert it into PostgreSQL. Provided the average speed of PostgreSQL is higher than the capture rate, you are OK. (A rough sketch of this arrangement follows below.)

If the DBMS is being accessed by other users, I wonder if you might have even more problems. If so, maybe you should have two processes. One captures to text files and starts a new text file every x rows. The other processes one file at a time, each as one transaction. You could use a table in PostgreSQL to keep track of the files and whether they have been read in. Obviously this is not as transactionally safe, but it should be able to handle load fluctuations much better.

You could of course then scale up by having multiple machines, one per process, and also by having more than one instance of the insert process running at a time (on different machines). With the locking scheme of v6.5 this should give a higher throughput than a single insert process. You would have to experiment with the number of processes that gives maximum performance.

Dave

--
David Warnock
Sundayta Ltd
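The two-thread arrangement David describes might look roughly like the sketch below: a capture thread pushes events into a bounded in-memory queue, and an insert thread drains the queue into Postgres, committing in batches. POSIX threads, the queue size, the batch size, and the eventlog table are all illustrative assumptions, and error handling is trimmed:

/*
 * Producer/consumer sketch: one thread captures events into a bounded
 * in-memory queue, another drains it and INSERTs into Postgres inside
 * a transaction.  Names, sizes, and the eventlog table are illustrative.
 */
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <libpq-fe.h>

#define QUEUE_SIZE 1024

static char queue[QUEUE_SIZE][128];   /* fixed-size ring of event strings */
static int  head = 0, tail = 0, count = 0, done = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

/* Producer: capture events (fabricated here) and enqueue them. */
static void *capture_thread(void *arg)
{
    int i;

    for (i = 0; i < 10000; i++) {
        pthread_mutex_lock(&lock);
        while (count == QUEUE_SIZE)
            pthread_cond_wait(&not_full, &lock);
        sprintf(queue[tail], "event %d", i);
        tail = (tail + 1) % QUEUE_SIZE;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = 1;                          /* tell the consumer we're finished */
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Consumer: drain the queue and INSERT each event, committing in batches. */
static void *insert_thread(void *arg)
{
    PGconn *conn = PQconnectdb("dbname=events");
    char    item[128], query[256];
    int     inserted = 0;

    PQclear(PQexec(conn, "BEGIN"));
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0 && !done)
            pthread_cond_wait(&not_empty, &lock);
        if (count == 0 && done) {
            pthread_mutex_unlock(&lock);
            break;
        }
        strcpy(item, queue[head]);
        head = (head + 1) % QUEUE_SIZE;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);

        sprintf(query, "INSERT INTO eventlog VALUES (%d, 'now', '%s')",
                inserted, item);
        PQclear(PQexec(conn, query));
        if (++inserted % 100 == 0) {   /* commit every 100 rows */
            PQclear(PQexec(conn, "END"));
            PQclear(PQexec(conn, "BEGIN"));
        }
    }
    PQclear(PQexec(conn, "END"));
    PQfinish(conn);
    return NULL;
}

int main(void)
{
    pthread_t prod, cons;

    pthread_create(&prod, NULL, capture_thread, NULL);
    pthread_create(&cons, NULL, insert_thread, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return 0;
}

A real version would also have to decide what to do when the queue fills up (block, drop events, or spill to disk), which is where David's file-based variant comes in.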