Thread: Performance
I'm having some problems achieving adequate performance from Postgres for a real-time event logging application. The way I'm interfacing to the database may be the problem: I have simplified the problem down to a single (non-indexed) table with about a half-dozen columns (int4, timestamp, varchar, etc.). I wrote a quick and dirty C program which uses the libpq interface to INSERT records into the table in real time. The best performance I could achieve was on the order of 15 inserts per second. What I need is something much closer to 100 inserts per second. (A rough sketch of the insert loop follows at the end of this message.)

I wanted to use a prepared SQL statement, but it turns out that Postgres runs the query through the parser-planner-executor cycle on each iteration. There is no way to prevent this.

The next thing I thought of doing was to "bulk load" several records in one INSERT through the use of array processing. Do any of the Postgres interfaces support this? (By arrays, I don't mean array columns in the table.)

I'm currently running Postgres 6.4.2. I've heard that 6.5 has improved performance; does anyone have any idea what the performance improvement is like?

Is it unrealistic to expect Postgres to insert on the order of 100 records per second on a Pentium 400 MHz/SCSI class machine running Linux? (Solaris on a comparable platform has about half the performance.)

Thanks in advance...

Steven Bradley
Lawrence Livermore National Laboratory
sbradley@llnl.gov
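In outline, the test program does something like the following; the table and column names here are made up for illustration, and most error handling is omitted:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn = PQconnectdb("dbname=events");
    PGresult *res;
    char      query[256];
    int       i;

    if (PQstatus(conn) == CONNECTION_BAD) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        exit(1);
    }

    /* one INSERT per PQexec() call, no explicit transaction */
    for (i = 0; i < 1000; i++) {
        sprintf(query,
                "INSERT INTO eventlog (id, logged_at, msg) "
                "VALUES (%d, 'now', 'event %d')", i, i);
        res = PQexec(conn, query);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "INSERT failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}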
Steven Bradley <sbradley@llnl.gov> writes:
> I wanted to use a prepared SQL statement, but it turns out that Postgres
> runs the query through the parser-planner-executor cycle on each iteration.
> There is no way to prevent this.

> The next thing I thought of doing was to "bulk load" several records in one
> INSERT through the use of array processing. Do any of the Postgres
> interfaces support this? (By arrays, I don't mean array columns in the
> table.)

If you're just inserting literal data, use COPY FROM STDIN instead of INSERTs. Much less per-tuple overhead, although it's uglier. (A rough libpq sketch follows this message.)

			regards, tom lane
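Concretely, the COPY path through libpq uses PQexec() to start the COPY and then PQputline()/PQendcopy() to stream the data. This is only a sketch, reusing the illustrative eventlog table from above; error handling is trimmed:

PGresult *res;
char      line[256];
int       i;

res = PQexec(conn, "COPY eventlog FROM STDIN");
if (PQresultStatus(res) != PGRES_COPY_IN)
    fprintf(stderr, "COPY failed to start: %s", PQerrorMessage(conn));
PQclear(res);

for (i = 0; i < 1000; i++) {
    /* one tab-separated line of column values per tuple */
    sprintf(line, "%d\t1999-07-01 12:00:00\tevent %d\n", i, i);
    PQputline(conn, line);
}

PQputline(conn, "\\.\n");          /* end-of-data marker */
if (PQendcopy(conn) != 0)
    fprintf(stderr, "COPY did not complete: %s", PQerrorMessage(conn));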
Steven Bradley wrote:
>
> I have simplified the problem down to a single (non-indexed) table with
> about a half-dozen columns (int4, timestamp, varchar, etc.). I wrote a
> quick and dirty C program which uses the libpq interface to INSERT records
> into the table in real time. The best performance I could achieve was on
> the order of 15 inserts per second. What I need is something much closer
> to 100 inserts per second.

Put the INSERTs inside BEGIN/END to execute them in a single transaction (a short sketch follows below).

Vadim
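In libpq terms this just means bracketing the existing loop with two extra PQexec() calls; a minimal sketch, again using the illustrative eventlog table:

res = PQexec(conn, "BEGIN");
PQclear(res);

for (i = 0; i < 1000; i++) {
    sprintf(query,
            "INSERT INTO eventlog VALUES (%d, 'now', 'event %d')", i, i);
    res = PQexec(conn, query);
    PQclear(res);
}

res = PQexec(conn, "END");   /* all of the inserts above commit here */
PQclear(res);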
> Is it unrealistic to expect Postgres to insert on the order of 100
> records per second on a Pentium 400 MHz/SCSI class machine running
> Linux? (Solaris on a comparable platform has about half the
> performance.)

I'd expect roughly equivalent performance on different platforms. I measured the insert performance a couple of years ago and found roughly 50 inserts per second on my dual-processor, 180MHz PPro system, which was the same performance as I had on an Ingres installation on a medium-speed Alpha/OSF-1 box.

Without knowing more about your memory and disk setup, I'll guess that you can use more memory, can bump the buffer size by using "-B 1024" or something similar, and can run without mandatory fsync() (use the "-F" option). However, it seems to me that your system should do better than 10 inserts per second before you invoke "-F"; otherwise you'll just be working around another underlying problem/bottleneck.

Good luck.

                      - Thomas

--
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California
Steven Bradley <sbradley@llnl.gov> writes:
> I have simplified the problem down to a single (non-indexed) table with
> about a half-dozen columns (int4, timestamp, varchar, etc.). I wrote a
> quick and dirty C program which uses the libpq interface to INSERT records
> into the table in real time. The best performance I could achieve was on
> the order of 15 inserts per second. What I need is something much closer
> to 100 inserts per second.

I actually did pretty much the same thing just as a quick test of version 6.5 the other day. On my system (PostgreSQL 6.5 under NetBSD 1.4 on a 350MHz Pentium II with 128MB of RAM, and fast SCSI disks behind an Adaptec 2940 on a PCI bus), I got about 25 inserts per second. I then wrapped the whole series of inserts in a BEGIN/END block. This increased the speed to about 1000 inserts per second. Yup. A thousand. That's 40 times the speed. Pretty cool. :-)

For the actual application I have in mind, I'll be dropping BEGIN and END statements into the command stream based on insert rate, I think.

-tih
--
Popularity is the hallmark of mediocrity.  --Niles Crane, "Frasier"
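In other words, something like the following, committing every N rows rather than once per row or once for the whole run; the batch size, table, and columns are illustrative:

#define BATCH_SIZE 100

res = PQexec(conn, "BEGIN");
PQclear(res);

for (i = 0; i < 100000; i++) {
    sprintf(query,
            "INSERT INTO eventlog VALUES (%d, 'now', 'event %d')", i, i);
    res = PQexec(conn, query);
    PQclear(res);

    if ((i + 1) % BATCH_SIZE == 0) {
        /* close the current transaction and immediately open the next */
        res = PQexec(conn, "END");
        PQclear(res);
        res = PQexec(conn, "BEGIN");
        PQclear(res);
    }
}

res = PQexec(conn, "END");   /* commit whatever is left in the final batch */
PQclear(res);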
> second. I then wrapped the whole series of inserts in a BEGIN/END
> block. This increased the speed to about 1000 inserts per second.
> Yup. A thousand. That's 40 times the speed. Pretty cool. :-)

How did you measure this? Did you do (pseudo code):

    start_time = now
    SQL BEGIN
    SQL INSERTs ...
    SQL END
    end_time = now
    print number_inserts / (end_time - start_time)

or

    SQL BEGIN
    start_time = now
    SQL INSERTs ...
    end_time = now
    SQL END
    print number_inserts / (end_time - start_time)

This makes a difference, since a lot of processing is done at commit time. The second variant might thus be misleading.

Maarten

--
Maarten Boekhold, boekhold@tibco.com
TIBCO Finance Technology Inc.
The Atrium
Strawinskylaan 3051
1077 ZX Amsterdam, The Netherlands
tel: +31 20 3012158, fax: +31 20 3012358
http://www.tibco.com
Maarten Boekhold <boekhold@tibco.com> writes:
> This makes a difference, since a lot of processing is done at commit
> time. The second variant might thus be misleading.

Yup, but I did include the time to commit in the measurement. In fact, I used the time(1) command to time the whole run of the client program.

-tih
--
Popularity is the hallmark of mediocrity.  --Niles Crane, "Frasier"
Steven,

I have not experimented much with this in PostgreSQL, but in every other DBMS I have used, the write speed is not consistent. After you run for a while there will be a pause, I guess while caches are flushed. To be certain of capturing all your data, I would think you need at least two threads: one to capture the data and put it into a queue of some sort, and another to take it off the queue and insert it into PostgreSQL. Provided the average speed of PostgreSQL is higher than the capture rate, you are OK. (A rough sketch of this arrangement follows below.)

If the DBMS is being accessed by other users, I wonder if you might have even more problems. If so, maybe you should have two processes. One captures to text files and starts a new text file every x rows. The other processes one file at a time, each as one transaction. You could use a table in PostgreSQL to keep track of the files and whether they have been read in. Obviously this is not as transactionally safe, but it should be able to handle load fluctuations much better.

You could of course then scale up by having multiple machines, one per process, and also by having more than one instance of the insert process running at a time (on different machines). With the locking scheme of v6.5 this should give a higher throughput than a single insert process. You would have to experiment with the number of processes that gives maximum performance.

Dave

--
David Warnock
Sundayta Ltd
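The two-thread arrangement David describes might look roughly like the sketch below: a capture thread pushes events into a bounded in-memory queue, and an insert thread drains the queue into Postgres, committing in batches. POSIX threads, the queue size, the batch size, and the eventlog table are all illustrative assumptions, and error handling is trimmed:

/*
 * Producer/consumer sketch: one thread captures events into a bounded
 * in-memory queue, another drains it and INSERTs into Postgres inside
 * a transaction.  Names, sizes, and the eventlog table are illustrative.
 */
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <libpq-fe.h>

#define QUEUE_SIZE 1024

static char queue[QUEUE_SIZE][128];   /* fixed-size ring of event strings */
static int  head = 0, tail = 0, count = 0, done = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;

/* Producer: capture events (fabricated here) and enqueue them. */
static void *capture_thread(void *arg)
{
    int i;

    for (i = 0; i < 10000; i++) {
        pthread_mutex_lock(&lock);
        while (count == QUEUE_SIZE)
            pthread_cond_wait(&not_full, &lock);
        sprintf(queue[tail], "event %d", i);
        tail = (tail + 1) % QUEUE_SIZE;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
    pthread_mutex_lock(&lock);
    done = 1;                          /* tell the consumer we're finished */
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Consumer: drain the queue and INSERT each event, committing in batches. */
static void *insert_thread(void *arg)
{
    PGconn *conn = PQconnectdb("dbname=events");
    char    item[128], query[256];
    int     inserted = 0;

    PQclear(PQexec(conn, "BEGIN"));
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0 && !done)
            pthread_cond_wait(&not_empty, &lock);
        if (count == 0 && done) {
            pthread_mutex_unlock(&lock);
            break;
        }
        strcpy(item, queue[head]);
        head = (head + 1) % QUEUE_SIZE;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);

        sprintf(query, "INSERT INTO eventlog VALUES (%d, 'now', '%s')",
                inserted, item);
        PQclear(PQexec(conn, query));
        if (++inserted % 100 == 0) {   /* commit every 100 rows */
            PQclear(PQexec(conn, "END"));
            PQclear(PQexec(conn, "BEGIN"));
        }
    }
    PQclear(PQexec(conn, "END"));
    PQfinish(conn);
    return NULL;
}

int main(void)
{
    pthread_t prod, cons;

    pthread_create(&prod, NULL, capture_thread, NULL);
    pthread_create(&cons, NULL, insert_thread, NULL);
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return 0;
}

A real version would also have to decide what to do when the queue fills up (block, drop events, or spill to disk), which is where David's file-based variant comes in.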