Thread: insert performance riddle
I have two identical servers giving abysmal INSERT performance in pgsql 7.3.4, 7.4.8, and 8.1devel under no load or I/O contention at all (no dumps, no vacuums, no apps, etc). Any suggested investigations appreciated... Metric: I'm measuring average insert speed on the following table with the following psuedo-loop via DBI 1.48 and DBD::Pg 1.43 (older versions of DBD::Pg/DBI are slow, too): create table foo (id serial, msg varchar); for i in 1..1000 do insert into foo (msg, "this is a test message"); Results: The slow servers are doing 6...count'em, SIX...that's right, I said S-I-X inserts per second (QPS) on average. Measuring on 4 other systems, some of them heavily loaded, the measured range is 60-1200 QPS. Hardware: The slow servers are identical HP DL740s running RHEL AS 3.0 with kernel 2.4.21-4.0.2.ELsmp (RHLE AS 3.0), 12gb of RAM, RAID5 via Compaq Smart Array 5i/532 RAID controller. The systems I'm comparing to include: 4-cpu Itanium HP rx4650, HPUX 11.23, SAN (1000 QPS)... 4-cpu PA-RISC HP rp5470, HPUX 11.00, SAN (190 QPS)... 2-cpu Intel Dell 2650, Linux 2.4.20-30.7, RAID5 (350 QPS)... 1-cpu IBM T41 1.4ghz laptop (7200RPM IDE drives...I think they're lying because they report 1100 QPS)... What I've Tried: I've measured the raw I/O via bonnie and dd, and the speeds are comparable across these boxes, definitely not 10-100x differences. Some of the faster QPS boxes are slower on dd/bonnie times. I tested multiple versions of pgsql with same essential results. I tested multiple versions of DBD and DBD::Pg, same results. I've tested on old clusters and tested with newly initdb'd clusters, same results. Our next/last guess is that there is some sort of major slowdown in one of the system libraries, so we're thinking of wiping it clean and trying a different OS/version. Ideas? Ed
On Wed, Aug 10, 2005 at 05:02:46PM -0600, Ed L. wrote: > I have two identical servers giving abysmal INSERT performance in > pgsql 7.3.4, 7.4.8, and 8.1devel under no load or I/O contention > at all (no dumps, no vacuums, no apps, etc). Any suggested > investigations appreciated... > > Metric: I'm measuring average insert speed on the following > table with the following psuedo-loop via DBI 1.48 and DBD::Pg > 1.43 (older versions of DBD::Pg/DBI are slow, too): > > create table foo (id serial, msg varchar); > for i in 1..1000 do > insert into foo (msg, "this is a test message"); > > Results: The slow servers are doing 6...count'em, SIX...that's > right, I said S-I-X inserts per second (QPS) on average. Have you done any tests that eliminate the client code and client-server communcation? Something like the following: CREATE TABLE foo (id serial, msg varchar); CREATE FUNCTION insert_foo(integer) RETURNS void AS ' DECLARE i integer; BEGIN FOR i IN 1 .. $1 LOOP INSERT INTO foo (msg) VALUES (''this is a test message''); END LOOP; RETURN; END; ' LANGUAGE plpgsql VOLATILE STRICT; EXPLAIN ANALYZE SELECT insert_foo(1000); -- Michael Fuhr
On Wednesday August 10 2005 6:03 pm, Michael Fuhr wrote: > On Wed, Aug 10, 2005 at 05:02:46PM -0600, Ed L. wrote: > > I have two identical servers giving abysmal INSERT > > performance in pgsql 7.3.4, 7.4.8, and 8.1devel under no > > load or I/O contention at all (no dumps, no vacuums, no > > apps, etc). Any suggested investigations appreciated... > > > > Results: The slow servers are doing 6...count'em, > > SIX...that's right, I said S-I-X inserts per second (QPS) on > > average. > > Have you done any tests that eliminate the client code and > client-server communcation? Something like the following: > > CREATE FUNCTION insert_foo(integer) RETURNS void AS ' > DECLARE > i integer; > BEGIN > FOR i IN 1 .. $1 LOOP > INSERT INTO foo (msg) VALUES (''this is a test > message''); END LOOP; > > RETURN; > END; > ' LANGUAGE plpgsql VOLATILE STRICT; > > EXPLAIN ANALYZE SELECT insert_foo(1000); Michael, you seem to have nailed it. The local inserts (via Unix domain sockets?) that were running at 6 QPS ran at 6800 to 41000 QPS in a PL/pgSQL function. So, given the same DBI/DBD::Pg code is performing 2 orders of magnitude better on other similar systems, where would you look next? The local socket system call code? Where else? My next thought is to gprof the client code to identify the slowdown... Thanks, Ed
On Thursday August 11 2005 12:36 pm, Ed L. wrote: > On Wednesday August 10 2005 6:03 pm, Michael Fuhr wrote: > > On Wed, Aug 10, 2005 at 05:02:46PM -0600, Ed L. wrote: > > > I have two identical servers giving abysmal INSERT > > > performance in pgsql 7.3.4, 7.4.8, and 8.1devel under no > > > load or I/O contention at all (no dumps, no vacuums, no > > > apps, etc). Any suggested investigations appreciated... > > > > Have you done any tests that eliminate the client code and > > client-server communcation? Something like the following: > > > > Michael, you seem to have nailed it. The local inserts (via > Unix domain sockets?) that were running at 6 QPS ran at 6800 > to 41000 QPS in a PL/pgSQL function. Here's another part of the riddle. The query durations for the individual inserts as logged in the server log are ranging 100ms-500ms per insert when connecting via DBI/DBD::Pg from localhost. Inside pl/pgsql, they're clearly averaging faster than 1ms per insert. I would have thought those times were independent of the client communication layers...? Ed
On Thu, Aug 11, 2005 at 12:59:32PM -0600, Ed L. wrote: > > Michael, you seem to have nailed it. The local inserts (via > > Unix domain sockets?) that were running at 6 QPS ran at 6800 > > to 41000 QPS in a PL/pgSQL function. > > Here's another part of the riddle. The query durations for the > individual inserts as logged in the server log are ranging > 100ms-500ms per insert when connecting via DBI/DBD::Pg from > localhost. Inside pl/pgsql, they're clearly averaging faster > than 1ms per insert. I would have thought those times were > independent of the client communication layers...? Have you done any client-side tests that eliminate Perl? I'd suggest writing a little C program so you can measure libpq's performance without the extra layers of Perl and DBI/DBD::Pg. Test both local (Unix socket) and network (IPv4 or IPv6 socket) connections. A couple of months ago, Tom Lane noticed that benchmarks using DBD::Pg make PostgreSQL look worse than it deserves: http://archives.postgresql.org/pgsql-interfaces/2005-06/msg00003.php -- Michael Fuhr
On Thursday August 11 2005 1:37 pm, Michael Fuhr wrote: > Have you done any client-side tests that eliminate Perl? I'd > suggest writing a little C program so you can measure libpq's > performance without the extra layers of Perl and DBI/DBD::Pg. > Test both local (Unix socket) and network (IPv4 or IPv6 > socket) connections. Michael, you nailed it again. My libpq test C program delivered between 2400 QPS and 5000 QPS vs ~10 QPS for DBI/DBD::Pg on this box. It remains unclear to me why the same DBI/DBD::Pg client code would deliver performance 2-3 orders of magnitude better on other roughly comparable or inferior boxes. What would be the recommended replacement for DBI/DBD::Pg as a perl interface? Thanks, Ed
> > Michael, you nailed it again. My libpq test C program delivered > between 2400 QPS and 5000 QPS vs ~10 QPS for DBI/DBD::Pg on this > box. > > It remains unclear to me why the same DBI/DBD::Pg client code > would deliver performance 2-3 orders of magnitude better on > other roughly comparable or inferior boxes. > I'm using DBD::Pg for some large inserts(well, they were large) most of it was pretty reasonable, there were some quirky things, like last_insert_id is/was incredibly slow(it's very smart, but the way it does things makes the performance horrible) doing the currval call yourself is much faster. I found that using the Perl profiler identified a couple things like that and let me get things to a reasonable speed. -Steve
On Thu, Aug 11, 2005 at 03:29:29PM -0600, Ed L. wrote: > Michael, you nailed it again. My libpq test C program delivered > between 2400 QPS and 5000 QPS vs ~10 QPS for DBI/DBD::Pg on this > box. > > It remains unclear to me why the same DBI/DBD::Pg client code > would deliver performance 2-3 orders of magnitude better on > other roughly comparable or inferior boxes. I'd suggest contacting the module's maintainer -- see the AUTHORS section of the DBD::Pg manual page for contact info. -- Michael Fuhr
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > It remains unclear to me why the same DBI/DBD::Pg client code > would deliver performance 2-3 orders of magnitude better on > other roughly comparable or inferior boxes. You need to see exactly what is going on to make things so slow. You can try setting the trace level for very verbose detail (e.g. $dbh->trace(10)) or you could look at using DBI::Profile or DBI::ProfileDumper. If it is the same code and same databases on different boxes, there is definitely something weird going on. Make sure that Perl, DBI, DBD::Pg, Postgres, and the Postgres libraries are all the same version. Running "make test" for DBD::Pg will display the current settings for all of the above. (or just copy the code from t/01connect.t). If all versions are indeed the same, see if you can create a small test script that illustrates the problem, then run the above mentioned debugging aids and see if you can spot a difference. - -- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200508120941 https://www.biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEARECAAYFAkL8p0oACgkQvJuQZxSWSsjlowCeOS6GkhISN4d6oWaRQQfD91QA Eg8AoPJz55hEwmOcuuHC7GpLMf0cDstt =ZZME -----END PGP SIGNATURE-----
On Thursday August 11 2005 6:20 pm, Michael Fuhr wrote: > On Thu, Aug 11, 2005 at 03:29:29PM -0600, Ed L. wrote: > > Michael, you nailed it again. My libpq test C program > > delivered between 2400 QPS and 5000 QPS vs ~10 QPS for > > DBI/DBD::Pg on this box. > > > > It remains unclear to me why the same DBI/DBD::Pg client > > code would deliver performance 2-3 orders of magnitude > > better on other roughly comparable or inferior boxes. > > I'd suggest contacting the module's maintainer -- see the > AUTHORS section of the DBD::Pg manual page for contact info. Well, just as I thought I had this one pinned, my test results have become wildly inconsistent, eroding all confidence in my prior conclusions about DBI slowness, etc. I've seen at least 1000+ QPS performance via DBI/DBD::Pg, Pg, and C/libpq. I'm now investigating the possibility of simple old intermittent I/O congestion... Ed
On Fri, Aug 12, 2005 at 05:20:49PM -0600, Ed L. wrote: > Well, just as I thought I had this one pinned, my test results > have become wildly inconsistent, eroding all confidence in my > prior conclusions about DBI slowness, etc. I've seen at least > 1000+ QPS performance via DBI/DBD::Pg, Pg, and C/libpq. I'm now > investigating the possibility of simple old intermittent I/O > congestion... How consistent were the results before? I got the impression that you saw consistently bad performance with DBD::Pg when other methods performed well. -- Michael Fuhr
On Friday August 12 2005 5:27 pm, Michael Fuhr wrote: > On Fri, Aug 12, 2005 at 05:20:49PM -0600, Ed L. wrote: > > Well, just as I thought I had this one pinned, my test > > results have become wildly inconsistent, eroding all > > confidence in my prior conclusions about DBI slowness, etc. > > I've seen at least 1000+ QPS performance via DBI/DBD::Pg, > > Pg, and C/libpq. I'm now investigating the possibility of > > simple old intermittent I/O congestion... > > How consistent were the results before? I got the impression > that you saw consistently bad performance with DBD::Pg when > other methods performed well. Results were very consistent until this morning. Now dbi/dbd::pg are intermittently showing performance 1-2 orders of magnitude faster. :/ Ed
On Fri, Aug 12, 2005 at 05:37:22PM -0600, Ed L. wrote: > On Friday August 12 2005 5:27 pm, Michael Fuhr wrote: > > How consistent were the results before? I got the impression > > that you saw consistently bad performance with DBD::Pg when > > other methods performed well. > > Results were very consistent until this morning. Now dbi/dbd::pg > are intermittently showing performance 1-2 orders of magnitude > faster. :/ Has anything changed on the system since the results were consistently slow? New hardware, new versions of anything, reboot, etc.? Did you do any profiling when DBD::Pg was consistently slow to see where the bottleneck was? -- Michael Fuhr
On Friday August 12 2005 6:11 pm, Michael Fuhr wrote: > Has anything changed on the system since the results were > consistently slow? New hardware, new versions of anything, > reboot, etc.? Did you do any profiling when DBD::Pg was > consistently slow to see where the bottleneck was? All good questions. Obviously, something has changed, but I'm hard-pressed to identify what it might have been. No, I didn't do any profiling, since I found the presumed dbi/libpq difference first. I will measure perf of dbi, Pg, and libpq over time and see if I can combine with I/O measures to see trends. Ed
On Fri, Aug 12, 2005 at 06:20:27PM -0600, Ed L. wrote: > On Friday August 12 2005 6:11 pm, Michael Fuhr wrote: > > Has anything changed on the system since the results were > > consistently slow? New hardware, new versions of anything, > > reboot, etc.? Did you do any profiling when DBD::Pg was > > consistently slow to see where the bottleneck was? > > All good questions. Obviously, something has changed, but I'm > hard-pressed to identify what it might have been. No, I didn't > do any profiling, since I found the presumed dbi/libpq > difference first. I will measure perf of dbi, Pg, and libpq > over time and see if I can combine with I/O measures to see > trends. I meant profiling of DBD::Pg, as Greg Sabino Mullane suggested. Here's his message in case you missed it: http://archives.postgresql.org/pgsql-general/2005-08/msg00655.php -- Michael Fuhr
On Friday August 12 2005 6:29 pm, Michael Fuhr wrote: > I meant profiling of DBD::Pg, as Greg Sabino Mullane > suggested. Here's his message in case you missed it: No, I didn't miss that, and will do that as a next step. Thanks, Michael. Ed