Thread: concurrent connections are worse than serialization?
I have a testing program that uses 30 concurrent connections (max_connections = 32 in my postgresql.conf), and each connection does 100 insertions into a simple table with an index.

It took approximately 2 minutes to finish all of them.

But in the same environment (after "DELETE FROM test_table" and "VACUUM ANALYZE"), I then queued up all those 30 connections one after another (serialized), and it took only 30 seconds to finish.

Why is the performance of concurrent connections worse than serializing them into one?

I was testing them using our own (proprietary) scripting engine, and the extension library that supports PostgreSQL serializes the queries by simply locking when a query manipulates a PGconn object and unlocking when it is done. (Similarly, it creates a PGconn object on the stack for each concurrent query.)

Thanks

--
Wei Weng
Network Software Engineer
KenCast Inc.
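A comparison along these lines can be sketched with nothing more than plain psql clients. The table name, key format, and default-database psql connections below are assumptions for illustration; this is not the original proprietary test program.

=== Shellscript: concurrent_vs_serial.sh (illustrative sketch) ===
#!/bin/sh
# Assumes a database reachable with a bare "psql" and a table such as:
#   CREATE TABLE test_table (id varchar(40) PRIMARY KEY, name varchar(100));

one_client () {
    # One connection doing 100 single-statement INSERTs.
    n=1
    while [ $n -le 100 ]; do
        echo "INSERT INTO test_table VALUES ('client$1-row$n', 'Test Name');"
        n=$((n + 1))
    done | psql -q
}

echo "concurrent: 30 clients at once"; date
c=1
while [ $c -le 30 ]; do
    one_client $c &
    c=$((c + 1))
done
wait
date

psql -q -c 'DELETE FROM test_table'
psql -q -c 'VACUUM ANALYZE test_table'

echo "serialized: the same 30 clients one after another"; date
c=1
while [ $c -le 30 ]; do
    one_client $c
    c=$((c + 1))
done
date
=== End shellscript ===

Comparing the two pairs of date stamps gives a rough equivalent of the 2-minute versus 30-second numbers above.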
On Tuesday 13 Aug 2002 9:39 pm, Wei Weng wrote:
> I have a testing program that uses 30 concurrent connections
> (max_connections = 32 in my postgresql.conf), and each connection does 100
> insertions into a simple table with an index.
>
> It took approximately 2 minutes to finish all of them.
>
> But in the same environment (after "DELETE FROM test_table" and "VACUUM
> ANALYZE"), I then queued up all those 30 connections one after another
> (serialized), and it took only 30 seconds to finish.
>
> Why is the performance of concurrent connections worse than serializing
> them into one?

What was the limiting factor during the test? Was the CPU maxed, memory, disk I/O?

I take it the insert really *is* simple - no dependencies etc.

> I was testing them using our own (proprietary) scripting engine, and the
> extension library that supports PostgreSQL serializes the queries by
> simply locking when a query manipulates a PGconn object and unlocking
> when it is done. (Similarly, it creates a PGconn object on the stack for
> each concurrent query.)

I assume you've ruled the application end of things out.

- Richard Huxton
On Wed, 2002-08-14 at 05:18, Richard Huxton wrote:
> On Tuesday 13 Aug 2002 9:39 pm, Wei Weng wrote:
> > I have a testing program that uses 30 concurrent connections
> > (max_connections = 32 in my postgresql.conf), and each connection does 100
> > insertions into a simple table with an index.
> >
> > It took approximately 2 minutes to finish all of them.
> >
> > But in the same environment (after "DELETE FROM test_table" and "VACUUM
> > ANALYZE"), I then queued up all those 30 connections one after another
> > (serialized), and it took only 30 seconds to finish.
> >
> > Why is the performance of concurrent connections worse than serializing
> > them into one?
>
> What was the limiting factor during the test? Was the CPU maxed, memory,
> disk I/O?

No, none of the above was maxed. The CPU usage I paid attention to was at most 48%.

> I take it the insert really *is* simple - no dependencies etc.
>
> > I was testing them using our own (proprietary) scripting engine, and the
> > extension library that supports PostgreSQL serializes the queries by
> > simply locking when a query manipulates a PGconn object and unlocking
> > when it is done. (Similarly, it creates a PGconn object on the stack for
> > each concurrent query.)
>
> I assume you've ruled the application end of things out.

What does this mean?

Thanks

--
Wei Weng
Network Software Engineer
KenCast Inc.
On Wednesday 14 Aug 2002 3:20 pm, Wei Weng wrote:
> On Wed, 2002-08-14 at 05:18, Richard Huxton wrote:
> > On Tuesday 13 Aug 2002 9:39 pm, Wei Weng wrote:

[30 connections is much slower than 1 connection 30 times]

> > What was the limiting factor during the test? Was the CPU maxed, memory,
> > disk I/O?
>
> No, none of the above was maxed. The CPU usage I paid attention to was at
> most 48%.

Something must be the limiting factor. One of:
- CPU
- Memory
- Disk I/O
- Database (configuration, or design)
- Application

If it's not CPU, is the system going into swap, or are you seeing a lot of disk activity?

> > I assume you've ruled the application end of things out.
>
> What does this mean?

I mean: if you don't actually run the queries, then 30 separate processes are fine?

If you can provide us with an EXPLAIN of the query and the relevant schema definitions, we can rule out database design.

- Richard Huxton
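Gathering the two things asked for here amounts to a couple of psql calls; the table name is an assumption standing in for the real one, and psql is assumed to connect to the right database by default, as in the scripts later in the thread:

    psql -c '\d test_table'
    psql -c "EXPLAIN INSERT INTO test_table VALUES ('some-unique-guid', 'Test Name');"

The first shows the column definitions and any indexes; the second shows the (trivial) plan PostgreSQL uses for the insert.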
On Wed, 2002-08-14 at 10:49, Richard Huxton wrote:
> On Wednesday 14 Aug 2002 3:20 pm, Wei Weng wrote:
> > On Wed, 2002-08-14 at 05:18, Richard Huxton wrote:
> > > On Tuesday 13 Aug 2002 9:39 pm, Wei Weng wrote:
>
> [30 connections is much slower than 1 connection 30 times]

Yeah, but the problem is, say I have 20 users running a SELECT on the database at the same time, and each SELECT takes 10 seconds to finish. I really can't queue them up (or the last user will really have to wait for a long time), can I?

> > > What was the limiting factor during the test? Was the CPU maxed, memory,
> > > disk I/O?
> >
> > No, none of the above was maxed. The CPU usage I paid attention to was at
> > most 48%.
>
> Something must be the limiting factor. One of:
> - CPU
> - Memory
> - Disk I/O
> - Database (configuration, or design)
> - Application
>
> If it's not CPU, is the system going into swap, or are you seeing a lot of
> disk activity?

I did hear a lot of disk noise when I ran the test. How do I tell if the "system is going into swap"?

Are there any system settings I can/should change to make this a little faster?

> > > I assume you've ruled the application end of things out.
> >
> > What does this mean?
>
> I mean: if you don't actually run the queries, then 30 separate processes
> are fine?
>
> If you can provide us with an EXPLAIN of the query and the relevant schema
> definitions, we can rule out database design.

This is actually really simple. A table like:

--------------------
| foo              |
--------------------
| ID   VARCHAR(40) |  --> primary key
| Name VARCHAR(100)|
--------------------

And I did:

INSERT INTO foo VALUES ('some-unique-guid-here', 'Test Name');

So I don't think it is a matter of the database design.

Thanks

--
Wei Weng
Network Software Engineer
KenCast Inc.
Wei Weng <wweng@kencast.com> writes:
> On Wed, 2002-08-14 at 10:49, Richard Huxton wrote:
>> If it's not CPU, is the system going into swap or are you seeing a
>> lot of disk activity?

> I did hear a lot of disk noise when I ran the test. How do I tell if the
> "system is going into swap"?

Try running "vmstat 1" while doing the test. If the "swap" columns show any activity then you're swapping. Watching free memory with "top" may also be insightful.

I tend to agree with what I think Richard suspects: you haven't got enough RAM for thirty concurrent server processes, and so the system is wasting a lot of time swapping processes to disk. What are the hardware parameters, anyway?

			regards, tom lane
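Concretely, on a Linux box this means running something like the following while the test is going (standard procps tools, nothing specific to this particular setup):

    vmstat 1      # non-zero "si"/"so" columns mean pages are being swapped in/out
    free -m       # before/after view of free memory and swap usage
    top           # watch per-process memory use of the postgres backends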
On Wednesday 14 Aug 2002 4:29 pm, Wei Weng wrote:
> > [30 connections is much slower than 1 connection 30 times]
>
> Yeah, but the problem is, say I have 20 users running a SELECT on the
> database at the same time, and each SELECT takes 10 seconds to finish. I
> really can't queue them up (or the last user will really have to wait for a
> long time), can I?

No, and you shouldn't have to - PostgreSQL can handle hundreds of concurrent connections if need be.

> > If it's not CPU, is the system going into swap, or are you seeing a lot of
> > disk activity?
>
> I did hear a lot of disk noise when I ran the test. How do I tell if the
> "system is going into swap"?

You'd expect disk activity to a degree - it's a database, after all. To see how much, and what is happening, try one of: vmstat, iostat, top. You should have at least one of these on whatever system you are using.

> Are there any system settings I can/should change to make this a little
> faster?

There are a number of settings in postgresql.conf - see the online docs (I think there's a tuning document on techdocs.postgresql.org). Bear in mind that the default settings for PostgreSQL are very small, so you might want to look at the sort memory / shared memory settings. Increase values a bit at a time and see where you get a good balance. See the list archives for lots of discussion of this.

> | ID   VARCHAR(40) |  --> primary key
> | Name VARCHAR(100)|
>
> And I did: INSERT INTO foo VALUES ('some-unique-guid-here', 'Test Name');

Try using the script multicli_ins.sh below - it creates 10 Perl clients, each of which inserts 100 rows in separate transactions. This is about as inefficient as you can get. Have a play with different numbers of connections and see if there's a pattern.

=> select max(ts) - min(ts) from foo;
    ?column?
-----------------
 00:00:06.448863

So - I can insert 1000 rows using 10 clients in under 6.5 seconds. That's on an untuned system on a development server (400MHz, 256MB, IDE disk).

=== table definition ===
CREATE TABLE "foo" (
    "id" character varying(40),
    "name" character varying(100),
    "ts" timestamp with time zone DEFAULT now()
);
CREATE UNIQUE INDEX foo_key ON foo (id);

=== Perl script: pg_ins_test.pl ===
#!/usr/bin/perl -w

my $id = shift;

open CMD, "| psql -q";

my $key = "${id}AAA";
for (my $i = 0; $i < 100; $i++) {
    print CMD "INSERT INTO foo (id, name) VALUES ('$key', 'Test name');\n";
    $key++;
}
close CMD;

print "Done $id\n";

=== Shellscript: multicli_ins.sh ===
#!/bin/sh

psql -c 'DELETE FROM foo'

for i in A B C D E F G H I J
do
    ./pg_ins_test.pl $i 2>/dev/null &
done
=== End shellscript ===

- Richard Huxton
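One source of the inefficiency pointed out above is that every INSERT in pg_ins_test.pl runs in its own transaction and commits on its own. A sketch of the same 10-client load with each client batching its 100 inserts into a single transaction follows; the script name and key format are illustrative assumptions, and it keeps the same pipe-into-psql approach as the posted scripts:

=== Shellscript: batch_ins.sh (illustrative sketch) ===
#!/bin/sh
# Same table "foo" as above; clients A..J as in multicli_ins.sh,
# but each client wraps its 100 inserts in one BEGIN/COMMIT.

client () {
    {
        echo "BEGIN;"
        n=1
        while [ $n -le 100 ]; do
            echo "INSERT INTO foo (id, name) VALUES ('$1-$n', 'Test name');"
            n=$((n + 1))
        done
        echo "COMMIT;"
    } | psql -q
}

psql -c 'DELETE FROM foo'

for i in A B C D E F G H I J
do
    client $i 2>/dev/null &
done
wait
=== End shellscript ===

Re-running "select max(ts) - min(ts) from foo" after this variant shows how much of the elapsed time in the per-statement version goes to per-transaction overhead rather than to the inserts themselves.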