Thread: concurrent connections is worse than serialization?

concurrent connections is worse than serialization?

From
Wei Weng
Date:
I have a testing program that uses 30 concurrent connections
(max_connections = 32 in my postgresql.conf), and each does 100
insertions into a simple table with an index.

It took me approximately 2 minutes to finish all of them.

But under the same environment (after a "DELETE FROM test_table" and "VACUUM
ANALYZE"), I then queued up all those 30 connections one after another
(serialized), and it took only 30 seconds to finish.

Why is it that the performance of concurrent connections is worse than
serializing them into one?

I was testing them using our own (proprietary) scripting engine, and the
extension library that supports PostgreSQL serializes the queries by
simply locking when a query manipulates a PGconn object and unlocking
when it is done. (And similarly, it creates a PGconn object on the
stack for each concurrent query.)

Thanks

-- 
Wei Weng
Network Software Engineer
KenCast Inc.




Re: concurrent connections is worse than serialization?

From
Richard Huxton
Date:
On Tuesday 13 Aug 2002 9:39 pm, Wei Weng wrote:
> I have a testing program that uses 30 concurrent connections
> (max_connections = 32 in my postgresql.conf), and each does 100
> insertions into a simple table with an index.
>
> It took me approximately 2 minutes to finish all of them.
>
> But under the same environment (after a "DELETE FROM test_table" and "VACUUM
> ANALYZE"), I then queued up all those 30 connections one after another
> (serialized), and it took only 30 seconds to finish.
>
> Why is it that the performance of concurrent connections is worse than
> serializing them into one?

What was the limiting factor during the test? Was the CPU maxed, memory, disk
I/O?

I take it the insert really *is* simple - no dependencies etc.

> I was testing them using our own (proprietary) scripting engine, and the
> extension library that supports PostgreSQL serializes the queries by
> simply locking when a query manipulates a PGconn object and unlocking
> when it is done. (And similarly, it creates a PGconn object on the
> stack for each concurrent query.)

I assume you've ruled the application end of things out.

- Richard Huxton


Re: concurrent connections is worse than serialization?

From
Wei Weng
Date:
On Wed, 2002-08-14 at 05:18, Richard Huxton wrote:
> On Tuesday 13 Aug 2002 9:39 pm, Wei Weng wrote:
> > I have a testing program that uses 30 concurrent connections
> > (max_connections = 32 in my postgresql.conf), and each does 100
> > insertions into a simple table with an index.
> >
> > It took me approximately 2 minutes to finish all of them.
> >
> > But under the same environment (after a "DELETE FROM test_table" and "VACUUM
> > ANALYZE"), I then queued up all those 30 connections one after another
> > (serialized), and it took only 30 seconds to finish.
> >
> > Why is it that the performance of concurrent connections is worse than
> > serializing them into one?
> 
> What was the limiting factor during the test? Was the CPU maxed, memory, disk 
> I/O?
No, none of the above was maxed. The CPU usage I paid attention to was
at most 48%.

> 
> I take it the insert really *is* simple - no dependencies etc.
> 
> > I was testing them using our own (proprietary) scripting engine, and the
> > extension library that supports PostgreSQL serializes the queries by
> > simply locking when a query manipulates a PGconn object and unlocking
> > when it is done. (And similarly, it creates a PGconn object on the
> > stack for each concurrent query.)
> 
> I assume you've ruled the application end of things out.
What does this mean?

Thanks

-- 
Wei Weng
Network Software Engineer
KenCast Inc.




Re: concurrent connections is worse than serialization?

From
Richard Huxton
Date:
On Wednesday 14 Aug 2002 3:20 pm, Wei Weng wrote:
> On Wed, 2002-08-14 at 05:18, Richard Huxton wrote:
> > On Tuesday 13 Aug 2002 9:39 pm, Wei Weng wrote:

[30 connections is much slower than 1 connection 30 times]

> > What was the limiting factor during the test? Was the CPU maxed, memory,
> > disk I/O?
>
> No, none of the above was maxed. The CPU usage I paid attention to was
> at most 48%.

Something must be the limiting factor. One of
 - CPU
 - Memory
 - Disk I/O
 - Database (configuration, or design)
 - Application

If it's not CPU, is the system going into swap or are you seeing a lot of disk
activity?

> > I assume you've ruled the application end of things out.
>
> What does this mean?

I mean: if you don't actually run the queries, are 30 separate processes
fine?

If you can provide us with an EXPLAIN of the query and the relevant schema
definitions, we can rule out database design.
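
For example (the table and value names here are just placeholders - substitute
whatever your test actually uses):

  psql -c '\d test_table'
  psql -c "EXPLAIN INSERT INTO test_table VALUES ('some-id', 'some name');"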

- Richard Huxton


Re: concurrent connections is worse than serialization?

From
Wei Weng
Date:
On Wed, 2002-08-14 at 10:49, Richard Huxton wrote:
> On Wednesday 14 Aug 2002 3:20 pm, Wei Weng wrote:
> > On Wed, 2002-08-14 at 05:18, Richard Huxton wrote:
> > > On Tuesday 13 Aug 2002 9:39 pm, Wei Weng wrote:
> 
> [30 connections is much slower than 1 connection 30 times]

Yeah, but the problem is, say I have 20 users running SELECTs on the
database at the same time, and each SELECT takes 10 seconds to finish. I
really can't queue them up (or the last user will really have to wait for
a long time), can I?

> 
> > > What was the limiting factor during the test? Was the CPU maxed, memory,
> > > disk I/O?
> >
> > No, none of the above was maxed. The CPU usage I paid attention to was
> > at most 48%.
> 
> Something must be the limiting factor. One of
>  - CPU
>  - Memory
>  - Disk I/O
>  - Database (configuration, or design)
>  - Application
> 
> If it's not CPU, is the system going into swap or are you seeing a lot of disk 
> activity?
I did hear a lot of disk noise when I ran the test. How do I tell if the
"system is going into swap"?

Are there any system settings I can/should change to make this a little
faster?

> 
> > > I assume you've ruled the application end of things out.
> >
> > What does this mean?
> 
> I mean: if you don't actually run the queries, are 30 separate processes
> fine?
> 
> If you can provide us with an EXPLAIN of the query and the relevant schema 
> definitions, we can rule out database design.
> 

This is actually really simple.

A table like 

--------------------
| foo              | 
--------------------
|ID    VARCHAR(40) | --> primary key
|Name  VARCHAR(100)|
--------------------

And I did: INSERT INTO foo VALUES ('some-unique-guid-here', 'Test Name');
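
In SQL terms, that is roughly (reconstructed from the layout above, not the
exact DDL):

CREATE TABLE foo (
    ID   VARCHAR(40) PRIMARY KEY,
    Name VARCHAR(100)
);
INSERT INTO foo VALUES ('some-unique-guid-here', 'Test Name');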

So I don't think it is a problem with the database.

Thanks

-- 
Wei Weng
Network Software Engineer
KenCast Inc.




Re: concurrent connections is worse than serialization?

From
Tom Lane
Date:
Wei Weng <wweng@kencast.com> writes:
> On Wed, 2002-08-14 at 10:49, Richard Huxton wrote:
>> If it's not CPU, is the system going into swap or are you seeing a
>> lot of disk activity?

> I did hear a lot of disk noise when I ran the test. How do I tell if the
> "system is going into swap"?

Try running "vmstat 1" while doing the test.  If the "swap" columns show
any activity then you're swapping.  Watching free memory with "top" may
also be insightful.
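
For example, on Linux (exact column names vary a bit between platforms):

  $ vmstat 1
  # non-zero values in the "si"/"so" columns (pages swapped in/out per
  # second) while the test runs mean the system is swapping

  $ top
  # the header shows free memory and swap usage; watch free RAM shrink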

I tend to agree with what I think Richard suspects: you haven't got
enough RAM for thirty concurrent server processes, and so the system is
wasting a lot of time swapping processes to disk.  What are the hardware
parameters, anyway?
        regards, tom lane


Re: concurrent connections is worse than serialization?

From
Richard Huxton
Date:
On Wednesday 14 Aug 2002 4:29 pm, Wei Weng wrote:
> > [30 connections is much slower than 1 connection 30 times]
>
> Yeah, but the problem is, say I have 20 users running SELECTs on the
> database at the same time, and each SELECT takes 10 seconds to finish. I
> really can't queue them up (or the last user will really have to wait for
> a long time), can I?

No, and you shouldn't have to - PostgreSQL can handle hundreds of concurrent
connections if need be.

> > If it's not CPU, is the system going into swap or are you seeing a lot of
> > disk activity?
>
> I did hear a lot of disk noise when I ran the test. How do I tell if the
> "system is going into swap"?

You'd expect disk activity to a degree - it's a database, after all. To see how
much there is and what is happening, try one of: vmstat, iostat, top. You
should have at least one of these on whatever system you are using.

> Are there any system settings I can/should change to make this a little
> faster?

There are a number of settings in postgresql.conf - see the online docs (I
think there's a tuning document on techdocs.postgresql.org). Bear in mind
that the default settings for postgresql are very small, so you might want to
look at the sort memory/shared memory settings. Increase values a bit at a
time and see where you get a good balance. See the list archives for lots of
discussion of this.
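
For instance, in postgresql.conf you might start with something like this
(illustrative numbers only, not recommendations - what's sensible depends on
how much RAM you have, and sort memory is allocated per sort, per backend):

  shared_buffers = 2000     # in 8KB pages, so roughly 16MB of buffer cache
  sort_mem = 4096           # in KB, per sort operation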

> |ID    VARCHAR(40) | --> primary key
> |Name  VARCHAR(100)|
> And I did: INSERT INTO foo VALUES ('some-unique-guid-here', 'Test Name');

Try using the script multicli_ins.sh below - it creates 10 perl clients each
of which inserts 100 rows in separate transactions. This is about as
inefficient as you can get.

Have a play with different numbers of connections and see if there's a
pattern.

=> select max(ts) - min(ts) from foo;
    ?column?
-----------------
 00:00:06.448863

So - I can insert 1000 rows using 10 clients in under 6.5 seconds. That's on an
untuned system on a development server (400MHz, 256MB, IDE disk).

=== table definition ===

CREATE TABLE "foo" (
    "id"   character varying(40),
    "name" character varying(100),
    "ts"   timestamp with time zone DEFAULT now()
);
CREATE UNIQUE INDEX foo_key ON foo (id);

=== Perl script: pg_ins_test.pl ===

#!/usr/bin/perl -w
my $id = shift;
open CMD,"| psql -q";
my $key = "${id}AAA";
for (my $i = 0; $i < 100; $i++) {
    print CMD "INSERT INTO foo (id,name) VALUES ('$key','Test name');\n";
    $key++;
}
close CMD;
print "Done $id\n";

=== Shellscript: multicli_ins.sh ===

#!/bin/sh
psql -c 'DELETE FROM foo'
for i in A B C D E F G H I J
do ./pg_ins_test.pl $i 2>/dev/null &
done

=== End shellscript ===
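
As a point of comparison, here is a sketch of a variant (call it
pg_ins_test_txn.pl - the name and code are illustrative, untimed) that wraps
each client's 100 inserts in a single transaction, so there is one commit per
client instead of one per row:

=== Perl script: pg_ins_test_txn.pl ===

#!/usr/bin/perl -w
# Same as pg_ins_test.pl, but with one BEGIN/COMMIT around the whole loop.
my $id = shift;
open CMD,"| psql -q";
my $key = "${id}AAA";
print CMD "BEGIN;\n";
for (my $i = 0; $i < 100; $i++) {
    print CMD "INSERT INTO foo (id,name) VALUES ('$key','Test name');\n";
    $key++;
}
print CMD "COMMIT;\n";
close CMD;
print "Done $id\n";

=== End Perl script ===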

- Richard Huxton