Thread: Low performance on Windows problem

Low performance on Windows problem

From
Dalibor Sramek
Date:
Hello.

I would like to build a shared repository for Enterprise Architect
(http://www.sparxsystems.com.au/ea.htm) using PostgreSQL. I have done it
before with Linux and FreeBSD servers and everything was working out of the
box. The repository is pretty simple database with less than 100 tables (the
schema is at
http://www.sparxsystems.com.au/downloads/corp/Postgres_Basemodel.sql).

The problem is that at the moment I have only a Windows XP "server" at my
disposal. I have installed PostgreSQL 8.0.3 for Windows and set the
repository up. Unfortunately the performance is unacceptable: every
operation with the model stored in the repository is by the order of
magnitude slower than on the FreeBSD server with half as good hardware.
(BTW CPU load is nearly 0, network load is under 5%, the machine has 1GB
RAM and the database size is 14MB.)

I have tried to:
- tweak the postgresql.conf - no apparent change
- kill all unnecessary services - no apparent change
- install MySQL on the same machine to compare - it is as fast as PostgreSQL
  on FreeBSD (= way faster than PG on the machine)

Anyway I believe the problem is in the Win PostgreSQL server but I have no
idea where to look and neither do I have much time to spend.
(Also I really do not want to run MySQL ;-)

Any suggestions are welcome.

Thanks

Dalibor Sramek

P.S.

More information about EA PGSQL repositories:
http://sparxsystems.com.au/resources/corporate/
http://sparxsystems.com.au/EAUserGuide/index.html?connecttoapostgresqlreposi.htm
http://sparxsystems.com.au/EAUserGuide/index.html?setupapostgresqlodbcdriver.htm

--
Dalibor Sramek  http://www.insula.cz/dali             \ In the eyes of cats
 /              dalibor.sramek@insula.cz               \  all things
/      >H blog  http://www.transhumanismus.cz/blog.php  \   belong to cats.

Re: Low performance on Windows problem

From
"Merlin Moncure"
Date:
> Hello.
>
> I would like to build a shared repository for Enterprise Architect
> (http://www.sparxsystems.com.au/ea.htm) using PostgreSQL. I have done
it
> before with Linux and FreeBSD servers and everything was working out
of
> the
> box. The repository is pretty simple database with less than 100
tables
> (the
> schema is at
> http://www.sparxsystems.com.au/downloads/corp/Postgres_Basemodel.sql).
>
> The problem is that at the moment I have only a Windows XP "server" at
my
> disposal. I have installed PostgreSQL 8.0.3 for Windows and set the
> repository up. Unfortunately the performance is unacceptable: every
> operation with the model stored in the repository is by the order of
> magnitude slower than on the FreeBSD server with half as good
hardware.
> (BTW CPU load is nearly 0, network load is under 5%, the machine has
1GB
> RAM and the database size is 14MB.)
>
> I have tried to:
> - tweak the postgresql.conf - no apparent change
> - kill all unnecessary services - no apparent change
> - install MySQL on the same machine to compare - it is as fast as
> PostgreSQL
>   on FreeBSD (= way faster than PG on the machine)

Can you give specific examples of cases that are not performing like you
expect?  If possible, give a few queries with explain analyze times and
all that.

Are you syncing your data?   Win32 fsync is about 1/3 as fast as linux
fsync, although this was changed to fsync_writethrough for clarification
purposes.

Merlin

Re: Low performance on Windows problem

From
Dalibor Sramek
Date:
On Tue, Sep 13, 2005 at 07:58:20AM -0400, Merlin Moncure wrote:
> Can you give specific examples of cases that are not performing like you
> expect?  If possible, give a few queries with explain analyze times and
> all that.

O.K. I have found one particular problem:

2005-09-13 14:43:02 LOG:  statement: declare SQL_CUR03949008 cursor for
SELECT * FROM t_umlpattern
2005-09-13 14:43:02 LOG:  duration: 0.000 ms
2005-09-13 14:43:02 LOG:  statement: fetch 1000 in SQL_CUR03949008
2005-09-13 14:43:22 LOG:  duration: 20185.000 ms

This command is executed while a model is loaded from the repository.

The table definition is:
CREATE TABLE t_umlpattern (
    PatternID INTEGER DEFAULT nextval('"patternid_seq"'::text) NOT NULL
PRIMARY KEY,
    PatternCategory VARCHAR(100),
    PatternName VARCHAR(150),
    Style VARCHAR(250),
    Notes TEXT,
    PatternXML TEXT,
    Version VARCHAR(50)
);

It has just 23 rows but the PatternXML column is rather large. The table
dump has over 900 kB.

Now
select * from t_umlpattern limit 2

takes 1500+ msec on the Windows machine and 60 on a comparable Linux
machine. Both selects performed from remote PgAdmin.
The same select performed localy on the windows machine takes 60 msec.

So I guess the problem is in the transfer of the bigger amount of data from
the Windows server.

I put the dump at http://www.insula.cz/dali/misc/table.zip

Could anybody confirm the difference?

Thanks for any suggestions.

Dalibor Sramek

--
Dalibor Sramek  http://www.insula.cz/dali             \ In the eyes of cats
 /              dalibor.sramek@insula.cz               \  all things
/      >H blog  http://www.transhumanismus.cz/blog.php  \   belong to cats.

Re: Low performance on Windows problem

From
"Merlin Moncure"
Date:
> On Tue, Sep 13, 2005 at 07:58:20AM -0400, Merlin Moncure wrote:
> This command is executed while a model is loaded from the repository.
>
> The table definition is:
> CREATE TABLE t_umlpattern (
>     PatternID INTEGER DEFAULT nextval('"patternid_seq"'::text) NOT
NULL
> PRIMARY KEY,
>     PatternCategory VARCHAR(100),
>     PatternName VARCHAR(150),
>     Style VARCHAR(250),
>     Notes TEXT,
>     PatternXML TEXT,
>     Version VARCHAR(50)
> );
>
> It has just 23 rows but the PatternXML column is rather large. The
table
> dump has over 900 kB.
>
> Now
> select * from t_umlpattern limit 2
>
> takes 1500+ msec on the Windows machine and 60 on a comparable Linux
> machine. Both selects performed from remote PgAdmin.
> The same select performed localy on the windows machine takes 60 msec.
>
> So I guess the problem is in the transfer of the bigger amount of data
> from
> the Windows server.
>
> I put the dump at http://www.insula.cz/dali/misc/table.zip
>
> Could anybody confirm the difference?

I loaded your dump and was able to select entire table in trivial time
from both pgAdmin and psql shell.  I am suspecting some type of tcp
problem here.  Can you confirm slow times on unloaded server?

Merlin

Re: Low performance on Windows problem

From
Dalibor Sramek
Date:
On Tue, Sep 13, 2005 at 10:20:05AM -0400, Merlin Moncure wrote:
> I loaded your dump and was able to select entire table in trivial time
> from both pgAdmin and psql shell.  I am suspecting some type of tcp
> problem here.  Can you confirm slow times on unloaded server?

Did you run the select remotely on a Windows server?

Yes the server load is practically 0. Note the difference between local and
remote execution of the command. I think you are right about the network
problem possibility. But it is bound to PostgreSQL. MySQL on the same
machine (and same database content) had no problem.

So are there any known issues with PostgreSQL on Windows sending data to
remote hosts connected via ODBC?
What should I do to find out more debug info?

Thanks

Dalibor Sramek

--
Dalibor Sramek  http://www.insula.cz/dali             \ In the eyes of cats
 /              dalibor.sramek@insula.cz               \  all things
/      >H blog  http://www.transhumanismus.cz/blog.php  \   belong to cats.

Re: Low performance on Windows problem

From
"Merlin Moncure"
Date:
> Did you run the select remotely on a Windows server?

yes.

> Yes the server load is practically 0. Note the difference between
local
> and
> remote execution of the command. I think you are right about the
network
> problem possibility. But it is bound to PostgreSQL. MySQL on the same
> machine (and same database content) had no problem.
>
> So are there any known issues with PostgreSQL on Windows sending data
to
> remote hosts connected via ODBC?
> What should I do to find out more debug info?

1. turn on all your logging and make sure we looking at the right place
(planner stats, etc).
2. run explain analyze and compare timings (which returns only explain
output).
3. do a select max(patternxml) test.t_umlpattern and observe the time.
4. do a select substr(patternxml, 1, 10) from test.t_umlpattern and
observe the time.
5. do select array_accum(q::text) from generate_series(1,10000) q;

if array_accum errors out, do:

CREATE AGGREGATE public.array_accum (
    sfunc = array_append,
    basetype = anyelement,
    stype = anyarray,
    initcond = '{}'
);

and observe the time.

Merlin

Re: Low performance on Windows problem

From
"Kevin Grittner"
Date:
This is sounding suspiciously similar to behavior I've seen with other types of TCP database connections when the
tcp-no-delayoption is not on.  Is it possible that the ODBC driver for Windows is not successfully setting this up? 

-Kevin


>>> Dalibor Sramek <dali@insula.cz> 09/13/05 9:34 AM >>>
On Tue, Sep 13, 2005 at 10:20:05AM -0400, Merlin Moncure wrote:
> I loaded your dump and was able to select entire table in trivial time
> from both pgAdmin and psql shell.  I am suspecting some type of tcp
> problem here.  Can you confirm slow times on unloaded server?

Did you run the select remotely on a Windows server?

Yes the server load is practically 0. Note the difference between local and
remote execution of the command. I think you are right about the network
problem possibility. But it is bound to PostgreSQL. MySQL on the same
machine (and same database content) had no problem.

So are there any known issues with PostgreSQL on Windows sending data to
remote hosts connected via ODBC?
What should I do to find out more debug info?


Re: Low performance on Windows problem

From
Tom Lane
Date:
Dalibor Sramek <dali@insula.cz> writes:
> select * from t_umlpattern limit 2
> takes 1500+ msec on the Windows machine and 60 on a comparable Linux
> machine. Both selects performed from remote PgAdmin.
> The same select performed localy on the windows machine takes 60 msec.

So it's a networking issue.  I haven't paid real close attention to
Windows problems, but I recall that we've heard a couple of reports
of Windows performance problems that were resolved by removing various
third-party network filters and/or installing Windows service pack
updates.  Check through the list archives ...

            regards, tom lane

Re: Low performance on Windows problem

From
Dalibor Sramek
Date:
On Tue, Sep 13, 2005 at 11:32:02AM -0400, Tom Lane wrote:
> So it's a networking issue.  I haven't paid real close attention to
> ...
> updates.  Check through the list archives ...

This one
http://archives.postgresql.org/pgsql-performance/2005-06/msg00593.php

seems to be very similar to my problem. Somebody suggested that setting
TCP_NODELAY option to the TCP connection may help. Before I dive into the
source: could some win-pg guru tell me if the Windows server tries to set
this option? Is it possible to change it via configuration? Is there a way
to find out if the TCP_NODELAY option was actually used for a connection?

Anyway thank you all. I believe I am getting closer to a solution.

Dalibor Sramek

--
Dalibor Sramek  http://www.insula.cz/dali             \ In the eyes of cats
 /              dalibor.sramek@insula.cz               \  all things
/      >H blog  http://www.transhumanismus.cz/blog.php  \   belong to cats.

Re: Low performance on Windows problem

From
Dalibor Sramek
Date:
On Tue, Sep 13, 2005 at 11:05:00AM -0400, Merlin Moncure wrote:
> 5. do select array_accum(q::text) from generate_series(1,10000) q;

I made the tests you suggested and the pattern is clear. The difference
between local and remote command execution is caused by moving data over
the network. E.g. the command above takes 700 ms locally and 1500 ms
remotely. Remote explain analyze takes exactly the 700 ms.

I downloaded PCATTCP - http://www.pcausa.com/Utilities/pcattcp.htm
and the measured throughput between the two machines is over 10000 kB/s.
PCATTCP allows setting TCP_NODELAY but it had no effect on the transfer
speed. So the difference between local and remote execution should IMHO stay
in the 10 ms range. Definitely not 800 ms. The 8.1 has the same problem.

Just for the record: the server PC is Dell Precision 330 with 3Com 3C920
integrated network card. OS MS Windows Professional 2002 with service pack
2. There is Symantec Antivirus installed - which I have (hopefully)
completely disabled.

Thanks for any help

Dalibor Sramek

--
Dalibor Sramek  http://www.insula.cz/dali             \ In the eyes of cats
 /              dalibor.sramek@insula.cz               \  all things
/      >H blog  http://www.transhumanismus.cz/blog.php  \   belong to cats.

Re: Low performance on Windows problem

From
"Kevin Grittner"
Date:
(1)  Latency and throughput don't necessarily correlate well.  When blasting
quantities of data to test throughput, TCP_NODELAY might not matter
much -- a full buffer will be sent without a delay anyway.  What do you get
on a ping while running the throughput test?

(2)  Besides the TCP_NODELAY issue, another issue which has caused
similar problems is a mismatch between half duplex and full duplex in the
configuration of the switch and the server.  Sometimes auto-negotiate
doesn't work as advertised; you might want to try setting the configuration
explicitly, if you aren't already doing so.

-Kevin


>>> Dalibor Sramek <dali@insula.cz> 09/14/05 8:02 AM >>>
On Tue, Sep 13, 2005 at 11:05:00AM -0400, Merlin Moncure wrote:
> 5. do select array_accum(q::text) from generate_series(1,10000) q;

I made the tests you suggested and the pattern is clear. The difference
between local and remote command execution is caused by moving data over
the network. E.g. the command above takes 700 ms locally and 1500 ms
remotely. Remote explain analyze takes exactly the 700 ms.

I downloaded PCATTCP - http://www.pcausa.com/Utilities/pcattcp.htm
and the measured throughput between the two machines is over 10000 kB/s.
PCATTCP allows setting TCP_NODELAY but it had no effect on the transfer
speed. So the difference between local and remote execution should IMHO stay
in the 10 ms range. Definitely not 800 ms. The 8.1 has the same problem.

Just for the record: the server PC is Dell Precision 330 with 3Com 3C920
integrated network card. OS MS Windows Professional 2002 with service pack
2. There is Symantec Antivirus installed - which I have (hopefully)
completely disabled.

Thanks for any help

Dalibor Sramek


Re: Low performance on Windows problem

From
"Merlin Moncure"
Date:
> in the 10 ms range. Definitely not 800 ms. The 8.1 has the same
problem.
>
> Just for the record: the server PC is Dell Precision 330 with 3Com
3C920
> integrated network card. OS MS Windows Professional 2002 with service
pack
> 2. There is Symantec Antivirus installed - which I have (hopefully)
> completely disabled.

Try throwing in another network card and see if it helps.  Next step is
to try twinking tcp settings
(http://support.microsoft.com/default.aspx?scid=kb;en-us;314053) and see
if that helps.  Beyond that, try playing the update driver game.  If you
are still having problems, try receiving bigger and bigger results to
see where problem occurs. 1-2k range suggests mtu problem, 4-8k range
suggests tcp receive window problem.

Beyond that, I'm stumped, uh, buy Opteron? :)

Merlin