Thread: Low performance on Windows problem
Hello. I would like to build a shared repository for Enterprise Architect (http://www.sparxsystems.com.au/ea.htm) using PostgreSQL. I have done it before with Linux and FreeBSD servers and everything was working out of the box. The repository is pretty simple database with less than 100 tables (the schema is at http://www.sparxsystems.com.au/downloads/corp/Postgres_Basemodel.sql). The problem is that at the moment I have only a Windows XP "server" at my disposal. I have installed PostgreSQL 8.0.3 for Windows and set the repository up. Unfortunately the performance is unacceptable: every operation with the model stored in the repository is by the order of magnitude slower than on the FreeBSD server with half as good hardware. (BTW CPU load is nearly 0, network load is under 5%, the machine has 1GB RAM and the database size is 14MB.) I have tried to: - tweak the postgresql.conf - no apparent change - kill all unnecessary services - no apparent change - install MySQL on the same machine to compare - it is as fast as PostgreSQL on FreeBSD (= way faster than PG on the machine) Anyway I believe the problem is in the Win PostgreSQL server but I have no idea where to look and neither do I have much time to spend. (Also I really do not want to run MySQL ;-) Any suggestions are welcome. Thanks Dalibor Sramek P.S. More information about EA PGSQL repositories: http://sparxsystems.com.au/resources/corporate/ http://sparxsystems.com.au/EAUserGuide/index.html?connecttoapostgresqlreposi.htm http://sparxsystems.com.au/EAUserGuide/index.html?setupapostgresqlodbcdriver.htm -- Dalibor Sramek http://www.insula.cz/dali \ In the eyes of cats / dalibor.sramek@insula.cz \ all things / >H blog http://www.transhumanismus.cz/blog.php \ belong to cats.
> Hello. > > I would like to build a shared repository for Enterprise Architect > (http://www.sparxsystems.com.au/ea.htm) using PostgreSQL. I have done it > before with Linux and FreeBSD servers and everything was working out of > the > box. The repository is pretty simple database with less than 100 tables > (the > schema is at > http://www.sparxsystems.com.au/downloads/corp/Postgres_Basemodel.sql). > > The problem is that at the moment I have only a Windows XP "server" at my > disposal. I have installed PostgreSQL 8.0.3 for Windows and set the > repository up. Unfortunately the performance is unacceptable: every > operation with the model stored in the repository is by the order of > magnitude slower than on the FreeBSD server with half as good hardware. > (BTW CPU load is nearly 0, network load is under 5%, the machine has 1GB > RAM and the database size is 14MB.) > > I have tried to: > - tweak the postgresql.conf - no apparent change > - kill all unnecessary services - no apparent change > - install MySQL on the same machine to compare - it is as fast as > PostgreSQL > on FreeBSD (= way faster than PG on the machine) Can you give specific examples of cases that are not performing like you expect? If possible, give a few queries with explain analyze times and all that. Are you syncing your data? Win32 fsync is about 1/3 as fast as linux fsync, although this was changed to fsync_writethrough for clarification purposes. Merlin
On Tue, Sep 13, 2005 at 07:58:20AM -0400, Merlin Moncure wrote: > Can you give specific examples of cases that are not performing like you > expect? If possible, give a few queries with explain analyze times and > all that. O.K. I have found one particular problem: 2005-09-13 14:43:02 LOG: statement: declare SQL_CUR03949008 cursor for SELECT * FROM t_umlpattern 2005-09-13 14:43:02 LOG: duration: 0.000 ms 2005-09-13 14:43:02 LOG: statement: fetch 1000 in SQL_CUR03949008 2005-09-13 14:43:22 LOG: duration: 20185.000 ms This command is executed while a model is loaded from the repository. The table definition is: CREATE TABLE t_umlpattern ( PatternID INTEGER DEFAULT nextval('"patternid_seq"'::text) NOT NULL PRIMARY KEY, PatternCategory VARCHAR(100), PatternName VARCHAR(150), Style VARCHAR(250), Notes TEXT, PatternXML TEXT, Version VARCHAR(50) ); It has just 23 rows but the PatternXML column is rather large. The table dump has over 900 kB. Now select * from t_umlpattern limit 2 takes 1500+ msec on the Windows machine and 60 on a comparable Linux machine. Both selects performed from remote PgAdmin. The same select performed localy on the windows machine takes 60 msec. So I guess the problem is in the transfer of the bigger amount of data from the Windows server. I put the dump at http://www.insula.cz/dali/misc/table.zip Could anybody confirm the difference? Thanks for any suggestions. Dalibor Sramek -- Dalibor Sramek http://www.insula.cz/dali \ In the eyes of cats / dalibor.sramek@insula.cz \ all things / >H blog http://www.transhumanismus.cz/blog.php \ belong to cats.
> On Tue, Sep 13, 2005 at 07:58:20AM -0400, Merlin Moncure wrote: > This command is executed while a model is loaded from the repository. > > The table definition is: > CREATE TABLE t_umlpattern ( > PatternID INTEGER DEFAULT nextval('"patternid_seq"'::text) NOT NULL > PRIMARY KEY, > PatternCategory VARCHAR(100), > PatternName VARCHAR(150), > Style VARCHAR(250), > Notes TEXT, > PatternXML TEXT, > Version VARCHAR(50) > ); > > It has just 23 rows but the PatternXML column is rather large. The table > dump has over 900 kB. > > Now > select * from t_umlpattern limit 2 > > takes 1500+ msec on the Windows machine and 60 on a comparable Linux > machine. Both selects performed from remote PgAdmin. > The same select performed localy on the windows machine takes 60 msec. > > So I guess the problem is in the transfer of the bigger amount of data > from > the Windows server. > > I put the dump at http://www.insula.cz/dali/misc/table.zip > > Could anybody confirm the difference? I loaded your dump and was able to select entire table in trivial time from both pgAdmin and psql shell. I am suspecting some type of tcp problem here. Can you confirm slow times on unloaded server? Merlin
On Tue, Sep 13, 2005 at 10:20:05AM -0400, Merlin Moncure wrote: > I loaded your dump and was able to select entire table in trivial time > from both pgAdmin and psql shell. I am suspecting some type of tcp > problem here. Can you confirm slow times on unloaded server? Did you run the select remotely on a Windows server? Yes the server load is practically 0. Note the difference between local and remote execution of the command. I think you are right about the network problem possibility. But it is bound to PostgreSQL. MySQL on the same machine (and same database content) had no problem. So are there any known issues with PostgreSQL on Windows sending data to remote hosts connected via ODBC? What should I do to find out more debug info? Thanks Dalibor Sramek -- Dalibor Sramek http://www.insula.cz/dali \ In the eyes of cats / dalibor.sramek@insula.cz \ all things / >H blog http://www.transhumanismus.cz/blog.php \ belong to cats.
> Did you run the select remotely on a Windows server? yes. > Yes the server load is practically 0. Note the difference between local > and > remote execution of the command. I think you are right about the network > problem possibility. But it is bound to PostgreSQL. MySQL on the same > machine (and same database content) had no problem. > > So are there any known issues with PostgreSQL on Windows sending data to > remote hosts connected via ODBC? > What should I do to find out more debug info? 1. turn on all your logging and make sure we looking at the right place (planner stats, etc). 2. run explain analyze and compare timings (which returns only explain output). 3. do a select max(patternxml) test.t_umlpattern and observe the time. 4. do a select substr(patternxml, 1, 10) from test.t_umlpattern and observe the time. 5. do select array_accum(q::text) from generate_series(1,10000) q; if array_accum errors out, do: CREATE AGGREGATE public.array_accum ( sfunc = array_append, basetype = anyelement, stype = anyarray, initcond = '{}' ); and observe the time. Merlin
This is sounding suspiciously similar to behavior I've seen with other types of TCP database connections when the tcp-no-delayoption is not on. Is it possible that the ODBC driver for Windows is not successfully setting this up? -Kevin >>> Dalibor Sramek <dali@insula.cz> 09/13/05 9:34 AM >>> On Tue, Sep 13, 2005 at 10:20:05AM -0400, Merlin Moncure wrote: > I loaded your dump and was able to select entire table in trivial time > from both pgAdmin and psql shell. I am suspecting some type of tcp > problem here. Can you confirm slow times on unloaded server? Did you run the select remotely on a Windows server? Yes the server load is practically 0. Note the difference between local and remote execution of the command. I think you are right about the network problem possibility. But it is bound to PostgreSQL. MySQL on the same machine (and same database content) had no problem. So are there any known issues with PostgreSQL on Windows sending data to remote hosts connected via ODBC? What should I do to find out more debug info?
Dalibor Sramek <dali@insula.cz> writes: > select * from t_umlpattern limit 2 > takes 1500+ msec on the Windows machine and 60 on a comparable Linux > machine. Both selects performed from remote PgAdmin. > The same select performed localy on the windows machine takes 60 msec. So it's a networking issue. I haven't paid real close attention to Windows problems, but I recall that we've heard a couple of reports of Windows performance problems that were resolved by removing various third-party network filters and/or installing Windows service pack updates. Check through the list archives ... regards, tom lane
On Tue, Sep 13, 2005 at 11:32:02AM -0400, Tom Lane wrote: > So it's a networking issue. I haven't paid real close attention to > ... > updates. Check through the list archives ... This one http://archives.postgresql.org/pgsql-performance/2005-06/msg00593.php seems to be very similar to my problem. Somebody suggested that setting TCP_NODELAY option to the TCP connection may help. Before I dive into the source: could some win-pg guru tell me if the Windows server tries to set this option? Is it possible to change it via configuration? Is there a way to find out if the TCP_NODELAY option was actually used for a connection? Anyway thank you all. I believe I am getting closer to a solution. Dalibor Sramek -- Dalibor Sramek http://www.insula.cz/dali \ In the eyes of cats / dalibor.sramek@insula.cz \ all things / >H blog http://www.transhumanismus.cz/blog.php \ belong to cats.
On Tue, Sep 13, 2005 at 11:05:00AM -0400, Merlin Moncure wrote: > 5. do select array_accum(q::text) from generate_series(1,10000) q; I made the tests you suggested and the pattern is clear. The difference between local and remote command execution is caused by moving data over the network. E.g. the command above takes 700 ms locally and 1500 ms remotely. Remote explain analyze takes exactly the 700 ms. I downloaded PCATTCP - http://www.pcausa.com/Utilities/pcattcp.htm and the measured throughput between the two machines is over 10000 kB/s. PCATTCP allows setting TCP_NODELAY but it had no effect on the transfer speed. So the difference between local and remote execution should IMHO stay in the 10 ms range. Definitely not 800 ms. The 8.1 has the same problem. Just for the record: the server PC is Dell Precision 330 with 3Com 3C920 integrated network card. OS MS Windows Professional 2002 with service pack 2. There is Symantec Antivirus installed - which I have (hopefully) completely disabled. Thanks for any help Dalibor Sramek -- Dalibor Sramek http://www.insula.cz/dali \ In the eyes of cats / dalibor.sramek@insula.cz \ all things / >H blog http://www.transhumanismus.cz/blog.php \ belong to cats.
(1) Latency and throughput don't necessarily correlate well. When blasting quantities of data to test throughput, TCP_NODELAY might not matter much -- a full buffer will be sent without a delay anyway. What do you get on a ping while running the throughput test? (2) Besides the TCP_NODELAY issue, another issue which has caused similar problems is a mismatch between half duplex and full duplex in the configuration of the switch and the server. Sometimes auto-negotiate doesn't work as advertised; you might want to try setting the configuration explicitly, if you aren't already doing so. -Kevin >>> Dalibor Sramek <dali@insula.cz> 09/14/05 8:02 AM >>> On Tue, Sep 13, 2005 at 11:05:00AM -0400, Merlin Moncure wrote: > 5. do select array_accum(q::text) from generate_series(1,10000) q; I made the tests you suggested and the pattern is clear. The difference between local and remote command execution is caused by moving data over the network. E.g. the command above takes 700 ms locally and 1500 ms remotely. Remote explain analyze takes exactly the 700 ms. I downloaded PCATTCP - http://www.pcausa.com/Utilities/pcattcp.htm and the measured throughput between the two machines is over 10000 kB/s. PCATTCP allows setting TCP_NODELAY but it had no effect on the transfer speed. So the difference between local and remote execution should IMHO stay in the 10 ms range. Definitely not 800 ms. The 8.1 has the same problem. Just for the record: the server PC is Dell Precision 330 with 3Com 3C920 integrated network card. OS MS Windows Professional 2002 with service pack 2. There is Symantec Antivirus installed - which I have (hopefully) completely disabled. Thanks for any help Dalibor Sramek
> in the 10 ms range. Definitely not 800 ms. The 8.1 has the same problem. > > Just for the record: the server PC is Dell Precision 330 with 3Com 3C920 > integrated network card. OS MS Windows Professional 2002 with service pack > 2. There is Symantec Antivirus installed - which I have (hopefully) > completely disabled. Try throwing in another network card and see if it helps. Next step is to try twinking tcp settings (http://support.microsoft.com/default.aspx?scid=kb;en-us;314053) and see if that helps. Beyond that, try playing the update driver game. If you are still having problems, try receiving bigger and bigger results to see where problem occurs. 1-2k range suggests mtu problem, 4-8k range suggests tcp receive window problem. Beyond that, I'm stumped, uh, buy Opteron? :) Merlin