Thread: Re: Performance question (stripped down the problem)
> Hello once more, > > I stripped the database down to one single table with only the relevant > columns. A pg_dump can be found under > > http://www.foodborne-net.de/~tillea/ifsg/ifsgtest.dump.bz2 > > I would really like it if you try the query > Hi Andreas I tried it. See my actions below. The main performance boost is reached by creating an index and disabling the sequential scan: ------------------------------------------------------- Without any index; ------------------------------------------------------- time psql tt <<END SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE Hauptdaten_Fall.IstAktuell = 20 GROUP BY Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie; END real 0m18.128s user 0m0.010s sys 0m0.010s ------------------------------------------------------- create index ix_1 on hauptdaten_fall(meldekategorie); ------------------------------------------------------- Same statement real 0m18.259s user 0m0.020s sys 0m0.010s no difference ------------------------------------------------------- now disable seqscan: ------------------------------------------------------- time psql tt <<END set enable_seqscan = off; SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE Hauptdaten_Fall.IstAktuell = 20 GROUP BY Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie; END real 0m3.701s user 0m0.010s sys 0m0.000s Best regards Herbie -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Herbert Liechti http://www.thinx.ch ThinX networked business services Adlergasse 5, CH-4500 Solothurn ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
On Thu, 20 Sep 2001, Herbert Liechti wrote: > I tried it. See my actions below. The main performance boost is > reached by creating an index and disabling the sequential scan: Thanks. I tried this and it helps in dead (see below). > ------------------------------------------------------- > create index ix_1 on hauptdaten_fall(meldekategorie); I did so before for in other tests. > ------------------------------------------------------- > > Same statement > > real 0m18.259s > user 0m0.020s > sys 0m0.010s The same on my machine: real 0m18.128s user 0m0.070s sys 0m0.010s > ------------------------------------------------------- > now disable seqscan: > ------------------------------------------------------- > time psql tt <<END > set enable_seqscan = off; > SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz > FROM Hauptdaten_Fall WHERE Hauptdaten_Fall.IstAktuell = 20 GROUP BY > Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie; > END > > > real 0m3.701s > user 0m0.010s > sys 0m0.000s real 0m5.905s user 0m0.060s sys 0m0.030s I have no real explanation why I have the same result in the first case but significant more time for the second. but it helps for the first step. On the other hand if I test my *real* database: real 0m20.539s user 0m0.060s sys 0m0.060s and with "set enable_seqscan = off;" real 0m10.133s user 0m0.040s sys 0m0.020s I get in fact an increase of speed by factor 2, but anyway it is even far to slow for our application. If I start a slightly more complex query (not to mention that we are far from the amount of data we will get after a year, I get the following stats: -- default -- set enable_seqscan = off; MS-SQL server real 0m30.891s real 0m27.165s about 1s user 0m0.050s user 0m0.080s sys 0m0.070s sys 0m0.050s <other query example> real 0m53.698s real 0m54.481s about 2.5s user 0m0.190s user 0m0.180s sys 0m0.040s sys 0m0.040s This is about factor 20 compared to the MS-SQL server and I have real hard arguing for PostgreSQL. In fact the MS-SQL server times are estimated from inside Access - the plain server would be even faster. By the way - the last example shows that enforcing index scan don´t necessarily makes the thing faster - perhaps it could even slow down for other queries?? I would be happy to forward the exact queries which lead to this measures if someone is interested. Kind regards Andreas.
On Thu, Sep 20, 2001 at 11:10:02AM +0200, Herbert Liechti wrote: > I tried it. See my actions below. The main performance boost is > reached by creating an index and disabling the sequential scan: > Without any index; > real 0m18.128s > user 0m0.010s > sys 0m0.010s > > Same statement with index > real 0m18.259s > user 0m0.020s > sys 0m0.010s > no difference > > now disable seqscan: > time psql tt <<END > set enable_seqscan = off; > real 0m3.701s > user 0m0.010s > sys 0m0.000s same here (dual PIII-866, Debian, 512 MB, raid1+0) real 0m6.472s user 0m0.000s sys 0m0.010s real 0m6.195s user 0m0.010s sys 0m0.000s real 0m2.885s user 0m0.010s sys 0m0.000s tinus
On Thu, 20 Sep 2001, Justin Clift wrote: > Sorry, I haven't seen the history of this thread. One question which > might be relevant is, have you adjusted the postgresql.conf file from > the default memory settings to be something better? I adjusted two parameters: shared_buffers = 2048 (When I tried 4096 I´ve got a connection error. Don´t know what this means, but anyway increasing of this value did not changed anything.) sort_mem = 2048 (After increasing this value (from default 512) to 1024 I got an increase in speed from 20s to 18s - not much but better than nothing. Further increase to 2048 did not change anything further so I stopped here.) > If these are the times you're getting from a default configuration, you > might be able to get far better results by doing performance tuning of > PostgreSQL and/or the server. Any other values which might help here? Kind regards Andreas.
Hi Andreas, Sorry, I haven't seen the history of this thread. One question which might be relevant is, have you adjusted the postgresql.conf file from the default memory settings to be something better? If these are the times you're getting from a default configuration, you might be able to get far better results by doing performance tuning of PostgreSQL and/or the server. What do you think? Regards and best wishes, Justin Clift "Tille, Andreas" wrote: > > On Thu, 20 Sep 2001, Herbert Liechti wrote: > > > I tried it. See my actions below. The main performance boost is > > reached by creating an index and disabling the sequential scan: > Thanks. I tried this and it helps in dead (see below). > > > ------------------------------------------------------- > > create index ix_1 on hauptdaten_fall(meldekategorie); > I did so before for in other tests. > > ------------------------------------------------------- > > > > Same statement > > > > real 0m18.259s > > user 0m0.020s > > sys 0m0.010s > The same on my machine: > > real 0m18.128s > user 0m0.070s > sys 0m0.010s > > > ------------------------------------------------------- > > now disable seqscan: > > ------------------------------------------------------- > > time psql tt <<END > > set enable_seqscan = off; > > SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz > > FROM Hauptdaten_Fall WHERE Hauptdaten_Fall.IstAktuell = 20 GROUP BY > > Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie; > > END > > > > > > real 0m3.701s > > user 0m0.010s > > sys 0m0.000s > > real 0m5.905s > user 0m0.060s > sys 0m0.030s > > I have no real explanation why I have the same result in the first > case but significant more time for the second. but it helps for the > first step. > > On the other hand if I test my *real* database: > > real 0m20.539s > user 0m0.060s > sys 0m0.060s > > and with "set enable_seqscan = off;" > > real 0m10.133s > user 0m0.040s > sys 0m0.020s > > I get in fact an increase of speed by factor 2, but anyway it is even > far to slow for our application. If I start a slightly more complex > query (not to mention that we are far from the amount of data we will > get after a year, I get the following stats: > > -- default -- set enable_seqscan = off; MS-SQL server > real 0m30.891s real 0m27.165s about 1s > user 0m0.050s user 0m0.080s > sys 0m0.070s sys 0m0.050s > > <other query example> > > real 0m53.698s real 0m54.481s about 2.5s > user 0m0.190s user 0m0.180s > sys 0m0.040s sys 0m0.040s > > This is about factor 20 compared to the MS-SQL server and I have > real hard arguing for PostgreSQL. In fact the MS-SQL server times > are estimated from inside Access - the plain server would be even > faster. > > By the way - the last example shows that enforcing index scan don´t > necessarily makes the thing faster - perhaps it could even slow down > for other queries?? > > I would be happy to forward the exact queries which lead to this > measures if someone is interested. > > Kind regards > > Andreas. > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
On Thu, 20 Sep 2001, Einar Karttunen asked me for query plans for both M$ SQL and postgresql: M$ SQL: |--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1005]))) |--Stream Aggregate(GROUP BY:([Hauptdaten_Fall].[MeldeKategorie]) DEFINE:([Expr1005]=Count(*))) |--Index Scan(OBJECT:([IfSG].[dbo].[Hauptdaten_Fall].[IX_MeldeKategorie]), ORDERED FORWARD) Postgresql: time psql ifsg <<... explain SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20)) GROUP BY Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie; ... NOTICE: QUERY PLAN: Aggregate (cost=32881.62..33768.91 rows=17746 width=16) -> Group (cost=32881.62..33325.27 rows=177458 width=16) -> Sort (cost=32881.62..32881.62 rows=177458 width=16) -> Seq Scan on hauptdaten_fall (cost=0.00..15024.12 rows=177458 width=16) real 0m1.382s user 0m0.040s sys 0m0.020s And the other case with enforcing index scan: time psql ifsg <<... set enable_seqscan = off; explain SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz FROM Hauptdaten_Fall WHERE (((Hauptdaten_Fall.IstAktuell)=20)) GROUP BY Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie; ... NOTICE: QUERY PLAN: Aggregate (cost=0.00..146770.97 rows=17746 width=16) -> Group (cost=0.00..146327.32 rows=177458 width=16) -> Index Scan using ix_meldekategorie_hauptdaten_fa on hauptdaten_fall (cost=0.00..145883.68 rows=177458 width=16) real 0m0.102s (for sure it´s faster to have a plan if enforced ...) user 0m0.030s sys 0m0.020s Does this help in any way? If I´m not completely wrong also M$ SQL server prefers to use the index ix_meldekategorie. Kind regards Andreas.
Hi Andreas, I'm running PostgreSQL 7.1.3 here on a PC with nearly a gig of ram, and running Linux Mandrake 8.0 First thing I did was to increase the amount of shared memory and stuff which Linux allows things to use : echo "kernel.shmall = 134217728" >> /etc/sysctl.conf echo "kernel.shmmax = 134217728" >> /etc/sysctl.conf For my system, that'll raise the shared memory limits to 128MB at system boot time. btw, the "134217728" figure = 128MB (128 * 1024 * 1024) Then I changed the limits for the running system (so no reboot is necessary) : echo 134217728 > /proc/sys/kernel/shmall echo 134217728 > /proc/sys/kernel/shmmax Then adjusted the postgresql.conf file with these values : sort_mem = 32768 shared_buffers = 220 Now, that's a bunch of shared_buffers, but at the same time I also raised the max_connections to 110. This seems to have dropped my execution times, but I haven't seriously gotten around to tuning this system. The key thing I think you've missed is to update the shared memory, etc. More info about it can be found at : http://www.postgresql.org/idocs/index.php?kernel-resources.html Bruce Momjian also put together some information about optimising things with PostgreSQL at : http://www.ca.postgresql.org/docs/hw_performance/ If you want to be able to benchmark things on your system, I use the "Open Source Database Benchmark" (Linux only at present), running the latest CVS version of it, and also tweaked to not use hash indices. A tarball of working source code is available at : http://techdocs.postgresql.org/techdocs/perftuningfigures.php Hope this is of assistance Andreas. Regards and best wishes, Justin Clift "Tille, Andreas" wrote: > > On Thu, 20 Sep 2001, Justin Clift wrote: > > > Sorry, I haven't seen the history of this thread. One question which > > might be relevant is, have you adjusted the postgresql.conf file from > > the default memory settings to be something better? > I adjusted two parameters: > > shared_buffers = 2048 > (When I tried 4096 I´ve got a connection error. Don´t know what this > means, but anyway increasing of this value did not changed anything.) > > sort_mem = 2048 > (After increasing this value (from default 512) to 1024 I got an > increase in speed from 20s to 18s - not much but better than nothing. > Further increase to 2048 did not change anything further so I stopped > here.) > > > If these are the times you're getting from a default configuration, you > > might be able to get far better results by doing performance tuning of > > PostgreSQL and/or the server. > Any other values which might help here? > > Kind regards > > Andreas. > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
On Fri, 21 Sep 2001, Justin Clift wrote: > Hi Andreas, > > I'm running PostgreSQL 7.1.3 here on a PC with nearly a gig of ram, and > running Linux Mandrake 8.0 > > First thing I did was to increase the amount of shared memory and stuff > which Linux allows things to use : > > echo "kernel.shmall = 134217728" >> /etc/sysctl.conf > echo "kernel.shmmax = 134217728" >> /etc/sysctl.conf > > For my system, that'll raise the shared memory limits to 128MB at system > boot time. > > btw, the "134217728" figure = 128MB (128 * 1024 * 1024) > > Then I changed the limits for the running system (so no reboot is > necessary) : > > echo 134217728 > /proc/sys/kernel/shmall > echo 134217728 > /proc/sys/kernel/shmmax > > Then adjusted the postgresql.conf file with these values : > > sort_mem = 32768 > shared_buffers = 220 > > Now, that's a bunch of shared_buffers, but at the same time I also > raised the max_connections to 110. 220 is much less than I have set before I posted my stats yesterday. I have set it to 2048. But adjusting kernel.shmall = 134217728 kernel.shmmax = 134217728 gave me a speed up by nearly factor 2! That could be a nice start for further increasing of memory. (Well, that machine has 2GB ;-) ... ) > This seems to have dropped my execution times, but I haven't seriously > gotten around to tuning this system. So yes, it has dropped my execution times from 20 times slower than MS-SQL to 10 times slower, i.e. I have to continue tuning my setup. > The key thing I think you've missed is to update the shared memory, > etc. More info about it can be found at : > > http://www.postgresql.org/idocs/index.php?kernel-resources.html > > Bruce Momjian also put together some information about optimising things > with PostgreSQL at : > > http://www.ca.postgresql.org/docs/hw_performance/ I´ve read both documents now and see no other parameter to adjust than shared_buffers. I have to admit that I´m not really sure if this parameter is responsible for the term "cache size" on page http://www.ca.postgresql.org/docs/hw_performance/node8.html (I would consider it to be helpfull if the relevant parameter would be mentioned in the text, Bruce.) I just post the parameters I changed on my system and the results I´ve got: /etc/sysctl.conf kernel.shmall = 134217728 kernel.shmmax = 134217728 fs.file-max = 16384 /etc/postgresql/postgresql.conf: max_connections = 256 shared_buffers = 2048 sort_mem = 32768 This setup gave me a speed increase from 56s to 33s for a certain query (the one which took the M$-SQL server 2.5s). Now I tried to increase sort_mem = 32768 shared_buffers = 4096 and got no real speed difference but I noticed an improved memory usage by top. So I continued increasing shared_buffers by doubling the size step by step. To enable a certain amount of shared_buffers I also had to adjust kernel.shmall and kernel.shmmax (I got errors otherwise). Here I post some parameter settings and corresponding memory usage measured by top and times for the query: kernel.shmall = 536870912 kernel.shmmax = 536870912 shared_buffers = 32768 PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND 769 postgres 16 0 78372 76M 52916 R 99.9 7.6 0:23 postmaster real 0m33.591s user 0m0.190s sys 0m0.040s kernel.shmall = 1073741824 kernel.shmmax = 1073741824 shared_buffers = 65536 PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND 865 postgres 17 0 80332 78M 54836 R 99.9 7.7 0:20 postmaster real 0m32.861s user 0m0.200s sys 0m0.010s kernel.shmall = 2147483648 kernel.shmmax = 2147483648 shared_buffers = 131072 PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND 1172 postgres 18 0 86572 84M 60748 R 99.9 8.3 0:22 postmaster 7.1.3: (I also tried PostgreSQL 7.1.3 all other values for 7.1.2) 2644 postgres 17 0 87088 84M 61264 R 99.9 8.4 0:29 postmaster set enable_seqscan = off; (quite the same speed but other mem-usage) 1205 postgres 18 0 85500 83M 59676 R 99.9 8.2 0:22 postmaster 7.1.3: 2631 postgres 15 0 81972 79M 56148 R 99.9 7.9 0:28 postmaster real 0m32.835s user 0m0.210s sys 0m0.050s I noticed no real difference in speed in all this tests but I observed an increased need of memory usage. There was no difference in query speed if I enabled or disabled index scan and between PostgreSQL version 7.1.2 and 7.1.3. Furthermore I wonder about the following fact: I see no real difference in speed if I start the query immediately after restarting postmaster and redoing the same query. In my opinion the first query should fill the relevant tables into memory cache which should take some time but the second query should be faster because the cache is just filled. So I wonder if it makes sense if I continue increasing those values until I observe this difference or if I don´t see any increase in memory usage by top. I think I could spend some more memory on this task currently because it is less than 10% memory usage and there is no swap at all on the machine. # vmstat procs memory swap io system cpu r b w swpd free buff cache si so bi bo in cs us sy id 1 0 0 0 176216 21536 1576744 0 0 1 3 2 7 0 0 24 > If you want to be abye to benchmark things on your system, I use the > "Open Source Database Benchmark" (Linux only at present), running the > latest CVS version of it, and also tweaked to not use hash indices. A > tarball of working source code is available at : > > http://techdocs.postgresql.org/techdocs/perftuningfigures.php Thanks. I don´t want to do *any* benchmark. Only my application is relevant even if PostgreSQL outperforms other databases in any benchmark. > Hope this is of assistance Andreas. It was of assistance for sure and I hope further tuning brings me near the target. Thanks Justin. Kind regards Andreas.
Hi Andreas, Interesting problem - Like Justin I am running Postgres 7.1.3 + Mandrake 8. Your query : SELECT hauptdaten_fall.meldekategorie, count(hauptdaten_fall.id) AS Anz FROM hauptdaten_fall WHERE (((hauptdaten_fall.istaktuell)=20)) GROUP BY hauptdaten_fall.meldekategorie ORDER BY hauptdaten_fall.meldekategorie takes 19 s on my old hardware ( P2 333Mhz + 384Mb + 2x20Gb ATA) I can get some improvement by making sort_mem=20480 : The query then takes 8s - still a bit slow - Mysql does it in 2 s :-( Increasing sort_mem>20480 did not give any further improvement. Creating indexes, clustering,fooling about with enable_seqscan + cpu_tuple_cost did not help at all. As noted by others it appears that data access is not the issue - as the query : SELECT count(hauptdaten_fall.id) AS Anz FROM hauptdaten_fall WHERE (((hauptdaten_fall.istaktuell)=20)) takes only 2 s - so I am guessing that the 8s result is about as good as can be gotten without delving into the PG code for GROUP BY access. A slightly complex workaround for better performance is to use a summary table : CREATE TABLE hauptdaten_fall_sum(meldekategorie varchar(10), istaktuell integer, cnt integer) and maintain it via triggers on hauptdaten_fall The offending query then becomes : SELECT meldekategorie,cnt FROM hauptdaten_fall_sum WHERE istaktuell=20; which is unmeasurably fast ( i.e 0 s ) on my system. Hope it helps or gives food for thought regards Mark
Hi Andreas, Good to hear this has been of benefit. From reading your email, you haven't altered the value of sort_mem, just shared_buffers. It might be worthwhile checking things out with sort_mem at different levels too before homing in on the "best" value(s) for your application & setup. :-) Regards and best wisehs, Justin Clift "Tille, Andreas" wrote: > > On Fri, 21 Sep 2001, Justin Clift wrote: > > > Hi Andreas, > > > > I'm running PostgreSQL 7.1.3 here on a PC with nearly a gig of ram, and > > running Linux Mandrake 8.0 > > > > First thing I did was to increase the amount of shared memory and stuff > > which Linux allows things to use : > > > > echo "kernel.shmall = 134217728" >> /etc/sysctl.conf > > echo "kernel.shmmax = 134217728" >> /etc/sysctl.conf > > > > For my system, that'll raise the shared memory limits to 128MB at system > > boot time. > > > > btw, the "134217728" figure = 128MB (128 * 1024 * 1024) > > > > Then I changed the limits for the running system (so no reboot is > > necessary) : > > > > echo 134217728 > /proc/sys/kernel/shmall > > echo 134217728 > /proc/sys/kernel/shmmax > > > > Then adjusted the postgresql.conf file with these values : > > > > sort_mem = 32768 > > shared_buffers = 220 > > > > Now, that's a bunch of shared_buffers, but at the same time I also > > raised the max_connections to 110. > 220 is much less than I have set before I posted my stats yesterday. > I have set it to 2048. But adjusting > kernel.shmall = 134217728 > kernel.shmmax = 134217728 > gave me a speed up by nearly factor 2! That could be a nice start > for further increasing of memory. (Well, that machine has 2GB ;-) ... ) > > > This seems to have dropped my execution times, but I haven't seriously > > gotten around to tuning this system. > So yes, it has dropped my execution times from 20 times slower than > MS-SQL to 10 times slower, i.e. I have to continue tuning my setup. > > > The key thing I think you've missed is to update the shared memory, > > etc. More info about it can be found at : > > > > http://www.postgresql.org/idocs/index.php?kernel-resources.html > > > > Bruce Momjian also put together some information about optimising things > > with PostgreSQL at : > > > > http://www.ca.postgresql.org/docs/hw_performance/ > I´ve read both documents now and see no other parameter to adjust than > shared_buffers. I have to admit that I´m not really sure if this > parameter is responsible for the term "cache size" on page > http://www.ca.postgresql.org/docs/hw_performance/node8.html > (I would consider it to be helpfull if the relevant parameter would > be mentioned in the text, Bruce.) > > I just post the parameters I changed on my system and the results I´ve got: > > /etc/sysctl.conf > kernel.shmall = 134217728 > kernel.shmmax = 134217728 > fs.file-max = 16384 > > /etc/postgresql/postgresql.conf: > max_connections = 256 > shared_buffers = 2048 > sort_mem = 32768 > > This setup gave me a speed increase from 56s to 33s for a certain query > (the one which took the M$-SQL server 2.5s). > > Now I tried to increase > sort_mem = 32768 > shared_buffers = 4096 > > and got no real speed difference but I noticed an improved memory usage by > top. So I continued increasing shared_buffers by doubling the size step > by step. To enable a certain amount of shared_buffers I also had to adjust > kernel.shmall and kernel.shmmax (I got errors otherwise). > > Here I post some parameter settings and corresponding memory usage > measured by top and times for the query: > > kernel.shmall = 536870912 > kernel.shmmax = 536870912 > > shared_buffers = 32768 > > PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND > 769 postgres 16 0 78372 76M 52916 R 99.9 7.6 0:23 postmaster > > real 0m33.591s > user 0m0.190s > sys 0m0.040s > > kernel.shmall = 1073741824 > kernel.shmmax = 1073741824 > > shared_buffers = 65536 > > PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND > 865 postgres 17 0 80332 78M 54836 R 99.9 7.7 0:20 postmaster > > real 0m32.861s > user 0m0.200s > sys 0m0.010s > > kernel.shmall = 2147483648 > kernel.shmmax = 2147483648 > > shared_buffers = 131072 > > PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND > 1172 postgres 18 0 86572 84M 60748 R 99.9 8.3 0:22 postmaster > > 7.1.3: (I also tried PostgreSQL 7.1.3 all other values for 7.1.2) > 2644 postgres 17 0 87088 84M 61264 R 99.9 8.4 0:29 postmaster > > set enable_seqscan = off; (quite the same speed but other mem-usage) > 1205 postgres 18 0 85500 83M 59676 R 99.9 8.2 0:22 postmaster > > 7.1.3: > 2631 postgres 15 0 81972 79M 56148 R 99.9 7.9 0:28 postmaster > > real 0m32.835s > user 0m0.210s > sys 0m0.050s > > I noticed no real difference in speed in all this tests but I observed > an increased need of memory usage. There was no difference in query > speed if I enabled or disabled index scan and between PostgreSQL version > 7.1.2 and 7.1.3. > > Furthermore I wonder about the following fact: I see no real difference > in speed if I start the query immediately after restarting postmaster > and redoing the same query. In my opinion the first query should fill > the relevant tables into memory cache which should take some time but the > second query should be faster because the cache is just filled. > > So I wonder if it makes sense if I continue increasing those values > until I observe this difference or if I don´t see any increase in > memory usage by top. I think I could spend some more memory on this > task currently because it is less than 10% memory usage and there > is no swap at all on the machine. > > # vmstat > procs memory swap io system cpu > r b w swpd free buff cache si so bi bo in cs us sy id > 1 0 0 0 176216 21536 1576744 0 0 1 3 2 7 0 0 24 > > > If you want to be abye to benchmark things on your system, I use the > > "Open Source Database Benchmark" (Linux only at present), running the > > latest CVS version of it, and also tweaked to not use hash indices. A > > tarball of working source code is available at : > > > > http://techdocs.postgresql.org/techdocs/perftuningfigures.php > Thanks. I don´t want to do *any* benchmark. Only my application is > relevant even if PostgreSQL outperforms other databases in any > benchmark. > > > Hope this is of assistance Andreas. > It was of assistance for sure and I hope further tuning brings me near > the target. Thanks Justin. > > Kind regards > > Andreas. > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
On Sat, 22 Sep 2001, Justin Clift wrote: > Good to hear this has been of benefit. From reading your email, you > haven't altered the value of sort_mem, just shared_buffers. May be I was a little bit unclear. I stopped increasing sort_mem after there was no siginificant increase in speed. My latest value was sort_mem = 65536 which is higher than any suggestion I´ve got here. I observed no increase in speed against sort_mem = 32768 and sort_mem = 131072 > It might be worthwhile checking things out with sort_mem at different > levels too before homing in on the "best" value(s) for your application > & setup. I observed the only difference when I increased shared_buffers. This difference was not significant regarding speed but memory usage (measured by top). Today I observed that I seemed to reach the border for simply doubling the shared_buffers value. If I set shared_buffers = 262144 and double the kernel.shmall and kernel.shmmax values in /etc/sysctl.conf postmaster says: psql: connectDBStart() -- connect() failed: No such file or directory Is the postmaster running locally and accepting connections on Unix socket '/var/run/postgresql/.s.PGSQL.5432'? (this is the same error I´ve got when I increased shared_buffers without adjusting sysctl parameters.) So my question remains: Why isn´t the databased cached if there is enough space to do so? I think it is not cached because there is no speed difference in doing the first query after Server restart and repeating the same query. Moreover top shows an continual increase of memory usage over a longer query. If the database would be cached the memory usage should be high also at the start of the query (IMHO). Kind regards Andreas.
On Sat, 22 Sep 2001, Mark kirkwood wrote: > Interesting problem :) Hope we can find also an interesting solution ;-). > takes 19 s on my old hardware ( P2 333Mhz + 384Mb + 2x20Gb ATA) > > I can get some improvement by making sort_mem=20480 : > > The query then takes 8s - still a bit slow - Mysql does it in 2 s :-( > Increasing sort_mem>20480 did not give any further improvement. sort_mem = 65536 This gave no further improvement over sort_mem = 32768 and my box takes about 5s for this task, which is more than one order of magnitude higher than M$-SQL server. > As noted by others it appears that data access is not the issue - as the > query : > > SELECT > count(hauptdaten_fall.id) AS Anz > FROM hauptdaten_fall > WHERE (((hauptdaten_fall.istaktuell)=20)) > > takes only 2 s - so I am guessing that the 8s result is about as good as can > be gotten without delving into the PG code for GROUP BY access. I did the same thing some days ago. Even if the GROUP BY is detected as the reason for the biggest part of slowing down - the M$-SQL server solves the whole task faster than PostgreSQL is doing this simple select. > A slightly complex workaround for better performance is to use a summary > table : > > CREATE TABLE hauptdaten_fall_sum(meldekategorie varchar(10), > istaktuell integer, > cnt integer) > > and maintain it via triggers on hauptdaten_fall > > The offending query then becomes : > > SELECT meldekategorie,cnt > FROM hauptdaten_fall_sum > WHERE istaktuell=20; > > which is unmeasurably fast ( i.e 0 s ) on my system. Perhaps I have to think about such workarounds. The problem is that this was just the simplest example of several much more complex ones. This would mean I have to rework each query and have to do several speed tests. This makes the port very hard and perhaps the intended replication setup impossible. Kind regards Andreas.
On Thu, Sep 20, 2001 at 11:10:02AM +0200, Herbert Liechti wrote: > I tried it. See my actions below. The main performance boost is > reached by creating an index and disabling the sequential scan: > Without any index; > real 0m18.128s > user 0m0.010s > sys 0m0.010s > > Same statement with index > real 0m18.259s > user 0m0.020s > sys 0m0.010s > no difference > > now disable seqscan: > time psql tt <<END > set enable_seqscan = off; > real 0m3.701s > user 0m0.010s > sys 0m0.000s same here (dual PIII-866, Debian, 512 MB, raid1+0) real 0m6.472s user 0m0.000s sys 0m0.010s real 0m6.195s user 0m0.010s sys 0m0.000s real 0m2.885s user 0m0.010s sys 0m0.000s tinus
> same here (dual PIII-866, Debian, 512 MB, raid1+0) > > real 0m6.472s > user 0m0.000s > sys 0m0.010s > > real 0m6.195s > user 0m0.010s > sys 0m0.000s > > real 0m2.885s > user 0m0.010s > sys 0m0.000s This is interesting, just yesterday I was perusing some of Bruce Momjian's works on PG tuning, and noticed that Postgres prefers sequential scans over indexes when much of the table has to be read, all because of the number of head movements on the disk. It would seem that these days, where RAM is cheap, that most people have a great enough disk cache that head movements can become irrelevant. However, I can also see where some people may have incredibly large tables that just won't fit into RAM. An easy solution to both might be to create a user-specifiable switch passed at startup that would simply tell PG that sequentials aren't necessarily better than index scans. Not completely disabling them, but at least giving it a pointer that it doesn't *have* to use sequentials. steve
On Thu, Sep 27, 2001 at 11:18:31AM -0600, Steve Wolfe wrote: > This is interesting, just yesterday I was perusing some of Bruce > Momjian's works on PG tuning, and noticed that Postgres prefers sequential > scans over indexes when much of the table has to be read, all because of > the number of head movements on the disk. It would seem that these days, > where RAM is cheap, that most people have a great enough disk cache that > head movements can become irrelevant. > > However, I can also see where some people may have incredibly large > tables that just won't fit into RAM. An easy solution to both might be to > create a user-specifiable switch passed at startup that would simply tell > PG that sequentials aren't necessarily better than index scans. Not > completely disabling them, but at least giving it a pointer that it > doesn't *have* to use sequentials. There is a user specifieable value somewhere that controls how expensive an index scan is and how expensive a seqential scan is. By tuning those you could probably get the effect you want. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Magnetism, electricity and motion are like a three-for-two special offer: > if you have two of them, the third one comes free.
> There is a user specifieable value somewhere that controls how expensive an > index scan is and how expensive a seqential scan is. By tuning those you > could probably get the effect you want. Hmmm.... nice. Can anybody point me to that value? steve
On Fri, Sep 28, 2001 at 10:25:13AM -0600, Steve Wolfe wrote: > > There is a user specifieable value somewhere that controls how expensive > an > > index scan is and how expensive a seqential scan is. By tuning those you > > could probably get the effect you want. > > Hmmm.... nice. Can anybody point me to that value? In the documentation, section 3.4.1 here. PostgreSQL 7.1.3 Administrator's Guide > Server Runtime Environment > Planner and Optimizer Tuning There's a whole bunch of tunable variables. It does say that there is not much idea of what values are good so any experiments would be helpful. HTH, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Magnetism, electricity and motion are like a three-for-two special offer: > if you have two of them, the third one comes free.
On Thu, 27 Sep 2001, Steve Wolfe wrote: > This is interesting, just yesterday I was perusing some of Bruce > Momjian's works on PG tuning, and noticed that Postgres prefers sequential > scans over indexes when much of the table has to be read, all because of > the number of head movements on the disk. It would seem that these days, > where RAM is cheap, that most people have a great enough disk cache that > head movements can become irrelevant. > > However, I can also see where some people may have incredibly large > tables that just won't fit into RAM. An easy solution to both might be to > create a user-specifiable switch passed at startup that would simply tell > PG that sequentials aren't necessarily better than index scans. Not > completely disabling them, but at least giving it a pointer that it > doesn't *have* to use sequentials. The problem is that *both* methods are to slow for my application :-(. Kind regards Andreas.