Thread: Re: Performance question (stripped down the problem)

Re: Performance question (stripped down the problem)

From
Herbert Liechti
Date:
> Hello once more,
>
> I stripped the database down to one single table with only the relevant
> columns.  A pg_dump can be found under
>
>     http://www.foodborne-net.de/~tillea/ifsg/ifsgtest.dump.bz2
>
> I would really like it if you try the query
>
Hi Andreas
I tried it. See my actions below. The main performance boost is
reached by creating an index and disabling the sequential scan:


-------------------------------------------------------
Without any index;
-------------------------------------------------------

time psql tt <<END
SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
FROM Hauptdaten_Fall WHERE Hauptdaten_Fall.IstAktuell = 20 GROUP BY
Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
END

real    0m18.128s
user    0m0.010s
sys     0m0.010s


-------------------------------------------------------
create index ix_1 on hauptdaten_fall(meldekategorie);
-------------------------------------------------------

Same statement

real    0m18.259s
user    0m0.020s
sys     0m0.010s

no difference

-------------------------------------------------------
now disable seqscan:
-------------------------------------------------------
time psql tt <<END
set enable_seqscan = off;
SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
FROM Hauptdaten_Fall WHERE Hauptdaten_Fall.IstAktuell = 20 GROUP BY
Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
END


real    0m3.701s
user    0m0.010s
sys     0m0.000s


Best regards
Herbie
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Herbert Liechti                                  http://www.thinx.ch
ThinX networked business services    Adlergasse 5, CH-4500 Solothurn
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: Performance question (stripped down the problem)

From
"Tille, Andreas"
Date:
On Thu, 20 Sep 2001, Herbert Liechti wrote:

> I tried it. See my actions below. The main performance boost is
> reached by creating an index and disabling the sequential scan:
Thanks.  I tried this and it helps in dead (see below).

> -------------------------------------------------------
> create index ix_1 on hauptdaten_fall(meldekategorie);
I did so before for in other tests.
> -------------------------------------------------------
>
> Same statement
>
> real    0m18.259s
> user    0m0.020s
> sys     0m0.010s
The same on my machine:

real    0m18.128s
user    0m0.070s
sys     0m0.010s

> -------------------------------------------------------
> now disable seqscan:
> -------------------------------------------------------
> time psql tt <<END
> set enable_seqscan = off;
> SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
> FROM Hauptdaten_Fall WHERE Hauptdaten_Fall.IstAktuell = 20 GROUP BY
> Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
> END
>
>
> real    0m3.701s
> user    0m0.010s
> sys     0m0.000s

real    0m5.905s
user    0m0.060s
sys     0m0.030s

I have no real explanation why I have the same result in the first
case but significant more time for the second. but it helps for the
first step.

On the other hand if I test my *real* database:

real    0m20.539s
user    0m0.060s
sys     0m0.060s

and with "set enable_seqscan = off;"

real    0m10.133s
user    0m0.040s
sys     0m0.020s

I get in fact an increase of speed by factor 2, but anyway it is even
far to slow for our application.  If I start a slightly more complex
query (not to mention that we are far from the amount of data we will
get after a year, I get the following stats:

  -- default --         set enable_seqscan = off;         MS-SQL server
real    0m30.891s         real    0m27.165s               about 1s
user    0m0.050s          user    0m0.080s
sys     0m0.070s          sys     0m0.050s

<other query example>

real    0m53.698s         real    0m54.481s               about 2.5s
user    0m0.190s          user    0m0.180s
sys     0m0.040s          sys     0m0.040s

This is about factor 20 compared to the MS-SQL server and I have
real hard arguing for PostgreSQL.  In fact the MS-SQL server times
are estimated from inside Access - the plain server would be even
faster.

By the way - the last example shows that enforcing index scan don´t
necessarily makes the thing faster - perhaps it could even slow down
for other queries??

I would be happy to forward the exact queries which lead to this
measures if someone is interested.

Kind regards

         Andreas.

Re: Performance question (stripped down the problem)

From
z.nijmeyers@cable.a2000.nl
Date:
On Thu, Sep 20, 2001 at 11:10:02AM +0200, Herbert Liechti wrote:
> I tried it. See my actions below. The main performance boost is
> reached by creating an index and disabling the sequential scan:
> Without any index;
> real    0m18.128s
> user    0m0.010s
> sys     0m0.010s
>
> Same statement with index
> real    0m18.259s
> user    0m0.020s
> sys     0m0.010s
> no difference
>
> now disable seqscan:
> time psql tt <<END
> set enable_seqscan = off;
> real    0m3.701s
> user    0m0.010s
> sys     0m0.000s


same here (dual PIII-866, Debian, 512 MB, raid1+0)

real    0m6.472s
user    0m0.000s
sys     0m0.010s

real    0m6.195s
user    0m0.010s
sys     0m0.000s

real    0m2.885s
user    0m0.010s
sys     0m0.000s


tinus

Re: Performance question (stripped down the problem)

From
"Tille, Andreas"
Date:
On Thu, 20 Sep 2001, Justin Clift wrote:

> Sorry, I haven't seen the history of this thread.  One question which
> might be relevant is, have you adjusted the postgresql.conf file from
> the default memory settings to be something better?
I adjusted two parameters:

shared_buffers = 2048
 (When I tried 4096 I´ve got a connection error. Don´t know what this
  means, but anyway increasing of this value did not changed anything.)

sort_mem = 2048
 (After increasing this value (from default 512) to 1024 I got an
  increase in speed from 20s to 18s - not much but better than nothing.
  Further increase to 2048 did not change anything further so I stopped
  here.)

> If these are the times you're getting from a default configuration, you
> might be able to get far better results by doing performance tuning of
> PostgreSQL and/or the server.
Any other values which might help here?

Kind regards

       Andreas.

Re: Performance question (stripped down the problem)

From
Justin Clift
Date:
Hi Andreas,

Sorry, I haven't seen the history of this thread.  One question which
might be relevant is, have you adjusted the postgresql.conf file from
the default memory settings to be something better?

If these are the times you're getting from a default configuration, you
might be able to get far better results by doing performance tuning of
PostgreSQL and/or the server.

What do you think?

Regards and best wishes,

Justin Clift


"Tille, Andreas" wrote:
>
> On Thu, 20 Sep 2001, Herbert Liechti wrote:
>
> > I tried it. See my actions below. The main performance boost is
> > reached by creating an index and disabling the sequential scan:
> Thanks.  I tried this and it helps in dead (see below).
>
> > -------------------------------------------------------
> > create index ix_1 on hauptdaten_fall(meldekategorie);
> I did so before for in other tests.
> > -------------------------------------------------------
> >
> > Same statement
> >
> > real    0m18.259s
> > user    0m0.020s
> > sys     0m0.010s
> The same on my machine:
>
> real    0m18.128s
> user    0m0.070s
> sys     0m0.010s
>
> > -------------------------------------------------------
> > now disable seqscan:
> > -------------------------------------------------------
> > time psql tt <<END
> > set enable_seqscan = off;
> > SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
> > FROM Hauptdaten_Fall WHERE Hauptdaten_Fall.IstAktuell = 20 GROUP BY
> > Hauptdaten_Fall.MeldeKategorie ORDER BY Hauptdaten_Fall.MeldeKategorie;
> > END
> >
> >
> > real    0m3.701s
> > user    0m0.010s
> > sys     0m0.000s
>
> real    0m5.905s
> user    0m0.060s
> sys     0m0.030s
>
> I have no real explanation why I have the same result in the first
> case but significant more time for the second. but it helps for the
> first step.
>
> On the other hand if I test my *real* database:
>
> real    0m20.539s
> user    0m0.060s
> sys     0m0.060s
>
> and with "set enable_seqscan = off;"
>
> real    0m10.133s
> user    0m0.040s
> sys     0m0.020s
>
> I get in fact an increase of speed by factor 2, but anyway it is even
> far to slow for our application.  If I start a slightly more complex
> query (not to mention that we are far from the amount of data we will
> get after a year, I get the following stats:
>
>   -- default --         set enable_seqscan = off;         MS-SQL server
> real    0m30.891s         real    0m27.165s               about 1s
> user    0m0.050s          user    0m0.080s
> sys     0m0.070s          sys     0m0.050s
>
> <other query example>
>
> real    0m53.698s         real    0m54.481s               about 2.5s
> user    0m0.190s          user    0m0.180s
> sys     0m0.040s          sys     0m0.040s
>
> This is about factor 20 compared to the MS-SQL server and I have
> real hard arguing for PostgreSQL.  In fact the MS-SQL server times
> are estimated from inside Access - the plain server would be even
> faster.
>
> By the way - the last example shows that enforcing index scan don´t
> necessarily makes the thing faster - perhaps it could even slow down
> for other queries??
>
> I would be happy to forward the exact queries which lead to this
> measures if someone is interested.
>
> Kind regards
>
>          Andreas.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
     - Indira Gandhi

Re: Performance question (stripped down the problem)

From
"Tille, Andreas"
Date:
On Thu, 20 Sep 2001, Einar Karttunen asked me for query plans for
both M$ SQL and postgresql:

M$ SQL:

  |--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1005])))
       |--Stream Aggregate(GROUP BY:([Hauptdaten_Fall].[MeldeKategorie])
DEFINE:([Expr1005]=Count(*)))
            |--Index
Scan(OBJECT:([IfSG].[dbo].[Hauptdaten_Fall].[IX_MeldeKategorie]),
ORDERED FORWARD)


Postgresql:

time psql ifsg <<...
explain
SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
FROM Hauptdaten_Fall
WHERE (((Hauptdaten_Fall.IstAktuell)=20))
GROUP BY Hauptdaten_Fall.MeldeKategorie
ORDER BY Hauptdaten_Fall.MeldeKategorie;
...


NOTICE:  QUERY PLAN:

Aggregate  (cost=32881.62..33768.91 rows=17746 width=16)
  ->  Group  (cost=32881.62..33325.27 rows=177458 width=16)
        ->  Sort  (cost=32881.62..32881.62 rows=177458 width=16)
              ->  Seq Scan on hauptdaten_fall  (cost=0.00..15024.12 rows=177458 width=16)


real    0m1.382s
user    0m0.040s
sys     0m0.020s


And the other case with enforcing index scan:

time psql ifsg <<...
set enable_seqscan = off;
explain
SELECT Hauptdaten_Fall.MeldeKategorie, Count(Hauptdaten_Fall.ID) AS Anz
FROM Hauptdaten_Fall
WHERE (((Hauptdaten_Fall.IstAktuell)=20))
GROUP BY Hauptdaten_Fall.MeldeKategorie
ORDER BY Hauptdaten_Fall.MeldeKategorie;
...


NOTICE:  QUERY PLAN:

Aggregate  (cost=0.00..146770.97 rows=17746 width=16)
  ->  Group  (cost=0.00..146327.32 rows=177458 width=16)
        ->  Index Scan using ix_meldekategorie_hauptdaten_fa on hauptdaten_fall  (cost=0.00..145883.68 rows=177458
width=16)


real    0m0.102s  (for sure it´s faster to have a plan if enforced ...)
user    0m0.030s
sys     0m0.020s


Does this help in any way?  If I´m not completely wrong also M$ SQL
server prefers to use the index ix_meldekategorie.

Kind regards

       Andreas.

Re: Performance question (stripped down the problem)

From
Justin Clift
Date:
Hi Andreas,

I'm running PostgreSQL 7.1.3 here on a PC with nearly a gig of ram, and
running Linux Mandrake 8.0

First thing I did was to increase the amount of shared memory and stuff
which Linux allows things to use :

echo "kernel.shmall = 134217728" >> /etc/sysctl.conf
echo "kernel.shmmax = 134217728" >> /etc/sysctl.conf

For my system, that'll raise the shared memory limits to 128MB at system
boot time.

btw, the "134217728" figure = 128MB  (128 * 1024 * 1024)

Then I changed the limits for the running system (so no reboot is
necessary) :

echo 134217728 > /proc/sys/kernel/shmall
echo 134217728 > /proc/sys/kernel/shmmax

Then adjusted the postgresql.conf file with these values :

sort_mem = 32768
shared_buffers = 220

Now, that's a bunch of shared_buffers, but at the same time I also
raised the max_connections to 110.

This seems to have dropped my execution times, but I haven't seriously
gotten around to tuning this system.

The key thing I think you've missed is to update the shared memory,
etc.  More info about it can be found at :

http://www.postgresql.org/idocs/index.php?kernel-resources.html

Bruce Momjian also put together some information about optimising things
with PostgreSQL at :

http://www.ca.postgresql.org/docs/hw_performance/

If you want to be able to benchmark things on your system, I use the
"Open Source Database Benchmark" (Linux only at present), running the
latest CVS version of it, and also tweaked to not use hash indices.  A
tarball of working source code is available at :

http://techdocs.postgresql.org/techdocs/perftuningfigures.php

Hope this is of assistance Andreas.

Regards and best wishes,

Justin Clift


"Tille, Andreas" wrote:
>
> On Thu, 20 Sep 2001, Justin Clift wrote:
>
> > Sorry, I haven't seen the history of this thread.  One question which
> > might be relevant is, have you adjusted the postgresql.conf file from
> > the default memory settings to be something better?
> I adjusted two parameters:
>
> shared_buffers = 2048
>  (When I tried 4096 I´ve got a connection error. Don´t know what this
>   means, but anyway increasing of this value did not changed anything.)
>
> sort_mem = 2048
>  (After increasing this value (from default 512) to 1024 I got an
>   increase in speed from 20s to 18s - not much but better than nothing.
>   Further increase to 2048 did not change anything further so I stopped
>   here.)
>
> > If these are the times you're getting from a default configuration, you
> > might be able to get far better results by doing performance tuning of
> > PostgreSQL and/or the server.
> Any other values which might help here?
>
> Kind regards
>
>        Andreas.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
     - Indira Gandhi

Hardware tuning (Was: Performance question)

From
"Tille, Andreas"
Date:
On Fri, 21 Sep 2001, Justin Clift wrote:

> Hi Andreas,
>
> I'm running PostgreSQL 7.1.3 here on a PC with nearly a gig of ram, and
> running Linux Mandrake 8.0
>
> First thing I did was to increase the amount of shared memory and stuff
> which Linux allows things to use :
>
> echo "kernel.shmall = 134217728" >> /etc/sysctl.conf
> echo "kernel.shmmax = 134217728" >> /etc/sysctl.conf
>
> For my system, that'll raise the shared memory limits to 128MB at system
> boot time.
>
> btw, the "134217728" figure = 128MB  (128 * 1024 * 1024)
>
> Then I changed the limits for the running system (so no reboot is
> necessary) :
>
> echo 134217728 > /proc/sys/kernel/shmall
> echo 134217728 > /proc/sys/kernel/shmmax
>
> Then adjusted the postgresql.conf file with these values :
>
> sort_mem = 32768
> shared_buffers = 220
>
> Now, that's a bunch of shared_buffers, but at the same time I also
> raised the max_connections to 110.
220 is much less than I have set before I posted my stats yesterday.
I have set it to 2048.  But adjusting
   kernel.shmall = 134217728
   kernel.shmmax = 134217728
gave me a speed up by nearly factor 2!  That could be a nice start
for further increasing of memory. (Well, that machine has 2GB ;-) ... )

> This seems to have dropped my execution times, but I haven't seriously
> gotten around to tuning this system.
So yes, it has dropped my execution times from 20 times slower than
MS-SQL to 10 times slower, i.e. I have to continue tuning my setup.

> The key thing I think you've missed is to update the shared memory,
> etc.  More info about it can be found at :
>
> http://www.postgresql.org/idocs/index.php?kernel-resources.html
>
> Bruce Momjian also put together some information about optimising things
> with PostgreSQL at :
>
> http://www.ca.postgresql.org/docs/hw_performance/
I´ve read both documents now and see no other parameter to adjust than
shared_buffers.  I have to admit that I´m not really sure if this
parameter is responsible for the term "cache size" on page
   http://www.ca.postgresql.org/docs/hw_performance/node8.html
(I would consider it to be helpfull if the relevant parameter would
be mentioned in the text, Bruce.)

I just post the parameters I changed on my system and the results I´ve got:

/etc/sysctl.conf
  kernel.shmall = 134217728
  kernel.shmmax = 134217728
  fs.file-max = 16384

/etc/postgresql/postgresql.conf:
  max_connections = 256
  shared_buffers  = 2048
  sort_mem        = 32768

This setup gave me a speed increase from 56s to 33s for a certain query
(the one which took the M$-SQL server 2.5s).

Now I tried to increase
  sort_mem        = 32768
  shared_buffers  = 4096

and got no real speed difference but I noticed an improved memory usage by
top.  So I continued increasing shared_buffers by doubling the size step
by step.  To enable a certain amount of shared_buffers I also had to adjust
kernel.shmall and kernel.shmmax (I got errors otherwise).

Here I post some parameter settings and corresponding memory usage
measured by top and times for the query:

kernel.shmall = 536870912
kernel.shmmax = 536870912

shared_buffers = 32768

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  769 postgres  16   0 78372  76M 52916 R    99.9  7.6   0:23 postmaster

real    0m33.591s
user    0m0.190s
sys     0m0.040s


kernel.shmall = 1073741824
kernel.shmmax = 1073741824

shared_buffers = 65536

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  865 postgres  17   0 80332  78M 54836 R    99.9  7.7   0:20 postmaster

real    0m32.861s
user    0m0.200s
sys     0m0.010s


kernel.shmall = 2147483648
kernel.shmmax = 2147483648

shared_buffers = 131072

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1172 postgres  18   0 86572  84M 60748 R    99.9  8.3   0:22 postmaster

7.1.3: (I also tried PostgreSQL 7.1.3 all other values for 7.1.2)
 2644 postgres  17   0 87088  84M 61264 R    99.9  8.4   0:29 postmaster

set enable_seqscan = off;  (quite the same speed but other mem-usage)
 1205 postgres  18   0 85500  83M 59676 R    99.9  8.2   0:22 postmaster

7.1.3:
 2631 postgres  15   0 81972  79M 56148 R    99.9  7.9   0:28 postmaster

real    0m32.835s
user    0m0.210s
sys     0m0.050s

I noticed no real difference in speed in all this tests but I observed
an increased need of memory usage.  There was no difference in query
speed if I enabled or disabled index scan and between PostgreSQL version
7.1.2 and 7.1.3.

Furthermore I wonder about the following fact:  I see no real difference
in speed if I start the query immediately after restarting postmaster
and redoing the same query.  In my opinion the first query should fill
the relevant tables into memory cache which should take some time but the
second query should be faster because the cache is just filled.

So I wonder if it makes sense if I continue increasing those values
until I observe this difference or if I don´t see any increase in
memory usage by top.  I think I could spend some more memory on this
task currently because it is less than 10% memory usage and there
is no swap at all on the machine.

# vmstat
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0      0 176216  21536 1576744   0   0     1     3    2     7   0   0  24

> If you want to be abye to benchmark things on your system, I use the
> "Open Source Database Benchmark" (Linux only at present), running the
> latest CVS version of it, and also tweaked to not use hash indices.  A
> tarball of working source code is available at :
>
> http://techdocs.postgresql.org/techdocs/perftuningfigures.php
Thanks.  I don´t want to do *any* benchmark.  Only my application is
relevant even if PostgreSQL outperforms other databases in any
benchmark.

> Hope this is of assistance Andreas.
It was of assistance for sure and I hope further tuning brings me near
the target. Thanks Justin.

Kind regards

        Andreas.

Re: Performance question (stripped down the problem)

From
Mark kirkwood
Date:
Hi Andreas,

Interesting problem  - Like Justin I am running Postgres 7.1.3 + Mandrake 8.

Your query :

SELECT
  hauptdaten_fall.meldekategorie,
  count(hauptdaten_fall.id) AS Anz
FROM hauptdaten_fall
WHERE (((hauptdaten_fall.istaktuell)=20))
GROUP BY hauptdaten_fall.meldekategorie
ORDER BY hauptdaten_fall.meldekategorie

takes 19 s on my old hardware ( P2 333Mhz + 384Mb + 2x20Gb ATA)

I can get some improvement by making sort_mem=20480 :

The query then takes 8s - still a bit slow  - Mysql does it in 2 s  :-(
Increasing sort_mem>20480 did not give any further improvement.

Creating indexes, clustering,fooling about with enable_seqscan +
cpu_tuple_cost did not help at all.

As noted by others it appears that data access is not the issue - as the
query :

SELECT
  count(hauptdaten_fall.id) AS Anz
FROM hauptdaten_fall
WHERE (((hauptdaten_fall.istaktuell)=20))

takes only 2 s - so I am guessing that the 8s result is about as good as can
be gotten without delving into the PG code for GROUP BY access.

A slightly complex workaround for better performance is to use a summary
table  :

CREATE TABLE hauptdaten_fall_sum(meldekategorie varchar(10),
                                                     istaktuell integer,
                                                     cnt integer)

and maintain it via triggers on hauptdaten_fall

The offending query then becomes :

SELECT meldekategorie,cnt
FROM hauptdaten_fall_sum
WHERE istaktuell=20;

which is unmeasurably fast ( i.e 0 s ) on my system.


Hope it helps or gives food for thought

regards

Mark

Re: Hardware tuning (Was: Performance question)

From
Justin Clift
Date:
Hi Andreas,

Good to hear this has been of benefit.  From reading your email, you
haven't altered the value of sort_mem, just shared_buffers.

It might be worthwhile checking things out with sort_mem at different
levels too before homing in on the "best" value(s) for your application
& setup.

:-)

Regards and best wisehs,

Justin Clift


"Tille, Andreas" wrote:
>
> On Fri, 21 Sep 2001, Justin Clift wrote:
>
> > Hi Andreas,
> >
> > I'm running PostgreSQL 7.1.3 here on a PC with nearly a gig of ram, and
> > running Linux Mandrake 8.0
> >
> > First thing I did was to increase the amount of shared memory and stuff
> > which Linux allows things to use :
> >
> > echo "kernel.shmall = 134217728" >> /etc/sysctl.conf
> > echo "kernel.shmmax = 134217728" >> /etc/sysctl.conf
> >
> > For my system, that'll raise the shared memory limits to 128MB at system
> > boot time.
> >
> > btw, the "134217728" figure = 128MB  (128 * 1024 * 1024)
> >
> > Then I changed the limits for the running system (so no reboot is
> > necessary) :
> >
> > echo 134217728 > /proc/sys/kernel/shmall
> > echo 134217728 > /proc/sys/kernel/shmmax
> >
> > Then adjusted the postgresql.conf file with these values :
> >
> > sort_mem = 32768
> > shared_buffers = 220
> >
> > Now, that's a bunch of shared_buffers, but at the same time I also
> > raised the max_connections to 110.
> 220 is much less than I have set before I posted my stats yesterday.
> I have set it to 2048.  But adjusting
>    kernel.shmall = 134217728
>    kernel.shmmax = 134217728
> gave me a speed up by nearly factor 2!  That could be a nice start
> for further increasing of memory. (Well, that machine has 2GB ;-) ... )
>
> > This seems to have dropped my execution times, but I haven't seriously
> > gotten around to tuning this system.
> So yes, it has dropped my execution times from 20 times slower than
> MS-SQL to 10 times slower, i.e. I have to continue tuning my setup.
>
> > The key thing I think you've missed is to update the shared memory,
> > etc.  More info about it can be found at :
> >
> > http://www.postgresql.org/idocs/index.php?kernel-resources.html
> >
> > Bruce Momjian also put together some information about optimising things
> > with PostgreSQL at :
> >
> > http://www.ca.postgresql.org/docs/hw_performance/
> I´ve read both documents now and see no other parameter to adjust than
> shared_buffers.  I have to admit that I´m not really sure if this
> parameter is responsible for the term "cache size" on page
>    http://www.ca.postgresql.org/docs/hw_performance/node8.html
> (I would consider it to be helpfull if the relevant parameter would
> be mentioned in the text, Bruce.)
>
> I just post the parameters I changed on my system and the results I´ve got:
>
> /etc/sysctl.conf
>   kernel.shmall = 134217728
>   kernel.shmmax = 134217728
>   fs.file-max = 16384
>
> /etc/postgresql/postgresql.conf:
>   max_connections = 256
>   shared_buffers  = 2048
>   sort_mem        = 32768
>
> This setup gave me a speed increase from 56s to 33s for a certain query
> (the one which took the M$-SQL server 2.5s).
>
> Now I tried to increase
>   sort_mem        = 32768
>   shared_buffers  = 4096
>
> and got no real speed difference but I noticed an improved memory usage by
> top.  So I continued increasing shared_buffers by doubling the size step
> by step.  To enable a certain amount of shared_buffers I also had to adjust
> kernel.shmall and kernel.shmmax (I got errors otherwise).
>
> Here I post some parameter settings and corresponding memory usage
> measured by top and times for the query:
>
> kernel.shmall = 536870912
> kernel.shmmax = 536870912
>
> shared_buffers = 32768
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
>   769 postgres  16   0 78372  76M 52916 R    99.9  7.6   0:23 postmaster
>
> real    0m33.591s
> user    0m0.190s
> sys     0m0.040s
>
> kernel.shmall = 1073741824
> kernel.shmmax = 1073741824
>
> shared_buffers = 65536
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
>   865 postgres  17   0 80332  78M 54836 R    99.9  7.7   0:20 postmaster
>
> real    0m32.861s
> user    0m0.200s
> sys     0m0.010s
>
> kernel.shmall = 2147483648
> kernel.shmmax = 2147483648
>
> shared_buffers = 131072
>
>   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
>  1172 postgres  18   0 86572  84M 60748 R    99.9  8.3   0:22 postmaster
>
> 7.1.3: (I also tried PostgreSQL 7.1.3 all other values for 7.1.2)
>  2644 postgres  17   0 87088  84M 61264 R    99.9  8.4   0:29 postmaster
>
> set enable_seqscan = off;  (quite the same speed but other mem-usage)
>  1205 postgres  18   0 85500  83M 59676 R    99.9  8.2   0:22 postmaster
>
> 7.1.3:
>  2631 postgres  15   0 81972  79M 56148 R    99.9  7.9   0:28 postmaster
>
> real    0m32.835s
> user    0m0.210s
> sys     0m0.050s
>
> I noticed no real difference in speed in all this tests but I observed
> an increased need of memory usage.  There was no difference in query
> speed if I enabled or disabled index scan and between PostgreSQL version
> 7.1.2 and 7.1.3.
>
> Furthermore I wonder about the following fact:  I see no real difference
> in speed if I start the query immediately after restarting postmaster
> and redoing the same query.  In my opinion the first query should fill
> the relevant tables into memory cache which should take some time but the
> second query should be faster because the cache is just filled.
>
> So I wonder if it makes sense if I continue increasing those values
> until I observe this difference or if I don´t see any increase in
> memory usage by top.  I think I could spend some more memory on this
> task currently because it is less than 10% memory usage and there
> is no swap at all on the machine.
>
> # vmstat
>    procs                      memory    swap          io     system         cpu
>  r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
>  1  0  0      0 176216  21536 1576744   0   0     1     3    2     7   0   0  24
>
> > If you want to be abye to benchmark things on your system, I use the
> > "Open Source Database Benchmark" (Linux only at present), running the
> > latest CVS version of it, and also tweaked to not use hash indices.  A
> > tarball of working source code is available at :
> >
> > http://techdocs.postgresql.org/techdocs/perftuningfigures.php
> Thanks.  I don´t want to do *any* benchmark.  Only my application is
> relevant even if PostgreSQL outperforms other databases in any
> benchmark.
>
> > Hope this is of assistance Andreas.
> It was of assistance for sure and I hope further tuning brings me near
> the target. Thanks Justin.
>
> Kind regards
>
>         Andreas.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
     - Indira Gandhi

Re: Hardware tuning (Was: Performance question)

From
"Tille, Andreas"
Date:
On Sat, 22 Sep 2001, Justin Clift wrote:

> Good to hear this has been of benefit.  From reading your email, you
> haven't altered the value of sort_mem, just shared_buffers.
May be I was a little bit unclear.  I stopped increasing sort_mem after
there was no siginificant increase in speed.  My latest value was
   sort_mem = 65536
which is higher than any suggestion I´ve got here.
I observed no increase in speed against
   sort_mem = 32768
and
   sort_mem = 131072

> It might be worthwhile checking things out with sort_mem at different
> levels too before homing in on the "best" value(s) for your application
> & setup.
I observed the only difference when I increased shared_buffers.  This
difference was not significant regarding speed but memory usage (measured
by top).
Today I observed that I seemed to reach the border for simply doubling
the shared_buffers value.  If I set shared_buffers = 262144 and double
the kernel.shmall and kernel.shmmax values in /etc/sysctl.conf postmaster
says:

psql: connectDBStart() -- connect() failed: No such file or directory
        Is the postmaster running locally
        and accepting connections on Unix socket '/var/run/postgresql/.s.PGSQL.5432'?

(this is the same error I´ve got when I increased shared_buffers without
adjusting sysctl parameters.)

So my question remains:  Why isn´t the databased cached if there is
enough space to do so?  I think it is not cached because there is
no speed difference in doing the first query after Server restart and
repeating the same query.  Moreover top shows an continual increase
of memory usage over a longer query.  If the database would be cached
the memory usage should be high also at the start of the query (IMHO).

Kind regards

         Andreas.

Re: Performance question (stripped down the problem)

From
"Tille, Andreas"
Date:
On Sat, 22 Sep 2001, Mark kirkwood wrote:

> Interesting problem
:)  Hope we can find also an interesting solution ;-).

> takes 19 s on my old hardware ( P2 333Mhz + 384Mb + 2x20Gb ATA)
>
> I can get some improvement by making sort_mem=20480 :
>
> The query then takes 8s - still a bit slow  - Mysql does it in 2 s  :-(
> Increasing sort_mem>20480 did not give any further improvement.
  sort_mem = 65536
This gave no further improvement over sort_mem = 32768 and my box
takes about 5s for this task, which is more than one order of magnitude
higher than M$-SQL server.

> As noted by others it appears that data access is not the issue - as the
> query :
>
> SELECT
>   count(hauptdaten_fall.id) AS Anz
> FROM hauptdaten_fall
> WHERE (((hauptdaten_fall.istaktuell)=20))
>
> takes only 2 s - so I am guessing that the 8s result is about as good as can
> be gotten without delving into the PG code for GROUP BY access.
I did the same thing some days ago.  Even if the GROUP BY is detected as
the reason for the biggest part of slowing down - the M$-SQL server solves
the whole task faster than PostgreSQL is doing this simple select.

> A slightly complex workaround for better performance is to use a summary
> table  :
>
> CREATE TABLE hauptdaten_fall_sum(meldekategorie varchar(10),
>                                                      istaktuell integer,
>                                                      cnt integer)
>
> and maintain it via triggers on hauptdaten_fall
>
> The offending query then becomes :
>
> SELECT meldekategorie,cnt
> FROM hauptdaten_fall_sum
> WHERE istaktuell=20;
>
> which is unmeasurably fast ( i.e 0 s ) on my system.
Perhaps I have to think about such workarounds.  The problem is that this
was just the simplest example of several much more complex ones.  This would
mean I have to rework each query and have to do several speed tests.  This
makes the port very hard and perhaps the intended replication setup
impossible.

Kind regards

       Andreas.

Re: Performance question (stripped down the problem)

From
gravity
Date:
On Thu, Sep 20, 2001 at 11:10:02AM +0200, Herbert Liechti wrote:
> I tried it. See my actions below. The main performance boost is
> reached by creating an index and disabling the sequential scan:
> Without any index;
> real    0m18.128s
> user    0m0.010s
> sys     0m0.010s
>
> Same statement with index
> real    0m18.259s
> user    0m0.020s
> sys     0m0.010s
> no difference
>
> now disable seqscan:
> time psql tt <<END
> set enable_seqscan = off;
> real    0m3.701s
> user    0m0.010s
> sys     0m0.000s


same here (dual PIII-866, Debian, 512 MB, raid1+0)

real    0m6.472s
user    0m0.000s
sys     0m0.010s

real    0m6.195s
user    0m0.010s
sys     0m0.000s

real    0m2.885s
user    0m0.010s
sys     0m0.000s


tinus

Re: Performance question (stripped down the problem)

From
"Steve Wolfe"
Date:
> same here (dual PIII-866, Debian, 512 MB, raid1+0)
>
> real    0m6.472s
> user    0m0.000s
> sys     0m0.010s
>
> real    0m6.195s
> user    0m0.010s
> sys     0m0.000s
>
> real    0m2.885s
> user    0m0.010s
> sys     0m0.000s

  This is interesting, just yesterday I was perusing some of Bruce
Momjian's works on PG tuning, and noticed that Postgres prefers sequential
scans over indexes when much of the table has to be read, all because of
the number of head movements on the disk.   It would seem that these days,
where RAM is cheap, that most people have a great enough disk cache that
head movements can become irrelevant.

  However, I can also see where some people may have incredibly large
tables that just won't fit into RAM.  An easy solution to both might be to
create a user-specifiable switch passed at startup that would simply tell
PG that sequentials aren't necessarily better than index scans.  Not
completely disabling them, but at least giving it a pointer that it
doesn't *have* to use sequentials.

steve



Re: Performance question (stripped down the problem)

From
Martijn van Oosterhout
Date:
On Thu, Sep 27, 2001 at 11:18:31AM -0600, Steve Wolfe wrote:
>   This is interesting, just yesterday I was perusing some of Bruce
> Momjian's works on PG tuning, and noticed that Postgres prefers sequential
> scans over indexes when much of the table has to be read, all because of
> the number of head movements on the disk.   It would seem that these days,
> where RAM is cheap, that most people have a great enough disk cache that
> head movements can become irrelevant.
>
>   However, I can also see where some people may have incredibly large
> tables that just won't fit into RAM.  An easy solution to both might be to
> create a user-specifiable switch passed at startup that would simply tell
> PG that sequentials aren't necessarily better than index scans.  Not
> completely disabling them, but at least giving it a pointer that it
> doesn't *have* to use sequentials.

There is a user specifieable value somewhere that controls how expensive an
index scan is and how expensive a seqential scan is. By tuning those you
could probably get the effect you want.

--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.

Re: Performance question (stripped down the problem)

From
"Steve Wolfe"
Date:
> There is a user specifieable value somewhere that controls how expensive
an
> index scan is and how expensive a seqential scan is. By tuning those you
> could probably get the effect you want.

  Hmmm.... nice.  Can anybody point me to that value?

steve



Re: Performance question (stripped down the problem)

From
Martijn van Oosterhout
Date:
On Fri, Sep 28, 2001 at 10:25:13AM -0600, Steve Wolfe wrote:
> > There is a user specifieable value somewhere that controls how expensive
> an
> > index scan is and how expensive a seqential scan is. By tuning those you
> > could probably get the effect you want.
>
>   Hmmm.... nice.  Can anybody point me to that value?

In the documentation, section 3.4.1 here.

PostgreSQL 7.1.3 Administrator's Guide > Server Runtime Environment >
   Planner and Optimizer Tuning

There's a whole bunch of tunable variables. It does say that there is not
much idea of what values are good so any experiments would be helpful.

HTH,
--
Martijn van Oosterhout <kleptog@svana.org>
http://svana.org/kleptog/
> Magnetism, electricity and motion are like a three-for-two special offer:
> if you have two of them, the third one comes free.

Re: Performance question (stripped down the problem)

From
"Tille, Andreas"
Date:
On Thu, 27 Sep 2001, Steve Wolfe wrote:

>   This is interesting, just yesterday I was perusing some of Bruce
> Momjian's works on PG tuning, and noticed that Postgres prefers sequential
> scans over indexes when much of the table has to be read, all because of
> the number of head movements on the disk.   It would seem that these days,
> where RAM is cheap, that most people have a great enough disk cache that
> head movements can become irrelevant.
>
>   However, I can also see where some people may have incredibly large
> tables that just won't fit into RAM.  An easy solution to both might be to
> create a user-specifiable switch passed at startup that would simply tell
> PG that sequentials aren't necessarily better than index scans.  Not
> completely disabling them, but at least giving it a pointer that it
> doesn't *have* to use sequentials.
The problem is that *both* methods are to slow for my application :-(.

Kind regards

         Andreas.