My server is oddly very slow - Mailing list pgsql-admin

From: adrien ducos
Subject: My server is oddly very slow
Date:
Msg-id: 4DC17AAB.1080206@hbs-research.com
List: pgsql-admin
I have two servers: one for development (called dev) and one for production
(called prod). The development server is very basic: a dual-core 2.1 GHz
machine with a basic 300 GB hard drive and 2 GB of memory. This server is not
dedicated to Postgres; it also runs PHP, MapServer and so on.

I assigned the memory as follows:

max_connections = 40
shared_buffers = 258MB
work_mem = 8MB
maintenance_work_mem = 32MB
wal_buffers = 8MB
checkpoint_segments = 16
autovacuum = on
effective_cache_size = 744MB

Dev is installed on CentOS 4 32-bit (on an OpenVZ virtual server).

My production server is a quad-core Xeon E5310 (1.6 GHz) with 3 GB of RAM, and
the storage is a very fast Fibre Channel SAN: copying files on the SAN is
about 3 times faster than on the development hard drive. This server is fully
dedicated to Postgres.

I assigned the memory as follows:

max_connections = 40
shared_buffers = 512MB
work_mem = 16MB
maintenance_work_mem = 128MB
wal_buffers = 8MB
checkpoint_segments = 16
autovacuum = on
effective_cache_size = 1536MB

Prod is installed on Red Hat Enterprise Linux ES release 4 (Nahant Update 6)
32-bit (not on a virtual server). Those two distributions are very similar,
and I use the same program RPMs on both systems.

Both databases run PostgreSQL 8.4.1, and I installed the same database on both
(in fact prod comes from a dump of dev once every week). The dump is very big,
about 16 GB compressed. Once it is restored, I run a full analyze on the
database before I use it.
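
For the record, the weekly refresh is essentially the following (the database
name and paths here are approximate, not the real ones):

pg_dump -Fc -f /backup/dev.dump hbs_db     # on dev: custom-format (compressed) dump
pg_restore -d hbs_db /backup/dev.dump      # on prod: restore into a fresh database
psql -d hbs_db -c 'VACUUM ANALYZE;'        # full analyze before the database is used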

Now, given the hardware, I would expect dev to be slower than prod, but almost
all queries are slower on prod, sometimes 10 times slower.

Today I ran the same query with EXPLAIN ANALYZE on both servers. On dev I got:


Limit  (cost=396689.94..399801.92 rows=2000 width=512) (actual time=8696.467..10028.179 rows=2000 loops=1)
  ->  GroupAggregate  (cost=396689.94..424470.56 rows=17854 width=512) (actual time=8696.463..10024.632 rows=2000 loops=1)
        Filter: (sum(effectif_max) >= 10)
        ->  Sort  (cost=396689.94..397088.34 rows=159360 width=512) (actual time=8693.396..8745.436 rows=8344 loops=1)
              Sort Key: implantation_company_id, virtual_building_id
              Sort Method:  external merge  Disk: 125928kB
              ->  Bitmap Heap Scan on gen_establishment_search  (cost=8156.66..346425.20 rows=159360 width=512) (actual time=2545.813..6688.078 rows=152200 loops=1)
                    Recheck Cond: (gis_departement_id = '75'::bpchar)
                    Filter: (((telephone IS NOT NULL) OR (telephone_siege IS NOT NULL)) AND (NOT etablissement_masque) AND (ref_zone_permission_id = ANY ('{2,1}'::integer[])) AND (created_by_user_group = 1) AND (ref_establishment_type_id = ANY ('{1,2,3}'::integer[])))
                    ->  Bitmap Index Scan on gen_establishment_search_gis_departement_id  (cost=0.00..8116.82 rows=498680 width=0) (actual time=2435.952..2435.952 rows=502304 loops=1)
                          Index Cond: (gis_departement_id = '75'::bpchar)
Total runtime: 10109.194 ms


-> so 10 seconds

and for prod:

Limit  (cost=397787.02..400793.19 rows=2000 width=513) (actual time=94188.493..97457.042 rows=2000 loops=1)
  ->  GroupAggregate  (cost=397787.02..425288.95 rows=18297 width=513) (actual time=94188.489..97452.268 rows=2000 loops=1)
        Filter: (sum(effectif_max) >= 10)
        ->  Sort  (cost=397787.02..398179.69 rows=157069 width=513) (actual time=94187.660..94530.617 rows=8312 loops=1)
              Sort Key: implantation_company_id, virtual_building_id
              Sort Method:  external merge  Disk: 124192kB
              ->  Bitmap Heap Scan on gen_establishment_search  (cost=8231.64..347992.15 rows=157069 width=513) (actual time=836.994..91818.746 rows=149851 loops=1)
                    Recheck Cond: (gis_departement_id = '75'::bpchar)
                    Filter: (((telephone IS NOT NULL) OR (telephone_siege IS NOT NULL)) AND (NOT etablissement_masque) AND (ref_zone_permission_id = ANY ('{2,1}'::integer[])) AND (created_by_user_group = 1) AND (ref_establishment_type_id = ANY ('{1,2,3}'::integer[])))
                    ->  Bitmap Index Scan on gen_establishment_search_gis_departement_id  (cost=0.00..8192.37 rows=502886 width=0) (actual time=699.530..699.530 rows=507213 loops=1)
                          Index Cond: (gis_departement_id = '75'::bpchar)
Total runtime: 97496.574 ms


-> about 97 seconds
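
Incidentally, both plans spill the sort to disk ("external merge", about
125 MB), which is expected with work_mem at 8-16 MB. I could raise it for the
session to check, but since both servers spill roughly the same amount, it
probably does not explain the difference:

SET work_mem = '256MB';   -- session-local; 256MB is just a trial value
-- then rerun the EXPLAIN ANALYZE above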

For some other queries it is even worse. None of the queries I have tested are
faster on prod.
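
For reference, the query behind these plans has roughly this shape
(reconstructed from the plan; the real select list is wider than shown here,
and the alias is illustrative):

SELECT implantation_company_id, virtual_building_id, sum(effectif_max) AS effectif
FROM gen_establishment_search
WHERE gis_departement_id = '75'
  AND (telephone IS NOT NULL OR telephone_siege IS NOT NULL)
  AND NOT etablissement_masque
  AND ref_zone_permission_id IN (2, 1)
  AND created_by_user_group = 1
  AND ref_establishment_type_id IN (1, 2, 3)
GROUP BY implantation_company_id, virtual_building_id
HAVING sum(effectif_max) >= 10
LIMIT 2000;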

So I checked the memory on prod during my query execution with vmstat:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1   1280  23320   6356 2975956    0    0    29    45    1    0  2  0 95  3

Swap is not used, but something looks a bit odd: the cache is using 98% of the
RAM.
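
If I read vmstat correctly, that "cache" is the kernel page cache, which grows
into otherwise-free RAM, so this part may be normal. free -m shows the same
numbers with the reclaimable part broken out:

free -m    # the "-/+ buffers/cache" line separates real usage from reclaimable cache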

If I do
SHOW shared_buffers;
I get

shared_buffers
----------------
512MB


as I would expect from my configuration.
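
The rest of the settings can be cross-checked in bulk the same way on both
servers, to rule out configuration drift (pg_settings is available in 8.4):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
               'wal_buffers', 'checkpoint_segments', 'effective_cache_size');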


I have run out of ideas. Does anyone have a suggestion?


--

   Adrien DUCOS
   Developer analyst
   aducos@hbs-research.com
   www.hbs-research.com
   +33 (0)9 70 44 64 10
   24 rue de l'Est
   75020 Paris



