Re: "Mysterious" issues with newly installed 8.3 - Mailing list pgsql-performance

From Scott Carey
Subject Re: "Mysterious" issues with newly installed 8.3
Date
Msg-id a1ec7d000810091734k60cee7d7qb95470936ad461c1@mail.gmail.com
Whole thread Raw
In response to "Mysterious" issues with newly installed 8.3  (Carlos Moreno <morenopg@mochima.com>)
Responses Re: "Mysterious" issues with newly installed 8.3  (Craig James <craig_james@emolecules.com>)
Re: "Mysterious" issues with newly installed 8.3  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-performance
On Thu, Oct 9, 2008 at 4:51 PM, Carlos Moreno <morenopg@mochima.com> wrote:

Ok, I know that such an open and vague question like this one
is...  well, open and vague...  But still.

The short story:

Just finished an 8.3.4 installation on a new machine, to replace
an existing one;  the new machine is superior (i.e., higher
performance) in virtually every way --- twice as much memory,
faster processor, faster drives, etc.

I made an exact copy of the existing database on the new
machine, and the exact same queries run on both reveal that
the old machine beats the new one by a factor of close to 2 !!!!
(i.e., the same queries run close to twice as fast on the old
machine!!!)

To make things worse:  the old machine is in operation, under
normal workload  (and right now the system may be around
peak time), and the new machine is there sitting doing nothing;
just one user logged in using psql to run the queries --- *no-one
and nothing* is connecting to the new server.

So... What's going on???


The details:

CPU:
New: Opteron DC 1218HE  (1MB cache per core) @2.6GHz
Old:  Athlon64 X2  (512K cache per core)  @2.2GHz

RAM:
New:  4GB
Old:   2GB

HD:
Doesn't matter the capacity, but I have every reason to believe
the new one is faster --- hdparm reports 105MB/sec transfer
rate;  the measurement for the old server is meaningless, since
it is in operation  (i.e., there is actual database activity), so it
measures between 50MB/sec and 70MB/sec.  Given its age, I
would estimate 70 to 80 MB/sec

OS:
New:  CentOS 5.2  (gcc 4.1.2)
Old:  FC6  (gcc 4.1.2)

PG:
New:  8.3.4 installed from source
Old:   8.2.4 installed from source

Presumably relevant configuration parameters --- shared_buffers
was set to 250MB on the old one;  I set it to 500MB on the new
one  (kinda makes sense, no?  1/8 of the physical memory in both
cases).

I set max_fsm_pages a little bit higher on the new one (409600
instead of 307200 on the old one).  The rest is pretty much
identical  (except for the autovacuum --- I left the defaults in the
new one)


The old machine is vacuum-analyzed once a day  (around 4AM);
on the new one, I ran a vacuumdb -z -f after populating it.


Some interesting outputs:

explain analyze select count(*) from users;
New:
 Aggregate  (cost=8507.11..8507.12 rows=1 width=0) (actual
time=867.582..867.584 rows=1 loops=1)
  ->  Seq Scan on users  (cost=0.00..7964.49 rows=217049 width=0)
(actual time=0.016..450.560 rows=217049 loops=1)
 Total runtime: 867.744 ms

Old:
 Aggregate  (cost=17171.22..17171.22 rows=1 width=0) (actual
time=559.475..559.476 rows=1 loops=1)
  ->  Seq Scan on users  (cost=0.00..16628.57 rows=217057 width=0)
(actual time=0.009..303.026 rows=217107 loops=1)
 Total runtime: 559.536 ms

Running the same command again several times practically
does not change anything.


explain analyze select count(*) from users where username like 'A%';
New:
 Aggregate  (cost=6361.28..6361.29 rows=1 width=0) (actual
time=87.528..87.530 rows=1 loops=1)
  ->  Bitmap Heap Scan on users  (cost=351.63..6325.33 rows=14376
width=0) (actual time=6.444..53.426 rows=17739 loops=1)
        Filter: ((username)::text ~~ 'a%'::text)
        ->  Bitmap Index Scan on c_username_unique  (cost=0.00..348.04
rows=14376 width=0) (actual time=5.383..5.383 rows=17739 loops=1)
              Index Cond: (((username)::text >= 'a'::text) AND
((username)::text < 'b'::text))
 Total runtime: 87.638 ms

Old:
 Aggregate  (cost=13188.91..13188.92 rows=1 width=0) (actual
time=61.743..61.745 rows=1 loops=1)
  ->  Bitmap Heap Scan on users  (cost=392.07..13157.75 rows=12466
width=0) (actual time=7.433..40.847 rows=17747 loops=1)
        Filter: ((username)::text ~~ 'a%'::text)
        ->  Bitmap Index Scan on c_username_unique  (cost=0.00..388.96
rows=12466 width=0) (actual time=5.652..5.652 rows=17861 loops=1)
              Index Cond: (((username)::text >= 'a'::character varying)
AND ((username)::text < 'b'::character varying))
 Total runtime: 61.824 ms


Any ideas?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

First, use iostat or another tool  to view the disk usage on the new machine during these queries and validate that it is not using the disk at all.  This is most likely the case.

Then, to be sure, set its config parameters to be equal to the old one, and turn off auto-vacuum.  This will also most likely have no effect.

Once this is confirmed, we can be pretty sure that the issue is restricted to:
CPU / RAM / Motherboard on the hardware side.  There may still be some software effects in the OS or drivers, or PostgreSQL to account for, but lets drill into the hardware and try and eliminate that first.

Sure, the processor should be faster, but Athlon64s / Opterons are very sensitive to the RAM used and its performance and tuning.
So, you should find some basic CPU benchmarks and RAM benchmarks -- you'll want to measure latency as well as bandwidth.
Athlon64 and Opteron both typically have two memory busses per processor, and it is possible to populate the memory banks in such a way that the system has half the bandwidth.
In any event, you'll first want to identify if simple benchmark software is able to prove a disparity between the systems independant of postgres.  This may be a bit difficult to do on the live system however. 

But it is my suspicion that Postgres performance is often more dependant on the memory subsystem performance than the CPU Mhz (as are most databases) and poor components, configuration, or tuning on that side would show up in queries like the examples here.

pgsql-performance by date:

Previous
From: Carlos Moreno
Date:
Subject: "Mysterious" issues with newly installed 8.3
Next
From: Craig James
Date:
Subject: Re: "Mysterious" issues with newly installed 8.3