Thread: Deathly slow performance on SMP red-hat system

Deathly slow performance on SMP red-hat system

From
ptjm@interlog.com (Patrick TJ McPhee)
Date:
I have a Red Hat ELsmp system running on a dual opteron hp server with
16 gigs of memory. I believe the RH installation is straight out of the
box. I've compiled postgres 8.1.4 from sources.

The problem is that query performance is horrible, and I can't think
why, although it seems clear the problem is outside the query engine.
For instance, "select 1" takes on the order of 100ms. Explain analyze
doesn't shed much light on why this might be:

 explain analyze select 1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.006 rows=1 loops=1)
 Total runtime: 0.059 ms
(2 rows)

But if I turn on duration logging, I get timings like
 LOG:  duration: 91.480 ms

I was able to pg_reload 80 gigs of data in a reasonable amount of time, but
individual queries are taking a long time.

I'm wondering if anyone's seen anything like this.
--

Patrick TJ McPhee
North York  Canada
ptjm@interlog.com

Re: Deathly slow performance on SMP red-hat system

From
"Joshua D. Drake"
Date:
Patrick TJ McPhee wrote:
> I have a Red Hat ELsmp system running on a dual opteron hp server with
> 16 gigs of memory. I believe the RH installation is straight out of the
> box. I've compiled postgres 8.1.4 from sources.
>
> The problem is that query performance is horrible, and I can't think
> why, although it seems clear the problem is outside the query engine.
> For instance, "select 1" takes on the order of 100ms. Explain analyze
> doesn't shed much light on why this might be:
>
>  explain analyze select 1;
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------
>  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.006 rows=1 loops=1)
>  Total runtime: 0.059 ms
> (2 rows)
>
> But if I turn on duration logging, I get timings like
>  LOG:  duration: 91.480 ms
>
> I was able to pg_reload 80 gigs of data in a reasonable amount of time, but
> individual queries are taking a long time.
>
> I'm wondering if anyone's seen anything like this.

Vacuum? Analyze?

Joshua D. Drake

--

    === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
    Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/



Re: Deathly slow performance on SMP red-hat system

From
"Steve Poe"
Date:
Patrick,

Have you modified your postgresql.conf file or is this default out-of-the-box too? Are the database and database logs (pg_xlog) on shared disc space?

Steve

On 8/28/06, Patrick TJ McPhee <ptjm@interlog.com> wrote:
I have a Red Hat ELsmp system running on a dual opteron hp server with
16 gigs of memory. I believe the RH installation is straight out of the
box. I've compiled postgres 8.1.4 from sources.

The problem is that query performance is horrible, and I can't think
why, although it seems clear the problem is outside the query engine.
For instance, "select 1" takes on the order of 100ms. Explain analyze
doesn't shed much light on why this might be:

explain analyze select 1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.006 rows=1 loops=1)
Total runtime: 0.059 ms
(2 rows)

But if I turn on duration logging, I get timings like
LOG:  duration: 91.480 ms

I was able to pg_reload 80 gigs of data in a reasonable amount of time, but
individual queries are taking a long time.

I'm wondering if anyone's seen anything like this.
--

Patrick TJ McPhee
North York  Canada
ptjm@interlog.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: Deathly slow performance on SMP red-hat system

From
"Florian G. Pflug"
Date:
Patrick TJ McPhee wrote:
> I have a Red Hat ELsmp system running on a dual opteron hp server with
> 16 gigs of memory. I believe the RH installation is straight out of the
> box. I've compiled postgres 8.1.4 from sources.
>
> The problem is that query performance is horrible, and I can't think
> why, although it seems clear the problem is outside the query engine.
> For instance, "select 1" takes on the order of 100ms. Explain analyze
> doesn't shed much light on why this might be:
I've had this problem on a old version of redhat. It turned out that
the driver included with redhat for the raid-controller was very slow -
i think that it was fsync-performance which was really bad with that driver.
The controller was as adaptec aacraid - in the end, we used debian, and
a recent 2.6 kernel, and things were much faster.

greetings, Florian Pflug

Re: Deathly slow performance on SMP red-hat system

From
ptjm@interlog.com (Patrick TJ McPhee)
Date:
In article <44F44F44.1010104@commandprompt.com>,
Joshua D. Drake <jd@commandprompt.com> wrote:
% Patrick TJ McPhee wrote:
[...]
[the query is "select 1"]
% > But if I turn on duration logging, I get timings like
% >  LOG:  duration: 91.480 ms

[...]

% Vacuum? Analyze?

I had autovacuum on initially, but turned it off. The slowness was in
evidence from the point the data was loaded, when presumably vacuum would
be superfluous. The data is analyzed. Right now, I'm not so
worried about my real data as "select 1" being two orders of magnitude
slower than I'd expect it to be.

Steve Poe asked if I've modified postgresql.conf, and if the database and
logs are on separate volumes.

The .conf file has the memory parameters (shared buffers, work mem,
effective cache size, etc) bumped up quite a bit. We have the block size
set to 16k and the statistics target has been increased from the
default. I had some of the planner costs adjusted as well, but they
don't seem to be material to the problem. It's basically a copy of the
.conf file that's working well in production on similar hardware under
NetBSD.

The logs and data are all one file system, which seems to be on a logical
volume with a single disk sitting under it.

Florian Pflug reports that he had a similar problem due to a slow RAID
controller driver, to which I have no comment.

Thanks for your comments.

--

Patrick TJ McPhee
North York  Canada
ptjm@interlog.com

Re: Deathly slow performance on SMP red-hat system

From
"Florian G. Pflug"
Date:
Patrick TJ McPhee wrote:
> % Patrick TJ McPhee wrote:
> [...]
> [the query is "select 1"]
> % > But if I turn on duration logging, I get timings like
> % >  LOG:  duration: 91.480 ms
>
> The logs and data are all one file system, which seems to be on a logical
> volume with a single disk sitting under it.
>
> Florian Pflug reports that he had a similar problem due to a slow RAID
> controller driver, to which I have no comment.
You could try doing:

begin;
select 1;
select 1;
...
rollback;

If this is faster, than it's committing a transaction which is slow -
remember that a statement not wrapped in begin/commit will cause
postgres to start a transaction, execute the statement, and commit
afterwards.

If this is slow too, then I'd suggest playing with postgresql.conf
parameters - e.g. try turning any logging, and the statistics collector
off. If that doesn't reveal a possible cause, then I'd suggest that
you strace the backend you're connected to, and try to see where it's
spending it's time. Since 100ms for a simple "select 1;" is way out
of bounds, I bet that it's some syscall that's taking up all the time -
probably either network or disk related.

greetings, Florian Pflug



>
> Thanks for your comments.
>


Re: Deathly slow performance on SMP red-hat system

From
ptjm@interlog.com (Patrick TJ McPhee)
Date:
In article <44F87893.9050400@phlo.org>, Florian G. Pflug <fgp@phlo.org> wrote:

% You could try doing:
%
% begin;
% select 1;
% select 1;
% ...
% rollback;

Yes, I tried that, but each select is on the order of 100ms.

% off. If that doesn't reveal a possible cause, then I'd suggest that
% you strace the backend you're connected to, and try to see where it's
% spending it's time. Since 100ms for a simple "select 1;" is way out

I tried that, but I don't have the results handy, and the machine's not
responding now. I remember the first time I looked at the result, I
thought it was spending an inordinate amount of time in gettimeofday,
but I also had an strace which showed time in send, which is where
I think the problem is.

I'm pretty strongly convinced that the time is being spent sending data
to the client. Actually I measured the full round trip and found it was
taking on the order of 200ms, which is consistent with no time at all to
perform the select, but 100ms each to send the query to the db and the
result to the client.

I think I need to checkout the socket layer on this machine.
--

Patrick TJ McPhee
North York  Canada
ptjm@interlog.com