Thread: Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

From
Robert Creager
Date:
Sigh...

I recently upgraded from 7.4.1 to 8.0.3.  The application did not change.  I'm
now running both database concurrently (on different ports, same machine) just
so I could verify the problem really exists.

The application is a custom test application for testing mechanical systems.
The runs in question (4 at a time) each generate 16 queries at a time of which
the results are sent to the mechanical system which processes the request, which
processes them anywhere from 10 to 120 seconds.  The system is capable of
completing between 4 and 8 jobs at once.  So, once the system is running, at
most there will be 8 queries per run simultaneously.

The entire database fits into RAM (2Gb), as evidenced by no disk activity and
relatively small database size.  pg_xlog is on different disks from the db.

The problem is that on version 8.0.3, once I get 3 or more concurrent runs
going, the query times start tanking (>20 seconds).  On 7.4.1, the applications
hum along with queries typically below .2 seconds on over 5 concurrent runs.
Needless to say, 7.4.1 behaves as expected...  The only change between runs is
the port connecting to.  Bot DB's are up at the same time.

For 8.03, pg_autovacuum is running.  On 7.4.1, I set up a cron job to vacuum
analyze every 5 minutes.

The system is Mandrake Linux running 2.4.22 kernel with dual Intel Xenon CPU
with HT enabled.  On an 803 run, the context switching is up around 60k.  On
7.4.1, it maxes around 23k and averages < 1k.

I've attached four files.  741 has the query and explain analyze.  803 has the
query and explain analyze during loaded and unloaded times.  I've also attached
the conf files for the two versions running.  I've gone through them and don't
see any explanation for the problem I'm having.

I'm guessing this is the CS problem that reared it's head last year?  I had an
e-mail exchange in April of last year about this.  Any reason this would be
worse with 8.0.3?

Thanks,
Rob

--
 13:33:43 up 3 days, 17:08,  9 users,  load average: 0.16, 0.59, 0.40
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 -

From
"Matthew T. O'Connor"
Date:
Robert Creager wrote:

>For 8.03, pg_autovacuum is running.  On 7.4.1, I set up a cron job to vacuum
>analyze every 5 minutes.
>
>

Are you sure that pg_autovacuum is doing it's job?  Meaning are you sure
it's vacuuming as often as needed?  Try to run it with -d2 or so and
make sure that it is actually doing the vacuuming needed.


Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
I am, and it is.  It's ANALYZING and VACUUM'ing tables every interval (5 minutes
- 8.0.3).  Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1
database and it's still clicking along at < .2 second queries.  Last year
(7.4.1), I noticed that it took about a week of heavy activity (for this DB)
before I'd really need a vacuum.  That's when I put in the 5 min cron.

When I first switched over to 8.0.3, I was still running the cron vacuum.  I got
into big trouble when I had vacuum's backed up for 6 hours.  That's when I
started noticing the query problem, and the CS numbers being high.  7.4.1
vacuums every 5 minutes always take < 30 seconds (when I'm watching).

Cheers,
Rob

When grilled further on (Sun, 17 Jul 2005 23:48:20 -0400),
"Matthew T. O'Connor" <matthew@zeut.net> confessed:

> Robert Creager wrote:
>
> >For 8.03, pg_autovacuum is running.  On 7.4.1, I set up a cron job to vacuum
> >analyze every 5 minutes.
> >
> >
>
> Are you sure that pg_autovacuum is doing it's job?  Meaning are you sure
> it's vacuuming as often as needed?  Try to run it with -d2 or so and
> make sure that it is actually doing the vacuuming needed.


--
 22:04:10 up 4 days,  1:39,  8 users,  load average: 0.15, 0.15, 0.12
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

From
Tom Lane
Date:
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> I'm guessing this is the CS problem that reared it's head last year?

The context swap problem was no worse in 8.0 than in prior versions,
so that hardly seems like a good explanation.  Have you tried reverting
to the cron-based vacuuming method you used in 7.4?

            regards, tom lane

Re: Huge performance problem between 7.4.1 and 8.0.3 -

From
David Mitchell
Date:
Sounds like either someone is holding a lock on your pg8 db, or maybe
you need a vacuum full. No amount of normal vacuuming will fix a table
that needs a vacuum full. Although if that were the case I'd expect you
to have slow queries regardless of the number of concurrent connections.
Maybe you should check who is holding locks.

David

Robert Creager wrote:
> I am, and it is.  It's ANALYZING and VACUUM'ing tables every interval (5 minutes
> - 8.0.3).  Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1
> database and it's still clicking along at < .2 second queries.  Last year
> (7.4.1), I noticed that it took about a week of heavy activity (for this DB)
> before I'd really need a vacuum.  That's when I put in the 5 min cron.
>
> When I first switched over to 8.0.3, I was still running the cron vacuum.  I got
> into big trouble when I had vacuum's backed up for 6 hours.  That's when I
> started noticing the query problem, and the CS numbers being high.  7.4.1
> vacuums every 5 minutes always take < 30 seconds (when I'm watching).
>
> Cheers,
> Rob
>
> When grilled further on (Sun, 17 Jul 2005 23:48:20 -0400),
> "Matthew T. O'Connor" <matthew@zeut.net> confessed:
>
>
>>Robert Creager wrote:
>>
>>
>>>For 8.03, pg_autovacuum is running.  On 7.4.1, I set up a cron job to vacuum
>>>analyze every 5 minutes.
>>>
>>>
>>
>>Are you sure that pg_autovacuum is doing it's job?  Meaning are you sure
>>it's vacuuming as often as needed?  Try to run it with -d2 or so and
>>make sure that it is actually doing the vacuuming needed.
>
>
>


--
David Mitchell
Software Engineer
Telogis


Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Tom Lane
Date:
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> I am, and it is.  It's ANALYZING and VACUUM'ing tables every interval (5 mi=
> nutes
> - 8.0.3).  Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1
> database and it's still clicking along at < .2 second queries.

Have you compared physical table sizes?  If the autovac daemon did let
things get out of hand, you'd need a VACUUM FULL or CLUSTER or TRUNCATE
to get the table size back down --- plain VACUUM is unlikely to fix it.

            regards, tom lane

Re: Huge performance problem between 7.4.1 and 8.0.3 -

From
"Matthew T. O'Connor"
Date:
Ok, it doesn't look like an autovacuum problem.  The only other thing I
can think of is that some query is doing a seq scan rather than an index
scan.  Have you turned on the query logging to see what queries are
taking so long?

Matt


Robert Creager wrote:

>I am, and it is.  It's ANALYZING and VACUUM'ing tables every interval (5 minutes
>- 8.0.3).  Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1
>database and it's still clicking along at < .2 second queries.  Last year
>(7.4.1), I noticed that it took about a week of heavy activity (for this DB)
>before I'd really need a vacuum.  That's when I put in the 5 min cron.
>
>When I first switched over to 8.0.3, I was still running the cron vacuum.  I got
>into big trouble when I had vacuum's backed up for 6 hours.  That's when I
>started noticing the query problem, and the CS numbers being high.  7.4.1
>vacuums every 5 minutes always take < 30 seconds (when I'm watching).
>
>Cheers,
>Rob
>
>When grilled further on (Sun, 17 Jul 2005 23:48:20 -0400),
>"Matthew T. O'Connor" <matthew@zeut.net> confessed:
>
>
>
>>Robert Creager wrote:
>>
>>
>>
>>>For 8.03, pg_autovacuum is running.  On 7.4.1, I set up a cron job to vacuum
>>>analyze every 5 minutes.
>>>
>>>
>>>
>>>
>>Are you sure that pg_autovacuum is doing it's job?  Meaning are you sure
>>it's vacuuming as often as needed?  Try to run it with -d2 or so and
>>make sure that it is actually doing the vacuuming needed.
>>
>>
>
>
>
>


Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
When grilled further on (Mon, 18 Jul 2005 00:18:43 -0400),
"Matthew T. O'Connor" <matthew@zeut.net> confessed:

> Have you turned on the query logging to see what queries are
> taking so long?
>

Yeah.  In the original message is a typical query.  One from 741 and the other
on 803.  On 803, an explain analyze is done twice.  Once during the problem,
once when the system is idle.  On 741, the query behaves the same no matter
what...

Cheers,
Rob

--
 22:31:18 up 4 days,  2:06,  8 users,  load average: 0.25, 0.18, 0.11
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
When grilled further on (Mon, 18 Jul 2005 16:17:54 +1200),
David Mitchell <david.mitchell@telogis.com> confessed:

> Maybe you should check who is holding locks.

Hmmm...  The only difference is how the vacuum is run.  One by autovacuum, one
by cron (vacuum analyze every 5 minutes).

Cheers,
Rob

--
 23:01:44 up 4 days,  2:36,  6 users,  load average: 0.27, 0.16, 0.10
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 -

From
"Jeffrey W. Baker"
Date:
On Sun, 2005-07-17 at 21:34 -0600, Robert Creager wrote:
> Sigh...
>
> I recently upgraded from 7.4.1 to 8.0.3.  The application did not change.  I'm
> now running both database concurrently (on different ports, same machine) just
> so I could verify the problem really exists.
>
> The application is a custom test application for testing mechanical systems.
> The runs in question (4 at a time) each generate 16 queries at a time of which
> the results are sent to the mechanical system which processes the request, which
> processes them anywhere from 10 to 120 seconds.  The system is capable of
> completing between 4 and 8 jobs at once.  So, once the system is running, at
> most there will be 8 queries per run simultaneously.
>
> The entire database fits into RAM (2Gb), as evidenced by no disk activity and
> relatively small database size.  pg_xlog is on different disks from the db.
>
> The problem is that on version 8.0.3, once I get 3 or more concurrent runs
> going, the query times start tanking (>20 seconds).  On 7.4.1, the applications
> hum along with queries typically below .2 seconds on over 5 concurrent runs.
> Needless to say, 7.4.1 behaves as expected...  The only change between runs is
> the port connecting to.  Bot DB's are up at the same time.
>
> For 8.03, pg_autovacuum is running.  On 7.4.1, I set up a cron job to vacuum
> analyze every 5 minutes.
>
> The system is Mandrake Linux running 2.4.22 kernel with dual Intel Xenon CPU
> with HT enabled.  On an 803 run, the context switching is up around 60k.  On
> 7.4.1, it maxes around 23k and averages < 1k.

Did you build 8.0.3 yourself, or install it from packages?  I've seen in
the past where pg would build with the wrong kind of mutexes on some
machines, and that would send the CS through the roof.  If you did build
it yourself, check your ./configure logs.  If not, try strace.

-jwb

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
When grilled further on (Mon, 18 Jul 2005 00:10:53 -0400),
Tom Lane <tgl@sss.pgh.pa.us> confessed:

> Have you tried reverting
> to the cron-based vacuuming method you used in 7.4?
>

I just stopped autovacuum, ran a manual vacuum analyze on 803 (2064 pages
needed, 8000000 FSM setting) and re-started the run (with cron vac enabled).
The query problem has not showed up yet (1/2 hour).  A vacuum on 741 showed 3434
pages needed, 200000 FSM setting.

I'll let it run the night and see if it shows up after a couple of hours.  It
has run clean for 1 hour prior.  If this runs 'till morning, I'll re-enable the
autovacuum, disable the cron and see if it reproduces itself (the slowdown).

Cheers,
Rob

--
 22:18:40 up 4 days,  1:53,  8 users,  load average: 0.10, 0.20, 0.14
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
When grilled further on (Sun, 17 Jul 2005 22:09:11 -0700),
"Jeffrey W. Baker" <jwbaker@acm.org> confessed:

>
> Did you build 8.0.3 yourself, or install it from packages?  I've seen in
> the past where pg would build with the wrong kind of mutexes on some
> machines, and that would send the CS through the roof.  If you did build
> it yourself, check your ./configure logs.  If not, try strace.

I always build PG from source.  I did check the config.log command line
(./configure) and they were similar enough.  The system has not changed between
building the two versions (if it ain't broke...).

Cheers,
Rob

--
 23:25:21 up 4 days,  3:00,  6 users,  load average: 0.25, 0.15, 0.11
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
When grilled further on (Mon, 18 Jul 2005 00:18:30 -0400),
Tom Lane <tgl@sss.pgh.pa.us> confessed:

> Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> > I am, and it is.  It's ANALYZING and VACUUM'ing tables every interval (5 mi=
> > nutes
> > - 8.0.3).  Right now, for that last 4 hours, I'm not VACUUMing the 7.4.1
> > database and it's still clicking along at < .2 second queries.
>
> Have you compared physical table sizes?  If the autovac daemon did let
> things get out of hand, you'd need a VACUUM FULL or CLUSTER or TRUNCATE
> to get the table size back down --- plain VACUUM is unlikely to fix it.

Table sizes, no.  Entire DB size is 45Mb for 803 and 29Mb for 741.  Cannot make
a direct comparison between the two as I've run against more machines now with
803 than 741, so I'd expect it to be larger.

I'm still running relatively clean on 803 with cron vacuum.  The CS are jumping
from 100 to 120k, but it's not steady state like it was before, and queries are
all under 5 seconds (none hitting the logs) and are typically (glancing at test
runs) still under 1 sec, with some hitting ~2 seconds occasionally.

I've 6 runs going concurrently.  Just saw (vmstat 1) a set of 8 seconds where
the CS didn't drop below 90k, but right now its at ~300 for over 30 seconds...
It's bouncing all over the place, but staying reasonably well behaved overall.

Whoop.  Spoke too soon.  Just hit the wall.  CS at ~80k constant, queries over
10 seconds and rising (30+ now)...  Looking at ps, the vacuum is currently
running.  Going back in the logs, the CS and vacuum hit at about the same time.

I'm going to go back to 741 with the same load and see what happens by tomorrow
morning...  I'll change the cron vac to hit the 741 db.

Cheers,
Rob

--
 23:29:24 up 4 days,  3:04,  6 users,  load average: 0.02, 0.07, 0.08
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
When grilled further on (Mon, 18 Jul 2005 00:10:53 -0400),
Tom Lane <tgl@sss.pgh.pa.us> confessed:

> The context swap problem was no worse in 8.0 than in prior versions,
> so that hardly seems like a good explanation.  Have you tried reverting
> to the cron-based vacuuming method you used in 7.4?
>

I've "vacuum_cost_delay = 10" in the conf file for 803.  hit, miss, dirty and
limit are 1, 10, 20 and 200 respectively.  Could that be contributing to the
problem?  I'll know more in an hour or so with 741 running and cron vac and the
same load.

Cheers,
Rob

--
 23:53:53 up 4 days,  3:28,  6 users,  load average: 0.11, 0.13, 0.11
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
When grilled further on (Sun, 17 Jul 2005 23:43:29 -0600),
Robert Creager <Robert_Creager@LogicalChaos.org> confessed:

> I've 6 runs going concurrently.  Just saw (vmstat 1) a set of 8 seconds where
> the CS didn't drop below 90k, but right now its at ~300 for over 30 seconds...

> It's bouncing all over the place, but staying reasonably well behaved overall.
>

Against 741 and the same load, CS is steady around 300 with spikes up to 4k, but
it's only been running for about 15 minutes.  All queries are < .2 seconds.

Cheers,
Rob

--
 00:03:33 up 4 days,  3:38,  6 users,  load average: 1.67, 0.98, 0.44
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Tom Lane
Date:
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> Tom Lane <tgl@sss.pgh.pa.us> confessed:
>> The context swap problem was no worse in 8.0 than in prior versions,
>> so that hardly seems like a good explanation.  Have you tried reverting
>> to the cron-based vacuuming method you used in 7.4?

> I've "vacuum_cost_delay = 10" in the conf file for 803.

Hmm, did you read this thread?
http://archives.postgresql.org/pgsql-performance/2005-07/msg00088.php

It's still far from clear what's going on there, but it might be
interesting to see if turning off the vacuum delay changes your results
with 8.0.

            regards, tom lane

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
When grilled further on (Mon, 18 Jul 2005 00:10:53 -0400),
Tom Lane <tgl@sss.pgh.pa.us> confessed:

> The context swap problem was no worse in 8.0 than in prior versions,
> so that hardly seems like a good explanation.  Have you tried reverting
> to the cron-based vacuuming method you used in 7.4?
>

Ran 7 hours on 741 with VACUUM ANALYZE every 5 minutes.  The largest CS I saw
was 40k, with an average of 500 (via script which monitors vmstat output).

I've done a VACUUM FULL ANALYZE on 803 and have switched the cron based VACUUM
ANALYZE to 803 also.  The tests are now running again.

> Hmm, did you read this thread?
> http://archives.postgresql.org/pgsql-performance/2005-07/msg00088.php

I just glanced at it.  Once I've reproduced (or not) the problem on 803 with the
VACUUM FULL, I'll turn off the vacuum delay.

Cheers,
Rob

--
 07:10:06 up 4 days, 10:45,  6 users,  load average: 0.28, 0.40, 0.29
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
When grilled further on (Mon, 18 Jul 2005 09:23:11 -0400),
Tom Lane <tgl@sss.pgh.pa.us> confessed:

> It's still far from clear what's going on there, but it might be
> interesting to see if turning off the vacuum delay changes your results
> with 8.0.
>

Can that be affected by hupping the server, or do I need a restart?

Thanks,
Rob

--
 07:46:53 up 4 days, 11:21,  6 users,  load average: 0.77, 0.43, 0.27
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Tom Lane
Date:
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> Tom Lane <tgl@sss.pgh.pa.us> confessed:
>> It's still far from clear what's going on there, but it might be
>> interesting to see if turning off the vacuum delay changes your results
>> with 8.0.

> Can that be affected by hupping the server, or do I need a restart?

sighup should be fine.

            regards, tom lane

Re: Huge performance problem between 7.4.1 and 8.0.3 -

From
"Matthew T. O'Connor"
Date:
Tom Lane wrote:

>Robert Creager <Robert_Creager@LogicalChaos.org> writes:
>
>
>>I've "vacuum_cost_delay = 10" in the conf file for 803.
>>
>>
>
>Hmm, did you read this thread?
>http://archives.postgresql.org/pgsql-performance/2005-07/msg00088.php
>
>It's still far from clear what's going on there, but it might be
>interesting to see if turning off the vacuum delay changes your results
>with 8.0.
>

With the contrib autovacuum code  if you don't specify vacuum delay
settings from the command line, then autovacuum doesn't touch them.
Therefore (if you aren't specifying them from the command line), on 803,
the vacuum delay settings should be the same for a cron issued vacuum
and an autovacuum issued vacuum.  So if the vacuum delay settings are
the problem, then it should show up either way.


Matt


Re: Huge performance problem between 7.4.1 and 8.0.3 -

From
Tom Lane
Date:
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> Therefore (if you aren't specifying them from the command line), on 803,
> the vacuum delay settings should be the same for a cron issued vacuum
> and an autovacuum issued vacuum.  So if the vacuum delay settings are
> the problem, then it should show up either way.

... as indeed it does according to Robert's recent reports.  Still
awaiting the definitive test, but I'm starting to think this is another
case of the strange behavior Ian Westmacott exhibited.

            regards, tom lane

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

From
Robert Creager
Date:
In regards to
http://archives.postgresql.org/pgsql-performance/2005-07/msg00261.php

Tom Says:
> ... as indeed it does according to Robert's recent reports.  Still
> awaiting the definitive test, but I'm starting to think this is another
> case of the strange behavior Ian Westmacott exhibited.

Ok.  This morning at around 7:30am I started tests against a freshly VACUUM FULL
ANALYZE 803 database with the vacuum delay on and cron running vacuum analyze
every 5 minutes.

Around 8:15 I was starting to receive hits of a few seconds of high CS hits,
higher than the previous 7 hour run on 741.  I changed the vacuum delay to 0 and
HUP'ed the server (how can I see the value vacuum_cost_delay run time?).  By
10:30, I had vacuum jobs backed up since 9:20 and the queries were over 75
seconds.

I'm currently running on 741 as I need to get work done today ;-)  I'll restart
the 803 db, vacuum full analyze again and next opportunity (maybe tonight),
start runs again with cron vacuum and a vacuum_cost_delay of 0, unless I should
try something else?

Cheers,
Rob

--

Robert Creager
Advisory Software Engineer
Phone 303.673.2365
Pager 888.912.4458
Fax   303.661.5379
StorageTek

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

From
Tom Lane
Date:
Robert Creager <Robert_Creager@StorageTek.com> writes:
> Around 8:15 I was starting to receive hits of a few seconds of high CS hits,
> higher than the previous 7 hour run on 741.  I changed the vacuum delay to 0 and
> HUP'ed the server (how can I see the value vacuum_cost_delay run
> time?).

Start a fresh psql session and "SHOW vacuum_cost_delay" to verify what
the active setting is.

            regards, tom lane

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

From
Robert Creager
Date:
On Mon, 18 Jul 2005 13:52:53 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Start a fresh psql session and "SHOW vacuum_cost_delay" to verify what
> the active setting is.

Thanks.  It does show 0 for 803 in a session that was up since I thought I had
HUPed the server with the new value.

This is leading me to believe that 803 doesn't do very well with VACUUM ANALYZE
running often, at least in my particular application...  I will provide a more
definitive statement to that affect, hopefully tonight.

Cheers,
Rob

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

From
Robert Creager
Date:
On Mon, 18 Jul 2005 13:52:53 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
> Start a fresh psql session and "SHOW vacuum_cost_delay" to verify what
> the active setting is.
>

Alright.  Restarted the 803 database.  Cron based vacuum analyze is running
every 5 minutes.  vacuum_cost_delay is 0.  The problem showed up after about 1/2
hour of running.  I've got vacuum jobs stacked from the last 35 minutes, with 2
vacuums running at the same time.  CS is around 73k.

What do I do now?  I can bring the db back to normal and not run any cron based
vacuum to see if it still happens, but I suspect nothing will happen without the
vacuum.  I'll leave it in it's current semi-catatonic state as long as possible
in case there is something to look at?

Cheers,
Rob

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

From
Tom Lane
Date:
Robert Creager <Robert_Creager@StorageTek.com> writes:
> Alright.  Restarted the 803 database.  Cron based vacuum analyze is
> running every 5 minutes.  vacuum_cost_delay is 0.  The problem showed
> up after about 1/2 hour of running.  I've got vacuum jobs stacked from
> the last 35 minutes, with 2 vacuums running at the same time.  CS is
> around 73k.

Hmm, I hadn't thought about the possible impact of multiple concurrent
vacuums.  Is the problem caused by that, or has performance already gone
into the tank by the time the cron-driven vacuums are taking long enough
to overlap?

            regards, tom lane

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

From
Robert Creager
Date:
On Tue, 19 Jul 2005 12:54:22 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Hmm, I hadn't thought about the possible impact of multiple concurrent
> vacuums.  Is the problem caused by that, or has performance already gone
> into the tank by the time the cron-driven vacuums are taking long enough
> to overlap?

Don't know just yet.  When I run the vacuums manually on a healthy system on
741, they take less than 30 seconds.  I've stopped the cron vacuum and canceled
all the outstanding vacuum processes, but the 803 is still struggling (1/2 hour
later).

I'll re-start the database, vacuum full analyze and restart the runs without the
cron vacuum running.

Cheers,
Rob

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS issue?

From
Robert Creager
Date:
On Tue, 19 Jul 2005 12:54:22 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Robert Creager <Robert_Creager@StorageTek.com> writes:
>
> Hmm, I hadn't thought about the possible impact of multiple concurrent
> vacuums.  Is the problem caused by that, or has performance already gone
> into the tank by the time the cron-driven vacuums are taking long enough
> to overlap?

All statements over 5 seconds are logged.  Vacuums are running on the 5 minute mark.

Log file shows the first query starts going bad a 9:32:15 (7 seconds), although the second query start before the first
. The first vacuum statement logged shows 1148 seconds completing at 9:54:09, so starting at 9:35.  Looks like the
vacuumis an innocent bystander of the problem. 

The first problem queries are below.  Additionally, I've attached 5 minutes (bzipped) of logs starting at the first
eventbelow. 

Jul 19 09:32:15 annette postgres[17029]: [2-1] LOG:  duration: 7146.168 ms  statement:
Jul 19 09:32:15 annette postgres[17029]: [2-2] ^I      SELECT location_id, location_type.name AS type, library, rail
Jul 19 09:32:15 annette postgres[17029]: [2-3] ^I      FROM location_lock JOIN location USING( location_id )
Jul 19 09:32:15 annette postgres[17029]: [2-4] ^I                         JOIN location_type USING( location_type_id )
Jul 19 09:32:15 annette postgres[17029]: [2-5] ^I      WHERE test_session_id = '5264'
Jul 19 09:32:20 annette postgres[17092]: [2-1] LOG:  duration: 13389.730 ms  statement:
Jul 19 09:32:20 annette postgres[17092]: [2-2] ^I      SELECT location_type.name AS location_type_name,
Jul 19 09:32:20 annette postgres[17092]: [2-3] ^I             library, rail, col, side, row, location_id,
Jul 19 09:32:20 annette postgres[17092]: [2-4] ^I             hli_lsm, hli_panel, hli_row, hli_col
Jul 19 09:32:20 annette postgres[17092]: [2-5] ^I      FROM location JOIN location_type USING( location_type_id )
Jul 19 09:32:20 annette postgres[17092]: [2-6] ^I                    JOIN complex USING( library_id )
Jul 19 09:32:20 annette postgres[17092]: [2-7] ^I                    LEFT OUTER JOIN hli_location USING( location_id )
Jul 19 09:32:20 annette postgres[17092]: [2-8] ^I                    LEFT OUTER JOIN application USING( application_id
)
Jul 19 09:32:20 annette postgres[17092]: [2-9] ^I      WHERE  complex.complex_id = '13'
Jul 19 09:32:20 annette postgres[17092]: [2-10] ^I         AND location_id NOT IN
Jul 19 09:32:20 annette postgres[17092]: [2-11] ^I                (SELECT location_id
Jul 19 09:32:20 annette postgres[17092]: [2-12] ^I                 FROM location_lock)
Jul 19 09:32:20 annette postgres[17092]: [2-13] ^I         AND location_id NOT IN
Jul 19 09:32:20 annette postgres[17092]: [2-14] ^I                (SELECT location_id
Jul 19 09:32:20 annette postgres[17092]: [2-15] ^I                 FROM cartridge)
Jul 19 09:32:20 annette postgres[17092]: [2-16] ^IAND (location_type.name ~ 'cell' AND application.name ~ 'hli'  AND
hli_lsm= 1 AND col BETWEEN -2 AND 2) 
Jul 19 09:32:20 annette postgres[17092]: [2-17] ^I
Jul 19 09:32:20 annette postgres[17092]: [2-18] ^I         ORDER BY location.usage_count, location.rand LIMIT 1
Jul 19 09:32:20 annette postgres[17092]: [2-19] ^I         FOR UPDATE OF location

Cheers,
Rob


Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
When grilled further on (Tue, 19 Jul 2005 12:09:51 -0600),
Robert Creager <Robert_Creager@StorageTek.com> confessed:

> On Tue, 19 Jul 2005 12:54:22 -0400
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Hmm, I hadn't thought about the possible impact of multiple concurrent
> > vacuums.  Is the problem caused by that, or has performance already gone
> > into the tank by the time the cron-driven vacuums are taking long enough
> > to overlap?
>
>
> I'll re-start the database, vacuum full analyze and restart the runs without
the
> cron vacuum running.
>

It took a few hours, but the problem did finally occur with no vacuum running on
803.  CS is averaging 72k.  I cannot quantitatively say it took longer to
reproduce than with the vacuums running, but it seemed like it did.

Can any information be gotten out of this?  Should I try CVS HEAD?

Thoughts?

Thanks,
Rob

--
 22:41:36 up 6 days,  2:16,  6 users,  load average: 0.15, 0.21, 0.30
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment

Re: Huge performance problem between 7.4.1 and 8.0.3 - CS

From
Robert Creager
Date:
I've now backed off to version 7.4.1, which doesn't exhibit the problems that
8.0.3 does.  I guess I'll wait 'till the next version and see if any progress
has occurred.

Rob

When grilled further on (Tue, 19 Jul 2005 22:49:08 -0600),
Robert Creager <Robert_Creager@logicalchaos.org> confessed:

> When grilled further on (Tue, 19 Jul 2005 12:09:51 -0600),
> Robert Creager <Robert_Creager@StorageTek.com> confessed:
>
> > On Tue, 19 Jul 2005 12:54:22 -0400
> > Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > > Hmm, I hadn't thought about the possible impact of multiple concurrent
> > > vacuums.  Is the problem caused by that, or has performance already gone
> > > into the tank by the time the cron-driven vacuums are taking long enough
> > > to overlap?
> >
> >
> > I'll re-start the database, vacuum full analyze and restart the runs without
> the
> > cron vacuum running.
> >
>
> It took a few hours, but the problem did finally occur with no vacuum running
on
> 803.  CS is averaging 72k.  I cannot quantitatively say it took longer to
> reproduce than with the vacuums running, but it seemed like it did.
>
> Can any information be gotten out of this?  Should I try CVS HEAD?
>
> Thoughts?
>
> Thanks,
> Rob
>
> --
>  22:41:36 up 6 days,  2:16,  6 users,  load average: 0.15, 0.21, 0.30
> Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004


--
 14:35:32 up 9 days, 18:10,  5 users,  load average: 2.17, 2.19, 2.15
Linux 2.6.5-02 #8 SMP Mon Jul 12 21:34:44 MDT 2004

Attachment