Thread: Vacuum Problems (locking)

Vacuum Problems (locking)

From
Mike Goldner
Date:
I have a nightly vacuum scheduled as follows:

su - postgres -c "/usr/bin/vacuumdb --analyze --dbname=mydb"

Last night, it appears that the vacuum blocked db access from my
application server (JBoss).  Here is the logfile snippet:

[3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]NOTICE:  number of
page slots needed (559472) exceeds max_fsm_pages (200000)
[3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]HINT:  Consider
increasing the configuration parameter "max_fsm_pages" to a value over
559472.
[3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]LOG:  number of page
slots needed (559472) exceeds max_fsm_pages (200000)
[3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]HINT:  Consider
increasing the configuration parameter "max_fsm_pages" to a value over
559472.
[3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG:  transaction ID
wrap limit is 1073813592, limited by database "benchpg"
[3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG:  transaction ID
wrap limit is 1073813592, limited by database "benchpg"
[3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG:  duration:
10224738.050 ms  statement: VACUUM ANALYZE;

First of all, my max_fsm_pages is obviously way off.  However, every
time I increase my max_fsm_pages the next vacuum says that it requires
more.  Will there every be a plateau in the requested pages?  What is
the impact of exceeding the pages slots needed?

Secondly, in this case, it almost looks like there were two vacuum
process running based on the duplicated log messages.  There is only one
cron job, so there shouldn't have been a second vacuum running.  Should
I just ignore this?

Finally, and most important is the blocking.  The vacuum duration
reported in the log converts to about 170 minutes.  I can track
backwards in the log and the only messages prior to the 6:52am
completion of the vacuum end at 3:57am (almost exactly 170 minutes
prior).

So, all indications point to postgres blocking all access during the
entire vacuum.

The only semi-unusual characteristic of my database is that I have one
table in particular that grows and shrinks dramatically.  My entire
database after a vacuum --full is about 8GB.  However, after a day or
two, the db will grow as large as 75GB.  Truncating that one table
(since it empties regularly) will shrink the db right back to the 8GB.

Is it possible that the extreme table size (maybe requiring the high
number of page slots) is causing postgres to block more than it should
in order to process that table?

Thanks,

Mike

Re: Vacuum Problems (locking)

From
"Joshua D. Drake"
Date:
Mike Goldner wrote:
> I have a nightly vacuum scheduled as follows:
>
> su - postgres -c "/usr/bin/vacuumdb --analyze --dbname=mydb"
>
> Last night, it appears that the vacuum blocked db access from my
> application server (JBoss).  Here is the logfile snippet:
>
> [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]NOTICE:  number of
> page slots needed (559472) exceeds max_fsm_pages (200000)
> [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]HINT:  Consider
> increasing the configuration parameter "max_fsm_pages" to a value over
> 559472.
> [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]LOG:  number of page
> slots needed (559472) exceeds max_fsm_pages (200000)
> [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]HINT:  Consider
> increasing the configuration parameter "max_fsm_pages" to a value over
> 559472.
> [3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG:  transaction ID
> wrap limit is 1073813592, limited by database "benchpg"
> [3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG:  transaction ID
> wrap limit is 1073813592, limited by database "benchpg"
> [3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG:  duration:
> 10224738.050 ms  statement: VACUUM ANALYZE;
>
> First of all, my max_fsm_pages is obviously way off.  However, every
> time I increase my max_fsm_pages the next vacuum says that it requires
> more.  Will there every be a plateau in the requested pages?  What is
> the impact of exceeding the pages slots needed?

You will need to vacuum full and if your need continues to grow, you
need to vacuum more.

> The only semi-unusual characteristic of my database is that I have one
> table in particular that grows and shrinks dramatically.  My entire
> database after a vacuum --full is about 8GB.  However, after a day or
> two, the db will grow as large as 75GB.  Truncating that one table
> (since it empties regularly) will shrink the db right back to the 8GB.

Vacuum that single table, ALOT.

Sincerely,

Joshua D. Drake


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


Re: Vacuum Problems (locking)

From
Mike Goldner
Date:
On Wed, 2006-10-25 at 15:54 -0400, Mike Goldner wrote:
> I have a nightly vacuum scheduled as follows:
>
> su - postgres -c "/usr/bin/vacuumdb --analyze --dbname=mydb"
>
> Last night, it appears that the vacuum blocked db access from my
> application server (JBoss).  Here is the logfile snippet:
>
> [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]NOTICE:  number of
> page slots needed (559472) exceeds max_fsm_pages (200000)
> [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]HINT:  Consider
> increasing the configuration parameter "max_fsm_pages" to a value over
> 559472.
> [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]LOG:  number of page
> slots needed (559472) exceeds max_fsm_pages (200000)
> [3693-jbossdb-postgres-2006-10-25 06:52:29.488 EDT]HINT:  Consider
> increasing the configuration parameter "max_fsm_pages" to a value over
> 559472.
> [3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG:  transaction ID
> wrap limit is 1073813592, limited by database "benchpg"
> [3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG:  transaction ID
> wrap limit is 1073813592, limited by database "benchpg"
> [3693-jbossdb-postgres-2006-10-25 06:52:29.530 EDT]LOG:  duration:
> 10224738.050 ms  statement: VACUUM ANALYZE;
>
> First of all, my max_fsm_pages is obviously way off.  However, every
> time I increase my max_fsm_pages the next vacuum says that it requires
> more.  Will there every be a plateau in the requested pages?  What is
> the impact of exceeding the pages slots needed?
>
> Secondly, in this case, it almost looks like there were two vacuum
> process running based on the duplicated log messages.  There is only one
> cron job, so there shouldn't have been a second vacuum running.  Should
> I just ignore this?
>
> Finally, and most important is the blocking.  The vacuum duration
> reported in the log converts to about 170 minutes.  I can track
> backwards in the log and the only messages prior to the 6:52am
> completion of the vacuum end at 3:57am (almost exactly 170 minutes
> prior).
>
> So, all indications point to postgres blocking all access during the
> entire vacuum.
>
> The only semi-unusual characteristic of my database is that I have one
> table in particular that grows and shrinks dramatically.  My entire
> database after a vacuum --full is about 8GB.  However, after a day or
> two, the db will grow as large as 75GB.  Truncating that one table
> (since it empties regularly) will shrink the db right back to the 8GB.
>
> Is it possible that the extreme table size (maybe requiring the high
> number of page slots) is causing postgres to block more than it should
> in order to process that table?
>
> Thanks,
>
> Mike
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

I just noticed that my nightly vacuum failed the previous three nights
with the following error:

[2032-jbossdb-postgres-2006-10-24 04:12:30.019 EDT]ERROR:  failed to re-
find parent key in "jms_messages_pkey"
[2032-jbossdb-postgres-2006-10-24 04:12:30.019 EDT]STATEMENT:  VACUUM
ANALYZE;

I'm not sure if it is related, but it seems pertinent.

Thanks,

Mike

Re: Vacuum Problems (locking)

From
Tom Lane
Date:
Mike Goldner <mgoldner@agmednet.com> writes:
> First of all, my max_fsm_pages is obviously way off.  However, every
> time I increase my max_fsm_pages the next vacuum says that it requires
> more.  Will there every be a plateau in the requested pages?

We realized recently that this can happen if you have a single relation
that all by itself needs more FSM space than you've got; any per-table
space beyond max_fsm_pages is disregarded in the calculation of the
total system-wide space needed, and so you get this inflationary effect
where the reported requirement keeps going up when you increase the
setting.  There's a fix in for 8.2.

> So, all indications point to postgres blocking all access during the
> entire vacuum.

You haven't said anything I consider credible evidence of that.  But try
looking in pg_locks next time.

            regards, tom lane

Re: Vacuum Problems (locking)

From
Andrew Sullivan
Date:
On Wed, Oct 25, 2006 at 03:54:17PM -0400, Mike Goldner wrote:
> Finally, and most important is the blocking.  The vacuum duration
> reported in the log converts to about 170 minutes.  I can track
> backwards in the log and the only messages prior to the 6:52am
> completion of the vacuum end at 3:57am (almost exactly 170 minutes
> prior).
>
> So, all indications point to postgres blocking all access during the
> entire vacuum.

The vacuum command you posted doesn't block.  I can think of two
other explanations:

1.    You just didn't have any activity then.  Don't throw away
this possibility without evidence: I can't count the number of blind
alleys I've been down because someone insisted that "never happens".

2.    You're completely pegged on I/O.  Vacuum will make this
worse, and maybe therefore no transactions get through.  You can
fiddle with the vacuum settings to get them to back off a little and
let some other transactions through.

That said, your essential problem is that one table.  Vacuum it more
often -- from the look of the churn on it, I'd just put a job on it
that runs all the time and sleeps for a few seconds in between -- and
you shouldn't have this problem.  But you'll need to VACUUM FULL or
dump and reload first.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
Windows is a platform without soap, where rats run around
in open sewers.
        --Daniel Eran

Re: Vacuum Problems (locking)

From
"Jim C. Nasby"
Date:
On Wed, Oct 25, 2006 at 04:00:29PM -0400, Mike Goldner wrote:
> I just noticed that my nightly vacuum failed the previous three nights
> with the following error:
>
> [2032-jbossdb-postgres-2006-10-24 04:12:30.019 EDT]ERROR:  failed to re-
> find parent key in "jms_messages_pkey"
> [2032-jbossdb-postgres-2006-10-24 04:12:30.019 EDT]STATEMENT:  VACUUM
> ANALYZE;
>
> I'm not sure if it is related, but it seems pertinent.

Well, vacuuming a lot of databases once a day is bad enough. Not
vacuuming for 3 days...

IIRC you can fix that issue by reindexing jms_messages_pkey.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)