Thread: auto truncate/vacuum full

auto truncate/vacuum full

From

JC Praud

Date:

27 October 2009, 14:15:36

Hi all,

I've got a weird thing on one of my databases this night:

- I've a monthly partition for storing activity logs defined as this:
- mother log table
- one child partition for each month

- Last friday I dumped the last month partition, and tried to truncate it, which locked lots of queries because the mother table was also locked. So I killed the TRUNCATE and performed a DELETE on my partition

- Last night the database locked. pg_log full of messages about insert into the mother table waiting for a lock.

- After about 40min, the waiting queries acquired their locks and ran.

- At the same time, my munin graphs show a decrease of my dbase size, of about the size of my partition, which pg_(total_)relation_size says has a zero size now.

- No slow query mentionning truncate nor vacuum full.

- According to pg_stat_user_tables an autovacuum was run on the table just after the unlock.

So my question are: can the autovacuum daemon perform vacuum full ? Or another internal postgres process ? Could it come from the TRUNCATE I run and canceled 4 days before ?

I'm running postgresql-8.3.7

Regards,

--
JC
Ph'nglui mglw'nafh Cthulhu n'gah Bill R'lyeh Wgah'nagl fhtagn!

Re: auto truncate/vacuum full

From

Alvaro Herrera

Date:

27 October 2009, 17:32:00

JC Praud escribió:

> So my question are: can the autovacuum daemon perform vacuum full ? Or
> another internal postgres process ? Could it come from the TRUNCATE I run
> and canceled 4 days before ?

No.  Autovacuum only issues commands that don't lock tables strongly.  I
doubt this has anything to do with your old TRUNCATE either.  My guess
is that somebody else ran TRUNCATE and forgot to tell you; or maybe an
automatic external process (cron or some such).

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: auto truncate/vacuum full

From

Pavel Stehule

Date:

27 October 2009, 19:02:31

2009/10/27 Alvaro Herrera <alvherre@commandprompt.com>:
> JC Praud escribió:
>
>> So my question are: can the autovacuum daemon perform vacuum full ? Or
>> another internal postgres process ? Could it come from the TRUNCATE I run
>> and canceled 4 days before ?
>
> No.  Autovacuum only issues commands that don't lock tables strongly.  I
> doubt this has anything to do with your old TRUNCATE either.  My guess
> is that somebody else ran TRUNCATE and forgot to tell you; or maybe an
> automatic external process (cron or some such).

look again on code. I checked it today. When is possible to drop some
pages from end, then vacuum do lock, and try to remove free pages. I
checked it. This process is started from full vacuum, lazy vacuum and
autovacuum too. We have similar issue like JC has.

Regards
Pavel

>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: auto truncate/vacuum full

From

Alvaro Herrera

Date:

27 October 2009, 19:40:33

Pavel Stehule escribió:
> 2009/10/27 Alvaro Herrera <alvherre@commandprompt.com>:
> > JC Praud escribió:
> >
> >> So my question are: can the autovacuum daemon perform vacuum full ? Or
> >> another internal postgres process ? Could it come from the TRUNCATE I run
> >> and canceled 4 days before ?
> >
> > No.  Autovacuum only issues commands that don't lock tables strongly.  I
> > doubt this has anything to do with your old TRUNCATE either.  My guess
> > is that somebody else ran TRUNCATE and forgot to tell you; or maybe an
> > automatic external process (cron or some such).
>
> look again on code. I checked it today. When is possible to drop some
> pages from end, then vacuum do lock, and try to remove free pages. I
> checked it. This process is started from full vacuum, lazy vacuum and
> autovacuum too. We have similar issue like JC has.

Oh, the truncation bit.  Yeah.  It uses a conditional lock acquire
though, so if the table is already locked it shouldn't try to do
anything.

Now 40 mins walking those pages to figure out that they need to be
truncated, I concede that it's too much.  Maybe we shouldn't be doing a
backwards scan; perhaps this breaks the OS readahead and make it even
slower.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: auto truncate/vacuum full

From

Greg Smith

Date:

27 October 2009, 19:53:24

On Tue, 27 Oct 2009, Alvaro Herrera wrote:

> Now 40 mins walking those pages to figure out that they need to be
> truncated, I concede that it's too much.  Maybe we shouldn't be doing a
> backwards scan; perhaps this breaks the OS readahead and make it even
> slower.

I've watched that take hours before on a large table after purging
hundreds of gigabytes of old historical data.  Improvements to speed that
up like scanning more efficiently would be welcome.  But given the
potential for a really bad worst-case here, I have to wonder if this
really needs to get broken up into bits with finer locking instead of
micromanaging the details.

(Yes, I should have been using date-range partitioning instead and just
dropped the old partitions, but sometimes these things grow only after
you've made design decisions the wrong way)

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: auto truncate/vacuum full

From

Tom Lane

Date:

27 October 2009, 20:33:52

Greg Smith <gsmith@gregsmith.com> writes:
> On Tue, 27 Oct 2009, Alvaro Herrera wrote:
>> Now 40 mins walking those pages to figure out that they need to be
>> truncated, I concede that it's too much.  Maybe we shouldn't be doing a
>> backwards scan; perhaps this breaks the OS readahead and make it even
>> slower.

> I've watched that take hours before on a large table after purging
> hundreds of gigabytes of old historical data.

Shouldn't autovacuum be getting kicked off the lock if anyone does
something that conflicts?  This seems like it should be a nonproblem
if everything is operating as designed.

The issue I can see is that we might never be able to complete any
truncation if there's a lot of potentially removable pages and a
pretty steady flow of conflicting lock attempts.  But that would
result in failure to remove bloat, not stoppage of conflicting queries.

It might be worth allowing autovacuum to execute the truncation every
few hundred pages, so as to ensure that progress can be made even if
it never gets a very long uninterrupted lock on the table.

            regards, tom lane

Re: auto truncate/vacuum full

From

Greg Smith

Date:

27 October 2009, 20:40:50

On Tue, 27 Oct 2009, Tom Lane wrote:

> The issue I can see is that we might never be able to complete any
> truncation if there's a lot of potentially removable pages and a pretty
> steady flow of conflicting lock attempts.  But that would result in
> failure to remove bloat, not stoppage of conflicting queries.

That's exactly it.  It's possible to get into a situation where it becomes
increasingly difficult to even catch up bloat once you get behind by a
certain amount.  All it takes is one giant deletion and you can get stuck
here until there's a window to take the database down altogether, and you
could end up down for hours waiting for that to execute.

> It might be worth allowing autovacuum to execute the truncation every
> few hundred pages, so as to ensure that progress can be made even if it
> never gets a very long uninterrupted lock on the table.

That was the sort of thing I was alluding to, moving in that direction
would seem much more valuable than trying to optimize the existing
approach better.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: auto truncate/vacuum full

From

Tom Lane

Date:

27 October 2009, 20:48:07

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Now 40 mins walking those pages to figure out that they need to be
> truncated, I concede that it's too much.  Maybe we shouldn't be doing a
> backwards scan; perhaps this breaks the OS readahead and make it even
> slower.

That's very possible, since a backwards scan is guaranteed to destroy
any rotational positioning the OS has done --- you'll have to wait
at least one disk rotation for each page, whereas a forward scan could
hope to do better than that.

Maybe we could do this along with my idea of incremental truncation.
Scan the last K pages of the relation *forwards*, truncate as
appropriate, repeat, until finding a nonempty page.  Choosing the max
value of K might be a bit tricky.

            regards, tom lane

Re: auto truncate/vacuum full

From

Tom Lane

Date:

27 October 2009, 20:53:21

Greg Smith <gsmith@gregsmith.com> writes:
> On Tue, 27 Oct 2009, Tom Lane wrote:
>> The issue I can see is that we might never be able to complete any
>> truncation if there's a lot of potentially removable pages and a pretty
>> steady flow of conflicting lock attempts.  But that would result in
>> failure to remove bloat, not stoppage of conflicting queries.

> That's exactly it.

That might be exactly what's worrying *you*, but my point was that it
doesn't appear to explain the OP's complaint.  Whatever issue he has
got is something else.

            regards, tom lane

Re: auto truncate/vacuum full

From

Alvaro Herrera

Date:

27 October 2009, 23:29:54

JC Praud escribió:

> - Last night the database locked. pg_log full of messages about insert into
> the mother table waiting for a lock.

This bit does not make much sense to me.  A transaction waiting will not
show up in the log.  Were they cancelled?  Can you paste an extract from
the log?

> - After about 40min, the waiting queries acquired their locks and ran.

Do you have a vacuum in cron or something like that?  As Tom says, if it
had been autovacuum, it should have been cancelled automatically (else
we've got a bug); but something invoking vacuum externally wouldn't
have, so what you describe is what we would expect.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: auto truncate/vacuum full

From

JC Praud

Date:

28 October 2009, 10:53:50

On Tue, Oct 27, 2009 at 6:31 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:

JC Praud escribió:

> So my question are: can the autovacuum daemon perform vacuum full ? Or
> another internal postgres process ? Could it come from the TRUNCATE I run
> and canceled 4 days before ?

No. Autovacuum only issues commands that don't lock tables strongly. I
doubt this has anything to do with your old TRUNCATE either. My guess
is that somebody else ran TRUNCATE and forgot to tell you; or maybe an
automatic external process (cron or some such).

I suspected that, too. And asked the ninjas we have here ;) I also checked the vacuum cronjobs we have.
None performing full vacuum on this table.
If it was the case, I should have seen the query in the pg_log as a slow query, I guess ?

Regards,

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
JC
Ph'nglui mglw'nafh Cthulhu n'gah Bill R'lyeh Wgah'nagl fhtagn!

Re: auto truncate/vacuum full

From

JC Praud

Date:

28 October 2009, 15:21:25

Sorry, I got a bit lost in the thread. BTW, Thanks for all the answers :)

On Wed, Oct 28, 2009 at 12:29 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:

JC Praud escribió:

> - Last night the database locked. pg_log full of messages about insert into
> the mother table waiting for a lock.

This bit does not make much sense to me. A transaction waiting will not
show up in the log. Were they cancelled? Can you paste an extract from
the log?

No, the transactions were not cancelled. All I saw in he pg_log is this (taken at the time le lock was lifted):

2009-10-27 05:39:19 CET dbuser dbase 10.33.10.133 LOG: process 14866 still waiting for RowExclusiveLock on relation 26683 of database 23806 after 5000.271 ms
2009-10-27 05:39:19 CET dbuser dbase 10.33.10.133 STATEMENT: INSERT INTO log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)
2009-10-27 05:39:19 CET dbuser dbase 10.33.10.133 LOG: process 14048 still waiting for RowExclusiveLock on relation 26683 of database 23806 after 5000.409 ms
2009-10-27 05:39:19 CET dbuser dbase 10.33.10.133 STATEMENT: INSERT INTO log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)
2009-10-27 05:39:50 CET dbuser dbase 10.33.10.41 LOG: duration: 62103.487 ms statement: SELECT put_root_files_into_ag_scheduler_delete_files( 8 , 50 );
2009-10-27 05:39:57 CET dbuser dbase 10.33.10.133 LOG: process 14797 still waiting for RowExclusiveLock on relation 26683 of database 23806 after 5000.362 ms
2009-10-27 05:39:57 CET dbuser dbase 10.33.10.133 STATEMENT: INSERT INTO log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.33 LOG: process 18964 acquired RowExclusiveLock on relation 26683 of database 23806 after 3572619.123 ms
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.33 STATEMENT: INSERT INTO log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.133 LOG: process 24284 acquired RowExclusiveLock on relation 26683 of database 23806 after 3572193.509 ms
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.133 STATEMENT: INSERT INTO log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.133 LOG: process 19497 acquired RowExclusiveLock on relation 26683 of database 23806 after 3572005.173 ms
2009-10-27 05:40:09 CET dbuser dbase 10.33.10.133 STATEMENT: INSERT INTO log_user (id_user, action, comment, date, id_session, ip_addr) VALUES (...)

The log_user table is the mother table with rules redirecting the inserts to the monthly partitions.

> - After about 40min, the waiting queries acquired their locks and ran.

Do you have a vacuum in cron or something like that? As Tom says, if it
had been autovacuum, it should have been cancelled automatically (else
we've got a bug); but something invoking vacuum externally wouldn't
have, so what you describe is what we would expect.

I was not monitoring the database at the time, all I saw is that pg_stat_user_tables dates the last_autovacuum at 2009-10-27 05:40:09.611129+01 this day on the partition.
We have no cronjobs running vacuum on these monthly partition.
No sign of canceled vacuums neither.

Regards,

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

--
JC
Ph'nglui mglw'nafh Cthulhu n'gah Bill R'lyeh Wgah'nagl fhtagn!

Re: auto truncate/vacuum full

From

Alvaro Herrera

Date:

28 October 2009, 15:36:23

JC Praud escribió:

> On Wed, Oct 28, 2009 at 12:29 AM, Alvaro Herrera <alvherre@commandprompt.com

> > This bit does not make much sense to me.  A transaction waiting will not
> > show up in the log.  Were they cancelled?  Can you paste an extract from
> > the log?
>
> No, the transactions were not cancelled. All I saw in he pg_log is this
> (taken at the time le lock was  lifted):
>
> 2009-10-27 05:39:19 CET dbuser dbase 10.33.10.133 LOG:  process 14866 still
> waiting for RowExclusiveLock on relation 26683 of database 23806 after
> 5000.271 ms

Oh, you have log_lock_waits enabled.  It makes plenty of sense now,
thanks.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: auto truncate/vacuum full

From

Jaime Casanova

Date:

28 October 2009, 16:20:54

On Tue, Oct 27, 2009 at 6:29 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
>
> Do you have a vacuum in cron or something like that?  As Tom says, if it
> had been autovacuum, it should have been cancelled automatically (else
> we've got a bug); but something invoking vacuum externally wouldn't
> have, so what you describe is what we would expect.
>

then we have a bug (at least in 8.3, haven't tried in 8.4)... i see
this a month ago, an autovacuum blocking a lot of concurrent updates
and selects... once i pg_cancel_backend() the autovacuum process the
other ones starting to move

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157

Re: auto truncate/vacuum full

From

Tom Lane

Date:

28 October 2009, 16:33:25

Jaime Casanova <jcasanov@systemguards.com.ec> writes:
> On Tue, Oct 27, 2009 at 6:29 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> Do you have a vacuum in cron or something like that?  As Tom says, if it
>> had been autovacuum, it should have been cancelled automatically (else
>> we've got a bug); but something invoking vacuum externally wouldn't
>> have, so what you describe is what we would expect.

> then we have a bug (at least in 8.3, haven't tried in 8.4)... i see
> this a month ago, an autovacuum blocking a lot of concurrent updates
> and selects... once i pg_cancel_backend() the autovacuum process the
> other ones starting to move

Hmm ... actually there is one case where autovac won't allow itself
to be kicked off locks, which is if it's performing an anti-wraparound
vacuum.  Perhaps anti-wraparound vacuums should skip trying to truncate
relations?

I'm not convinced that that explains Jaime's report though.  You'd
expect AW vacuums to only happen on mostly-unused tables, not ones
that are sufficiently central to an application to result in blocking
a lot of queries ...

            regards, tom lane