Thread: VACUUM touching file but not updating relation

VACUUM touching file but not updating relation

From
Thom Brown
Date:
Hi,

I just noticed that the VACUUM process touches a lot of relations
(affects mtime) but for one file I looked at, it didn't change.  This
doesn't always happen, and many relations aren't touched at all.

I had the following relation:

-rw-------  1 thom  staff      40960 13 Oct 16:06 11946

Ran MD5 over the file:

MD5 (11946) = d6626f930f1fb6d77c3907d3279fe693

Then VACUUM ANALYSE'd all databases in full.

This relation was supposedly affected:

-rw-------  1 thom  staff      40960 14 Oct 11:27 11946

But then I ran MD5 back over it:

MD5 (11946) = d6626f930f1fb6d77c3907d3279fe693

This is the same as before.  What is it doing?  Does this happen
often?  And I can't find out what this particular OID relates to
either.

I'm using 9.2devel btw.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: VACUUM touching file but not updating relation

From
Thom Brown
Date:
On 14 October 2011 12:12, Thom Brown <thom@linux.com> wrote:
> Hi,
>
> I just noticed that the VACUUM process touches a lot of relations
> (affects mtime) but for one file I looked at, it didn't change.  This
> doesn't always happen, and many relations aren't touched at all.
>
> I had the following relation:
>
> -rw-------  1 thom  staff      40960 13 Oct 16:06 11946
>
> Ran MD5 over the file:
>
> MD5 (11946) = d6626f930f1fb6d77c3907d3279fe693
>
> Then VACUUM ANALYSE'd all databases in full.
>
> This relation was supposedly affected:
>
> -rw-------  1 thom  staff      40960 14 Oct 11:27 11946
>
> But then I ran MD5 back over it:
>
> MD5 (11946) = d6626f930f1fb6d77c3907d3279fe693
>
> This is the same as before.  What is it doing?  Does this happen
> often?  And I can't find out what this particular OID relates to
> either.
>
> I'm using 9.2devel btw.

Does anyone know what happened here?  I'm just wondering if there's
some action being performed on the file which can be avoided.  Of
course I haven't determined how often this happens.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: VACUUM touching file but not updating relation

From
Tom Lane
Date:
Thom Brown <thom@linux.com> writes:
> On 14 October 2011 12:12, Thom Brown <thom@linux.com> wrote:
>> I just noticed that the VACUUM process touches a lot of relations
>> (affects mtime) but for one file I looked at, it didn't change. �This
>> doesn't always happen, and many relations aren't touched at all.

No immmediate ideas as to why the mtime would change if the file
contents didn't.  It seems like there must be a code path that marked
a buffer dirty without having changed it, but we're usually pretty
careful about that.

>> And I can't find out what this particular OID relates to
>> either.

Well, the generic method is

regression=# select oid,relname from pg_class where relfilenode = 11946;
  oid  |    relname
-------+----------------
 11563 | pg_toast_11561
(1 row)

This is a toast relation, so ...

regression=# select oid,relname from pg_class where reltoastrelid = 11563;
  oid  |   relname
-------+--------------
 11561 | sql_packages
(1 row)

So in my git-tip database, that relfilenode is
information_schema.sql_packages' TOAST table.  However, such OID
assignments aren't terribly stable in development tip, and it was almost
certainly something different a month ago (especially since
sql_packages' TOAST table is generally empty, and your file is not).
So you'll need to check this for yourself to see what it was, assuming
you still have that database around.  It's a safe bet that it was a
system catalog or index or toast table belonging thereto, though, just
based on the range of OIDs it's in.

            regards, tom lane

Re: VACUUM touching file but not updating relation

From
Thom Brown
Date:
On 11 November 2011 00:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thom@linux.com> writes:
>> On 14 October 2011 12:12, Thom Brown <thom@linux.com> wrote:
>>> I just noticed that the VACUUM process touches a lot of relations
>>> (affects mtime) but for one file I looked at, it didn't change.  This
>>> doesn't always happen, and many relations aren't touched at all.
>
> No immmediate ideas as to why the mtime would change if the file
> contents didn't.  It seems like there must be a code path that marked
> a buffer dirty without having changed it, but we're usually pretty
> careful about that.
>
>>> And I can't find out what this particular OID relates to
>>> either.
>
> Well, the generic method is
>
> regression=# select oid,relname from pg_class where relfilenode = 11946;
>  oid  |    relname
> -------+----------------
>  11563 | pg_toast_11561
> (1 row)
>
> This is a toast relation, so ...
>
> regression=# select oid,relname from pg_class where reltoastrelid = 11563;
>  oid  |   relname
> -------+--------------
>  11561 | sql_packages
> (1 row)
>
> So in my git-tip database, that relfilenode is
> information_schema.sql_packages' TOAST table.  However, such OID
> assignments aren't terribly stable in development tip, and it was almost
> certainly something different a month ago (especially since
> sql_packages' TOAST table is generally empty, and your file is not).
> So you'll need to check this for yourself to see what it was, assuming
> you still have that database around.  It's a safe bet that it was a
> system catalog or index or toast table belonging thereto, though, just
> based on the range of OIDs it's in.

No, I don't still have the database, but tried the same thing on a
pre-existing database and found a few files exhibiting the same
change.

I checked all files where the time stamp of the file had changed, but
had the same MD5 sum.  I used the list in the query you mentioned and
get:

test2=# select oid,relname from pg_class where relfilenode in

(11682,11692,11707,11708,11725,11726,11727,11728,11740,11743,11744,11751,11752,11757,11761,11764,11765,11771,11776,11777,11778,11795,11816,11817,11854,11855,11858,11861,11862,11865,11866,11869,11870,11873,11874,11901,11902);

 oid  |             relname
------+---------------------------------
 2664 | pg_constraint_conname_nsp_index
 2651 | pg_am_name_index
 2652 | pg_am_oid_index
 2756 | pg_amop_oid_index
 2757 | pg_amproc_oid_index
 2650 | pg_aggregate_fnoid_index
 2839 | pg_toast_2618_index
 2660 | pg_cast_oid_index
 3085 | pg_collation_oid_index
 3164 | pg_collation_name_enc_nsp_index
 2689 | pg_operator_oprname_l_r_n_index
 2754 | pg_opfamily_am_name_nsp_index
 2755 | pg_opfamily_oid_index
 2681 | pg_language_name_index
 2682 | pg_language_oid_index
 2692 | pg_rewrite_oid_index
 2693 | pg_rewrite_rel_rulename_index
 2673 | pg_depend_depender_index
 2674 | pg_depend_reference_index
 3608 | pg_ts_config_cfgname_index
 3712 | pg_ts_config_oid_index
 3609 | pg_ts_config_map_index
 3604 | pg_ts_dict_dictname_index
 3605 | pg_ts_dict_oid_index
 3606 | pg_ts_parser_prsname_index
 3607 | pg_ts_parser_oid_index
 3766 | pg_ts_template_tmplname_index
 3767 | pg_ts_template_oid_index
 3080 | pg_extension_oid_index
 2840 | pg_toast_2619
 2665 | pg_constraint_conrelid_index
 2666 | pg_constraint_contypid_index
 2667 | pg_constraint_oid_index
 3081 | pg_extension_name_index
(34 rows)

An additional VACUUM shows up no such changes except for the case of a
visibility map, although I suspect that's expected to happen.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: VACUUM touching file but not updating relation

From
Tom Lane
Date:
Thom Brown <thom@linux.com> writes:
> On 11 November 2011 00:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thom Brown <thom@linux.com> writes:
>>> I just noticed that the VACUUM process touches a lot of relations
>>> (affects mtime) but for one file I looked at, it didn't change.  This
>>> doesn't always happen, and many relations aren't touched at all.

>> No immmediate ideas as to why the mtime would change if the file
>> contents didn't.  It seems like there must be a code path that marked
>> a buffer dirty without having changed it, but we're usually pretty
>> careful about that.

> I checked all files where the time stamp of the file had changed, but
> had the same MD5 sum.  I used the list in the query you mentioned and
> get: [ mostly indexes ]

Hmm, is this on a hot standby master?

I observe that _bt_delitems_vacuum() unconditionally dirties the page
and writes a WAL record, whether it has anything to do or not; and that
if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite
there being (probably) nothing useful to do.  Seems like that could be
improved.  The comment explaining why it's necessary to do that doesn't
make any sense to me, either.

            regards, tom lane

Re: VACUUM touching file but not updating relation

From
Thom Brown
Date:
On 11 November 2011 23:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thom@linux.com> writes:
>> On 11 November 2011 00:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Thom Brown <thom@linux.com> writes:
>>>> I just noticed that the VACUUM process touches a lot of relations
>>>> (affects mtime) but for one file I looked at, it didn't change.  This
>>>> doesn't always happen, and many relations aren't touched at all.
>
>>> No immmediate ideas as to why the mtime would change if the file
>>> contents didn't.  It seems like there must be a code path that marked
>>> a buffer dirty without having changed it, but we're usually pretty
>>> careful about that.
>
>> I checked all files where the time stamp of the file had changed, but
>> had the same MD5 sum.  I used the list in the query you mentioned and
>> get: [ mostly indexes ]
>
> Hmm, is this on a hot standby master?

It's using a wal_level of hot_standby and has max_wal_senders set to
2, but it's not actually replicating to anywhere else.  But if I
comment out both of these, restart, then compare pre-vacuum and
post-vacuum, I get the following results for unchanged but touched
items:

test=# select oid,relname from pg_class where relfilenode in
(11680,11682,11684,11686,11690,16530);
  oid  |       relname
-------+---------------------
  2619 | pg_statistic
  2840 | pg_toast_2619
  2841 | pg_toast_2619_index
 16530 | cows2
(4 rows)

The items which didn't match a result in this instance were 11686 and
11690, which is surprising since they both have a visibility map and
free space map, indicating they're some kind of table.

> I observe that _bt_delitems_vacuum() unconditionally dirties the page
> and writes a WAL record, whether it has anything to do or not; and that
> if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite
> there being (probably) nothing useful to do.  Seems like that could be
> improved.  The comment explaining why it's necessary to do that doesn't
> make any sense to me, either.

Well the effect, in the single instances I've checked, is certainly
more pronounced for hot_standby, but there still appears to be some
occurrences for minimal wal_level too.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: VACUUM touching file but not updating relation

From
Thom Brown
Date:
On 12 November 2011 00:08, Thom Brown <thom@linux.com> wrote:
> On 11 November 2011 23:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thom Brown <thom@linux.com> writes:
>>> On 11 November 2011 00:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Thom Brown <thom@linux.com> writes:
>>>>> I just noticed that the VACUUM process touches a lot of relations
>>>>> (affects mtime) but for one file I looked at, it didn't change.  This
>>>>> doesn't always happen, and many relations aren't touched at all.
>>
>>>> No immmediate ideas as to why the mtime would change if the file
>>>> contents didn't.  It seems like there must be a code path that marked
>>>> a buffer dirty without having changed it, but we're usually pretty
>>>> careful about that.
>>
>>> I checked all files where the time stamp of the file had changed, but
>>> had the same MD5 sum.  I used the list in the query you mentioned and
>>> get: [ mostly indexes ]
>>
>> Hmm, is this on a hot standby master?
>
> It's using a wal_level of hot_standby and has max_wal_senders set to
> 2, but it's not actually replicating to anywhere else.  But if I
> comment out both of these, restart, then compare pre-vacuum and
> post-vacuum, I get the following results for unchanged but touched
> items:
>
> test=# select oid,relname from pg_class where relfilenode in
> (11680,11682,11684,11686,11690,16530);
>  oid  |       relname
> -------+---------------------
>  2619 | pg_statistic
>  2840 | pg_toast_2619
>  2841 | pg_toast_2619_index
>  16530 | cows2
> (4 rows)
>
> The items which didn't match a result in this instance were 11686 and
> 11690, which is surprising since they both have a visibility map and
> free space map, indicating they're some kind of table.
>
>> I observe that _bt_delitems_vacuum() unconditionally dirties the page
>> and writes a WAL record, whether it has anything to do or not; and that
>> if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite
>> there being (probably) nothing useful to do.  Seems like that could be
>> improved.  The comment explaining why it's necessary to do that doesn't
>> make any sense to me, either.
>
> Well the effect, in the single instances I've checked, is certainly
> more pronounced for hot_standby, but there still appears to be some
> occurrences for minimal wal_level too.

So would you say this is acceptable and normal activity, or is
something awry here?

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: VACUUM touching file but not updating relation

From
Tom Lane
Date:
Thom Brown <thom@linux.com> writes:
>> On 11 November 2011 23:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I observe that _bt_delitems_vacuum() unconditionally dirties the page
>>> and writes a WAL record, whether it has anything to do or not; and that
>>> if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite
>>> there being (probably) nothing useful to do.  Seems like that could be
>>> improved.  The comment explaining why it's necessary to do that doesn't
>>> make any sense to me, either.

>> Well the effect, in the single instances I've checked, is certainly
>> more pronounced for hot_standby, but there still appears to be some
>> occurrences for minimal wal_level too.

> So would you say this is acceptable and normal activity, or is
> something awry here?

Well, it's expected given the current coding in the btree vacuum logic.
It's not clear to me why it was written like that, though.

            regards, tom lane

Re: VACUUM touching file but not updating relation

From
Simon Riggs
Date:
On Fri, Nov 18, 2011 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thom@linux.com> writes:
>>> On 11 November 2011 23:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> I observe that _bt_delitems_vacuum() unconditionally dirties the page
>>>> and writes a WAL record, whether it has anything to do or not; and that
>>>> if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite
>>>> there being (probably) nothing useful to do.  Seems like that could be
>>>> improved.  The comment explaining why it's necessary to do that doesn't
>>>> make any sense to me, either.
>
>>> Well the effect, in the single instances I've checked, is certainly
>>> more pronounced for hot_standby, but there still appears to be some
>>> occurrences for minimal wal_level too.
>
>> So would you say this is acceptable and normal activity, or is
>> something awry here?
>
> Well, it's expected given the current coding in the btree vacuum logic.
> It's not clear to me why it was written like that, though.

I'll take a look.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: VACUUM touching file but not updating relation

From
Simon Riggs
Date:
On Fri, Nov 18, 2011 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thom Brown <thom@linux.com> writes:
>>> On 11 November 2011 23:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> I observe that _bt_delitems_vacuum() unconditionally dirties the page
>>>> and writes a WAL record, whether it has anything to do or not; and that
>>>> if XLogStandbyInfoActive() then btvacuumscan will indeed call it despite
>>>> there being (probably) nothing useful to do.  Seems like that could be
>>>> improved.  The comment explaining why it's necessary to do that doesn't
>>>> make any sense to me, either.
>
>>> Well the effect, in the single instances I've checked, is certainly
>>> more pronounced for hot_standby, but there still appears to be some
>>> occurrences for minimal wal_level too.
>
>> So would you say this is acceptable and normal activity, or is
>> something awry here?
>
> Well, it's expected given the current coding in the btree vacuum logic.
> It's not clear to me why it was written like that, though.

The code works as designed.

_bt_delitems_vacuum() is only ever called with nitems == 0 when it is
the last block of the relation with wal_level = hot standby

As discussed in the comments we must issue a WAL record for the last
block, whatever else has occurred.

So the correct number of WAL records is emitted and I see no bug there.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: VACUUM touching file but not updating relation

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Fri, Nov 18, 2011 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Well, it's expected given the current coding in the btree vacuum logic.
>> It's not clear to me why it was written like that, though.

> The code works as designed.

> _bt_delitems_vacuum() is only ever called with nitems == 0 when it is
> the last block of the relation with wal_level = hot standby

> As discussed in the comments we must issue a WAL record for the last
> block, whatever else has occurred.

> So the correct number of WAL records is emitted and I see no bug there.

What Thom's complaining about is that the buffer may be marked dirty
unnecessarily, ie when there has been no actual data change.

            regards, tom lane

Re: VACUUM touching file but not updating relation

From
Simon Riggs
Date:
On Fri, Nov 18, 2011 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> What Thom's complaining about is that the buffer may be marked dirty
> unnecessarily, ie when there has been no actual data change.

OK, I'll patch it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: VACUUM touching file but not updating relation

From
Simon Riggs
Date:
On Fri, Nov 18, 2011 at 3:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> So the correct number of WAL records is emitted and I see no bug there.
>
> What Thom's complaining about is that the buffer may be marked dirty
> unnecessarily, ie when there has been no actual data change.

Based upon both your feedback, I made a change to stop the block being
marked dirty, though Tom now wants that removed.

Thom, your earlier analysis showing that the md5 checksum of a
relation had changed is not happening because of the section of code
you identified. The code sets some data on the page, which would cause
the md5 checksum to change. So it cannot be the btree code  at
_bt_delitems_vacuum() causing this.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services