Thread: pgsql: Skip full index scan during cleanup of B-tree indexes whenpossi

pgsql: Skip full index scan during cleanup of B-tree indexes whenpossi

From
Teodor Sigaev
Date:
Skip full index scan during cleanup of B-tree indexes when possible

Vacuum of index consists from two stages: multiple (zero of more) ambulkdelete
calls and one amvacuumcleanup call. When workload on particular table
is append-only, then autovacuum isn't intended to touch this table. However,
user may run vacuum manually in order to fill visibility map and get benefits
of index-only scans. Then ambulkdelete wouldn't be called for indexes
of such table (because no heap tuples were deleted), only amvacuumcleanup would
be called In this case, amvacuumcleanup would perform full index scan for
two objectives: put recyclable pages into free space map and update index
statistics.

This patch allows btvacuumclanup to skip full index scan when two conditions
are satisfied: no pages are going to be put into free space map and index
statistics isn't stalled. In order to check first condition, we store
oldest btpo_xact in the meta-page. When it's precedes RecentGlobalXmin, then
there are some recyclable pages. In order to check second condition we store
number of heap tuples observed during previous full index scan by cleanup.
If fraction of newly inserted tuples is less than
vacuum_cleanup_index_scale_factor, then statistics isn't considered to be
stalled. vacuum_cleanup_index_scale_factor can be defined as both reloption and GUC (default).

This patch bumps B-tree meta-page version. Upgrade of meta-page is performed
"on the fly": during VACUUM meta-page is rewritten with new version. No special
handling in pg_upgrade is required.

Author: Masahiko Sawada, Alexander Korotkov
Review by: Peter Geoghegan, Kyotaro Horiguchi, Alexander Korotkov, Yura Sokolov
Discussion:
https://www.postgresql.org/message-id/flat/CAD21AoAX+d2oD_nrd9O2YkpzHaFr=uQeGr9s1rKC3O4ENc568g@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/857f9c36cda520030381bd8c2af20adf0ce0e1d4

Modified Files
--------------
contrib/amcheck/verify_nbtree.c               |   8 +-
contrib/pageinspect/Makefile                  |   3 +-
contrib/pageinspect/btreefuncs.c              |   4 +-
contrib/pageinspect/expected/btree.out        |  16 +--
contrib/pageinspect/pageinspect--1.6--1.7.sql |  26 +++++
contrib/pageinspect/pageinspect.control       |   2 +-
contrib/pgstattuple/expected/pgstattuple.out  |  10 +-
doc/src/sgml/config.sgml                      |  25 +++++
doc/src/sgml/pageinspect.sgml                 |  16 +--
doc/src/sgml/ref/create_index.sgml            |  15 +++
src/backend/access/common/reloptions.c        |  13 ++-
src/backend/access/nbtree/nbtinsert.c         |  12 +++
src/backend/access/nbtree/nbtpage.c           | 150 ++++++++++++++++++++++++--
src/backend/access/nbtree/nbtree.c            | 118 ++++++++++++++++++--
src/backend/access/nbtree/nbtxlog.c           |   6 +-
src/backend/utils/init/globals.c              |   2 +
src/backend/utils/misc/guc.c                  |  10 ++
src/include/access/nbtree.h                   |  11 +-
src/include/access/nbtxlog.h                  |   4 +
src/include/miscadmin.h                       |   2 +
src/include/utils/rel.h                       |   2 +
src/test/regress/expected/btree_index.out     |  29 +++++
src/test/regress/sql/btree_index.sql          |  19 ++++
23 files changed, 458 insertions(+), 45 deletions(-)


Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Alexander Korotkov
Date:
Hi!

On Wed, Apr 4, 2018 at 7:29 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Skip full index scan during cleanup of B-tree indexes when possible

Thank you for committing this.

It appears that patch contains some redundant variabled.  See warnings produced
by gcc-7.

nbtpage.c: In function '_bt_update_meta_cleanup_info':
nbtpage.c:121:15: warning: variable 'metaopaque' set but not used [-Wunused-but-set-variable]
  BTPageOpaque metaopaque;
               ^~~~~~~~~~
nbtree.c: In function '_bt_vacuum_needs_cleanup':
nbtree.c:790:15: warning: variable 'metaopaque' set but not used [-Wunused-but-set-variable]
  BTPageOpaque metaopaque;
               ^~~~~~~~~~

Attached patch fixes this.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Peter Geoghegan
Date:
On Wed, Apr 4, 2018 at 3:32 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Hi!
>
> On Wed, Apr 4, 2018 at 7:29 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
>>
>> Skip full index scan during cleanup of B-tree indexes when possible
>
>
> Thank you for committing this.
>
> It appears that patch contains some redundant variabled.  See warnings
> produced
> by gcc-7.

I also see an assertion failure within _bt_getrootheight():

TRAP: FailedAssertion("!(metad->btm_version == 3)", File:
"/home/pg/postgresql/root/build/../source/src/backend/access/nbtree/nbtpage.c",
Line: 619)

-- 
Peter Geoghegan


Peter Geoghegan <pg@bowt.ie> writes:
> I also see an assertion failure within _bt_getrootheight():

> TRAP: FailedAssertion("!(metad->btm_version == 3)", File:
> "/home/pg/postgresql/root/build/../source/src/backend/access/nbtree/nbtpage.c",
> Line: 619)

Hm, buildfarm's not complaining --- what's the test case?

            regards, tom lane


Re: pgsql: Skip full index scan during cleanup of B-tree indexeswhen possi

From
Michael Paquier
Date:
On Wed, Apr 04, 2018 at 08:58:14PM -0400, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
> > I also see an assertion failure within _bt_getrootheight():
>
> > TRAP: FailedAssertion("!(metad->btm_version == 3)", File:
> > "/home/pg/postgresql/root/build/../source/src/backend/access/nbtree/nbtpage.c",
> > Line: 619)
>
> Hm, buildfarm's not complaining --- what's the test case?

Hm.  No problems here either with a56e267 and gcc 7.3.  The warnings are
here for sure, and any compiler would complain about those.
--
Michael

Attachment

Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Peter Geoghegan
Date:
On Wed, Apr 4, 2018 at 5:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> TRAP: FailedAssertion("!(metad->btm_version == 3)", File:
>> "/home/pg/postgresql/root/build/../source/src/backend/access/nbtree/nbtpage.c",
>> Line: 619)
>
> Hm, buildfarm's not complaining --- what's the test case?

This was discovered while testing/reviewing the latest version of the
INCLUDE covering indexes patch. It now seems to be unrelated.

Sorry for the noise.

-- 
Peter Geoghegan


Peter Geoghegan <pg@bowt.ie> writes:
>>> TRAP: FailedAssertion("!(metad->btm_version == 3)", File:
>>> "/home/pg/postgresql/root/build/../source/src/backend/access/nbtree/nbtpage.c",
>>> Line: 619)

>> Hm, buildfarm's not complaining --- what's the test case?

> This was discovered while testing/reviewing the latest version of the
> INCLUDE covering indexes patch. It now seems to be unrelated.

Oh, wait ... I wonder if you saw that because you were running a new
backend without having re-initdb'd?  Once you had re-initdb'd, then
of course there would be no old-format btree indexes anywhere.  But
if you hadn't, then anyplace that was not prepared to cope with the
old header format would complain about pre-existing indexes.

In short, this sounds like a place that did not get the memo about
how to cope with un-upgraded indexes.

            regards, tom lane


Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Peter Geoghegan
Date:
On Wed, Apr 4, 2018 at 8:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This was discovered while testing/reviewing the latest version of the
>> INCLUDE covering indexes patch. It now seems to be unrelated.
>
> Oh, wait ... I wonder if you saw that because you were running a new
> backend without having re-initdb'd?

Yes. That's what happened.

> Once you had re-initdb'd, then
> of course there would be no old-format btree indexes anywhere.  But
> if you hadn't, then anyplace that was not prepared to cope with the
> old header format would complain about pre-existing indexes.
>
> In short, this sounds like a place that did not get the memo about
> how to cope with un-upgraded indexes.

Sounds plausible.

-- 
Peter Geoghegan


Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Alexander Korotkov
Date:
On Thu, Apr 5, 2018 at 1:32 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Wed, Apr 4, 2018 at 7:29 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
Skip full index scan during cleanup of B-tree indexes when possible

Thank you for committing this.

It appears that patch contains some redundant variabled.  See warnings produced
by gcc-7.

nbtpage.c: In function '_bt_update_meta_cleanup_info':
nbtpage.c:121:15: warning: variable 'metaopaque' set but not used [-Wunused-but-set-variable]
  BTPageOpaque metaopaque;
               ^~~~~~~~~~
nbtree.c: In function '_bt_vacuum_needs_cleanup':
nbtree.c:790:15: warning: variable 'metaopaque' set but not used [-Wunused-but-set-variable]
  BTPageOpaque metaopaque;
               ^~~~~~~~~~

Attached patch fixes this.

Teodor already committed [1] better patch [2] from Kyotaro Horiguchi.  This question is closed.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Alexander Korotkov
Date:
On Thu, Apr 5, 2018 at 6:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Geoghegan <pg@bowt.ie> writes:
>>> TRAP: FailedAssertion("!(metad->btm_version == 3)", File:
>>> "/home/pg/postgresql/root/build/../source/src/backend/access/nbtree/nbtpage.c",
>>> Line: 619)

>> Hm, buildfarm's not complaining --- what's the test case?

> This was discovered while testing/reviewing the latest version of the
> INCLUDE covering indexes patch. It now seems to be unrelated.

Oh, wait ... I wonder if you saw that because you were running a new
backend without having re-initdb'd?  Once you had re-initdb'd, then
of course there would be no old-format btree indexes anywhere.  But
if you hadn't, then anyplace that was not prepared to cope with the
old header format would complain about pre-existing indexes.

In short, this sounds like a place that did not get the memo about
how to cope with un-upgraded indexes.

That's an issue, because meta-page should be upgraded "on the fly".
That was tested, but perhaps without assertions.  I'll investigate more on
this an propose a fix.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Alexander Korotkov
Date:
On Thu, Apr 5, 2018 at 2:26 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Thu, Apr 5, 2018 at 6:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Geoghegan <pg@bowt.ie> writes:
>>> TRAP: FailedAssertion("!(metad->btm_version == 3)", File:
>>> "/home/pg/postgresql/root/build/../source/src/backend/access/nbtree/nbtpage.c",
>>> Line: 619)

>> Hm, buildfarm's not complaining --- what's the test case?

> This was discovered while testing/reviewing the latest version of the
> INCLUDE covering indexes patch. It now seems to be unrelated.

Oh, wait ... I wonder if you saw that because you were running a new
backend without having re-initdb'd?  Once you had re-initdb'd, then
of course there would be no old-format btree indexes anywhere.  But
if you hadn't, then anyplace that was not prepared to cope with the
old header format would complain about pre-existing indexes.

In short, this sounds like a place that did not get the memo about
how to cope with un-upgraded indexes.

That's an issue, because meta-page should be upgraded "on the fly".
That was tested, but perhaps without assertions.  I'll investigate more on
this an propose a fix.

So, "Assert(metad->btm_version == BTREE_VERSION)" was failed, because
metadata was copied from meta page to rel->rd_amcache "as is" with
metad->btm_version == BTREE_MIN_VERSION.

Without assert everything works fine, because extended metadata fields are
never used from rel->rd_amcache.  So my first intention was to relax this
assert to Assert(metad->btm_version >= BTREE_MIN_VERSION && metad->btm_version <= BTREE_VERSION).
But then I still have concern that we copy metadata beyond pd_lower
when metapage is in old format.

memcpy(rel->rd_amcache, metad, sizeof(BTMetaPageData));

This is why I decided to write separate function to handle caching of metadata,
which takes care about filling unavailable fields of metadata with default values.
I also made same fix for pageinspect.  Patch is attached.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Attachment

Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Alexander Korotkov
Date:
On Thu, Apr 5, 2018 at 3:24 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Thu, Apr 5, 2018 at 2:26 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Thu, Apr 5, 2018 at 6:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Geoghegan <pg@bowt.ie> writes:
>>> TRAP: FailedAssertion("!(metad->btm_version == 3)", File:
>>> "/home/pg/postgresql/root/build/../source/src/backend/access/nbtree/nbtpage.c",
>>> Line: 619)

>> Hm, buildfarm's not complaining --- what's the test case?

> This was discovered while testing/reviewing the latest version of the
> INCLUDE covering indexes patch. It now seems to be unrelated.

Oh, wait ... I wonder if you saw that because you were running a new
backend without having re-initdb'd?  Once you had re-initdb'd, then
of course there would be no old-format btree indexes anywhere.  But
if you hadn't, then anyplace that was not prepared to cope with the
old header format would complain about pre-existing indexes.

In short, this sounds like a place that did not get the memo about
how to cope with un-upgraded indexes.

That's an issue, because meta-page should be upgraded "on the fly".
That was tested, but perhaps without assertions.  I'll investigate more on
this an propose a fix.

So, "Assert(metad->btm_version == BTREE_VERSION)" was failed, because
metadata was copied from meta page to rel->rd_amcache "as is" with
metad->btm_version == BTREE_MIN_VERSION.

Without assert everything works fine, because extended metadata fields are
never used from rel->rd_amcache.  So my first intention was to relax this
assert to Assert(metad->btm_version >= BTREE_MIN_VERSION && metad->btm_version <= BTREE_VERSION).
But then I still have concern that we copy metadata beyond pd_lower
when metapage is in old format.

memcpy(rel->rd_amcache, metad, sizeof(BTMetaPageData));

This is why I decided to write separate function to handle caching of metadata,
which takes care about filling unavailable fields of metadata with default values.
I also made same fix for pageinspect.  Patch is attached.

I forgot to note, that I've tested this patch in following way.  I did initdb and
created some tables and indexes on eac93e20af.  And then used the
same cluster on patched master including index scans, some inserts/updates/deletes,
bt_metap(), vacuum and so on.  Everything worked fine.
Sorry, that I didn't test that enough before.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Fujii Masao
Date:
On Thu, Apr 5, 2018 at 1:29 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
> Skip full index scan during cleanup of B-tree indexes when possible

This commit added XLOG_BTREE_META_CLEANUP, so btree_desc() and
btree_identify() should be updated so that they handle XLOG_BTREE_META_CLEANUP.
But ISTM that you forgot doing that.

Regards,

-- 
Fujii Masao


Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Peter Geoghegan
Date:
On Wed, Apr 18, 2018 at 11:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Apr 5, 2018 at 1:29 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
>> Skip full index scan during cleanup of B-tree indexes when possible
>
> This commit added XLOG_BTREE_META_CLEANUP, so btree_desc() and
> btree_identify() should be updated so that they handle XLOG_BTREE_META_CLEANUP.
> But ISTM that you forgot doing that.

Another thing that I noticed is that the metapage stores
btm_last_cleanup_num_heap_tuples as a float4, even though
xl_btree_metadata stores it as a double. I suggest that both places
store it as float8, to be consistent. (It should not be double because
we always avoid using anything other types with explicit typedef'd
widths in WAL records.)

-- 
Peter Geoghegan


Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Alexander Korotkov
Date:
On Wed, Apr 18, 2018 at 9:21 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Apr 18, 2018 at 11:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Apr 5, 2018 at 1:29 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
>> Skip full index scan during cleanup of B-tree indexes when possible
>
> This commit added XLOG_BTREE_META_CLEANUP, so btree_desc() and
> btree_identify() should be updated so that they handle XLOG_BTREE_META_CLEANUP.
> But ISTM that you forgot doing that.
 
Thank you for notice.  See attached bt-vacuum-cleanup-wal-record-desc-identify.patch fixing that.

Another thing that I noticed is that the metapage stores
btm_last_cleanup_num_heap_tuples as a float4, even though
xl_btree_metadata stores it as a double. I suggest that both places
store it as float8, to be consistent. (It should not be double because
we always avoid using anything other types with explicit typedef'd
widths in WAL records.)

Good catch, thank you!  I also agree that both these fields should be of float8 type.
Please, find attached bt-vacuum-cleanup-float8-num-heap-tuples.patch fixing that.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Attachment

Re: pgsql: Skip full index scan during cleanup of B-tree indexes whenpossi

From
Teodor Sigaev
Date:
>     Another thing that I noticed is that the metapage stores
>     btm_last_cleanup_num_heap_tuples as a float4, even though
>     xl_btree_metadata stores it as a double. I suggest that both places
>     store it as float8, to be consistent. (It should not be double because
>     we always avoid using anything other types with explicit typedef'd
>     widths in WAL records.)
> 
> 
> Good catch, thank you!  I also agree that both these fields should be of 
> float8 type.
> Please, find attached bt-vacuum-cleanup-float8-num-heap-tuples.patch 
> fixing that.

Hm. patch changes on-disk representation of btree meta page. And there 
is no support to upgrade meta page since 857f9c36 commit (and should 
not, because that versions wasn't relesead), to prevent possible false 
positive bug reports I suggest to bump catversion. Objections?


-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                       WWW: http://www.sigaev.ru/


Re: pgsql: Skip full index scan during cleanup of B-tree indexes when possi

From
Alexander Korotkov
Date:
On Thu, Apr 19, 2018 at 9:19 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
    Another thing that I noticed is that the metapage stores
    btm_last_cleanup_num_heap_tuples as a float4, even though
    xl_btree_metadata stores it as a double. I suggest that both places
    store it as float8, to be consistent. (It should not be double because
    we always avoid using anything other types with explicit typedef'd
    widths in WAL records.)


Good catch, thank you!  I also agree that both these fields should be of float8 type.
Please, find attached bt-vacuum-cleanup-float8-num-heap-tuples.patch fixing that.

Hm. patch changes on-disk representation of btree meta page. And there is no support to upgrade meta page since 857f9c36 commit (and should not, because that versions wasn't relesead), to prevent possible false positive bug reports I suggest to bump catversion. Objections?

I agree, catversion should be bumped for this patch.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: pgsql: Skip full index scan during cleanup of B-tree indexes whenpossi

From
Teodor Sigaev
Date:
Thank you, both patches are pushed.

Alexander Korotkov wrote:
> On Wed, Apr 18, 2018 at 9:21 PM, Peter Geoghegan <pg@bowt.ie 
> <mailto:pg@bowt.ie>> wrote:
> 
>     On Wed, Apr 18, 2018 at 11:12 AM, Fujii Masao <masao.fujii@gmail.com
>     <mailto:masao.fujii@gmail.com>> wrote:
>     > On Thu, Apr 5, 2018 at 1:29 AM, Teodor Sigaev <teodor@sigaev.ru <mailto:teodor@sigaev.ru>> wrote:
>     >> Skip full index scan during cleanup of B-tree indexes when possible
>     >
>     > This commit added XLOG_BTREE_META_CLEANUP, so btree_desc() and
>     > btree_identify() should be updated so that they handle XLOG_BTREE_META_CLEANUP.
>     > But ISTM that you forgot doing that.
> 
> Thank you for notice.  See attached 
> bt-vacuum-cleanup-wal-record-desc-identify.patch fixing that.
> 
>     Another thing that I noticed is that the metapage stores
>     btm_last_cleanup_num_heap_tuples as a float4, even though
>     xl_btree_metadata stores it as a double. I suggest that both places
>     store it as float8, to be consistent. (It should not be double because
>     we always avoid using anything other types with explicit typedef'd
>     widths in WAL records.)
> 
> 
> Good catch, thank you!  I also agree that both these fields should be of float8 
> type.
> Please, find attached bt-vacuum-cleanup-float8-num-heap-tuples.patch fixing that.
> 
> ------
> Alexander Korotkov
> Postgres Professional:http://www.postgrespro.com <http://www.postgrespro.com/>
> The Russian Postgres Company
> 

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/