(Re)building index using itself or another index of the same table - Mailing list pgsql-hackers

From Arseny Sher
Subject (Re)building index using itself or another index of the same table
Date
Msg-id 87d0g5d0tm.fsf@ars-thinkpad
Whole thread Raw
Responses Re: (Re)building index using itself or another index of the same table
List pgsql-hackers
Hi,

Our customer encountered a curious scenario. They have a table with GIN
index on expression, which performs multiple joins with this table
itself. These joins employ another btree index for efficiency.
VACUUM FULL on this table fails with error like

ERROR:  could not read block 3534 in file "base/41366676/56697497": read only 0 of 8192 bytes

It happens because order in which indexes are rebuilt is not specified.
GIN index is being rebuilt when btree index is not reconstructed yet;
an attempt to use old index with rewritten heap crashes.

A problem of similar nature can be reproduced with the following
stripped-down scenario:

CREATE TABLE pears(f1 int primary key, f2 int);
INSERT INTO pears SELECT i, i+1 FROM generate_series(1, 100) i;
CREATE OR REPLACE FUNCTION pears_f(i int) RETURNS int LANGUAGE SQL IMMUTABLE AS $$
  SELECT f1 FROM pears WHERE pears.f2 = 42
$$;
CREATE index ON pears ((pears_f(f1)));

Here usage of not-yet-created index on pears_f(f1) for its own
construction is pointless, however planner in principle considers it in
get_relation_info, tries to get btree height (_bt_getrootheight) -- and
fails.


There is already a mechanism which prevents usage of indexes during
reindex -- ReindexIsProcessingIndex et al. However, to the contrary of
what index.c:3664 comment say, these protect only indexes on system
catalogs, not user tables: the only real caller is genam.c.

Attached patch extends it: the same check is added to
get_relation_info. Also SetReindexProcessing is cocked in index_create
to defend from index self usage during creation as in stripped example
above. There are some other still unprotected callers of index_build;
concurrent index creation doesn't need it because index is
'not indisvalid' during the build, and in RelationTruncateIndexes
table is empty, so it looks like it can be omitted.


One might argue that function selecting from table can hardly be called
immutable, and immutability is required for index expressions. However,
if user is sure table contents doesn't change, why not? Also, the
possiblity of triggering "could not read block" error with plain SQL is
definitely not nice.

From 5942a3a5b2c90056119b9873c81f30dfa9e003af Mon Sep 17 00:00:00 2001
From: Arseny Sher <sher-ars@yandex.ru>
Date: Thu, 12 Sep 2019 17:35:16 +0300
Subject: [PATCH] Avoid touching user indexes while they are being (re)built.

Existing ReindexIsProcessingIndex check is consulted only in genam.c and thus
enforced only for system catalogs. Check it also in the planner, so that indexes
which are currently being rebuilt are never used. Also cock SetReindexProcessing
in index_create to defend from index self usage during its creation.

Without this, VACUUM FULL or just CREATE INDEX might fail with something like

ERROR:  could not read block 3534 in file "base/41366676/56697497": read only 0 of 8192 bytes

if there are indexes which usage can be considered during these very
indexes (re)building, i.e. index expression scans indexed table.
---
 src/backend/catalog/index.c                | 22 ++++++++++++++++++++--
 src/backend/optimizer/util/plancat.c       |  5 +++++
 src/test/regress/expected/create_index.out | 12 ++++++++++++
 src/test/regress/sql/create_index.sql      | 13 +++++++++++++
 4 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3e1d40662d..5bc764ce46 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1174,7 +1174,22 @@ index_create(Relation heapRelation,
     }
     else
     {
-        index_build(heapRelation, indexRelation, indexInfo, false, true);
+        /* ensure SetReindexProcessing state isn't leaked */
+        PG_TRY();
+        {
+            /* Suppress use of the target index while building it */
+            SetReindexProcessing(heapRelationId, indexRelationId);
+
+            index_build(heapRelation, indexRelation, indexInfo, false, true);
+        }
+        PG_CATCH();
+        {
+            /* Make sure flag gets cleared on error exit */
+            ResetReindexProcessing();
+            PG_RE_THROW();
+        }
+        PG_END_TRY();
+        ResetReindexProcessing();
     }
 
     /*
@@ -1379,7 +1394,10 @@ index_concurrently_build(Oid heapRelationId,
     indexInfo->ii_Concurrent = true;
     indexInfo->ii_BrokenHotChain = false;
 
-    /* Now build the index */
+    /*
+     * Now build the index
+     * SetReindexProcessing is not required since indisvalid is false anyway
+     */
     index_build(heapRel, indexRelation, indexInfo, false, true);
 
     /* Close both the relations, but keep the locks */
diff --git a/src/backend/optimizer/util/plancat.c b/src/backend/optimizer/util/plancat.c
index cf1761401d..9d58cd2574 100644
--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -27,6 +27,7 @@
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/dependency.h"
+#include "catalog/index.h"
 #include "catalog/heap.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_proc.h"
@@ -193,6 +194,10 @@ get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
                         nkeycolumns;
             int            i;
 
+            /* Don't chase own tail */
+            if (ReindexIsProcessingIndex(indexoid))
+                continue;
+
             /*
              * Extract info from the relation descriptor for the index.
              */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 324db1b6ae..1706964277 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -1323,6 +1323,18 @@ create unique index hash_f8_index_1 on hash_f8_heap(abs(random));
 create unique index hash_f8_index_2 on hash_f8_heap((seqno + 1), random);
 create unique index hash_f8_index_3 on hash_f8_heap(random) where seqno > 1000;
 --
+-- Create an index which might consider using this very index during the build.
+--
+-- primary key ensures relhasindex is set
+CREATE TABLE pears(f1 int primary key, f2 int);
+INSERT INTO pears SELECT i, i+1 FROM generate_series(1, 100) i;
+CREATE FUNCTION pears_f(i int) RETURNS int LANGUAGE SQL IMMUTABLE AS $$
+  SELECT f1 FROM pears WHERE pears.f2 = 42
+$$;
+CREATE index ON pears ((pears_f(f1)));
+DROP TABLE pears;
+DROP FUNCTION pears_f;
+--
 -- Try some concurrent index builds
 --
 -- Unfortunately this only tests about half the code paths because there are
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index f96bebf410..76a781f6b0 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -446,6 +446,19 @@ create unique index hash_f8_index_2 on hash_f8_heap((seqno + 1), random);
 create unique index hash_f8_index_3 on hash_f8_heap(random) where seqno > 1000;
 
 --
+-- Create an index which might consider using this very index during the build.
+--
+-- primary key ensures relhasindex is set
+CREATE TABLE pears(f1 int primary key, f2 int);
+INSERT INTO pears SELECT i, i+1 FROM generate_series(1, 100) i;
+CREATE FUNCTION pears_f(i int) RETURNS int LANGUAGE SQL IMMUTABLE AS $$
+  SELECT f1 FROM pears WHERE pears.f2 = 42
+$$;
+CREATE index ON pears ((pears_f(f1)));
+DROP TABLE pears;
+DROP FUNCTION pears_f;
+
+--
 -- Try some concurrent index builds
 --
 -- Unfortunately this only tests about half the code paths because there are
-- 
2.11.0



--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Runtime pruning problem
Next
From: Tom Lane
Date:
Subject: Re: (Re)building index using itself or another index of the same table