Thread: Ideas for a relcache test mode about missing invalidations

Ideas for a relcache test mode about missing invalidations

From

Andres Freund

Date:

01 August 2018, 19:25:18

Hi,

The issue at [1] is caused by missing invalidations, and [2] seems like
a likely candidate too. I wonder if it'd be good to have a relcache test
mode akin to CLOBBER_CACHE_ALWAYS and RELCACHE_FORCE_RELEASE, that tries
to ensure that we've done sufficiently to ensure the right invalidations
are sent.

I think what we'd kind of want is to ensure that relcache entries are
rebuilt at the earliest possible time, but *not* later. That'd mean
they're out of date if there's missing invalidations. Unfortunately I'm
not clear on how that'd be achievable?  Ideas?

The best I can come up with is to code some additional dependencies into
CacheInvalidateHeapTuple(), and add tracking ensuring we've sent the
right messages. But that seems somewhat painful and filled with holes.

[1] http://archives.postgresql.org/message-id/CAKoxK%2B5fVodiCtMsXKV_1YAKXbzwSfp7DgDqUmcUAzeAhf%3DHEQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/12259.1532117714@sss.pgh.pa.us

Greetings,

Andres Freund

Re: Ideas for a relcache test mode about missing invalidations

From

Kyotaro HORIGUCHI

Date:

02 August 2018, 13:18:11

Hello.

At Wed, 1 Aug 2018 09:25:18 -0700, Andres Freund <andres@anarazel.de> wrote in
<20180801162518.jnb2ql5dfmgwp4qo@alap3.anarazel.de>
> Hi,
> 
> The issue at [1] is caused by missing invalidations, and [2] seems like
> a likely candidate too. I wonder if it'd be good to have a relcache test
> mode akin to CLOBBER_CACHE_ALWAYS and RELCACHE_FORCE_RELEASE, that tries
> to ensure that we've done sufficiently to ensure the right invalidations
> are sent.
> 
> I think what we'd kind of want is to ensure that relcache entries are
> rebuilt at the earliest possible time, but *not* later. That'd mean
> they're out of date if there's missing invalidations. Unfortunately I'm
> not clear on how that'd be achievable?  Ideas?
> 
> The best I can come up with is to code some additional dependencies into
> CacheInvalidateHeapTuple(), and add tracking ensuring we've sent the
> right messages. But that seems somewhat painful and filled with holes.
> 
> [1]
http://archives.postgresql.org/message-id/CAKoxK%2B5fVodiCtMsXKV_1YAKXbzwSfp7DgDqUmcUAzeAhf%3DHEQ%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/12259.1532117714@sss.pgh.pa.us

As for [1], it is not a issue on invalidation. It happens also if
the relation has any index and even drop is not needed. The
following steps are sufficient.

create table t( pk serial, t text );
insert into t( t ) values( 'hello' ), ('world');
create index idx_fake0 on t (pk);
create index idx_fake on t ( f_fake( pk ) ); -- ERROR

index_create() creates a new pg_index entry with indislive = true
before building it. So the planner (RelationGetIndexList) sees
the not-yet-populated index while planning the query in f_fake().

The attached patch fixes the issue, but I'm not sure this story
is applicable to [2].

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From eaa5de68ff27bd43a643089f8c5963e5cc3d20cc Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 2 Aug 2018 18:52:24 +0900
Subject: [PATCH] Don't use currently-building index to populate it

index_create() creates a pg_index entry with indislive = true before
populating it. If it is a function index where the function runs a
query on the parent relation, planner sees the just created entry and
tries to access the heap page that is not created yet.  This patch let
index_create() not to set indislive = true after population.
---
 src/backend/catalog/index.c | 55 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8b276bc430..b561c8696d 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -124,6 +124,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
                     bool immediate,
                     bool isvalid,
                     bool isready);
+static void ActivateIndexRelation(Oid indexoid);
 static void index_update_stats(Relation rel,
                    bool hasindex,
                    double reltuples);
@@ -666,7 +667,7 @@ UpdateIndexRelation(Oid indexoid,
     values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid);
     values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false);
     values[Anum_pg_index_indisready - 1] = BoolGetDatum(isready);
-    values[Anum_pg_index_indislive - 1] = BoolGetDatum(true);
+    values[Anum_pg_index_indislive - 1] = BoolGetDatum(false);
     values[Anum_pg_index_indisreplident - 1] = BoolGetDatum(false);
     values[Anum_pg_index_indkey - 1] = PointerGetDatum(indkey);
     values[Anum_pg_index_indcollation - 1] = PointerGetDatum(indcollation);
@@ -693,6 +694,55 @@ UpdateIndexRelation(Oid indexoid,
     heap_freetuple(tuple);
 }
 
+/* ----------------------------------------------------------------
+ *        ActivateIndexRelation
+ *
+ * Publish index by marking it "relislive"
+
+ *  UpdateIndexRelation builds an index relation with relislive = false so as
+ *  not to be used by the quieries used to build the index. This function
+ *  marks the index as "live" so that it can be used hereafter.
+ *  ----------------------------------------------------------------
+ */
+static void
+ActivateIndexRelation(Oid indexoid)
+{
+    Relation    indrel;
+    SysScanDesc    sdesc;
+    ScanKeyData    skey;
+    HeapTuple    tup;
+
+    ScanKeyInit(&skey,
+                Anum_pg_index_indexrelid,
+                BTEqualStrategyNumber, F_OIDEQ,
+                ObjectIdGetDatum(indexoid));
+    indrel = heap_open(IndexRelationId, RowExclusiveLock);
+    sdesc = systable_beginscan(indrel, InvalidOid, true, NULL, 1, &skey);
+
+    /*
+     * We must see one and only one entry for the key. But don't bother
+     * checking that.
+     */
+    while (HeapTupleIsValid(tup = systable_getnext(sdesc)))
+    {
+        Datum values[Natts_pg_index];
+        bool nulls[Natts_pg_index];
+        bool replaces[Natts_pg_index];
+
+        MemSet(values, 0, sizeof(values));
+        MemSet(nulls, 0, sizeof(nulls));
+        MemSet(replaces, 0, sizeof(replaces));
+        values[Anum_pg_index_indislive] = BoolGetDatum(true);
+        replaces[Anum_pg_index_indislive] = true;
+        tup = heap_modify_tuple(tup, RelationGetDescr(indrel),
+                                values, nulls, replaces);
+        CatalogTupleUpdate(indrel, &tup->t_self, tup);
+        heap_freetuple(tup);
+    }
+    systable_endscan(sdesc);
+    heap_close(indrel, RowExclusiveLock);
+}
+
 
 /*
  * index_create
@@ -1188,6 +1238,9 @@ index_create(Relation heapRelation,
                     true);
     }
 
+    /* let queries use this index */
+    ActivateIndexRelation(indexRelationId);
+
     /*
      * Close the index; but we keep the lock that we acquired above until end
      * of transaction.  Closing the heap is caller's responsibility.
-- 
2.16.3

Re: Ideas for a relcache test mode about missing invalidations

From

Andres Freund

Date:

02 August 2018, 19:03:09

Hi,

On 2018-08-02 19:18:11 +0900, Kyotaro HORIGUCHI wrote:
> At Wed, 1 Aug 2018 09:25:18 -0700, Andres Freund <andres@anarazel.de> wrote in
<20180801162518.jnb2ql5dfmgwp4qo@alap3.anarazel.de>
> > Hi,
> > 
> > The issue at [1] is caused by missing invalidations, and [2] seems like
> > a likely candidate too. I wonder if it'd be good to have a relcache test
> > mode akin to CLOBBER_CACHE_ALWAYS and RELCACHE_FORCE_RELEASE, that tries
> > to ensure that we've done sufficiently to ensure the right invalidations
> > are sent.
> > 
> > I think what we'd kind of want is to ensure that relcache entries are
> > rebuilt at the earliest possible time, but *not* later. That'd mean
> > they're out of date if there's missing invalidations. Unfortunately I'm
> > not clear on how that'd be achievable?  Ideas?
> > 
> > The best I can come up with is to code some additional dependencies into
> > CacheInvalidateHeapTuple(), and add tracking ensuring we've sent the
> > right messages. But that seems somewhat painful and filled with holes.
> > 
> > [1]
http://archives.postgresql.org/message-id/CAKoxK%2B5fVodiCtMsXKV_1YAKXbzwSfp7DgDqUmcUAzeAhf%3DHEQ%40mail.gmail.com
> > [2] https://www.postgresql.org/message-id/12259.1532117714@sss.pgh.pa.us
> 
> As for [1], it is not a issue on invalidation. It happens also if
> the relation has any index and even drop is not needed. The
> following steps are sufficient.

Huh? I don't think this is a proper fix. But please let's argue over in
the other that in the other thread.


Greetings,

Andres Freund

Re: Ideas for a relcache test mode about missing invalidations

From

Peter Geoghegan

Date:

03 August 2018, 00:40:50

On Thu, Aug 2, 2018 at 3:18 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> [1]
http://archives.postgresql.org/message-id/CAKoxK%2B5fVodiCtMsXKV_1YAKXbzwSfp7DgDqUmcUAzeAhf%3DHEQ%40mail.gmail.com
>> [2] https://www.postgresql.org/message-id/12259.1532117714@sss.pgh.pa.us
>
> As for [1], it is not a issue on invalidation. It happens also if
> the relation has any index and even drop is not needed. The
> following steps are sufficient.
>
> create table t( pk serial, t text );
> insert into t( t ) values( 'hello' ), ('world');
> create index idx_fake0 on t (pk);
> create index idx_fake on t ( f_fake( pk ) ); -- ERROR

The fact that there was a weird error wasn't really what we cared
about there. If the user is doing something that's clearly
unreasonable or nonsensical, then they cannot expect much from the
error message (maybe we could do better, but it's not a priority).
What we really cared about was the fact that it was possible to make a
backend's relcache irrecoverably corrupt. That should never be allowed
to happen, even when the user is determined to do something
unreasonable.

-- 
Peter Geoghegan

Re: Ideas for a relcache test mode about missing invalidations

From

Kyotaro HORIGUCHI

Date:

03 August 2018, 04:57:02

At Thu, 2 Aug 2018 14:40:50 -0700, Peter Geoghegan <pg@bowt.ie> wrote in
<CAH2-WznJ8K1HXLTKOY53QBw2dtY0z53xKB-0RpqsJQBS0TkrdQ@mail.gmail.com>
pg> On Thu, Aug 2, 2018 at 3:18 AM, Kyotaro HORIGUCHI
pg> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
pg> >> [1]
http://archives.postgresql.org/message-id/CAKoxK%2B5fVodiCtMsXKV_1YAKXbzwSfp7DgDqUmcUAzeAhf%3DHEQ%40mail.gmail.com
pg> >> [2] https://www.postgresql.org/message-id/12259.1532117714@sss.pgh.pa.us
pg> >
pg> > As for [1], it is not a issue on invalidation. It happens also if
pg> > the relation has any index and even drop is not needed. The
pg> > following steps are sufficient.
pg> >
pg> > create table t( pk serial, t text );
pg> > insert into t( t ) values( 'hello' ), ('world');
pg> > create index idx_fake0 on t (pk);
pg> > create index idx_fake on t ( f_fake( pk ) ); -- ERROR
pg> 
pg> The fact that there was a weird error wasn't really what we cared
pg> about there. If the user is doing something that's clearly
pg> unreasonable or nonsensical, then they cannot expect much from the
pg> error message (maybe we could do better, but it's not a priority).

Hmm. I don't think it's unreasonable or nonsensical, but I don't
argue it here.

pg> What we really cared about was the fact that it was possible to make a
pg> backend's relcache irrecoverably corrupt. That should never be allowed
pg> to happen, even when the user is determined to do something
pg> unreasonable.

I reread through the thread and IUCC, drop_index() is sending
inval on the owing relation and invalidation happens (that is,
RelationCacheInvalidateEntry is called for the owner
relation.). Relcache for the owner is actually being rebuit
during the following create index. At least the problem mentioned
in [1] is happening using a fresh relcache created after invoking
the index creation. The cause is RelationGetIndexList collects
all "indislive" indexes, including the idx_fake before population,
which can be fixed by the attached patch..

I'm I still misunderstanding something?

# Anyway, I'll make another thread for the another issue.

[1]: https://www.postgresql.org/message-id/20180628150209.n2qch5jtn3vt2xaa%40alap3.anarazel.de

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Ideas for a relcache test mode about missing invalidations

From

Kyotaro HORIGUCHI

Date:

03 August 2018, 10:34:43

Hello.

At Fri, 03 Aug 2018 10:54:46 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20180803.105446.155966888.horiguchi.kyotaro@lab.ntt.co.jp>
> I reread through the thread and IUCC, drop_index() is sending
> inval on the owing relation and invalidation happens (that is,

I finally understand that I was totally inept. This came from
*the result* of the original -bug thread.

Sorry for the noise.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Ideas for a relcache test mode about missing invalidations

From

Kyotaro HORIGUCHI

Date:

03 August 2018, 11:30:38

At Wed, 1 Aug 2018 09:25:18 -0700, Andres Freund <andres@anarazel.de> wrote in
<20180801162518.jnb2ql5dfmgwp4qo@alap3.anarazel.de>
> Hi,
> 
> The issue at [1] is caused by missing invalidations, and [2] seems like
> a likely candidate too. I wonder if it'd be good to have a relcache test
> mode akin to CLOBBER_CACHE_ALWAYS and RELCACHE_FORCE_RELEASE, that tries
> to ensure that we've done sufficiently to ensure the right invalidations
> are sent.
> 
> I think what we'd kind of want is to ensure that relcache entries are
> rebuilt at the earliest possible time, but *not* later. That'd mean
> they're out of date if there's missing invalidations. Unfortunately I'm
> not clear on how that'd be achievable?  Ideas?
> 
> The best I can come up with is to code some additional dependencies into
> CacheInvalidateHeapTuple(), and add tracking ensuring we've sent the
> right messages. But that seems somewhat painful and filled with holes.
> 
> [1]
http://archives.postgresql.org/message-id/CAKoxK%2B5fVodiCtMsXKV_1YAKXbzwSfp7DgDqUmcUAzeAhf%3DHEQ%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/12259.1532117714@sss.pgh.pa.us

FWIW, I revisit here.

Maybe stupid, but does it make sense that we always build a new
relcache entry in RelationIdGetRelation then "logically" compare
it with the entry found in relcache? I'm not sure how referece
count affects this, though..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Ideas for a relcache test mode about missing invalidations

From

Peter Geoghegan

Date:

04 August 2018, 01:42:22

On Fri, Aug 3, 2018 at 12:34 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> I reread through the thread and IUCC, drop_index() is sending
>> inval on the owing relation and invalidation happens (that is,
>
> I finally understand that I was totally inept. This came from
> *the result* of the original -bug thread.

I certainly would not say that you were in any way inept. Perhaps
there is an argument for *also* doing what you propose. I am not
dismissive of your idea.

It seems like a question that should be considered separately, on
another thread. If you still want to pursue it.

-- 
Peter Geoghegan

Re: Ideas for a relcache test mode about missing invalidations

From

Kyotaro HORIGUCHI

Date:

09 August 2018, 07:11:31

Hello.

At Fri, 3 Aug 2018 15:42:22 -0700, Peter Geoghegan <pg@bowt.ie> wrote in
<CAH2-WzmXcHXa0MKx5a9NiaaCOE4E4T_rnaHa-N4gN-VoWUT8aw@mail.gmail.com>
> On Fri, Aug 3, 2018 at 12:34 AM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >> I reread through the thread and IUCC, drop_index() is sending
> >> inval on the owing relation and invalidation happens (that is,
> >
> > I finally understand that I was totally inept. This came from
> > *the result* of the original -bug thread.
> 
> I certainly would not say that you were in any way inept. Perhaps
> there is an argument for *also* doing what you propose. I am not
> dismissive of your idea.

I jumped to the URL to see the thread but it was not
up-to-date. It's the cause of my confusion that I thought the
problem was the error seen *after* the invalidation fix patch in
my repo.

> It seems like a question that should be considered separately, on
> another thread. If you still want to pursue it.

Thanks. I might bring this again later.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center