Re: Index corruption / planner issue with one table in my pg 11.6 instance - Mailing list pgsql-hackers

From Jeremy Finzel
Subject Re: Index corruption / planner issue with one table in my pg 11.6 instance
Date
Msg-id CAMa1XUjHNMMtMF=ZtvSb4uF1y=SXu+V9AmEWT+FtFfFs3SLrTg@mail.gmail.com
Whole thread Raw
In response to Re: Index corruption / planner issue with one table in my pg 11.6 instance  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Index corruption / planner issue with one table in my pg 11.6 instance  (Jeremy Finzel <finzelj@gmail.com>)
List pgsql-hackers
On Tue, Dec 10, 2019 at 12:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Yeah.  The reported behavior can mostly be explained if we assume
that there's some HOT chain in the table that involves an update
of this particular column, so that if we build an index on that
column we see a broken HOT chain, but building an index on some
other column doesn't have a problem.

The problem exists so far as I can tell on indexing *any column* of *this particular table*.  I tried same experiment on another table in the same replication stream, and I cannot reproduce it.

I am building the index **non-concurrently** every time.
 
The thing this doesn't easily explain is that the behavior persists
across repeated index rebuilds.  A broken HOT chain is only broken
as long as the older entry is still visible-to-somebody, so that
such situations ought to be self-healing as time passes.  If it
fails repeatedly, this theory requires assuming that either

1. You've got some extremely old open transactions (maybe forgotten
prepared transactions?), or

No prepared_xacts and no transactions older than a few hours.  Several hour transactions are common in this reporting system.  I have not yet seen if after several hours the index starts showing up in plans.
 
2. Your workload is constantly generating new broken HOT chains of
the same sort, so that there's usually a live one when you try
to build an index.

The fact that you even notice the indcheckxmin restriction indicates
that you do tend to have long-running transactions in the system,
else the index would come free for use fairly quickly.  So #1 isn't
as implausible as I might otherwise think.  But #2 seems probably
more likely on the whole.  OTOH, neither point is exactly within
the offered evidence.

Is there a way for me to test this theory?  I tried the following with no change in behavior:
  1. Disable write load to table
  2. Vacuum analyze table (not vac full)
  3. Create index
  4. Explain
Still did not pick up the index.

Thanks,
Jeremy

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: proposal: minscale, rtrim, btrim functions for numeric
Next
From: Asif Rehman
Date:
Subject: Re: WIP/PoC for parallel backup