Thread: Problem Observed in behavior of Create Index Concurrently and Hot Update

Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Amit Kapila

Date:

31 October 2012, 09:13:07

<div class="WordSection1"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">There
seemsto be a problem in behavior of Create Index Concurrently and Hot Update in HEAD code . </span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Pleasesee the below testcase</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Step-1</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">-----------</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Client-1</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Createtable t1(c1 int, c2 int, c3 int);</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">insertinto t1 values(1,2,3);</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Step-2</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">-----------</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Client- 2 </span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">updatet1 set c2=4; where c1 = 1; -- This will be Hot
update</span><br/><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Select * from t1; </span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif""> c1| c2 | c3</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">----+----+----</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif""> 1 |  4 |  3</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">(1row)</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Noproblem till here.</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Step-3</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">-----------</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Client-1</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif""> createindex concurrently idx_conc_t1 on t1(c2);  -- Run this
commandin debug mode (by having breakpoint in DefineIndex)</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Stopbefore the  CommitTransactionCommand() of phase-2 where
index_buildis done and indisready flag is set to TRUE.</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Aswe have stopped before commit, still indexisready will not
bevisible to other session/transaction.</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Step-4</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">-----------</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Client-2</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">updatet1 set c2=5 where c1=1;  -- Update is success, but this
isa HOT update</span><br /><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">According to me, here is the
problem,it shouldn't have done HOT update.</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Step-5</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">-----------</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Client-1</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Resumedebugging, and complete the command. I have observed in
validate_index(),it doesn't create index entry for c2=5.</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Step-6</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">-----------</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Client-2</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif""> select* from t1 where c2=5;</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif""> c1| c2 | c3</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">----+----+----</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif""> 1 |  5 |  3</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">(1row)</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">postgres=#set enable_seqscan=off;          -- This is to
ensureindex scan should happen</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">SET</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">postgres=#select * from t1 where c2=5;      -- Problem, it
shouldhave shown the Row.</span><br /><span style="font-size:10.0pt;font-family:"Arial","sans-serif""> c1 | c2 |
c3</span><br/><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">----+----+----</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">(0rows)</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">postgres=#select * from t1 where c2=4;    -- Problem, query
isdone for C2=4 and the result shows  C2=5.</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif""> c1| c2 | c3</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">----+----+----</span><br/><span
style="font-size:10.0pt;font-family:"Arial","sans-serif""> 1 |  5 |  3</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">(1row)</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Accordingto me, the problem happens at Step-4. As at Step-4,
itdoes the HOT update due to which validate_index() is not able to put an entry for C2=5</span><br /><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Letme know if I have misunderstood something?</span><br /><br
/><br/><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">With Regards,</span><br /><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">AmitKapila.</span></div>

Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Andres Freund

Date:

24 November 2012, 18:50:13

Hi,

On 2012-10-31 11:41:37 +0530, Amit Kapila wrote:
> There seems to be a problem in behavior of Create Index Concurrently and Hot
> Update in HEAD code .

At pgcon.it I had a chance to discuss with Simon how to fix this
bug. Please check the attached patches - and their commit messages - for
the fix and some regression tests.
The fix contains at least one FIXME where I am not sure if we need to do
something additional and the locking behaviour deserves some close
review.

Opinions?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Tom Lane

Date:

27 November 2012, 00:39:50

Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-10-31 11:41:37 +0530, Amit Kapila wrote:
>> There seems to be a problem in behavior of Create Index Concurrently and Hot
>> Update in HEAD code .

> At pgcon.it I had a chance to discuss with Simon how to fix this
> bug. Please check the attached patches - and their commit messages - for
> the fix and some regression tests.

I looked at this a bit.  I am very unhappy with the proposed kluge to
open indexes NoLock in some places.  Even if that's safe today, which
I don't believe in the least, any future change in this area could break
it.

I'm a bit inclined to think that DROP INDEX CONCURRENTLY should have an
additional step that somehow marks the pg_index row in a new way that
makes RelationGetIndexList ignore it, and then wait out existing
transactions before taking the final step of dropping the index.  The
Right Way to do this would likely have been to add another bool column,
but we don't have that option anymore in 9.2.  Maybe we could abuse
indcheckxmin somehow, ie use a state that can't otherwise occur (in
combination with the values of indisvalid/indisready) to denote an index
being dropped.
        regards, tom lane

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Tom Lane

Date:

27 November 2012, 06:31:57

I wrote:
> I'm a bit inclined to think that DROP INDEX CONCURRENTLY should have an
> additional step that somehow marks the pg_index row in a new way that
> makes RelationGetIndexList ignore it, and then wait out existing
> transactions before taking the final step of dropping the index.  The
> Right Way to do this would likely have been to add another bool column,
> but we don't have that option anymore in 9.2.  Maybe we could abuse
> indcheckxmin somehow, ie use a state that can't otherwise occur (in
> combination with the values of indisvalid/indisready) to denote an index
> being dropped.

I looked into this some more.  There are currently three possible states
of the indisvalid/indisready flags:

indisvalid = false, indisready = false
initial state during CREATE INDEX CONCURRENTLY; this shouldresult in sessions honoring the index for HOT-safety
decisions,butnot using it for searches or insertions

indisvalid = false, indisready = true
sessions should now insert into the index, but not use itfor searches

indisvalid = true, indisready = true
normal, fully valid index

Either state of indcheckxmin is valid with all three of these
combinations, so the specific kluge I was contemplating above doesn't
work.  But there is no valid reason for an index to be in this state:

indisvalid = true, indisready = false

I suggest that to fix this for 9.2, we could abuse these flags by
defining that combination as meaning "ignore this index completely",
and having DROP INDEX CONCURRENTLY set this state during its final
wait before removing the index.

Obviously, an additional flag column would be a much cleaner fix,
and I think we should add one in HEAD.  But it's too late to do
that in 9.2.
        regards, tom lane

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Michael Paquier

Date:

27 November 2012, 07:37:08

On Tue, Nov 27, 2012 at 12:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:
> I'm a bit inclined to think that DROP INDEX CONCURRENTLY should have an
> additional step that somehow marks the pg_index row in a new way that
> makes RelationGetIndexList ignore it, and then wait out existing
> transactions before taking the final step of dropping the index. The
> Right Way to do this would likely have been to add another bool column,
> but we don't have that option anymore in 9.2. Maybe we could abuse
> indcheckxmin somehow, ie use a state that can't otherwise occur (in
> combination with the values of indisvalid/indisready) to denote an index
> being dropped.

I looked into this some more. There are currently three possible states
of the indisvalid/indisready flags:

indisvalid = false, indisready = false

initial state during CREATE INDEX CONCURRENTLY; this should
result in sessions honoring the index for HOT-safety decisions,
but not using it for searches or insertions

indisvalid = false, indisready = true

sessions should now insert into the index, but not use it
for searches

indisvalid = true, indisready = true

normal, fully valid index

Either state of indcheckxmin is valid with all three of these
combinations, so the specific kluge I was contemplating above doesn't
work. But there is no valid reason for an index to be in this state:

indisvalid = true, indisready = false

I suggest that to fix this for 9.2, we could abuse these flags by
defining that combination as meaning "ignore this index completely",
and having DROP INDEX CONCURRENTLY set this state during its final
wait before removing the index.

Obviously, an additional flag column would be a much cleaner fix,
and I think we should add one in HEAD. But it's too late to do
that in 9.2.

For 9.2 as you say the best choice is to ignore in RelationGetIndexList the indexes that are valid and not ready when fetching the index list of a relation.

For the fix in master, just thinking but...

Having 3 boolean flags to manage the state of an index might easily lead to the developer to confusion.
I was thinking about the possibility of using instead of that a list of states that are defined with a character.
We already know 3 possible states which are:
- indisvalid = false, indisready = false, let's say INDEX_IS_INITIAL 'i'
- indisvalid = false, indisready = true, with INDEX_IS_READY 'r'
- indisvalid = true, indisready = true, with INDEX_IS_VALID 'v'

In case an index needs to be ignored as you mention with the combination indisvalid = true and indisready = false, it could be possible to add a new state like INDEX_IS_IGNORE 'g'.

This would avoid the addition of a new column in pg_index and control the status of an index easily.
This is not that much backward-compatible though...

--
Michael Paquier
http://michael.otacoo.com

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Pavan Deolasee

Date:

27 November 2012, 07:49:07

On Tue, Nov 27, 2012 at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:

Either state of indcheckxmin is valid with all three of these
combinations, so the specific kluge I was contemplating above doesn't
work. But there is no valid reason for an index to be in this state:

indisvalid = true, indisready = false

I suggest that to fix this for 9.2, we could abuse these flags by
defining that combination as meaning "ignore this index completely",
and having DROP INDEX CONCURRENTLY set this state during its final
wait before removing the index.

Yeah, this looks much better, given our inability to add a new catalog column in 9.2. Can we cheat a little though and use a value other than 0 and 1 for indisvalid or indisready to tell the server to interpret it differently ? I would assume not, but can't see a reason unless these values are converted to other types and back to boolean.

Andres complained about the fact that many callers of RelationGetIndexList are probably not ready to process invalid or not-yet-ready indexes and suggested that those changes should be backpatched to even older releases. But IMO we should do that with a test case that demonstrates that there is indeed a bug. Also, we should teach RelationGetIndexList to take a flags argument and filter out indexes that the caller is not interested instead of every caller doing the checks separately.

Thanks,

Pavan

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Tom Lane

Date:

27 November 2012, 09:09:04

Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> On Tue, Nov 27, 2012 at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Either state of indcheckxmin is valid with all three of these
>> combinations, so the specific kluge I was contemplating above doesn't
>> work.  But there is no valid reason for an index to be in this state:
>> indisvalid = true, indisready = false

> Yeah, this looks much better, given our inability to add a new catalog
> column in 9.2. Can we cheat a little though and use a value other than 0
> and 1 for indisvalid or indisready to tell the server to interpret it
> differently ?

No, not unless you'd like "select * from pg_index" to misbehave.
Personally, I like being able to look at catalogs.

> Andres complained about the fact that many callers of RelationGetIndexList
> are probably not ready to process invalid or not-yet-ready indexes and
> suggested that those changes should be backpatched to even older releases.
> But IMO we should do that with a test case that demonstrates that there is
> indeed a bug. Also, we should teach RelationGetIndexList to take a flags
> argument and filter out indexes that the caller is not interested instead
> of every caller doing the checks separately.

We can't really change the API of RelationGetIndexList in the back
branches, as it's just about certain there is third-party code calling
it.  I'm dubious about the advantages of such prefiltering anyway, as:

(1) That would mean that every change to indisvalid/indisready would
require forcing a relcache flush on the parent table.  (Either that or
RelationGetIndexList does pg_index lookups internally, which would
mostly defeat the point of caching the index list at all.)

(2) The big picture here is that it's silly to optimize for any case
other than fully valid indexes, because anything else can be expected
to be a low-probability transient state.  So generally we might as well
have RelationGetIndexList return all the indexes and let callers filter
any that happen to be inappropriate for their usage.
        regards, tom lane

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Andres Freund

Date:

27 November 2012, 13:18:50

On 2012-11-26 16:39:39 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2012-10-31 11:41:37 +0530, Amit Kapila wrote:
> >> There seems to be a problem in behavior of Create Index Concurrently and Hot
> >> Update in HEAD code .
>
> > At pgcon.it I had a chance to discuss with Simon how to fix this
> > bug. Please check the attached patches - and their commit messages - for
> > the fix and some regression tests.
>
> I looked at this a bit.  I am very unhappy with the proposed kluge to
> open indexes NoLock in some places.  Even if that's safe today, which
> I don't believe in the least, any future change in this area could break
> it.

I am not happy with it either. But I failed to see a better way. Note
that in most of the cases a lock actually is acquired shortly
afterwards, just a ->indisvalid or an ->indisready check away. The only
exception is RelationGetIndexAttrBitmap which actually needs to look at
!indisvalid && !indisready indexes because of HOT. All the former cases
could just do a syscache lookup beforehand and skip if it doesn't match
their criterion. I wasn't sure about the (performance, notational)
overhead of doing a second syscache lookup in those paths.

Note that any ->indisvalid/indisready change is protected by waiting for
all concurrent backends to finish, so I don't think the separate
syscache lookup/index_open would be a problem.

For RelationGetIndexAttrBitmap, I don't really see a point in the locks
acquired - after all were not protecting the RelationGetIndexList
either.

Greetings,

Andres

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Andres Freund

Date:

27 November 2012, 13:27:41

On 2012-11-26 22:31:50 -0500, Tom Lane wrote:
> I wrote:
> > I'm a bit inclined to think that DROP INDEX CONCURRENTLY should have an
> > additional step that somehow marks the pg_index row in a new way that
> > makes RelationGetIndexList ignore it, and then wait out existing
> > transactions before taking the final step of dropping the index.  The
> > Right Way to do this would likely have been to add another bool column,
> > but we don't have that option anymore in 9.2.  Maybe we could abuse
> > indcheckxmin somehow, ie use a state that can't otherwise occur (in
> > combination with the values of indisvalid/indisready) to denote an index
> > being dropped.
>
> I looked into this some more.  There are currently three possible states
> of the indisvalid/indisready flags:
>
> indisvalid = false, indisready = false
>
>     initial state during CREATE INDEX CONCURRENTLY; this should
>     result in sessions honoring the index for HOT-safety decisions,
>     but not using it for searches or insertions
>
> indisvalid = false, indisready = true
>
>     sessions should now insert into the index, but not use it
>     for searches
>
> indisvalid = true, indisready = true
>
>     normal, fully valid index
>
> Either state of indcheckxmin is valid with all three of these
> combinations, so the specific kluge I was contemplating above doesn't
> work.  But there is no valid reason for an index to be in this state:
>
> indisvalid = true, indisready = false
>
> I suggest that to fix this for 9.2, we could abuse these flags by
> defining that combination as meaning "ignore this index completely",
> and having DROP INDEX CONCURRENTLY set this state during its final
> wait before removing the index.
>
> Obviously, an additional flag column would be a much cleaner fix,
> and I think we should add one in HEAD.  But it's too late to do
> that in 9.2.

While I aggree that more states would make some stuff cleaner, as long
as were still locking entries in RelationGetIndexAttrBitmap that have
indisvalid = false, indisready = false we still have broken DROP INDEX
CONCURRENTLY due to the exlusive lock acquired during actually dropping
the index. Which can take quite a while on a big index.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Andres Freund

Date:

27 November 2012, 13:48:19

On 2012-11-27 10:18:37 +0530, Pavan Deolasee wrote:
> On Tue, Nov 27, 2012 at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > I wrote:
> >
> > Either state of indcheckxmin is valid with all three of these
> > combinations, so the specific kluge I was contemplating above doesn't
> > work.  But there is no valid reason for an index to be in this state:
> >
> > indisvalid = true, indisready = false
> >
> > I suggest that to fix this for 9.2, we could abuse these flags by
> > defining that combination as meaning "ignore this index completely",
> > and having DROP INDEX CONCURRENTLY set this state during its final
> > wait before removing the index.
> >
> >
> Yeah, this looks much better, given our inability to add a new catalog
> column in 9.2. Can we cheat a little though and use a value other than 0
> and 1 for indisvalid or indisready to tell the server to interpret it
> differently ? I would assume not, but can't see a reason unless these
> values are converted to other types and back to boolean.
>
> Andres complained about the fact that many callers of RelationGetIndexList
> are probably not ready to process invalid or not-yet-ready indexes and
> suggested that those changes should be backpatched to even older releases.
> But IMO we should do that with a test case that demonstrates that there is
> indeed a bug.

I haven't yet looked deeply enough to judge whether there are actually
bugs. But I can say that the e.g. the missing indisvalid checks in
transformFkeyCheckAttrs makes me pretty uneasy. Vacuum not checking
whether indexes are ready isn't nice either.

> Also, we should teach RelationGetIndexList to take a flags
> argument and filter out indexes that the caller is not interested instead
> of every caller doing the checks separately.

Given that RelationGetIndexList currently is just returning a cached
list I don't see how thats going to work without significant overhead.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Andres Freund

Date:

27 November 2012, 13:52:16

On 2012-11-27 11:48:08 +0100, Andres Freund wrote:
> On 2012-11-27 10:18:37 +0530, Pavan Deolasee wrote:
> > On Tue, Nov 27, 2012 at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > > I wrote:
> > >
> > > Either state of indcheckxmin is valid with all three of these
> > > combinations, so the specific kluge I was contemplating above doesn't
> > > work.  But there is no valid reason for an index to be in this state:
> > >
> > > indisvalid = true, indisready = false
> > >
> > > I suggest that to fix this for 9.2, we could abuse these flags by
> > > defining that combination as meaning "ignore this index completely",
> > > and having DROP INDEX CONCURRENTLY set this state during its final
> > > wait before removing the index.
> > >
> > >
> > Yeah, this looks much better, given our inability to add a new catalog
> > column in 9.2. Can we cheat a little though and use a value other than 0
> > and 1 for indisvalid or indisready to tell the server to interpret it
> > differently ? I would assume not, but can't see a reason unless these
> > values are converted to other types and back to boolean.
> >
> > Andres complained about the fact that many callers of RelationGetIndexList
> > are probably not ready to process invalid or not-yet-ready indexes and
> > suggested that those changes should be backpatched to even older releases.
> > But IMO we should do that with a test case that demonstrates that there is
> > indeed a bug.
> 
> I haven't yet looked deeply enough to judge whether there are actually
> bugs. But I can say that the e.g. the missing indisvalid checks in
> transformFkeyCheckAttrs makes me pretty uneasy. Vacuum not checking
> whether indexes are ready isn't nice either.

At least the former was easy enough to verify after thinking about it
for a minute (<=9.1):

CREATE TABLE clusterbug(id serial primary key, data int);
INSERT INTO clusterbug(data) VALUES(2),(2);
CREATE UNIQUE INDEX CONCURRENTLY clusterbug_data ON clusterbug(data);
CREATE TABLE clusterbug_ref(id serial primary key, clusterbug_id int references clusterbug(data));

Now a !indisready index is getting queried (strangely enough that
doesn't cause an error).

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Andres Freund

Date:

27 November 2012, 13:56:05

On 2012-11-27 11:52:11 +0100, Andres Freund wrote:
> On 2012-11-27 11:48:08 +0100, Andres Freund wrote:
> > On 2012-11-27 10:18:37 +0530, Pavan Deolasee wrote:
> > > On Tue, Nov 27, 2012 at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >
> > > > I wrote:
> > > >
> > > > Either state of indcheckxmin is valid with all three of these
> > > > combinations, so the specific kluge I was contemplating above doesn't
> > > > work.  But there is no valid reason for an index to be in this state:
> > > >
> > > > indisvalid = true, indisready = false
> > > >
> > > > I suggest that to fix this for 9.2, we could abuse these flags by
> > > > defining that combination as meaning "ignore this index completely",
> > > > and having DROP INDEX CONCURRENTLY set this state during its final
> > > > wait before removing the index.
> > > >
> > > >
> > > Yeah, this looks much better, given our inability to add a new catalog
> > > column in 9.2. Can we cheat a little though and use a value other than 0
> > > and 1 for indisvalid or indisready to tell the server to interpret it
> > > differently ? I would assume not, but can't see a reason unless these
> > > values are converted to other types and back to boolean.
> > >
> > > Andres complained about the fact that many callers of RelationGetIndexList
> > > are probably not ready to process invalid or not-yet-ready indexes and
> > > suggested that those changes should be backpatched to even older releases.
> > > But IMO we should do that with a test case that demonstrates that there is
> > > indeed a bug.
> >
> > I haven't yet looked deeply enough to judge whether there are actually
> > bugs. But I can say that the e.g. the missing indisvalid checks in
> > transformFkeyCheckAttrs makes me pretty uneasy. Vacuum not checking
> > whether indexes are ready isn't nice either.
>
> At least the former was easy enough to verify after thinking about it
> for a minute (<=9.1):
>
> CREATE TABLE clusterbug(id serial primary key, data int);
> INSERT INTO clusterbug(data) VALUES(2),(2);
> CREATE UNIQUE INDEX CONCURRENTLY clusterbug_data ON clusterbug(data);
> CREATE TABLE clusterbug_ref(id serial primary key, clusterbug_id int references clusterbug(data));
>
> Now a !indisready index is getting queried (strangely enough that
> doesn't cause an error).

That should work in 9.2 as well, although its slightly more
complicated. You just need a indisready && !indisvalid index and the
foreign key will happily be created. Easy enough to achieve with two
sessions.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Pavan Deolasee

Date:

27 November 2012, 14:10:26

On Tue, Nov 27, 2012 at 4:22 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2012-11-27 11:48:08 +0100, Andres Freund wrote:

>
> I haven't yet looked deeply enough to judge whether there are actually
> bugs. But I can say that the e.g. the missing indisvalid checks in
> transformFkeyCheckAttrs makes me pretty uneasy. Vacuum not checking
> whether indexes are ready isn't nice either.

At least the former was easy enough to verify after thinking about it
for a minute (<=9.1):

CREATE TABLE clusterbug(id serial primary key, data int);
INSERT INTO clusterbug(data) VALUES(2),(2);
CREATE UNIQUE INDEX CONCURRENTLY clusterbug_data ON clusterbug(data);
CREATE TABLE clusterbug_ref(id serial primary key, clusterbug_id int references clusterbug(data));

Now a !indisready index is getting queried (strangely enough that
doesn't cause an error).

There might be a bug there as you are suggesting, but for me the CREATE INDEX itself fails (and rightly so) because of duplicate keys. Do I need to run these statements in different sessions ?

Thanks,

Pavan

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Andres Freund

Date:

27 November 2012, 14:19:23

On 2012-11-27 16:39:58 +0530, Pavan Deolasee wrote:
> On Tue, Nov 27, 2012 at 4:22 PM, Andres Freund <andres@2ndquadrant.com>wrote:
> 
> > On 2012-11-27 11:48:08 +0100, Andres Freund wrote:
> >
> > >
> > > I haven't yet looked deeply enough to judge whether there are actually
> > > bugs. But I can say that the e.g. the missing indisvalid checks in
> > > transformFkeyCheckAttrs makes me pretty uneasy. Vacuum not checking
> > > whether indexes are ready isn't nice either.
> >
> > At least the former was easy enough to verify after thinking about it
> > for a minute (<=9.1):
> >
> > CREATE TABLE clusterbug(id serial primary key, data int);
> > INSERT INTO clusterbug(data) VALUES(2),(2);
> > CREATE UNIQUE INDEX CONCURRENTLY clusterbug_data ON clusterbug(data);
> > CREATE TABLE clusterbug_ref(id serial primary key, clusterbug_id int
> > references clusterbug(data));
> >
> > Now a !indisready index is getting queried (strangely enough that
> > doesn't cause an error).
> >
> >
> There might be a bug there as you are suggesting, but for me the CREATE
> INDEX itself fails (and rightly so) because of duplicate keys. Do I need to
> run these statements in different sessions ?

Sure, the CREATE INDEX fails. The point is that we successfully can
create the foreign key afterwards anyway.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Pavan Deolasee

Date:

27 November 2012, 14:20:15

On Tue, Nov 27, 2012 at 11:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Pavan Deolasee <pavan.deolasee@gmail.com> writes:
> On Tue, Nov 27, 2012 at 9:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Either state of indcheckxmin is valid with all three of these
>> combinations, so the specific kluge I was contemplating above doesn't
>> work. But there is no valid reason for an index to be in this state:
>> indisvalid = true, indisready = false

> Yeah, this looks much better, given our inability to add a new catalog
> column in 9.2. Can we cheat a little though and use a value other than 0
> and 1 for indisvalid or indisready to tell the server to interpret it
> differently ?

No, not unless you'd like "select * from pg_index" to misbehave.
Personally, I like being able to look at catalogs.

Hmm.. I thought so. Thanks.

If we add a new column to the catalog for HEAD, I think it will be a good idea to add an "indflags" kind of column which can store a bitmap of flags. We probably don't get into this kind of situation often, but if we do, then we can more flexibility without impacting rebuilding the catalogs. My two cents.

Thanks,

Pavan

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Pavan Deolasee

Date:

27 November 2012, 14:23:45

On Tue, Nov 27, 2012 at 4:49 PM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2012-11-27 16:39:58 +0530, Pavan Deolasee wrote:

> >
> There might be a bug there as you are suggesting, but for me the CREATE
> INDEX itself fails (and rightly so) because of duplicate keys. Do I need to
> run these statements in different sessions ?

Sure, the CREATE INDEX fails. The point is that we successfully can
create the foreign key afterwards anyway.

Ah OK. Got it.

Thanks,

Pavan

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Tom Lane

Date:

27 November 2012, 20:02:24

Andres Freund <andres@2ndquadrant.com> writes:
> On 2012-11-26 16:39:39 -0500, Tom Lane wrote:
>> I looked at this a bit.  I am very unhappy with the proposed kluge to
>> open indexes NoLock in some places.  Even if that's safe today, which
>> I don't believe in the least, any future change in this area could break
>> it.

> I am not happy with it either. But I failed to see a better way. Note
> that in most of the cases a lock actually is acquired shortly
> afterwards, just a ->indisvalid or an ->indisready check away.

I think you're missing the point.  The proposed patch will result in
RelationGetIndexAttrBitmap trying to do index_open() on an index that
might be getting dropped *right now* by a concurrent DROP INDEX
CONCURRENTLY.  If the DROP commits while index_open is running, its
SnapshotNow catalog searches will stop finding relevant rows.  From
the point of view of index_open, the relation data structure is then
corrupt (for example, it might not find pg_attribute entries for all
the columns) and so it will throw an error.

We need to fix things so that the final pre-deletion state of the
pg_index row tells observer backends not to even try to open the index.
(Thus, it will not matter whether they come across the pg_index row just
before or just after the deleting transaction commits --- either way,
they don't try to touch the index.)  So that has to be a different state
from the initial state used during CREATE INDEX CONCURRENTLY.

The point of not wanting to open the index NoLock (and for that matter
of having DROP INDEX CONCURRENTLY take AccessExclusiveLock once it
thinks nobody is touching the index) is to make sure that if somehow
somebody is touching the index anyway, they don't see the index's
catalog entries as corrupt.  They'll either all be there or all not.
It's only a belt-and-suspenders safety measure, but I don't want to
give it up.
        regards, tom lane

Re: Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Robert Haas

Date:

27 November 2012, 20:11:47

On Tue, Nov 27, 2012 at 12:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The point of not wanting to open the index NoLock (and for that matter
> of having DROP INDEX CONCURRENTLY take AccessExclusiveLock once it
> thinks nobody is touching the index) is to make sure that if somehow
> somebody is touching the index anyway, they don't see the index's
> catalog entries as corrupt.  They'll either all be there or all not.
> It's only a belt-and-suspenders safety measure, but I don't want to
> give it up.

+1.  There's a whole crapload of commits that I did for 9.2 with
commit messages like "improve concurrent DDL in case XYZ".  A lot of
that had to do with fixing cases where we were examining system
catalogs in unsafe ways before locks had been taken.  I didn't manage
to fix them all, unfortunately, but it's significantly better than it
used to be, and I'd really like it if we could try not to go
backwards.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Andres Freund

Date:

27 November 2012, 20:44:12

On 2012-11-27 12:02:09 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On 2012-11-26 16:39:39 -0500, Tom Lane wrote:
> >> I looked at this a bit.  I am very unhappy with the proposed kluge to
> >> open indexes NoLock in some places.  Even if that's safe today, which
> >> I don't believe in the least, any future change in this area could break
> >> it.
>
> > I am not happy with it either. But I failed to see a better way. Note
> > that in most of the cases a lock actually is acquired shortly
> > afterwards, just a ->indisvalid or an ->indisready check away.
>
> I think you're missing the point.

True.

>  The proposed patch will result in
> RelationGetIndexAttrBitmap trying to do index_open() on an index that
> might be getting dropped *right now* by a concurrent DROP INDEX
> CONCURRENTLY.  If the DROP commits while index_open is running, its
> SnapshotNow catalog searches will stop finding relevant rows.  From
> the point of view of index_open, the relation data structure is then
> corrupt (for example, it might not find pg_attribute entries for all
> the columns) and so it will throw an error.

> We need to fix things so that the final pre-deletion state of the
> pg_index row tells observer backends not to even try to open the index.
> (Thus, it will not matter whether they come across the pg_index row just
> before or just after the deleting transaction commits --- either way,
> they don't try to touch the index.)  So that has to be a different state
> from the initial state used during CREATE INDEX CONCURRENTLY.
>
> The point of not wanting to open the index NoLock (and for that matter
> of having DROP INDEX CONCURRENTLY take AccessExclusiveLock once it
> thinks nobody is touching the index) is to make sure that if somehow
> somebody is touching the index anyway, they don't see the index's
> catalog entries as corrupt.  They'll either all be there or all not.
> It's only a belt-and-suspenders safety measure, but I don't want to
> give it up.

So the idea would be to skip about-to-be-dropped indexes in
RelationGetIndexList directly - we don't ever need to watch those, not
even for HOT - because we only have the necessary knowledge there. The
normal valid/ready checks will be done at the callsites of
RelationGetIndexList(). But those checks can be done in a locked manner
now.

Are you already working on a fix? Or shall I work on an updated patch?

I vote for introducing wrapper functions/macro to do the
about-to-be-dropped check, its hard enough to understand as-is.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Tom Lane

Date:

27 November 2012, 20:50:47

Andres Freund <andres@2ndquadrant.com> writes:
> So the idea would be to skip about-to-be-dropped indexes in
> RelationGetIndexList directly - we don't ever need to watch those, not
> even for HOT - because we only have the necessary knowledge there. The
> normal valid/ready checks will be done at the callsites of
> RelationGetIndexList(). But those checks can be done in a locked manner
> now.

Right.

> Are you already working on a fix? Or shall I work on an updated patch?

I'll work on it.

> I vote for introducing wrapper functions/macro to do the
> about-to-be-dropped check, its hard enough to understand as-is.

Meh.  If it's only going to be done in RelationGetIndexList, I'm
not sure that a wrapper macro is worth the trouble.  If we needed
the test in multiple places I'd agree, but what other spots do you
see?
        regards, tom lane

Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Andres Freund

Date:

27 November 2012, 20:55:15

On 2012-11-27 12:50:36 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > I vote for introducing wrapper functions/macro to do the
> > about-to-be-dropped check, its hard enough to understand as-is.
>
> Meh.  If it's only going to be done in RelationGetIndexList, I'm
> not sure that a wrapper macro is worth the trouble.  If we needed
> the test in multiple places I'd agree, but what other spots do you
> see?

I don't really see any other querying locations - but such a macro would
make the code easier backpatchable when we introduce a third column for
the about-to-be-dropped case.

Anyway, don't feel all too strongly about it.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Tom Lane

Date:

27 November 2012, 22:08:23

BTW, I was thinking that the DROP INDEX CONCURRENTLY logic needed to be:

1. Unset indisvalid, commit, wait out all reading transactions.

2. Unset indisready, commit, wait out all writing transactions.

3. Unset indislive, commit (with parent table relcache flush),
wait out all reading-or-writing transactions.

4. Drop the index.

However, I wonder whether we couldn't combine steps 2 and 3, ie once
there are no readers of the index go directly to the "dead" state.
I don't see a need for a period where the index isn't being inserted
into but is still used for HOT-safety decisions.
        regards, tom lane

Re: Problem Observed in behavior of Create Index Concurrently and Hot Update

From

Andres Freund

Date:

27 November 2012, 22:10:41

On 2012-11-27 14:08:13 -0500, Tom Lane wrote:
> BTW, I was thinking that the DROP INDEX CONCURRENTLY logic needed to be:
>
> 1. Unset indisvalid, commit, wait out all reading transactions.
>
> 2. Unset indisready, commit, wait out all writing transactions.
>
> 3. Unset indislive, commit (with parent table relcache flush),
> wait out all reading-or-writing transactions.
>
> 4. Drop the index.
>
> However, I wonder whether we couldn't combine steps 2 and 3, ie once
> there are no readers of the index go directly to the "dead" state.
> I don't see a need for a period where the index isn't being inserted
> into but is still used for HOT-safety decisions.

I think you're right, that state isn't interesting for anyone.

Greetings,

Andres Freund

--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services