Thread: INVALID index while concurrent indexing in progress?

INVALID index while concurrent indexing in progress?

From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/sql-createindex.html
Description:

I think the INVALID index can use further explanation, in particular, from
my experience it seems like when building an index concurrently, the index
gets inserted and labeled invalid while the index is being built, and then
the label gets removed if it finishes successfully or gets left on the index
if there is a failure while building the index.  It is my current
understanding, after experimenting, that INVALID means 'incomplete', whether
that's because it's in progress or because it was unable to be completed,
but prior to my experiment my understanding was that INVALID indicated
failure.

This was especially confusing when we were adding an index to a very large
table because we assumed the INVALID index indicated failure when we
couldn't find any other sign of progress or failure.

Re: INVALID index while concurrent indexing in progress?

From
Rajakavitha Kodhandapani
Date:
Hi,

This is my first attempt at contributing to the documentation of PostgreSQL.
Here's the patch. Please let me know if any other changes need to be made.

Regards,
Rajie

On Thu, May 19, 2022 at 1:10 AM PG Doc comments form <noreply@postgresql.org> wrote:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/sql-createindex.html
Description:

I think the INVALID index can use further explanation, in particular, from
my experience it seems like when building an index concurrently, the index
gets inserted and labeled invalid while the index is being built, and then
the label gets removed if it finishes successfully or gets left on the index
if there is a failure while building the index.  It is my current
understanding, after experimenting, that INVALID means 'incomplete', whether
that's because it's in progress or because it was unable to be completed,
but prior to my experiment my understanding was that INVALID indicated
failure.

This was especially confusing when we were adding an index to a very large
table because we assumed the INVALID index indicated failure when we
couldn't find any other sign of progress or failure.
Attachment

Re: INVALID index while concurrent indexing in progress?

From
Laurenz Albe
Date:
On Thu, 2022-05-19 at 19:02 +0530, Rajakavitha Kodhandapani wrote:
> > I think the INVALID index can use further explanation, in particular, from
> > my experience it seems like when building an index concurrently, the index
> > gets inserted and labeled invalid while the index is being built, and then
> > the label gets removed if it finishes successfully or gets left on the index
> > if there is a failure while building the index.  It is my current
> > understanding, after experimenting, that INVALID means 'incomplete', whether
> > that's because it's in progress or because it was unable to be completed,
> > but prior to my experiment my understanding was that INVALID indicated
> > failure.
> > 
> > This was especially confusing when we were adding an index to a very large
> > table because we assumed the INVALID index indicated failure when we
> > couldn't find any other sign of progress or failure.
>
> This is my first attempt at contributing to the documentation of PostgreSQL.
> Here's the patch. Please let me know if any other changes need to be made.

Thank you!  Please send patches as plain text and use bottom posting.

> \cf3 @@ -665,11 +665,14 @@\cf5  Indexes:\
>     <para>\
>      Another caveat when building a unique index concurrently is that the\
>      uniqueness constraint is already being enforced against other transactions\
> \cf6 -    when the second table scan begins.  This means that constraint violations\cf5 \
> \cf4 +    when the second table scan begins. This means that constraint violations\cf5 \
>      could be reported in other queries prior to the index becoming available\
> \cf6 -    for use, or even in cases where the index build eventually fails.  Also,\cf5 \
> \cf6 -    if a failure does occur in the second scan, the <quote>invalid</quote> index\cf5 \
> \cf6 -    continues to enforce its uniqueness constraint afterwards.\cf5 \
> \cf4 +    for use, or even in cases where the index build eventually fails. The index\cf5 \
> \cf4 +    is inserted and labeled <quote>invalid</quote> while the index is being built,\cf5 \
> \cf4 +    and then the label is removed if the index builds successfully. If the index does\cf5 \
> \cf4 +    not build successfully, then the label <quote>invalid</quote> remains. Also, if a\cf5 \
> }

I don't think that this information should be added to a paragraph that
focuses on uniqueness checks in concurrent index builds.

Actually, most of the information is already there.  To quote from the page:

  If a problem arises while scanning the table, such as a deadlock or a uniqueness violation
  in a unique index, the CREATE INDEX command will fail but leave behind an “invalid” index.

How about the following patch to emphasize the role of "invalid":

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index d3102a87d9..fee2c61e5e 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -622,7 +622,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
    </para>
 
    <para>
-    In a concurrent index build, the index is actually entered into
+    In a concurrent index build, the index is actually entered as <quote>invalid</quote> index into
     the system catalogs in one transaction, then two table scans occur in
     two more transactions.  Before each table scan, the index build must
     wait for existing transactions that have modified the table to terminate.
@@ -631,7 +631,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     scan to terminate, including transactions used by any phase of concurrent
     index builds on other tables, if the indexes involved are partial or have
     columns that are not simple column references.
-    Then finally the index can be marked ready for use,
+    Then finally the index can be marked <quote>valid</quote> and ready for use,
     and the <command>CREATE INDEX</command> command terminates.
     Even then, however, the index may not be immediately usable for queries:
     in the worst case, it cannot be used as long as transactions exist that

Yours,
Laurenz Albe



Re: INVALID index while concurrent indexing in progress?

From
Lauren Fliksteen
Date:
Thank you both! I think Laurenz’s changes make perfect sense!

Sent from my iPhone

> On May 19, 2022, at 8:37 AM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Thu, 2022-05-19 at 19:02 +0530, Rajakavitha Kodhandapani wrote:
>>> I think the INVALID index can use further explanation, in particular, from
>>> my experience it seems like when building an index concurrently, the index
>>> gets inserted and labeled invalid while the index is being built, and then
>>> the label gets removed if it finishes successfully or gets left on the index
>>> if there is a failure while building the index.  It is my current
>>> understanding, after experimenting, that INVALID means 'incomplete', whether
>>> that's because it's in progress or because it was unable to be completed,
>>> but prior to my experiment my understanding was that INVALID indicated
>>> failure.
>>>
>>> This was especially confusing when we were adding an index to a very large
>>> table because we assumed the INVALID index indicated failure when we
>>> couldn't find any other sign of progress or failure.
>>
>> This is my first attempt at contributing to the documentation of PostgreSQL.
>> Here's the patch. Please let me know if any other changes need to be made.
>
> Thank you!  Please send patches as plain text and use bottom posting.
>
>> \cf3 @@ -665,11 +665,14 @@\cf5  Indexes:\
>>    <para>\
>>     Another caveat when building a unique index concurrently is that the\
>>     uniqueness constraint is already being enforced against other transactions\
>> \cf6 -    when the second table scan begins.  This means that constraint violations\cf5 \
>> \cf4 +    when the second table scan begins. This means that constraint violations\cf5 \
>>     could be reported in other queries prior to the index becoming available\
>> \cf6 -    for use, or even in cases where the index build eventually fails.  Also,\cf5 \
>> \cf6 -    if a failure does occur in the second scan, the <quote>invalid</quote> index\cf5 \
>> \cf6 -    continues to enforce its uniqueness constraint afterwards.\cf5 \
>> \cf4 +    for use, or even in cases where the index build eventually fails. The index\cf5 \
>> \cf4 +    is inserted and labeled <quote>invalid</quote> while the index is being built,\cf5 \
>> \cf4 +    and then the label is removed if the index builds successfully. If the index does\cf5 \
>> \cf4 +    not build successfully, then the label <quote>invalid</quote> remains. Also, if a\cf5 \
>> }
>
> I don't think that this information should be added to a paragraph that
> focuses on uniqueness checks in concurrent index builds.
>
> Actually, most of the information is already there.  To quote from the page:
>
>  If a problem arises while scanning the table, such as a deadlock or a uniqueness violation
>  in a unique index, the CREATE INDEX command will fail but leave behind an “invalid” index.
>
> How about the following patch to emphasize the role of "invalid":
>
> diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
> index d3102a87d9..fee2c61e5e 100644
> --- a/doc/src/sgml/ref/create_index.sgml
> +++ b/doc/src/sgml/ref/create_index.sgml
> @@ -622,7 +622,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
>    </para>
>
>    <para>
> -    In a concurrent index build, the index is actually entered into
> +    In a concurrent index build, the index is actually entered as <quote>invalid</quote> index into
>     the system catalogs in one transaction, then two table scans occur in
>     two more transactions.  Before each table scan, the index build must
>     wait for existing transactions that have modified the table to terminate.
> @@ -631,7 +631,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
>     scan to terminate, including transactions used by any phase of concurrent
>     index builds on other tables, if the indexes involved are partial or have
>     columns that are not simple column references.
> -    Then finally the index can be marked ready for use,
> +    Then finally the index can be marked <quote>valid</quote> and ready for use,
>     and the <command>CREATE INDEX</command> command terminates.
>     Even then, however, the index may not be immediately usable for queries:
>     in the worst case, it cannot be used as long as transactions exist that
>
> Yours,
> Laurenz Albe



Re: INVALID index while concurrent indexing in progress?

From
Rajakavitha Kodhandapani
Date:
Thank you, Laurenz. The changes that you suggested make a lot more sense.
I will make the updates and submit the changes.

Regards,
Rajie


On Thu, May 19, 2022 at 9:45 PM Lauren Fliksteen <dancernerd32@gmail.com> wrote:
Thank you both! I think Laurenz’s changes make perfect sense!

Sent from my iPhone

> On May 19, 2022, at 8:37 AM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Thu, 2022-05-19 at 19:02 +0530, Rajakavitha Kodhandapani wrote:
>>> I think the INVALID index can use further explanation, in particular, from
>>> my experience it seems like when building an index concurrently, the index
>>> gets inserted and labeled invalid while the index is being built, and then
>>> the label gets removed if it finishes successfully or gets left on the index
>>> if there is a failure while building the index.  It is my current
>>> understanding, after experimenting, that INVALID means 'incomplete', whether
>>> that's because it's in progress or because it was unable to be completed,
>>> but prior to my experiment my understanding was that INVALID indicated
>>> failure.
>>>
>>> This was especially confusing when we were adding an index to a very large
>>> table because we assumed the INVALID index indicated failure when we
>>> couldn't find any other sign of progress or failure.
>>
>> This is my first attempt at contributing to the documentation of PostgreSQL.
>> Here's the patch. Please let me know if any other changes need to be made.
>
> Thank you!  Please send patches as plain text and use bottom posting.
>
>> \cf3 @@ -665,11 +665,14 @@\cf5  Indexes:\
>>    <para>\
>>     Another caveat when building a unique index concurrently is that the\
>>     uniqueness constraint is already being enforced against other transactions\
>> \cf6 -    when the second table scan begins.  This means that constraint violations\cf5 \
>> \cf4 +    when the second table scan begins. This means that constraint violations\cf5 \
>>     could be reported in other queries prior to the index becoming available\
>> \cf6 -    for use, or even in cases where the index build eventually fails.  Also,\cf5 \
>> \cf6 -    if a failure does occur in the second scan, the <quote>invalid</quote> index\cf5 \
>> \cf6 -    continues to enforce its uniqueness constraint afterwards.\cf5 \
>> \cf4 +    for use, or even in cases where the index build eventually fails. The index\cf5 \
>> \cf4 +    is inserted and labeled <quote>invalid</quote> while the index is being built,\cf5 \
>> \cf4 +    and then the label is removed if the index builds successfully. If the index does\cf5 \
>> \cf4 +    not build successfully, then the label <quote>invalid</quote> remains. Also, if a\cf5 \
>> }
>
> I don't think that this information should be added to a paragraph that
> focuses on uniqueness checks in concurrent index builds.
>
> Actually, most of the information is already there.  To quote from the page:
>
>  If a problem arises while scanning the table, such as a deadlock or a uniqueness violation
>  in a unique index, the CREATE INDEX command will fail but leave behind an “invalid” index.
>
> How about the following patch to emphasize the role of "invalid":
>
> diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
> index d3102a87d9..fee2c61e5e 100644
> --- a/doc/src/sgml/ref/create_index.sgml
> +++ b/doc/src/sgml/ref/create_index.sgml
> @@ -622,7 +622,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
>    </para>
>
>    <para>
> -    In a concurrent index build, the index is actually entered into
> +    In a concurrent index build, the index is actually entered as <quote>invalid</quote> index into
>     the system catalogs in one transaction, then two table scans occur in
>     two more transactions.  Before each table scan, the index build must
>     wait for existing transactions that have modified the table to terminate.
> @@ -631,7 +631,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
>     scan to terminate, including transactions used by any phase of concurrent
>     index builds on other tables, if the indexes involved are partial or have
>     columns that are not simple column references.
> -    Then finally the index can be marked ready for use,
> +    Then finally the index can be marked <quote>valid</quote> and ready for use,
>     and the <command>CREATE INDEX</command> command terminates.
>     Even then, however, the index may not be immediately usable for queries:
>     in the worst case, it cannot be used as long as transactions exist that
>
> Yours,
> Laurenz Albe

Re: INVALID index while concurrent indexing in progress?

From
Alvaro Herrera
Date:
On 2022-May-19, Rajakavitha Kodhandapani wrote:

> Thank you, Laurenz. The changes that you suggested make a lot more sense.
> I will make the updates and submit the changes.

Hmm, but they're already submitted.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"



Re: INVALID index while concurrent indexing in progress?

From
Rajakavitha Kodhandapani
Date:
Hi Alvaro,

I am new to PostgreSQL and still in the process of getting familiar with the workflow.
Please let me know if I am missing something?

Submitting patch updates

When submitting a new version of a previously submitted patch, you should do a few additional things:

  • Uniquely identify the new version. You can use git format-patch -vN, incrementing N each time; or you can add an incrementing numerical suffix manually. Using the ".patch" extension allows some reviewers to more easily read it in their email client/code editor.
  • Make sure it's easy to find any earlier discussion of the patch, by providing Message-Id-based links to the mailing list posts. Don't expect that everyone will still be able to find previous submissions on their own. You can usually get the message ID of your email by looking for the header Message-Id in your email client. Try View message source or View original if it's not obviously visible.

Regards,
Rajie

On Thu, May 19, 2022 at 10:06 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-May-19, Rajakavitha Kodhandapani wrote:

> Thank you, Laurenz. The changes that you suggested make a lot more sense.
> I will make the updates and submit the changes.

Hmm, but they're already submitted.

--
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"

Re: INVALID index while concurrent indexing in progress?

From
Laurenz Albe
Date:
On Fri, 2022-05-20 at 10:14 +0530, Rajakavitha Kodhandapani wrote:
> On Thu, May 19, 2022 at 10:06 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > On 2022-May-19, Rajakavitha Kodhandapani wrote:
> > 
> > > Thank you, Laurenz. The changes that you suggested make a lot more sense.
> > > I will make the updates and submit the changes.
> > 
> > Hmm, but they're already submitted.
> > 
> I am new to PostgreSQL and still in the process of getting familiar with the workflow.
> Please let me know if I am missing something?
> My understanding was based on: https://wiki.postgresql.org/wiki/Submitting_a_Patch

Alvaro meant that my message already contained a patch, so there is no need to
send another one, unless you have a different suggestion.
Added value could be a suggested commit message with "git format-patch" that could
make the committer's job easier.

One thing you could do to improve the conversation is to adhere to the project style
of adding your responses *below* the quoted text that you respond to (and prune away
unnecessary parts), like I am doing here.  I had to reformat your message for that
purpose.  That makes it much easier to follow the thread of the conversation.
Consider that these e-mails are archived for posterity.

What you could do to help getting this committed is to add a commitfest entry
for the patch, so that it doesn't get forgotten.  Unfortunately the commitfest
application has trouble finding a thread on -docs, so you may have to send a
reply in this thread to -hackers to make that work.

Yours,
Laurenz Albe 



Re: INVALID index while concurrent indexing in progress?

From
Alvaro Herrera
Date:
On 2022-May-20, Laurenz Albe wrote:

> Alvaro meant that my message already contained a patch, so there is no need to
> send another one, unless you have a different suggestion.

Right.  Being old-school, I can just save the email to a plain text file
and do "patch < /tmp/emailfile".  'patch' has smarts to detect
irrelevant parts of the message, so the complete email works fine as a
patch.

> Added value could be a suggested commit message with "git
> format-patch" that could make the committer's job easier.

Right -- especially so if it comes with Author/Discussion/etc lines.  (I
don't think I've seen any patch submitter do that.)

Anyway, I pushed the patch.  I appreciate your willingness to
collaborate and look forward to your future participation, for sure.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"World domination is proceeding according to plan"        (Andrew Morton)



Re: INVALID index while concurrent indexing in progress?

From
Laurenz Albe
Date:
On Fri, 2022-05-20 at 10:30 +0200, Alvaro Herrera wrote:
> 
> Anyway, I pushed the patch.  I appreciate your willingness to
> collaborate and look forward to your future participation, for sure.

Thanks!

Laurenz Albe



Re: INVALID index while concurrent indexing in progress?

From
Rajakavitha Kodhandapani
Date:

> Anyway, I pushed the patch.  I appreciate your willingness to
> collaborate and look forward to your future participation, for sure.

Thank you Laurenz and  Alvaro.

Regards,
Rajie

On Fri, May 20, 2022 at 3:55 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2022-05-20 at 10:30 +0200, Alvaro Herrera wrote:
>
> Anyway, I pushed the patch.  I appreciate your willingness to
> collaborate and look forward to your future participation, for sure.

Thanks!

Laurenz Albe