Thread: doc: BRIN indexes and autosummarize

doc: BRIN indexes and autosummarize

From
Roberto Mello
Date:
Here's a patch to clarify the BRIN indexes documentation, particularly with regards
to autosummarize, vacuum and autovacuum. It basically breaks down a big blob of a
paragraph into multiple paragraphs for clarity, plus explicitly tells how summarization
happens manually or automatically.

I also added cross-references to various relevant sections, including the create index
page.

On this topic... I'm not familiar with the internals of BRIN indexes and in
backend/access/common/reloptions.c I see:

        {
            "autosummarize",
            "Enables automatic summarization on this BRIN index",
            RELOPT_KIND_BRIN,
            AccessExclusiveLock
        },

Is the exclusive lock on the index why autosummarize is off by default?

What would be the downside (if any) of having autosummarize=on by default?

Roberto

--
Crunchy Data - passion for open source PostgreSQL
Attachment

Re: doc: BRIN indexes and autosummarize

From
Justin Pryzby
Date:
On Tue, Jun 28, 2022 at 05:22:34PM -0600, Roberto Mello wrote:
> Here's a patch to clarify the BRIN indexes documentation, particularly with
> regards to autosummarize, vacuum and autovacuum. It basically breaks down a
> big blob of a paragraph into multiple paragraphs for clarity, plus explicitly
> tells how summarization happens manually or automatically.

See also this older thread
https://www.postgresql.org/message-id/flat/20220224193520.GY9008@telsasoft.com

-- 
Justin



Re: doc: BRIN indexes and autosummarize

From
Alvaro Herrera
Date:
On 2022-Jun-28, Roberto Mello wrote:

> Here's a patch to clarify the BRIN indexes documentation, particularly with
> regards to autosummarize, vacuum and autovacuum. It basically breaks
> down a big blob of a paragraph into multiple paragraphs for clarity,
> plus explicitly tells how summarization happens manually or
> automatically.

[Some of] these additions are wrong actually.  It says that autovacuum
will not summarize new entries; but it does.  If you just let the table
sit idle, any autovacuum run that cleans the table will also summarize
any ranges that need summarization.

What 'autosummarize=off' means is that the behavior of triggering an
immediate summarization of a range once it becomes full is not the default.
This is very different.
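A toy model in C may make the distinction concrete. Everything in it is hypothetical (the Range struct and the three functions are illustrations, not PostgreSQL code). A vacuum of the table summarizes its unsummarized ranges whatever autosummarize says. Autosummarize only adds an immediate request when a range fills, served the next time any autovacuum worker finishes in the database. As a simplification, the sketch also assumes a vacuum leaves the range still being filled for a later run.

```c
#include <assert.h>
#include <stdbool.h>

#define NRANGES 4 /* hypothetical: a tiny table with four page ranges */

typedef struct {
    bool filled;     /* insertions have moved on to the next range */
    bool summarized; /* the range has a BRIN summary tuple */
    bool requested;  /* autosummarize queued a targeted request for it */
} Range;

/* An insertion lands in range r.  That makes range r - 1 full; with
 * autosummarize on, a request for it is queued right away. */
void insert_into(Range *ranges, int r, bool autosummarize)
{
    assert(r >= 0 && r < NRANGES);
    if (r == 0)
        return;
    ranges[r - 1].filled = true;
    if (autosummarize && !ranges[r - 1].summarized)
        ranges[r - 1].requested = true;
}

/* Any autovacuum worker finishing in the database serves queued
 * requests, whether or not it processed this particular table. */
void autovacuum_worker_finishes(Range *ranges)
{
    for (int i = 0; i < NRANGES; i++) {
        if (ranges[i].requested) {
            ranges[i].summarized = true;
            ranges[i].requested = false;
        }
    }
}

/* A vacuum of the table itself, manual or by autovacuum, summarizes
 * its ranges whatever autosummarize says.  Assumption: the range still
 * being filled is left for a later run. */
void vacuum_table(Range *ranges)
{
    for (int i = 0; i < NRANGES; i++) {
        if (ranges[i].filled) {
            ranges[i].summarized = true;
            ranges[i].requested = false;
        }
    }
}
```

With autosummarize off, a worker finishing elsewhere in the database leaves range 0 unsummarized until the table itself is vacuumed; with it on, the same event summarizes range 0 without waiting for a vacuum of the table.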

As for the new <para></para>s that you added, I'd say they're
stylistically wrong.  Each paragraph is supposed to be one fully
contained idea; what these tags do is split each idea across several
smaller paragraphs.  This is likely subjective though.

> On this topic... I'm not familiar with the internals of BRIN
> indexes and in backend/access/common/reloptions.c I see:
> 
>         {
>             "autosummarize",
>             "Enables automatic summarization on this BRIN index",
>             RELOPT_KIND_BRIN,
>             AccessExclusiveLock
>         },
> 
> Is the exclusive lock on the index why autosummarize is off by default?

No.  The lock level mentioned here is what needs to be taken in order to
change the value of this option.
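Put differently, the lock level and the default are separate pieces of the option's definition; the default value is its own field of the entry, not derived from the lock. A simplified C sketch, using hypothetical types (BoolOption, LockLevel) rather than PostgreSQL's own: the lock governs changing the value with ALTER INDEX ... SET, while the default (off, as the CREATE INDEX page documents) is stored independently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for lock modes; not PostgreSQL's definitions. */
typedef enum {
    LOCK_SHARE_UPDATE_EXCLUSIVE,
    LOCK_ACCESS_EXCLUSIVE
} LockLevel;

typedef struct {
    const char *name;
    const char *desc;
    LockLevel   set_lock;      /* lock needed to change the value */
    bool        default_value; /* value used when the option is not given */
} BoolOption;

/* Name, description and lock level copied from the quoted entry; the
 * default of false matches the documented "off". */
static const BoolOption brin_bool_options[] = {
    {"autosummarize", "Enables automatic summarization on this BRIN index",
     LOCK_ACCESS_EXCLUSIVE, false},
};

/* Look up an option by name; NULL if this table does not define it. */
const BoolOption *find_bool_option(const char *name)
{
    size_t n = sizeof(brin_bool_options) / sizeof(brin_bool_options[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(brin_bool_options[i].name, name) == 0)
            return &brin_bool_options[i];
    return NULL;
}
```

Changing the default to on would only touch default_value; the lock taken to alter the setting is unaffected either way.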

> What would be the downside (if any) of having autosummarize=on by default?

I'm not aware of any.  Maybe we should turn it on by default.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: doc: BRIN indexes and autosummarize

From
Alvaro Herrera
Date:
What about this?

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Java is clearly an example of money oriented programming"  (A. Stepanov)

Attachment

Re: doc: BRIN indexes and autosummarize

From
Jaime Casanova
Date:
On Mon, Jul 04, 2022 at 09:38:42PM +0200, Alvaro Herrera wrote:
> What about this?

> diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
> index caf1ea4cef..0a715d41c7 100644
> --- a/doc/src/sgml/brin.sgml
> +++ b/doc/src/sgml/brin.sgml
> @@ -73,31 +73,55 @@
>     summarized range, that range does not automatically acquire a summary
>     tuple; those tuples remain unsummarized until a summarization run is
>     invoked later, creating initial summaries.
> -   This process can be invoked manually using the
> -   <function>brin_summarize_range(regclass, bigint)</function> or
> -   <function>brin_summarize_new_values(regclass)</function> functions;
> -   automatically when <command>VACUUM</command> processes the table;
> -   or by automatic summarization executed by autovacuum, as insertions
> -   occur.  (This last trigger is disabled by default and can be enabled
> -   with the <literal>autosummarize</literal> parameter.)
> -   Conversely, a range can be de-summarized using the
> -   <function>brin_desummarize_range(regclass, bigint)</function> function,
> -   which is useful when the index tuple is no longer a very good
> -   representation because the existing values have changed.
> +  </para>
> +

I feel that somewhere in this paragraph it should be mentioned that it is
off by default.

otherwise, +1

-- 
Jaime Casanova
Director of Professional Services
SystemGuards - PostgreSQL Consultants



Re: doc: BRIN indexes and autosummarize

From
Justin Pryzby
Date:
On Mon, Jul 04, 2022 at 09:38:42PM +0200, Alvaro Herrera wrote:
> +   There are several triggers for initial summarization of a page range
> +   to occur.  If the table is vacuumed, either because
> +   <xref linkend="sql-vacuum" /> has been manually invoked or because
> +   autovacuum causes it,
> +   all existing unsummarized page ranges are summarized.

I'd say "If the table is vacuumed manually or by autovacuum, ..."
(Or "either manually or by autovacuum, ...")

> +   Also, if the index has the
> +   <xref linkend="index-reloption-autosummarize"/> parameter set to on,

Maybe say "If the autosummarize parameter is enabled" (this may avoid needing to
revise it later if we change the default).

> +   then any run of autovacuum in the database will summarize all

I'd avoid saying "run" and instead say "then anytime autovacuum runs in that
database, all ..."

> +   unsummarized page ranges that have been completely filled recently,
> +   regardless of whether the table is processed by autovacuum for other
> +   reasons; see below.

Say "whether the table itself" and remove "for other reasons"?

>    <para>
>     When autosummarization is enabled, each time a page range is filled a

Maybe add a comma after "filled".

> -   request is sent to autovacuum for it to execute a targeted summarization
> -   for that range, to be fulfilled at the end of the next worker run on the
> -   same database.  If the request queue is full, the request is not recorded
> -   and a message is sent to the server log:
> +   request is sent to <literal>autovacuum</literal> for it to execute a targeted
> +   summarization for that range, to be fulfilled at the end of the next
> +   autovacuum worker run on the same database. If the request queue is full, the

"to be fulfilled the next time an autovacuum worker finishes running in that
database."

or

"to be fulfilled by an autovacuum worker the next time it finishes running in that
database."

> +++ b/doc/src/sgml/ref/create_index.sgml
> @@ -580,6 +580,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
>      <para>
>       Defines whether a summarization run is invoked for the previous page
>       range whenever an insertion is detected on the next one.
> +     See <xref linkend="brin-operation"/> for more details.
> +     The default is <literal>off</literal>.

Maybe "invoked" should say "queued"?

Also, a reminder that this was never addressed (I wish the project had a way to
keep track of known issues).

https://www.postgresql.org/message-id/20201113160007.GQ30691@telsasoft.com
|error_severity of brin work item
|left     | could not open relation with OID 292103095
|left     | processing work entry for relation "ts.child.alarms_202010_alarm_clear_time_idx"
|Those happen following a REINDEX job on that index.

This inline patch includes my changes as well as yours.
And the attached patch is my changes only.

diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index caf1ea4cef1..90897a4af07 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -73,31 +73,55 @@
    summarized range, that range does not automatically acquire a summary
    tuple; those tuples remain unsummarized until a summarization run is
    invoked later, creating initial summaries.
-   This process can be invoked manually using the
-   <function>brin_summarize_range(regclass, bigint)</function> or
-   <function>brin_summarize_new_values(regclass)</function> functions;
-   automatically when <command>VACUUM</command> processes the table;
-   or by automatic summarization executed by autovacuum, as insertions
-   occur.  (This last trigger is disabled by default and can be enabled
-   with the <literal>autosummarize</literal> parameter.)
-   Conversely, a range can be de-summarized using the
-   <function>brin_desummarize_range(regclass, bigint)</function> function,
-   which is useful when the index tuple is no longer a very good
-   representation because the existing values have changed.
   </para>
 
   <para>
-   When autosummarization is enabled, each time a page range is filled a
-   request is sent to autovacuum for it to execute a targeted summarization
-   for that range, to be fulfilled at the end of the next worker run on the
-   same database.  If the request queue is full, the request is not recorded
-   and a message is sent to the server log:
+   There are several ways to trigger the initial summarization of a page range.
+   If the table is vacuumed, either manually or by
+   <link linkend="autovacuum">autovacuum</link>,
+   all existing unsummarized page ranges are summarized.
+   Also, if the index's
+   <xref linkend="index-reloption-autosummarize"/> parameter is enabled,
+   whenever autovacuum runs in that database, summarization will
+   occur for all
+   unsummarized page ranges that have been filled,
+   regardless of whether the table itself is processed by autovacuum; see below.
+
+   Lastly, the following functions can be used:
+
+   <simplelist>
+    <member>
+     <function>brin_summarize_range(regclass, bigint)</function>
+     summarizes only the range containing the given page, if it is unsummarized
+    </member>
+    <member>
+     <function>brin_summarize_new_values(regclass)</function>
+     summarizes all unsummarized ranges
+    </member>
+   </simplelist>
+  </para>
+
+  <para>
+   When autosummarization is enabled, each time a page range is filled, a
+   request is sent to <literal>autovacuum</literal> to execute a targeted
+   summarization for that range, to be fulfilled the next time an autovacuum
+   worker finishes running in that database. If the request queue is full, the
+   request is not recorded and a message is sent to the server log:
 <screen>
 LOG:  request for BRIN range summarization for index "brin_wi_idx" page 128 was not recorded
 </screen>
    When this happens, the range will be summarized normally during the next
    regular vacuum of the table.
   </para>
+
+  <para>
+   Conversely, a range can be de-summarized using the
+   <function>brin_desummarize_range(regclass, bigint)</function> function,
+   which is useful when the index tuple is no longer a very good
+   representation because the existing values have changed.
+   See <xref linkend="functions-admin-index"/> for details.
+  </para>
+
  </sect2>
 </sect1>
 
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 9ffcdc629e6..a5bac9f7373 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -578,8 +578,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
     </term>
     <listitem>
     <para>
-     Defines whether a summarization run is invoked for the previous page
+     Defines whether a summarization run is queued for the previous page
      range whenever an insertion is detected on the next one.
+     See <xref linkend="brin-operation"/> for more details.
+     The default is <literal>off</literal>.
     </para>
     </listitem>
    </varlistentry>

Attachment
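The request-queue behavior described in the patch above can be modeled as a bounded queue. This is a sketch only: RequestQueue, QUEUE_CAPACITY, request_summarization() and worker_finishes() are hypothetical names, the capacity of 2 is arbitrary (the real queue has its own fixed size), and the log text mirrors the message quoted in the patch. A request that cannot be recorded is not retried through the queue; that range waits for the next regular vacuum of the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define QUEUE_CAPACITY 2 /* hypothetical, kept tiny so overflow is easy to see */

typedef struct {
    unsigned int pages[QUEUE_CAPACITY]; /* first page of each requested range */
    int count;                          /* number of pending requests */
} RequestQueue;

/* Called when a page range fills and autosummarize is on.  Returns
 * false, after logging, when the request cannot be recorded; that range
 * then waits for the next regular vacuum of the table. */
bool request_summarization(RequestQueue *q, const char *index, unsigned int page)
{
    assert(q->count >= 0 && q->count <= QUEUE_CAPACITY);
    if (q->count == QUEUE_CAPACITY) {
        printf("LOG:  request for BRIN range summarization for index \"%s\" "
               "page %u was not recorded\n", index, page);
        return false;
    }
    q->pages[q->count++] = page;
    return true;
}

/* The next time an autovacuum worker finishes in this database it
 * serves every pending request; returns how many it served. */
int worker_finishes(RequestQueue *q)
{
    int served = q->count;
    q->count = 0;
    return served;
}
```

Once a worker drains the queue, new requests are accepted again; only requests made while it was full are lost to the queue.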

Re: doc: BRIN indexes and autosummarize

From
Alvaro Herrera
Date:
On 2022-Jul-04, Jaime Casanova wrote:

> I feel that somewhere in this paragraph it should be mentioned that it is
> off by default.

OK, I added it.

On 2022-Jul-04, Justin Pryzby wrote:

> [ lots of comments ]

OK, I have adopted all your proposed changes, thanks for submitting in
both forms.  I did some more wordsmithing and pushed, to branches 12 and
up.  11 fails 'make check', I think for lack of Docbook id tags, and I
didn't want to waste more time.  Kindly re-read the result and let me
know if I left something unaddressed, or made something worse.  The
updated text is already visible in the website:
https://www.postgresql.org/docs/devel/brin-intro.html

(Having almost-immediate doc refreshes is an enormous improvement.
Thanks Magnus.)

> Also, a reminder that this was never addressed (I wish the project had a way to
> keep track of known issues).
> 
> https://www.postgresql.org/message-id/20201113160007.GQ30691@telsasoft.com
> |error_severity of brin work item

Yeah, I've not forgotten that item.  I can't promise I'll get it fixed
soon, but it's on my list.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"One never ardently desires what one desires only through reason" (F. Alexandre)



Re: doc: BRIN indexes and autosummarize

From
Roberto Mello
Date:
On Mon, Jul 4, 2022 at 9:20 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> [Some of] these additions are wrong actually.  It says that autovacuum
> will not summarize new entries; but it does.  If you just let the table
> sit idle, any autovacuum run that cleans the table will also summarize
> any ranges that need summarization.
>
> What 'autosummarize=off' means is that the behavior of triggering an
> immediate summarization of a range once it becomes full is not the default.
> This is very different.

Without having read through the code, I'll take your word for it. I simply went with what was written in this phrase of the docs:

"or by automatic summarization executed by autovacuum, as insertions occur. (This last trigger is disabled by default and can be enabled with the autosummarize parameter.)"

To me this did not indicate a third behavior, which is what you are describing, so I'm glad we're having this discussion to clarify it.

> As for the new <para></para>s that you added, I'd say they're
> stylistically wrong.  Each paragraph is supposed to be one fully
> contained idea; what these tags do is split each idea across several
> smaller paragraphs.  This is likely subjective though.

While I don't disagree with you, readability is more important. We have lots of places (such as that one on the docs) where we have a big blob of text, reducing readability, IMHO. In the source they are broken by new lines, but in the rendered HTML, which is what the vast majority of people read, they get rendered into a big blob-looking-thing.

> > What would be the downside (if any) of having autosummarize=on by default?
>
> I'm not aware of any.  Maybe we should turn it on by default.

+1

Thanks for looking at this Alvaro.

Roberto
--
Crunchy Data -- passion for open source PostgreSQL

Re: doc: BRIN indexes and autosummarize

From
Roberto Mello
Date:
On Tue, Jul 5, 2022 at 5:47 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> OK, I have adopted all your proposed changes, thanks for submitting in
> both forms.  I did some more wordsmithing and pushed, to branches 12 and
> up.  11 fails 'make check', I think for lack of Docbook id tags, and I
> didn't want to waste more time.  Kindly re-read the result and let me
> know if I left something unaddressed, or made something worse.  The
> updated text is already visible in the website:
> https://www.postgresql.org/docs/devel/brin-intro.html

You removed the reference to the functions' documentation at
functions-admin-index choosing instead to duplicate a summarized
version of the docs, and to boot getting the next block to be blobbed
together with it.

Keeping with the reduced-readability theme, you made the paragraphs
even bigger. While I do appreciate the time to clarify things a bit, as was
my original intent with the patch,

We should be writing documentation with the user in mind, not for our
developer eyes. Different target audiences. It is less helpful to have
awesome features that don't get used because users can't really
grasp the docs.

Paragraphs such as this feel like we're playing "summary bingo":

     When a new page is created that does not fall within the last
     summarized range, the range that the new page belongs into
     does not automatically acquire a summary tuple;
     those tuples remain unsummarized until a summarization run is
     invoked later, creating the initial summary for that range

Roberto

--
Crunchy Data -- passion for open source PostgreSQL

Re: doc: BRIN indexes and autosummarize

From
Alvaro Herrera
Date:
On 2022-Jul-05, Roberto Mello wrote:

> You removed the reference to the functions' documentation at
> functions-admin-index choosing instead to duplicate a summarized
> version of the docs, and to boot getting the next block to be blobbed
> together with it.

Actually, my first instinct was to move the interesting parts to the
functions docs, then reference those, removing the duplicate bits.  But
I was discouraged when I read it, because it is just a table in a place
not really appropriate for a larger discussion on it.  Also, a reference
to it is not direct, but rather it goes to a table that contains a lot
of other stuff.

> Keeping with the reduced-readability theme, you made the paragraphs
> even bigger. While I do appreciate the time to clarify things a bit, as was
> my original intent with the patch, [...]

Hmm, which paragraph are you referring to?  I'm not aware of having made
any paragraph bigger, quite the opposite.  In the original text, the
paragraph "At the time of creation," is 13 lines on a browser window
that is half the screen; in the patched text, that has been replaced by
three paragraphs that are 7, 6, and 4 lines long, plus a separate one
for the de-summarization bits at the end of the page, which is 3 lines
long.

> We should be writing documentation with the user in mind, not for our
> developer eyes. Different target audiences. It is less helpful to have
> awesome features that don't get used because users can't really
> grasp the docs.

I try to do that.  I guess I fail more frequently than I should.

> Paragraphs such as this feel like we're playing "summary bingo":
> 
>      When a new page is created that does not fall within the last
>      summarized range, the range that the new page belongs into
>      does not automatically acquire a summary tuple;
>      those tuples remain unsummarized until a summarization run is
>      invoked later, creating the initial summary for that range

Yeah, I am aware that the word "summary" and variations occur way too
many times.  Maybe it is possible to replace "summary tuple" with "BRIN
tuple" for example; can you propose some synonym for "summarized" and
"unsummarized"?  Perhaps something like this:

>      When a new page is created that does not fall within the last
>      summarized range, the range that the new page belongs into
>      does not automatically acquire a BRIN tuple;
>      those [pages] remain uncovered by the BRIN index until a summarization run is
>      invoked later, creating the initial BRIN tuple for that range

(I also replaced the word "tuples" with "pages" in one spot.)
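For reference, the range a page belongs to follows from the index's pages_per_range setting (128 by default). A small C sketch, with a hypothetical BlockNum type and hypothetical helper names, assuming the table grows by appending pages at the end:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int BlockNum; /* hypothetical stand-in for a heap block number */

/* First page of the page range that contains heap page blk. */
BlockNum range_start(BlockNum blk, BlockNum pages_per_range)
{
    assert(pages_per_range > 0);
    return (blk / pages_per_range) * pages_per_range;
}

/* True when blk is the first page of its range: when the table grows by
 * appending pages, adding this page opens a range with no BRIN tuple yet. */
bool opens_new_range(BlockNum blk, BlockNum pages_per_range)
{
    return range_start(blk, pages_per_range) == blk;
}
```

With the default of 128, page 300 falls in the range starting at page 256, and page 128 opens a new range, consistent with the "page 128" in the log line quoted earlier in the thread.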

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"There are two times in a man's life when he should not
speculate: when he can afford it and when he can't" (Mark Twain)



Re: doc: BRIN indexes and autosummarize

From
Justin Pryzby
Date:
On Tue, Jul 05, 2022 at 01:47:27PM +0200, Alvaro Herrera wrote:
> OK, I have adopted all your proposed changes, thanks for submitting in
> both forms.  I did some more wordsmithing and pushed, to branches 12 and
> up.  11 fails 'make check', I think for lack of Docbook id tags, and I
> didn't want to waste more time.  Kindly re-read the result and let me
> know if I left something unaddressed, or made something worse.  The
> updated text is already visible in the website:
> https://www.postgresql.org/docs/devel/brin-intro.html

One issue:

+   summarized range, the range that the new page belongs into
+   does not automatically acquire a summary tuple;

"belongs into" sounds wrong - "belongs to" is better.

I'll put that change into my "typos" branch to fix later if it's not addressed
in this thread.

-- 
Justin



Re: doc: BRIN indexes and autosummarize

From
Alvaro Herrera
Date:
On 2022-Jul-05, Justin Pryzby wrote:

> One issue:
> 
> +   summarized range, the range that the new page belongs into
> +   does not automatically acquire a summary tuple;
> 
> "belongs into" sounds wrong - "belongs to" is better.

Hah, and I was wondering if "belongs in" was any better.

> I'll put that change into my "typos" branch to fix later if it's not addressed
> in this thread.

Roberto has some more substantive comments on the new text, so let's try
and fix everything together.  This time, I'll let you guys come up with
a new patch.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/