Thread: doc: BRIN indexes and autosummarize
Here's a patch to clarify the BRIN indexes documentation, particularly with regards
to autosummarize, vacuum and autovacuum. It basically breaks down a big blob of a
paragraph into multiple paragraphs for clarity, plus explicitly tells how summarization
happens manually or automatically.
I also added cross-references to various relevant sections, including the create index
page.
On this topic... I'm not familiar with the internals of BRIN indexes, and in
backend/access/common/reloptions.c I see:
{
    "autosummarize",
    "Enables automatic summarization on this BRIN index",
    RELOPT_KIND_BRIN,
    AccessExclusiveLock
},
Is the exclusive lock on the index why autosummarize is off by default?
What would be the downside (if any) of having autosummarize=on by default?
Roberto
--
Crunchy Data - passion for open source PostgreSQL
Attachment
On Tue, Jun 28, 2022 at 05:22:34PM -0600, Roberto Mello wrote:
> Here's a patch to clarify the BRIN indexes documentation, particularly with
> regards to autosummarize, vacuum and autovacuum. It basically breaks down a
> big blob of a paragraph into multiple paragraphs for clarity, plus explicitly
> tells how summarization happens manually or automatically.

See also this older thread
https://www.postgresql.org/message-id/flat/20220224193520.GY9008@telsasoft.com

--
Justin
On 2022-Jun-28, Roberto Mello wrote:

> Here's a patch to clarify the BRIN indexes documentation, particularly with
> regards to autosummarize, vacuum and autovacuum. It basically breaks
> down a big blob of a paragraph into multiple paragraphs for clarity,
> plus explicitly tells how summarization happens manually or
> automatically.

[Some of] these additions are wrong actually. It says that autovacuum
will not summarize new entries; but it does. If you just let the table
sit idle, any autovacuum run that cleans the table will also summarize
any ranges that need summarization.

What 'autosummarize=off' means is that the behavior to trigger an
immediate summarization of a range once it becomes full is not default.
This is very different.

As for the new <para></para>s that you added, I'd say they're
stylistically wrong. Each paragraph is supposed to be one fully
contained idea; what these tags do is split each idea across several
smaller paragraphs. This is likely subjective though.

> On this topic... I'm not familiar with the internals of BRIN
> indexes and in backend/access/common/reloptions.c I see:
>
> {
>     "autosummarize",
>     "Enables automatic summarization on this BRIN index",
>     RELOPT_KIND_BRIN,
>     AccessExclusiveLock
> },
>
> Is the exclusive lock on the index why autosummarize is off by default?

No. The lock level mentioned here is what needs to be taken in order to
change the value of this option.

> What would be the downside (if any) of having autosummarize=on by default?

I'm not aware of any. Maybe we should turn it on by default.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
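To make the distinction above concrete, here is a toy C model. It is not PostgreSQL code, and every name in it (`ToyBrin`, `toy_insert`, `toy_autovacuum_run`, `toy_vacuum_table`, `TOY_MAX_RANGES`) is hypothetical. It assumes the simplified behavior described in this thread: vacuuming the table summarizes every unsummarized range whatever the setting, while `autosummarize` only adds a queued request for the previous range when an insertion lands on the first page of the next one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: tracks which page ranges have a summary tuple.
 * Names and sizes are hypothetical, not taken from PostgreSQL. */
#define TOY_MAX_RANGES 64

typedef struct ToyBrin
{
    unsigned pages_per_range;            /* heap pages covered by one range */
    bool     autosummarize;              /* mirrors the index reloption */
    bool     summarized[TOY_MAX_RANGES]; /* range has a summary tuple */
    bool     requested[TOY_MAX_RANGES];  /* targeted request is queued */
} ToyBrin;

/* Index of the page range that heap block blk falls into. */
static unsigned toy_range_of(const ToyBrin *idx, unsigned blk)
{
    assert(idx->pages_per_range > 0);
    return blk / idx->pages_per_range;
}

/* An insertion into heap block blk. Inserting never summarizes by itself.
 * With autosummarize on, an insertion on the first page of range r > 0 is
 * taken to mean range r - 1 is full, so a request for it is queued. */
static void toy_insert(ToyBrin *idx, unsigned blk)
{
    unsigned r = toy_range_of(idx, blk);

    assert(r < TOY_MAX_RANGES);
    if (idx->autosummarize && r > 0 && blk % idx->pages_per_range == 0
        && !idx->summarized[r - 1])
        idx->requested[r - 1] = true;
}

/* Any autovacuum run in the database serves queued requests, whether or
 * not it also vacuums this particular table. */
static void toy_autovacuum_run(ToyBrin *idx)
{
    for (unsigned r = 0; r < TOY_MAX_RANGES; r++)
    {
        if (idx->requested[r])
        {
            idx->summarized[r] = true;
            idx->requested[r] = false;
        }
    }
}

/* Vacuuming the table (manually or by autovacuum) summarizes every
 * unsummarized range of a table with nblocks heap blocks, regardless of
 * the autosummarize setting. */
static void toy_vacuum_table(ToyBrin *idx, unsigned nblocks)
{
    if (nblocks == 0)
        return;

    unsigned last = toy_range_of(idx, nblocks - 1);

    assert(last < TOY_MAX_RANGES);
    for (unsigned r = 0; r <= last; r++)
    {
        idx->summarized[r] = true;
        idx->requested[r] = false;
    }
}
```

For instance, with `pages_per_range = 4` and insertions into blocks 0 through 8, an index with `autosummarize = false` still has no summaries after `toy_autovacuum_run()`, whereas `toy_vacuum_table(&idx, 9)` covers ranges 0 to 2; with `autosummarize = true`, the autovacuum run alone summarizes the two full ranges 0 and 1.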
What about this?

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Java is clearly an example of money oriented programming" (A. Stepanov)
Attachment
On Mon, Jul 04, 2022 at 09:38:42PM +0200, Alvaro Herrera wrote:
> What about this?
>
> --
> Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
> "Java is clearly an example of money oriented programming" (A. Stepanov)
> diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
> index caf1ea4cef..0a715d41c7 100644
> --- a/doc/src/sgml/brin.sgml
> +++ b/doc/src/sgml/brin.sgml
> @@ -73,31 +73,55 @@
> summarized range, that range does not automatically acquire a summary
> tuple; those tuples remain unsummarized until a summarization run is
> invoked later, creating initial summaries.
> - This process can be invoked manually using the
> - <function>brin_summarize_range(regclass, bigint)</function> or
> - <function>brin_summarize_new_values(regclass)</function> functions;
> - automatically when <command>VACUUM</command> processes the table;
> - or by automatic summarization executed by autovacuum, as insertions
> - occur. (This last trigger is disabled by default and can be enabled
> - with the <literal>autosummarize</literal> parameter.)
> - Conversely, a range can be de-summarized using the
> - <function>brin_desummarize_range(regclass, bigint)</function> function,
> - which is useful when the index tuple is no longer a very good
> - representation because the existing values have changed.
> + </para>
> +

I feel that somewhere in this paragraph it should be mentioned that it is
off by default.

Otherwise, +1

--
Jaime Casanova
Director of Professional Services
SystemGuards - PostgreSQL Consultants
On Mon, Jul 04, 2022 at 09:38:42PM +0200, Alvaro Herrera wrote:
> + There are several triggers for initial summarization of a page range
> + to occur. If the table is vacuumed, either because
> + <xref linkend="sql-vacuum" /> has been manually invoked or because
> + autovacuum causes it,
> + all existing unsummarized page ranges are summarized.

I'd say "If the table is vacuumed manually or by autovacuum, ..."
(Or "either manually or by autovacuum, ...")

> + Also, if the index has the
> + <xref linkend="index-reloption-autosummarize"/> parameter set to on,

Maybe say "If the autosummarize parameter is enabled" (this may avoid needing
to revise it later if we change the default).

> + then any run of autovacuum in the database will summarize all

I'd avoid saying "run" and instead say "then anytime autovacuum runs in that
database, all ..."

> + unsummarized page ranges that have been completely filled recently,
> + regardless of whether the table is processed by autovacuum for other
> + reasons; see below.

say "whether the table itself" and remove "for other reasons" ?

> <para>
> When autosummarization is enabled, each time a page range is filled a

Maybe: filled comma

> - request is sent to autovacuum for it to execute a targeted summarization
> - for that range, to be fulfilled at the end of the next worker run on the
> - same database. If the request queue is full, the request is not recorded
> - and a message is sent to the server log:
> + request is sent to <literal>autovacuum</literal> for it to execute a targeted
> + summarization for that range, to be fulfilled at the end of the next
> + autovacuum worker run on the same database. If the request queue is full, the

"to be fulfilled the next time an autovacuum worker finishes running in that
database."

or

"to be fulfilled by an autovacuum worker the next time it finishes running in
that database."
> +++ b/doc/src/sgml/ref/create_index.sgml
> @@ -580,6 +580,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
> <para>
> Defines whether a summarization run is invoked for the previous page
> range whenever an insertion is detected on the next one.
> + See <xref linkend="brin-operation"/> for more details.
> + The default is <literal>off</literal>.

Maybe "invoked" should say "queued" ?

Also, a reminder that this was never addressed (I wish the project had a way to
keep track of known issues).

https://www.postgresql.org/message-id/20201113160007.GQ30691@telsasoft.com
|error_severity of brin work item
|left | could not open relation with OID 292103095
|left | processing work entry for relation "ts.child.alarms_202010_alarm_clear_time_idx"
|Those happen following a REINDEX job on that index.

This inline patch includes my changes as well as yours.
And the attached patch is my changes only.

diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml
index caf1ea4cef1..90897a4af07 100644
--- a/doc/src/sgml/brin.sgml
+++ b/doc/src/sgml/brin.sgml
@@ -73,31 +73,55 @@
 summarized range, that range does not automatically acquire a summary
 tuple; those tuples remain unsummarized until a summarization run is
 invoked later, creating initial summaries.
- This process can be invoked manually using the
- <function>brin_summarize_range(regclass, bigint)</function> or
- <function>brin_summarize_new_values(regclass)</function> functions;
- automatically when <command>VACUUM</command> processes the table;
- or by automatic summarization executed by autovacuum, as insertions
- occur. (This last trigger is disabled by default and can be enabled
- with the <literal>autosummarize</literal> parameter.)
- Conversely, a range can be de-summarized using the
- <function>brin_desummarize_range(regclass, bigint)</function> function,
- which is useful when the index tuple is no longer a very good
- representation because the existing values have changed.
 </para>

 <para>
- When autosummarization is enabled, each time a page range is filled a
- request is sent to autovacuum for it to execute a targeted summarization
- for that range, to be fulfilled at the end of the next worker run on the
- same database. If the request queue is full, the request is not recorded
- and a message is sent to the server log:
+ There are several ways to trigger the initial summarization of a page range.
+ If the table is vacuumed, either manually or by
+ <link linkend="autovacuum">autovacuum</link>,
+ all existing unsummarized page ranges are summarized.
+ Also, if the index's
+ <xref linkend="index-reloption-autosummarize"/> parameter is enabled,
+ whenever autovacuum runs in that database, summarization will
+ occur for all
+ unsummarized page ranges that have been filled,
+ regardless of whether the table itself is processed by autovacuum; see below.
+
+ Lastly, the following functions can be used:
+
+ <simplelist>
+ <member>
+ <function>brin_summarize_range(regclass, bigint)</function>
+ summarizes one specific range, if it is unsummarized
+ </member>
+ <member>
+ <function>brin_summarize_new_values(regclass)</function>
+ summarizes all unsummarized ranges
+ </member>
+ </simplelist>
+ </para>
+
+ <para>
+ When autosummarization is enabled, each time a page range is filled, a
+ request is sent to <literal>autovacuum</literal> to execute a targeted
+ summarization for that range, to be fulfilled the next time an autovacuum
+ worker finishes running in that database. If the request queue is full, the
+ request is not recorded and a message is sent to the server log:
 <screen>
 LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was not recorded
 </screen>
 When this happens, the range will be summarized normally during the next
 regular vacuum of the table.
 </para>
+
+ <para>
+ Conversely, a range can be de-summarized using the
+ <function>brin_desummarize_range(regclass, bigint)</function> function,
+ which is useful when the index tuple is no longer a very good
+ representation because the existing values have changed.
+ See <xref linkend="functions-admin-index"/> for details.
+ </para>
+
 </sect2>
 </sect1>
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 9ffcdc629e6..a5bac9f7373 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -578,8 +578,10 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class=
 </term>
 <listitem>
 <para>
- Defines whether a summarization run is invoked for the previous page
+ Defines whether a summarization run is queued for the previous page
 range whenever an insertion is detected on the next one.
+ See <xref linkend="brin-operation"/> for more details.
+ The default is <literal>off</literal>.
 </para>
 </listitem>
 </varlistentry>
Attachment
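The queue-full case in the patched text (the request is not recorded, a LOG line is emitted, and the range is left for the next regular vacuum) can be sketched as a fixed-capacity array. This is a hypothetical illustration: `RequestQueue`, `request_summarization`, `drain_requests` and the capacity of 2 are made up for the example, and only the LOG wording is copied from the documentation text in the patch. PostgreSQL's actual autovacuum work-item mechanism is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical capacity, chosen small so the overflow path is easy to hit. */
#define QUEUE_CAPACITY 2

typedef struct SummarizeRequest
{
    const char *index_name;   /* index whose range should be summarized */
    unsigned    block;        /* heap block identifying the filled range */
} SummarizeRequest;

typedef struct RequestQueue
{
    SummarizeRequest items[QUEUE_CAPACITY];
    int              count;
} RequestQueue;

/* Record a targeted summarization request. If the queue is full, the
 * request is dropped and a LOG line is printed; per the documented
 * behavior, that range is then left for the next regular vacuum. */
static bool request_summarization(RequestQueue *q, const char *index_name,
                                  unsigned block)
{
    assert(q != NULL && index_name != NULL);
    if (q->count == QUEUE_CAPACITY)
    {
        fprintf(stderr,
                "LOG: request for BRIN range summarization for index \"%s\" "
                "page %u was not recorded\n",
                index_name, block);
        return false;
    }
    q->items[q->count].index_name = index_name;
    q->items[q->count].block = block;
    q->count++;
    return true;
}

/* A worker finishing its run serves every recorded request (here it just
 * empties the queue) and reports how many it handled. */
static int drain_requests(RequestQueue *q)
{
    int served = q->count;

    q->count = 0;
    return served;
}
```

With a capacity of 2, a third request is rejected and logged rather than blocking the inserting backend; after `drain_requests()` empties the queue, the same request can be recorded again.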
On 2022-Jul-04, Jaime Casanova wrote:

> I feel that somewhere in this paragraph it should be mentioned that it is
> off by default.

OK, I added it.

On 2022-Jul-04, Justin Pryzby wrote:

> [ lots of comments ]

OK, I have adopted all your proposed changes, thanks for submitting in
both forms. I did some more wordsmithing and pushed, to branches 12 and
up. 11 fails 'make check', I think for lack of Docbook id tags, and I
didn't want to waste more time. Kindly re-read the result and let me
know if I left something unaddressed, or made something worse. The
updated text is already visible in the website:
https://www.postgresql.org/docs/devel/brin-intro.html

(Having almost-immediate doc refreshes is an enormous improvement.
Thanks Magnus.)

> Also, a reminder that this was never addressed (I wish the project had a way to
> keep track of known issues).
>
> https://www.postgresql.org/message-id/20201113160007.GQ30691@telsasoft.com
> |error_severity of brin work item

Yeah, I've not forgotten that item. I can't promise I'll get it fixed
soon, but it's on my list.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"One never ardently desires what one desires only through reason" (F. Alexandre)
On Mon, Jul 4, 2022 at 9:20 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> [Some of] these additions are wrong actually. It says that autovacuum
> will not summarize new entries; but it does. If you just let the table
> sit idle, any autovacuum run that cleans the table will also summarize
> any ranges that need summarization.
>
> What 'autosummarize=off' means is that the behavior to trigger an
> immediate summarization of a range once it becomes full is not default.
> This is very different.
Without having read through the code, I'll take your word for it. I simply went with what was written in this phrase of the docs:

"or by automatic summarization executed by autovacuum, as insertions occur. (This last trigger is disabled by default and can be enabled with the autosummarize parameter.)"

To me this did not indicate a third behavior, which is what you are describing, so I'm glad we're having this discussion to clarify it.
> As for the new <para></para>s that you added, I'd say they're
> stylistically wrong. Each paragraph is supposed to be one fully
> contained idea; what these tags do is split each idea across several
> smaller paragraphs. This is likely subjective though.
While I don't disagree with you, readability is more important. We have lots of places (such as that one in the docs) where we have a big blob of text, reducing readability, IMHO. In the source they are broken by new lines, but in the rendered HTML, which is what the vast majority of people read, they get rendered into a big blob-looking-thing.
>> What would be the downside (if any) of having autosummarize=on by default?
> I'm not aware of any. Maybe we should turn it on by default.
+1
Thanks for looking at this Alvaro.
Roberto
--
Crunchy Data -- passion for open source PostgreSQL
On Tue, Jul 5, 2022 at 5:47 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> OK, I have adopted all your proposed changes, thanks for submitting in
> both forms. I did some more wordsmithing and pushed, to branches 12 and
> up. 11 fails 'make check', I think for lack of Docbook id tags, and I
> didn't want to waste more time. Kindly re-read the result and let me
> know if I left something unaddressed, or made something worse. The
> updated text is already visible in the website:
> https://www.postgresql.org/docs/devel/brin-intro.html
You removed the reference to the functions' documentation at
functions-admin-index choosing instead to duplicate a summarized
version of the docs, and to boot getting the next block to be blobbed
together with it.
Keeping with the reduced-readability theme, you made the paragraphs
even bigger. While I do appreciate the time to clarify things a bit, as was
my original intent with the patch,
We should be writing documentation with the user in mind, not for our
developer eyes. Different target audiences. It is less helpful to have
awesome features that don't get used because users can't really
grasp the docs.
Paragraphs such as this feel like we're playing "summary bingo":
When a new page is created that does not fall within the last
summarized range, the range that the new page belongs into
does not automatically acquire a summary tuple;
those tuples remain unsummarized until a summarization run is
invoked later, creating the initial summary for that range
Roberto
--
Crunchy Data -- passion for open source PostgreSQL
On 2022-Jul-05, Roberto Mello wrote:

> You removed the reference to the functions' documentation at
> functions-admin-index choosing instead to duplicate a summarized
> version of the docs, and to boot getting the next block to be blobbed
> together with it.

Actually, my first instinct was to move the interesting parts to the
functions docs, then reference those, removing the duplicate bits. But
I was discouraged when I read it, because it is just a table in a place
not really appropriate for a larger discussion on it. Also, a reference
to it is not direct, but rather it goes to a table that contains a lot
of other stuff.

> Keeping with the reduced-readability theme, you made the paragraphs
> even bigger. While I do appreciate the time to clarify things a bit, as was
> my original intent with the patch, [...]

Hmm, which paragraph are you referring to? I'm not aware of having made
any paragraph bigger, quite the opposite. In the original text, the
paragraph "At the time of creation," is 13 lines on a browser window
that is half the screen; in the patched text, that has been replaced by
three paragraphs that are 7, 6, and 4 lines long, plus a separate one
for the de-summarization bits at the end of the page, which is 3 lines
long.

> We should be writing documentation with the user in mind, not for our
> developer eyes. Different target audiences. It is less helpful to have
> awesome features that don't get used because users can't really
> grasp the docs.

I try to do that. I guess I fail more frequently than I should.

> Paragraphs such as this feel like we're playing "summary bingo":
>
> When a new page is created that does not fall within the last
> summarized range, the range that the new page belongs into
> does not automatically acquire a summary tuple;
> those tuples remain unsummarized until a summarization run is
> invoked later, creating the initial summary for that range

Yeah, I am aware that the word "summary" and variations occur way too
many times.
Maybe it is possible to replace "summary tuple" with "BRIN tuple" for
example; can you propose some synonym for "summarized" and
"unsummarized"? Perhaps something like this:

> When a new page is created that does not fall within the last
> summarized range, the range that the new page belongs into
> does not automatically acquire a BRIN tuple;
> those [pages] remain uncovered by the BRIN index until a summarization run is
> invoked later, creating the initial BRIN tuple for that range

(I also replaced the word "tuples" with "pages" in one spot.)

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"There are two times in a man's life when he should not speculate: when he can't afford it, and when he can" (Mark Twain)
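Since the reworded paragraph is about which range a new page belongs to and when that range becomes covered, a toy C sketch may help. It assumes the default of 128 pages per range and models `brin_summarize_range(regclass, bigint)` as summarizing only the range covering the given block, and `brin_summarize_new_values(regclass)` as summarizing every uncovered range. The names `toy_summarize_range`, `toy_summarize_new_values`, `range_start` and the `covered` array are hypothetical stand-ins, not the real SQL functions.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGES_PER_RANGE 128u   /* PostgreSQL's default pages_per_range */
#define MAX_RANGES 16u         /* hypothetical bound for the toy table */

/* Toy state: covered[r] becomes true once range r has a BRIN tuple. */
static bool covered[MAX_RANGES];

/* First heap block of the range that blk belongs to. */
static unsigned range_start(unsigned blk)
{
    return blk - blk % PAGES_PER_RANGE;
}

/* Toy analogue of brin_summarize_range(index, blk): summarize only the
 * range covering blk, if it is not yet covered. Returns 0 or 1. */
static int toy_summarize_range(unsigned blk)
{
    unsigned r = blk / PAGES_PER_RANGE;

    assert(r < MAX_RANGES);
    if (covered[r])
        return 0;
    covered[r] = true;
    return 1;
}

/* Toy analogue of brin_summarize_new_values(index): summarize every
 * uncovered range of a table with nblocks heap blocks. */
static int toy_summarize_new_values(unsigned nblocks)
{
    int n = 0;

    for (unsigned blk = 0; blk < nblocks; blk += PAGES_PER_RANGE)
        n += toy_summarize_range(blk);
    return n;
}
```

For a table that grows from 256 to 300 blocks after ranges 0 and 1 were summarized, block 299 lands in range 2 (blocks 256 to 383), which stays uncovered until either toy function is called for it; calling `toy_summarize_range(299)` a second time is a no-op.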
On Tue, Jul 05, 2022 at 01:47:27PM +0200, Alvaro Herrera wrote:
> OK, I have adopted all your proposed changes, thanks for submitting in
> both forms. I did some more wordsmithing and pushed, to branches 12 and
> up. 11 fails 'make check', I think for lack of Docbook id tags, and I
> didn't want to waste more time. Kindly re-read the result and let me
> know if I left something unaddressed, or made something worse. The
> updated text is already visible in the website:
> https://www.postgresql.org/docs/devel/brin-intro.html

One issue:

+ summarized range, the range that the new page belongs into
+ does not automatically acquire a summary tuple;

"belongs into" sounds wrong - "belongs to" is better.

I'll put that change into my "typos" branch to fix later if it's not addressed
in this thread.

--
Justin
On 2022-Jul-05, Justin Pryzby wrote:

> One issue:
>
> + summarized range, the range that the new page belongs into
> + does not automatically acquire a summary tuple;
>
> "belongs into" sounds wrong - "belongs to" is better.

Hah, and I was wondering if "belongs in" was any better.

> I'll put that change into my "typos" branch to fix later if it's not addressed
> in this thread.

Roberto has some more substantive comments on the new text, so let's try
and fix everything together. This time, I'll let you guys come up with a
new patch.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/