Thread: Failed to request an autovacuum work-item in silence
Hi all, While reading the code, I realized that the requesting an autovacuum work-item could fail in silence if work-item array is full. So the users cannot realize that work-item is never performed. AutoVacuumRequestWork() seems to behave so from the initial implementation but is there any reason of such behavior? It seems to me that it can be a problem even now that there is only one kind of work-item. Attached patch for fixing it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
Hi all,
While reading the code, I realized that the requesting an autovacuum
work-item could fail in silence if work-item array is full. So the
users cannot realize that work-item is never performed.
AutoVacuumRequestWork() seems to behave so from the initial
implementation but is there any reason of such behavior? It seems to
me that it can be a problem even now that there is only one kind of
work-item. Attached patch for fixing it.
Seems reasonable but maybe you can use the word "worker" instead of "work item" for report message.
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote: > > Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada <sawada.mshk@gmail.com> > escreveu: >> >> Hi all, >> >> While reading the code, I realized that the requesting an autovacuum >> work-item could fail in silence if work-item array is full. So the >> users cannot realize that work-item is never performed. >> AutoVacuumRequestWork() seems to behave so from the initial >> implementation but is there any reason of such behavior? It seems to >> me that it can be a problem even now that there is only one kind of >> work-item. Attached patch for fixing it. > > > Seems reasonable but maybe you can use the word "worker" instead of "work > item" for report message. > Thank you for the comment. Or can we use the word "work-item" since the commit log and source code use this word? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Tue, Jan 23, 2018 at 11:44 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello
> <fabriziomello@gmail.com> wrote:
> >
> > Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada <sawada.mshk@gmail.com>
> > escreveu:
> >>
> >> Hi all,
> >>
> >> While reading the code, I realized that the requesting an autovacuum
> >> work-item could fail in silence if work-item array is full. So the
> >> users cannot realize that work-item is never performed.
> >> AutoVacuumRequestWork() seems to behave so from the initial
> >> implementation but is there any reason of such behavior? It seems to
> >> me that it can be a problem even now that there is only one kind of
> >> work-item. Attached patch for fixing it.
> >
> >
> > Seems reasonable but maybe you can use the word "worker" instead of "work
> > item" for report message.
> >
>
> Thank you for the comment.
> Or can we use the word "work-item" since the commit log and source
> code use this word?
>
You're correct, I misunderstood it thinking about autovacuum workers and not the internal workitem array.
As NUM_WORKITEMS is fixed in 256 we don't have any real feedback if in a real workload this can send a lot of messages to log, so:
1) maybe invent a new GUC to control if we need or not to send this info to log
2) change elevel for DEBUG1
Regards,
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
On Wed, Jan 24, 2018 at 12:31 PM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:
>
>
>
> On Tue, Jan 23, 2018 at 11:44 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello
> > <fabriziomello@gmail.com> wrote:
> > >
> > > Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada <sawada.mshk@gmail.com>
> > > escreveu:
> > >>
> > >> Hi all,
> > >>
> > >> While reading the code, I realized that the requesting an autovacuum
> > >> work-item could fail in silence if work-item array is full. So the
> > >> users cannot realize that work-item is never performed.
> > >> AutoVacuumRequestWork() seems to behave so from the initial
> > >> implementation but is there any reason of such behavior? It seems to
> > >> me that it can be a problem even now that there is only one kind of
> > >> work-item. Attached patch for fixing it.
> > >
> > >
> > > Seems reasonable but maybe you can use the word "worker" instead of "work
> > > item" for report message.
> > >
> >
> > Thank you for the comment.
> > Or can we use the word "work-item" since the commit log and source
> > code use this word?
> >
>
> You're correct, I misunderstood it thinking about autovacuum workers and not the internal workitem array.
>
> As NUM_WORKITEMS is fixed in 256 we don't have any real feedback if in a real workload this can send a lot of messages to log, so:
> 1) maybe invent a new GUC to control if we need or not to send this info to log
> 2) change elevel for DEBUG1
>
Looking better for the autovacuum code your patch will work just for BRIN request "brin_summarize_range", correct?
Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
On Thu, Jan 25, 2018 at 12:14 AM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote: > > > On Wed, Jan 24, 2018 at 12:31 PM, Fabrízio de Royes Mello > <fabriziomello@gmail.com> wrote: >> >> >> >> On Tue, Jan 23, 2018 at 11:44 PM, Masahiko Sawada <sawada.mshk@gmail.com> >> wrote: >> > >> > On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello >> > <fabriziomello@gmail.com> wrote: >> > > >> > > Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada >> > > <sawada.mshk@gmail.com> >> > > escreveu: >> > >> >> > >> Hi all, >> > >> >> > >> While reading the code, I realized that the requesting an autovacuum >> > >> work-item could fail in silence if work-item array is full. So the >> > >> users cannot realize that work-item is never performed. >> > >> AutoVacuumRequestWork() seems to behave so from the initial >> > >> implementation but is there any reason of such behavior? It seems to >> > >> me that it can be a problem even now that there is only one kind of >> > >> work-item. Attached patch for fixing it. >> > > >> > > >> > > Seems reasonable but maybe you can use the word "worker" instead of >> > > "work >> > > item" for report message. >> > > >> > >> > Thank you for the comment. >> > Or can we use the word "work-item" since the commit log and source >> > code use this word? >> > >> >> You're correct, I misunderstood it thinking about autovacuum workers and >> not the internal workitem array. >> >> As NUM_WORKITEMS is fixed in 256 we don't have any real feedback if in a >> real workload this can send a lot of messages to log, so: >> 1) maybe invent a new GUC to control if we need or not to send this info >> to log >> 2) change elevel for DEBUG1 >> Hmm, I'd rather think to log the message at LOG level and to have a new GUC to control the size of maximum of work-items array . I think we should always notify users that work-item couldn't get added because it can become a potential performance problem. Also it might lead other problems once we have other types of work-item. In the log message, we can hint for user that you might want to increase maximum of work-items array. > > Looking better for the autovacuum code your patch will work just for BRIN > request "brin_summarize_range", correct? > Correct. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Thu, Jan 25, 2018 at 10:41 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Thu, Jan 25, 2018 at 12:14 AM, Fabrízio de Royes Mello > <fabriziomello@gmail.com> wrote: >> >> >> On Wed, Jan 24, 2018 at 12:31 PM, Fabrízio de Royes Mello >> <fabriziomello@gmail.com> wrote: >>> >>> >>> >>> On Tue, Jan 23, 2018 at 11:44 PM, Masahiko Sawada <sawada.mshk@gmail.com> >>> wrote: >>> > >>> > On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello >>> > <fabriziomello@gmail.com> wrote: >>> > > >>> > > Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada >>> > > <sawada.mshk@gmail.com> >>> > > escreveu: >>> > >> >>> > >> Hi all, >>> > >> >>> > >> While reading the code, I realized that the requesting an autovacuum >>> > >> work-item could fail in silence if work-item array is full. So the >>> > >> users cannot realize that work-item is never performed. >>> > >> AutoVacuumRequestWork() seems to behave so from the initial >>> > >> implementation but is there any reason of such behavior? It seems to >>> > >> me that it can be a problem even now that there is only one kind of >>> > >> work-item. Attached patch for fixing it. >>> > > >>> > > >>> > > Seems reasonable but maybe you can use the word "worker" instead of >>> > > "work >>> > > item" for report message. >>> > > >>> > >>> > Thank you for the comment. >>> > Or can we use the word "work-item" since the commit log and source >>> > code use this word? >>> > >>> >>> You're correct, I misunderstood it thinking about autovacuum workers and >>> not the internal workitem array. >>> >>> As NUM_WORKITEMS is fixed in 256 we don't have any real feedback if in a >>> real workload this can send a lot of messages to log, so: >>> 1) maybe invent a new GUC to control if we need or not to send this info >>> to log >>> 2) change elevel for DEBUG1 >>> > > Hmm, I'd rather think to log the message at LOG level and to have a > new GUC to control the size of maximum of work-items array . I think > we should always notify users that work-item couldn't get added > because it can become a potential performance problem. Also it might > lead other problems once we have other types of work-item. In the log > message, we can hint for user that you might want to increase maximum > of work-items array. > >> >> Looking better for the autovacuum code your patch will work just for BRIN >> request "brin_summarize_range", correct? >> > > Correct. > I've added this to next CF so as not to forget. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi, On 2018-01-23 14:35:42 +0900, Masahiko Sawada wrote: > While reading the code, I realized that the requesting an autovacuum > work-item could fail in silence if work-item array is full. So the > users cannot realize that work-item is never performed. > AutoVacuumRequestWork() seems to behave so from the initial > implementation but is there any reason of such behavior? It seems to > me that it can be a problem even now that there is only one kind of > work-item. Attached patch for fixing it. Alvaro? - Andres
Masahiko Sawada wrote: > While reading the code, I realized that the requesting an autovacuum > work-item could fail in silence if work-item array is full. So the > users cannot realize that work-item is never performed. > AutoVacuumRequestWork() seems to behave so from the initial > implementation but is there any reason of such behavior? It seems to > me that it can be a problem even now that there is only one kind of > work-item. Attached patch for fixing it. Hmm, yeah. I think it would be better to return false to caller, and have them report the failure. Then they're in a better position to place an accurate log message -- for instance indicating the relation name rather than just OID, and specify work item type rather than some weird integer. (I think the ereport() should definitely be *outside* the locked section, in any case.) I'm inclined to change the API of AutoVacuumRequestWork even in branch 10. Since this stuff is not nicely extensible (c.f. perform_work_item), it doesn't seem likely that any outside code is calling it yet. Once we have hooks to register worker functions and stuff, it'll become more of a problem. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thank you the comment. On Fri, Mar 2, 2018 at 4:18 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Masahiko Sawada wrote: > >> While reading the code, I realized that the requesting an autovacuum >> work-item could fail in silence if work-item array is full. So the >> users cannot realize that work-item is never performed. >> AutoVacuumRequestWork() seems to behave so from the initial >> implementation but is there any reason of such behavior? It seems to >> me that it can be a problem even now that there is only one kind of >> work-item. Attached patch for fixing it. > > Hmm, yeah. > > I think it would be better to return false to caller, and have them > report the failure. Then they're in a better position to place an > accurate log message -- for instance indicating the relation name rather > than just OID, and specify work item type rather than some weird > integer. (I think the ereport() should definitely be *outside* the > locked section, in any case.) Agreed. Attached an updated patch. > > I'm inclined to change the API of AutoVacuumRequestWork even in branch > 10. Since this stuff is not nicely extensible (c.f. perform_work_item), > it doesn't seem likely that any outside code is calling it yet. Once we > have hooks to register worker functions and stuff, it'll become more of > a problem. +1 Since this API cannot be execute actually other than brin summarization I think we can change it in branch 10. It's good if autovacuum work-items can be added by extensions and so on. Also I'm inclined to change the autovacuum launcher so that it can launcher new worker for performing work-item or to allow requests to specify the execution timing (before or after vacuum), it's for PG12 item though. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Hello, The patch applies and compiles cleanly, tests pass. The code is working as well. I was able to check it by simply creating a BRIN index and filling the table with data forcing the index to request lots of work items: create table test (a serial, b text); create index on test using brin (a) with (pages_per_range=1, autosummarize=true); insert into test select i, repeat(md5(random()::text), 30) from generate_series(1, 3000) i; LOG: could not request an autovacuum work item "BrinSummarizeRange" for "test_a_idx" LOG: could not request an autovacuum work item "BrinSummarizeRange" for "test_a_idx" ... Just couple remarks. I would rename 'requested' variable in AutoVacuumRequestWork() func to something like 'success' or 'result'. Because request is something caller does. And I would also rephrase log message as follows: request for autovacuum work item "%s" for "%s" failed Thanks! -- Ildar Musin Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Hi I was thinking that the BRIN code requesting the workitem would print the error message based on the return value. There is no point to returning a boolean indicator if the caller isn't going to do anything with it ... This means you don't need to convert the type to string in autovacuum.c (which would defeat attempts at generalizing this code). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thank you for reviewing! On Thu, Mar 8, 2018 at 6:07 PM, Ildar Musin <i.musin@postgrespro.ru> wrote: > Just couple remarks. I would rename 'requested' variable in > AutoVacuumRequestWork() func to something like 'success' or 'result'. > Because request is something caller does. And I would also rephrase log > message as follows: > > request for autovacuum work item "%s" for "%s" failed Agreed. On Thu, Mar 8, 2018 at 10:46 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Hi > > I was thinking that the BRIN code requesting the workitem would print > the error message based on the return value. There is no point to > returning a boolean indicator if the caller isn't going to do anything > with it ... This means you don't need to convert the type to string in > autovacuum.c (which would defeat attempts at generalizing this code). > Agreed. Attached an updated patch. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachment
Hi, autovac_get_workitem_name() declaration seems redundant and should be removed. The same thing with including "utils/lsyscache.h" in brin.c. The 'requested' variable in brininsert() I would again rename to something like 'success' because a work item is requested anyway but what matters is whether the request was satisfied/successful. Except for those two points everything is fine and works as expected. -- Ildar Musin i.musin@postgrespro.ru
Hello, Ildar Musin wrote: > autovac_get_workitem_name() declaration seems redundant and should be > removed. The same thing with including "utils/lsyscache.h" in brin.c. > > The 'requested' variable in brininsert() I would again rename to something > like 'success' because a work item is requested anyway but what matters is > whether the request was satisfied/successful. Thanks, I pushed this. I agree with your comments; so I changed 'requested' to 'recorded' and removed those lines. I also reworded the log message: ereport(LOG, (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("request for BRIN range summarization for index \"%s\" page %u was not recorded", RelationGetRelationName(idxRel), lastPageRange))); And added a paragraph to the docs explaining this situation. Now I'm wondering what will we tell users to do if they get this message too frequently. Neither of the obvious options (1. changing the index's pages_per_range to a larger value; 2. making autovacuum more frequent somehow) seem terribly useful. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Mar 15, 2018 at 12:06 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Hello, > > Ildar Musin wrote: > >> autovac_get_workitem_name() declaration seems redundant and should be >> removed. The same thing with including "utils/lsyscache.h" in brin.c. >> >> The 'requested' variable in brininsert() I would again rename to something >> like 'success' because a work item is requested anyway but what matters is >> whether the request was satisfied/successful. > > Thanks, I pushed this. I agree with your comments; so I changed > 'requested' to 'recorded' and removed those lines. Thank you! >I also reworded the > log message: > > ereport(LOG, > (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), > errmsg("request for BRIN range summarization for index \"%s\" page %u was not recorded", > RelationGetRelationName(idxRel), > lastPageRange))); > > And added a paragraph to the docs explaining this situation. > > Now I'm wondering what will we tell users to do if they get this message > too frequently. Neither of the obvious options (1. changing the index's > pages_per_range to a larger value; 2. making autovacuum more frequent > somehow) seem terribly useful. Or telling users to call brin_summarize_range() manually? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Masahiko Sawada wrote: > On Thu, Mar 15, 2018 at 12:06 AM, Alvaro Herrera > <alvherre@alvh.no-ip.org> wrote: > > Now I'm wondering what will we tell users to do if they get this message > > too frequently. Neither of the obvious options (1. changing the index's > > pages_per_range to a larger value; 2. making autovacuum more frequent > > somehow) seem terribly useful. > > Or telling users to call brin_summarize_range() manually? Yeah, they can do that to fix the situation with each unsummarized range, but that won't silence the log message ... oh! Unless you call it to summarize the range ahead of time -- I think that should fix it. Is that what you were thinking? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Mar 15, 2018 at 7:35 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Masahiko Sawada wrote: >> On Thu, Mar 15, 2018 at 12:06 AM, Alvaro Herrera >> <alvherre@alvh.no-ip.org> wrote: > >> > Now I'm wondering what will we tell users to do if they get this message >> > too frequently. Neither of the obvious options (1. changing the index's >> > pages_per_range to a larger value; 2. making autovacuum more frequent >> > somehow) seem terribly useful. >> >> Or telling users to call brin_summarize_range() manually? > > Yeah, they can do that to fix the situation with each unsummarized > range, but that won't silence the log message ... oh! Unless you call it > to summarize the range ahead of time -- I think that should fix it. Is > that what you were thinking? > Yes. Or with this situation since the multiple work-item for the same index but different block number might be listed the calling brin_summarize_new_values() manually would rather fix the situation more properly? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center