Thread: Failed to request an autovacuum work-item in silence

Failed to request an autovacuum work-item in silence

From
Masahiko Sawada
Date:
Hi all,

While reading the code, I realized that the requesting an autovacuum
work-item could fail in silence if work-item array is full. So the
users cannot realize that work-item is never performed.
AutoVacuumRequestWork() seems to behave so from the initial
implementation but is there any reason of such behavior? It seems to
me that it can be a problem even now that there is only one kind of
work-item. Attached patch for fixing it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Failed to request an autovacuum work-item in silence

From
Fabrízio de Royes Mello
Date:

Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada <sawada.mshk@gmail.com> escreveu:
Hi all,

While reading the code, I realized that the requesting an autovacuum
work-item could fail in silence if work-item array is full. So the
users cannot realize that work-item is never performed.
AutoVacuumRequestWork() seems to behave so from the initial
implementation but is there any reason of such behavior? It seems to
me that it can be a problem even now that there is only one kind of
work-item. Attached patch for fixing it.

Seems reasonable but maybe you can use the word "worker" instead of "work item" for report message.


--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL

Re: Failed to request an autovacuum work-item in silence

From
Masahiko Sawada
Date:
On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
>
> Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada <sawada.mshk@gmail.com>
> escreveu:
>>
>> Hi all,
>>
>> While reading the code, I realized that the requesting an autovacuum
>> work-item could fail in silence if work-item array is full. So the
>> users cannot realize that work-item is never performed.
>> AutoVacuumRequestWork() seems to behave so from the initial
>> implementation but is there any reason of such behavior? It seems to
>> me that it can be a problem even now that there is only one kind of
>> work-item. Attached patch for fixing it.
>
>
> Seems reasonable but maybe you can use the word "worker" instead of "work
> item" for report message.
>

Thank you for the comment.
Or can we use the word "work-item" since the commit log and source
code use this word?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Failed to request an autovacuum work-item in silence

From
Fabrízio de Royes Mello
Date:


On Tue, Jan 23, 2018 at 11:44 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello
> <fabriziomello@gmail.com> wrote:
> >
> > Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada <sawada.mshk@gmail.com>
> > escreveu:
> >>
> >> Hi all,
> >>
> >> While reading the code, I realized that the requesting an autovacuum
> >> work-item could fail in silence if work-item array is full. So the
> >> users cannot realize that work-item is never performed.
> >> AutoVacuumRequestWork() seems to behave so from the initial
> >> implementation but is there any reason of such behavior? It seems to
> >> me that it can be a problem even now that there is only one kind of
> >> work-item. Attached patch for fixing it.
> >
> >
> > Seems reasonable but maybe you can use the word "worker" instead of "work
> > item" for report message.
> >
>
> Thank you for the comment.
> Or can we use the word "work-item" since the commit log and source
> code use this word?
>

You're correct, I misunderstood it thinking about autovacuum workers and not the internal workitem array.

As NUM_WORKITEMS is fixed in 256 we don't have any real feedback if in a real workload this can send a lot of messages to log, so:
1) maybe invent a new GUC to control if we need or not to send this info to log
2) change elevel for DEBUG1

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: Failed to request an autovacuum work-item in silence

From
Fabrízio de Royes Mello
Date:


On Wed, Jan 24, 2018 at 12:31 PM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:
>
>
>
> On Tue, Jan 23, 2018 at 11:44 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello
> > <fabriziomello@gmail.com> wrote:
> > >
> > > Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada <sawada.mshk@gmail.com>
> > > escreveu:
> > >>
> > >> Hi all,
> > >>
> > >> While reading the code, I realized that the requesting an autovacuum
> > >> work-item could fail in silence if work-item array is full. So the
> > >> users cannot realize that work-item is never performed.
> > >> AutoVacuumRequestWork() seems to behave so from the initial
> > >> implementation but is there any reason of such behavior? It seems to
> > >> me that it can be a problem even now that there is only one kind of
> > >> work-item. Attached patch for fixing it.
> > >
> > >
> > > Seems reasonable but maybe you can use the word "worker" instead of "work
> > > item" for report message.
> > >
> >
> > Thank you for the comment.
> > Or can we use the word "work-item" since the commit log and source
> > code use this word?
> >
>
> You're correct, I misunderstood it thinking about autovacuum workers and not the internal workitem array.
>
> As NUM_WORKITEMS is fixed in 256 we don't have any real feedback if in a real workload this can send a lot of messages to log, so:
> 1) maybe invent a new GUC to control if we need or not to send this info to log
> 2) change elevel for DEBUG1
>

Looking better for the autovacuum code your patch will work just for BRIN request "brin_summarize_range", correct?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: Failed to request an autovacuum work-item in silence

From
Masahiko Sawada
Date:
On Thu, Jan 25, 2018 at 12:14 AM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
>
>
> On Wed, Jan 24, 2018 at 12:31 PM, Fabrízio de Royes Mello
> <fabriziomello@gmail.com> wrote:
>>
>>
>>
>> On Tue, Jan 23, 2018 at 11:44 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>> wrote:
>> >
>> > On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello
>> > <fabriziomello@gmail.com> wrote:
>> > >
>> > > Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada
>> > > <sawada.mshk@gmail.com>
>> > > escreveu:
>> > >>
>> > >> Hi all,
>> > >>
>> > >> While reading the code, I realized that the requesting an autovacuum
>> > >> work-item could fail in silence if work-item array is full. So the
>> > >> users cannot realize that work-item is never performed.
>> > >> AutoVacuumRequestWork() seems to behave so from the initial
>> > >> implementation but is there any reason of such behavior? It seems to
>> > >> me that it can be a problem even now that there is only one kind of
>> > >> work-item. Attached patch for fixing it.
>> > >
>> > >
>> > > Seems reasonable but maybe you can use the word "worker" instead of
>> > > "work
>> > > item" for report message.
>> > >
>> >
>> > Thank you for the comment.
>> > Or can we use the word "work-item" since the commit log and source
>> > code use this word?
>> >
>>
>> You're correct, I misunderstood it thinking about autovacuum workers and
>> not the internal workitem array.
>>
>> As NUM_WORKITEMS is fixed in 256 we don't have any real feedback if in a
>> real workload this can send a lot of messages to log, so:
>> 1) maybe invent a new GUC to control if we need or not to send this info
>> to log
>> 2) change elevel for DEBUG1
>>

Hmm, I'd rather think to log the message at LOG level and to have a
new GUC to control the size of maximum of work-items array . I think
we should always notify users that work-item couldn't get added
because it can become a potential performance problem. Also it might
lead other problems once we have other types of work-item. In the log
message, we can hint for user that you might want to increase maximum
of work-items array.

>
> Looking better for the autovacuum code your patch will work just for BRIN
> request "brin_summarize_range", correct?
>

Correct.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Failed to request an autovacuum work-item in silence

From
Masahiko Sawada
Date:
On Thu, Jan 25, 2018 at 10:41 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Thu, Jan 25, 2018 at 12:14 AM, Fabrízio de Royes Mello
> <fabriziomello@gmail.com> wrote:
>>
>>
>> On Wed, Jan 24, 2018 at 12:31 PM, Fabrízio de Royes Mello
>> <fabriziomello@gmail.com> wrote:
>>>
>>>
>>>
>>> On Tue, Jan 23, 2018 at 11:44 PM, Masahiko Sawada <sawada.mshk@gmail.com>
>>> wrote:
>>> >
>>> > On Tue, Jan 23, 2018 at 8:03 PM, Fabrízio de Royes Mello
>>> > <fabriziomello@gmail.com> wrote:
>>> > >
>>> > > Em ter, 23 de jan de 2018 às 03:36, Masahiko Sawada
>>> > > <sawada.mshk@gmail.com>
>>> > > escreveu:
>>> > >>
>>> > >> Hi all,
>>> > >>
>>> > >> While reading the code, I realized that the requesting an autovacuum
>>> > >> work-item could fail in silence if work-item array is full. So the
>>> > >> users cannot realize that work-item is never performed.
>>> > >> AutoVacuumRequestWork() seems to behave so from the initial
>>> > >> implementation but is there any reason of such behavior? It seems to
>>> > >> me that it can be a problem even now that there is only one kind of
>>> > >> work-item. Attached patch for fixing it.
>>> > >
>>> > >
>>> > > Seems reasonable but maybe you can use the word "worker" instead of
>>> > > "work
>>> > > item" for report message.
>>> > >
>>> >
>>> > Thank you for the comment.
>>> > Or can we use the word "work-item" since the commit log and source
>>> > code use this word?
>>> >
>>>
>>> You're correct, I misunderstood it thinking about autovacuum workers and
>>> not the internal workitem array.
>>>
>>> As NUM_WORKITEMS is fixed in 256 we don't have any real feedback if in a
>>> real workload this can send a lot of messages to log, so:
>>> 1) maybe invent a new GUC to control if we need or not to send this info
>>> to log
>>> 2) change elevel for DEBUG1
>>>
>
> Hmm, I'd rather think to log the message at LOG level and to have a
> new GUC to control the size of maximum of work-items array . I think
> we should always notify users that work-item couldn't get added
> because it can become a potential performance problem. Also it might
> lead other problems once we have other types of work-item. In the log
> message, we can hint for user that you might want to increase maximum
> of work-items array.
>
>>
>> Looking better for the autovacuum code your patch will work just for BRIN
>> request "brin_summarize_range", correct?
>>
>
> Correct.
>

I've added this to next CF so as not to forget.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Failed to request an autovacuum work-item in silence

From
Andres Freund
Date:
Hi,

On 2018-01-23 14:35:42 +0900, Masahiko Sawada wrote:
> While reading the code, I realized that the requesting an autovacuum
> work-item could fail in silence if work-item array is full. So the
> users cannot realize that work-item is never performed.
> AutoVacuumRequestWork() seems to behave so from the initial
> implementation but is there any reason of such behavior? It seems to
> me that it can be a problem even now that there is only one kind of
> work-item. Attached patch for fixing it.

Alvaro?

- Andres


Re: Failed to request an autovacuum work-item in silence

From
Alvaro Herrera
Date:
Masahiko Sawada wrote:

> While reading the code, I realized that the requesting an autovacuum
> work-item could fail in silence if work-item array is full. So the
> users cannot realize that work-item is never performed.
> AutoVacuumRequestWork() seems to behave so from the initial
> implementation but is there any reason of such behavior? It seems to
> me that it can be a problem even now that there is only one kind of
> work-item. Attached patch for fixing it.

Hmm, yeah.

I think it would be better to return false to caller, and have them
report the failure.  Then they're in a better position to place an
accurate log message -- for instance indicating the relation name rather
than just OID, and specify work item type rather than some weird
integer.  (I think the ereport() should definitely be *outside* the
locked section, in any case.)

I'm inclined to change the API of AutoVacuumRequestWork even in branch
10.  Since this stuff is not nicely extensible (c.f. perform_work_item),
it doesn't seem likely that any outside code is calling it yet.  Once we
have hooks to register worker functions and stuff, it'll become more of
a problem.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Failed to request an autovacuum work-item in silence

From
Masahiko Sawada
Date:
Thank you the comment.

On Fri, Mar 2, 2018 at 4:18 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Masahiko Sawada wrote:
>
>> While reading the code, I realized that the requesting an autovacuum
>> work-item could fail in silence if work-item array is full. So the
>> users cannot realize that work-item is never performed.
>> AutoVacuumRequestWork() seems to behave so from the initial
>> implementation but is there any reason of such behavior? It seems to
>> me that it can be a problem even now that there is only one kind of
>> work-item. Attached patch for fixing it.
>
> Hmm, yeah.
>
> I think it would be better to return false to caller, and have them
> report the failure.  Then they're in a better position to place an
> accurate log message -- for instance indicating the relation name rather
> than just OID, and specify work item type rather than some weird
> integer.  (I think the ereport() should definitely be *outside* the
> locked section, in any case.)

Agreed. Attached an updated patch.

>
> I'm inclined to change the API of AutoVacuumRequestWork even in branch
> 10.  Since this stuff is not nicely extensible (c.f. perform_work_item),
> it doesn't seem likely that any outside code is calling it yet.  Once we
> have hooks to register worker functions and stuff, it'll become more of
> a problem.

+1
Since this API cannot be execute actually other than brin
summarization I think we can change it in branch 10.

It's good if autovacuum work-items can be added by extensions and so
on. Also I'm inclined to change the autovacuum launcher so that it can
launcher new worker for performing work-item or to allow requests to
specify the execution timing (before or after vacuum), it's for PG12
item though.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Failed to request an autovacuum work-item in silence

From
Ildar Musin
Date:
Hello,

The patch applies and compiles cleanly, tests pass. The code is working
as well. I was able to check it by simply creating a BRIN index and
filling the table with data forcing the index to request lots of work items:

create table test (a serial, b text);
create index on test using brin (a) with (pages_per_range=1,
autosummarize=true);
insert into test select i, repeat(md5(random()::text), 30) from
generate_series(1, 3000) i;
LOG:  could not request an autovacuum work item "BrinSummarizeRange" for
"test_a_idx"
LOG:  could not request an autovacuum work item "BrinSummarizeRange" for
"test_a_idx"
...

Just couple remarks. I would rename 'requested' variable in
AutoVacuumRequestWork() func to something like 'success' or 'result'.
Because request is something caller does. And I would also rephrase log
message as follows:

   request for autovacuum work item "%s" for "%s" failed

Thanks!

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 



Re: Failed to request an autovacuum work-item in silence

From
Alvaro Herrera
Date:
Hi

I was thinking that the BRIN code requesting the workitem would print
the error message based on the return value.  There is no point to
returning a boolean indicator if the caller isn't going to do anything
with it ...  This means you don't need to convert the type to string in
autovacuum.c (which would defeat attempts at generalizing this code).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Failed to request an autovacuum work-item in silence

From
Masahiko Sawada
Date:
Thank you for reviewing!

On Thu, Mar 8, 2018 at 6:07 PM, Ildar Musin <i.musin@postgrespro.ru> wrote:
> Just couple remarks. I would rename 'requested' variable in
> AutoVacuumRequestWork() func to something like 'success' or 'result'.
> Because request is something caller does. And I would also rephrase log
> message as follows:
>
>    request for autovacuum work item "%s" for "%s" failed

Agreed.

On Thu, Mar 8, 2018 at 10:46 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Hi
>
> I was thinking that the BRIN code requesting the workitem would print
> the error message based on the return value.  There is no point to
> returning a boolean indicator if the caller isn't going to do anything
> with it ...  This means you don't need to convert the type to string in
> autovacuum.c (which would defeat attempts at generalizing this code).
>

Agreed.

Attached an updated patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment

Re: Failed to request an autovacuum work-item in silence

From
Ildar Musin
Date:
Hi,

autovac_get_workitem_name() declaration seems redundant and should be 
removed. The same thing with including "utils/lsyscache.h" in brin.c.

The 'requested' variable in brininsert() I would again rename to 
something like 'success' because a work item is requested anyway but 
what matters is whether the request was satisfied/successful.

Except for those two points everything is fine and works as expected.


-- 
Ildar Musin
i.musin@postgrespro.ru


Re: Failed to request an autovacuum work-item in silence

From
Alvaro Herrera
Date:
Hello,

Ildar Musin wrote:

> autovac_get_workitem_name() declaration seems redundant and should be
> removed. The same thing with including "utils/lsyscache.h" in brin.c.
> 
> The 'requested' variable in brininsert() I would again rename to something
> like 'success' because a work item is requested anyway but what matters is
> whether the request was satisfied/successful.

Thanks, I pushed this.  I agree with your comments; so I changed
'requested' to 'recorded' and removed those lines.  I also reworded the
log message:

                    ereport(LOG,
                            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                             errmsg("request for BRIN range summarization for index \"%s\" page %u was not recorded",
                                    RelationGetRelationName(idxRel),
                                    lastPageRange)));

And added a paragraph to the docs explaining this situation.

Now I'm wondering what will we tell users to do if they get this message
too frequently.  Neither of the obvious options (1. changing the index's
pages_per_range to a larger value;  2. making autovacuum more frequent
somehow) seem terribly useful.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Failed to request an autovacuum work-item in silence

From
Masahiko Sawada
Date:
On Thu, Mar 15, 2018 at 12:06 AM, Alvaro Herrera
<alvherre@alvh.no-ip.org> wrote:
> Hello,
>
> Ildar Musin wrote:
>
>> autovac_get_workitem_name() declaration seems redundant and should be
>> removed. The same thing with including "utils/lsyscache.h" in brin.c.
>>
>> The 'requested' variable in brininsert() I would again rename to something
>> like 'success' because a work item is requested anyway but what matters is
>> whether the request was satisfied/successful.
>
> Thanks, I pushed this.  I agree with your comments; so I changed
> 'requested' to 'recorded' and removed those lines.

Thank you!

>I also reworded the
> log message:
>
>                     ereport(LOG,
>                             (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
>                              errmsg("request for BRIN range summarization for index \"%s\" page %u was not
recorded",
>                                     RelationGetRelationName(idxRel),
>                                     lastPageRange)));
>
> And added a paragraph to the docs explaining this situation.
>
> Now I'm wondering what will we tell users to do if they get this message
> too frequently.  Neither of the obvious options (1. changing the index's
> pages_per_range to a larger value;  2. making autovacuum more frequent
> somehow) seem terribly useful.

Or telling users to call brin_summarize_range() manually?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Failed to request an autovacuum work-item in silence

From
Alvaro Herrera
Date:
Masahiko Sawada wrote:
> On Thu, Mar 15, 2018 at 12:06 AM, Alvaro Herrera
> <alvherre@alvh.no-ip.org> wrote:

> > Now I'm wondering what will we tell users to do if they get this message
> > too frequently.  Neither of the obvious options (1. changing the index's
> > pages_per_range to a larger value;  2. making autovacuum more frequent
> > somehow) seem terribly useful.
> 
> Or telling users to call brin_summarize_range() manually?

Yeah, they can do that to fix the situation with each unsummarized
range, but that won't silence the log message ... oh! Unless you call it
to summarize the range ahead of time -- I think that should fix it.  Is
that what you were thinking?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Failed to request an autovacuum work-item in silence

From
Masahiko Sawada
Date:
On Thu, Mar 15, 2018 at 7:35 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Masahiko Sawada wrote:
>> On Thu, Mar 15, 2018 at 12:06 AM, Alvaro Herrera
>> <alvherre@alvh.no-ip.org> wrote:
>
>> > Now I'm wondering what will we tell users to do if they get this message
>> > too frequently.  Neither of the obvious options (1. changing the index's
>> > pages_per_range to a larger value;  2. making autovacuum more frequent
>> > somehow) seem terribly useful.
>>
>> Or telling users to call brin_summarize_range() manually?
>
> Yeah, they can do that to fix the situation with each unsummarized
> range, but that won't silence the log message ... oh! Unless you call it
> to summarize the range ahead of time -- I think that should fix it.  Is
> that what you were thinking?
>

Yes. Or with this situation since the multiple work-item for the same
index but different block number might be listed the calling
brin_summarize_new_values() manually would rather fix the situation
more properly?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center