Thread: [RFC] Shouldn't we remove annoying FATAL messages from server log?

[RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
Hello,

My customers and colleagues sometimes (or often?) ask about the following 
message:

FATAL:  the database system is starting up

This message is often output dozens of times during a failover or PITR.  The 
users seem to be worried because the message level is FATAL and they don't 
know why such severe message is output in a successful failover and 
recovery.  I can blame the users, because the message is merely a 
sub-product of pg_ctl's internal ping.

Similarly, the below message is output when I stop the standby server 
normally.  Why FATAL as a result of successful operation?  I'm afraid DBAs 
are annoyed by these messages, as system administration software collects 
ERROR and more severe messages for daily monitoring.

FATAL:  terminating walreceiver process due to administrator command

Shouldn't we lower the severity or avoiding those messages to server log? 
How about the following measures?

1. FATAL:  the database system is starting up
2. FATAL:  the database system is shutting down
3. FATAL:  the database system is in recovery mode
4. FATAL:  sorry, too many clients already
Report these as FATAL to the client because the client wants to know the 
reason.  But don't output them to server log because they are not necessary 
for DBAs (4 is subtle.)

5. FATAL:  terminating walreceiver process due to administrator command
6. FATAL:  terminating background worker \"%s\" due to administrator command
Don't output these to server log.  Why are they necessary?  For 
troubleshooting purposes?  If necessary, the severity should be LOG (but I 
wonder why other background processes are not reported...)


To suppress server log output, I think we can do as follows.  I guess 
ereport(FATAL) is still needed for easily handling both client report and 
process termination.

log_min_messages = PANIC;
ereport(FATAL,   (errcode(ERRCODE_CANNOT_CONNECT_NOW),   errmsg("the database system is starting up")));


May I hear your opinions?

Regards
MauMau




Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Peter Eisentraut
Date:
On 12/5/13, 10:25 AM, MauMau wrote:
> Report these as FATAL to the client because the client wants to know the
> reason.  But don't output them to server log because they are not
> necessary for DBAs

Yeah, this is part of a more general problem, which you have
characterized correctly: What is fatal (or error, or warning, ...) to
the client isn't necessarily fatal (or error, or warning, ...) to the
server or DBA.  Fixing this would need a larger enhancement of the
logging infrastructure.  It's been discussed before, but it's a bit of work.




Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Tom Lane
Date:
"MauMau" <maumau307@gmail.com> writes:
> Shouldn't we lower the severity or avoiding those messages to server log? 

No.  They are FATAL so far as the individual session is concerned.
Possibly some documentation effort is needed here, but I don't think
any change in the code behavior would be an improvement.

> 1. FATAL:  the database system is starting up
> 2. FATAL:  the database system is shutting down
> 3. FATAL:  the database system is in recovery mode
> 4. FATAL:  sorry, too many clients already
> Report these as FATAL to the client because the client wants to know the 
> reason.  But don't output them to server log because they are not necessary 
> for DBAs (4 is subtle.)

The notion that a DBA should not be allowed to find out how often #4 is
happening is insane.
        regards, tom lane



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
David Johnston
Date:
Tom Lane-2 wrote
> "MauMau" <

> maumau307@

> > writes:
>> Shouldn't we lower the severity or avoiding those messages to server log? 
> 
> No.  They are FATAL so far as the individual session is concerned.
> Possibly some documentation effort is needed here, but I don't think
> any change in the code behavior would be an improvement.
> 
>> 1. FATAL:  the database system is starting up
>> 2. FATAL:  the database system is shutting down
>> 3. FATAL:  the database system is in recovery mode
>> 4. FATAL:  sorry, too many clients already
>> Report these as FATAL to the client because the client wants to know the 
>> reason.  But don't output them to server log because they are not
>> necessary 
>> for DBAs (4 is subtle.)
> 
> The notion that a DBA should not be allowed to find out how often #4 is
> happening is insane.

Agreed #4 is definitely DBA territory.

ISTM that instituting some level of categorization for messages would be
helpful.  Then logging and reporting frameworks would be able to identify
and segregate the logs in whatever way they and the configuration deems
appropriate.

FATAL: [LOGON] too many clients already

I'd make the category output disabled by default for a long while then
eventually enabled by default but leave the ability to disable.  Calls that
do not supply a category get [N/A] output in category mode.

David J.







--
View this message in context:
http://postgresql.1045698.n5.nabble.com/RFC-Shouldn-t-we-remove-annoying-FATAL-messages-from-server-log-tp5781899p5781925.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Stephen Frost
Date:
* David Johnston (polobo@yahoo.com) wrote:
> ISTM that instituting some level of categorization for messages would be
> helpful.  Then logging and reporting frameworks would be able to identify
> and segregate the logs in whatever way they and the configuration deems
> appropriate.

I've wanted to do that and have even discussed it with folks in the
past, the trick is finding enough toit's, which is difficult when you
start to look at the size of the task...
Thanks,
    Stephen

Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Josh Berkus
Date:
On 12/05/2013 10:21 AM, Stephen Frost wrote:
> * David Johnston (polobo@yahoo.com) wrote:
>> ISTM that instituting some level of categorization for messages would be
>> helpful.  Then logging and reporting frameworks would be able to identify
>> and segregate the logs in whatever way they and the configuration deems
>> appropriate.
> 
> I've wanted to do that and have even discussed it with folks in the
> past, the trick is finding enough toit's, which is difficult when you
> start to look at the size of the task...

But ... if we set a firm policy on this, then we could gradually clean
up the error messages piecemeal over the next couple of major versions.We could also make sure that any new features
compliedwith the
 
categorization policy.

Right now, how to categorize errors is up to each individual patch
author, which means that things are all over the place, and get worse
with each new feature added.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 12/05/2013 10:21 AM, Stephen Frost wrote:
> But ... if we set a firm policy on this, then we could gradually clean
> up the error messages piecemeal over the next couple of major versions.
>  We could also make sure that any new features complied with the
> categorization policy.

> Right now, how to categorize errors is up to each individual patch
> author, which means that things are all over the place, and get worse
> with each new feature added.

I don't think there's that much randomness in is-it-an-ERROR-or-not.
What I believe Stephen is talking about is a classification that
simply doesn't exist today, namely something around how likely is the
message to be of interest to a DBA as opposed to the client application.
We currently compose messages almost entirely with the client in mind,
and that's as it should be.  But we could use some new decoration that's
more DBA-oriented to help decide what goes into the postmaster log.

Before we could get very far we'd need a better understanding than we have
of what cases a DBA might be interested in.  To take the specific example
that started this thread, there wouldn't be a lot of value IMO in a
classification like "connection failure messages".  I think the OP is
probably right that those are often uninteresting --- but as I mentioned,
"too many clients" might become interesting if he's wondering whether he
needs to enlarge max_connections.  Or password failure cases might become
interesting if he starts to suspect breakin attempts.  So I'd want to see
a design that credibly covers those sorts of needs before we put any large
effort into code changes.
        regards, tom lane



Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Josh Berkus
Date:
On 12/05/2013 10:46 AM, Tom Lane wrote:
> Before we could get very far we'd need a better understanding than we have
> of what cases a DBA might be interested in.  To take the specific example
> that started this thread, there wouldn't be a lot of value IMO in a
> classification like "connection failure messages".  I think the OP is
> probably right that those are often uninteresting --- but as I mentioned,
> "too many clients" might become interesting if he's wondering whether he
> needs to enlarge max_connections.  Or password failure cases might become
> interesting if he starts to suspect breakin attempts.  So I'd want to see
> a design that credibly covers those sorts of needs before we put any large
> effort into code changes.

Heck, I'd be happy just to have a class of messages which specifically
means "OMG, there's something wrong with the server", that is, a flag
for messages which only occur when PostgreSQL encounters a bug, data
corrpution, or platform error.  Right now, I have to suss those out by
regex.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Alvaro Herrera
Date:
Josh Berkus wrote:

> Heck, I'd be happy just to have a class of messages which specifically
> means "OMG, there's something wrong with the server", that is, a flag
> for messages which only occur when PostgreSQL encounters a bug, data
> corrpution, or platform error.  Right now, I have to suss those out by
> regex.

My proposal was to have something separate from message severity
("criticality").  So the problems would continue to be reported as
FATAL, ERROR or WARNING, but if they are just the result of something
the user did wrong, then they get marked as "non critical", but if
there, say, a failure to flush xlog (which currently results in an
ERROR), we could flag it as critical.  Grepping the log for critical
messages, regardless of severity, would result in actual action items
for the DBA, without having to grep things out by regex.

Otherwise you have to come up with a lot of messages each keeping the
current behavior of abort the current transaction or not, or terminate
the process or not.

There was also the idea that this would be driven off SQLSTATE but this
seems pretty unwieldy to me.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "Peter Eisentraut" <peter_e@gmx.net>
> Yeah, this is part of a more general problem, which you have
> characterized correctly: What is fatal (or error, or warning, ...) to
> the client isn't necessarily fatal (or error, or warning, ...) to the
> server or DBA.

Thanks.  In addition, #5 and #6 in my previous mail are even unnecessary for 
both the client and the DBA, aren't they?


> Fixing this would need a larger enhancement of the
> logging infrastructure.  It's been discussed before, but it's a bit of 
> work.

How about the easy fix I proposed?  The current logging infrastructure seems 
enough to solve the original problem with small effort without complicating 
the code.  If you don't like "log_min_messages = PANIC", SetConfigOption() 
can be used instead.  I think we'd better take a step to eliminate the 
facing problem, as well as consider a much richer infrastracture in the long 
run.  I'm also interested in the latter, and want to discuss it after 
solving the problem in front of me.

Regards
MauMau





Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "Tom Lane" <tgl@sss.pgh.pa.us>
> No.  They are FATAL so far as the individual session is concerned.
> Possibly some documentation effort is needed here, but I don't think
> any change in the code behavior would be an improvement.

You are suggesting that we should add a note like "Don't worry about the 
following message.  This is a result of normal connectivity checking", don't 
you?

FATAL:  the database system is starting up

But I doubt most users would recognize such notes.  Anyway, lots of such 
messages certainly make monitoring and troubleshooting harder, because 
valuable messages are buried.


>> 4. FATAL:  sorry, too many clients already
>> Report these as FATAL to the client because the client wants to know the
>> reason.  But don't output them to server log because they are not 
>> necessary
>> for DBAs (4 is subtle.)
>
> The notion that a DBA should not be allowed to find out how often #4 is
> happening is insane.

I thought someone would point out so.  You are right, #4 is a strong hint 
for the DBA about max_connection setting or connection pool configuration.


Regards
MauMau




Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "Josh Berkus" <josh@agliodbs.com>
> Heck, I'd be happy just to have a class of messages which specifically
> means "OMG, there's something wrong with the server", that is, a flag
> for messages which only occur when PostgreSQL encounters a bug, data
> corrpution, or platform error.  Right now, I have to suss those out by
> regex.

What are the issues of using SQLSTATE XXnnn as a filter?

Regards
MauMau




Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Andres Freund
Date:
On 2013-12-06 22:35:21 +0900, MauMau wrote:
> From: "Tom Lane" <tgl@sss.pgh.pa.us>
> >No.  They are FATAL so far as the individual session is concerned.
> >Possibly some documentation effort is needed here, but I don't think
> >any change in the code behavior would be an improvement.
> 
> You are suggesting that we should add a note like "Don't worry about the
> following message.  This is a result of normal connectivity checking", don't
> you?
> 
> FATAL:  the database system is starting up

Uh. An explanation why you cannot connect to the database hardly seems
like a superflous log message.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "Alvaro Herrera" <alvherre@2ndquadrant.com>
> There was also the idea that this would be driven off SQLSTATE but this
> seems pretty unwieldy to me.

You are referring to this long discussion, don't you?

http://www.postgresql.org/message-id/19791.1335902957@sss.pgh.pa.us

I've read it before, and liked the SQLSTATE-based approach.  It seems like 
properly assigned SQLSTATEs can be used as message IDs, and pairs of 
SQLSTATE and its user action might be utilized to provide sophisticated 
database administration GUI.

That discussion sounds interesting, and I want to take more time to 
consider.  But what do you think of my original suggestion to easily solve 
the current issue?  I'd like to remove the current annoying problem first 
before spending much time for more excited infrastructure.

Regards
MauMau




Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Tom Lane
Date:
"MauMau" <maumau307@gmail.com> writes:
> That discussion sounds interesting, and I want to take more time to 
> consider.  But what do you think of my original suggestion to easily solve 
> the current issue?  I'd like to remove the current annoying problem first 
> before spending much time for more excited infrastructure.

There is no enthusiasm for a quick-hack solution here, and most people
don't actually agree with your proposal that these errors should never
get logged.  So no, that is not happening.  You can hack your local
copy that way if you like of course, but it's not getting committed.
        regards, tom lane



Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "Tom Lane" <tgl@sss.pgh.pa.us>
> There is no enthusiasm for a quick-hack solution here, and most people
> don't actually agree with your proposal that these errors should never
> get logged.  So no, that is not happening.  You can hack your local
> copy that way if you like of course, but it's not getting committed.

Oh, I may have misunderstood your previous comments.  I got the impression 
that you and others regard those messages (except "too many clients") as 
unnecessary in server log.

1. FATAL:  the database system is starting up
2. FATAL:  the database system is shutting down
3. FATAL:  the database system is in recovery mode

5. FATAL:  terminating walreceiver process due to administrator command
6. FATAL:  terminating background worker \"%s\" due to administrator command

Could you tell me why these are necessary in server log?  I guess like this. 
Am I correct?

* #1 through #3 are necessary for the DBA to investigate and explain to the 
end user why he cannot connect to the database.

* #4 and #5 are unnecessary for the DBA.  I can't find out any reason why 
these are useful for the DBA.

Regards
MauMau




Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
David Johnston
Date:
MauMau wrote
> From: "Tom Lane" <

> tgl@.pa

> >
>> There is no enthusiasm for a quick-hack solution here, and most people
>> don't actually agree with your proposal that these errors should never
>> get logged.  So no, that is not happening.  You can hack your local
>> copy that way if you like of course, but it's not getting committed.
> 
> Oh, I may have misunderstood your previous comments.  I got the impression 
> that you and others regard those messages (except "too many clients") as 
> unnecessary in server log.
> 
> 1. FATAL:  the database system is starting up
> 2. FATAL:  the database system is shutting down
> 3. FATAL:  the database system is in recovery mode
> 
> 5. FATAL:  terminating walreceiver process due to administrator command
> 6. FATAL:  terminating background worker \"%s\" due to administrator
> command
> 
> Could you tell me why these are necessary in server log?  I guess like
> this. 
> Am I correct?
> 
> * #1 through #3 are necessary for the DBA to investigate and explain to
> the 
> end user why he cannot connect to the database.
> 
> * #4 and #5 are unnecessary for the DBA.  I can't find out any reason why 
> these are useful for the DBA.

For me 1-3 are unusual events in production situations and so knowing when
they occur, and confirming they occurred for a good reason, is a key job of
the DBA.

5 and 6: I don't fully understand when they would happen but likely fall
into the same "the DBA should know what is going on with their server and
confirm any startup/shutdown activity it is involved with".

They might be better categorized "NOTICE" level if they were in response to
a administrator action, versus in response to a crashed process, but even
for the user-initiated situation making sure they hit the log but using
FATAL is totally understandable and IMO desirable.

I'd ask in what situations are these messages occurring so frequently that
they are becoming noise instead of useful data?  Sorry if I missed your
use-case explanation up-thread.

David J.






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/RFC-Shouldn-t-we-remove-annoying-FATAL-messages-from-server-log-tp5781899p5782234.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
David Johnston
Date:
David Johnston wrote
> 
> MauMau wrote
>> From: "Tom Lane" <

>> tgl@.pa

>> >
>>> There is no enthusiasm for a quick-hack solution here, and most people
>>> don't actually agree with your proposal that these errors should never
>>> get logged.  So no, that is not happening.  You can hack your local
>>> copy that way if you like of course, but it's not getting committed.
>> 
>> Oh, I may have misunderstood your previous comments.  I got the
>> impression 
>> that you and others regard those messages (except "too many clients") as 
>> unnecessary in server log.
>> 
>> 1. FATAL:  the database system is starting up
>> 2. FATAL:  the database system is shutting down
>> 3. FATAL:  the database system is in recovery mode
>> 
>> 5. FATAL:  terminating walreceiver process due to administrator command
>> 6. FATAL:  terminating background worker \"%s\" due to administrator
>> command
>> 
>> Could you tell me why these are necessary in server log?  I guess like
>> this. 
>> Am I correct?
>> 
>> * #1 through #3 are necessary for the DBA to investigate and explain to
>> the 
>> end user why he cannot connect to the database.
>> 
>> * #4 and #5 are unnecessary for the DBA.  I can't find out any reason why 
>> these are useful for the DBA.
> For me 1-3 are unusual events in production situations and so knowing when
> they occur, and confirming they occurred for a good reason, is a key job
> of the DBA.
> 
> 5 and 6: I don't fully understand when they would happen but likely fall
> into the same "the DBA should know what is going on with their server and
> confirm any startup/shutdown activity it is involved with".
> 
> They might be better categorized "NOTICE" level if they were in response
> to a administrator action, versus in response to a crashed process, but
> even for the user-initiated situation making sure they hit the log but
> using FATAL is totally understandable and IMO desirable.
> 
> I'd ask in what situations are these messages occurring so frequently that
> they are becoming noise instead of useful data?  Sorry if I missed your
> use-case explanation up-thread.
> 
> David J.

Went and scanned the thread:

PITR/Failover is not generally that frequent an occurrence but I will agree
that these events are likely common during such.

Maybe PITR/Failover mode can output something in the logs to alleviate user
angst about these frequent events?  I'm doubting that a totally separate
mechanism can be used for this "mode" but instead of looking for things to
remove how about adding some additional coddling to the logs and the
beginning and end of the mode change?

Thought provoking only as I have not actually been a user of said feature.

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/RFC-Shouldn-t-we-remove-annoying-FATAL-messages-from-server-log-tp5781899p5782235.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
David Johnston
Date:
MauMau wrote
> From: "Tom Lane" <

> tgl@.pa

> >
>> There is no enthusiasm for a quick-hack solution here, and most people
>> don't actually agree with your proposal that these errors should never
>> get logged.  So no, that is not happening.  You can hack your local
>> copy that way if you like of course, but it's not getting committed.
> 
> Oh, I may have misunderstood your previous comments.  I got the impression 
> that you and others regard those messages (except "too many clients") as 
> unnecessary in server log.
> 
> 1. FATAL:  the database system is starting up

How about altering the message to tone down the severity by a half-step...

FATAL: (probably) not! - the database system is starting up

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/RFC-Shouldn-t-we-remove-annoying-FATAL-messages-from-server-log-tp5781899p5782236.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "David Johnston" <polobo@yahoo.com>
>> 5. FATAL:  terminating walreceiver process due to administrator command
>> 6. FATAL:  terminating background worker \"%s\" due to administrator
>> command
> 5 and 6: I don't fully understand when they would happen but likely fall
> into the same "the DBA should know what is going on with their server and
> confirm any startup/shutdown activity it is involved with".
>
> They might be better categorized "NOTICE" level if they were in response 
> to
> a administrator action, versus in response to a crashed process, but even
> for the user-initiated situation making sure they hit the log but using
> FATAL is totally understandable and IMO desirable.

#5 is output when the DBA shuts down the replication standby server.
#6 is output when the DBA shuts down the server if he is using any custom 
background worker.
These messages are always output.  What I'm seeing as a problem is that 
FATAL messages are output in a normal situation, which worries the DBA, and 
those messages don't help the DBA with anything.  They merely worry the DBA.

Regards
MauMau




Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "David Johnston" <polobo@yahoo.com>
> PITR/Failover is not generally that frequent an occurrence but I will 
> agree
> that these events are likely common during such.
>
> Maybe PITR/Failover mode can output something in the logs to alleviate 
> user
> angst about these frequent events?  I'm doubting that a totally separate
> mechanism can be used for this "mode" but instead of looking for things to
> remove how about adding some additional coddling to the logs and the
> beginning and end of the mode change?

Yes, those messages are not output frequently, because they are only output 
during planned or unplanned downtime.  But frequency does not matter here. 
Imagine the DBA's heart.  They need to perform maintenance or recovery 
operation in a hurry and wish to not encounter any trouble.  They would 
panic if something goes trouble when he expects to see no trouble.  The 
"FATAL" messages merely make him worried.

The extra FATAL messages can be a problem for support staff.  What do you 
feel if the DBA asks you for help when some emergency trouble occurs during 
recovery, with a few important messages buried in lots of unnecessary FATAL 
ones?

Regards
MauMau




Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Greg Stark
Date:
On Sat, Dec 7, 2013 at 12:27 AM, David Johnston <polobo@yahoo.com> wrote:
>>
>> 1. FATAL:  the database system is starting up
>
> How about altering the message to tone down the severity by a half-step...
>
> FATAL: (probably) not! - the database system is starting up

Well it is fatal, the backend for that client doesn't continue.

FATAL means a backend died. It is kind of vague how FATAL and PANIC
differ. They both sound like the end of the world. If you read FATAL
thinking it means the whole service is quitting -- ie what PANIC means
-- then these sound like they're noise.


-- 
greg



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
David Johnston
Date:
MauMau wrote
> From: "David Johnston" <

> polobo@

> >
>>> 5. FATAL:  terminating walreceiver process due to administrator command
>>> 6. FATAL:  terminating background worker \"%s\" due to administrator
>>> command
>> 5 and 6: I don't fully understand when they would happen but likely fall
>> into the same "the DBA should know what is going on with their server and
>> confirm any startup/shutdown activity it is involved with".
>>
>> They might be better categorized "NOTICE" level if they were in response 
>> to
>> a administrator action, versus in response to a crashed process, but even
>> for the user-initiated situation making sure they hit the log but using
>> FATAL is totally understandable and IMO desirable.
> 
> #5 is output when the DBA shuts down the replication standby server.
> #6 is output when the DBA shuts down the server if he is using any custom 
> background worker.
> These messages are always output.  What I'm seeing as a problem is that 
> FATAL messages are output in a normal situation, which worries the DBA,
> and 
> those messages don't help the DBA with anything.  They merely worry the
> DBA.

While "worry" is something to be avoided not logging messages is not the
correct solution.  On the project side looking for ways and places to better
communicate to the user is worthwhile in the absence of adding additional
complexity to the logging system.  The user/DBA, though, is expected to
learn how the proces works, ideally in a testing environment where worry is
much less a problem, and understand that some of what they are seeing are
client-oriented messages that they should be aware of but that do not
indicate server-level issues by themselves.

They should probably run the log through a filter (a template of which the
project should provide) that takes the raw log and filters out those things
that, relative to the role and context, are really better classified as
NOTICE even if the log-level is shown as FATAL.  

Raw logs should be verbose so that tools can have more data with which to
make decisions.  You can always hide what is provided but you can never show
was has been omitted.  I get that people wil scan raw logs but targeting
that lowest-denominator isn't necessarily the best thing to do.  Instead,
getting those people on board with better practices and showing (and
providing) them helpful tools should be the goal.

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/RFC-Shouldn-t-we-remove-annoying-FATAL-messages-from-server-log-tp5781899p5782267.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Andres Freund
Date:
On 2013-12-07 13:58:11 +0000, Greg Stark wrote:
> FATAL means a backend died. It is kind of vague how FATAL and PANIC
> differ.

I don't really see much vagueness there. FATAL is an unexpected but
orderly shutdown. PANIC is for the situations where we can't handle the
problem that occurred in any orderly way.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Greg Stark
Date:
On Sat, Dec 7, 2013 at 3:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>
> I don't really see much vagueness there. FATAL is an unexpected but
> orderly shutdown. PANIC is for the situations where we can't handle the
> problem that occurred in any orderly way.

Sorry, I was unclear. I meant that without context if someone asked
you which was more severe, "fatal" or "panic" you would have no
particular way to know. After all, for a person it's easier to cure a
panic than a fatality :)

On the client end the FATAL is pretty logical but in the logs it makes
it sound like the entire server died. Especially in this day of
multithreaded servers. I was suggesting that that was the origin of
the confusion here. Anyone who has seen these messages on the client
end many times might interpret them correctly in the server logs but
someone who has only been a DBA, not a database user might never have
seen them except in the server logs and without the context might not
realize that FATAL is a term of art peculiar to Postgres.


-- 
greg



Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "David Johnston" <polobo@yahoo.com>
>>>> 5. FATAL:  terminating walreceiver process due to administrator command
>>>> 6. FATAL:  terminating background worker \"%s\" due to administrator
>>>> command
>>> 5 and 6: I don't fully understand when they would happen but likely fall
>>> into the same "the DBA should know what is going on with their server 
>>> and
>>> confirm any startup/shutdown activity it is involved with".
>>>
>>> They might be better categorized "NOTICE" level if they were in response
>>> to
>>> a administrator action, versus in response to a crashed process, but 
>>> even
>>> for the user-initiated situation making sure they hit the log but using
>>> FATAL is totally understandable and IMO desirable.
>>
>> #5 is output when the DBA shuts down the replication standby server.
>> #6 is output when the DBA shuts down the server if he is using any custom
>> background worker.
>> These messages are always output.  What I'm seeing as a problem is that
>> FATAL messages are output in a normal situation, which worries the DBA,
>> and
>> those messages don't help the DBA with anything.  They merely worry the
>> DBA.
>
> While "worry" is something to be avoided not logging messages is not the
> correct solution.  On the project side looking for ways and places to 
> better
> communicate to the user is worthwhile in the absence of adding additional
> complexity to the logging system.  The user/DBA, though, is expected to
> learn how the proces works, ideally in a testing environment where worry 
> is
> much less a problem, and understand that some of what they are seeing are
> client-oriented messages that they should be aware of but that do not
> indicate server-level issues by themselves.

I see, It might makes sense to make the DBA learn how the system works, so 
that more DBAs can solve problems by themselves.  However, I wonder how 
those messages (just stopping server process by admin request) are useful 
for the DBA.  If they are useful, why don't other background processes (e.g. 
archiver, writer, walsender, etc.) output such a message when stopping?  For 
information, #5 and #6 are not client-oriented.

If we want to keep the messages, I think LOG or DEBUG would be appropriate. 
How about altering "ereport(FATAL, ...)" to "ereport(LOG, ...); 
proc_exit(1)"?

Regards
MauMau





Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "Greg Stark" <stark@mit.edu>
> On the client end the FATAL is pretty logical but in the logs it makes
> it sound like the entire server died. Especially in this day of
> multithreaded servers. I was suggesting that that was the origin of
> the confusion here. Anyone who has seen these messages on the client
> end many times might interpret them correctly in the server logs but
> someone who has only been a DBA, not a database user might never have
> seen them except in the server logs and without the context might not
> realize that FATAL is a term of art peculiar to Postgres.

I think so, too.
My customers and colleagues, who are relatively new to PostgreSQL, asked 
like "Is this FATAL message a sign of some problem?  What does it mean?"  I 
think it's natural to show concern when finding FATAL messages.  I find it 
unnatural for a normal administration operation to emit a FATAL message.

Regards
MauMau




Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Kevin Grittner
Date:
MauMau <maumau307@gmail.com> wrote:
> From: "Greg Stark" <stark@mit.edu>

>> On the client end the FATAL is pretty logical but in the logs it
>> makes it sound like the entire server died.

I agree that is easily misunderstood, especially since a FATAL
problem is less severe than a PANIC; while in common English usage
panic is what might feel when faced with the prospect of speaking
in public, but fatal generally means something that kills -- like a
disease or a plane crash.  There is the notion of a "fatal error"
in English, though; which means an error which puts an end to what
the person who makes such an error is attempting.

>> FATAL is a term of art peculiar to Postgres.

No, it is not; at least not in terms of being "characteristic of
only one person, group, or thing".  The Java logger from Apache
called log4j has a FATAL level which is more serious than ERROR.
The distinction is intended to indicate whether the application is
likely to be able to continue running.  Similarly, Sybase ASE has
severity levels, where above a certain point they are described as
"fatal" -- meaning that the application must acquire a new
connection.  It's probably used elsewhere as well; those are just a
couple I happened to be familiar with.

> I find it unnatural for a normal administration operation to emit
> a FATAL message.

It seems to be a fairly common term of art for a problem which
requires a restart or reconnection.  FATAL is used when the problem
is severe enough that the process or connection must end.  It seems
to me to be what should consistently be used when a client
connection or its process must be terminated for a reason other
than a client-side request to terminate.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Jim Nasby
Date:
On 12/8/13 3:08 PM, MauMau wrote:
>>>
>>> #5 is output when the DBA shuts down the replication standby server.
>>> #6 is output when the DBA shuts down the server if he is using any custom
>>> background worker.
>>> These messages are always output.  What I'm seeing as a problem is that
>>> FATAL messages are output in a normal situation, which worries the DBA,
>>> and
>>> those messages don't help the DBA with anything.  They merely worry the
>>> DBA.
>>
>> While "worry" is something to be avoided not logging messages is not the
>> correct solution.  On the project side looking for ways and places to better
>> communicate to the user is worthwhile in the absence of adding additional
>> complexity to the logging system.  The user/DBA, though, is expected to
>> learn how the proces works, ideally in a testing environment where worry is
>> much less a problem, and understand that some of what they are seeing are
>> client-oriented messages that they should be aware of but that do not
>> indicate server-level issues by themselves.
>
> I see, It might makes sense to make the DBA learn how the system works, so that more DBAs can solve problems by
themselves. However, I wonder how those messages (just stopping server process by admin request) are useful for the
DBA. If they are useful, why don't other background processes (e.g. archiver, writer, walsender, etc.) output such a
messagewhen stopping?  For information, #5 and #6 are not client-oriented.
 
>
> If we want to keep the messages, I think LOG or DEBUG would be appropriate. How about altering "ereport(FATAL, ...)"
to"ereport(LOG, ...); proc_exit(1)"?
 

IMNSHO, this isn't a "worry" thing, it's a complete mis-categorization of what's happening. Do we output "FATAL" when
youdisconnect from the database? Something that is happening *by design* should not be considered as "fatal".
 

I'm not saying we shouldn't log it at all, but it's silly to call it fatal just because the process is going away *when
wedid something that needs to make it go away*.
 

What makes it worse is what happens if you do something like pg_terminate_backend() on one of those? That *should*
createa fatal error (or at least be treated the same way we treat any other use of pg_terminate_backend()).
 

So in this case I'm betting the real problem is that an orderly shutdown is calling pg_terminate_backend() or the
equivalent.While that's convenient from a code standpoint, it's inaccurate from an API standpoint.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Jim Nasby
Date:
On 12/6/13 7:38 AM, Andres Freund wrote:
> On 2013-12-06 22:35:21 +0900, MauMau wrote:
>> From: "Tom Lane" <tgl@sss.pgh.pa.us>
>>> No.  They are FATAL so far as the individual session is concerned.
>>> Possibly some documentation effort is needed here, but I don't think
>>> any change in the code behavior would be an improvement.
>>
>> You are suggesting that we should add a note like "Don't worry about the
>> following message.  This is a result of normal connectivity checking", don't
>> you?
>>
>> FATAL:  the database system is starting up
>
> Uh. An explanation why you cannot connect to the database hardly seems
> like a superflous log message.

It is when *you* are not actually trying to connect but rather pg_ctl is (which is one of the use cases here).

Arguably 1-3 are inaccurate since they're not really about a backend dying... they occur during the startup phase; you
nevereven get a functioning backend. That's a marked difference from other uses of FATAL.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Tom Lane
Date:
Jim Nasby <jim@nasby.net> writes:
> Arguably 1-3 are inaccurate since they're not really about a backend dying... they occur during the startup phase;
younever even get a functioning backend. That's a marked difference from other uses of FATAL.
 

How so?  "FATAL" means "an error that terminates your session", which
is exactly what these are.
        regards, tom lane



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Jim Nasby
Date:
On 12/9/13 5:56 PM, Tom Lane wrote:
> Jim Nasby <jim@nasby.net> writes:
>> Arguably 1-3 are inaccurate since they're not really about a backend dying... they occur during the startup phase;
younever even get a functioning backend. That's a marked difference from other uses of FATAL.
 
>
> How so?  "FATAL" means "an error that terminates your session", which
> is exactly what these are.

Except in these cases the user never actually got a working session; their request was denied.

To be clear, from the client standpoint it's certainly fatal, but not from the server's point of view. This is fully
expectedbehavior as far as the server is concerned. (Obviously it might be an error that caused the shutdown/recovery,
butthat's something different.)
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Tom Lane
Date:
Jim Nasby <jim@nasby.net> writes:
> On 12/9/13 5:56 PM, Tom Lane wrote:
>> How so?  "FATAL" means "an error that terminates your session", which
>> is exactly what these are.

> Except in these cases the user never actually got a working session; their request was denied.

> To be clear, from the client standpoint it's certainly fatal, but not from the server's point of view. This is fully
expectedbehavior as far as the server is concerned. (Obviously it might be an error that caused the shutdown/recovery,
butthat's something different.)
 

Right, but as already pointed out in this thread, these messages are
worded from the client's point of view.  "The client never got a working
connection" seems to me to be an empty distinction.  If you got SIGTERM'd
before you could issue your first query, should that not be FATAL because
you'd not gotten any work done?

More generally, we also say FATAL for all sorts of entirely routine
connection failures, like wrong password or mistyped user name.  People
don't seem to have a problem with those.  Even if some do complain,
the costs of changing that behavior after fifteen-years-and-counting
would certainly exceed any benefit.
        regards, tom lane



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Craig Ringer
Date:
On 12/05/2013 11:25 PM, MauMau wrote:
> Hello,
> 
> My customers and colleagues sometimes (or often?) ask about the
> following message:
> 
> FATAL:  the database system is starting up

I would LOVE that message to do away, forever.

It's a huge PITA for automated log monitoring, analysis, and alerting.

The other one I'd personally like to change, but realise is harder to
actually do, is to separate "ERROR"s caused by obvious user input issues
from internal ERRORs like not finding the backing file for a relation,
block read errors, etc.

String pattern matching is a crude and awful non-solution, especially
given the way PostgreSQL loves to output messages to the log in whatever
language and encoding the current database connection is in.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Craig Ringer
Date:
On 12/06/2013 03:02 AM, Josh Berkus wrote:
> Heck, I'd be happy just to have a class of messages which specifically
> means "OMG, there's something wrong with the server", that is, a flag
> for messages which only occur when PostgreSQL encounters a bug, data
> corrpution, or platform error.  Right now, I have to suss those out by
> regex.

+10

That's what I really need to see in the logs, too.

Using SQLState might be reasonable, but I'd be concerned we'd find cases
where the same standard-specififed SQLState should be sent to the
*client*, despite different underlying causes for the server of
different levels of DBA concern.

I'd rather not suppress logging of user-level errors, etc; they're
important too. It's just necessary to be able to separate them from
internal errors that should never happen when the system is operating
correctly in order to do effective log-based alerting.

After all, I *don't* want to get an SMS whenever the deadlock detector
kicks in and someone's created a database in de_DE so the string pattern
match doesn't kick in.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Bruce Momjian
Date:
On Tue, Dec 10, 2013 at 08:47:22AM +0800, Craig Ringer wrote:
> On 12/05/2013 11:25 PM, MauMau wrote:
> > Hello,
> > 
> > My customers and colleagues sometimes (or often?) ask about the
> > following message:
> > 
> > FATAL:  the database system is starting up
> 
> I would LOVE that message to do away, forever.
> 
> It's a huge PITA for automated log monitoring, analysis, and alerting.
> 
> The other one I'd personally like to change, but realise is harder to
> actually do, is to separate "ERROR"s caused by obvious user input issues
> from internal ERRORs like not finding the backing file for a relation,
> block read errors, etc.
> 
> String pattern matching is a crude and awful non-solution, especially
> given the way PostgreSQL loves to output messages to the log in whatever
> language and encoding the current database connection is in.

Yes, this is certainly a challenge.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "Kevin Grittner" <kgrittn@ymail.com>
It seems to be a fairly common term of art for a problem which
requires a restart or reconnection. FATAL is used when the problem
is severe enough that the process or connection must end. It seems
to me to be what should consistently be used when a client
connection or its process must be terminated for a reason other
than a client-side request to terminate.


What do you think of #5 and #6 when matching the above criteria?

5. FATAL:  terminating walreceiver process due to administrator command
6. FATAL:  terminating background worker \"%s\" due to administrator command

These are output when the DBA shuts down the database server and there's no 
client connection.  That is, these don't meet the criteria.  I believe these 
should be suppressed, or use LOG instead of FATAL.

Regards
MauMau





Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Andres Freund
Date:
On 2013-12-12 00:31:25 +0900, MauMau wrote:
> What do you think of #5 and #6 when matching the above criteria?
> 
> 5. FATAL:  terminating walreceiver process due to administrator command
> 6. FATAL:  terminating background worker \"%s\" due to administrator command

Those are important if they happen outside a shutdown. So, if you really
want to remove them from there, you'd need to change the signalling to
handle the cases differently.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Kevin Grittner
Date:
MauMau <maumau307@gmail.com> wrote:
> From: "Kevin Grittner" <kgrittn@ymail.com>

> FATAL is used when the problem is severe enough that the process
> or connection must end. It seems to me to be what should
> consistently be used when a client connection or its process must
> be terminated for a reason other than a client-side request to
> terminate.
>
> What do you think of #5 and #6 when matching the above criteria?
>
> 5. FATAL:  terminating walreceiver process due to administrator
> command
> 6. FATAL:  terminating background worker \"%s\" due to
> administrator command

Those are client connections and their backends terminated for a
reason other than the client side of the connection requesting it.
If we don't classify those as FATAL then the definition of FATAL
becomes much more fuzzy.  What would you define it to mean?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "Tom Lane" <tgl@sss.pgh.pa.us>
> Jim Nasby <jim@nasby.net> writes:
>> On 12/9/13 5:56 PM, Tom Lane wrote:
>>> How so?  "FATAL" means "an error that terminates your session", which
>>> is exactly what these are.
>
>> Except in these cases the user never actually got a working session; 
>> their request was denied.
>
>> To be clear, from the client standpoint it's certainly fatal, but not 
>> from the server's point of view. This is fully expected behavior as far 
>> as the server is concerned. (Obviously it might be an error that caused 
>> the shutdown/recovery, but that's something different.)
>
> Right, but as already pointed out in this thread, these messages are
> worded from the client's point of view.  "The client never got a working
> connection" seems to me to be an empty distinction.  If you got SIGTERM'd
> before you could issue your first query, should that not be FATAL because
> you'd not gotten any work done?
>
> More generally, we also say FATAL for all sorts of entirely routine
> connection failures, like wrong password or mistyped user name.  People
> don't seem to have a problem with those.  Even if some do complain,
> the costs of changing that behavior after fifteen-years-and-counting
> would certainly exceed any benefit.

I agree that #1-#3 are of course reasonable when there's any client the user 
runs.  The problem is that #1 (The database system is starting up) is output 
in the server log by pg_ctl.  In that case, there's no client the user is 
responsible for.  Why does a new DBA have to be worried about that FATAL 
message?  He didn't do anything wrong.

I thought adding "options='-c log_min_messages=PANIC'" to the connection 
string for PQping() in pg_ctl.c would vanish the message, but it didn't. 
The reason is that connection options take effect in PostgresMain(), which 
is after checking the FATAL condition in ProcessStartupPacket().  Do you 
think there is any good solution?

Regards
MauMau




Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Tom Lane
Date:
"MauMau" <maumau307@gmail.com> writes:
> I agree that #1-#3 are of course reasonable when there's any client the user 
> runs.  The problem is that #1 (The database system is starting up) is output 
> in the server log by pg_ctl.  In that case, there's no client the user is 
> responsible for.  Why does a new DBA have to be worried about that FATAL 
> message?  He didn't do anything wrong.

FATAL doesn't mean "the DBA did something wrong".  It means "we terminated
a client session".

The fundamental problem IMO is that you want to complicate the definition
of what these things mean as a substitute for DBAs learning something
about Postgres.  That seems like a fool's errand from here.  They're going
to have to learn what FATAL means sooner or later, and making it more
complicated just raises the height of that barrier.
        regards, tom lane



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "Andres Freund" <andres@2ndquadrant.com>
> On 2013-12-12 00:31:25 +0900, MauMau wrote:
>> 5. FATAL:  terminating walreceiver process due to administrator command
>> 6. FATAL:  terminating background worker \"%s\" due to administrator 
>> command
>
> Those are important if they happen outside a shutdown. So, if you really
> want to remove them from there, you'd need to change the signalling to
> handle the cases differently.

How are they important?  If someone mistakenly sends SIGTERM to walreceiver 
and background workers, they are automatically launched by postmaster or 
startup process later like other background processes.  But other background 
processes such as walsender, bgwriter, etc. don't emit FATAL messages.

Regards
MauMau




Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
"MauMau"
Date:
From: "Kevin Grittner" <kgrittn@ymail.com>
> 5. FATAL: terminating walreceiver process due to administrator
> command
> 6. FATAL: terminating background worker \"%s\" due to
> administrator command

Those are client connections and their backends terminated for a
reason other than the client side of the connection requesting it.
If we don't classify those as FATAL then the definition of FATAL
becomes much more fuzzy. What would you define it to mean?


I'm sorry to cause you trouble, but my understanding is that those are not 
client connections.  They are just background server processes; walreceiver 
is started by startup process, and background workers are started by 
extension modules.  Am I misunderstanding something?

According to Table 18-1 in the manual:

http://www.postgresql.org/docs/current/static/runtime-config-logging.html

the definition of FATAL is:

FATAL
Reports an error that caused the current session to abort.

This does not apply to the above messages, because there is no error.  The 
DBA just shut down the database server, and the background processes 
terminated successfully.

If some message output is desired, LOG's definition seems the nearest:

LOG
Reports information of interest to administrators, e.g., checkpoint 
activity.

So, I thought "ereport(LOG, ...); proc_exit(0)" is more appropriate than 
ereport(FATAL, ...).  Is this so strange?

Regards
MauMau




Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Josh Berkus
Date:
On 12/11/2013 08:48 AM, Tom Lane wrote:
> The fundamental problem IMO is that you want to complicate the definition
> of what these things mean as a substitute for DBAs learning something
> about Postgres.  That seems like a fool's errand from here.  They're going
> to have to learn what FATAL means sooner or later, and making it more
> complicated just raises the height of that barrier.

I don't think it works to change the NOTICE/ERROR/FATAL tags; for one
thing, I can hear the screaming about people's log scripts from here.

However, it would really be useful to have an extra tag (in addition to
the ERROR or FATAL) for "If you're seeing this message, something has
gone seriously wrong on the server."  Just stuff like corruption
messages, backend crashes, etc.

Otherwise we're requiring users to come up with an alphabet soup of
regexes to filter out the noise error messages from the really, really
important ones.  Speaking as someone who does trainings for new DBAs,
the part where I do "what to look for in the logs" requires over an hour
and still doesn't cover everything. And doesn't internationalize. That's
nasty.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> However, it would really be useful to have an extra tag (in addition to
> the ERROR or FATAL) for "If you're seeing this message, something has
> gone seriously wrong on the server."  Just stuff like corruption
> messages, backend crashes, etc.

Right, we've discussed that idea elsewhere; there's a basically orthogonal
classification that needs to happen.  Pretty much all PANICs are high
priority from a DBA's perspective, but only a subset of either FATAL or
ERROR are.  Somebody needs to do the legwork to determine just what kind
of classification scheme we want and propose at least an initial set of
ereports to be so marked.

One thought I had was that we could probably consider the default behavior
(in the absence of any call of an explicit criticality-marking function)
to be like this:
for ereport(), it's critical if a PANIC and otherwise not
for elog(), it's critical if >= ERROR level, otherwise not.
The rationale for this is that we generally use elog for
not-supposed-to-happen cases, so those are probably interesting.
If we start getting complaints about some elog not being so interesting,
we can convert it to an ereport so it can include an explicit marking
call.
        regards, tom lane