Thread: log_collector doesn't respond to reloads

log_collector doesn't respond to reloads

From
Josh Berkus
Date:
Summary: despite pg_reload(), log directory, filename and destination
don't change

Version: 9.1.3
Platform: RHEL5, custom compile
Severity: Persistant annoyance

Description:

1) change log_directory in postgresql.conf
2) pg_reload_conf()
3) show log_directory displays the correct new directory
4) logging continues to the old directory

Given the custom compile, I'd chalk this up to wonky platform issues,
except that I've seen this problem on other systems, intermittently.
This particular system has the issue in 100% reproduceable fashion, so
I'd like suggestions on how to get to the bottom of it.

I also tried a sighup to the logger process, with  no effect.


--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: log_collector doesn't respond to reloads

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Summary: despite pg_reload(), log directory, filename and destination
> don't change

Looking at the code, it's really hard to see how this could possibly
happen, unless maybe the process is blocking receipt of SIGHUP.  Which
it shouldn't be.  Not sure about RHEL5, but on recent Linuxen you can
check the process's signal masks like this:

    grep ^Sig /proc/NNNN/status

where NNNN is the logging collector's PID.  Could we see that?

            regards, tom lane

Re: log_collector doesn't respond to reloads

From
Josh Berkus
Date:
On 4/26/12 5:50 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Summary: despite pg_reload(), log directory, filename and destination
>> don't change
>
> Looking at the code, it's really hard to see how this could possibly
> happen, unless maybe the process is blocking receipt of SIGHUP.  Which
> it shouldn't be.  Not sure about RHEL5, but on recent Linuxen you can
> check the process's signal masks like this:
>
>     grep ^Sig /proc/NNNN/status
>
> where NNNN is the logging collector's PID.  Could we see that?

SigQ:   0/399360
SigPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000001007806
SigCgt: 0000000000000201

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: log_collector doesn't respond to reloads

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> On 4/26/12 5:50 PM, Tom Lane wrote:
>> check the process's signal masks like this:
>> grep ^Sig /proc/NNNN/status
>> where NNNN is the logging collector's PID.  Could we see that?

> SigQ:   0/399360
> SigPnd: 0000000000000000
> SigBlk: 0000000000000000
> SigIgn: 0000000001007806
> SigCgt: 0000000000000201

Hm, I'm not sure what SigQ is, but the other values match what I see for
a logging collector process on my Fedora 16 box ... and that process
responds to SIGHUP just fine.  (I believe the rightmost bit of these
masks corresponds to SIGHUP, so this is saying the process is catching
SIGHUP and not currently blocking it, which is what it should be.)

Do you want to try attaching to the collector with a debugger and seeing
if it ever gets into the "if (got_SIGHUP)" block in SysLoggerMain?

            regards, tom lane

Re: log_collector doesn't respond to reloads

From
Mark Kirkwood
Date:
On 27/04/12 13:11, Josh Berkus wrote:
> On 4/26/12 5:50 PM, Tom Lane wrote:
>> Josh Berkus<josh@agliodbs.com>  writes:
>>> Summary: despite pg_reload(), log directory, filename and destination
>>> don't change
>> Looking at the code, it's really hard to see how this could possibly
>> happen, unless maybe the process is blocking receipt of SIGHUP.  Which
>> it shouldn't be.  Not sure about RHEL5, but on recent Linuxen you can
>> check the process's signal masks like this:
>>
>>     grep ^Sig /proc/NNNN/status
>>
>> where NNNN is the logging collector's PID.  Could we see that?
> SigQ:   0/399360
> SigPnd: 0000000000000000
> SigBlk: 0000000000000000
> SigIgn: 0000000001007806
> SigCgt: 0000000000000201
>

Might be a red herring, but I was able to reproduce this if (and only
if) I forgot to create the new dest directory before doing the reload.
Subsequently creating the directory and reloading did not result in a
file in the new location.

Regards

Mark

Re: log_collector doesn't respond to reloads

From
Josh Berkus
Date:
> Do you want to try attaching to the collector with a debugger and seeing
> if it ever gets into the "if (got_SIGHUP)" block in SysLoggerMain?

Hmmm.  No debugger on this system, not unexpectedly.  I'll see if I can
get authorization to add one.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: log_collector doesn't respond to reloads

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> Do you want to try attaching to the collector with a debugger and seeing
>> if it ever gets into the "if (got_SIGHUP)" block in SysLoggerMain?

> Hmmm.  No debugger on this system, not unexpectedly.  I'll see if I can
> get authorization to add one.

How about strace --- if that's available, just watching the collector
process while you send it SIGHUP might be informative.

            regards, tom lane

Re: log_collector doesn't respond to reloads

From
Tom Lane
Date:
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> Might be a red herring, but I was able to reproduce this if (and only
> if) I forgot to create the new dest directory before doing the reload.
> Subsequently creating the directory and reloading did not result in a
> file in the new location.

Hmm, interesting point.  There's this code in there when fopen'ing the
log file fails:

            /*
             * ENFILE/EMFILE are not too surprising on a busy system; just
             * keep using the old file till we manage to get a new one.
             * Otherwise, assume something's wrong with Log_directory and stop
             * trying to create files.
             */
            if (errno != ENFILE && errno != EMFILE)
            {
                ereport(LOG,
                        (errmsg("disabling automatic rotation (use SIGHUP to re-enable)")));
                Log_RotationAge = 0;
                Log_RotationSize = 0;
            }

Whoever wrote that thought that Log_RotationAge/Log_RotationSize would
get reset to normal values during SIGHUP, but it's far from clear to me
that any such thing would actually happen.  However, this would only
apply to Josh's problem if he was trying to set a bogus new value of
log_directory, eg not there or not writable by postgres.  In any case,
if this is the locus of the problem, there ought to be instances of that
log message in the active log file.  (Josh?)

Another point here is that the syslogger will attempt to create the log
directory if it's not there ... but only once, during startup.  Should
we have it repeat that after a SIGHUP?

            regards, tom lane

Re: log_collector doesn't respond to reloads

From
Josh Berkus
Date:
> Whoever wrote that thought that Log_RotationAge/Log_RotationSize would
> get reset to normal values during SIGHUP, but it's far from clear to me
> that any such thing would actually happen.  However, this would only
> apply to Josh's problem if he was trying to set a bogus new value of
> log_directory, eg not there or not writable by postgres.  In any case,
> if this is the locus of the problem, there ought to be instances of that
> log message in the active log file.  (Josh?)

Aha.

Yeah, the problem is, directory permissions to the contrary, something
is preventing the logger from writing to that directory even though I
can do so as the logged-in postgres user.  I'll bet it's their SAN
dynamic mounter thing, given the trouble it's been before. *sigh*.

So this is an issue peculiar to their system, and not a general case.
Sorry for the noise.  I've seen transitory behavior like this before and
had hoped that I'd found a reproduceable test case.  But no.

We do have one piece of unituitive behavior here though, which forms a
bit of a catch-22:

1. DBA changes the log directory
2. Log directory is unwriteable
3. Postgres continues writing to the old log_directory
4. SHOW log_directory displays the *new* log_directory

I think (4) here needs to change.  We shouldn't be showing a different
log directory in pg_settings than we're actually writing to, ever.

Now, here's the Catch-22:

Consider the case that time elapses before anyone discovers that logs
are not being written to the new location, and you change personnel; how
would the new DBA have any idea where the old log was so that he could
read the log message about the unwriteable directory?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: log_collector doesn't respond to reloads

From
Tom Lane
Date:
I wrote:
> Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
>> Might be a red herring, but I was able to reproduce this if (and only
>> if) I forgot to create the new dest directory before doing the reload.
>> Subsequently creating the directory and reloading did not result in a
>> file in the new location.

> Whoever wrote that thought that Log_RotationAge/Log_RotationSize would
> get reset to normal values during SIGHUP, but it's far from clear to me
> that any such thing would actually happen.

And in fact a bit of experimentation proves that it doesn't, unless
there are uncommented log_rotation_age and log_rotation_size entries
in postgresql.conf.  So this is a bug, not to mention bad practice.
I think the code needs to use an explicit "rotation_disabled" flag
that it can reset in the SIGHUP block, instead of stomping on GUC
variables.

However, it remains far from clear whether this has anything to do with
Josh's situation.  If it does, there should be telltale messages in
his log ...

            regards, tom lane

Re: log_collector doesn't respond to reloads

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> We do have one piece of unituitive behavior here though, which forms a
> bit of a catch-22:

> 1. DBA changes the log directory
> 2. Log directory is unwriteable
> 3. Postgres continues writing to the old log_directory
> 4. SHOW log_directory displays the *new* log_directory

> I think (4) here needs to change.

Well, sorry, but you can't have that.  SHOW will tell you what your own
backend thinks the value of the GUC variable is, but there is no way to
know what's happening inside the logging collector process.  And I'd be
against trying to add a signaling mechanism that would support telling
you that, because it would add fragility to the logging collector, which
we don't want.

> Consider the case that time elapses before anyone discovers that logs
> are not being written to the new location,

I don't buy that argument.  If things are working as intended, the
collector ought to create a new file in the commanded location
immediately.  I would think any normal DBA would look for that, just
to check that the SIGHUP had worked.

We clearly need to fix the "disable rotation" logic, and I think a
good case can be made for trying a mkdir() when Log_directory changes,
but I don't see that there's more that can be done about this kind
of situation.

BTW, what log messages were you getting exactly?  I'd have expected
something about "could not open log file" as well as the "disabling
automatic rotation" one.

            regards, tom lane

Re: log_collector doesn't respond to reloads

From
Josh Berkus
Date:
> Well, sorry, but you can't have that.  SHOW will tell you what your own
> backend thinks the value of the GUC variable is, but there is no way to
> know what's happening inside the logging collector process.  And I'd be
> against trying to add a signaling mechanism that would support telling
> you that, because it would add fragility to the logging collector, which
> we don't want.

Hmmm.  There's really no feasible way to determine where the actual
logging collector is logging to?

> I don't buy that argument.  If things are working as intended, the
> collector ought to create a new file in the commanded location
> immediately.  I would think any normal DBA would look for that, just
> to check that the SIGHUP had worked.

You're assuming that the change is happening attended.  Consider
automated changes being deployed via puppet or chef, possibly across
many servers.

You can end up in a situation where the logs aren't going where they're
supposed to due to some external problem, and that the DBA has no way to
find out what went wrong because he doesn't know where the logs are *now*.

Mind you, I don't have a solution to suggest, but I think it's a real
problem.

And fixing the rotation bug will help that a lot; it would narrow down
the problem to the current day/hour, which would make it much more
likely that someone would know that the log location had changed.

> BTW, what log messages were you getting exactly?  I'd have expected
> something about "could not open log file" as well as the "disabling
> automatic rotation" one.

2012-04-26 16:38:21 PDT [10180]: [2-1] user=,db= LOG:  received SIGHUP,
reloading configuration files
2012-04-26 16:38:21 PDT [10181]: [1-1] user=,db= LOG:  could not open
log file "/pglogs/check/logs/datacollection-2012-04-26-16-38": No such
file or directory
2012-04-26 16:38:21 PDT [10181]: [2-1] user=,db= LOG:  disabling
automatic rotation (use SIGHUP to re-enable)

So, yes, exactly.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: log_collector doesn't respond to reloads

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> You can end up in a situation where the logs aren't going where they're
> supposed to due to some external problem, and that the DBA has no way to
> find out what went wrong because he doesn't know where the logs are *now*.

Well, if nothing else, lsof would help.  Another possibility is that we
might change the logging collector process to show its current target
filename in ps status (although might there be security/privacy issues
with that?).  Neither of those things will help Windows users
of course, but the sorts of cases you're presenting aren't going to be
happening on Windows boxes.

[ thinks some more... ]  A lower-tech solution would be to always write
the name of the current log target file into some small text file in
$PGDATA.

On the whole though, I think this is an invented problem.  We've never
heard a complaint from the field about it.

            regards, tom lane

Re: log_collector doesn't respond to reloads

From
Magnus Hagander
Date:
On Sat, Apr 28, 2012 at 03:35, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> You can end up in a situation where the logs aren't going where they're
>> supposed to due to some external problem, and that the DBA has no way to
>> find out what went wrong because he doesn't know where the logs are *now=
*.
>
> Well, if nothing else, lsof would help. =A0Another possibility is that we
> might change the logging collector process to show its current target
> filename in ps status (although might there be security/privacy issues
> with that?). =A0Neither of those things will help Windows users
> of course, but the sorts of cases you're presenting aren't going to be
> happening on Windows boxes.

We do have the "fake process status kernel object" that can be used as
a last resort. If this is something that would only be used in
debugging scenarios, that would be perfectly reasonable I think.


> [ thinks some more... ] =A0A lower-tech solution would be to always write
> the name of the current log target file into some small text file in
> $PGDATA.

That seems ugly.


> On the whole though, I think this is an invented problem. =A0We've never
> heard a complaint from the field about it.

I think process title seems reasonable. We do that for archiver for
example, to tell you where it's writing, don't we?

--=20
=A0Magnus Hagander
=A0Me: http://www.hagander.net/
=A0Work: http://www.redpill-linpro.com/

Re: log_collector doesn't respond to reloads

From
Josh Berkus
Date:
>> On the whole though, I think this is an invented problem.  We've never
>> heard a complaint from the field about it.
>
> I think process title seems reasonable. We do that for archiver for
> example, to tell you where it's writing, don't we?

Yes, we do.

This isn't an invented problem; a brief canvass on IRC shows that people
run into issues with log_collector reloads fairly commonly.  However,
it's pretty low priority, certainly -- never rises above the level of
"annoyance".  The sort of thing where it would be good to have a
bugtracker to put it in so the next time someone is working on the log
collector for other reasons they can tweak this too, or when some new
hacker wants an easy first patch.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: log_collector doesn't respond to reloads

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> This isn't an invented problem; a brief canvass on IRC shows that people
> run into issues with log_collector reloads fairly commonly.  However,
> it's pretty low priority, certainly -- never rises above the level of
> "annoyance".  The sort of thing where it would be good to have a
> bugtracker to put it in so the next time someone is working on the log
> collector for other reasons they can tweak this too, or when some new
> hacker wants an easy first patch.

We have that; it's called TODO.
http://wiki.postgresql.org/wiki/Todo

            regards, tom lane