Thread: Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

On Tue, 2011-01-18 at 19:04 +0000, Magnus Hagander wrote:
> Log replication connections only when log_connections is on
> 
> Previously we'd always log replication connections, with no
> way to turn them off.

You noted that the code was there intentionally, yet you also couldn't
see the reason. That is not a great reason to change it. It's especially
not a great reason to make the change quickly.

The log entry served a very specific purpose, which was helping people
to diagnose problems with replication connectivity. It isn't practical
or sensible to force people to use log_connections to help with that;
that is a sledgehammer to crack a nut. Few people have it turned on in
production, so turning it on after a problem happened doesn't help
diagnose things.

The negative impact of this was a couple of log lines. No bug. Plus I
don't see any reason to introduce an incompatibility with the log output
from 9.0.

So removing this has almost no positive benefit, yet a clear negative
one. We need to look at the negative aspects of changes before we make
them.

Let's concentrate on adding the major features, not rush through loads
of minor changes.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



Re: [COMMITTERS] pgsql: Log replication connections only when log_connections is on

From
Magnus Hagander
Date:
On Wed, Jan 19, 2011 at 13:36, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Tue, 2011-01-18 at 19:04 +0000, Magnus Hagander wrote:
>> Log replication connections only when log_connections is on
>>
>> Previously we'd always log replication connections, with no
>> way to turn them off.
>
> You noted that the code was there intentionally, yet you also couldn't
> see the reason. That is not a great reason to change it. It's especially
> not a great reason to make the change quickly.

Yes. And I brought it up in discussion, and the consensus was to
change it to be consistent.

And per that thread, there was also no consensus on ever *adding* it
that way - if I'd realized it was that way back then, I would've
objected at the time. I didn't realize it until I started seeing the
systems in production more.


> The log entry served a very specific purpose, which was helping people
> to diagnose problems with replication connectivity. It isn't practical
> or sensible to force people to use log_connections to help with that;
> that is a sledgehammer to crack a nut. Few people have it turned on in
> production, so turning it on after a problem happened doesn't help
> diagnose things.

Again, if you read the thread, the consensus was to do it the simple
way and make it configurable by that one. The other suggestion was to
turn log_connections into an enum, which was considered unnecessarily
complicated.


> The negative impact of this was a couple of log lines. No bug. Plus I
> don't see any reason to introduce an incompatibility with the log output
> from 9.0.

The inability to remove it has certainly annoyed me, and I know a few
others who have commented on it. And it's inconsistent behavior,
something we in general try to avoid.


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


On Wed, 2011-01-19 at 13:44 +0100, Magnus Hagander wrote:
> On Wed, Jan 19, 2011 at 13:36, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Tue, 2011-01-18 at 19:04 +0000, Magnus Hagander wrote:
> >> Log replication connections only when log_connections is on
> >>
> >> Previously we'd always log replication connections, with no
> >> way to turn them off.
> >
> > You noted that the code was there intentionally, yet you also couldn't
> > see the reason. That is not a great reason to change it. It's especially
> > not a great reason to make the change quickly.
> 
> Yes. And I brought it up in discussion, and the consensus was to
> change it to be consistent.

No it wasn't. Robert explained the reason it was there.

Why make the change? Why make it quickly? Why avoid and ignore the CF?

> > The negative impact of this was a couple of log lines. No bug. Plus I
> > don't see any reason to introduce an incompatibility with the log output
> > from 9.0.
> 
> The inability to remove it has certainly annoyed me, and I know a few
> others who have commented on it. And it's inconsistent behavior,
> something we in general try to avoid.

I care about diagnosing problems on production systems. There will be
many more people annoyed by the inability to diagnose issues then there
will be by people who care lots about a couple of log lines, or who care
about a few people's views on inconsistency.

Is your annoyance worth more than causing newbies problems and being
unable to diagnose production systems?

How will we diagnose erratic connection problems now?

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



On Wed, 2011-01-19 at 12:55 +0000, Simon Riggs wrote:

> How will we diagnose erratic connection problems now?

The point here is that your effort to *remove* pointless log lines now
means that we cannot diagnose production problems with replication
unless we now *add* hundreds of pointless log lines.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



On Wed, Jan 19, 2011 at 7:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> How will we diagnose erratic connection problems now?

As Heikki pointed out, the slave still logs this information, so we
can look there.  If that's enough, yeah, you'll have to turn
log_connections on on the master, but I don't think that will be very
frequent, and it's not an unmanageable burden anyway.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Wed, 2011-01-19 at 09:08 -0500, Robert Haas wrote:
> On Wed, Jan 19, 2011 at 7:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > How will we diagnose erratic connection problems now?
> 
> As Heikki pointed out, the slave still logs this information, so we
> can look there.  If that's enough, yeah, you'll have to turn
> log_connections on on the master, but I don't think that will be very
> frequent, and it's not an unmanageable burden anyway.

So now we have to check *all* of the standby logs in order to check that
replication on the master is working without problems. There will be no
default capability to tie up events on the master with failures of
replication. Events occurring on multiple standbys will not be picked up
as a correlated set of events.

It's possible, but its not a realistic alternative.

We've lost something and gained almost nothing. We need to avoid making
this kind of improvement.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



On Jan19, 2011, at 16:16 , Simon Riggs wrote:
> On Wed, 2011-01-19 at 09:08 -0500, Robert Haas wrote:
>> On Wed, Jan 19, 2011 at 7:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> How will we diagnose erratic connection problems now?
>>
>> As Heikki pointed out, the slave still logs this information, so we
>> can look there.  If that's enough, yeah, you'll have to turn
>> log_connections on on the master, but I don't think that will be very
>> frequent, and it's not an unmanageable burden anyway.
>
> So now we have to check *all* of the standby logs in order to check that
> replication on the master is working without problems. There will be no
> default capability to tie up events on the master with failures of
> replication. Events occurring on multiple standbys will not be picked up
> as a correlated set of events.

I don't see for what kind of problems logging replication connections is
the right way to monitor replication.

To check that all slaves are connected and are streaming WAL, one ought to
monitor pg_stat_replication, no? Once a slave vanishes from there (or falls
back too far) you'd inspect the *slave's* log to find the reason. Inspecting
the master's log instead will always be a poor replacement, since some failure
conditions won't leave a trace there (Connectivity problems, for example).

Could you explain the failure condition you do have in mind where logging
replication connections unconditionally is beneficial?

best regards,
Florian Pflug



Simon Riggs <simon@2ndQuadrant.com> writes:
> So now we have to check *all* of the standby logs in order to check that
> replication on the master is working without problems. There will be no
> default capability to tie up events on the master with failures of
> replication. Events occurring on multiple standbys will not be picked up
> as a correlated set of events.

This is pure FUD.  If you suspect such a problem, turn on
log_connections.  If you aren't suspecting such a problem, you likely
aren't looking in the postmaster log anyway.
        regards, tom lane


On Wed, 2011-01-19 at 16:42 +0100, Florian Pflug wrote:
> 
> Could you explain the failure condition you do have in mind where
> logging replication connections unconditionally is beneficial? 

Sure. 

Replication drops and immediately reconnects during night.
When did that happen? How many times did it happen? For how long did the
disconnection last? Why did it happen? How does that correlate with
other situations? That info is not available easily.

pg_stat_replication is necessary, and is separate from pg_stat_activity
because replication connections are not the same as normal connections.
We even have a separate permission for replication, to further
demonstrate that.

Replication connections being logged differently from normal connections
was not an anomaly, nor was it wasteful. By treating them the same we
are forced to log too much, instead of just enough.

Replication connections are not even logged unconditionally. You need to
enable replication before they appear in the log. And that is exactly
the time the log entries are helpful.

The question we should have asked is "Why is removing those log entries
helpful?". I shouldn't have to justify putting something back, when the
good reason for its existence was previously explained and there was no
consensus to remove in the first place.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



On Wed, Jan 19, 2011 at 11:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The question we should have asked is "Why is removing those log entries
> helpful?". I shouldn't have to justify putting something back, when the
> good reason for its existence was previously explained and there was no
> consensus to remove in the first place.

This is simply revisionist history.  It was discussed for several
days, and consensus was eventually reached.  The fact that you chose
not to participate in that discussion until after it was over doesn't
mean we didn't have it, and the fact that you don't agree with the
outcome doesn't mean there wasn't consensus.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company