Thread: COPYable logs status

COPYable logs status

From
Andrew Dunstan
Date:
[summarising discussion on -patches]

The situation with this patch is that I now have it in a state where I 
think it could be applied, but there is one blocker, namely that we do 
not have a way of preventing the interleaving of log messages from 
different backends, which leads to garbled logs. This is an existing 
issue about which we have had complaints, but it becomes critical for a 
facility the whole purpose of which is to provide logs in a format 
guaranteed to work with our COPY command.

Unfortunately, there is no solution in sight for this problem, certainly 
not one which I think can be devised and implemented simply at this 
stage of the cycle. The solution we'd like to use, LWLocks, is not 
workable in his context. In consequence, I don't think we have any 
option but to shelve this item for the time being.

A couple of bugs have been found and fixes identified, during the review 
process, so it's not a total loss, but it is nevertheless a pity that we 
can't deliver this feature in 8.3.

cheers

andrew


Re: COPYable logs status

From
Alvaro Herrera
Date:
Andrew Dunstan wrote:

> Unfortunately, there is no solution in sight for this problem, certainly 
> not one which I think can be devised and implemented simply at this 
> stage of the cycle. The solution we'd like to use, LWLocks, is not 
> workable in his context. In consequence, I don't think we have any 
> option but to shelve this item for the time being.

The idea of one pipe per process is not really workable, because it
would mean having as many pipes as backends which does not sound very
good.  But how about a mixed approach -- like have the all the backends
share a pipe, controlled by an LWLock, and the auxiliary process have a
separate pipe each?

One thing I haven't understood yet is how having multiple pipes help on
this issue.  Is the logger reading from the pipe and then writing to a
file?  (I haven't read the logger code).

-- 
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Endurecerse, pero jamás perder la ternura" (E. Guevara)


Re: COPYable logs status

From
Martijn van Oosterhout
Date:
On Fri, Jun 08, 2007 at 08:31:54AM -0400, Andrew Dunstan wrote:
> The situation with this patch is that I now have it in a state where I
> think it could be applied, but there is one blocker, namely that we do
> not have a way of preventing the interleaving of log messages from
> different backends, which leads to garbled logs. This is an existing
> issue about which we have had complaints, but it becomes critical for a
> facility the whole purpose of which is to provide logs in a format
> guaranteed to work with our COPY command.

The whole semantics of PIPEBUF should prevent garbling, as long as each
write is a complete set of lines and no more than PIPEBUF bytes long.
Have we determined the actual cause of the garbling?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: COPYable logs status

From
Alvaro Herrera
Date:
Martijn van Oosterhout wrote:
> On Fri, Jun 08, 2007 at 08:31:54AM -0400, Andrew Dunstan wrote:
> > The situation with this patch is that I now have it in a state where I 
> > think it could be applied, but there is one blocker, namely that we do 
> > not have a way of preventing the interleaving of log messages from 
> > different backends, which leads to garbled logs. This is an existing 
> > issue about which we have had complaints, but it becomes critical for a 
> > facility the whole purpose of which is to provide logs in a format 
> > guaranteed to work with our COPY command.
> 
> The whole semantics of PIPEBUF should prevent garbling, as long as each
> write is a complete set of lines and no more than PIPEBUF bytes long.
> Have we determined the actual cause of the garbling?

No, that's the main problem -- but it has been reported to happen on
entries shorter than PIPE_BUF chars.

-- 
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"La persona que no quería pecar / estaba obligada a sentarseen duras y empinadas sillas    / desprovistas, por ciertode
blandosatenuantes"                          (Patricio Vogel)
 


Re: COPYable logs status

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Martijn van Oosterhout wrote:
>> The whole semantics of PIPEBUF should prevent garbling, as long as each
>> write is a complete set of lines and no more than PIPEBUF bytes long.
>> Have we determined the actual cause of the garbling?

> No, that's the main problem -- but it has been reported to happen on
> entries shorter than PIPE_BUF chars.

It's not entirely clear to me whether there's been proven cases of
interpolation *into* a message shorter than PIPE_BUF (and remember
you've got to count all the lines when determining the length).
The message intruding into the other could certainly be shorter.

If there have been such cases, then our theories about what's going on
are all wet, or else there are some rather nasty bugs in some kernels'
pipe handling.  So it would be good to pin this down.
        regards, tom lane


Re: COPYable logs status

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> The idea of one pipe per process is not really workable, because it
> would mean having as many pipes as backends which does not sound very
> good.  But how about a mixed approach -- like have the all the backends
> share a pipe, controlled by an LWLock, and the auxiliary process have a
> separate pipe each?

Multiple pipes seem like a mess, and in any case the above still doesn't
work for stderr output produced by non-cooperative software (dynamic
loader for instance).

The only solution that I can see is to invent some sort of simple
protocol for the syslogger pipe.  Assume that the kernel honors PIPE_BUF
(this assumption may need proving, see other message).  We could imagine
having elog.c divvy up its writes to the pipe into chunks of less than
PIPE_BUF bytes, where each chunk carries info sufficient to let it be
reassembled.  Perhaps something on the order of
\0 \0 2-byte-length source-PID end-flag text...

The syslogger reassembles these by joining messages with the same
origination PID, until it gets one with the end-flag set.  It would need
enough code to track multiple in-progress messages.

The logger would have to also be able to deal with random text coming
down the pipe (due to aforesaid non-cooperative software).  I would be
inclined to say just take any text not preceded by \0\0 as a standalone
message, up to the next \0\0.  Long chunks of non-protocol text would
risk getting treated as multiple messages, but there's probably not a
lot of harm in that.

BTW, exactly what is the COPYable-logs code going to do with random
text?  I trust the answer is not "throw it away".
        regards, tom lane


Re: COPYable logs status

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>   
>> Martijn van Oosterhout wrote:
>>     
>>> The whole semantics of PIPEBUF should prevent garbling, as long as each
>>> write is a complete set of lines and no more than PIPEBUF bytes long.
>>> Have we determined the actual cause of the garbling?
>>>       
>
>   
>> No, that's the main problem -- but it has been reported to happen on
>> entries shorter than PIPE_BUF chars.
>>     
>
> It's not entirely clear to me whether there's been proven cases of
> interpolation *into* a message shorter than PIPE_BUF (and remember
> you've got to count all the lines when determining the length).
> The message intruding into the other could certainly be shorter.
>
> If there have been such cases, then our theories about what's going on
> are all wet, or else there are some rather nasty bugs in some kernels'
> pipe handling.  So it would be good to pin this down.
>
>
>   

Right. But we don't split lines into PIPE_BUF sized chunks. And doing so 
would make loadable logs possibly rather less pleasant. Ideally we 
should be able to deal with this despite the PIPE_BUF restriction on 
atomic writes.

cheers

andrew


Re: COPYable logs status

From
Andrew Sullivan
Date:
On Fri, Jun 08, 2007 at 10:29:03AM -0400, Tom Lane wrote:
> 
> The only solution that I can see is to invent some sort of simple
> protocol for the syslogger pipe. 

Perhaps having a look at the current IETF syslog discussion will be
helpful in that case?  (I know it's not directly relevant, but maybe
others have thought about some of these things.  I haven't read the
draft, note.)

http://tools.ietf.org/html/draft-ietf-syslog-protocol-20

There's also the discussion of reliability in RFC 3195:

ftp://ftp.rfc-editor.org/in-notes/rfc3195.txt

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.    --George Orwell


Re: COPYable logs status

From
"Matthew T. O'Connor"
Date:
Andrew Dunstan wrote:
> The situation with this patch is that I now have it in a state where I 
> think it could be applied, but there is one blocker, namely that we do 
> not have a way of preventing the interleaving of log messages from 
> different backends, which leads to garbled logs. This is an existing 
> issue about which we have had complaints, but it becomes critical for a 
> facility the whole purpose of which is to provide logs in a format 
> guaranteed to work with our COPY command.
> 
> Unfortunately, there is no solution in sight for this problem, certainly 
> not one which I think can be devised and implemented simply at this 
> stage of the cycle. The solution we'd like to use, LWLocks, is not 
> workable in his context. In consequence, I don't think we have any 
> option but to shelve this item for the time being.

I think this will get shot down, but here goes anyway...

How about creating a log-writing-process?  Postmaster could write to the 
log files directly until the log-writer is up and running, then all 
processes can send their log output through the log-writer.




Re: COPYable logs status

From
Tom Lane
Date:
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> How about creating a log-writing-process?  Postmaster could write to the 
> log files directly until the log-writer is up and running, then all 
> processes can send their log output through the log-writer.

We *have* a log-writing process.  The problem is in getting the data to it.
        regards, tom lane


Re: COPYable logs status

From
"Matthew T. O'Connor"
Date:
Tom Lane wrote:
> "Matthew T. O'Connor" <matthew@zeut.net> writes:
>> How about creating a log-writing-process?  Postmaster could write to the 
>> log files directly until the log-writer is up and running, then all 
>> processes can send their log output through the log-writer.
> 
> We *have* a log-writing process.  The problem is in getting the data to it.

By that I assume you mean the bgwriter, I thought that was for WAL data, 
I didn't think it could or perhaps should be used for normal log file 
writing, but I also know I'm way outside my comfort area in talking 
about this, so excuse the noise if this is way off base.


Re: COPYable logs status

From
Markus Schiltknecht
Date:
Hi,

Tom Lane wrote:
> We *have* a log-writing process.  The problem is in getting the data to it.

Remember the imessages approach I'm using for Postgres-R? It passes 
messages around using shared memory and signals the receiver on incoming 
data. It's not perfect, sure, but it's a general solution to a common 
problem.

Maybe it's worth a thought, instead of fiddling with signals, special 
shmem areas and possible races every time the 'getting data to another 
process'-problem comes up?

Regards

Markus



Re: COPYable logs status

From
Tom Lane
Date:
Markus Schiltknecht <markus@bluegap.ch> writes:
> Tom Lane wrote:
>> We *have* a log-writing process.  The problem is in getting the data to it.

> Remember the imessages approach I'm using for Postgres-R? It passes 
> messages around using shared memory and signals the receiver on incoming 
> data. It's not perfect, sure, but it's a general solution to a common 
> problem.

Uh-huh.  And how will you get libc's dynamic-link code to buy into
issuing link error messages this way?  Not to mention every other bit
of code that might get linked into the backend?

Trapping what comes out of stderr is simply too useful a behavior to lose.
        regards, tom lane


Re: COPYable logs status

From
Tom Lane
Date:
"Matthew T. O'Connor" <matthew@zeut.net> writes:
> Tom Lane wrote:
>> We *have* a log-writing process.  The problem is in getting the data to it.

> By that I assume you mean the bgwriter, I thought that was for WAL data, 

No, I'm talking about src/backend/postmaster/syslogger.c
        regards, tom lane


Re: COPYable logs status

From
Markus Schiltknecht
Date:
Tom Lane wrote:
> Markus Schiltknecht <markus@bluegap.ch> writes:
>> Tom Lane wrote:
>>> We *have* a log-writing process.  The problem is in getting the data to it.
> 
>> Remember the imessages approach I'm using for Postgres-R? It passes 
>> messages around using shared memory and signals the receiver on incoming 
>> data. It's not perfect, sure, but it's a general solution to a common 
>> problem.
> 
> Uh-huh.  And how will you get libc's dynamic-link code to buy into
> issuing link error messages this way?  Not to mention every other bit
> of code that might get linked into the backend?

I was refering to the 'getting data to another process' problem. If 
that's the problem (as you said upthread) imessages might be a solution.

> Trapping what comes out of stderr is simply too useful a behavior to lose.

Sure. I've never said anything against that.

Regards

Markus


Re: COPYable logs status

From
Andrew Dunstan
Date:

Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>   
>> The idea of one pipe per process is not really workable, because it
>> would mean having as many pipes as backends which does not sound very
>> good.  But how about a mixed approach -- like have the all the backends
>> share a pipe, controlled by an LWLock, and the auxiliary process have a
>> separate pipe each?
>>     
>
> Multiple pipes seem like a mess, and in any case the above still doesn't
> work for stderr output produced by non-cooperative software (dynamic
> loader for instance).
>
> The only solution that I can see is to invent some sort of simple
> protocol for the syslogger pipe.  Assume that the kernel honors PIPE_BUF
> (this assumption may need proving, see other message).  We could imagine
> having elog.c divvy up its writes to the pipe into chunks of less than
> PIPE_BUF bytes, where each chunk carries info sufficient to let it be
> reassembled.  Perhaps something on the order of
>
>     \0 \0 2-byte-length source-PID end-flag text...
>
> The syslogger reassembles these by joining messages with the same
> origination PID, until it gets one with the end-flag set.  It would need
> enough code to track multiple in-progress messages.
>
> The logger would have to also be able to deal with random text coming
> down the pipe (due to aforesaid non-cooperative software).  I would be
> inclined to say just take any text not preceded by \0\0 as a standalone
> message, up to the next \0\0.  Long chunks of non-protocol text would
> risk getting treated as multiple messages, but there's probably not a
> lot of harm in that.
>
> BTW, exactly what is the COPYable-logs code going to do with random
> text?  I trust the answer is not "throw it away".
>
>             
>   

The CSVlog pipe is a separate pipe from the stderr pipe. Anything that 
goes to stderr now will continue to go to stderr, wherever that is.

I like this scheme for a couple of reasons:
. it will include the ability to tell the real end of a message
. it will let us handle non-protocol messages (although there shouldn't 
be any in the CSVlog pipe).

I'll try to get a patch out for just the stderr case, which should be 
back-patchable, then adjust the CSVlog patch to use it.

I'm thinking of handling the partial lines with a small dynahash of 
StringInfo buffers, which get discarded whenever we don't have a partial 
line for the PID.

cheers

andrew


Re: COPYable logs status

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I'll try to get a patch out for just the stderr case, which should be 
> back-patchable, then adjust the CSVlog patch to use it.

Sounds like a plan.

> I'm thinking of handling the partial lines with a small dynahash of 
> StringInfo buffers, which get discarded whenever we don't have a partial 
> line for the PID.

A hashtable might be overkill --- based on reports so far, it's unlikely
you'd have more than two or three messages being received concurrently,
so a simple list or array might be quicker to search.
        regards, tom lane


Re: COPYable logs status

From
"FAST PostgreSQL"
Date:
Andrew Dunstan wrote:>> The CSVlog pipe is a separate pipe from the stderr pipe. Anything that> goes to stderr now will
continueto go to stderr, wherever that is.>> I like this scheme for a couple of reasons:> . it will include the ability
totell the real end of a message> . it will let us handle non-protocol messages (although there shouldn't> be any in
theCSVlog pipe).
 

Another important reason I went for two seperate pipes is that, in 
Windows, the pipe calls being blocking calls, the performance really 
deteriorates unless we increase the allocated buffer to the pipes 
dramatically.

On a rather decent machine, simply running the regression tests would 
consume a lot of resources, especially when it comes to the errors tests.

Rgds,
Arul Shaji


Andrew Dunstan wrote:
> 
> 
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>  
>>> The idea of one pipe per process is not really workable, because it
>>> would mean having as many pipes as backends which does not sound very
>>> good.  But how about a mixed approach -- like have the all the backends
>>> share a pipe, controlled by an LWLock, and the auxiliary process have a
>>> separate pipe each?
>>>     
>>
>> Multiple pipes seem like a mess, and in any case the above still doesn't
>> work for stderr output produced by non-cooperative software (dynamic
>> loader for instance).
>>
>> The only solution that I can see is to invent some sort of simple
>> protocol for the syslogger pipe.  Assume that the kernel honors PIPE_BUF
>> (this assumption may need proving, see other message).  We could imagine
>> having elog.c divvy up its writes to the pipe into chunks of less than
>> PIPE_BUF bytes, where each chunk carries info sufficient to let it be
>> reassembled.  Perhaps something on the order of
>>
>>     \0 \0 2-byte-length source-PID end-flag text...
>>
>> The syslogger reassembles these by joining messages with the same
>> origination PID, until it gets one with the end-flag set.  It would need
>> enough code to track multiple in-progress messages.
>>
>> The logger would have to also be able to deal with random text coming
>> down the pipe (due to aforesaid non-cooperative software).  I would be
>> inclined to say just take any text not preceded by \0\0 as a standalone
>> message, up to the next \0\0.  Long chunks of non-protocol text would
>> risk getting treated as multiple messages, but there's probably not a
>> lot of harm in that.
>>
>> BTW, exactly what is the COPYable-logs code going to do with random
>> text?  I trust the answer is not "throw it away".
>>
>>            
>>   
> 
> The CSVlog pipe is a separate pipe from the stderr pipe. Anything that 
> goes to stderr now will continue to go to stderr, wherever that is.
> 
> I like this scheme for a couple of reasons:
> . it will include the ability to tell the real end of a message
> . it will let us handle non-protocol messages (although there shouldn't 
> be any in the CSVlog pipe).
> 
> I'll try to get a patch out for just the stderr case, which should be 
> back-patchable, then adjust the CSVlog patch to use it.
> 
> I'm thinking of handling the partial lines with a small dynahash of 
> StringInfo buffers, which get discarded whenever we don't have a partial 
> line for the PID.
> 
> cheers
> 
> andrew
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
> 
>