Thread: COPYable logs status
[summarising discussion on -patches] The situation with this patch is that I now have it in a state where I think it could be applied, but there is one blocker, namely that we do not have a way of preventing the interleaving of log messages from different backends, which leads to garbled logs. This is an existing issue about which we have had complaints, but it becomes critical for a facility the whole purpose of which is to provide logs in a format guaranteed to work with our COPY command. Unfortunately, there is no solution in sight for this problem, certainly not one which I think can be devised and implemented simply at this stage of the cycle. The solution we'd like to use, LWLocks, is not workable in his context. In consequence, I don't think we have any option but to shelve this item for the time being. A couple of bugs have been found and fixes identified, during the review process, so it's not a total loss, but it is nevertheless a pity that we can't deliver this feature in 8.3. cheers andrew
Andrew Dunstan wrote: > Unfortunately, there is no solution in sight for this problem, certainly > not one which I think can be devised and implemented simply at this > stage of the cycle. The solution we'd like to use, LWLocks, is not > workable in his context. In consequence, I don't think we have any > option but to shelve this item for the time being. The idea of one pipe per process is not really workable, because it would mean having as many pipes as backends which does not sound very good. But how about a mixed approach -- like have the all the backends share a pipe, controlled by an LWLock, and the auxiliary process have a separate pipe each? One thing I haven't understood yet is how having multiple pipes help on this issue. Is the logger reading from the pipe and then writing to a file? (I haven't read the logger code). -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "Endurecerse, pero jamás perder la ternura" (E. Guevara)
On Fri, Jun 08, 2007 at 08:31:54AM -0400, Andrew Dunstan wrote: > The situation with this patch is that I now have it in a state where I > think it could be applied, but there is one blocker, namely that we do > not have a way of preventing the interleaving of log messages from > different backends, which leads to garbled logs. This is an existing > issue about which we have had complaints, but it becomes critical for a > facility the whole purpose of which is to provide logs in a format > guaranteed to work with our COPY command. The whole semantics of PIPEBUF should prevent garbling, as long as each write is a complete set of lines and no more than PIPEBUF bytes long. Have we determined the actual cause of the garbling? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote: > On Fri, Jun 08, 2007 at 08:31:54AM -0400, Andrew Dunstan wrote: > > The situation with this patch is that I now have it in a state where I > > think it could be applied, but there is one blocker, namely that we do > > not have a way of preventing the interleaving of log messages from > > different backends, which leads to garbled logs. This is an existing > > issue about which we have had complaints, but it becomes critical for a > > facility the whole purpose of which is to provide logs in a format > > guaranteed to work with our COPY command. > > The whole semantics of PIPEBUF should prevent garbling, as long as each > write is a complete set of lines and no more than PIPEBUF bytes long. > Have we determined the actual cause of the garbling? No, that's the main problem -- but it has been reported to happen on entries shorter than PIPE_BUF chars. -- Alvaro Herrera http://www.PlanetPostgreSQL.org/ "La persona que no quería pecar / estaba obligada a sentarseen duras y empinadas sillas / desprovistas, por ciertode blandosatenuantes" (Patricio Vogel)
Alvaro Herrera <alvherre@commandprompt.com> writes: > Martijn van Oosterhout wrote: >> The whole semantics of PIPEBUF should prevent garbling, as long as each >> write is a complete set of lines and no more than PIPEBUF bytes long. >> Have we determined the actual cause of the garbling? > No, that's the main problem -- but it has been reported to happen on > entries shorter than PIPE_BUF chars. It's not entirely clear to me whether there's been proven cases of interpolation *into* a message shorter than PIPE_BUF (and remember you've got to count all the lines when determining the length). The message intruding into the other could certainly be shorter. If there have been such cases, then our theories about what's going on are all wet, or else there are some rather nasty bugs in some kernels' pipe handling. So it would be good to pin this down. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > The idea of one pipe per process is not really workable, because it > would mean having as many pipes as backends which does not sound very > good. But how about a mixed approach -- like have the all the backends > share a pipe, controlled by an LWLock, and the auxiliary process have a > separate pipe each? Multiple pipes seem like a mess, and in any case the above still doesn't work for stderr output produced by non-cooperative software (dynamic loader for instance). The only solution that I can see is to invent some sort of simple protocol for the syslogger pipe. Assume that the kernel honors PIPE_BUF (this assumption may need proving, see other message). We could imagine having elog.c divvy up its writes to the pipe into chunks of less than PIPE_BUF bytes, where each chunk carries info sufficient to let it be reassembled. Perhaps something on the order of \0 \0 2-byte-length source-PID end-flag text... The syslogger reassembles these by joining messages with the same origination PID, until it gets one with the end-flag set. It would need enough code to track multiple in-progress messages. The logger would have to also be able to deal with random text coming down the pipe (due to aforesaid non-cooperative software). I would be inclined to say just take any text not preceded by \0\0 as a standalone message, up to the next \0\0. Long chunks of non-protocol text would risk getting treated as multiple messages, but there's probably not a lot of harm in that. BTW, exactly what is the COPYable-logs code going to do with random text? I trust the answer is not "throw it away". regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > >> Martijn van Oosterhout wrote: >> >>> The whole semantics of PIPEBUF should prevent garbling, as long as each >>> write is a complete set of lines and no more than PIPEBUF bytes long. >>> Have we determined the actual cause of the garbling? >>> > > >> No, that's the main problem -- but it has been reported to happen on >> entries shorter than PIPE_BUF chars. >> > > It's not entirely clear to me whether there's been proven cases of > interpolation *into* a message shorter than PIPE_BUF (and remember > you've got to count all the lines when determining the length). > The message intruding into the other could certainly be shorter. > > If there have been such cases, then our theories about what's going on > are all wet, or else there are some rather nasty bugs in some kernels' > pipe handling. So it would be good to pin this down. > > > Right. But we don't split lines into PIPE_BUF sized chunks. And doing so would make loadable logs possibly rather less pleasant. Ideally we should be able to deal with this despite the PIPE_BUF restriction on atomic writes. cheers andrew
On Fri, Jun 08, 2007 at 10:29:03AM -0400, Tom Lane wrote: > > The only solution that I can see is to invent some sort of simple > protocol for the syslogger pipe. Perhaps having a look at the current IETF syslog discussion will be helpful in that case? (I know it's not directly relevant, but maybe others have thought about some of these things. I haven't read the draft, note.) http://tools.ietf.org/html/draft-ietf-syslog-protocol-20 There's also the discussion of reliability in RFC 3195: ftp://ftp.rfc-editor.org/in-notes/rfc3195.txt A -- Andrew Sullivan | ajs@crankycanuck.ca The whole tendency of modern prose is away from concreteness. --George Orwell
Andrew Dunstan wrote: > The situation with this patch is that I now have it in a state where I > think it could be applied, but there is one blocker, namely that we do > not have a way of preventing the interleaving of log messages from > different backends, which leads to garbled logs. This is an existing > issue about which we have had complaints, but it becomes critical for a > facility the whole purpose of which is to provide logs in a format > guaranteed to work with our COPY command. > > Unfortunately, there is no solution in sight for this problem, certainly > not one which I think can be devised and implemented simply at this > stage of the cycle. The solution we'd like to use, LWLocks, is not > workable in his context. In consequence, I don't think we have any > option but to shelve this item for the time being. I think this will get shot down, but here goes anyway... How about creating a log-writing-process? Postmaster could write to the log files directly until the log-writer is up and running, then all processes can send their log output through the log-writer.
"Matthew T. O'Connor" <matthew@zeut.net> writes: > How about creating a log-writing-process? Postmaster could write to the > log files directly until the log-writer is up and running, then all > processes can send their log output through the log-writer. We *have* a log-writing process. The problem is in getting the data to it. regards, tom lane
Tom Lane wrote: > "Matthew T. O'Connor" <matthew@zeut.net> writes: >> How about creating a log-writing-process? Postmaster could write to the >> log files directly until the log-writer is up and running, then all >> processes can send their log output through the log-writer. > > We *have* a log-writing process. The problem is in getting the data to it. By that I assume you mean the bgwriter, I thought that was for WAL data, I didn't think it could or perhaps should be used for normal log file writing, but I also know I'm way outside my comfort area in talking about this, so excuse the noise if this is way off base.
Hi, Tom Lane wrote: > We *have* a log-writing process. The problem is in getting the data to it. Remember the imessages approach I'm using for Postgres-R? It passes messages around using shared memory and signals the receiver on incoming data. It's not perfect, sure, but it's a general solution to a common problem. Maybe it's worth a thought, instead of fiddling with signals, special shmem areas and possible races every time the 'getting data to another process'-problem comes up? Regards Markus
Markus Schiltknecht <markus@bluegap.ch> writes: > Tom Lane wrote: >> We *have* a log-writing process. The problem is in getting the data to it. > Remember the imessages approach I'm using for Postgres-R? It passes > messages around using shared memory and signals the receiver on incoming > data. It's not perfect, sure, but it's a general solution to a common > problem. Uh-huh. And how will you get libc's dynamic-link code to buy into issuing link error messages this way? Not to mention every other bit of code that might get linked into the backend? Trapping what comes out of stderr is simply too useful a behavior to lose. regards, tom lane
"Matthew T. O'Connor" <matthew@zeut.net> writes: > Tom Lane wrote: >> We *have* a log-writing process. The problem is in getting the data to it. > By that I assume you mean the bgwriter, I thought that was for WAL data, No, I'm talking about src/backend/postmaster/syslogger.c regards, tom lane
Tom Lane wrote: > Markus Schiltknecht <markus@bluegap.ch> writes: >> Tom Lane wrote: >>> We *have* a log-writing process. The problem is in getting the data to it. > >> Remember the imessages approach I'm using for Postgres-R? It passes >> messages around using shared memory and signals the receiver on incoming >> data. It's not perfect, sure, but it's a general solution to a common >> problem. > > Uh-huh. And how will you get libc's dynamic-link code to buy into > issuing link error messages this way? Not to mention every other bit > of code that might get linked into the backend? I was refering to the 'getting data to another process' problem. If that's the problem (as you said upthread) imessages might be a solution. > Trapping what comes out of stderr is simply too useful a behavior to lose. Sure. I've never said anything against that. Regards Markus
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > >> The idea of one pipe per process is not really workable, because it >> would mean having as many pipes as backends which does not sound very >> good. But how about a mixed approach -- like have the all the backends >> share a pipe, controlled by an LWLock, and the auxiliary process have a >> separate pipe each? >> > > Multiple pipes seem like a mess, and in any case the above still doesn't > work for stderr output produced by non-cooperative software (dynamic > loader for instance). > > The only solution that I can see is to invent some sort of simple > protocol for the syslogger pipe. Assume that the kernel honors PIPE_BUF > (this assumption may need proving, see other message). We could imagine > having elog.c divvy up its writes to the pipe into chunks of less than > PIPE_BUF bytes, where each chunk carries info sufficient to let it be > reassembled. Perhaps something on the order of > > \0 \0 2-byte-length source-PID end-flag text... > > The syslogger reassembles these by joining messages with the same > origination PID, until it gets one with the end-flag set. It would need > enough code to track multiple in-progress messages. > > The logger would have to also be able to deal with random text coming > down the pipe (due to aforesaid non-cooperative software). I would be > inclined to say just take any text not preceded by \0\0 as a standalone > message, up to the next \0\0. Long chunks of non-protocol text would > risk getting treated as multiple messages, but there's probably not a > lot of harm in that. > > BTW, exactly what is the COPYable-logs code going to do with random > text? I trust the answer is not "throw it away". > > > The CSVlog pipe is a separate pipe from the stderr pipe. Anything that goes to stderr now will continue to go to stderr, wherever that is. I like this scheme for a couple of reasons: . it will include the ability to tell the real end of a message . it will let us handle non-protocol messages (although there shouldn't be any in the CSVlog pipe). I'll try to get a patch out for just the stderr case, which should be back-patchable, then adjust the CSVlog patch to use it. I'm thinking of handling the partial lines with a small dynahash of StringInfo buffers, which get discarded whenever we don't have a partial line for the PID. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > I'll try to get a patch out for just the stderr case, which should be > back-patchable, then adjust the CSVlog patch to use it. Sounds like a plan. > I'm thinking of handling the partial lines with a small dynahash of > StringInfo buffers, which get discarded whenever we don't have a partial > line for the PID. A hashtable might be overkill --- based on reports so far, it's unlikely you'd have more than two or three messages being received concurrently, so a simple list or array might be quicker to search. regards, tom lane
Andrew Dunstan wrote:>> The CSVlog pipe is a separate pipe from the stderr pipe. Anything that> goes to stderr now will continueto go to stderr, wherever that is.>> I like this scheme for a couple of reasons:> . it will include the ability totell the real end of a message> . it will let us handle non-protocol messages (although there shouldn't> be any in theCSVlog pipe). Another important reason I went for two seperate pipes is that, in Windows, the pipe calls being blocking calls, the performance really deteriorates unless we increase the allocated buffer to the pipes dramatically. On a rather decent machine, simply running the regression tests would consume a lot of resources, especially when it comes to the errors tests. Rgds, Arul Shaji Andrew Dunstan wrote: > > > Tom Lane wrote: >> Alvaro Herrera <alvherre@commandprompt.com> writes: >> >>> The idea of one pipe per process is not really workable, because it >>> would mean having as many pipes as backends which does not sound very >>> good. But how about a mixed approach -- like have the all the backends >>> share a pipe, controlled by an LWLock, and the auxiliary process have a >>> separate pipe each? >>> >> >> Multiple pipes seem like a mess, and in any case the above still doesn't >> work for stderr output produced by non-cooperative software (dynamic >> loader for instance). >> >> The only solution that I can see is to invent some sort of simple >> protocol for the syslogger pipe. Assume that the kernel honors PIPE_BUF >> (this assumption may need proving, see other message). We could imagine >> having elog.c divvy up its writes to the pipe into chunks of less than >> PIPE_BUF bytes, where each chunk carries info sufficient to let it be >> reassembled. Perhaps something on the order of >> >> \0 \0 2-byte-length source-PID end-flag text... >> >> The syslogger reassembles these by joining messages with the same >> origination PID, until it gets one with the end-flag set. It would need >> enough code to track multiple in-progress messages. >> >> The logger would have to also be able to deal with random text coming >> down the pipe (due to aforesaid non-cooperative software). I would be >> inclined to say just take any text not preceded by \0\0 as a standalone >> message, up to the next \0\0. Long chunks of non-protocol text would >> risk getting treated as multiple messages, but there's probably not a >> lot of harm in that. >> >> BTW, exactly what is the COPYable-logs code going to do with random >> text? I trust the answer is not "throw it away". >> >> >> > > The CSVlog pipe is a separate pipe from the stderr pipe. Anything that > goes to stderr now will continue to go to stderr, wherever that is. > > I like this scheme for a couple of reasons: > . it will include the ability to tell the real end of a message > . it will let us handle non-protocol messages (although there shouldn't > be any in the CSVlog pipe). > > I'll try to get a patch out for just the stderr case, which should be > back-patchable, then adjust the CSVlog patch to use it. > > I'm thinking of handling the partial lines with a small dynahash of > StringInfo buffers, which get discarded whenever we don't have a partial > line for the PID. > > cheers > > andrew > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > >