Thread: logging in high performance systems.
We have a need for logging in systems where it isn't feasible to log to disk as it negatively impacts performance.

I'd like to be able to creatively solve this problem without modifying the core, but today I cannot.

So... here's my first whack at solving this with some flexibility.

The first thing I did was add hook points where immediate statement logging happens "pre_exec" and those that present duration "post_exec". These should, with optimization turned on, have only a few instructions of impact when no hooks are registered (we could hoist the branch outside the function call if that were identified as an issue).

https://github.com/postwait/postgres/commit/62bb9dfa2d373618f10e46678612720a3a01599a

The second thing I did was write a sample use of those hooks to implement a completely non-blocking fifo logger (if it would block, it drops the log line). The concept is that we could run this without risk of negative performance impact due to slow log reading (choosing to drop logs in lieu of pausing). And a simple process could be written to consume from the fifo. We use this method in other systems to log many 10s of thousands of log lines per second with negligible impact on performance.

https://github.com/postwait/postgres/commit/c8f5a346c7b2c3eba9f72ea49077dc72f03a2679

Thoughts? Feedback?

As can be seen, the patch is pretty tiny.

--
Theo Schlossnagle
http://omniti.com/is/theo-schlossnagle
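The non-blocking fifo idea is small enough to sketch in a few lines. The following is illustrative only -- made-up function names, not the code in the linked commit -- assuming a POSIX fifo opened with O_NONBLOCK:

    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    static int fifo_fd = -1;

    /* Open the fifo without blocking; if no reader has the other end
     * open yet, open() fails with ENXIO and we simply stay unconnected. */
    static void
    fifo_logger_open(const char *path)
    {
        fifo_fd = open(path, O_WRONLY | O_NONBLOCK);
    }

    /* Emit one log line.  If the pipe is full because the reader has
     * fallen behind, the non-blocking write fails with EAGAIN and the
     * line is dropped rather than stalling the caller. */
    static void
    fifo_logger_emit(const char *line)
    {
        if (fifo_fd < 0)
            return;
        if (write(fifo_fd, line, strlen(line)) < 0 &&
            (errno == EAGAIN || errno == EWOULDBLOCK))
        {
            /* dropped on purpose: a slow log reader must never block us */
        }
    }

Writes of up to PIPE_BUF bytes are atomic on a pipe, so the consumer sees whole log lines or nothing, and a slow or absent reader only ever costs dropped lines, never a stalled backend.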
On 11/23/2011 09:28 PM, Theo Schlossnagle wrote:
> The second thing I did was write a sample use of those hooks to
> implement a completely non-blocking fifo logger. (if it would block,
> it drops the log line). The concept is that we could run this without
> risk of negative performance impact due to slow log reading (choosing
> to drop logs in lieu of pausing). And a simple process could be
> written to consume from the fifo.

This was one of the topics at the last developer's meeting you might not have seen go by: http://wiki.postgresql.org/wiki/PgCon_2011_Developer_Meeting#Improving_Logging

There was a reference to a pipe-based implementation from Magnus that I haven't gotten a chance to track down yet. I think this area is going to start hitting a lot more people in the upcoming couple of years, since I'm seeing it increasingly at two customers I consider "canary in a coal mine" sentinels for performance issues. I'm now roughly considering three types of users here:

-Don't care about the overhead of logging, but are sick of parsing text files. Would prefer the data be in a table instead.
-Concerned enough about overhead that statement-level logging, whether to log file or table, is impractical, but can cope with logging other things.
-Logging rate can burst high enough that messages must start being dropped instead, no matter where they go.

Before making a big change, log file vs. table needs to be carefully explored to figure out which of the two approaches has more reasonable behavior/performance trade-offs. I've been trying to attack this starting at the middle, with the pg_stat_statements rework Peter did for the current CommitFest. If you've already worked out a way to simulate heavy logging as part of what you've done here, I'd be quite interested to hear how capable you feel it is for the class of problem you're seeing. I've always assumed that pushing the most common queries into shared memory and only showing them on demand, rather than logging them a line at a time, could be a big win for some places. We're still a bit light on benchmarks proving that is the case so far, though.

My assumption has been that eventually a lossy logger was going to be necessary for busier sites; I just haven't been suffering from one enough to hack on it yet. If it's possible to work this out in enough detail to figure out where the hooks go, and to prove they work with at least one consumer of them, I'd consider that a really useful thing to try and squeeze into 9.2. The processing parts can always be further improved later based on production feedback, going along with my recent theme of letting extensions that poke and probe existing hooks be one place to brew next-version features.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
On Wed, Nov 23, 2011 at 11:45 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> My assumption has been that eventually a lossy logger was going to be
> necessary for busier sites; I just haven't been suffering from one enough to
> hack on it yet. If it's possible to work this out in enough detail to
> figure out where the hooks go, and to prove they work with at least one
> consumer of them, I'd consider that a really useful thing to try and squeeze
> into 9.2.

I think it's possible. I did both in my patches: 1) placed hooks where logging exists today, and 2) provided a useful consumer of them. And I tested it (only on a single system) under high simulated load.

I see the next steps being:
1) agreeing that a problem exists (I know one does, but I suppose consensus is req'd)
2) agreeing that "hooks" are the right approach, if not propose a different approach (fwiw, it's incredibly common)
3) reworking the implementation to fit in the project; I assume the implementation I proposed will, at best, vaguely resemble anything that gets integrated. It was just a PoC.

Also, I put the sample consumer in contrib in a separate commit just to make it easy to review -- while I'm not against it, I am not proposing that a fifo logger be in contrib.

> The processing parts can always be further improved later based
> on production feedback, going along with my recent theme of letting
> extensions that poke and probe existing hooks be one place to brew next
> version features.

I think that a generalized hook framework (like that offered by apr-util) would be generally useful in cases like these. It makes it simple to add hooks and auto-document them. But I'm not proposing that here; I'm trying to keep the patch to logging so I can solve a critical production issue we seem to be running into more and more.

Thanks for your time.

--
Theo Schlossnagle
http://omniti.com/is/theo-schlossnagle
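For concreteness, a registration-style hook list of the sort being described (the apr-util-like approach) might look roughly like the sketch below. All names are invented; this is not the code from the patch:

    /* Shape of a registration-style hook list, illustrative only. */
    #define MAX_PRE_EXEC_HOOKS 8

    typedef void (*pre_exec_hook_fn) (const char *query_string);

    static pre_exec_hook_fn pre_exec_hooks[MAX_PRE_EXEC_HOOKS];
    static int  n_pre_exec_hooks = 0;

    /* Extensions would call this from their _PG_init(). */
    void
    register_pre_exec_hook(pre_exec_hook_fn fn)
    {
        if (n_pre_exec_hooks < MAX_PRE_EXEC_HOOKS)
            pre_exec_hooks[n_pre_exec_hooks++] = fn;
    }

    /* Called from the statement-execution path; with nothing
     * registered this is a single compare-and-branch. */
    static void
    call_pre_exec_hooks(const char *query_string)
    {
        int         i;

        for (i = 0; i < n_pre_exec_hooks; i++)
            pre_exec_hooks[i](query_string);
    }

With no hooks registered the per-statement cost is the single n_pre_exec_hooks test, which is the "few instructions of impact" property claimed above. Marti's point later in the thread is that core Postgres usually skips the list entirely and uses one global function pointer per hook instead.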
On Wed, Nov 23, 2011 at 9:28 PM, Theo Schlossnagle <jesus@omniti.com> wrote:
> Thoughts? Feedback?

Can you add it to the next CommitFest?

https://commitfest.postgresql.org/action/commitfest_view/open

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Theo Schlossnagle (jesus@omniti.com) wrote:
> I'd like to be able to creatively solve this problem without modifying
> the core, but today I cannot.

I thought a hook already existed for this (there's already a module which uses a hook to log commands done as a superuser, for example). Perhaps it wasn't in the right place, however.

> So... here's my first whack at solving this with some flexibility.

I was hoping to work on something similar (and have discussed it with Magnus a few times) but with much more flexibility. Basically, have a structure where there's a lot of meta-data available that can easily be used to direct where a log message goes (or if it's logged at all), all without having to actually parse the log message (which is painful). To be honest, I was considering something like what syslog-ng provides, but PG-specific, not needing regexps (though perhaps supporting them) and with multiple different destination types and locations.

Independent of all that, if we don't have hooks already that can be used for this, I certainly think it makes sense to add some, even if we do something more later.

Thanks,

Stephen
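To make the "lots of meta-data, no message parsing" idea concrete, routing could key off structured fields rather than regexps over the rendered text. A purely hypothetical sketch -- every name below is invented for illustration:

    #include "postgres.h"

    /* Route on structured fields, never by parsing the message text. */
    typedef enum
    {
        DEST_FILE,
        DEST_SYSLOG,
        DEST_FIFO,
        DEST_DISCARD
    } LogDestination;

    typedef struct LogRecordMeta
    {
        int         elevel;             /* DEBUG1 .. PANIC */
        const char *username;
        const char *database;
        const char *application_name;
        const char *sqlstate;
    } LogRecordMeta;

    /* A routing policy is then just a function of the metadata. */
    static LogDestination
    route_log_record(const LogRecordMeta *meta)
    {
        if (meta->elevel >= ERROR)
            return DEST_SYSLOG;         /* always keep real errors */
        if (meta->application_name &&
            strcmp(meta->application_name, "batch_loader") == 0)
            return DEST_DISCARD;        /* a known-noisy application */
        return DEST_FILE;
    }

The point, per Stephen, is that the decision never requires parsing the rendered message text.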
On 11/24/2011 06:33 PM, Theo Schlossnagle wrote:
> I see the next steps being:
> 1) agreeing that a problem exists (I know one does, but I suppose
> consensus is req'd)

+1, high volume logging is also a huge problem here at Skype. Some of the issues that immediately come to mind:
- extracting usage statistics from logs
- extracting error messages from logs
- pushing the logs to a remote log server (and no, syslog is not actually usable for that ;)

> 2) agreeing that "hooks" are the right approach, if not propose a
> different approach (fwiw, it's incredibly common)

+1 for a hook-based approach. While the whole logging system could possibly use an overhaul, that is not likely to happen for 9.2. Meanwhile, hooks provide the flexibility to implement custom log collectors for those who really need it.

> 3) reworking the implementation to fit in the project; I assume the
> implementation I proposed will, at best, vaguely resemble anything
> that gets integrated. It was just a PoC.

I'd vote for a hook to be added to EmitErrorReport. That way all logged messages pass through the hook, which could then decide whether to actually write the message to the server log or to discard it. As an added bonus, the message is already broken down into pieces, so there is no need to parse messages for severity, context, etc. Patch attached.

regards,
Martin
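A minimal consumer of such a hook could look like the sketch below. It is written against the emit_log_hook interface that later ships in core (see Robert's note near the end of the thread about a similar patch being committed), so treat it as illustrative rather than as Martin's attached patch:

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/elog.h"

    PG_MODULE_MAGIC;

    void        _PG_init(void);

    static emit_log_hook_type prev_emit_log_hook = NULL;

    static void
    filter_emit_log_hook(ErrorData *edata)
    {
        /* Example policy: keep WARNING and above in the server log,
         * silently drop anything chattier. */
        if (edata->elevel < WARNING)
            edata->output_to_server = false;

        /* Chain to any previously installed hook. */
        if (prev_emit_log_hook)
            prev_emit_log_hook(edata);
    }

    void
    _PG_init(void)
    {
        prev_emit_log_hook = emit_log_hook;
        emit_log_hook = filter_emit_log_hook;
    }

Because the hook sees the already-assembled ErrorData there is no text parsing, but -- as Greg points out later in the thread -- the message has also already been built by the time the hook can decide to discard it.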
On 11/24/2011 11:33 AM, Theo Schlossnagle wrote:
> I see the next steps being:
> 1) agreeing that a problem exists (I know one does, but I suppose
> consensus is req'd)
> 2) agreeing that "hooks" are the right approach, if not propose a
> different approach (fwiw, it's incredibly common)
> 3) reworking the implementation to fit in the project; I assume the
> implementation I proposed will, at best, vaguely resemble anything
> that gets integrated. It was just a PoC.

With this idea still being pretty new, and several of the people popping out opinions in this thread being local--Theo, Stephen, myself--we've decided to make our local Baltimore/Washington PUG meeting this month be an excuse to hash some of this early stuff out a bit more in person and try to speed things up. See http://www.meetup.com/Baltimore-Washington-PostgreSQL-Users-Group/events/44335672/ if any other locals would like to attend; it's a week from today. (Note that the NYC PUG is also having its meeting at the same time this month.)

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
On Dec 13, 2011, at 13:57, Greg Smith wrote:
> With this idea still being pretty new, and several of the people popping out opinions in this thread being local--Theo, Stephen, myself--we've decided to make our local Baltimore/Washington PUG meeting this month be an excuse to hash some of this early stuff out a bit more in person and try to speed things up. See http://www.meetup.com/Baltimore-Washington-PostgreSQL-Users-Group/events/44335672/ if any other locals would like to attend; it's a week from today. (Note that the NYC PUG is also having its meeting at the same time this month.)

What time? I'd potentially like to attend. Philadelphia, represent!

Michael Glaesemann
grzm seespotcode net
On 11/23/2011 09:28 PM, Theo Schlossnagle wrote:
> The first thing I did was add hook points where immediate statement
> logging happens "pre_exec" and those that present duration
> "post_exec". These should, with optimization turned on, have only a
> few instructions of impact when no hooks are registered (we could
> hoist the branch outside the function call if that were identified as
> an issue).
>
> https://github.com/postwait/postgres/commit/62bb9dfa2d373618f10e46678612720a3a01599a

Theo: could you e-mail the current "pre-flight and post-flight logging hooks" code as a patch to the list? I put it into the CF app so we don't forget to take a look at this, but from a policy point of view an official patch submission to the list is needed here. I've looked at the code on github enough to know we probably want the sort of registration scheme you've proposed no matter what ends up getting logged.

The majority of Theo's patch is worrying about how to register and manage multiple hook consumers around statements that are being executed. The other one we got from Martin is hooking all log output and not worrying about the multiple-consumer problems.

There is an important distinction to make here. Statement logging is one of the largest producers of logging data you want to worry about optimizing for on a high performance system. The decision about whether to log or not may need to be made by the hook. Something that hooks EmitErrorReport has already lost an important opportunity to decide whether the system is perhaps too busy to worry about logging right now at all; you've already paid a chunk of the logging overhead getting the log message to it. I think both areas are going to be important to hook eventually.

The meeting to discuss this area at our last BWPUG gathering happened, and we made some good progress mapping out what various people would like to see here. I'm holding the token for the job of summarizing all that usefully for the community. I'll get back to that as soon as I can once the January CF is rolling along, hopefully later this week.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.com
On Thu, Nov 24, 2011 at 04:28, Theo Schlossnagle <jesus@omniti.com> wrote:
> So... here's my first whack at solving this with some flexibility.
>
> The first thing I did was add hook points where immediate statement
> logging happens "pre_exec" and those that present duration
> "post_exec". These should, with optimization turned on, have only a
> few instructions of impact when no hooks are registered (we could
> hoist the branch outside the function call if that were identified as
> an issue).

Note that the hook mechanism you've built is a departure from how other hooks are managed in Postgres. Traditionally hooks are just global function pointers, and each consumer is responsible for storing the previous value of the hook and chain-calling it in the handler. If you want to change this pattern, I think you should start another discussion.

E.g. hook registration looks like:

    next_object_access_hook = object_access_hook;
    object_access_hook = sepgsql_object_access;

and the handler chain-calls the saved pointer:

    static void
    sepgsql_object_access(ObjectAccessType access, Oid classId,
                          Oid objectId, int subId)
    {
        if (next_object_access_hook)
            (*next_object_access_hook) (access, classId, objectId, subId);

        /* ... sepgsql's own checks follow ... */
    }

Regards,
Marti
Excerpts from Marti Raudsepp's message of Tue Jan 17 12:12:50 -0300 2012:
> On Thu, Nov 24, 2011 at 04:28, Theo Schlossnagle <jesus@omniti.com> wrote:
> > So... here's my first whack at solving this with some flexibility.
> >
> > The first thing I did was add hook points where immediate statement
> > logging happens "pre_exec" and those that present duration
> > "post_exec". These should, with optimization turned on, have only a
> > few instructions of impact when no hooks are registered (we could
> > hoist the branch outside the function call if that were identified as
> > an issue).
>
> Note that the hook mechanism you've built is a departure from how
> other hooks are managed in Postgres. Traditionally hooks are just
> global function pointers, and each consumer is responsible for storing
> the previous value of the hook and chain-calling it in the handler. If
> you want to change this pattern, I think you should start another
> discussion.

Hm. We already have places doing the other thing; for example, see XactCallback and ExprContextCallback. I'm not sure we have an actual criterion for deciding when to use which.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
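For comparison, the callback-list style Alvaro mentions looks like this from a consumer's point of view. RegisterXactCallback() is the real API; the module around it is a made-up sketch:

    #include "postgres.h"
    #include "fmgr.h"
    #include "access/xact.h"

    PG_MODULE_MAGIC;

    void        _PG_init(void);

    /* Core owns the callback list; a consumer only registers itself
     * and never has to save or chain-call a previous pointer. */
    static uint64 abort_count = 0;

    static void
    count_aborts(XactEvent event, void *arg)
    {
        if (event == XACT_EVENT_ABORT)
            abort_count++;
    }

    void
    _PG_init(void)
    {
        RegisterXactCallback(count_aborts, NULL);
    }

Here the list management and iteration live in core, so consumers cannot forget to chain-call; the price is a registration API in core rather than a bare function pointer and a single branch at the call site.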
On Mon, Jan 16, 2012 at 3:51 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> There is an important distinction to make here. Statement logging is one of
> the largest producers of logging data you want to worry about optimizing for
> on a high performance system. The decision about whether to log or not may
> need to be made by the hook. Something that hooks EmitErrorReport has
> already lost an important opportunity to decide whether the
> system is perhaps too busy to worry about logging right now at all; you've
> already paid a chunk of the logging overhead getting the log message to it.
> I think both areas are going to be important to hook eventually.

I would have dismissed this out of hand if you'd said it a year ago, but I'm older and wiser now. At some point this cycle, I did some benchmarking of the subtransaction abort path, since the slowness of things like EXCEPTION blocks in PL/pgsql is a known sore point. I don't remember the exact numbers anymore, but I do remember the general picture, which is that constructing the error message is shockingly expensive compared to anything else that we do in that path. I dropped it at that point for lack of good ideas: it would be awfully nice to postpone the error message construction until we know that it's actually needed, but I don't see any clean (or even messy) way of doing that.

I'm not sure if the effect is quite as significant for toplevel transaction abort, because sending the message to the client is going to cost something, so the relative cost of generating the error message on a busy system will be less. But I still wouldn't like to bet against it being significant if you're really hammering the server.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> I would have dismissed this out of hand if you'd said it a year ago,
> but I'm older and wiser now. At some point this cycle, I did some
> benchmarking of the subtransaction abort path, since the slowness of
> things like EXCEPTION blocks in PL/pgsql is a known sore point. I
> don't remember the exact numbers anymore, but I do remember the
> general picture, which is that constructing the error message is
> shockingly expensive compared to anything else that we do in that
> path. I dropped it at that point for lack of good ideas: it would be
> awfully nice to postpone the error message construction until we know
> that it's actually needed, but I don't see any clean (or even messy)
> way of doing that.

I came across this thread while looking back to see if there was anything relevant to Martin Pihlak's logging patch. It doesn't seem to be a reason not to commit his patch, but I was a bit struck by your comment about the cost of generating error messages. We already knew that was expensive, which is why elog.c has provisions to not do it if the message isn't going to be printed. I am wondering exactly what case you were looking at in the above testing --- was it plpgsql exception blocks? If so, I wonder if there'd be any mileage in trying to postpone error message processing until we see if the user actually asks for the text of the message. The SQLERRM variable would have to become even more magic than it is now for that to work, but since it's a wart already, maybe that's not a big objection.

regards, tom lane
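The provision Tom is referring to is the short-circuit at the front of ereport(): errstart() decides up front whether any output will happen, and the message-building err*() calls are only evaluated when it says yes. A small illustration follows; the macro expansion shown in the comment is a paraphrase of elog.h, not an exact copy:

    #include "postgres.h"

    /* elog.h expands ereport() roughly as:
     *
     *   (errstart(elevel, __FILE__, __LINE__, PG_FUNCNAME_MACRO, domain) ? \
     *    (errfinish rest) : (void) 0)
     *
     * so when errstart() decides neither the server log nor the client
     * will see the message, the err*() calls inside "rest" -- including
     * the sprintf-style formatting in errmsg() -- are never evaluated.
     */
    static void
    noisy_debug(const char *funcname)
    {
        /* No format work happens here unless DEBUG2 output is enabled. */
        ereport(DEBUG2,
                (errmsg("entering %s", funcname)));
    }

Robert's abort-path case is different: there the error really is going to be reported, so errstart() cannot skip the work, which is why deferring message construction would need something like the extra SQLERRM magic Tom mentions.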
On Tue, Mar 6, 2012 at 2:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I would have dismissed this out of hand if you'd said it a year ago,
>> but I'm older and wiser now. At some point this cycle, I did some
>> benchmarking of the subtransaction abort path, since the slowness of
>> things like EXCEPTION blocks in PL/pgsql is a known sore point. I
>> don't remember the exact numbers anymore, but I do remember the
>> general picture, which is that constructing the error message is
>> shockingly expensive compared to anything else that we do in that
>> path. I dropped it at that point for lack of good ideas: it would be
>> awfully nice to postpone the error message construction until we know
>> that it's actually needed, but I don't see any clean (or even messy)
>> way of doing that.
>
> I came across this thread while looking back to see if there was
> anything relevant to Martin Pihlak's logging patch. It doesn't
> seem to be a reason not to commit his patch, but I was a bit struck
> by your comment about the cost of generating error messages. We
> already knew that was expensive, which is why elog.c has provisions
> to not do it if the message isn't going to be printed. I am wondering
> exactly what case you were looking at in the above testing --- was it
> plpgsql exception blocks? If so, I wonder if there'd be any mileage in
> trying to postpone error message processing until we see if the user
> actually asks for the text of the message. The SQLERRM variable would
> have to become even more magic than it is now for that to work, but
> since it's a wart already, maybe that's not a big objection.

/me looks to see if he still has the test case. Yep, I do. Attached.

A patch I was noodling around with, but ultimately was somewhat underwhelmed by, also attached. IIRC it helps, but not a lot.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Nov 23, 2011 at 9:28 PM, Theo Schlossnagle <jesus@omniti.com> wrote:
> We have a need for logging in systems where it isn't feasible to log
> to disk as it negatively impacts performance.
>
> I'd like to be able to creatively solve this problem without modifying
> the core, but today I cannot.
>
> So... here's my first whack at solving this with some flexibility.
>
> The first thing I did was add hook points where immediate statement
> logging happens "pre_exec" and those that present duration
> "post_exec". These should, with optimization turned on, have only a
> few instructions of impact when no hooks are registered (we could
> hoist the branch outside the function call if that were identified as
> an issue).
>
> https://github.com/postwait/postgres/commit/62bb9dfa2d373618f10e46678612720a3a01599a
>
> The second thing I did was write a sample use of those hooks to
> implement a completely non-blocking fifo logger (if it would block,
> it drops the log line). The concept is that we could run this without
> risk of negative performance impact due to slow log reading (choosing
> to drop logs in lieu of pausing). And a simple process could be
> written to consume from the fifo. We use this method in other systems
> to log many 10s of thousands of log lines per second with negligible
> impact on performance.
>
> https://github.com/postwait/postgres/commit/c8f5a346c7b2c3eba9f72ea49077dc72f03a2679
>
> Thoughts? Feedback?
>
> As can be seen, the patch is pretty tiny.

Theo,

Tom recently committed something by another author that is along similar lines to what you have here (I think). Can you comment on whether you think more is still needed and what the differences are between that approach and yours?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Mar 9, 2012 at 9:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Tom recently committed something by another author that is along
> similar lines to what you have here (I think). Can you comment on
> whether you think more is still needed and what the differences are
> between that approach and yours?

Hearing no response, I am marking this Returned with Feedback.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company