Thread: Add support for logging the current role
Greetings, Minor enhancement, but a valuable one imv. Hopefully there aren't any issues with it. :) Thanks! Stephen commit 3cb707aa9f228e629e7127625a76a223751a778b Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 12 09:17:31 2011 -0500 Add support for logging the current role This adds a '%o' option to the log_line_prefix GUC which will log the current role. The '%u' option only logs the Session user, which can be misleading, but it's valuable to have both options. *** a/doc/src/sgml/config.sgml --- b/doc/src/sgml/config.sgml *************** *** 3508,3513 **** local0.* /var/log/postgresql --- 3508,3518 ---- <entry>yes</entry> </row> <row> + <entry><literal>%o</literal></entry> + <entry>Current role name</entry> + <entry>yes</entry> + </row> + <row> <entry><literal>%d</literal></entry> <entry>Database name</entry> <entry>yes</entry> *** a/src/backend/utils/error/elog.c --- b/src/backend/utils/error/elog.c *************** *** 1826,1831 **** log_line_prefix(StringInfo buf, ErrorData *edata) --- 1826,1841 ---- appendStringInfoString(buf, username); } break; + case 'o': + if (MyProcPort) + { + const char *rolename = GetUserNameFromId(GetUserId()); + + if (rolename == NULL || *rolename == '\0') + rolename = _("[unknown]"); + appendStringInfoString(buf, rolename); + } + break; case 'd': if (MyProcPort) {
On Wed, Jan 12, 2011 at 9:23 AM, Stephen Frost <sfrost@snowman.net> wrote: > Minor enhancement, but a valuable one imv. Hopefully there aren't any > issues with it. :) 1. Why %o? That's not obviously mnemonic. Perhaps %U? 2. It won't be clear to people reading this what the difference is between %u and this. You probably need to reword the documentation for the existing option as well as documenting the new one. 3. Please attach the patch rather than including it inline, if possible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Jan 12, 2011 at 9:23 AM, Stephen Frost <sfrost@snowman.net> wrote: > > Minor enhancement, but a valuable one imv. Hopefully there aren't any > > issues with it. :) > > 1. Why %o? That's not obviously mnemonic. Perhaps %U? r was taken? :) I'm not sure I like %U, but in the end I don't *really* care. I'll update it to %U and wait for someone else to complain. > 2. It won't be clear to people reading this what the difference is > between %u and this. You probably need to reword the documentation > for the existing option as well as documenting the new one. Fair enough. > 3. Please attach the patch rather than including it inline, if possible. Hrm, I could have sworn that Tom had asked for the exact opposite in the past, but either way is fine by me. Stephen
On Wed, Jan 12, 2011 at 10:12 AM, Stephen Frost <sfrost@snowman.net> wrote: > r was taken? :) I'm not sure I like %U, but in the end I don't *really* > care. I'll update it to %U and wait for someone else to complain. The joys of community... >> 3. Please attach the patch rather than including it inline, if possible. > > Hrm, I could have sworn that Tom had asked for the exact opposite in the > past, but either way is fine by me. Really? I don't remember that, but it's certainly possible. My problem is that cutting and pasting from a browser window into a patch file tends to be a little iffy. If you paste too much or too little or the whitespace doesn't come out quite right, the patch doesn't apply. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Greetings, * Robert Haas (robertmhaas@gmail.com) wrote: > 1. Why %o? That's not obviously mnemonic. Perhaps %U? > > 2. It won't be clear to people reading this what the difference is > between %u and this. You probably need to reword the documentation > for the existing option as well as documenting the new one. > > 3. Please attach the patch rather than including it inline, if possible. Updated patch attached- commit 7319e8ddc91d62addea25b85f7dbe2f95132cdc1 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 12 10:23:13 2011 -0500 Use %U for role in log_line_prefix; improve docs Change the variable for logging the current role in log_line_prefix from %o to %U, to better reflect the 'user'-type mnemonic. Improve the documentation for the %U and %u log_line_prefix options to better differentiate them from each other. commit 3cb707aa9f228e629e7127625a76a223751a778b Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 12 09:17:31 2011 -0500 Add support for logging the current role This adds a '%o' option to the log_line_prefix GUC which will log the current role. The '%u' option only logs the Session user, which can be misleading, but it's valuable to have both options. Thanks! Stephen
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Jan 12, 2011 at 10:12 AM, Stephen Frost <sfrost@snowman.net> wrote: >> Hrm, I could have sworn that Tom had asked for the exact opposite in the >> past, but either way is fine by me. > Really? I don't remember that, but it's certainly possible. I don't remember saying exactly that either. The main point is to ensure the patch doesn't get mangled in transmission. I've seen people screw it up both ways: inline is much more vulnerable to mailers deciding to whack whitespace around, while attachments are vulnerable to being encoded in all sorts of weird ways, some of which come out nicely in the archives and some of which don't. I'm not in favor of gzipping small patches that could perfectly well be left in readable form. This particular patch looks fine here: http://archives.postgresql.org/pgsql-hackers/2011-01/msg00845.php so I'm thinking Stephen doesn't need to revisit his technique. +1 for choosing something more mnemonic than "%o", btw. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > +1 for choosing something more mnemonic than "%o", btw. Alright, not to be *too* ridiculous about this, but I'm feeling like '%R' might be better than '%U', if we don't mind overloading a single letter based on case. I've always been annoyed at the lack of distinction between 'user' and 'role' in our docs and feel it does lead to some confusion. Updated patch attached, if people agree. Compiles, passes regressions, works as advertised, etc. commit bba27fe63702405514ed2c3bb72b70cc178f9ce1 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 12 10:38:24 2011 -0500 Change log_line_prefix for current role to %R As we're going for a mnemonic, and this is really about roles instead of users, change log_line_prefix argument to %R from %U for current_role. Thanks, Stephen
Attachment
On Wed, Jan 12, 2011 at 10:43 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> +1 for choosing something more mnemonic than "%o", btw. > > Alright, not to be *too* ridiculous about this, but I'm feeling like > '%R' might be better than '%U', if we don't mind overloading a single > letter based on case. I've always been annoyed at the lack of > distinction between 'user' and 'role' in our docs and feel it does lead > to some confusion. > > Updated patch attached, if people agree. Compiles, passes regressions, > works as advertised, etc. I was thinking that %u/%U would have the advantage of implying some connection between the two things which is in fact present. %r/%R seems not quite as good to me. Also, let's paint it tangerine. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > I was thinking that %u/%U would have the advantage of implying some > connection between the two things which is in fact present. %r/%R > seems not quite as good to me. Also, let's paint it tangerine. I figured that's where you were going. +1 for whatever the committer wants to commit. ;) Stephen
On Wed, Jan 12, 2011 at 11:00 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> I was thinking that %u/%U would have the advantage of implying some >> connection between the two things which is in fact present. %r/%R >> seems not quite as good to me. Also, let's paint it tangerine. > > I figured that's where you were going. > > +1 for whatever the committer wants to commit. ;) OK, done. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Jan 12, 2011 at 11:00 AM, Stephen Frost <sfrost@snowman.net> wrote: >> +1 for whatever the committer wants to commit. ;) > OK, done. :-) Uh, did you actually stop to *think* about this patch? What you have just committed puts a syscache lookup into the elog output path. Quite aside from the likely performance hit, this will malfunction badly in any case where we're trying to log from an aborted transaction. Please revert and rethink. regards, tom lane
On Wed, Jan 12, 2011 at 11:53 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Jan 12, 2011 at 11:00 AM, Stephen Frost <sfrost@snowman.net> wrote: >>> +1 for whatever the committer wants to commit. ;) > >> OK, done. :-) > > Uh, did you actually stop to *think* about this patch? You have a valid point here, but this isn't the most tactful way of putting it. > What you have just committed puts a syscache lookup into the elog output > path. Quite aside from the likely performance hit, this will > malfunction badly in any case where we're trying to log from an aborted > transaction. > > Please revert and rethink. I think it's going to take more than a rethink - I don't see any way to salvage it. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Uh, did you actually stop to *think* about this patch? Actually, I was worried about exactly that, but I didn't see anything at the top of elog.c that indicated if it'd be a problem or not (and the Syscache lookup issue was *exactly* what I was looking for). :( There was much discussion about recursion and memory commit and whatnot, but nothing about SysCache lookups. > What you have just committed puts a syscache lookup into the elog output > path. Quite aside from the likely performance hit, this will > malfunction badly in any case where we're trying to log from an aborted > transaction. I had been looking into storing the current role inside the Proc struct or in some new variable and then pulling it from there (updating it when needed during a SET ROLE, of course), but it seemed a bit of overkill if it wasn't necessary (which wasn't obvious to me). We could also just log the role's OID (%o anyone..?), since that doesn't need a syscache lookup to get at. I'd much rather log the role name if we can tho. I had looked through some of the other calls happening in log_line_prefix and didn't see any explicit syscache lookups but it seemed like we were doing quite a few other things that might have issues, so I had assumed it'd be alright. Sorry about that. Thanks, Stephen
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > What you have just committed puts a syscache lookup into the elog output > path. Quite aside from the likely performance hit, this will > malfunction badly in any case where we're trying to log from an aborted > transaction. Attached is my (admittedly horrible) attempt to add some comments to elog.c regarding this issue. Reviewing this, I'm not sure the performance concern is really an issue (given that the user could choose to enable it or not), but clearly the other issue is a concern. Thanks, Stephen commit 4dcf23e007967892557b7b113a9229cb9fc4575d Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 12 12:22:16 2011 -0500 Improve comments at the top of elog.c Add in some comments about how certain usually available backend systemsmay be unavailable or which won't function properly in elog.c due to the current transaction being in a failed state.
On Wed, Jan 12, 2011 at 12:13 PM, Stephen Frost <sfrost@snowman.net> wrote: >> What you have just committed puts a syscache lookup into the elog output >> path. Quite aside from the likely performance hit, this will >> malfunction badly in any case where we're trying to log from an aborted >> transaction. > > I had been looking into storing the current role inside the Proc struct > or in some new variable and then pulling it from there (updating it when > needed during a SET ROLE, of course), but it seemed a bit of overkill if > it wasn't necessary (which wasn't obvious to me). We could also just log > the role's OID (%o anyone..?), since that doesn't need a syscache lookup > to get at. I'd much rather log the role name if we can tho. Logging the OID seems to be of questionable value. I thought of the update-the-variable-when-it-changes approach too, but besides being a bit expensive if it's changing frequently, it's not necessarily safe to do the syscache lookup there either - see the comments for GetUserIdAndSecContext (which are really for SetUserIdAndSecContext, but they're in an odd place). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Jan 12, 2011 at 12:25 PM, Stephen Frost <sfrost@snowman.net> wrote: > Attached is ... I don't see an attachment, other than signature.asc. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Jan 12, 2011 at 12:25 PM, Stephen Frost <sfrost@snowman.net> wrote: > > Attached is ... > > I don't see an attachment, other than signature.asc. I suck, sorry about that, here it is.. See, inlining is better! I wouldn't have forgotten it! ;) Stephen
Attachment
* Robert Haas (robertmhaas@gmail.com) wrote: > Logging the OID seems to be of questionable value. I certainly disagree about this, not being able to figure out what's causing a 'permissions denied' error because you don't know which role the log is coming from is *very* annoying. Having to go look up the role from the OID in the log is also annoying, but less so, imv. :) > I thought of the > update-the-variable-when-it-changes approach too, but besides being a > bit expensive if it's changing frequently, it's not necessarily safe > to do the syscache lookup there either - see the comments for > GetUserIdAndSecContext (which are really for SetUserIdAndSecContext, > but they're in an odd place). Alright, here's a patch which adds the ability to log the current role's OID and which calls GetUserIdAndSecContext() directly and handles the possibility that CurrentUserId isn't valid. Perhaps we should just grab CurrentUserId directly rather than going through GetUserIdAndSecContext()? I could certainly do that instead. Also includes those additional comments in elog.c. Thanks, Stephen commit d9a7acd5ea1f5214b44875b6d257c5c59590167c Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 12 12:53:50 2011 -0500 Use GetUserIdAndSecContext to get role OID in elog We can't be sure that GetUserId() will be called when current_user is a valid Oid, per the comments in GetUserIdAndSecContext, when called from elog.c, so instead call GetUserIdAndSecContext directly and handle the possible invalid Oid ourselves. commit 605497b062298ea195d8999f8cefca10968ae22f Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 12 12:29:44 2011 -0500 Change to logging role's OID instead of name Remove the SysCache lookup from elog.c/log_line_prefix by logging the role's OID instead, this addresses a concern where a SysCache lookup could malfunction badly due to logging from a failed transaction. Note that using SysCache from the elog routines could also be a performance hit, though this would only be the case if a user chose to enable that logging.
Attachment
On Wed, Jan 12, 2011 at 12:59 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> Logging the OID seems to be of questionable value. > > I certainly disagree about this, not being able to figure out what's > causing a 'permissions denied' error because you don't know which role > the log is coming from is *very* annoying. Interesting. I wonder if we shouldn't try to fix this by including the relevant role name in the error message. Or is that just going to be too messy to live? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Jan 12, 2011 at 12:59 PM, Stephen Frost <sfrost@snowman.net> wrote: > > I certainly disagree about this, not being able to figure out what's > > causing a 'permissions denied' error because you don't know which role > > the log is coming from is *very* annoying. > > Interesting. I wonder if we shouldn't try to fix this by including > the relevant role name in the error message. Or is that just going to > be too messy to live? It might be possible to do and answer that specific question- but what about the obvious next question: which role was this command run with? iow, if I log dml, how do I know what the role was when the dml statement was run? ie- why was this command allowed? Let's ask another question- why do we provide a %u option in log_line_prefix instead of just logging it as part of each statement? When you have roles that aren't 'inherit' and have a lot of 'set role's happening, you end up asking the same questions about role that you would about user. As a side-note, CurrentUserId isn't actually exported (I'm not suprised, tbh, but I've actually checked now), so you have to go through GetUserIdAndSecContext(). Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Robert Haas (robertmhaas@gmail.com) wrote: >> Interesting. I wonder if we shouldn't try to fix this by including >> the relevant role name in the error message. Or is that just going to >> be too messy to live? > It might be possible to do and answer that specific question- but what > about the obvious next question: which role was this command run with? > iow, if I log dml, how do I know what the role was when the dml > statement was run? ie- why was this command allowed? I'm less than excited about that argument because it's after the fact --- if you needed to know the information, you probably didn't have log_line_prefix set correctly, even assuming you had adequate logging otherwise. And logging an OID just seems too ugly to live. Another little problem with the quick and dirty solution is that stuff that's important enough to warrant a log_line_prefix escape is generally thought to be important enough to warrant inclusion in CSV logs. That would imply adding a column and taking the resultant compatibility hit. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > It might be possible to do and answer that specific question- but what > > about the obvious next question: which role was this command run with? > > iow, if I log dml, how do I know what the role was when the dml > > statement was run? ie- why was this command allowed? > > I'm less than excited about that argument because it's after the fact > --- if you needed to know the information, you probably didn't have > log_line_prefix set correctly, even assuming you had adequate logging > otherwise. And logging an OID just seems too ugly to live. Erm, really? Ok, fine, maybe you didn't have log_line_prefix set correctly the first time you needed the information, but after you discover that you *don't know*, you're going to be looking for an option to let you get that information for the future. I would also suggest that more experienced admins are going to have a default log_line_prefix that they install on new systems they set up (I know I do...), and I'd be suprised if knowing the role that a command is actually run as wasn't popular among that set. I don't like logging an OID either. > Another little problem with the quick and dirty solution is that stuff > that's important enough to warrant a log_line_prefix escape is generally > thought to be important enough to warrant inclusion in CSV logs. That > would imply adding a column and taking the resultant compatibility hit. I'd be more than happy to add support for this to the CSV logs. I agree that it'd make sense to do. I think we need to solve the bigger problem of OID vs. rolename vs. lookups from elog first though. Thanks, Stephen
* Stephen Frost (sfrost@snowman.net) wrote: > Erm, really? Ok, fine, maybe you didn't have log_line_prefix set > correctly the first time you needed the information, but after you > discover that you *don't know*, you're going to be looking for an option > to let you get that information for the future. Oh, yeah, and honestly, the above is the reason I'm after this myself- I was having a difficult time figuring out which of 300-odd users a given error was happening for and was annoyed that I couldn't figure out what role it was. The web server logs in with a 'general' role that doesn't inherit anything and then has to 'set role' to whatever role the user authenticated with. After that 'set role' happens, we've got no way to know from the logs who is impacted by a given error. Guess I'm just trying to say that I didn't write this patch as an academic exercise but rather because it solves a real world problem for me. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > Guess I'm just trying to say that I didn't write this patch as an > academic exercise but rather because it solves a real world problem for > me. I understand. But doing this right is going to take more than ten lines of code, and more than a negligible performance penalty. We have to consider whether it's worth it. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > I understand. But doing this right is going to take more than ten lines > of code, and more than a negligible performance penalty. We have to > consider whether it's worth it. It'd be ideal if the performance hit could only be felt by people who want to enable the option. On the flip side, I don't know that adding a bit of extra work to SET ROLE would be that bad. If it helps (and I don't know if it does, I'm still trying to wrap my head around GetUserIdAndSecContext/SetUserIdAndSecContext), I'd be fine with *not* trying to log the right role when inside Security Definter functions (after all, if those are getting called, the user could go look at the function definition to see which role it's being run as). I gather one issue is how we can pick up what the correct role name is when resetting the role due to a failed transaction..? Building a stack with all the role names pre-cached to deal with that wouldn't be likely to work and we'd need more than one level to deal with savepoints, I assume? We could reset it to an Invalid name on abort and then detect that it needs to be corrected at the start of the next transaction, perhaps? Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> I understand. But doing this right is going to take more than ten lines >> of code, and more than a negligible performance penalty. We have to >> consider whether it's worth it. > It'd be ideal if the performance hit could only be felt by people who > want to enable the option. On the flip side, I don't know that adding a > bit of extra work to SET ROLE would be that bad. If it helps (and I > don't know if it does, I'm still trying to wrap my head around > GetUserIdAndSecContext/SetUserIdAndSecContext), I'd be fine with *not* > trying to log the right role when inside Security Definter functions > (after all, if those are getting called, the user could go look at the > function definition to see which role it's being run as). > I gather one issue is how we can pick up what the correct role name is > when resetting the role due to a failed transaction..? Building a stack > with all the role names pre-cached to deal with that wouldn't be likely > to work and we'd need more than one level to deal with savepoints, I > assume? We could reset it to an Invalid name on abort and then detect > that it needs to be corrected at the start of the next transaction, > perhaps? I seem to recall that the assign hook for role stores the string form of the role name anyway. So in principle you could arrange for that to get dumped someplace where elog.c could look at it (think about just adding a string parameter to SetUserIdAndSecContext). It wouldn't track the effects of RENAME ROLE against an actively-used role, but then again neither does %u. I'm not actually concerned about adding a few extra cycles to SET ROLE for this. What bothered me more was the cost of adding another output column to CSV log mode. That's not something you're going to be able to finesse such that only people who care pay the cost. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Another little problem with the quick and dirty solution is that stuff > that's important enough to warrant a log_line_prefix escape is generally > thought to be important enough to warrant inclusion in CSV logs. That > would imply adding a column and taking the resultant compatibility hit. Well if we're down to adding columns to the CSV format, what about adding an explicit column where to output the query duration as an interval literal, rather than putting it in the query string (IIRC)? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 01/12/2011 08:59 PM, Tom Lane wrote: > > I'm not actually concerned about adding a few extra cycles to SET ROLE > for this. What bothered me more was the cost of adding another output > column to CSV log mode. That's not something you're going to be able to > finesse such that only people who care pay the cost. > > I think it's time to revisit the design of CSV logs again, now we have two or three releases worth of experience with it. It needs some flexibility and refinement. cheers andrew
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > I seem to recall that the assign hook for role stores the string form of > the role name anyway. Indeed it does, and it's already exposed through show_role() since it's needed in guc.c. Based on my review and understanding of the comments and calls, it also doesn't do anything particularly complicated or any syscache searches or anything. > It wouldn't track the > effects of RENAME ROLE against an actively-used role, but then again > neither does %u. Right, I didn't specifically point that out in the documentation changes, but I can if people feel it's neceessary. > What bothered me more was the cost of adding another output > column to CSV log mode. That's not something you're going to be able to > finesse such that only people who care pay the cost. I definitely feel this is something that we should be logging in the CSV also, and you're right, there doesn't appear to be a way to do that without just outright changing the format and causing people to have to update anything/everything that uses it. I have a hard time with the idea that we'll commit to never changing that format though, so do we want to provide a way for users to specify the format (ala log_line_prefix), or just ask users to expect and deal with format changes when they happen..? I noticed Dimitri would like another change to the CSV log format (which looked reasonable to me, asking to have something split out from the query string itself), it'd certainly be better to change both in the same release than split them across two (of course, we might come up with something else in the future...). I have to admit to being a bit suprised the CSV logging wasn't implemented with a 'format' type option. I'm not sure if I have the cycles or even if we would want to try and add that now, but it strikes me as something we should probably do. Updated patch attached, included new comments for elog.c too. Thanks, Stephen commit 4e27ab79ef9b0d0c3c9824d672e06160dd227cc2 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 12 12:22:16 2011 -0500 Improve comments at the top of elog.c Add in some comments about how certain usually available backend systems may be unavailable or which won't function properly in elog.c due to the current transaction being in a failed state. commit d3ca4063ba8e16930278947c32c336b5b80cdaba Author: Stephen Frost <sfrost@snowman.net> Date: Fri Jan 14 11:19:45 2011 -0500 Add %U option to log_line_prefix for current role This adds a new option to log_line_prefix (%U) to allow the current role to be logged, which is valuable information when an application or user is using SET ROLE and roles which are set 'noinherit'. This also changes the current definition of %u to be 'Session user', to avoid confusion when a superuser uses 'SET SESSION AUTHORIZATION'. Otherwise, a log might read 'login_user none' but actually be running as a different user due to SET SESSION AUTHORIZATION. The 'username' field for CSV logging was also updated to be 'Session user'. Note: SET SESSION AUTHORIZATION is only allowed for superusers, and the logged username will only change if SET SESSION AUTHORIZATION is called, so this is unlikely to have signifigant user impact. Last, but certainly not least, role_name was added as a new column to the CSV log output and corresponding example CSV table definition. This is a user-visible change which should be called out in the release notes.
Attachment
Andrew Dunstan <andrew@dunslane.net> writes: > On 01/12/2011 08:59 PM, Tom Lane wrote: >> I'm not actually concerned about adding a few extra cycles to SET ROLE >> for this. What bothered me more was the cost of adding another output >> column to CSV log mode. That's not something you're going to be able to >> finesse such that only people who care pay the cost. > I think it's time to revisit the design of CSV logs again, now we have > two or three releases worth of experience with it. It needs some > flexibility and refinement. It would definitely be nice to support optional columns a little better. I'm not even sure whether the runtime overhead is worth worrying about (maybe it is, maybe it isn't, I have no data). But I do know that adding a column to the CSV output format spec causes a flag day for users. How can we avoid that? regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > I think it's time to revisit the design of CSV logs again, now we have > > two or three releases worth of experience with it. It needs some > > flexibility and refinement. > > It would definitely be nice to support optional columns a little better. > I'm not even sure whether the runtime overhead is worth worrying about > (maybe it is, maybe it isn't, I have no data). But I do know that > adding a column to the CSV output format spec causes a flag day for > users. How can we avoid that? My first thought would be to have a 'log_csv_format' GUC that's very similar to 'log_line_prefix' (and uses the same variables if possible..). We could then ship a default in postgresql.conf that matches what the current format is while adding the other options if people want to use them. If we could have all the processing to generate that line go through the same function for log_line_prefix and log_csv_format, that'd be even better. Makes me tempted to throw out the current notion of 'log_line_*prefix*' and replace it with 'log_line_*format*' to match exactly the 'log_csv_format' that I'm proposing. That'd undoubtably cause more user headaches tho... :( Thanks, Stephen
On 01/14/2011 11:48 AM, Stephen Frost wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Andrew Dunstan<andrew@dunslane.net> writes: >>> I think it's time to revisit the design of CSV logs again, now we have >>> two or three releases worth of experience with it. It needs some >>> flexibility and refinement. >> It would definitely be nice to support optional columns a little better. >> I'm not even sure whether the runtime overhead is worth worrying about >> (maybe it is, maybe it isn't, I have no data). But I do know that >> adding a column to the CSV output format spec causes a flag day for >> users. How can we avoid that? > My first thought would be to have a 'log_csv_format' GUC that's very > similar to 'log_line_prefix' (and uses the same variables if > possible..). We could then ship a default in postgresql.conf that > matches what the current format is while adding the other options if > people want to use them. > > If we could have all the processing to generate that line go through the > same function for log_line_prefix and log_csv_format, that'd be even > better. Makes me tempted to throw out the current notion of > 'log_line_*prefix*' and replace it with 'log_line_*format*' to match > exactly the 'log_csv_format' that I'm proposing. That'd undoubtably > cause more user headaches tho... :( > > I'm not sure I really want to make it that flexible :-) To deal with the issue Tom's referring to, I think it would be sufficient if we just allowed users to suppress production of certain columns (as long as we never do anything so evil as to add a new column in the middle). There are some other issues with the format. I know Josh has bitched about the presence of command tags in certain fields, for example. cheers andrew
On Fri, Jan 14, 2011 at 4:56 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > I'm not sure I really want to make it that flexible :-) > > To deal with the issue Tom's referring to, I think it would be sufficient if > we just allowed users to suppress production of certain columns (as long as > we never do anything so evil as to add a new column in the middle). > > There are some other issues with the format. I know Josh has bitched about > the presence of command tags in certain fields, for example. If there is going to be any change, how about using fixed columns (an possibly allowing them to be empty for stuff that's expensive to create/write), but adding a 1st column that contains a "version" identifyer. And to make it easy, maybe the PG major version as the version value. If the 1st column is always the version, tools can easily know if they understand all the columns (and what order they are in) and it' easy to write a "conversion" that strips/re-aranges columns from a newer CVS dump to match an older one if you have tools that don't know about newer column layouts.. Personally, I'm not worried about the CSV logs being backwards compatible as long as there's a very easy way to know what I might be looking at, so conversion is easy... But then again, I don't have multiple gigabytes of logs to process either. a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Andrew Dunstan <andrew@dunslane.net> writes: > On 01/14/2011 11:48 AM, Stephen Frost wrote: >> My first thought would be to have a 'log_csv_format' GUC that's very >> similar to 'log_line_prefix' (and uses the same variables if >> possible..). We could then ship a default in postgresql.conf that >> matches what the current format is while adding the other options if >> people want to use them. > I'm not sure I really want to make it that flexible :-) It actually sounded like a pretty good idea to me. The current CSV format is already overly bulky/verbose, because it includes absolutely everything anybody ever wanted before now. Allowing people to select what they actually need, and thereby get rid of some of the overhead they're currently paying, would be a good thing. regards, tom lane
On 01/14/2011 05:04 PM, Aidan Van Dyk wrote: > If there is going to be any change, how about using fixed columns (an > possibly allowing them to be empty for stuff that's expensive to > create/write), but adding a 1st column that contains a "version" > identifyer. And to make it easy, maybe the PG major version as the > version value. > > If the 1st column is always the version, tools can easily know if > they understand all the columns (and what order they are in) and it' > easy to write a "conversion" that strips/re-aranges columns from a > newer CVS dump to match an older one if you have tools that don't know > about newer column layouts.. > > The whole point of having CSV logs is so you can load them into a database table without needing preprocessing tools. So I'm not going to be very receptive to changes that are predicated on using such tools. cheers andrew
Aidan Van Dyk <aidan@highrise.ca> writes: > If there is going to be any change, how about using fixed columns (an > possibly allowing them to be empty for stuff that's expensive to > create/write), but adding a 1st column that contains a "version" > identifyer. And to make it easy, maybe the PG major version as the > version value. Seems like that just adds even more overhead, without really solving any of the problems we're concerned about. Code consuming the CSV log would still need a-priori knowledge of what columns to expect. regards, tom lane
On 01/14/2011 05:08 PM, Tom Lane wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >> On 01/14/2011 11:48 AM, Stephen Frost wrote: >>> My first thought would be to have a 'log_csv_format' GUC that's very >>> similar to 'log_line_prefix' (and uses the same variables if >>> possible..). We could then ship a default in postgresql.conf that >>> matches what the current format is while adding the other options if >>> people want to use them. >> I'm not sure I really want to make it that flexible :-) > It actually sounded like a pretty good idea to me. The current CSV > format is already overly bulky/verbose, because it includes absolutely > everything anybody ever wanted before now. Allowing people to select > what they actually need, and thereby get rid of some of the overhead > they're currently paying, would be a good thing. > > If you have a format string, what do you want to do with the bits of the format that aren't field references? What about delimiters? A format string makes it too easy to muck up and too hard to get right, IMNSHO. History has shown how easy it is to muck up CSVs. The suggestion I made of allowing people to suppress production of certain columns would take care of the bulk problem much more safely, I think. We've actually had remarkably few issues with CSV logs not being loadable, that I know of anyway. When we implemented it, I expected many more issues with it than we've had. I'd like to keep it that way. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 01/14/2011 05:08 PM, Tom Lane wrote: >> It actually sounded like a pretty good idea to me. > If you have a format string, what do you want to do with the bits of the > format that aren't field references? I was thinking of it as being strictly a field list. I don't know whether it's really practical to borrow log_line_prefix's one-character names for the fields (for one thing, there would need to be names for all the existing CSV columns, not all of which equate to log_line_prefix escapes); but in any case anything other than field references would be disallowed. If you prefer to use a name list as the syntax that's fine with me. regards, tom lane
On Fri, Jan 14, 2011 at 8:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 01/14/2011 05:08 PM, Tom Lane wrote: >>> It actually sounded like a pretty good idea to me. > >> If you have a format string, what do you want to do with the bits of the >> format that aren't field references? > > I was thinking of it as being strictly a field list. I don't know > whether it's really practical to borrow log_line_prefix's one-character > names for the fields (for one thing, there would need to be names for > all the existing CSV columns, not all of which equate to log_line_prefix > escapes); but in any case anything other than field references would be > disallowed. If you prefer to use a name list as the syntax that's fine > with me. I think we're in the process of designing a manned mission to Mars to solve the problem that our shoelaces are untied. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > If you have a format string, what do you want to do with the bits of the > > format that aren't field references? > > I was thinking of it as being strictly a field list. I don't know > whether it's really practical to borrow log_line_prefix's one-character > names for the fields (for one thing, there would need to be names for > all the existing CSV columns, not all of which equate to log_line_prefix > escapes); I'm not really happy about the idea that you can only get certain information in a log file if you use CSV format. I also don't know that there's really any particular reason log_line_prefix's names have to be one character. > but in any case anything other than field references would be > disallowed. If you prefer to use a name list as the syntax that's fine > with me. I do like the idea of having just a field list though, to keep things simple for the CSV users, and we could also pre-process the list into flag variables or a bitmask or similar to be able to quickly check if a certain field should be included or not. I'm not really keen about how log_line_prefix currently parses the direct user-provided syntax every time; strikes me as inefficient. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> I was thinking of it as being strictly a field list. I don't know >> whether it's really practical to borrow log_line_prefix's one-character >> names for the fields (for one thing, there would need to be names for >> all the existing CSV columns, not all of which equate to log_line_prefix >> escapes); > I'm not really happy about the idea that you can only get certain > information in a log file if you use CSV format. I said no such thing! The point here is that there is a great deal of stuff in the textual log format that is not governed by log_line_prefix, so log_line_prefix provides no precedent for naming it: the error level, the SQLSTATE, the primary message, the detail, the hint, etc, all come out without any connection to log_line_prefix. In CSV all of those already exist as columns. If we want users to be able to control which CSV columns get emitted, they'll need to be able to name those columns, and log_line_prefix doesn't provide any precedent for that. > I also don't know > that there's really any particular reason log_line_prefix's names > have to be one character. Well, it's pretty much conventional for %-escapes to be that way, though I agree we're kind of straining the system already. > I do like the idea of having just a field list though, to keep things > simple for the CSV users, and we could also pre-process the list into > flag variables or a bitmask or similar to be able to quickly check if a > certain field should be included or not. I'm not really keen about how > log_line_prefix currently parses the direct user-provided syntax every > time; strikes me as inefficient. For log_line_prefix I'm not sure you could do a lot better. I agree that a field name list for CSV would have to be preprocessed somehow for efficiency. regards, tom lane
On 01/14/2011 08:41 PM, Robert Haas wrote: > > I think we're in the process of designing a manned mission to Mars to > solve the problem that our shoelaces are untied. > What's your suggestion, then? cheers andre
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jan 14, 2011 at 8:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I was thinking of it as being strictly a field list. > I think we're in the process of designing a manned mission to Mars to > solve the problem that our shoelaces are untied. How so? ISTM the problems at hand are (a) we can't add a new CSV column without causing a flag day for users who may not even care about the information, and (b) we're worried that emitting all these columns may result in a performance hit, again for information that a particular user may not need. A user-settable column list seems pretty on-target for solving those problems to me. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > A user-settable column list seems pretty on-target > for solving those problems to me. I'm looking into implementing this. An interesting initial question is- should the users be able to control the *order* of the columns? My gut feeling, if we're giving them a GUC that's a list of fields, is 'yes', but I'm happy to listen to other thoughts. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> A user-settable column list seems pretty on-target >> for solving those problems to me. > I'm looking into implementing this. > An interesting initial question is- should the users be able to control > the *order* of the columns? My gut feeling, if we're giving them a GUC > that's a list of fields, is 'yes', but I'm happy to listen to other > thoughts. Yeah, I was just thinking about that in connection with the suggestion of using a bitmap as the pre-parsed representation (which would more or less force adoption of the fixed-column-order approach). I really think we can't get away with that. Remember what Andrew pointed out upthread: it's important to be able to load the csvlog output directly into a table without any extra processing. Suppose a DBA is logging columns A,B,D and he later realizes that logging C would be a good thing too. He's going to have to ALTER TABLE ADD COLUMN to add C to his logging table ... and now it's at the end. This is no problem if he can set the GUC to be "A,B,D,C" and have the field order be honored. Otherwise he's got a problem. In any case, if the GUC representation is a list of field names, I think the POLA demands that the system honor the list order. You could escape that expectation by controlling the feature with a pile of booleans (csv_log_pid = on, csv_log_timestamp = off, etc) but I can't say that that sort of API appeals to me. BTW, in case you didn't know, there are some GUCs defined as lists of identifiers already (look for GUC_LIST bits). Be sure to steal code. regards, tom lane
On 01/14/2011 09:51 PM, Tom Lane wrote: > Stephen Frost<sfrost@snowman.net> writes: >> * Tom Lane (tgl@sss.pgh.pa.us) wrote: >>> A user-settable column list seems pretty on-target >>> for solving those problems to me. >> I'm looking into implementing this. >> An interesting initial question is- should the users be able to control >> the *order* of the columns? My gut feeling, if we're giving them a GUC >> that's a list of fields, is 'yes', but I'm happy to listen to other >> thoughts. > Yeah, I was just thinking about that in connection with the suggestion > of using a bitmap as the pre-parsed representation (which would more or > less force adoption of the fixed-column-order approach). I really think > we can't get away with that. Remember what Andrew pointed out upthread: > it's important to be able to load the csvlog output directly into a > table without any extra processing. Suppose a DBA is logging columns > A,B,D and he later realizes that logging C would be a good thing too. > He's going to have to ALTER TABLE ADD COLUMN to add C to his logging > table ... and now it's at the end. This is no problem if he can set the > GUC to be "A,B,D,C" and have the field order be honored. Otherwise he's > got a problem. Ok, you sold me. Until I read this I was inclined to say not, on KISS principles. The only thing I'd suggest extra is that we might allow "version_n_m" as shorthand for the default table layout from the relevant version. cheers andrew
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > In any case, if the GUC representation is a list of field names, I think > the POLA demands that the system honor the list order. Agreed. That puts us back into the question of how to make it efficient. My best thought at the moment, which doesn't strike me as particularly efficient, is to build an array of the columns as enum's and then have loop through the array and use a switch() on the enum. At least it's all integer-based there then and we're not calling strcmp() for every field or strchr to find the next field, but couldn't we do better? > BTW, in case you didn't know, there are some GUCs defined as lists of > identifiers already (look for GUC_LIST bits). Be sure to steal code. No, I didn't.. Excellent. Thanks! Stephen
* Andrew Dunstan (andrew@dunslane.net) wrote: > The only thing I'd suggest extra is that we might allow > "version_n_m" as shorthand for the default table layout from the > relevant version. I like that idea, makes the default a lot simpler to deal with too. :) Thanks! Stephen
Andrew Dunstan <andrew@dunslane.net> writes: > The only thing I'd suggest extra is that we might allow "version_n_m" as > shorthand for the default table layout from the relevant version. Mmm ... seems like that just complicates matters. To make that useful, you have to assume that there *is* a default table layout, it's different across versions, and everything that looks at this GUC value will know instantly what it is for each version. The last bit is kind of a killer for tools like pgfouine, no? In any case I thought the expectation here was that the default column list would be frozen at what it is now, and probably will never change. regards, tom lane
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> In any case, if the GUC representation is a list of field names, I think >> the POLA demands that the system honor the list order. > Agreed. That puts us back into the question of how to make it > efficient. My best thought at the moment, which doesn't strike me as > particularly efficient, is to build an array of the columns as enum's > and then have loop through the array and use a switch() on the enum. Yeah, an array or list of integer codes was what I was thinking too. > At least it's all integer-based there then and we're not calling > strcmp() for every field or strchr to find the next field, but > couldn't we do better? I really doubt that the cycles spent in the loop + switch are going to amount to anything at all, compared to the cycles involved in formatting each field and then pushing it through the CSV logic. Not to mention the I/O costs of sending the string somewhere afterwards. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > everything that looks at this GUC value > will know instantly what it is for each version. > The last bit is kind of a killer for tools like pgfouine, no? Ugh.. Could we just accept it as input but return the full list if asked for it> > In any case I thought the > expectation here was that the default column list would be frozen at > what it is now, and probably will never change. This I don't like.. When I install a new version fresh, I like to get all of the "bells & whistles" that go along with it, which, in my view, would include new fields that the smart PG folks have decided might be useful for me. I'd like to provide a way for users who are upgrading to be able to get the old behavior back, to minimize the trouble for them, and being able to say "just change the version_9_1 to version_9_0 in your log_csv_fields GUC" is a heck of a lot better than "well, rip out the default and replace it with this huge list of fields". I'd been puzzling over how to deal with this big list of fields in postgresql.conf and I do like the idea of some kind of short-cut being provided to ease the pain for users. What about something other than version_x_y? I could maybe see having a 'default' and an 'all' instead.. Then have the default be what we have currently and 'all' be the full list I'm thinking about. Thanks, Stephen
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Yeah, an array or list of integer codes was what I was thinking too. Hm, yeah, a list of integer codes might be even better/simpler. Okay, next user-interface question- thoughts on handling SIGHUP? My first reaction is that we should create a new log file on SIGHUP (if we don't already, havn't checked), or maybe just on SIGHUP if this variable changes. Point being, until we get Andrew's jagged-csv-import magic committed to core, we won't be able to import a log file that a user has changed the field list for mid-stream (following the add-a-new-column use-case we've been discussing). Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > I'd been puzzling over how to deal with this big list of fields in > postgresql.conf and I do like the idea of some kind of short-cut being > provided to ease the pain for users. Yeah, I agree with the worry that a default value that's a mile long is going to be a bit of a PITA. But I don't think we're there yet on having a better solution. > What about something other than > version_x_y? I could maybe see having a 'default' and an 'all' > instead.. Then have the default be what we have currently and 'all' be > the full list I'm thinking about. If "default" always means what it means today, I can live with that. But if the meaning of "all" changes from version to version, that seems like a royal mess. Again, I'm concerned that an external tool like pgfouine be able to make sense of the value without too much context. If it doesn't know what some of the columns are, it can just ignore them --- but a magic summary identifier that it doesn't understand at all is a problem. regards, tom lane
Stephen Frost <sfrost@snowman.net> writes: > Okay, next user-interface question- thoughts on handling SIGHUP? My > first reaction is that we should create a new log file on SIGHUP (if we > don't already, havn't checked), or maybe just on SIGHUP if this variable > changes. > Point being, until we get Andrew's jagged-csv-import magic committed to > core, we won't be able to import a log file that a user has changed the > field list for mid-stream (following the add-a-new-column use-case we've > been discussing). Now I think you're reaching the mission-to-mars stage that Robert was complaining about. Solving that sort of problem is well outside the scope of this patch. I don't care if people have to shut down and restart their servers in order to make a change to the log format. Even if I did, the other patch sounds like a better approach. regards, tom lane
On Fri, Jan 14, 2011 at 9:24 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > On 01/14/2011 08:41 PM, Robert Haas wrote: >> I think we're in the process of designing a manned mission to Mars to >> solve the problem that our shoelaces are untied. > What's your suggestion, then? If there's a practical way to add the requested escape, add it to the text format and leave reengineering the CSV format for another day. Yeah, I know that's not the most beautiful solution in the world, but we're doing engineering here, not theology. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jan 14, 2011 at 9:24 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >> What's your suggestion, then? > If there's a practical way to add the requested escape, add it to the > text format and leave reengineering the CSV format for another day. > Yeah, I know that's not the most beautiful solution in the world, but > we're doing engineering here, not theology. Well, the original patch was exactly that. But I don't agree with that approach; I think allowing the capabilities of text and CSV logs to diverge significantly would be a mistake. If a piece of information is valuable enough to need a way to include it in textual log entries, then you need a way to include it in CSV log entries too. If it's not valuable enough to do the work to support it in CSV, then we can live without it. regards, tom lane
On 01/15/2011 11:08 AM, Tom Lane wrote: > Robert Haas<robertmhaas@gmail.com> writes: >> On Fri, Jan 14, 2011 at 9:24 PM, Andrew Dunstan<andrew@dunslane.net> wrote: >>> What's your suggestion, then? >> If there's a practical way to add the requested escape, add it to the >> text format and leave reengineering the CSV format for another day. >> Yeah, I know that's not the most beautiful solution in the world, but >> we're doing engineering here, not theology. > Well, the original patch was exactly that. But I don't agree with that > approach; I think allowing the capabilities of text and CSV logs to > diverge significantly would be a mistake. If a piece of information is > valuable enough to need a way to include it in textual log entries, > then you need a way to include it in CSV log entries too. If it's not > valuable enough to do the work to support it in CSV, then we can live > without it. > > Yeah, I agree, that's exactly the kind of divergence we usually try to avoid. And it's hardly theology to say let's not do a half-assed job on this. cheers andrew
Excerpts from Tom Lane's message of sáb ene 15 00:34:40 -0300 2011: > Stephen Frost <sfrost@snowman.net> writes: > > What about something other than > > version_x_y? I could maybe see having a 'default' and an 'all' > > instead.. Then have the default be what we have currently and 'all' be > > the full list I'm thinking about. > > If "default" always means what it means today, I can live with that. > But if the meaning of "all" changes from version to version, that seems > like a royal mess. Again, I'm concerned that an external tool like > pgfouine be able to make sense of the value without too much context. > If it doesn't know what some of the columns are, it can just ignore them > --- but a magic summary identifier that it doesn't understand at all is > a problem. Maybe if we offered a way for the utility to find out the field list from the magic identifier, it would be enough. (It would be neat to have magic identifiers for "terse", "verbose", etc, that mimicked the behavior of client processing) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 01/17/2011 11:44 AM, Alvaro Herrera wrote: > Excerpts from Tom Lane's message of sáb ene 15 00:34:40 -0300 2011: >> Stephen Frost<sfrost@snowman.net> writes: >>> What about something other than >>> version_x_y? I could maybe see having a 'default' and an 'all' >>> instead.. Then have the default be what we have currently and 'all' be >>> the full list I'm thinking about. >> If "default" always means what it means today, I can live with that. >> But if the meaning of "all" changes from version to version, that seems >> like a royal mess. Again, I'm concerned that an external tool like >> pgfouine be able to make sense of the value without too much context. >> If it doesn't know what some of the columns are, it can just ignore them >> --- but a magic summary identifier that it doesn't understand at all is >> a problem. > Maybe if we offered a way for the utility to find out the field list > from the magic identifier, it would be enough. > > (It would be neat to have magic identifiers for "terse", "verbose", > etc, that mimicked the behavior of client processing) > Just output a header line with the column names. We've long been able to import such files. If the list of columns changes we should rotate log files before outputting the new format. That might get a little tricky to coordinate between backends. cheers andrew
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > Point being, until we get Andrew's jagged-csv-import magic committed to > > core, we won't be able to import a log file that a user has changed the > > field list for mid-stream (following the add-a-new-column use-case we've > > been discussing). > > Now I think you're reaching the mission-to-mars stage that Robert was > complaining about. Solving that sort of problem is well outside the > scope of this patch. I don't care if people have to shut down and > restart their servers in order to make a change to the log format. > Even if I did, the other patch sounds like a better approach. Alright, here's the latest on this patch. I've added a log_csv_fields GUC along with the associated magic to make it work (at least for me). Also added 'role_name' and '%U' options. Requires postmaster restart to change, didn't include any 'short-cut' field options, though I don't think it'd be hard to do if we can decide on it. Default remains the same as what was in 9.0. commit ff249aeac7216da623bf77840380d5e767f681fc Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 00:26:52 2011 -0500 Add log_csv_fields GUC for CSV output & curr_role This patch adds a new GUC called 'log_csv_fields'. This GUC allows the user to control the set of fields written to the CSV output as well as the order in which they are written. The default set of fields remains those that were included in 9.0, to avoid breaking existing user configurations. In passing, update 'user_name' for log_line_prefix and log_csv_fields to mean 'session user' (which could be reset by a superuser with set session authorization), and add a 'role_name' option (%U) to log_line_prefix and log_csv_fields, to allow users to log the current role (as set by SET ROLE- not impacted by SECURITY DEFINER functions). Thanks, Stephen
Attachment
On Wed, Jan 19, 2011 at 14:36, Stephen Frost <sfrost@snowman.net> wrote: > Alright, here's the latest on this patch. I've added a log_csv_fields > GUC along with the associated magic to make it work (at least for me). > Also added 'role_name' and '%U' options. Requires postmaster restart to > change, didn't include any 'short-cut' field options, though I don't > think it'd be hard to do if we can decide on it. Default remains the > same as what was in 9.0. Hi, I reviewed log_csv_options.patch. It roughly looks good, but I'd like to discuss about the design in some points. ==== Features ==== The patch adds "log_csv_fields" GUC variable. It allows to customize columns in csvlog. The default setting is what we were writing 9.0 or earlier versions. It also add "role_name" for log_csv_fields and "%U" for log_line_prefix to record role names. They are the name set by SET ROLE. OTOH, user_name and %u shows authorization user names. ==== Discussions ==== * How about "csvlog_fields" rather than "log_csv_fields"? Since we have variables with syslog_ prefix, csvlog_ prefix seemsto be better. * We use %<what> syntax to specify fields in logs for log_line_prefix, but will use long field names for log_csv_fields.Do you have any better idea to share the names in both options? At least I want to share the short descriptionfor them in postgresql.conf. * log_csv_fields's GUC context is PGC_POSTMASTER. Is it by design? PGC_SIGHUP would be more consistent compared with log_line_prefix.However, the csv format will not be valid because the column definitions will be changed in the middle offile. * "none" is not so useful for the initial "role_name" field. Should we use user_name instead of the empty role_name? ==== Comments for codes ==== Some of the items are trivial, though. * What objects do you want to allocate in TopMemoryContext in assign_log_csv_fields() ? AFAICS, we don't have to keep rawstringand column_list in long-term context. Or, if you need TopMemoryContext, those variables should be pfree'ed at theend of function. * appendStringInfoChar() calls in write_csvlog() seem to be wrong when the last field is not application_name. * Docs need more cross-reference hyper-links for "see also" items. * Docs need some tags for itemized elements or pre-formatted codes. They looks itemized in the sgml files, but will be flattenedin complied HTML files. * A declaration of assign_log_csv_fields() at the top of elog.c needs "extern". * There is a duplicated declaration for build_default_csvlog_list(). * list_free() is NIL-safe. You don't have to check whether the list is NIL before call the function. -- Itagaki Takahiro
Itagaki, * Itagaki Takahiro (itagaki.takahiro@gmail.com) wrote: > ==== Discussions ==== > * How about "csvlog_fields" rather than "log_csv_fields"? > Since we have variables with syslog_ prefix, csvlog_ prefix > seems to be better. Sure, not a big deal, to be honest, as long as we can actually agree on something... Not changed in the patch, but I can if people want. > * We use %<what> syntax to specify fields in logs for log_line_prefix, > but will use long field names for log_csv_fields. Do you have any > better idea to share the names in both options? At least I want to > share the short description for them in postgresql.conf. No, I don't, and that's going well beyond what I feel makes sense for this patch at this time. We could review that for 9.2 or later, but we've had quite enough expanding-of-requirements for this patch already, imnsho. > * log_csv_fields's GUC context is PGC_POSTMASTER. Is it by design? > PGC_SIGHUP would be more consistent compared with log_line_prefix. > However, the csv format will not be valid because the column > definitions will be changed in the middle of file. Doing SIGHUP would require addressing how to get all of the backends to close the old log file and open the new one, because we don't want to have a given log file which has two different CSV formats in it (we wouldn't be able to load it into the database...). This was specifically addressed in the thread leading up to this patch... > * "none" is not so useful for the initial "role_name" field. > Should we use user_name instead of the empty role_name? none is what it is, however, if you query 'show role'. I would rather be consistant with that than log something else. > ==== Comments for codes ==== > Some of the items are trivial, though. > > * What objects do you want to allocate in TopMemoryContext in > assign_log_csv_fields() ? AFAICS, we don't have to keep rawstring > and column_list in long-term context. Or, if you need TopMemoryContext, > those variables should be pfree'ed at the end of function. You're right, rawstring and column_list don't need to be in TopMemoryContext. I just moved the switch to Top to be after those are allocated. > * appendStringInfoChar() calls in write_csvlog() seem to be wrong > when the last field is not application_name. Urgh, right, fixed to *not* include a trailing comma on the last column. > * Docs need more cross-reference hyper-links for "see also" items. > > * Docs need some tags for itemized elements or pre-formatted codes. > They looks itemized in the sgml files, but will be flattened in > complied HTML files. Not sure what you're referring to here...? Can you elaborate? I'm not great with the docs. :/ > * A declaration of assign_log_csv_fields() at the top of elog.c > needs "extern". Err, no, I don't think it does. None of the other exported functions from elog.c have extern declarations in elog.c... I did realize that I probably shouldnt have the declaration at the top of elog.c for assign_loc_csv_fields() or build_default_csvlog_list(). > * There is a duplicated declaration for build_default_csvlog_list(). Removed the duplicate in elog.c. > * list_free() is NIL-safe. You don't have to check whether the list > is NIL before call the function. Fixed. Updated patch attached. Thanks! Stephen
Attachment
> Updated patch attached. I think we need to improve postgresql.conf.sample a bit more, especially the long line for #log_csv_fields = '...'. 330 characters in it! #1. Leave the long line because it is needed. #2. Hide thevariable from the default conf. #3. Use short %x mnemonic both in log_line_prefix and log_csv_fields. (It might requiremany additional mnemonics.) Which is better, or another idea? On Sat, Jan 29, 2011 at 13:06, Stephen Frost <sfrost@snowman.net> wrote: >> * log_csv_fields's GUC context is PGC_POSTMASTER. Is it by design? > > Doing SIGHUP would require addressing how to get all of the backends to > close the old log file and open the new one, because we don't want to > have a given log file which has two different CSV formats in it (we > wouldn't be able to load it into the database...). This was > specifically addressed in the thread leading up to this patch... I think it depends default log filename, that contains %S (seconds) suffix. We can remove %S from log_filename; if we use a log per-day, those log might contain different columns even after restart. If we cannot avoid zigged csv fields completely, SIGHUP seems reasonable for it. >> * What objects do you want to allocate in TopMemoryContext in >> assign_log_csv_fields() ? > I just moved the switch to Top to be after those are allocated. How about changing the type of csv_log_fields from List* to fixed array of LogCSVFields? If so, we can use an array-initializer instead of build_default_csvlog_list() ? The code will be simplified. Fixed length won't be a problem because it would be rare that the same field are specified many times. >> * Docs need some tags for itemized elements or pre-formatted codes. >> They looks itemized in the sgml files, but will be flattened in >> complied HTML files. > > Not sure what you're referring to here...? Can you elaborate? I'm not > great with the docs. :/ Could you try to "make html" in the doc directory? Your new decumentation after | These columns may be included in the CSV output: will be unaligned plain text without some tags. -- Itagaki Takahiro
* Itagaki Takahiro (itagaki.takahiro@gmail.com) wrote: > I think we need to improve postgresql.conf.sample a bit more, especially > the long line for #log_csv_fields = '...'. 330 characters in it! > #1. Leave the long line because it is needed. It's needed to match what the current default is.. > #2. Hide the variable from the default conf. I don't like that idea. > #3. Use short %x mnemonic both in log_line_prefix and log_csv_fields. > (It might require many additional mnemonics.) It would require a lot more, and wouldn't scale well either (this was discussed previously..). > I think it depends default log filename, that contains %S (seconds) > suffix. We can remove %S from log_filename; if we use a log per-day, > those log might contain different columns even after restart. If we > cannot avoid zigged csv fields completely, SIGHUP seems reasonable for it. This problem is bigger than SIGHUP, but at least with a restart required, the default will work properly. The default configuration wouldn't work w/ a change to the log line and a SIGHUP done. If you change the log filename then it could generate jagged CSV files even on restart. We'd have to move the old log file out of the way when/if we detect that it had a different CSV format and that's really just not practical. We don't want to cause problems for people, but I don't think there's a way to prevent jagged CSVs if they're going to go out of thier way to configure PG to create them. > How about changing the type of csv_log_fields from List* to fixed > array of LogCSVFields? If so, we can use an array-initializer > instead of build_default_csvlog_list() ? The code will be simplified. > Fixed length won't be a problem because it would be rare that the > same field are specified many times. I really don't think changing it to an array is necessary or even particularly helpful, and I don't agree that the code would actually be simpler- you still have to make sure the CSV fields are kept in the order requested by the user. Also, you'd have to remember to update the length of the static array every time a new log option was added or risk writing past the end of the array.. > Could you try to "make html" in the doc directory? > Your new decumentation after > | These columns may be included in the CSV output: > will be unaligned plain text without some tags. Alright, I can take a look at improving the documentation situation, though I certainly wouldn't complain if someone who has it all working already were to help.. Thanks, Stephen
On Sun, Feb 6, 2011 at 23:31, Stephen Frost <sfrost@snowman.net> wrote: > * Itagaki Takahiro (itagaki.takahiro@gmail.com) wrote: >> I think we need to improve postgresql.conf.sample a bit more, especially >> the long line for #log_csv_fields = '...'. 330 characters in it! >> #1. Leave the long line because it is needed. > It's needed to match what the current default is.. I agree that it's logically good design, but we could not accept it as long as it breaks tools in the real world... Will it break "SHOW ALL" and "SELECT * FROM pg_settings" output? I'm worried that they are not designed to display such a long value. >> I think it depends default log filename, that contains %S (seconds) >> suffix. We can remove %S from log_filename; if we use a log per-day, >> those log might contain different columns even after restart. If we >> cannot avoid zigged csv fields completely, SIGHUP seems reasonable for it. > > This problem is bigger than SIGHUP, but at least with a restart > required, the default will work properly. The default configuration > wouldn't work w/ a change to the log line and a SIGHUP done. "Only works with the default settings" is just wrong design. If we cannot provide a perfect solution, we should allow users to control everything as they like. I still think PGC_SIGHUP is the best mode for the parameter, with a note of caution in the docs. -- Itagaki Takahiro
* Itagaki Takahiro (itagaki.takahiro@gmail.com) wrote: > I agree that it's logically good design, but we could not accept it > as long as it breaks tools in the real world... If it does, I think it's pretty clear that those tools are themselves broken.. > Will it break "SHOW ALL" and "SELECT * FROM pg_settings" output? > I'm worried that they are not designed to display such a long value. It certainly won't break those commands in PostgreSQL. If tools made assumptions about how long a string could be that don't match PG's understand, those tools are broken. This also isn't the only string and such tools could be broken by, eg, setting log_prefix to a very long value (entirely possible to do..). > >> I think it depends default log filename, that contains %S (seconds) > >> suffix. We can remove %S from log_filename; if we use a log per-day, > >> those log might contain different columns even after restart. If we > >> cannot avoid zigged csv fields completely, SIGHUP seems reasonable for it. > > > > This problem is bigger than SIGHUP, but at least with a restart > > required, the default will work properly. The default configuration > > wouldn't work w/ a change to the log line and a SIGHUP done. > > "Only works with the default settings" is just wrong design. It's also not anywhere close to what I said. If you have a suggestion about how to fix it that's reasonable, please suggest it. Suggesting that "because we could, given a complicated enough configuration, create jagged files anyway, we should encourage it to happen by allowing the change on SIGHUP" isn't a solution to the problem and will just create more problems. It will certainly work just fine with more than just the default settings, but, yes, there are some settings which would cause PG to create jagged CSV files. If we keep it as requiring restart and then automatically move or truncate log files which are in the way on that restart, or decide to not start at all, we could make it so PG doesn't create a jagged CSV file. I don't particularly care for any of those options. At least one of those would be a solution, however none of those include allowing it on SIGHUP, which is what you're advocating. > If we cannot provide a perfect solution, we should allow users to > control everything as they like. I still think PGC_SIGHUP is the > best mode for the parameter, with a note of caution in the docs. Doing it on a SIGHUP would be *guaranteed* to create jagged CSV files which then couldn't be loaded into PG. I don't agree with this and the impression I got from Tom and Andrew was that they agreed to not do it on SIGHUP. It's unfortuante that we seem to be at an impasse here, but of everyone voicing opinions on it so far, it would appear that you're in the minority. Thanks, Stephen
* Itagaki Takahiro (itagaki.takahiro@gmail.com) wrote: > Could you try to "make html" in the doc directory? Yeah, doesn't seem to work for me (missing '/bin/collateindex.pl', apparently..). > Your new decumentation after > | These columns may be included in the CSV output: > will be unaligned plain text without some tags. Ok, I've cleaned up that part of the documentation to be a table instead of the listings that were there, seems like a better approach anyway. Updated patch attached, which has also been rebased against HEAD. Thanks! Stephen commit d8dddd1c425a4c320540769084ceeb7d23bc3662 Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 6 14:02:05 2011 -0500 Change log_csv_options listing to a table This patch changes the listing of field options available to log_csv_options into a table, which will hopefully both look better and be clearer. commit f9851cdfaeb931f01c015f5651b72d16957c7114 Merge: 3e71e33 5ed45ac Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 6 13:26:17 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit 3e71e338a2b9352d730f59a989027e33d99bea50 Author: Stephen Frost <sfrost@snowman.net> Date: Fri Jan 28 22:44:33 2011 -0500 Cleanup log_csv_options patch Clean up of various function declarations to hopefully be correct and clean and matching PG conventions. Also move TopMemoryContext usage to later, since the local variables don't need to be in TopMemoryContext. Lastly, ensure that a comma is not produced after the last CSV field, and that one is produced if application_name is not the last field. Review by Itagaki Takahiro, thanks! commit 1825def11badd661d219fa4c516f06e0ad423443 Merge: ff249ae 847e8c7 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 06:50:03 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit ff249aeac7216da623bf77840380d5e767f681fc Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 00:26:52 2011 -0500 Add log_csv_fields GUC for CSV output & curr_role This patch adds a new GUC called 'log_csv_fields'. This GUC allows the user to control the set of fields written to the CSV output as well as the order in which they are written. The default set of fields remains those that were included in 9.0, to avoid breaking existing user configurations. In passing, update 'user_name' for log_line_prefix and log_csv_fields to mean 'session user' (which could be reset by a superuser with set session authorization), and add a 'role_name' option (%U) to log_line_prefix and log_csv_fields, to allow users to log the current role (as set by SET ROLE- not impacted by SECURITY DEFINER functions).
Attachment
On Mon, Feb 7, 2011 at 04:10, Stephen Frost <sfrost@snowman.net> wrote: > Yeah, doesn't seem to work for me (missing '/bin/collateindex.pl', > apparently..). You might need "yum install openjade stylesheets" or similar packages and re-"configure". > Ok, I've cleaned up that part of the documentation to be a table instead > of the listings that were there, seems like a better approach anyway. Yeah, that's a good job! >> I agree that it's logically good design, but we could not accept it >> as long as it breaks tools in the real world... > If it does, I think it's pretty clear that those tools are themselves > broken.. The word "break" was my wrong choice, but your new parameter still requires very wide monitors to display SHOW ALL and pg_settings. I'd like to solve the issue even though the feature itself is useful. One fast and snappy solution might be to set the default value to "default", that means the compatible set of columns. Other better ideas? For implementation, write_csvlog() has many following lines:if (curr_field != num_fields) appendStringInfoChar(&buf, ','); It will be cleaner if we add first_col flag and move it out of the switch statement. Other questions I raised before might be matters of preference. I'd like to here about them form third person.* name: log_csv_fields vs. csvlog_fields* when to assign: PGC_POSTMASTER vs.PGC_SIGHUP -- Itagaki Takahiro
On Thu, Feb 10, 2011 at 06:56:15PM +0900, Itagaki Takahiro wrote: > On Mon, Feb 7, 2011 at 04:10, Stephen Frost <sfrost@snowman.net> wrote: > >> I agree that it's logically good design, but we could not accept it > >> as long as it breaks tools in the real world... > > If it does, I think it's pretty clear that those tools are themselves > > broken.. > > The word "break" was my wrong choice, but your new parameter still > requires very wide monitors to display SHOW ALL and pg_settings. > I'd like to solve the issue even though the feature itself is useful. > One fast and snappy solution might be to set the default value to > "default", that means the compatible set of columns. > Other better ideas? If some tool barfs on a 330-byte GUC value, we might as well have that tool barf early and often, not just on non-default values. FWIW, a 330 byte boot_val doesn't seem like a big deal to me. If it were over _POSIX2_LINE_MAX (2048), that might be another matter. > Other questions I raised before might be matters of preference. > I'd like to here about them form third person. > * name: log_csv_fields vs. csvlog_fields +1 for csvlog_fields. We have the precedent of syslog_* and that log_* are all applicable to more than one log destination. > * when to assign: PGC_POSTMASTER vs. PGC_SIGHUP +1 for PGC_SIGHUP. PGC_POSTMASTER is mostly for things where we have not implemented code to instigate the change after startup (usually because the difficulty/value ratio of doing so is too high). There's no such problem here, merely the risk that the DBA might not be prepared to deal with a column list change mid-logfile. If anything, let's have the documentation mention pg_rotate_logfile() as potentially useful in conjunction. nm
On Thu, Feb 10, 2011 at 6:27 AM, Noah Misch <noah@leadboat.com> wrote: > FWIW, a 330 byte boot_val doesn't seem like a big deal to me. If it were over > _POSIX2_LINE_MAX (2048), that might be another matter. I don't think it's entirely stupid to worry about this completely screwing up the output of "SHOW ALL" on people using 80-character terminal windows. I haven't checked, but if it renders the output totally unreadable then I think we should try to find an alternative that doesn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > I don't think it's entirely stupid to worry about this completely > screwing up the output of "SHOW ALL" on people using 80-character > terminal windows. I haven't checked, but if it renders the output > totally unreadable then I think we should try to find an alternative > that doesn't. Alright, so I looked into this a bit and have a couple of comments: show all; output, at least on my test rig, is *already* well over 80-characters wide. The longest GUC is max_predicate_locks_per_transaction, which forces the first column to be 38 columns. We then have some rather long descriptions (which are only shown on show all, no clue why that is..), the longest being "Sets whether XML data in implicit parsing and serialization operations is to be considered as documents or content fragments." (for xmloption). Now, it's true that the longest default *setting* w/ this patch is the list of CSV fields, but it's not like 'show all;' really works that well on an 80-character terminal today. The second longest setting, on my system, is the path to postgresql.conf: /data/sfrost/pgsql/test/data/postgresql.conf That, plus the length of max_predicate_locks_per_transaction, would make 'show all;' go beyond 80 characters even if we took out the description (but we don't currently support that..). This new option *would* make a query against an individual 'show <name>;' return, in the default configuration, a value longer than 80-characters, but we're just talking 1 row being returned there. My feeling is that this could be improved by supporting multi-line configuration settings, and then changing the longer descriptions to be multi-line, but that still wouldn't get us down to 80-characters due to the combination of max_predicate_locks_per_transaction and config_file. Renaming a configuration option isn't exactly a trivial thing to do either. :/ One thing that would be nice, but probably non-trivial, would be to allow "show all;" to be a subselect, so you could do things like, I dunno, pull the max length of certain columns. :) We could also have a 'show all;' which just returns the name and setting and then a 'show all verbose;' for including the description, or a 'show verbose <name>;' for getting all three fields when looking at a specific variable. All-in-all, I think we're past the point of being able to make show all; fit completely on an 80-character terminal, even in \x mode. I'd be willing to work on the multi-line stuff for 9.2, if people are really interested in it, but I don't think not having that should be a show-stopper for this patch, and I think that would be more likely to break client applications than this change.. Thanks, Stephen
>Stephen Frost <sfrost@snowman.net> wrote: > That, plus the length of max_predicate_locks_per_transaction, > would make 'show all;' go beyond 80 characters even if we took out > the description Should we abbreviate something there? max_pred_locks_per_tran, maybe? -Kevin
On Fri, Feb 11, 2011 at 10:34 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >>Stephen Frost <sfrost@snowman.net> wrote: > >> That, plus the length of max_predicate_locks_per_transaction, >> would make 'show all;' go beyond 80 characters even if we took out >> the description > > Should we abbreviate something there? max_pred_locks_per_tran, > maybe? If we're going to abbreviate transaction, I'd vote for txn over tran, but I think Stephen's point that this is already a lost cause may have some validity. Not sure what other people think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Fri, Feb 11, 2011 at 10:34 AM, Kevin Grittner > > Should we abbreviate something there? max_pred_locks_per_tran, > > maybe? > > If we're going to abbreviate transaction, I'd vote for txn over tran, > but I think Stephen's point that this is already a lost cause may have > some validity. Not sure what other people think. There's lots of other GUCs with "transaction" spelled out in them.. :/ Another option, which I don't like, would be to use 'default' by 'default', and build the list on the fly every time if that's what it is.. That would give no insight into what the list of fields is for someone though, they'd have to go back to the documentation to figure it out, and that sucks.. Thanks, Stephen
On Fri, Feb 11, 2011 at 10:52 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> On Fri, Feb 11, 2011 at 10:34 AM, Kevin Grittner >> > Should we abbreviate something there? max_pred_locks_per_tran, >> > maybe? >> >> If we're going to abbreviate transaction, I'd vote for txn over tran, >> but I think Stephen's point that this is already a lost cause may have >> some validity. Not sure what other people think. > > There's lots of other GUCs with "transaction" spelled out in them.. :/ > > Another option, which I don't like, would be to use 'default' by > 'default', and build the list on the fly every time if that's what it > is.. That would give no insight into what the list of fields is for > someone though, they'd have to go back to the documentation to figure > it out, and that sucks.. Yeah. The root cause of this problem is that the way psql handles tabular output with a few very wide rows stinks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > Yeah. The root cause of this problem is that the way psql handles > tabular output with a few very wide rows stinks. True, but it would be kinda nice to support multi-line configuration variables. I still vote for that being "not required to get this patch in", but it's certainly something we could do later. Of course, you do have to ask yourself what having \n's in log_line_prefix would mean in the config file.. Thanks, Stephen
* Itagaki Takahiro (itagaki.takahiro@gmail.com) wrote: > On Mon, Feb 7, 2011 at 04:10, Stephen Frost <sfrost@snowman.net> wrote: > > Yeah, doesn't seem to work for me (missing '/bin/collateindex.pl', > > apparently..). > > You might need "yum install openjade stylesheets" or similar packages > and re-"configure". I've got openjade, etc, installed, but I'm on Debian and it doesn't appear to include that collateindex.pl anywhere.. > For implementation, write_csvlog() has many following lines: > if (curr_field != num_fields) appendStringInfoChar(&buf, ','); > It will be cleaner if we add first_col flag and move it out of > the switch statement. Done. > Other questions I raised before might be matters of preference. > I'd like to here about them form third person. > * name: log_csv_fields vs. csvlog_fields Done. > * when to assign: PGC_POSTMASTER vs. PGC_SIGHUP I'm still in the PGC_POSTMASTER camp on this and I really think it's a more complicated change than just changing that value in the code, even if we all agreed it should be allowed on SIGHUP (which certainly isn't the case anyway..). In the end, if we really want that, we can always add it in the future. Updated patch attached, full git log below. Thanks, Stephen commit 6bd2b9f1d2bc3b166a3e5598ee590e25159c61a5 Author: Stephen Frost <sfrost@snowman.net> Date: Fri Feb 11 11:16:17 2011 -0500 Rename log_csv_fields GUC to csvlog_fields This patch renames the log_csv_fileds GUC to csvlog_fields, to better match the other csvlog_* options. Also cleaned up the CSV generation code a bit by moving the comma-adding code out of the switch() statement. commit a281ca611e6181339e92b488c815e0cb8c1298d2 Merge: d8dddd1 183d3cf Author: Stephen Frost <sfrost@snowman.net> Date: Fri Feb 11 08:37:27 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit d8dddd1c425a4c320540769084ceeb7d23bc3662 Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 6 14:02:05 2011 -0500 Change log_csv_options listing to a table This patch changes the listing of field options available to log_csv_options into a table, which will hopefully both look better and be clearer. commit f9851cdfaeb931f01c015f5651b72d16957c7114 Merge: 3e71e33 5ed45ac Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 6 13:26:17 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit 3e71e338a2b9352d730f59a989027e33d99bea50 Author: Stephen Frost <sfrost@snowman.net> Date: Fri Jan 28 22:44:33 2011 -0500 Cleanup log_csv_options patch Clean up of various function declarations to hopefully be correct and clean and matching PG conventions. Also move TopMemoryContext usage to later, since the local variables don't need to be in TopMemoryContext. Lastly, ensure that a comma is not produced after the last CSV field, and that one is produced if application_name is not the last field. Review by Itagaki Takahiro, thanks! commit 1825def11badd661d219fa4c516f06e0ad423443 Merge: ff249ae 847e8c7 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 06:50:03 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit ff249aeac7216da623bf77840380d5e767f681fc Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 00:26:52 2011 -0500 Add log_csv_fields GUC for CSV output & curr_role This patch adds a new GUC called 'log_csv_fields'. This GUC allows the user to control the set of fields written to the CSV output as well as the order in which they are written. The default set of fields remains those that were included in 9.0, to avoid breaking existing user configurations. In passing, update 'user_name' for log_line_prefix and log_csv_fields to mean 'session user' (which could be reset by a superuser with set session authorization), and add a 'role_name' option (%U) to log_line_prefix and log_csv_fields, to allow users to log the current role (as set by SET ROLE- not impacted by SECURITY DEFINER functions).
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Feb 11, 2011 at 10:34 AM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: >> Should we abbreviate something there? �max_pred_locks_per_tran, >> maybe? > If we're going to abbreviate transaction, I'd vote for txn over tran, > but I think Stephen's point that this is already a lost cause may have > some validity. Not sure what other people think. Aren't we already using "xact" for that purpose in some user-visible places? But personally I'd be happy with "max_pred_locks_per_transaction" which gets the worst case down without being too obviously at variance with "max_locks_per_transaction". regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > If we're going to abbreviate transaction, I'd vote for txn over tran, > > but I think Stephen's point that this is already a lost cause may have > > some validity. Not sure what other people think. I agree w/ reducing that particular GUC a bit in size, but just to make it clear- that doesn't even come close to solving or fixing the 80-character terminal issue wrt 'show all;'... > Aren't we already using "xact" for that purpose in some user-visible > places? But personally I'd be happy with "max_pred_locks_per_transaction" > which gets the worst case down without being too obviously at variance > with "max_locks_per_transaction". Sounds good to me. The header length for show all would drop to only 206 characters (or so) with that change. If we offered a 'show all;' which didn't include 'description' and didn't have any settings longer than about 46 characters, *then* it'd fit on an 80-char terminal. Of course, if we had multi-line GUC support, we could put each field on a new line and each of those is well under 46 characters.. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Itagaki Takahiro (itagaki.takahiro@gmail.com) wrote: >> You might need "yum install openjade stylesheets" or similar packages >> and re-"configure". > I've got openjade, etc, installed, but I'm on Debian and it doesn't > appear to include that collateindex.pl anywhere.. FWIW, on Fedora 13 I see $ which collateindex.pl /usr/bin/collateindex.pl $ rpm -qf /usr/bin/collateindex.pl docbook-style-dsssl-1.79-11.fc13.noarch $ rpm -qi docbook-style-dsssl Name : docbook-style-dsssl Relocations: (not relocatable) Version : 1.79 Vendor: Fedora Project Release : 11.fc13 Build Date: Mon Jun 7 10:06:48 2010 Install Date: Fri Oct 1 10:07:37 2010 Build Host: x86-07.phx2.fedoraproject.org Group : Applications/Text Source RPM: docbook-style-dsssl-1.79-11.fc13.src.rpm Size : 2308505 License: Copyright only Signature : RSA/SHA256, Mon Jun 7 10:34:27 2010, Key ID 7edc6ad6e8e40fde Packager : Fedora Project URL : http://docbook.sourceforge.net/ Summary : Norman Walsh's modular stylesheets for DocBook Description : These DSSSL stylesheets allow to convert any DocBook document to another printed (for example, RTF or PostScript) or online (for example, HTML) format. They are highly customizable. regards, tom lane
Excerpts from Tom Lane's message of vie feb 11 13:49:33 -0300 2011: > Stephen Frost <sfrost@snowman.net> writes: > > * Itagaki Takahiro (itagaki.takahiro@gmail.com) wrote: > >> You might need "yum install openjade stylesheets" or similar packages > >> and re-"configure". > > > I've got openjade, etc, installed, but I'm on Debian and it doesn't > > appear to include that collateindex.pl anywhere.. $ apt-file search collateindex.pl docbook-dsssl: /usr/bin/collateindex.pl docbook-dsssl: /usr/share/man/man1/collateindex.pl.1.gz -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > $ which collateindex.pl > /usr/bin/collateindex.pl > $ rpm -qf /usr/bin/collateindex.pl > docbook-style-dsssl-1.79-11.fc13.noarch Ah-hah, thanks for that! Apparently on Debian it's docbook-dsssl that contains it, and yes, it gets installed into /usr/bin (not /bin, should have figured that..). I'll give building the docs another shot. Thanks again, Stephen
Stephen Frost <sfrost@snowman.net> wrote: > Tom Lane (tgl@sss.pgh.pa.us) wrote: >> I'd be happy with "max_pred_locks_per_transaction" which gets the >> worst case down without being too obviously at variance with >> "max_locks_per_transaction". > > Sounds good to me. The header length for show all would drop to > only 206 characters (or so) with that change. Patch attached. -Kevin
Attachment
I wrote: > Patch attached. This time with src/backend/utils/misc/postgresql.conf.sample fixed. -Kevin
Attachment
On Fri, Feb 11, 2011 at 11:23 AM, Stephen Frost <sfrost@snowman.net> wrote: > Updated patch attached, full git log below. The documentation for csvlog_fields should probably use <literal> around the default value. The sentence that begins "For details on what these fields are" should hyperlink the referenced sections of the documentation. The function prototype you added to elog.c is misformatted - the type should be on the line preceding the function name only for the definition, not for prototypes. The code for log_line_prefix = 'u' is indented wrong. Also, an else clause that has only an if hanging off of it can be turned into an "else if" for better readability. This part kind of concerns me: + * This is more of a 'safety valve' than anything else, + * since GUC processing really should happen before we do any error logging. + * We might even want to change this eventually to just not log CSV format logs + * if this ever happens, to avoid a discrepency in the CSV log file which would + * make it difficult to load into PG. I'm not really convinced that making the CSV log format variable is a good thing. One of the reasons we added that format in the first place is to make sure that we could generate log output in an easily parseable format, and this seems like a big step backwards for not much, especially if we can't even guarantee that we're not going to inject random differently-formatted lines during startup. Stylistically, build_default_csvlog_list and assign_csvlog_fields ought to be loops driven off an array, rather than hand-coded. I think this was discussed before, and I hate to remention it, but would it make sense to reuse the single-character codes from log_line_prefix rather than inventing new codes for the same fields? It would be awfully nice if we could inject something into the csvlog output format that would let client programs know which fields are present. This would be especially useful for people writing tools that are intended to work on ANY PG installation, rather than just, say, their own. I'm not sure if there's a clean way to do that, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > The documentation for csvlog_fields should probably use <literal> > around the default value. > > The sentence that begins "For details on what these fields are" should > hyperlink the referenced sections of the documentation. > > The function prototype you added to elog.c is misformatted - the type > should be on the line preceding the function name only for the > definition, not for prototypes. > > The code for log_line_prefix = 'u' is indented wrong. Also, an else > clause that has only an if hanging off of it can be turned into an > "else if" for better readability. Will fix. > This part kind of concerns me: > > + * This is more of a 'safety valve' than anything else, > + * since GUC processing really should happen before we do any error logging. > + * We might even want to change this eventually to just not log CSV format logs > + * if this ever happens, to avoid a discrepency in the CSV log file which would > + * make it difficult to load into PG. > > I'm not really convinced that making the CSV log format variable is a > good thing. One of the reasons we added that format in the first > place is to make sure that we could generate log output in an easily > parseable format, and this seems like a big step backwards for not > much, especially if we can't even guarantee that we're not going to > inject random differently-formatted lines during startup. I couldn't make it actually produce incorrect lines. I was worried about the possibility, but I don't think it's actually possible because postgresql.conf needs to be parsed and GUC handling has to work before we will even start trying to do CSV logging. I'll rework the comment. > Stylistically, build_default_csvlog_list and assign_csvlog_fields > ought to be loops driven off an array, rather than hand-coded. Sure, will fix. > I think this was discussed before, and I hate to remention it, but > would it make sense to reuse the single-character codes from > log_line_prefix rather than inventing new codes for the same fields? As I recall, Tom didn't like that idea and neither did I- there's only so many letters.. It also sucks wrt being self-documenting. I'd much rather support multi-line GUCs.. > It would be awfully nice if we could inject something into the csvlog > output format that would let client programs know which fields are > present. This would be especially useful for people writing tools > that are intended to work on ANY PG installation, rather than just, > say, their own. I'm not sure if there's a clean way to do that, > though. This would be called a 'header' in most typical CSV scenarios. Unfortunately, last I checked (maybe it's changed?), COPY w/ HEADER just throws the header away instead of doing anything useful with it. If it actually used the header to build the column list, then adding a header would be sufficient, provided all the necessary fields are in the table. If I wanted something to throw away the first record of a file before loading it, I'd use tail. I'll see about adding an option to have it output a header. Thanks, Stephen
On 02/13/2011 08:26 AM, Stephen Frost wrote: > This would be called a 'header' in most typical CSV scenarios. > Unfortunately, last I checked (maybe it's changed?), COPY w/ HEADER just > throws the header away instead of doing anything useful with it. If it > actually used the header to build the column list, then adding a header > would be sufficient, provided all the necessary fields are in the table. See the discussion back around the the 8.1 release (IIRC) when we added the HEADER option. > If I wanted something to throw away the first record of a file before > loading it, I'd use tail. > The whole point of us having direct CSV import is to minimise the requirement for preprocessing. That said, I think there's probably a good case for an option to use the header line as a column list. I know of at least one application I have written that could benefit from it. But that's work for 9.2 or later. cheers andrew
Robert, * Robert Haas (robertmhaas@gmail.com) wrote: > On Fri, Feb 11, 2011 at 11:23 AM, Stephen Frost <sfrost@snowman.net> wrote: > > Updated patch attached, full git log below. Thanks again for the review. Updated patch attached. > The documentation for csvlog_fields should probably use <literal> > around the default value. Fixed. > The sentence that begins "For details on what these fields are" should > hyperlink the referenced sections of the documentation. Fixed. > The function prototype you added to elog.c is misformatted - the type > should be on the line preceding the function name only for the > definition, not for prototypes. Fixed. > The code for log_line_prefix = 'u' is indented wrong. Also, an else > clause that has only an if hanging off of it can be turned into an > "else if" for better readability. Fixed. > This part kind of concerns me: > > + * This is more of a 'safety valve' than anything else, > + * since GUC processing really should happen before we do any error logging. > + * We might even want to change this eventually to just not log CSV format logs > + * if this ever happens, to avoid a discrepency in the CSV log file which would > + * make it difficult to load into PG. > > I'm not really convinced that making the CSV log format variable is a > good thing. One of the reasons we added that format in the first > place is to make sure that we could generate log output in an easily > parseable format, and this seems like a big step backwards for not > much, especially if we can't even guarantee that we're not going to > inject random differently-formatted lines during startup. Comment, function, and whole issue removed. > Stylistically, build_default_csvlog_list and assign_csvlog_fields > ought to be loops driven off an array, rather than hand-coded. Done, added CSVFieldNames and modiffied assign_csvlog_fileds to use it (build_default_csvlog_list was removed). > I think this was discussed before, and I hate to remention it, but > would it make sense to reuse the single-character codes from > log_line_prefix rather than inventing new codes for the same fields? I'd rather ditch the log_line_prefix idea of single-letter codes since it's never going to scale. Perhaps making log_line_prefix with work %csvlog_name instead of just the %<single-letter> codes might work. I don't see a solution which doesn't involve changing log_line_prefix though, in any case. > It would be awfully nice if we could inject something into the csvlog > output format that would let client programs know which fields are > present. This would be especially useful for people writing tools > that are intended to work on ANY PG installation, rather than just, > say, their own. I'm not sure if there's a clean way to do that, > though. Added csvlog_header GUC to allow the user to ask for the header to be printed at the top of each log file. If and when an option is added to PG's COPY to respect the header, this should resolve that issue. Also updated to HEAD. Full git log below, patch attached. Thanks, Stephen commit 592c256ffff4ffde77fc29ff28fdedd2c9f2dafd Merge: 33639eb cebbaa1 Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 13 21:11:44 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit 33639ebfe67b0dd58a0a89161e9f0d5237830ed4 Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 13 21:08:08 2011 -0500 Add csvlog_header GUC, other cleanup This patch adds a csvlog_header option which will start each CSV log file with a header which matches the GUC (and hence the format of the CSV log file generated). Numerous other whitespace clean-ups, removed build_default_csvlog_list(), since it wasn't actually necessary or useful. Added an array which lists the text strings of the various CSVLOG options to simplify assign_csvlog_fields(). commit 6bd2b9f1d2bc3b166a3e5598ee590e25159c61a5 Author: Stephen Frost <sfrost@snowman.net> Date: Fri Feb 11 11:16:17 2011 -0500 Rename log_csv_fields GUC to csvlog_fields This patch renames the log_csv_fileds GUC to csvlog_fields, to better match the other csvlog_* options. Also cleaned up the CSV generation code a bit by moving the comma-adding code out of the switch() statement. commit a281ca611e6181339e92b488c815e0cb8c1298d2 Merge: d8dddd1 183d3cf Author: Stephen Frost <sfrost@snowman.net> Date: Fri Feb 11 08:37:27 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit d8dddd1c425a4c320540769084ceeb7d23bc3662 Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 6 14:02:05 2011 -0500 Change log_csv_options listing to a table This patch changes the listing of field options available to log_csv_options into a table, which will hopefully both look better and be clearer. commit f9851cdfaeb931f01c015f5651b72d16957c7114 Merge: 3e71e33 5ed45ac Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 6 13:26:17 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit 3e71e338a2b9352d730f59a989027e33d99bea50 Author: Stephen Frost <sfrost@snowman.net> Date: Fri Jan 28 22:44:33 2011 -0500 Cleanup log_csv_options patch Clean up of various function declarations to hopefully be correct and clean and matching PG conventions. Also move TopMemoryContext usage to later, since the local variables don't need to be in TopMemoryContext. Lastly, ensure that a comma is not produced after the last CSV field, and that one is produced if application_name is not the last field. Review by Itagaki Takahiro, thanks! commit 1825def11badd661d219fa4c516f06e0ad423443 Merge: ff249ae 847e8c7 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 06:50:03 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit ff249aeac7216da623bf77840380d5e767f681fc Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 00:26:52 2011 -0500 Add log_csv_fields GUC for CSV output & curr_role This patch adds a new GUC called 'log_csv_fields'. This GUC allows the user to control the set of fields written to the CSV output as well as the order in which they are written. The default set of fields remains those that were included in 9.0, to avoid breaking existing user configurations. In passing, update 'user_name' for log_line_prefix and log_csv_fields to mean 'session user' (which could be reset by a superuser with set session authorization), and add a 'role_name' option (%U) to log_line_prefix and log_csv_fields, to allow users to log the current role (as set by SET ROLE- not impacted by SECURITY DEFINER functions).
Attachment
Andrew, * Andrew Dunstan (andrew@dunslane.net) wrote: > See the discussion back around the the 8.1 release (IIRC) when we > added the HEADER option. I recall seeing it and not agreeing with it then. :) > >If I wanted something to throw away the first record of a file before > >loading it, I'd use tail. > > The whole point of us having direct CSV import is to minimise the > requirement for preprocessing. Having a 'throw away 1 line' option is just silly to me, but documenting it as "header" is worse. Water under the bridge at this point though. > That said, I think there's probably a good case for an option to use > the header line as a column list. I know of at least one application > I have written that could benefit from it. But that's work for 9.2 > or later. I agree it's work for 9.2. I could probably help with this if people agree that it should be done. Thanks, Stephen
On Mon, Feb 14, 2011 at 11:51, Stephen Frost <sfrost@snowman.net> wrote: >> It would be awfully nice if we could inject something into the csvlog >> output format that would let client programs know which fields are >> present. This would be especially useful for people writing tools >> that are intended to work on ANY PG installation, rather than just, >> say, their own. I'm not sure if there's a clean way to do that, >> though. > > Added csvlog_header GUC to allow the user to ask for the header to be > printed at the top of each log file. If and when an option is added to > PG's COPY to respect the header, this should resolve that issue. We need to design csvlog_header more carefully. csvlog_header won't work if log_filename is low-resolution, ex. log-per-day. It's still useful when a DBA reads the file manually, but documentation would better. Or, should we skip writing headers when the open log file is not empty? It works unless when csvlog_fields is modified before restart, and also on logger's crash and restart, though it's a rare case. A few comments and trivial issues: * It might be my misunderstanding, but there was a short description for %U for in log_line_prefix in postgresql.conf, right? Did you remove it in the latest version? * The long csvlog_fields default is a problem also in postgresql.conf, but we have no solution for the issue... The current code is the best for now. * In assign_csvlog_fields(), we need to cleanup memory and memory context before return on error. + /* check if no option matched, and if so, return error */ + if (curr_option == MAX_CSVLOG_OPTS) + return NULL; * An added needless "return" should be removed. *** 2139,2144 **** write_csvlog(ErrorData *edata) --- 2379,2386 ---- write_pipe_chunks(buf.data, buf.len, LOG_DESTINATION_CSVLOG); pfree(buf.data); + + return; } /* -- Itagaki Takahiro
Itagaki, * Itagaki Takahiro (itagaki.takahiro@gmail.com) wrote: > We need to design csvlog_header more carefully. csvlog_header won't work > if log_filename is low-resolution, ex. log-per-day. This isn't any different a problem to the issue of someone changing the csvlog_fields GUC but not checking if the log file already exists on restart. I've suggested a number of different options, but none of them are terribly good, and I havn't heard anyone supporting any solution to this issue. > It's still useful when > a DBA reads the file manually, but documentation would better. Eh? If you mean that we should add documentation to make users aware of the possible issue of changing these values without making sure the log file gets rotated- sure, I'd be happy to do that. > Or, should we skip writing headers when the open log file is not > empty? This doesn't help the csvlog_fields issue, unfortunately. I don't think it'd be hard to implement this to help with the header issue, I'm just not sure if it makes sense to do so when the actual list of fields could change... > * It might be my misunderstanding, but there was a short description for %U > for in log_line_prefix in postgresql.conf, right? Did you remove it in the > latest version? No, and I don't see where I ever added it.. I've fixed it. > * In assign_csvlog_fields(), we need to cleanup memory and memory context > before return on error. > + /* check if no option matched, and if so, return error */ > + if (curr_option == MAX_CSVLOG_OPTS) > + return NULL; Fixed this and a couple of similar issues. > * An added needless "return" should be removed. Meh, I like explicit returns, but since it generated a hunk all by itself, I'll clear it out. Updated patch attached, git log below. Thanks, Stephen commit 304e35ebb74f68da69163ed9dd1dd453b67181e7 Author: Stephen Frost <sfrost@snowman.net> Date: Mon Feb 14 09:26:03 2011 -0500 csvlog_fields: fix leak, other cleanup Fix a couple of potential memory leaks in assign_csvlog_fields(). Also added a few comments, removed an extra 'return;', and added %U to the sample postgresql.conf. commit 592c256ffff4ffde77fc29ff28fdedd2c9f2dafd Merge: 33639eb cebbaa1 Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 13 21:11:44 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit 33639ebfe67b0dd58a0a89161e9f0d5237830ed4 Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 13 21:08:08 2011 -0500 Add csvlog_header GUC, other cleanup This patch adds a csvlog_header option which will start each CSV log file with a header which matches the GUC (and hence the format of the CSV log file generated). Numerous other whitespace clean-ups, removed build_default_csvlog_list(), since it wasn't actually necessary or useful. Added an array which lists the text strings of the various CSVLOG options to simplify assign_csvlog_fields(). commit 6bd2b9f1d2bc3b166a3e5598ee590e25159c61a5 Author: Stephen Frost <sfrost@snowman.net> Date: Fri Feb 11 11:16:17 2011 -0500 Rename log_csv_fields GUC to csvlog_fields This patch renames the log_csv_fileds GUC to csvlog_fields, to better match the other csvlog_* options. Also cleaned up the CSV generation code a bit by moving the comma-adding code out of the switch() statement. commit a281ca611e6181339e92b488c815e0cb8c1298d2 Merge: d8dddd1 183d3cf Author: Stephen Frost <sfrost@snowman.net> Date: Fri Feb 11 08:37:27 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit d8dddd1c425a4c320540769084ceeb7d23bc3662 Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 6 14:02:05 2011 -0500 Change log_csv_options listing to a table This patch changes the listing of field options available to log_csv_options into a table, which will hopefully both look better and be clearer. commit f9851cdfaeb931f01c015f5651b72d16957c7114 Merge: 3e71e33 5ed45ac Author: Stephen Frost <sfrost@snowman.net> Date: Sun Feb 6 13:26:17 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit 3e71e338a2b9352d730f59a989027e33d99bea50 Author: Stephen Frost <sfrost@snowman.net> Date: Fri Jan 28 22:44:33 2011 -0500 Cleanup log_csv_options patch Clean up of various function declarations to hopefully be correct and clean and matching PG conventions. Also move TopMemoryContext usage to later, since the local variables don't need to be in TopMemoryContext. Lastly, ensure that a comma is not produced after the last CSV field, and that one is produced if application_name is not the last field. Review by Itagaki Takahiro, thanks! commit 1825def11badd661d219fa4c516f06e0ad423443 Merge: ff249ae 847e8c7 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 06:50:03 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_csv_options commit ff249aeac7216da623bf77840380d5e767f681fc Author: Stephen Frost <sfrost@snowman.net> Date: Wed Jan 19 00:26:52 2011 -0500 Add log_csv_fields GUC for CSV output & curr_role This patch adds a new GUC called 'log_csv_fields'. This GUC allows the user to control the set of fields written to the CSV output as well as the order in which they are written. The default set of fields remains those that were included in 9.0, to avoid breaking existing user configurations. In passing, update 'user_name' for log_line_prefix and log_csv_fields to mean 'session user' (which could be reset by a superuser with set session authorization), and add a 'role_name' option (%U) to log_line_prefix and log_csv_fields, to allow users to log the current role (as set by SET ROLE- not impacted by SECURITY DEFINER functions).
Attachment
On Mon, Feb 14, 2011 at 23:30, Stephen Frost <sfrost@snowman.net> wrote: > > * In assign_csvlog_fields(), we need to cleanup memory and memory context > > before return on error. > Fixed this and a couple of similar issues. Not yet fixed. Switched memory context is not restored on error. > Updated patch attached, git log below. Now I mark the patch to "Ready for Committer", because I don't have suggestions any more. For reference, I note my previous questions. Some of them might be TODO items, or might not. We can add the basic feature in 9.1, and improve it 9.2 or later versions. * csvlog_fields and csvlog_header won't work with non-default log_filename when it doesn't contain seconds in the format.They expect they can always open empty log files. * The long default value for csvlog_fields leads long text line in postgresql.conf, SHOW ALL, pg_settings view, but therewere no better alternative solutions in the past discussion. * csvlog_fields is marked as PGC_POSTMASTER. It can protect mixed formats in a csv file on default log_filename, but othersimilar GUC variables are usually marked AS PGC_SIGHUP. -- Itagaki Takahiro
On Tue, Feb 15, 2011 at 12:46 AM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote: > On Mon, Feb 14, 2011 at 23:30, Stephen Frost <sfrost@snowman.net> wrote: >> > * In assign_csvlog_fields(), we need to cleanup memory and memory context >> > before return on error. >> Fixed this and a couple of similar issues. > > Not yet fixed. Switched memory context is not restored on error. > >> Updated patch attached, git log below. > > Now I mark the patch to "Ready for Committer", > because I don't have suggestions any more. > > For reference, I note my previous questions. Some of them might be TODO > items, or might not. We can add the basic feature in 9.1, and improve it > 9.2 or later versions. > > * csvlog_fields and csvlog_header won't work with non-default log_filename > when it doesn't contain seconds in the format. They expect they can always > open empty log files. > > * The long default value for csvlog_fields leads long text line in > postgresql.conf, SHOW ALL, pg_settings view, but there were no better > alternative solutions in the past discussion. > > * csvlog_fields is marked as PGC_POSTMASTER. It can protect mixed formats > in a csv file on default log_filename, but other similar GUC variables > are usually marked AS PGC_SIGHUP. I think we should push this whole patch out to 9.2. It seems to me that there are significant unresolved design issues here which need more time and thought than we can realistically give them now. In addition to the above, there is the problem of making the data self-identifying, which I think is really, really important for third-party tools. I am not keen to push a half-baked solution into the tree now that we will have to live with for years. The payoff (getting %U) seems quite out of proportion to the potential downsides of making a change of this type at this late date. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Feb 11, 2011 at 6:20 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > I wrote: >>> Patch attached. > > This time with src/backend/utils/misc/postgresql.conf.sample fixed. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > The payoff > (getting %U) seems quite out of proportion to the potential downsides > of making a change of this type at this late date. I'd be happy to go back to the original patch/idea of just the simple addition of %U as an option for log_line_prefix. I'd be quite frustrated to not have *any* way to log the current role in 9.1. I don't think anyone is going to be too bent out of shape that we can't do it with CSV initially, so long as we agree that we'll try and add that for 9.2. Pushing the CSV log changes to 9.2 would be fine with me, and I'd be happy to continue working on it and the GUC changes I was suggesting to address the long config line, and perhaps figure out a way to handle changes on SIGHUP. Thanks, Stephen
* Itagaki Takahiro (itagaki.takahiro@gmail.com) wrote: > On Mon, Feb 14, 2011 at 23:30, Stephen Frost <sfrost@snowman.net> wrote: > > > * In assign_csvlog_fields(), we need to cleanup memory and memory context > > > before return on error. > > Fixed this and a couple of similar issues. > > Not yet fixed. Switched memory context is not restored on error. Ugh, sorry about that, I should have realized that needed to be done. Updated patch attached. > > Updated patch attached, git log below. > > Now I mark the patch to "Ready for Committer", > because I don't have suggestions any more. Thanks. > For reference, I note my previous questions. Some of them might be TODO > items, or might not. We can add the basic feature in 9.1, and improve it > 9.2 or later versions. That's what I would have thought, ah well. :) > * csvlog_fields and csvlog_header won't work with non-default log_filename > when it doesn't contain seconds in the format. They expect they can always > open empty log files. Or that the user will deal with changes and header lines mid-file if they decide to use a log_filename which causes it to happen. > * The long default value for csvlog_fields leads long text line in > postgresql.conf, SHOW ALL, pg_settings view, but there were no better > alternative solutions in the past discussion. Not without making GUCs able to be multi-line. If people are interested in this, I'll try to make it happen for 9.2. > * csvlog_fields is marked as PGC_POSTMASTER. It can protect mixed formats > in a csv file on default log_filename, but other similar GUC variables > are usually marked AS PGC_SIGHUP. The problem here is primairly that each backend does write_csvlog() and there's no easy way to make sure that none of them write the new format to the old file (or the old format to the new file) before a switch is done. I can try looking into this but I'm concerned the only solution would be to introduce some amount of locking which could slow down the overall logging process which might be unacceptable performance-wise. Preventing CSV logs from appending to existing files could be pretty easily done, provided we can agree on what to call the new files, but that wouldn't change PGC_POSTMASTER. Thanks, Stephen
* Stephen Frost (sfrost@snowman.net) wrote: > Ugh, sorry about that, I should have realized that needed to be done. > Updated patch attached. Errr, for real this time. Thanks, Stephen commit 25e94dcb390f56502bc46e683b438c20d2dc74e0 Author: Stephen Frost <sfrost@snowman.net> Date: Tue Feb 15 08:50:17 2011 -0500 assign_csvlog_fields() - reset context on error On error, we need to make sure to reset the memory context back to what it was when we entered.
Attachment
* Stephen Frost (sfrost@snowman.net) wrote: > I'd be happy to go back to the original patch/idea of just the simple > addition of %U as an option for log_line_prefix. Updated patch attached which just adds %U support to log_line_prefix. Will work on adding CSV support for this in 9.2, along with associated other issues regarding supporting variable CSV format output. Thanks, Stephen commit c1b06c04af0c886c6ec27917368f3c674227ed2d Author: Stephen Frost <sfrost@snowman.net> Date: Tue Feb 15 10:21:38 2011 -0500 Add %U option to log_line_prefix This patch adds a %U option to log_line_prefix, to allow logging of the current role (previously not possible). Also reworks %u a bit and adds documentation to clarify what each means.
Attachment
On Tue, Feb 15, 2011 at 10:26 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Stephen Frost (sfrost@snowman.net) wrote: >> I'd be happy to go back to the original patch/idea of just the simple >> addition of %U as an option for log_line_prefix. > > Updated patch attached which just adds %U support to log_line_prefix. > Will work on adding CSV support for this in 9.2, along with associated > other issues regarding supporting variable CSV format output. Something along these lines would be OK with me (I haven't yet validated every detail), but there were previous objections to adding any new fields to log_line_prefix until we had a flexible CSV format. I think that's raising the bar a bit too high, personally, but I don't have the only vote around here... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Stephen Frost <sfrost@snowman.net> writes: > * Robert Haas (robertmhaas@gmail.com) wrote: >> The payoff >> (getting %U) seems quite out of proportion to the potential downsides >> of making a change of this type at this late date. > I'd be happy to go back to the original patch/idea of just the simple > addition of %U as an option for log_line_prefix. I'd be quite > frustrated to not have *any* way to log the current role in 9.1. I > don't think anyone is going to be too bent out of shape that we can't do > it with CSV initially, so long as we agree that we'll try and add that > for 9.2. Given that this has been like this right along, I don't see why it's all that urgent to force a half-baked solution into 9.1. I'm also concerned that if we do do that, you'll lose motivation to work on cleaning it up for 9.2 ;-) regards, tom lane
On Tue, Feb 15, 2011 at 10:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Stephen Frost <sfrost@snowman.net> writes: >> * Robert Haas (robertmhaas@gmail.com) wrote: >>> The payoff >>> (getting %U) seems quite out of proportion to the potential downsides >>> of making a change of this type at this late date. > >> I'd be happy to go back to the original patch/idea of just the simple >> addition of %U as an option for log_line_prefix. I'd be quite >> frustrated to not have *any* way to log the current role in 9.1. I >> don't think anyone is going to be too bent out of shape that we can't do >> it with CSV initially, so long as we agree that we'll try and add that >> for 9.2. > > Given that this has been like this right along, I don't see why it's > all that urgent to force a half-baked solution into 9.1. I'm also > concerned that if we do do that, you'll lose motivation to work on > cleaning it up for 9.2 ;-) Trying to arm-twist people into working on A before we're willing to give them B doesn't necessary serve us very well. I'd rather leave the problem of making the CSV format more flexible to someone who is really motivated to work on *that problem*, whether that person ends up being Stephen or not. Just my $0.02. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Something along these lines would be OK with me (I haven't yet > validated every detail), but there were previous objections to adding > any new fields to log_line_prefix until we had a flexible CSV format. > I think that's raising the bar a bit too high, personally, but I don't > have the only vote around here... I think I was the one objecting. I don't necessarily say that we have to have a "flexible" CSV format, but I do say that facilities that are available in log_line_prefix and not in CSV logs are a bad thing. regards, tom lane
On Tue, Feb 15, 2011 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Something along these lines would be OK with me (I haven't yet >> validated every detail), but there were previous objections to adding >> any new fields to log_line_prefix until we had a flexible CSV format. >> I think that's raising the bar a bit too high, personally, but I don't >> have the only vote around here... > > I think I was the one objecting. I don't necessarily say that we have > to have a "flexible" CSV format, but I do say that facilities that are > available in log_line_prefix and not in CSV logs are a bad thing. Well, I guess the other option is to just add it to the format, full stop. But as someone pointed out previously, that's not a terribly scalable solution, but perhaps it could be judged adequate for this particular case. While I generally agree with the principal, I also wonder if it might be better to just add this field in log_line_prefix and wait for someone to complain about that as other than a theoretical matter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tom, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Given that this has been like this right along, I don't see why it's > all that urgent to force a half-baked solution into 9.1. I'm also > concerned that if we do do that, you'll lose motivation to work on > cleaning it up for 9.2 ;-) The addition to log_line_prefix is hardly 'half-baked' as a solution, to that problem (I just pulled the hunks from the rest of the patch as they were completely independent, and tested them). I've also gone and added the csvlog_fields/csvlog_header patch to the 2011-Next commitfest. :P I've also already started looking at changing syslogger to have it figure out if it should be writing a header out or not. If we can decide what semantics we should have when the log file exists and we're not planning to rotate it on startup, it won't be hard for me to implement them (well, I hope). Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > Well, I guess the other option is to just add it to the format, full > stop. But as someone pointed out previously, that's not a terribly > scalable solution, but perhaps it could be judged adequate for this > particular case. Think I suggested that at one point. I'm all for doing that on a major version change like this one, but I think we already had some concerns about that on this thread (Andrew maybe?). > While I generally agree with the principal, I also wonder if it might > be better to just add this field in log_line_prefix and wait for > someone to complain about that as other than a theoretical matter. I might be working against myself, but I'll complain right now about the lack of any way to have a header on the CSV logs and that you don't get to control what fields are logged. That said, I'm not currently using them either, so my vote doesn't count for much. Of course, I'll also complain about the lack of any way to get PG to respect the header, forcing me to do fun things like: for file in *results*; doHEADER=`head -1 $file`sed -e 's:""::g' < $file | \ psql -d beac -h sauron -c \ "\copy my_table($HEADER) from STDIN with csv header" done on a regular basis. How forcing me to do that rather than asking someone else to use 'tail -n +2' makes sense is beyond me.. Thanks, Stephen
On Tue, Feb 15, 2011 at 11:13 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> Well, I guess the other option is to just add it to the format, full >> stop. But as someone pointed out previously, that's not a terribly >> scalable solution, but perhaps it could be judged adequate for this >> particular case. > > Think I suggested that at one point. I'm all for doing that on a major > version change like this one, but I think we already had some concerns > about that on this thread (Andrew maybe?). > >> While I generally agree with the principal, I also wonder if it might >> be better to just add this field in log_line_prefix and wait for >> someone to complain about that as other than a theoretical matter. > > I might be working against myself, but I'll complain right now about the > lack of any way to have a header on the CSV logs and that you don't get > to control what fields are logged. That said, I'm not currently using > them either, so my vote doesn't count for much. Of course, I'll also > complain about the lack of any way to get PG to respect the header, > forcing me to do fun things like: > > for file in *results*; do > HEADER=`head -1 $file` > sed -e 's:""::g' < $file | \ > psql -d beac -h sauron -c \ > "\copy my_table ($HEADER) from STDIN with csv header" > done > > on a regular basis. How forcing me to do that rather than asking > someone else to use 'tail -n +2' makes sense is beyond me.. It's not an either/or proposition. We could certainly support header on/off/ignore, with the new extensible COPY syntax. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02/15/2011 11:13 AM, Stephen Frost wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> Well, I guess the other option is to just add it to the format, full >> stop. But as someone pointed out previously, that's not a terribly >> scalable solution, but perhaps it could be judged adequate for this >> particular case. > Think I suggested that at one point. I'm all for doing that on a major > version change like this one, but I think we already had some concerns > about that on this thread (Andrew maybe?). I could live with it for a release if I thought we had a clear path ahead, but I think there are some design issues that we need to think about before we start providing for header lines and variable formats in CSV logs, particularly w.r.t. log rotation etc. So I'm slightly nervous about going ahead with this right now. >> While I generally agree with the principal, I also wonder if it might >> be better to just add this field in log_line_prefix and wait for >> someone to complain about that as other than a theoretical matter. > I might be working against myself, but I'll complain right now about the > lack of any way to have a header on the CSV logs and that you don't get > to control what fields are logged. That said, I'm not currently using > them either, so my vote doesn't count for much. Of course, I'll also > complain about the lack of any way to get PG to respect the header, > forcing me to do fun things like: > > for file in *results*; do > HEADER=`head -1 $file` > sed -e 's:""::g'< $file | \ > psql -d beac -h sauron -c \ > "\copy my_table ($HEADER) from STDIN with csv header" > done > > on a regular basis. How forcing me to do that rather than asking > someone else to use 'tail -n +2' makes sense is beyond me.. > You don't really make your case any better by continuing this argument from years ago. I can tell you from experience that the CSV HEADER feature is distinctly useful as it is. If you want to add a mode that uses the header line as a column list on import, then make that case, and I'll support it in fact, but it's not an alternative to having the header ignored, which is a feature I would vigorously resist removing. (Incidentally, I think it won't be trivial - the COPY code expects to know the columns by the time it opens the file). cheers andrew
* Andrew Dunstan (andrew@dunslane.net) wrote: > On 02/15/2011 11:13 AM, Stephen Frost wrote: > >Think I suggested that at one point. I'm all for doing that on a major > >version change like this one, but I think we already had some concerns > >about that on this thread (Andrew maybe?). > > I could live with it for a release if I thought we had a clear path > ahead, but I think there are some design issues that we need to > think about before we start providing for header lines and variable > formats in CSV logs, particularly w.r.t. log rotation etc. So I'm > slightly nervous about going ahead with this right now. I believe the suggestion that Robert and I were talking about above was to just unilatterally change the CSV log file output format to include current_role. No header lines, no variable output format, etc. I do think we can make header lines and variable output work, if we can get agreement on what the semantics should be. > You don't really make your case any better by continuing this > argument from years ago. I can tell you from experience that the CSV > HEADER feature is distinctly useful as it is. If you want to add a > mode that uses the header line as a column list on import, then make > that case, and I'll support it in fact, but it's not an alternative > to having the header ignored, which is a feature I would vigorously > resist removing. I'm not really interested in removing it. I guess I have a vain hope that with arguing I'll convince someone to take up the mantle of implementing the 'use header' option. :) Not getting much traction though, so I expect I'll work on it this summer. > (Incidentally, I think it won't be trivial - the > COPY code expects to know the columns by the time it opens the > file). Thanks for that insight, I'll take a look at how things work and see if I can come up with a sensible proposal. Thanks, Stephen
On Tue, Feb 15, 2011 at 1:02 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Andrew Dunstan (andrew@dunslane.net) wrote: >> On 02/15/2011 11:13 AM, Stephen Frost wrote: >> >Think I suggested that at one point. I'm all for doing that on a major >> >version change like this one, but I think we already had some concerns >> >about that on this thread (Andrew maybe?). >> >> I could live with it for a release if I thought we had a clear path >> ahead, but I think there are some design issues that we need to >> think about before we start providing for header lines and variable >> formats in CSV logs, particularly w.r.t. log rotation etc. So I'm >> slightly nervous about going ahead with this right now. > > I believe the suggestion that Robert and I were talking about above was > to just unilatterally change the CSV log file output format to include > current_role. No header lines, no variable output format, etc. > > I do think we can make header lines and variable output work, if we can > get agreement on what the semantics should be. I think we're back to not having a consensus on a reasonable way to proceed here. Let's take this up again for 9.2. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 02/16/2011 04:24 PM, Robert Haas wrote: > On Tue, Feb 15, 2011 at 1:02 PM, Stephen Frost<sfrost@snowman.net> wrote: >> * Andrew Dunstan (andrew@dunslane.net) wrote: >>> On 02/15/2011 11:13 AM, Stephen Frost wrote: >>>> Think I suggested that at one point. I'm all for doing that on a major >>>> version change like this one, but I think we already had some concerns >>>> about that on this thread (Andrew maybe?). >>> I could live with it for a release if I thought we had a clear path >>> ahead, but I think there are some design issues that we need to >>> think about before we start providing for header lines and variable >>> formats in CSV logs, particularly w.r.t. log rotation etc. So I'm >>> slightly nervous about going ahead with this right now. >> I believe the suggestion that Robert and I were talking about above was >> to just unilatterally change the CSV log file output format to include >> current_role. No header lines, no variable output format, etc. >> >> I do think we can make header lines and variable output work, if we can >> get agreement on what the semantics should be. > I think we're back to not having a consensus on a reasonable way to > proceed here. Let's take this up again for 9.2. > That's up to you. I can certainly live with what is suggested in Stephen's penultimate para above. cheers andrew
On Wed, Feb 16, 2011 at 5:55 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > On 02/16/2011 04:24 PM, Robert Haas wrote: >> >> On Tue, Feb 15, 2011 at 1:02 PM, Stephen Frost<sfrost@snowman.net> wrote: >>> >>> * Andrew Dunstan (andrew@dunslane.net) wrote: >>>> >>>> On 02/15/2011 11:13 AM, Stephen Frost wrote: >>>>> >>>>> Think I suggested that at one point. I'm all for doing that on a major >>>>> version change like this one, but I think we already had some concerns >>>>> about that on this thread (Andrew maybe?). >>>> >>>> I could live with it for a release if I thought we had a clear path >>>> ahead, but I think there are some design issues that we need to >>>> think about before we start providing for header lines and variable >>>> formats in CSV logs, particularly w.r.t. log rotation etc. So I'm >>>> slightly nervous about going ahead with this right now. >>> >>> I believe the suggestion that Robert and I were talking about above was >>> to just unilatterally change the CSV log file output format to include >>> current_role. No header lines, no variable output format, etc. >>> >>> I do think we can make header lines and variable output work, if we can >>> get agreement on what the semantics should be. >> >> I think we're back to not having a consensus on a reasonable way to >> proceed here. Let's take this up again for 9.2. >> > > That's up to you. I can certainly live with what is suggested in Stephen's > penultimate para above. OK. If no one objects further, Stephen and I will make that happen. Otherwise: he's dead, Jim. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: >>> On Tue, Feb 15, 2011 at 1:02 PM, Stephen Frost<sfrost@snowman.net> �wrote: >>>> I believe the suggestion that Robert and I were talking about above was >>>> to just unilatterally change the CSV log file output format to include >>>> current_role. �No header lines, no variable output format, etc. > OK. If no one objects further, Stephen and I will make that happen. > Otherwise: he's dead, Jim. I can't remember at the moment: have we changed the CSV format in any releases since it was first created? And if so, did anyone complain? If there's precedent showing this isn't going to be a problem for CSV users, I won't object. Otherwise I think that we should try to have just one flag day for them, not two. regards, tom lane
On Wed, Feb 16, 2011 at 7:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>>> On Tue, Feb 15, 2011 at 1:02 PM, Stephen Frost<sfrost@snowman.net> wrote: >>>>> I believe the suggestion that Robert and I were talking about above was >>>>> to just unilatterally change the CSV log file output format to include >>>>> current_role. No header lines, no variable output format, etc. > >> OK. If no one objects further, Stephen and I will make that happen. >> Otherwise: he's dead, Jim. > > I can't remember at the moment: have we changed the CSV format in any > releases since it was first created? And if so, did anyone complain? > > If there's precedent showing this isn't going to be a problem for CSV > users, I won't object. Otherwise I think that we should try to have > just one flag day for them, not two. CSV log files were introduced in 8.3.0 by commit fd801f4faa8e0f00bc314b16549e3d8e8aa1b653. There are several follow-on commits making adjustments, but they all appear to be 8.3-vintage: 230e8962f3a47cae4729ad7c017410d28caf1370 3bf66d6f1c3a8d266c3e6ed939e763a001179faf 77c166ba6cf6fe8f7e9737b7fe1793d886dd5cf8 -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > I can't remember at the moment: have we changed the CSV format in any > releases since it was first created? And if so, did anyone complain? It was changed between 8.4 and 9.0 (application_name was added). I've looked around a bit in the archives w/ google and havn't found a single complaint. Perhaps google is failing me, but it seems this isn't too bad. We've had CSV log output since 8.3, for reference, so it was unchanged 8.3 -> 8.4, then changed between 8.4 -> 9.0. Thanks, Stephen
On 02/16/2011 08:38 PM, Stephen Frost wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> I can't remember at the moment: have we changed the CSV format in any >> releases since it was first created? And if so, did anyone complain? > It was changed between 8.4 and 9.0 (application_name was added). I've > looked around a bit in the archives w/ google and havn't found a single > complaint. Perhaps google is failing me, but it seems this isn't too > bad. We've had CSV log output since 8.3, for reference, so it was > unchanged 8.3 -> 8.4, then changed between 8.4 -> 9.0. > > Frankly, compared with other issues that we've sometimes inflicted on people upgrading, this one strikes me as fairly low grade. cheers andrew
* Robert Haas (robertmhaas@gmail.com) wrote: > CSV log files were introduced in 8.3.0 by commit > fd801f4faa8e0f00bc314b16549e3d8e8aa1b653. There are several follow-on > commits making adjustments, but they all appear to be 8.3-vintage: > > 230e8962f3a47cae4729ad7c017410d28caf1370 > 3bf66d6f1c3a8d266c3e6ed939e763a001179faf > 77c166ba6cf6fe8f7e9737b7fe1793d886dd5cf8 This list appears to miss out on 8217cfbd991856d25d73b0f7afcf43d99f90b653 ..? Thanks, Stephen
On Wed, Feb 16, 2011 at 8:58 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> CSV log files were introduced in 8.3.0 by commit >> fd801f4faa8e0f00bc314b16549e3d8e8aa1b653. There are several follow-on >> commits making adjustments, but they all appear to be 8.3-vintage: >> >> 230e8962f3a47cae4729ad7c017410d28caf1370 >> 3bf66d6f1c3a8d266c3e6ed939e763a001179faf >> 77c166ba6cf6fe8f7e9737b7fe1793d886dd5cf8 > > This list appears to miss out on > 8217cfbd991856d25d73b0f7afcf43d99f90b653 ..? Ah, so it does. Sounds like you win. Have we a patch implementing the sounds-like-its-agreed change, then? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Feb 16, 2011 at 8:58 PM, Stephen Frost <sfrost@snowman.net> wrote: > > This list appears to miss out on > > 8217cfbd991856d25d73b0f7afcf43d99f90b653 ..? > > Ah, so it does. Sounds like you win. Have we a patch implementing > the sounds-like-its-agreed change, then? Working on it and expect to post it tonight. (this is about 10x simpler than the previous patch, hah) Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > Ah, so it does. Sounds like you win. Have we a patch implementing > the sounds-like-its-agreed change, then? Patch attached, rebased to current master. Full git log: Thanks, Stephen commit 47eebe20deb5da56ea6eb413ee80110887790440 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Feb 16 21:42:14 2011 -0500 Add current role to csvlog output This patch adds the current role to the csvlog output. It also slightly changes the user_name column to return the session user, if it's been changed from the login user, instead of the original login user. This is only possible through SET SESSION AUTHORIZATION, which is only allowed for superusers. These changes allow a clear view of what privileges commands are being run as. commit 7456d4fc98e6207b562dd0325dc09bbb1c915ae9 Merge: c1b06c0 9301698 Author: Stephen Frost <sfrost@snowman.net> Date: Wed Feb 16 21:03:59 2011 -0500 Merge branch 'master' of git://git.postgresql.org/git/postgresql into log_role_basic commit c1b06c04af0c886c6ec27917368f3c674227ed2d Author: Stephen Frost <sfrost@snowman.net> Date: Tue Feb 15 10:21:38 2011 -0500 Add %U option to log_line_prefix This patch adds a %U option to log_line_prefix, to allow logging of the current role (previously not possible). Also reworks %u a bit and adds documentation to clarify what each means.
Attachment
On Wed, Feb 16, 2011 at 9:52 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> Ah, so it does. Sounds like you win. Have we a patch implementing >> the sounds-like-its-agreed change, then? > > Patch attached, rebased to current master. Ugg, wait a minute. This not only adds %U; it also changes the behavior of %u, which I don't think we've agreed on. Also, emitting 'none' when not SET ROLE has been done is pretty ugly. I'm back to thinking we need to push this out to 9.2 and take more time to think about this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Ugg, wait a minute. This not only adds %U; it also changes the > behavior of %u, which I don't think we've agreed on. Also, emitting > 'none' when not SET ROLE has been done is pretty ugly. I'm back to > thinking we need to push this out to 9.2 and take more time to think > about this. Yeah, I thought what was supposed to be emitted was the value of current_user, not SQL's weird definition of what SET ROLE means. regards, tom lane
* Robert Haas (robertmhaas@gmail.com) wrote: > Ugg, wait a minute. This not only adds %U; it also changes the > behavior of %u, which I don't think we've agreed on. Also, emitting > 'none' when not SET ROLE has been done is pretty ugly. I'm back to > thinking we need to push this out to 9.2 and take more time to think > about this. As I explained in various commit logs and, as I recall, when I first posted about it, the behavior change for %u could only come about when someone used 'SET SESSION AUTHORIZATION', which requires superuser privileges. It makes more sense to me for 'user_name' to be equivilant to 'SESSION USER', but it's really not that big a deal either way. Guess I had foolishly thought that people were alright with it by lack of any comments on it. :( Does anyone else want to chime in on this? I actually came across that problem because the documentation was poor regarding exactly what that column meant. If we actually want it to be "the name that the user first used to authenticate to the system with", then let's update the documentation accordingly and we can remove those changes. Thanks, Stephen
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > Ugg, wait a minute. This not only adds %U; it also changes the > > behavior of %u, which I don't think we've agreed on. Also, emitting > > 'none' when not SET ROLE has been done is pretty ugly. I'm back to > > thinking we need to push this out to 9.2 and take more time to think > > about this. > > Yeah, I thought what was supposed to be emitted was the value of > current_user, not SQL's weird definition of what SET ROLE means. current_user uses GetUserNameFromId() and goes through the cache lookups to get there. I was using what show_role() returns (which is also what 'show role;' returns). I'd be happy to make it emit an empty string when 'none' is returned though. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Yeah, I thought what was supposed to be emitted was the value of >> current_user, not SQL's weird definition of what SET ROLE means. > current_user uses GetUserNameFromId() and goes through the cache lookups > to get there. I was using what show_role() returns (which is also what > 'show role;' returns). I'd be happy to make it emit an empty string > when 'none' is returned though. Well, that just doesn't seem useful to me in the real world. If I were using this, I would expect it to emit a real user name that matches the currently applied permissions checking. All the time. "show role" does what it does because the SQL standard says so, not because anybody outside the standards committee thinks that's a sane definition. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Well, that just doesn't seem useful to me in the real world. If I were > using this, I would expect it to emit a real user name that matches the > currently applied permissions checking. All the time. I wouldn't have ever thought to use %U w/o %u, to be honest. Unless I'm missing something though, this change would just be emitting what show_session_authorization() returns when show_role() returns 'none'. That's certainly fine by me. > "show role" does > what it does because the SQL standard says so, not because anybody > outside the standards committee thinks that's a sane definition. Guess it actually makes some sense to me. Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > Ugg, wait a minute. This not only adds %U; it also changes the > behavior of %u, which I don't think we've agreed on. Also, emitting > 'none' when not SET ROLE has been done is pretty ugly. I'm back to > thinking we need to push this out to 9.2 and take more time to think > about this. %u, user_name, etc changes reverted. %U now always returns the role currently being used for permissions checks, by using show_session_authorization() when show_role() returns 'none'. Ditto for CSV updates. git log below, re-based patch attached. All regression tests passed, tested with log_line_prefix and csvlog also, all looks good to me. Robert, if you say this has to be punted to 9.2 again, I'm giving up. ;) Thanks, Stephen
Attachment
On Thu, Feb 17, 2011 at 11:45 AM, Stephen Frost <sfrost@snowman.net> wrote: > Robert, if you say this has to be punted to 9.2 again, I'm giving up. ;) Frankly, this patch has already consumed more than its fair share of my attention. Having said that, I've just spent some more time on it. I tightened up both the code and the docs a bit. I fixed log_line_prefix so that it doesn't needlessly compute the value to be used for %U when %U isn't used. I fixed the CSV logging code to do proper escaping. Updated patch attached. It seems there's at least one more thing to worry about here, which is the overhead of this computation when CSV logging is in use. If no SET ROLE or SET SESSION AUTHORIZATION commands are in use, the code will call show_role(), which will return "none". We'll then strcmp() that against "none" and decide to call show_session_authorization(), which will call strtoul() to find the comma separator and then return a pointer to the string that follows it. Now, none of that is enormously expensive, so maybe it's not worth worrying about, but since logging can be a hotspot, I thought I'd mention it and solicit an opinion on whether that's likely to be a problem in practice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Robert, > It seems there's at least one more thing to worry about here, which is > the overhead of this computation when CSV logging is in use. If no > SET ROLE or SET SESSION AUTHORIZATION commands are in use, the code > will call show_role(), which will return "none". We'll then strcmp() > that against "none" and decide to call show_session_authorization(), > which will call strtoul() to find the comma separator and then return > a pointer to the string that follows it. Now, none of that is > enormously expensive, so maybe it's not worth worrying about, but > since logging can be a hotspot, I thought I'd mention it and solicit > an opinion on whether that's likely to be a problem in practice. That seems like enough to need a performance test. No clear ideas here on how we'd measure the overhead of that accurately, though. Suggestions? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Robert Haas <robertmhaas@gmail.com> writes: > It seems there's at least one more thing to worry about here, which is > the overhead of this computation when CSV logging is in use. If no > SET ROLE or SET SESSION AUTHORIZATION commands are in use, the code > will call show_role(), which will return "none". We'll then strcmp() > that against "none" and decide to call show_session_authorization(), > which will call strtoul() to find the comma separator and then return > a pointer to the string that follows it. Now, none of that is > enormously expensive, so maybe it's not worth worrying about, but > since logging can be a hotspot, I thought I'd mention it and solicit > an opinion on whether that's likely to be a problem in practice. Well, in the first place, going through two not-very-related APIs in order to reverse-engineer what miscinit.c already knows is pretty silly (not to mention full of possible bugs). We ought to be looking at the GetUserId state directly. Now you will complain that elog.c mustn't try to map that OID back to string form, which is true. But IIRC, we used to keep the current userid stored in both OID and string form. The string form was removed as unnecessary overhead, but maybe it'd be a good idea to put that back. In short, add a bit of overhead at SetUserId time in order to make this cheap (and accurate) in elog.c. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > It seems there's at least one more thing to worry about here, which is > > the overhead of this computation when CSV logging is in use. If no > > SET ROLE or SET SESSION AUTHORIZATION commands are in use, the code > > will call show_role(), which will return "none". We'll then strcmp() > > that against "none" and decide to call show_session_authorization(), > > which will call strtoul() to find the comma separator and then return > > a pointer to the string that follows it. Now, none of that is > > enormously expensive, so maybe it's not worth worrying about, but > > since logging can be a hotspot, I thought I'd mention it and solicit > > an opinion on whether that's likely to be a problem in practice. > > Well, in the first place, going through two not-very-related APIs in > order to reverse-engineer what miscinit.c already knows is pretty silly > (not to mention full of possible bugs). We ought to be looking at the > GetUserId state directly. GetUserId can end up being set in a number of places though, often in places where we can't fail (SetUserIdAndSecContext has some nice comments on this). > Now you will complain that elog.c mustn't try to map that OID back to > string form, which is true. But IIRC, we used to keep the current > userid stored in both OID and string form. The string form was removed > as unnecessary overhead, but maybe it'd be a good idea to put that back. The OID and the string are kept in the role_string and session_authorization_string GUCs respectively. They're just not in a terribly useful format, and because SetUserId() can change things w/o the GUCs getting updated, there's a risk that they're wrong, which is why show_role() does the stroul() dance to check if GetCurrentRoleId() matches to what it stuffed into role_string. > In short, add a bit of overhead at SetUserId time in order to make this > cheap (and accurate) in elog.c. We can't do the lookup in SetUserIDAndSecContext(), and I'm not convinced we actually want to anyway, since that would end up returning what the role is inside of security definer functions and the like. We're already setting a variable in assign_session_authorization and assign_role that has the information we need. We could inspect role_string ourselves (including the strcmp() and strtoul()) instead of asking show_role() to do it for us but that doesn't strike me as all *that* much of an improvement and goes around the API that at least exists. We could certainly have a second set of variables which are set by assign_role/assign_session_authorization that are in a format we can more easily use but what would that mean for the GUC variables..? I don't know that we'd want to keep them duplicating the data.. Would it be possible to actually use a struct instead of a straight-up string there? Is there any particular reason we keep monkeying around with storing the OID, superuser bit, role name, etc, as a string anyway..? Thanks, Stephen
On Thu, Feb 17, 2011 at 4:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> It seems there's at least one more thing to worry about here, which is >> the overhead of this computation when CSV logging is in use. If no >> SET ROLE or SET SESSION AUTHORIZATION commands are in use, the code >> will call show_role(), which will return "none". We'll then strcmp() >> that against "none" and decide to call show_session_authorization(), >> which will call strtoul() to find the comma separator and then return >> a pointer to the string that follows it. Now, none of that is >> enormously expensive, so maybe it's not worth worrying about, but >> since logging can be a hotspot, I thought I'd mention it and solicit >> an opinion on whether that's likely to be a problem in practice. > > Well, in the first place, going through two not-very-related APIs in > order to reverse-engineer what miscinit.c already knows is pretty silly > (not to mention full of possible bugs). We ought to be looking at the > GetUserId state directly. > > Now you will complain that elog.c mustn't try to map that OID back to > string form, which is true. But IIRC, we used to keep the current > userid stored in both OID and string form. The string form was removed > as unnecessary overhead, but maybe it'd be a good idea to put that back. > > In short, add a bit of overhead at SetUserId time in order to make this > cheap (and accurate) in elog.c. As Stephen says, I think this is utterly impractical; those routines can't ever throw any kind of error. I was mostly wondering whether we ought to create a function show_explicitly_set_role() or somesuch that would do basically the same thing as the proposed code, but try to avoid the strcmp in the common case where no set role has been done. I'm not very certain it's worth worrying about, though. write_csvlog() is not a trivial function as it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Feb 17, 2011 at 4:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In short, add a bit of overhead at SetUserId time in order to make this >> cheap (and accurate) in elog.c. > As Stephen says, I think this is utterly impractical; those routines > can't ever throw any kind of error. Why would they need to throw an error? It'd be on the caller's head to supply the role name along with OID. We can keep the name in a static buffer of size NAMEDATALEN, so don't tell me about palloc failures either. The logging design as it stands seems to me to be a Rube Goldberg device that is probably going to have corner-case bugs quite aside from its possible performance issues. regards, tom lane
On Thu, Feb 17, 2011 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Feb 17, 2011 at 4:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> In short, add a bit of overhead at SetUserId time in order to make this >>> cheap (and accurate) in elog.c. > >> As Stephen says, I think this is utterly impractical; those routines >> can't ever throw any kind of error. > > Why would they need to throw an error? It'd be on the caller's head to > supply the role name along with OID. We can keep the name in a static > buffer of size NAMEDATALEN, so don't tell me about palloc failures > either. OK, but there are not a small number of callers of that function, and they don't necessarily have the correct info at hand. For example, you'd need to add prevUserAsText to TransactionStateData, which doesn't seem appealing. > The logging design as it stands seems to me to be a Rube Goldberg device > that is probably going to have corner-case bugs quite aside from its > possible performance issues. I think you're overreacting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Is this something for the next commit-fest? --------------------------------------------------------------------------- Stephen Frost wrote: -- Start of PGP signed section. > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > > It seems there's at least one more thing to worry about here, which is > > > the overhead of this computation when CSV logging is in use. If no > > > SET ROLE or SET SESSION AUTHORIZATION commands are in use, the code > > > will call show_role(), which will return "none". We'll then strcmp() > > > that against "none" and decide to call show_session_authorization(), > > > which will call strtoul() to find the comma separator and then return > > > a pointer to the string that follows it. Now, none of that is > > > enormously expensive, so maybe it's not worth worrying about, but > > > since logging can be a hotspot, I thought I'd mention it and solicit > > > an opinion on whether that's likely to be a problem in practice. > > > > Well, in the first place, going through two not-very-related APIs in > > order to reverse-engineer what miscinit.c already knows is pretty silly > > (not to mention full of possible bugs). We ought to be looking at the > > GetUserId state directly. > > GetUserId can end up being set in a number of places though, often in > places where we can't fail (SetUserIdAndSecContext has some nice > comments on this). > > > Now you will complain that elog.c mustn't try to map that OID back to > > string form, which is true. But IIRC, we used to keep the current > > userid stored in both OID and string form. The string form was removed > > as unnecessary overhead, but maybe it'd be a good idea to put that back. > > The OID and the string are kept in the role_string and > session_authorization_string GUCs respectively. They're just not in a > terribly useful format, and because SetUserId() can change things w/o > the GUCs getting updated, there's a risk that they're wrong, which is > why show_role() does the stroul() dance to check if GetCurrentRoleId() > matches to what it stuffed into role_string. > > > In short, add a bit of overhead at SetUserId time in order to make this > > cheap (and accurate) in elog.c. > > We can't do the lookup in SetUserIDAndSecContext(), and I'm not > convinced we actually want to anyway, since that would end up returning > what the role is inside of security definer functions and the like. > We're already setting a variable in assign_session_authorization and > assign_role that has the information we need. We could inspect > role_string ourselves (including the strcmp() and strtoul()) instead > of asking show_role() to do it for us but that doesn't strike me as all > *that* much of an improvement and goes around the API that at least > exists. > > We could certainly have a second set of variables which are set by > assign_role/assign_session_authorization that are in a format we can > more easily use but what would that mean for the GUC variables..? I > don't know that we'd want to keep them duplicating the data.. Would it > be possible to actually use a struct instead of a straight-up string > there? Is there any particular reason we keep monkeying around with > storing the OID, superuser bit, role name, etc, as a string anyway..? > > Thanks, > > Stephen -- End of PGP section, PGP failed! -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
* Bruce Momjian (bruce@momjian.us) wrote: > Is this something for the next commit-fest? I already moved it there.. Thanks, Stephen