Thread: Similar to csvlog but not really, json logs?

Similar to csvlog but not really, json logs?

From: Michael Paquier
Hi all,

As mentioned here, we support multiple logging formats:
http://www.postgresql.org/docs/devel/static/runtime-config-logging.html
Now what about a json logging format, with one json object per log entry?

A single json entry would need more space than a csv one, as we need to
track the field names with their values. Also, there is always the
argument that if an application needs json-format logs, it could use
csvlog on the Postgres side and do the transformation itself. But wouldn't
it be a win for applications or tools if such an option were available
in-core?
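
For concreteness, here is a purely illustrative example of what one such
entry might look like (field names borrowed from the existing csvlog
columns; not a proposed schema):

{"log_time": "2014-08-27 10:45:00.123 JST", "user_name": "postgres",
 "database_name": "postgres", "process_id": 12345,
 "error_severity": "LOG", "message": "checkpoint starting: time"}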

Note that I am not planning to work on that in the near future, but it
would be a good TODO item for beginners if it is worth pursuing.

Thoughts?

Regards,
-- 
Michael



Re: Similar to csvlog but not really, json logs?

From: Peter Geoghegan
On Tue, Aug 26, 2014 at 6:45 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Thoughts?

I think that it would be a good beginner's project to make pprint()
print JSON. I spend enough time staring at its output that I've often
wished I could expand and collapse each part using my text editor's
folds feature. I guess I could teach my editor to do that, but that
seems like the wrong way to fix the problem. I might also want to
store them as jsonb, for particularly complicated debugging. The
existing infrastructure is user visible because of GUCs like
debug_print_parse.

-- 
Peter Geoghegan



Re: Similar to csvlog but not really, json logs?

From: Tom Lane
Michael Paquier <michael.paquier@gmail.com> writes:
> Now what about a json logging format, with one json object per log entry?

> A single json entry would need more space than a csv one, as we need to
> track the field names with their values. Also, there is always the
> argument that if an application needs json-format logs, it could use
> csvlog on the Postgres side and do the transformation itself. But wouldn't
> it be a win for applications or tools if such an option were available
> in-core?

I think the extra representational overhead is already a good reason to
say "no".  There is not any production scenario I've ever heard of where
log output volume isn't a consideration.
        regards, tom lane



Re: Similar to csvlog but not really, json logs?

From: Tom Lane
Peter Geoghegan <pg@heroku.com> writes:
> I think that it would be a good beginner's project to make pprint()
> print JSON.

There's something to be said for that (or, really, for any standardized
widely-popular textual data format; but JSON is a perfectly reasonable
candidate).

> The existing infrastructure is user visible because of GUCs like
> debug_print_parse.

There is that :-(.  The fact that these strings are stored in the catalogs
isn't a problem as long as we make the change in a major version upgrade.
But to the extent that there is client-side code that expects to make
sense of the strings, it could be a problem.  Is there any such code?
If so, are we really beholden to not break it?  It's not like we don't
change those data representations routinely anyway ...
        regards, tom lane



Re: Similar to csvlog but not really, json logs?

From: Stephen Frost
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
> > Now what about a json logging format, with one json object per log entry?
>
> > A single json entry would need more space than a csv one, as we need to
> > track the field names with their values. Also, there is always the
> > argument that if an application needs json-format logs, it could use
> > csvlog on the Postgres side and do the transformation itself. But wouldn't
> > it be a win for applications or tools if such an option were available
> > in-core?
>
> I think the extra representational overhead is already a good reason to
> say "no".  There is not any production scenario I've ever heard of where
> log output volume isn't a consideration.

The flip side is that there are absolutely production cases where what
we output is either too little or too much- being able to control that
and then have the (filtered) result in JSON would be more-or-less
exactly what a client of ours is looking for.

To try to clarify that a bit, as it comes across as rather opaque even
on my re-reading, consider a case where you can't have the
"credit_card_number" field ever exported to an audit or log file, but
you're required to log all other changes to a table.  Then consider that
such a situation extends to individual INSERT or UPDATE commands- you
need the command logged, but you can't have the contents of that column
in the log file.

Our current capabilities around logging and auditing are dismal and
extremely frustrating when faced with these kinds of, quite real,
requirements.  I'll be in an internal meeting more-or-less all day
tomorrow discussing auditing and how we might make things easier for
organizations which have these requirements- would certainly welcome any
thoughts in that direction.
Thanks,
    Stephen

Re: Similar to csvlog but not really, json logs?

From: Peter Geoghegan
On Tue, Aug 26, 2014 at 7:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think that it would be a good beginner's project to make pprint()
>> print JSON.
>
> There's something to be said for that (or, really, for any standardized
> widely-popular textual data format; but JSON is a perfectly reasonable
> candidate).

Yeah. The structures that the optimizer produces in particular are
very intricate. It would be quite nice to have a way to manipulate
them mechanically.

>> The existing infrastructure is user visible because of GUCs like
>> debug_print_parse.
>
> There is that :-(.  The fact that these strings are stored in the catalogs
> isn't a problem as long as we make the change in a major version upgrade.
> But to the extent that there is client-side code that expects to make
> sense of the strings, it could be a problem.  Is there any such code?
> If so, are we really beholden to not break it?  It's not like we don't
> change those data representations routinely anyway ...

I highly doubt that there is any such code in the wild. It seems to
only be intended for debugging user-defined rules, which are not a
popular feature. Personally, I've occasionally used tricks like
diffing two files with similar Nodes to see where and how differences
that are of interest arise.  :-)

-- 
Peter Geoghegan



Re: Similar to csvlog but not really, json logs?

From: Tom Lane
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> I think the extra representational overhead is already a good reason to
>> say "no".  There is not any production scenario I've ever heard of where
>> log output volume isn't a consideration.

> The flip side is that there are absolutely production cases where what
> we output is either too little or too much- being able to control that
> and then have the (filtered) result in JSON would be more-or-less
> exactly what a client of ours is looking for.

> To try to clarify that a bit, as it comes across as rather opaque even
> on my re-reading, consider a case where you can't have the
> "credit_card_number" field ever exported to an audit or log file, but
> you're required to log all other changes to a table.  Then consider that
> such a situation extends to individual INSERT or UPDATE commands- you
> need the command logged, but you can't have the contents of that column
> in the log file.

Hmm ... that's a lovely use-case, but somehow I don't find "let's output
in JSON" to be a credible solution.  Not only are you paying extra output
log volume to do that, but you are also supposing that some downstream
process is going to parse the JSON, remove some data (which JSON-izing
the log format didn't especially help with, be honest), and re-emit JSON
afterwards.  There's no way that that scales to any production situation,
even if you had parallel log-filtering processes which you won't.

I am interested in thinking about your scenario; I just don't think
that JSON output format is any real part of the answer.
        regards, tom lane



Re: Similar to csvlog but not really, json logs?

From: Alvaro Herrera
Stephen Frost wrote:

> The flip side is that there are absolutely production cases where what
> we output is either too little or too much- being able to control that
> and then have the (filtered) result in JSON would be more-or-less
> exactly what a client of ours is looking for.

My impression is that the JSON fields are going to be more or less
equivalent to the current csvlog columns (what else could they be?).  So
if you can control what you give your auditors by filtering by
individual JSON attributes, surely you could count columns in the
hardcoded CSV definition we use for csvlog just as well.

> To try to clarify that a bit, as it comes across as rather opaque even
> on my re-reading, consider a case where you can't have the
> "credit_card_number" field ever exported to an audit or log file, but
> you're required to log all other changes to a table.  Then consider that
> such a situation extends to individual INSERT or UPDATE commands- you
> need the command logged, but you can't have the contents of that column
> in the log file.

It seems a bit far-fetched to think that you will be able to rip out
parts of queries by applying JSON operators to the query text.  Perhaps
your intention is to log queries using something similar to the JSON
blobs I'm using in the DDL deparse patch?

> Our current capabilities around logging and auditing are dismal and
> extremely frustrating when faced with these kinds of, quite real,
> requirements.  I'll be in an internal meeting more-or-less all day
> tomorrow discussing auditing and how we might make things easier for
> organizations which have these requirements- would certainly welcome any
> thoughts in that direction.

My own thought is: JSON is good, but sadly it doesn't cure cancer.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Similar to csvlog but not really, json logs?

From: Stephen Frost
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > The flip side is that there are absolutely production cases where what
> > we output is either too little or too much- being able to control that
> > and then have the (filtered) result in JSON would be more-or-less
> > exactly what a client of ours is looking for.
>
> > To try to clarify that a bit, as it comes across as rather opaque even
> > on my re-reading, consider a case where you can't have the
> > "credit_card_number" field ever exported to an audit or log file, but
> > you're required to log all other changes to a table.  Then consider that
> > such a situation extends to individual INSERT or UPDATE commands- you
> > need the command logged, but you can't have the contents of that column
> > in the log file.
>
> Hmm ... that's a lovely use-case, but somehow I don't find "let's output
> in JSON" to be a credible solution.  Not only are you paying extra output
> log volume to do that, but you are also supposing that some downstream
> process is going to parse the JSON, remove some data (which JSON-izing
> the log format didn't especially help with, be honest), and re-emit JSON
> afterwards.  There's no way that that scales to any production situation,
> even if you had parallel log-filtering processes which you won't.

That's not what I was trying to suggest, though I understand how I
wasn't clear.  There's no downstream process to parse and filter the
JSON, in an ideal situation.

Consider an audit system where which columns end up in the audit log are
controlled by issuing ALTER TABLE .. ALTER COLUMN type statements.
Where JSON plays into this is that we can no longer depend on the table
structure to tell us what the columns in an audit log are, so that
information needs to be included- and JSON happens to do that
conveniently.
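
(Purely as a hypothetical illustration of that point: an audit entry for an
UPDATE with the credit_card_number column filtered out might carry its own
column names, e.g.

  {"action": "UPDATE", "relation": "public.customers",
   "columns": {"id": 42, "name": "Alice", "city": "Boston"}}

so whatever consumes the log never needs the table definition to make sense
of it.)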

As for parallel logging- that's a different subject but one which a lot
of people are very interested in (I expect Nasby would be happy to
comment on that subject, as he and I discussed it a number of years ago
at PGCon, since their systems are one of those where you simply can't
turn on query logging or the system will essentially stop).

> I am interested in thinking about your scenario; I just don't think
> that JSON output format is any real part of the answer.

I'm certainly open to other suggestions.  JSON was suggested by the
client as a possible approach, but better ideas are always welcome.
Thanks,
    Stephen

Re: Similar to csvlog but not really, json logs?

From: Stephen Frost
* Alvaro Herrera (alvherre@2ndquadrant.com) wrote:
> Stephen Frost wrote:
>
> > The flip side is that there are absolutely production cases where what
> > we output is either too little or too much- being able to control that
> > and then have the (filtered) result in JSON would be more-or-less
> > exactly what a client of ours is looking for.
>
> My impression is that the JSON fields are going to be more or less
> equivalent to the current csvlog columns (what else could they be?).  So
> if you can control what you give your auditors by filtering by
> individual JSON attributes, surely you could count columns in the
> hardcoded CSV definition we use for csvlog just as well.

I don't want to invent a CSV and SQL parser to address this
requirement.  That'd be pretty horrible.

> > To try to clarify that a bit, as it comes across as rather opaque even
> > on my re-reading, consider a case where you can't have the
> > "credit_card_number" field ever exported to an audit or log file, but
> > you're required to log all other changes to a table.  Then consider that
> > such a situation extends to individual INSERT or UPDATE commands- you
> > need the command logged, but you can't have the contents of that column
> > in the log file.
>
> It seems a bit far-fetched to think that you will be able to rip out
> parts of queries by applying JSON operators to the query text.  Perhaps
> your intention is to log queries using something similar to the JSON
> blobs I'm using in the DDL deparse patch?

Right- we need to pass the queries through a normalization structure
which can then consider what's supposed to be sent on to the log file-
ideally that would happen on a per-backend basis, allowing the filtering
to be parallelized.  It's quite a bit more than what we've currently got
going on, which is more-or-less "dump the string we were sent",
certainly.

> My own thought is: JSON is good, but sadly it doesn't cure cancer.

Unfortunately, straw-man arguments don't either. ;)
Thanks!
    Stephen

Re: Similar to csvlog but not really, json logs?

From: Tom Lane
Stephen Frost <sfrost@snowman.net> writes:
> Consider an audit system where which columns end up in the audit log are
> controlled by issuing ALTER TABLE .. ALTER COLUMN type statements.

<blink>

I'd like to consider such a thing, but it seems like utter pie in the
sky.  Do you really believe that elog() could know enough about what it's
printing to apply such a filter?  Do you think elog() should be allowed
to do catalog accesses in order to find out what the filter conditions
should be (hint: no)?  Perhaps you think that we don't ever need to emit
error messages before we've analyzed a query enough to figure out which
tables are involved?  Let alone which columns?  Let alone which literals
elsewhere in the query string might be somehow associated with those
columns?

I suggest that you should spend most of your meeting tomorrow tamping down
hard on the expectations of whoever you're speaking with.
        regards, tom lane



Re: Similar to csvlog but not really, json logs?

From: Tom Lane
I wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> Consider an audit system where which columns end up in the audit log are
>> controlled by issuing ALTER TABLE .. ALTER COLUMN type statements.

> I'd like to consider such a thing, but it seems like utter pie in the
> sky.

On further thought: the existing postmaster log is primarily an error log,
and for the reasons I mentioned, it seems useless to imagine filtering it
on security-based grounds.  You can either route it to /dev/null or
restrict viewing to suitably privileged people; there is no middle ground
with any usefulness.

However: for some values of "audit" it seems like an audit log could
consist of reports of changes actually applied to the database.  If that's
your definition then it's surely possible to imagine omitting particular
columns (or rows) from the output, because we've already eliminated all
the cases where the system couldn't make sense of the input it was fed.
So I think Alvaro was right to suggest that maybe the logical-decoding
subsystem, rather than the existing logging subsystem, is where to look
for solutions.  You could do logical decoding of changes and emit a
filtered version of that to some output that's independent of the current
postmaster logging arrangements.
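
(Very roughly, and only as a sketch: in the 9.4 output-plugin API the
plugin's change callback sees each row change together with its Relation, so
it can decide per column what to emit.  Something like the following, which
omits the required begin/commit/startup callbacks, plugin registration, and
all JSON escaping, and hard-codes a hypothetical "credit_card_number" filter:

    /* Needs postgres.h, replication/output_plugin.h, replication/logical.h,
     * access/htup_details.h, utils/rel.h, utils/lsyscache.h. */
    static void
    audit_change_cb(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
                    Relation relation, ReorderBufferChange *change)
    {
        TupleDesc   desc = RelationGetDescr(relation);
        HeapTuple   tuple;
        int         i;

        /* This sketch only looks at new tuples from INSERT/UPDATE. */
        if (change->action != REORDER_BUFFER_CHANGE_INSERT &&
            change->action != REORDER_BUFFER_CHANGE_UPDATE)
            return;
        if (change->data.tp.newtuple == NULL)
            return;
        tuple = &change->data.tp.newtuple->tuple;

        OutputPluginPrepareWrite(ctx, true);
        appendStringInfo(ctx->out, "{\"relation\": \"%s\", \"columns\": {",
                         RelationGetRelationName(relation));

        for (i = 0; i < desc->natts; i++)
        {
            Form_pg_attribute attr = desc->attrs[i];
            Oid         typoutput;
            bool        typisvarlena;
            Datum       value;
            bool        isnull;

            if (attr->attisdropped)
                continue;
            /* Hypothetical policy: never log this column. */
            if (strcmp(NameStr(attr->attname), "credit_card_number") == 0)
                continue;

            value = heap_getattr(tuple, attr->attnum, desc, &isnull);
            getTypeOutputInfo(attr->atttypid, &typoutput, &typisvarlena);
            /* Real code would escape the value and handle toasted data. */
            appendStringInfo(ctx->out, "\"%s\": \"%s\", ",
                             NameStr(attr->attname),
                             isnull ? "null"
                                    : OidOutputFunctionCall(typoutput, value));
        }
        appendStringInfoString(ctx->out, "}}");
        OutputPluginWrite(ctx, true);
    }

That obviously glosses over a great deal, but it shows where a column-level
decision could live.)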
        regards, tom lane



Re: Similar to csvlog but not really, json logs?

From: Stephen Frost
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Consider an audit system where which columns end up in the audit log are
> > controlled by issuing ALTER TABLE .. ALTER COLUMN type statements.
>
> <blink>
>
> I'd like to consider such a thing, but it seems like utter pie in the
> sky.  Do you really believe that elog() could know enough about what it's
> printing to apply such a filter?  Do you think elog() should be allowed
> to do catalog accesses in order to find out what the filter conditions
> should be (hint: no)?  Perhaps you think that we don't ever need to emit
> error messages before we've analyzed a query enough to figure out which
> tables are involved?  Let alone which columns?  Let alone which literals
> elsewhere in the query string might be somehow associated with those
> columns?

I wasn't intending to suggest that this would be handled by elog()
directly, no.

I agree that we're going to need to separate auditing and filtering from
elog.  Perhaps that means it was the wrong context in which to bring up
these questions about alternative log output formats, but I'd like to
avoid having a *completely* independent system from the logging
infrastructure, if possible.

The pgaudit system is an example of how this could be done independently
but there are some serious limitations there.

For the elog piece, we'd mainly need to figure out how to get it to log
useful information when there are *serious* problems (think PANIC and
the like) but not to leak information (do I care that someone typo'd the
INSERT command and produced an ERROR?  no..  or if I do, could I at
least have it just log "ERROR: INSERT (parse error)" or something?).
It's certainly never going to have the smarts to figure out that
characters 16-32 of this malformed INSERT statement shouldn't be logged.
Thanks,
    Stephen

Re: Similar to csvlog but not really, json logs?

From: Stephen Frost
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I wrote:
> > Stephen Frost <sfrost@snowman.net> writes:
> >> Consider an audit system where which columns end up in the audit log are
> >> controlled by issuing ALTER TABLE .. ALTER COLUMN type statements.
>
> > I'd like to consider such a thing, but it seems like utter pie in the
> > sky.
>
> On further thought: the existing postmaster log is primarily an error log,
> and for the reasons I mentioned, it seems useless to imagine filtering it
> on security-based grounds.  You can either route it to /dev/null or
> restrict viewing to suitably privileged people; there is no middle ground
> with any usefulness.

Well, see my other email about possibly making it useful while also not
leaking information.  If that's not possible (and that may well be the
case), then so be it and it'll be addressed by suitably privileged
individuals, but I've not quite given up on it entirely yet.  Still, I
agree that it's not quite the same as auditing.

> However: for some values of "audit" it seems like an audit log could
> consist of reports of changes actually applied to the database.  If that's
> your definition then it's surely possible to imagine omitting particular
> columns (or rows) from the output, because we've already eliminated all
> the cases where the system couldn't make sense of the input it was fed.
> So I think Alvaro was right to suggest that maybe the logical-decoding
> subsystem, rather than the existing logging subsystem, is where to look
> for solutions.  You could do logical decoding of changes and emit a
> filtered version of that to some output that's independent of the current
> postmaster logging arrangements.

That's definitely one avenue we've looked at- but it also has certain
limitations (can you figure out which role on which connection caused
the record in the logical decoding system?).  I believe Andreas has
ideas about how to improve that situation and that may end up being the
correct answer, though it does leave unaddressed a wide area of "we
parsed the command- knew exactly what you intended to do, but we didn't
allow you to do it" type of scenarios (eg: permission denied) that
auditors are often very interested in.

Also, just to clarify, we're certainly not making promises that these
things will suddenly appear- tomorrow's internal meeting is primarily
with a certain Greg Smith to discuss the requirements and try to flesh
out ideas to propose to the community and get feedback on.
Thanks!
    Stephen

Re: Similar to csvlog but not really, json logs?

From: Fujii Masao
On Wed, Aug 27, 2014 at 12:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Stephen Frost <sfrost@snowman.net> writes:
>>> Consider an audit system where which columns end up in the audit log are
>>> controlled by issuing ALTER TABLE .. ALTER COLUMN type statements.
>
>> I'd like to consider such a thing, but it seems like utter pie in the
>> sky.
>
> On further thought: the existing postmaster log is primarily an error log,
> and for the reasons I mentioned, it seems useless to imagine filtering it
> on security-based grounds.  You can either route it to /dev/null or
> restrict viewing to suitably privileged people; there is no middle ground
> with any usefulness.
>
> However: for some values of "audit" it seems like an audit log could
> consist of reports of changes actually applied to the database.  If that's
> your definition then it's surely possible to imagine omitting particular
> columns (or rows) from the output, because we've already eliminated all
> the cases where the system couldn't make sense of the input it was fed.
> So I think Alvaro was right to suggest that maybe the logical-decoding
> subsystem, rather than the existing logging subsystem, is where to look
> for solutions.  You could do logical decoding of changes and emit a
> filtered version of that to some output that's independent of the current
> postmaster logging arrangements.

There are cases where read queries, which generate no WAL, need to be audited
(e.g., a user might want to audit any query accessing the table which
contains credit card numbers). So I doubt that logical decoding is really a
good answer for auditing.

Regards,

-- 
Fujii Masao



Re: Similar to csvlog but not really, json logs?

From: "Greg Sabino Mullane"

Stephen Frost wrote:

> To try to clarify that a bit, as it comes across as rather opaque even
> on my re-reading, consider a case where you can't have the
> "credit_card_number" field ever exported to an audit or log file, but
> you're required to log all other changes to a table.  Then consider that
> such a situation extends to individual INSERT or UPDATE commands- you
> need the command logged, but you can't have the contents of that column
> in the log file.

Perhaps you need a better example. Storing raw credit cards in the database 
is a bad idea (and a potential PCI violation); audit/log files are only one
of the many ways things can leak out. Encrypting sensitive columns is a 
solution that solves your auditing problem, and works on all current versions 
of Postgres. :)

> Our current capabilities around logging and auditing are dismal

No arguments there.

-- 
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201408271200
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: Similar to csvlog but not really, json logs?

From: Jim Nasby
On 8/26/14, 8:45 PM, Michael Paquier wrote:
> Hi all,
>
> As mentioned here, we support multiple logging formats:
> http://www.postgresql.org/docs/devel/static/runtime-config-logging.html
> Now what about a json logging format, with one json object per log entry?
>
> A single json entry would need more space than a csv one, as we need to
> track the field names with their values. Also, there is always the
> argument that if an application needs json-format logs, it could use
> csvlog on the Postgres side and do the transformation itself. But wouldn't
> it be a win for applications or tools if such an option were available
> in-core?
>
> Note that I am not planning to work on that in the near future, but it
> would be a good TODO item for beginners if it is worth pursuing.

Perhaps instead of doing this in-core it would be better to make log
handling more extensible? I'm thinking we could add a specific "binary"
format and an external tool that can parse that and do whatever the user
wants with it. That means we don't have to keep adding more complexity to
the internal log handling (which already has the risk of being a
bottleneck), while allowing maximum user flexibility.

-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



Re: Similar to csvlog but not really, json logs?

From: Andres Freund
On 2014-08-26 23:04:48 -0500, Jim Nasby wrote:
> On 8/26/14, 8:45 PM, Michael Paquier wrote:
> >Hi all,
> >
> >As mentioned here, we support multiple logging formats:
> >http://www.postgresql.org/docs/devel/static/runtime-config-logging.html
> >Now what about a json logging format, with one json object per log entry?
> >
> >A single json entry would need more space than a csv one, as we need to
> >track the field names with their values. Also, there is always the
> >argument that if an application needs json-format logs, it could use
> >csvlog on the Postgres side and do the transformation itself. But wouldn't
> >it be a win for applications or tools if such an option were available
> >in-core?
> >
> >Note that I am not planning to work on that in the near future, but it
> >would be a good TODO item for beginners if it is worth pursuing.
> 
> Perhaps instead of doing this in-core it would be better to make log
> handling more extensible? I'm thinking we could add a specific "binary"
> format and an external tool that can parse that and do whatever the user
> wants with it. That means we don't have to keep adding more complexity to
> the internal log handling (which already has the risk of being a
> bottleneck), while allowing maximum user flexibility.

There's a logging hook. Most of this should be doable from there.
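
For reference, the hook is emit_log_hook, declared in utils/elog.h; an
extension assigns it from _PG_init(). A minimal, illustrative sketch (no JSON
escaping, no configuration, and writing to stderr just to keep it short):

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/elog.h"

    PG_MODULE_MAGIC;

    void _PG_init(void);

    static emit_log_hook_type prev_emit_log_hook = NULL;

    static void
    json_emit_log(ErrorData *edata)
    {
        /* One json object per entry; a real module would escape strings
         * and pick a sensible destination. */
        fprintf(stderr,
                "{\"severity\": %d, \"sqlstate\": \"%s\", \"message\": \"%s\"}\n",
                edata->elevel,
                unpack_sql_state(edata->sqlerrcode),
                edata->message ? edata->message : "");

        /* Chain to any previously installed hook. */
        if (prev_emit_log_hook)
            prev_emit_log_hook(edata);
    }

    void
    _PG_init(void)
    {
        prev_emit_log_hook = emit_log_hook;
        emit_log_hook = json_emit_log;
    }

ErrorData also carries detail, hint, context and so on, so mapping it onto
csvlog-style fields from there is mostly mechanical.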

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Similar to csvlog but not really, json logs?

From: Petr Jelinek
On 27/08/14 18:53, Andres Freund wrote:
> On 2014-08-26 23:04:48 -0500, Jim Nasby wrote:
>> On 8/26/14, 8:45 PM, Michael Paquier wrote:
>>> Hi all,
>>>
>>> As mentioned here, we support multiple logging formats:
>>> http://www.postgresql.org/docs/devel/static/runtime-config-logging.html
>>> Now what about a json logging format, with one json object per log entry?
>>>
>>> A single json entry would need more space than a csv one, as we need to
>>> track the field names with their values. Also, there is always the
>>> argument that if an application needs json-format logs, it could use
>>> csvlog on the Postgres side and do the transformation itself. But wouldn't
>>> it be a win for applications or tools if such an option were available
>>> in-core?
>>>
>>> Note that I am not planning to work on that in the near future, but it
>>> would be a good TODO item for beginners if it is worth pursuing.
>>
>> Perhaps instead of doing this in-core it would be better to make log
>> handling more extensible? I'm thinking we could add a specific "binary"
>> format and an external tool that can parse that and do whatever the user
>> wants with it. That means we don't have to keep adding more complexity to
>> the internal log handling (which already has the risk of being a
>> bottleneck), while allowing maximum user flexibility.
>
> There's a logging hook. Most of this should be doable from there.
>

Yes, as demonstrated by https://github.com/mpihlak/pg_logforward


-- 
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Similar to csvlog but not really, json logs?

From: Josh Berkus
On 08/27/2014 09:53 AM, Andres Freund wrote:
>> > Perhaps instead of doing this in-core it would be better to make log
>> > handling more extensible? I'm thinking we could add a specific "binary"
>> > format and an external tool that can parse that and do whatever the user
>> > wants with it. That means we don't have to keep adding more complexity to
>> > the internal log handling (which already has the risk of being a
>> > bottleneck), while allowing maximum user flexibility.

> There's a logging hook. Most of this should be doable from there.

Are there any docs at all on the logging hooks?  I couldn't find any.
Maybe that's why people keep trying to reinvent them.

The main reason I personally would like to have JSON logs has more to do
with formatting and extracting data.  For example, if you have a
prepared statement and log_min_duration_statement turned on, you get
something like this in the "message" and "details" fields:

"duration: 8.253 ms  execute <unnamed>: SELECT login FROM users WHERE id
= $1, "parameters: $1 = '90700'"

... and then for various analytics, like pgBadger and troubleshooting,
you have to break out the parts of those fields by regex and split,
which is error-prone; in our query replay tool, I have at least a dozen
commits tweaking the regexes because of some special case I didn't
account for (like an array parameter).

It would be vastly easier to work with the above as JSON:

...
"message" : {
    "duration" : 8.253,
    "command" : "execute",
    "statement_name" : "<unnamed>",
    "statement" : "SELECT login FROM users WHERE id = $1"
},
"details" : { "parameters" : { "$1" : 90700 } }
...

This would allow me, or Dalibo, to remove literally dozens of lines of
error-prone regexing code.

That fix would, IMHO, make it worth enabling JSON logging as a logging
hook or something similar.  If we're just going to convert the CSV to
JSON, with the existing fields?  Waste of time, I can do that with a
5-line script in at least 3 different languages.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com