Re: How could we make it simple to access the log as a table? - Mailing list pgsql-hackers

From Marc Mamin
Subject Re: How could we make it simple to access the log as a table?
Date
Msg-id C4DAC901169B624F933534A26ED7DF310861B586@JENMAIL01.ad.intershop.net
In response to Re: How could we make it simple to access the log as a table?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
> The  CSV format is both rich and
> machine-parseable (good start!) but it takes an unreasonable amount of
> work to make it usefully queryable.  We need something that looks more
> like a big red button.

Hello,

The Pg logs are a rich soup of many different kinds of information.
Having all of it sequentially available is comfortable for reading, but
not as input for an ETL/reporting tool.

Moreover, there is often a wish to combine many different sources in a
single reporting tool, which may exclude Postgres as the solution in
many cases.

And everybody with more than one Postgres instance would like a single
point of access for all of them.

Having the logs available as a Postgres table is nice, but it is only a
tiny step toward reporting, and it is not that complicated to implement
with the current facilities, except maybe for true real-time monitoring.
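
(For the record, by "current facilities" I mean roughly the csvlog
recipe from the documentation; the column list below is the 9.x one,
and the file path is only an example:)

  CREATE TABLE postgres_log
  (
    log_time timestamp(3) with time zone,
    user_name text,
    database_name text,
    process_id integer,
    connection_from text,
    session_id text,
    session_line_num bigint,
    command_tag text,
    session_start_time timestamp with time zone,
    virtual_transaction_id text,
    transaction_id bigint,
    error_severity text,
    sql_state_code text,
    message text,
    detail text,
    hint text,
    internal_query text,
    internal_query_pos integer,
    context text,
    query text,
    query_pos integer,
    location text,
    application_name text,
    PRIMARY KEY (session_id, session_line_num)
  );

  -- load one rotated csvlog file into the table
  COPY postgres_log FROM '/var/log/pg-myhost_3332-20120530_1200.csv'
  WITH csv;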

I think more users would be satisfied with a utility tool that
transforms the log content into a more ETL-friendly format. Otherwise
every reporting or monitoring project has to implement the "same" data
parsing again and again.

Here are a few loose ideas:

This transformation could take place on the fly (e.g.
log_destination='pg_etl.sh'), or later on somewhere else, using the CSV
output as source.

this "pg_etl" tool could possibly generate more than one csv logs,
e.g.:
- query_durations- checkpoint_logs
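
(A post-hoc sketch of that splitting, assuming the postgres_log table
from above was loaded first; the output paths are only examples, and
server-side COPY ... TO a file needs superuser:)

  -- query_durations.csv: output of log_min_duration_statement
  COPY (SELECT * FROM postgres_log WHERE message LIKE 'duration:%')
  TO '/var/log/etl/query_durations.csv' WITH csv;

  -- checkpoint_logs.csv: checkpoint starting/complete messages
  COPY (SELECT * FROM postgres_log WHERE message LIKE 'checkpoint%')
  TO '/var/log/etl/checkpoint_logs.csv' WITH csv;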

Currently, getting at those kinds of data requires string matching on
the log messages.
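For instance, just to pull the durations out of the message text you
need something like this (the regexps are my assumption about the exact
message wording, again against the postgres_log table from above):

  SELECT log_time,
         user_name,
         database_name,
         -- grab the numeric duration from the message text
         substring(message from 'duration: ([0-9.]+) ms')::numeric
           AS duration_ms,
         -- grab the statement text that follows it
         substring(message from
           'duration: [0-9.]+ ms +(?:statement|execute [^:]*): (.*)')
           AS statement
  FROM postgres_log
  WHERE message LIKE 'duration:%';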
Alternatively, an additional "target" column in the log output would
make sense.

Some enhancements of the current format would also help. Examples:

- a "normed" format of the logged queries
  (http://postgresql.1045698.n5.nabble.com/pg-stat-statements-with-query-tree-based-normalization-td4989745.html)

- placeholders for the host, IP and port in log_filename, e.g.
  log_filename = 'pg-$HOST_$PORT-%Y%m%d_%H%M.log'
  (possibly in the log_line_prefix & CSV content too)

- a constant log name for the active log, with the timestamp added only
  when switching, e.g.
    pg-myhost_3332-20120530_1200.csv
    pg-myhost_3332-20120530_1300.csv
    pg-myhost_3332.csv    <= currently active

best regards,

Marc Mamin


