Thread: Help in Parsing PG log usings CSV format

Help in Parsing PG log usings CSV format

From
Arvind Singh
Date:
Help needed in parsing PostgreSQL CSV Log
Hello friends,
I am working an a section of application which needs to Parse CSV Logs generated by PostgreSql server.
- The Logs are stored C:\Program Files\PostgreSQL\9.0\data\pg_log
- The Server version in 9.0.4
- The application is developed in C Sharp
    * The basic utility after Parse the Log is to show contents in a DataGridView.
    * There are other filter options like to view log contents for a particular range of Time for a Day.
 
**However the main problem that is, the Log format is not readable**
 
 
A Sample Log data line
> 2012-03-21 11:59:20.640 IST,"postgres","stock_apals",3276,"localhost:1639",4f697540.ccc,10,"idle",2012-03-21 11:59:20 IST,2/163,0,LOG,00000,"statement: SELECT version()",,,,,,,,"exec_simple_query, .\src\backend\tcop\postgres.c:900",""
 
As you can see the columns in the Log are comma separated , But however individual values  are not Quote Enclosed.
 
For instance the 1st,4rth,6th .. columns
 
**Is there a setting in PostgreSQL configuration to place quotes around all columns in a Logfili
 
 
I just want to update the columns so that all are within Quotes
 
what happens wrong is when it reaches the column where sql statement is place. it also has commas set for table columns. The log line is a mix bunch of quote-enclosed and non-quote-enclosed column. is there is a configuration or utility to convert the non-quoted column to quoted column
 
PS : the copy utility or any other utility cannot be used , as we have to parse the log within a C Sharp application

Thanks for any advice and help

Re: Help in Parsing PG log usings CSV format

From
Martin Gregorie
Date:
On Thu, 2012-03-22 at 09:32 +0000, Arvind Singh wrote:
> Help needed in parsing PostgreSQL CSV Log
> Hello friends,
> I am working an a section of application which needs to Parse CSV Logs generated by PostgreSql server.
> - The Logs are stored C:\Program Files\PostgreSQL\9.0\data\pg_log
> - The Server version in 9.0.4
> - The application is developed in C Sharp
>     * The basic utility after Parse the Log is to show contents in a DataGridView.
>     * There are other filter options like to view log contents for a particular range of Time for a Day.
>
> **However the main problem that is, the Log format is not readable**
>
>
> A Sample Log data line
> > 2012-03-21 11:59:20.640 IST,"postgres","stock_apals",3276,"localhost:1639",4f697540.ccc,10,"idle",2012-03-21
11:59:20IST,2/163,0,LOG,00000,"statement: SELECT version()",,,,,,,,"exec_simple_query,
.\src\backend\tcop\postgres.c:900",""
>
> As you can see the columns in the Log are comma separated , But however individual values  are not Quote Enclosed.
>
> For instance the 1st,4rth,6th .. columns
>
>
Thats fairly standard. A good CSV parser only requires a field to be
quoted if it contains commas or quotes. In the latter case the internal
quotes should be doubled, i.e the three fields in the following:

  unquoted field,"contains commas, etc.","Fred said ""Cool!""."

should be handled correctly by a decent CSV parser.


Martin




Re: Help in Parsing PG log usings CSV format

From
"Albe Laurenz"
Date:
Arvind Singh wrote:
> Help needed in parsing PostgreSQL CSV Log

[...]

> **However the main problem that is, the Log format is not readable**

> A Sample Log data line
> > 2012-03-21 11:59:20.640
>
IST,"postgres","stock_apals",3276,"localhost:1639",4f697540.ccc,10,"idle
",2012-03-21 11:59:20
> IST,2/163,0,LOG,00000,"statement: SELECT
version()",,,,,,,,"exec_simple_query,
> .\src\backend\tcop\postgres.c:900",""
>
> As you can see the columns in the Log are comma separated , But
however individual values  are not
> Quote Enclosed.
>
> For instance the 1st,4rth,6th .. columns
>
> **Is there a setting in PostgreSQL configuration to place quotes
around all columns in a Logfili
>
> I just want to update the columns so that all are within Quotes
>
> what happens wrong is when it reaches the column where sql statement
is place. it also has commas set
> for table columns. The log line is a mix bunch of quote-enclosed and
non-quote-enclosed column. is
> there is a configuration or utility to convert the non-quoted column
to quoted column

The columns that are not quoted are guaranteed not to contain a comma.
So it shouldn't be a problem to parse them.

In fact, it is quite easy. As an example, see here:
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgreplay/pgreplay/parse.c?re
v=1.14&content-type=text/x-cvsweb-markup
In the function parse_csvlog_entry, after the comment
"read next line after start timestamp from log file"
you can find code that parses such a line.

The code is in C, so it should be easy to port it to C#, which
is essentially Java, which has C-like syntax.

Yours,
Laurenz Albe

Re: Parsing PG log usings CSV format

From
Arvind Singh
Date:
Thank you sir,
 
i have sorted out the problem on
" The columns that are not quoted are guaranteed not to contain a comma. "
 
But i have another query, the structure of PG Log CSV as mentioned in manual and as below has 24 columns
http://www.postgresql.org/docs/9.1/static/runtime-config-logging.html
-------------------------------------------
log_time timestamp,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num bigint,
  command_tag text,
  session_start_time timestamp,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text,
  PRIMARY KEY (session_id, session_line_num)

-------------------------------------------  
However every Log line added contains only 22 columns, a sample log line is given below
-------------------------------------------

2012-03-22 15:29:53.546 IST,"postgres","stock_apals",2396,"localhost:2766",4f6af819.95c,9,"SELECT",2012-03-22 15:29:53 IST,3/0,0,LOG,00000,"QUERY STATISTICS","! system usage stats:
! 0.015000 elapsed 0.000000 user 0.015625 system sec
! [0.078125 user 0.062500 sys total]",,,,,"SELECT SUBSTR(itemname, 1, 15) as Product,
avg(rate) as Avgrate
FROM   issue_stock WHERE extract(year from issue_stock.edate) = '2011'
GROUP BY itemname
order by itemname",,"ShowUsage, .\src\backend\tcop\postgres.c:4305",""
-------------------------------------------
 
> the last column of the Log is not the Primarykey ?
> the last column as of yet is unknown because at all occurances it is a empty quoted column ?
> the column numbers dont match with the generated log
 
Is this is a different Log format , can you guide us to a page where the column specifications can be matched.
 
Yrs truly
arvind pal singh
 
> Subject: RE: [GENERAL] Help in Parsing PG log usings CSV format
> Date: Thu, 22 Mar 2012 16:47:11 +0100
> From: laurenz.albe@wien.gv.at
> To: arvindps@hotmail.com; pgsql-general@postgresql.org
>
> Arvind Singh wrote:
> > Help needed in parsing PostgreSQL CSV Log
>
> [...]
>
> > **However the main problem that is, the Log format is not readable**
>
> > A Sample Log data line
> > > 2012-03-21 11:59:20.640
> >
> IST,"postgres","stock_apals",3276,"localhost:1639",4f697540.ccc,10,"idle
> ",2012-03-21 11:59:20
> > IST,2/163,0,LOG,00000,"statement: SELECT
> version()",,,,,,,,"exec_simple_query,
> > .\src\backend\tcop\postgres.c:900",""
> >
> > As you can see the columns in the Log are comma separated , But
> however individual values are not
> > Quote Enclosed.
> >
> > For instance the 1st,4rth,6th .. columns
> >
> > **Is there a setting in PostgreSQL configuration to place quotes
> around all columns in a Logfili
> >
> > I just want to update the columns so that all are within Quotes
> >
> > what happens wrong is when it reaches the column where sql statement
> is place. it also has commas set
> > for table columns. The log line is a mix bunch of quote-enclosed and
> non-quote-enclosed column. is
> > there is a configuration or utility to convert the non-quoted column
> to quoted column
>
> The columns that are not quoted are guaranteed not to contain a comma.
> So it shouldn't be a problem to parse them.
>
> In fact, it is quite easy. As an example, see here:
> http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgreplay/pgreplay/parse.c?re
> v=1.14&content-type=text/x-cvsweb-markup
> In the function parse_csvlog_entry, after the comment
> "read next line after start timestamp from log file"
> you can find code that parses such a line.
>
> The code is in C, so it should be easy to port it to C#, which
> is essentially Java, which has C-like syntax.
>
> Yours,
> Laurenz Albe

Re: Parsing PG log usings CSV format

From
"Albe Laurenz"
Date:
Arvind Singh wrote:
> i have sorted out the problem on
> " The columns that are not quoted are guaranteed not to contain a
comma. "

Good.

> But i have another query, the structure of PG Log CSV as mentioned in
manual and as below has 24
> columns
> http://www.postgresql.org/docs/9.1/static/runtime-config-logging.html
> -------------------------------------------
>
>
>         log_time timestamp,
>           user_name text,
>           database_name text,
>           process_id integer,
>           connection_from text,
>           session_id text,
>           session_line_num bigint,
>           command_tag text,
>           session_start_time timestamp,
>           virtual_transaction_id text,
>           transaction_id bigint,
>           error_severity text,
>           sql_state_code text,
>           message text,
>           detail text,
>           hint text,
>           internal_query text,
>           internal_query_pos integer,
>           context text,
>           query text,
>           query_pos integer,
>           location text,
>           application_name text,
>           PRIMARY KEY (session_id, session_line_num)


Actually, that's only 23 columns. The last line is a constraint.

> However every Log line added contains only 22 columns, a sample log
line is given below
> -------------------------------------------
>
>     2012-03-22 15:29:53.546
>
IST,"postgres","stock_apals",2396,"localhost:2766",4f6af819.95c,9,"SELEC
T",2012-03-22 15:29:53
> IST,3/0,0,LOG,00000,"QUERY STATISTICS","! system usage stats:
>     ! 0.015000 elapsed 0.000000 user 0.015625 system sec
>     ! [0.078125 user 0.062500 sys total]",,,,,"SELECT
SUBSTR(itemname, 1, 15) as Product,
>     avg(rate) as Avgrate
>     FROM   issue_stock WHERE extract(year from issue_stock.edate) =
'2011'
>     GROUP BY itemname
>     order by itemname",,"ShowUsage,
.\src\backend\tcop\postgres.c:4305",""
>
>
> -------------------------------------------
>
> > the last column of the Log is not the Primarykey ?
> > the last column as of yet is unknown because at all occurances it is
a empty quoted column ?
> > the column numbers dont match with the generated log
>
> Is this is a different Log format , can you guide us to a page where
the column specifications can be
> matched.

You must be using PostgreSQL 8.4 or earlier.
The last column, "application_name", was added in 9.0.
Look up the documentation for your version and you'll find it.

Yours,
Laurenz Albe

Re: Help in Parsing PG log usings CSV format

From
Jasen Betts
Date:
On 2012-03-22, Martin Gregorie <martin@gregorie.org> wrote:

> Thats fairly standard. A good CSV parser only requires a field to be
> quoted if it contains commas or quotes.

quotes,commas, or linebreaks

copy ( values (2,'comma, etc'),(3,'and "quote".'),(1,'line
break') ) to stdout with csv;

--
⚂⚃ 100% natural