Home > mailing lists

Thread: Parsing libpq PQtrace files

Parsing libpq PQtrace files

From

Dominique Devienne

Date:

17 October 2023, 14:16:14

Hi. I'm tracing my client-side connection(s) for the first time.

The format seems to be line oriented.

Selected text columns with embedded newlines have those newlines

encoded as \x0a to remain single-line in the trace. So far so good.

But then on Parse, the query text, which also contains embedded newlines

but also embedded double-quotes, for column aliases, or table names, or whatever,

even why not a single char literal like '"' (i.e. single-quote, double-quote, single-quote),

seems to be double-quoted only. So how am I supposed to know when the SQL ends?

The message length (320 in the sample trace fragment below) does not seems to correspond

to the length in the trace file itself (perhaps was the original length before encoding in the trace?).

So how am I supposed to parse the trace given those multi-line-SQL Parse entries?

Which could have an arbitrary number of embedded double-quotes, supposed to be

the end of the SQL to parse?

I'm probably missing something...

As I suppose to trace format is supposed to be easily machine parseable, no?

I'd appreciate any insights. Thanks, --DD

PS: And on the off chance there's code out there that parses these traces,

if someone knows a public source of similar code, I'd appreciate a link or two.

```

2023-10-17 15:27:36.334749 B 5 ReadyForQuery I
2023-10-17 15:27:36.334881 F 320 Parse "" "

select ..., ... as "name-mismatch", ...
from "Name needing quoting".ztable1 s
full outer join " Name needing quoting".ztable2 m using(col1)
where ...
order by ...

" 0
2023-10-17 15:27:36.334889 F 14 Bind "" "" 0 0 1 1
2023-10-17 15:27:36.334895 F 6 Describe P ""
2023-10-17 15:27:36.334900 F 9 Execute "" 0
2023-10-17 15:27:36.334904 F 4 Sync
2023-10-17 15:27:36.338536 B 4 ParseComplete
2023-10-17 15:27:36.338579 B 4 BindComplete
2023-10-17 15:27:36.338601 B 146 RowDescription 5 "col2" 333038706 9 25 65535 -1 1 "col3" 0 0 25 65535 -1 1 "name-mismatch" 0 0 16 1 -1 1 "col4" 0 0 25 65535 -1 1 "col5" 333038718 2 19 64 -1 1
2023-10-17 15:27:36.338659 B 108 DataRow 5 -1 7 ...
2023-10-17 15:27:36.338690 B 108 DataRow 5 -1 7 ...

2023-10-17 14:46:40.576933 B 226 DataRow 5 ... 139 'line1\x0aline2' 1 '\x01' 1 '\x00'
...

2023-10-17 15:27:36.338958 B 14 CommandComplete "SELECT 10"
2023-10-17 15:27:36.338971 B 5 ReadyForQuery I

```

Re: Parsing libpq PQtrace files

From

Alvaro Herrera

Date:

17 October 2023, 16:11:56

On 2023-Oct-17, Dominique Devienne wrote:

> But then on Parse, the query text, which also contains embedded
> newlines but also embedded double-quotes, for column aliases, or table
> names, or whatever, even why not a single char literal like '"' (i.e.
> single-quote, double-quote, single-quote), seems to be double-quoted
> only. So how am I supposed to know when the SQL ends?

Yeah, I've never heard of anybody trying to machine-parse these lines,
so if those points are not addressed in the trace output, that probably
explains why.  It would probably be a good idea to change that.  This
stuff is relatively new, and most likely has thus far only used by
visual inspecting the contents of the trace file (at least that's how
*I* have used them), and only with very simple dummy queries.

So, patches welcome, I guess.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"En las profundidades de nuestro inconsciente hay una obsesiva necesidad
de un universo lógico y coherente. Pero el universo real se halla siempre
un paso más allá de la lógica" (Irulan)

Re: Parsing libpq PQtrace files

From

Dominique Devienne

Date:

17 October 2023, 16:31:25

On Tue, Oct 17, 2023 at 6:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

On 2023-Oct-17, Dominique Devienne wrote:

> But then on Parse, the query text, which also contains embedded
> newlines but also embedded double-quotes, for column aliases, or table
> names, or whatever, even why not a single char literal like '"' (i.e.
> single-quote, double-quote, single-quote), seems to be double-quoted
> only. So how am I supposed to know when the SQL ends?

Yeah, I've never heard of anybody trying to machine-parse these lines,
so if those points are not addressed in the trace output, that probably
explains why. It would probably be a good idea to change that. This
stuff is relatively new, and most likely has thus far only used by
visual inspecting the contents of the trace file (at least that's how
*I* have used them), and only with very simple dummy queries.

Bummer... In fact, I'm not a fan of the whole FILE* based API,

I'd have preferred a callback based API where I decide what to do with

the message, w/ info about which connection the tracing is for, so I can

safely multiplex concurrent connections (from the same process) to the

same file for example. Or log thread-related information too. Or decide

to split high level info from large row-info from a GB-sized COPY, possibly

keeping only the high level one.

The doc doesn't mention where using the same file for several concurrent

connections is safe or not. When things are flushed or not. Whether I can insert

my own lines on that FILE*, w/o interspacing them with the fwrite()s from libpq.

So, patches welcome, I guess.

I often dump trace files to SQLite for analysis, in full or in part.

That's my main reason for wanting to machine parse those files.

I can of course use some heuristics to work-around the multi-line Parse lines,

but I'm not a fan of brittle code, and I'm surprised tracing's output is not more consistent and always line based.

I'm not a LibPQ or PostgreSQL dev I'm afraid, so not likely to provide any.

Merely a user of both. Thanks for the input Alvaro. --DD