Thread: Parsing libpq PQtrace files
Hi. I'm tracing my client-side connection(s) for the first time.
The format seems to be line oriented.
Selected text columns with embedded newlines have those newlines
encoded as \x0a to remain single-line in the trace. So far so good.
But then on Parse, the query text, which also contains embedded newlines
but also embedded double-quotes, for column aliases, or table names, or whatever,
even why not a single char literal like '"' (i.e. single-quote, double-quote, single-quote),
seems to be double-quoted only. So how am I supposed to know when the SQL ends?
The message length (320 in the sample trace fragment below) does not seems to correspond
to the length in the trace file itself (perhaps was the original length before encoding in the trace?).
So how am I supposed to parse the trace given those multi-line-SQL Parse entries?
Which could have an arbitrary number of embedded double-quotes, supposed to be
the end of the SQL to parse?
I'm probably missing something...
As I suppose to trace format is supposed to be easily machine parseable, no?
I'd appreciate any insights. Thanks, --DD
PS: And on the off chance there's code out there that parses these traces,
if someone knows a public source of similar code, I'd appreciate a link or two.
```
2023-10-17 15:27:36.334749 B 5 ReadyForQuery I
2023-10-17 15:27:36.334881 F 320 Parse "" "
select ..., ... as "name-mismatch", ...
from "Name needing quoting".ztable1 s
full outer join " Name needing quoting".ztable2 m using(col1)
where ...
order by ...
" 0
2023-10-17 15:27:36.334889 F 14 Bind "" "" 0 0 1 1
2023-10-17 15:27:36.334895 F 6 Describe P ""
2023-10-17 15:27:36.334900 F 9 Execute "" 0
2023-10-17 15:27:36.334904 F 4 Sync
2023-10-17 15:27:36.338536 B 4 ParseComplete
2023-10-17 15:27:36.338579 B 4 BindComplete
2023-10-17 15:27:36.338601 B 146 RowDescription 5 "col2" 333038706 9 25 65535 -1 1 "col3" 0 0 25 65535 -1 1 "name-mismatch" 0 0 16 1 -1 1 "col4" 0 0 25 65535 -1 1 "col5" 333038718 2 19 64 -1 1
2023-10-17 15:27:36.338659 B 108 DataRow 5 -1 7 ...
2023-10-17 15:27:36.338690 B 108 DataRow 5 -1 7 ...
2023-10-17 15:27:36.334881 F 320 Parse "" "
select ..., ... as "name-mismatch", ...
from "Name needing quoting".ztable1 s
full outer join " Name needing quoting".ztable2 m using(col1)
where ...
order by ...
" 0
2023-10-17 15:27:36.334889 F 14 Bind "" "" 0 0 1 1
2023-10-17 15:27:36.334895 F 6 Describe P ""
2023-10-17 15:27:36.334900 F 9 Execute "" 0
2023-10-17 15:27:36.334904 F 4 Sync
2023-10-17 15:27:36.338536 B 4 ParseComplete
2023-10-17 15:27:36.338579 B 4 BindComplete
2023-10-17 15:27:36.338601 B 146 RowDescription 5 "col2" 333038706 9 25 65535 -1 1 "col3" 0 0 25 65535 -1 1 "name-mismatch" 0 0 16 1 -1 1 "col4" 0 0 25 65535 -1 1 "col5" 333038718 2 19 64 -1 1
2023-10-17 15:27:36.338659 B 108 DataRow 5 -1 7 ...
2023-10-17 15:27:36.338690 B 108 DataRow 5 -1 7 ...
2023-10-17 14:46:40.576933 B 226 DataRow 5 ... 139 'line1\x0aline2' 1 '\x01' 1 '\x00'
...
...
2023-10-17 15:27:36.338958 B 14 CommandComplete "SELECT 10"
2023-10-17 15:27:36.338971 B 5 ReadyForQuery I
2023-10-17 15:27:36.338971 B 5 ReadyForQuery I
```
On 2023-Oct-17, Dominique Devienne wrote: > But then on Parse, the query text, which also contains embedded > newlines but also embedded double-quotes, for column aliases, or table > names, or whatever, even why not a single char literal like '"' (i.e. > single-quote, double-quote, single-quote), seems to be double-quoted > only. So how am I supposed to know when the SQL ends? Yeah, I've never heard of anybody trying to machine-parse these lines, so if those points are not addressed in the trace output, that probably explains why. It would probably be a good idea to change that. This stuff is relatively new, and most likely has thus far only used by visual inspecting the contents of the trace file (at least that's how *I* have used them), and only with very simple dummy queries. So, patches welcome, I guess. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "En las profundidades de nuestro inconsciente hay una obsesiva necesidad de un universo lógico y coherente. Pero el universo real se halla siempre un paso más allá de la lógica" (Irulan)
On Tue, Oct 17, 2023 at 6:12 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2023-Oct-17, Dominique Devienne wrote:
> But then on Parse, the query text, which also contains embedded
> newlines but also embedded double-quotes, for column aliases, or table
> names, or whatever, even why not a single char literal like '"' (i.e.
> single-quote, double-quote, single-quote), seems to be double-quoted
> only. So how am I supposed to know when the SQL ends?
Yeah, I've never heard of anybody trying to machine-parse these lines,
so if those points are not addressed in the trace output, that probably
explains why. It would probably be a good idea to change that. This
stuff is relatively new, and most likely has thus far only used by
visual inspecting the contents of the trace file (at least that's how
*I* have used them), and only with very simple dummy queries.
Bummer... In fact, I'm not a fan of the whole FILE* based API,
I'd have preferred a callback based API where I decide what to do with
the message, w/ info about which connection the tracing is for, so I can
safely multiplex concurrent connections (from the same process) to the
same file for example. Or log thread-related information too. Or decide
to split high level info from large row-info from a GB-sized COPY, possibly
keeping only the high level one.
The doc doesn't mention where using the same file for several concurrent
connections is safe or not. When things are flushed or not. Whether I can insert
my own lines on that FILE*, w/o interspacing them with the fwrite()s from libpq.
So, patches welcome, I guess.
I often dump trace files to SQLite for analysis, in full or in part.
That's my main reason for wanting to machine parse those files.
I can of course use some heuristics to work-around the multi-line Parse lines,
but I'm not a fan of brittle code, and I'm surprised tracing's output is not more consistent and always line based.
I'm not a LibPQ or PostgreSQL dev I'm afraid, so not likely to provide any.
Merely a user of both. Thanks for the input Alvaro. --DD