Re: Detoasting optionally to make Explain-Analyze less misleading - Mailing list pgsql-hackers
From | stepan rutz |
---|---|
Subject | Re: Detoasting optionally to make Explain-Analyze less misleading |
Date | |
Msg-id | 5f7fbfef-5daa-4b95-8c25-feeb861e42db@gmx.de Whole thread Raw |
In response to | Re: Detoasting optionally to make Explain-Analyze less misleading (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
List | pgsql-hackers |
Hi Thomas, indeed by doing less the code also becomes trivial and ExplainPropertyInteger can be used as a oneliner. My intention was to actually get the realistic payload-bytes from the wire-protocol counted by the serialization option. I am also adding the protocol bits and the length of the data that is generated by serialization output-functions. So it should (hopefully) be the real numbers. Attached is a new revision of the patch that prints kB (floor'ed by integer-division by 1024). Maybe that is also misleading and bytes would be nicer (though harder to read). The output is now as follows: db1=# explain (analyze, serialize) select * from test; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on test (cost=0.00..22.00 rows=1200 width=40) (actual time=0.014..0.017 rows=1 loops=1) Planning Time: 0.245 ms Execution Time: 292.983 ms Serialized Bytes: 77039 kB (4 rows) Definately a lot nicer and controlled by ExplainPropertyInteger in terms of formatting. The main motivation was to actually get a correct feeling for the execution time. Actually counting the bytes gives an impression of what would go over the wire. Only the big numbers matter here of course. Regards, Stepan On 02.11.23 20:32, Tomas Vondra wrote: > > On 11/2/23 20:09, stepan rutz wrote: >> Hi Thomas, >> >> you are right of course. Thanks! >> >> I have attached a new version of the patch that supports the syntax like >> suggested. The previous patch was insonsistent in style indeed. >> >> explain (analyze, serialize) >> >> and >> >> explain (analyze, serialize binary) >> >> That doesn't make too much of a difference for most scenarios I am >> certain. However the the seralize option itself does. Mostly because it >> performs the detoasting and that was a trap for me in the past with just >> plain analyze. >> >> >> Eg this scenario really is not too far fetched in a world where people >> have large JSONB values. >> >> >> db1=# create table test(id bigint, val text); >> >> db1=# insert into test(val) select string_agg(s::text, ',') from (select >> generate_series(1, 10_000_000) as s) as a1; >> >> now we have a cell that has roughly 80Mb in it. A large detoasting that >> will happen in reallife but in explain(analyze). >> >> and then... >> >> db1=# explain (analyze) select * from test; >> QUERY PLAN >> --------------------------------------------------------------------------------------------------- >> Seq Scan on test (cost=0.00..22.00 rows=1200 width=40) (actual >> time=0.018..0.020 rows=1 loops=1) >> Planning Time: 0.085 ms >> Execution Time: 0.044 ms >> (3 rows) >> >> db1=# explain (analyze, serialize) select * from test; >> QUERY PLAN >> --------------------------------------------------------------------------------------------------- >> Seq Scan on test (cost=0.00..22.00 rows=1200 width=40) (actual >> time=0.023..0.027 rows=1 loops=1) >> Planning Time: 0.077 ms >> Execution Time: 303.281 ms >> Serialized Bytes: 78888953 Bytes. Mode Text. Bandwidth 248.068 MB/sec >> (4 rows) >> >> db1=# >> >> So the explain(analyze) does not process the ~80 MB in 0.044ms in any >> way of course. > Honestly, I see absolutely no point in printing this. I have little idea > what to do with the "bytes". We have to transfer this over network, but > surely there's other data not included in this sum, right? > > But the bandwidth seems pretty bogus/useless, as it's calculated from > execution time, which includes everything else, not just serialization. > So what does it say? It certainly does not include the network transfer. > > IMO we should either print nothing or just the bytes. I don't see the > point in printing the mode, which is determined by the command. > >> Actually I could print the serialized bytes using 1. grouping-separators >> (eg 78_888_953) or 2. in the way pg_size_pretty does it. >> >> If doing it the pg_size_pretty way I am uncertain if it would be ok to >> query the actual pg_size_pretty function via its (certainly frozen) oid >> of 3166 and do OidFunctionCall1(3166...) to invoke it. Otherwise I'd say >> it would be nice if the code from that function would be made available >> as a utility function for all c-code. Any suggestions on this topic? >> > I'm rather skeptical about this proposal, mostly because it makes it > harder to process the explain output in scripts etc. > > But more importantly, it's a completely separate matter from what this > patch does, so if you want to pursue this, I suggest you start a > separate thread. If you want to introduce separators, surely this is not > the only place that should do it (e.g. why not to do that for "rows" or > "cost" estimates)? > > BTW if you really want to print amount of memory, maybe print it in > kilobytes, like every other place in explain.c? Also, explain generally > prints stuff in "key: value" style (in text format). > > > regards >
Attachment
pgsql-hackers by date: