Re: Detoasting optionally to make Explain-Analyze less misleading - Mailing list pgsql-hackers

From stepan rutz
Subject Re: Detoasting optionally to make Explain-Analyze less misleading
Date
Msg-id 5f7fbfef-5daa-4b95-8c25-feeb861e42db@gmx.de
Whole thread Raw
In response to Re: Detoasting optionally to make Explain-Analyze less misleading  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
Hi Thomas,

indeed by doing less the code also becomes trivial and
ExplainPropertyInteger can be used as a oneliner.

My intention was to actually get the realistic payload-bytes from the
wire-protocol counted by the serialization option. I am also adding the
protocol bits and the length of the data that is generated by
serialization output-functions. So it should (hopefully) be the real
numbers.

Attached is a new revision of the patch that prints kB (floor'ed by
integer-division by 1024). Maybe that is also misleading and bytes would
be nicer (though harder to read).

The output is now as follows:

db1=# explain (analyze, serialize) select * from test;
                                             QUERY PLAN
---------------------------------------------------------------------------------------------------
  Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
time=0.014..0.017 rows=1 loops=1)
  Planning Time: 0.245 ms
  Execution Time: 292.983 ms
  Serialized Bytes: 77039 kB
(4 rows)

Definately a lot nicer and controlled by  ExplainPropertyInteger in
terms of formatting.

The main motivation was to actually get a correct feeling for the
execution time. Actually counting the bytes gives an impression of what
would go over the wire. Only the big numbers matter here of course.

Regards, Stepan



On 02.11.23 20:32, Tomas Vondra wrote:
>
> On 11/2/23 20:09, stepan rutz wrote:
>> Hi Thomas,
>>
>> you are right of course. Thanks!
>>
>> I have attached a new version of the patch that supports the syntax like
>> suggested. The previous patch was insonsistent in style indeed.
>>
>> explain (analyze, serialize)
>>
>> and
>>
>> explain (analyze, serialize binary)
>>
>> That doesn't make too much of a difference for most scenarios I am
>> certain. However the the seralize option itself does. Mostly because it
>> performs the detoasting and that was a trap for me in the past with just
>> plain analyze.
>>
>>
>> Eg this scenario really is not too far fetched in a world where people
>> have large JSONB values.
>>
>>
>> db1=# create table test(id bigint, val text);
>>
>> db1=# insert into test(val) select string_agg(s::text, ',') from (select
>> generate_series(1, 10_000_000) as s) as a1;
>>
>> now we have a cell that has roughly 80Mb in it. A large detoasting that
>> will happen in reallife but in explain(analyze).
>>
>> and then...
>>
>> db1=# explain (analyze) select * from test;
>>                                              QUERY PLAN
>> ---------------------------------------------------------------------------------------------------
>>   Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
>> time=0.018..0.020 rows=1 loops=1)
>>   Planning Time: 0.085 ms
>>   Execution Time: 0.044 ms
>> (3 rows)
>>
>> db1=# explain (analyze, serialize) select * from test;
>>                                              QUERY PLAN
>> ---------------------------------------------------------------------------------------------------
>>   Seq Scan on test  (cost=0.00..22.00 rows=1200 width=40) (actual
>> time=0.023..0.027 rows=1 loops=1)
>>   Planning Time: 0.077 ms
>>   Execution Time: 303.281 ms
>>   Serialized Bytes: 78888953 Bytes. Mode Text. Bandwidth 248.068 MB/sec
>> (4 rows)
>>
>> db1=#
>>
>> So the explain(analyze) does not process the ~80 MB in 0.044ms in any
>> way of course.
> Honestly, I see absolutely no point in printing this. I have little idea
> what to do with the "bytes". We have to transfer this over network, but
> surely there's other data not included in this sum, right?
>
> But the bandwidth seems pretty bogus/useless, as it's calculated from
> execution time, which includes everything else, not just serialization.
> So what does it say? It certainly does not include the network transfer.
>
> IMO we should either print nothing or just the bytes. I don't see the
> point in printing the mode, which is determined by the command.
>
>> Actually I could print the serialized bytes using 1. grouping-separators
>> (eg 78_888_953) or 2. in the way pg_size_pretty does it.
>>
>> If doing it the pg_size_pretty way I am uncertain if it would be ok to
>> query the actual pg_size_pretty function via its (certainly frozen) oid
>> of 3166 and do OidFunctionCall1(3166...) to invoke it. Otherwise I'd say
>> it would be nice if the code from that function would be made available
>> as a utility function for all c-code.  Any suggestions on this topic?
>>
> I'm rather skeptical about this proposal, mostly because it makes it
> harder to process the explain output in scripts etc.
>
> But more importantly, it's a completely separate matter from what this
> patch does, so if you want to pursue this, I suggest you start a
> separate thread. If you want to introduce separators, surely this is not
> the only place that should do it (e.g. why not to do that for "rows" or
> "cost" estimates)?
>
> BTW if you really want to print amount of memory, maybe print it in
> kilobytes, like every other place in explain.c? Also, explain generally
> prints stuff in "key: value" style (in text format).
>
>
> regards
>

Attachment

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Detoasting optionally to make Explain-Analyze less misleading
Next
From: Matthias van de Meent
Date:
Subject: Re: Detoasting optionally to make Explain-Analyze less misleading