Re: Detoasting optionally to make Explain-Analyze less misleading - Mailing list pgsql-hackers

From stepan rutz
Subject Re: Detoasting optionally to make Explain-Analyze less misleading
Date
Msg-id 03f952d6-b773-be93-e5d5-8f9ea9c2d461@gmx.de
Whole thread Raw
In response to Re: Detoasting optionally to make Explain-Analyze less misleading  (stepan rutz <stepan.rutz@gmx.de>)
Responses Re: Detoasting optionally to make Explain-Analyze less misleading
List pgsql-hackers
Hi,

please see a revised version yesterday's mail. The patch attached now
provides the following:

EXPLAIN(ANALYZE,SERIALIZE)

and

EXPLAIN(ANALYZE,SERIALIZEBINARY)

and timing output.

Both options perform the serialization during analyze and provide an
additional output in the plan like this:


template1=# explain (analyze,serialize) select * from t12 limit 1;
                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------

  ...

  Serialized Bytes: 36 bytes
  Execution Time: 0.035 ms
(5 rows)

or also this


template1=# explain (analyze,serialize) select * from t1;
                                          QUERY PLAN
---------------------------------------------------------------------------------------------
  Seq Scan on t1  (cost=0.00..1.02 rows=2 width=19) (actual
time=0.101..0.111 rows=5 loops=1)
  Planning Time: 0.850 ms
  Serialized Bytes: 85777978 bytes
  Execution Time: 354.284 ms
(4 rows)


Its tempting to divide Serialized-Bytes by Execution-Time to get an idea
of the serialization bandwidth. This is /dev/null serialization though.
The results are length-counted and then discarded.

Since detoasting happens implicitly during serialization, the number of
bytes becomes huge in this case and accounts for the detoasted lengths
as well. I tried to get the number of bytes send for the protocol's
messages and the attribute headers correctly. For the actual values I am
quite sure I get the correct measures, as one can really tell by sending
more values across. Null is 4 bytes on the wire interestingly. I didn't
know that, but it makes sense, since its using the same prefix
length-field as all values do.

I have checked the JBDC driver and it uses binary and text formats
depending on an attribute's type oid. So having the SERIALIZEBINARY
option is not accurate, as in reality both formats can be occur for the
same tuple.

Please provide some feedback on the new patch and let me know if this
makes sense. In general this kind of option for EXPLAIN is a good thing
for sure.


Greetings,

Stepan


On 14.09.23 21:27, stepan rutz wrote:
> Hi Tom, Hi Matthias,
>
> you are right of course. I have looked at the code from printtup.c and
> made a new version of the patch.
>
> Thanks for the MemoryContextReset hint too (@Matthias)
>
> This time is called  EXPLAIN(ANALYZE,SERIALIZE) (hey, it also sounds
> nicer phonetically)
>
> If the option SERIALIZE is active, the output functions are called and
> they perform the detoasting, which I have even checked.
>
> So things are better this way, however I hardcoded the output option
> "Text" (format=0). In printtup.c there is an incoming array which
> applies Text (format=0) or Binary (format=1) for each column
> individually. I am not sure whether this is even needed. I left in the
> if-statement from printtup.c which calls the binary output method of a
> given type. The result of the output is ignored and apparently free'd
> because of the memory-context-reset at the end.
>
> Please also note, that I added a call to DestReceiver's rDestroy hook,
> which was missing from explain.c before altogether.
>
> Feedback is appreciated.
>
> /Stepan
>
>
> On 12.09.23 17:26, Tom Lane wrote:
>> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
>>> Hmm, maybe we should measure the overhead of serializing the tuples
>>> instead.
>>> The difference between your patch and "serializing the tuples, but not
>>> sending them" is that serializing also does the detoasting, but also
>>> includes any time spent in the serialization functions of the type. So
>>> an option "SERIALIZE" which measures all the time the server spent on
>>> the query (except the final step of sending the bytes to the client)
>>> would likely be more useful than "just" detoasting.
>> +1, that was my immediate reaction to the proposal as well. Some
>> output functions are far from cheap.  Doing only the detoast part
>> seems like it's still misleading.
>>
>> Do we need to go as far as offering both text-output and binary-output
>> options?
>>
>>             regards, tom lane

Attachment

pgsql-hackers by date:

Previous
From: Yurii Rashkovskii
Date:
Subject: ALTER ROLE documentation improvement
Next
From: "David E. Wheeler"
Date:
Subject: Re: JSON Path and GIN Questions