Thread: explain format json, unit for serialize and memory are different.

hi.

explain(analyze, format json, serialize, memory, costs  off, Timing
off) select * from tenk1;
           QUERY PLAN
---------------------------------
 [
   {
     "Plan": {
       "Node Type": "Seq Scan",
       "Parallel Aware": false,
       "Async Capable": false,
       "Relation Name": "tenk1",
       "Alias": "tenk1",
       "Actual Rows": 10000,
       "Actual Loops": 1
     },
     "Planning": {
       "Memory Used": 23432,
       "Memory Allocated": 65536
     },
     "Planning Time": 0.290,
     "Triggers": [
     ],
     "Serialization": {
       "Output Volume": 1143,
       "Format": "text"
     },
     "Execution Time": 58.814
   }
 ]

explain(analyze, format text, serialize, memory, costs  off, Timing
off) select * from tenk1;
                    QUERY PLAN
---------------------------------------------------
 Seq Scan on tenk1 (actual rows=10000 loops=1)
 Planning:
   Memory: used=23432 bytes  allocated=65536 bytes
 Planning Time: 0.289 ms
 Serialization: output=1143kB  format=text
 Execution Time: 58.904 ms

under format json,  "Output Volume": 1143,
1143 is kiB unit, and is not the same as "Memory Used" or "Memory
Allocated" byte unit.

Do we need to convert it to byte for the non-text format option for EXPLAIN?



Re: explain format json, unit for serialize and memory are different.

From
Daniel Gustafsson
Date:
> On 13 May 2024, at 11:16, jian he <jian.universality@gmail.com> wrote:

> under format json,  "Output Volume": 1143,
> 1143 is kiB unit, and is not the same as "Memory Used" or "Memory
> Allocated" byte unit.

Nice catch.

> Do we need to convert it to byte for the non-text format option for EXPLAIN?

Since json (and yaml/xml) is intended to be machine-readable I think we use a
single unit for all values, and document this fact.

--
Daniel Gustafsson




Re: explain format json, unit for serialize and memory are different.

From
Michael Paquier
Date:
On Mon, May 13, 2024 at 11:22:08AM +0200, Daniel Gustafsson wrote:
> Since json (and yaml/xml) is intended to be machine-readable I think we use a
> single unit for all values, and document this fact.

Agreed with the documentation gap.  Another thing that could be worth
considering is to add the units aside with the numerical values, say:
"Memory Used": {"value": 23432, "units": "bytes"}

That would require changing ExplainProperty() so as the units are
showed in some shape, while still being readable when parsed.  I
wouldn't recommend doing that in v17, but perhaps v18 could do better?

Units are also ignored for the XML and yaml outputs.
--
Michael

Attachment
On Tue, 14 May 2024 at 17:40, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, May 13, 2024 at 11:22:08AM +0200, Daniel Gustafsson wrote:
> > Since json (and yaml/xml) is intended to be machine-readable I think we use a
> > single unit for all values, and document this fact.
>
> Agreed with the documentation gap.  Another thing that could be worth
> considering is to add the units aside with the numerical values, say:
> "Memory Used": {"value": 23432, "units": "bytes"}
>
> That would require changing ExplainProperty() so as the units are
> showed in some shape, while still being readable when parsed.  I
> wouldn't recommend doing that in v17, but perhaps v18 could do better?

I think for v17, we should consider adding a macro to explain.c to
calculate the KB from bytes.  There are other inconsistencies that it
would be good to address. We normally round up to the nearest kilobyte
with (bytes + 1023) / 1024, but if you look at what 06286709e did, it
seems to be rounding to the nearest without any apparent justification
as to why.  It does (metrics->bytesSent + 512) / 1024.

show_memory_counters() could be modified to use the macro and show
kilobytes rather than bytes.

David



On Tue, 14 May 2024 at 18:16, David Rowley <dgrowleyml@gmail.com> wrote:
> I think for v17, we should consider adding a macro to explain.c to
> calculate the KB from bytes.  There are other inconsistencies that it
> would be good to address. We normally round up to the nearest kilobyte
> with (bytes + 1023) / 1024, but if you look at what 06286709e did, it
> seems to be rounding to the nearest without any apparent justification
> as to why.  It does (metrics->bytesSent + 512) / 1024.
>
> show_memory_counters() could be modified to use the macro and show
> kilobytes rather than bytes.

Here's a patch for that.

I checked the EXPLAIN SERIALIZE thread and didn't see any mention of
the + 512 thing. It seems Tom added it just before committing and no
patch ever made it to the mailing list with + 512. The final patch on
the list is in [1].

For the EXPLAIN MEMORY part, the bytes vs kB wasn't discussed.  The
closest the thread came to that was what Abhijit mentioned in [2].

I also adjusted some inconsistencies around spaces between the digits
and kB.  In other places in EXPLAIN we write "100kB" not "100 kB".  I
see we print times with a space ("Execution Time: 1.719 ms"), so we're
not very consistent overall, but since the EXPLAIN MEMORY is new, it
makes sense to change it now to be aligned to the other kB stuff in
explain.c

The patch does change a long into an int64 in show_hash_info(). I
wondered if that part should be backpatched.  It does not seem very
robust to me to divide a Size by 1024 and expect it to fit into a
long. With MSVC 64 bit, sizeof(Size) == 8 and sizeof(long) == 4.  I
understand work_mem is limited to 2GB on that platform, but it does
not seem like a good reason to use a long.

David

[1] https://www.postgresql.org/message-id/CAEze2WhopAFRS4xdNtma6XtYCRqydPWAg83jx8HZTowpeXzOyg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/ZaF1fB_hMqycSq-S%40toroid.org

Attachment
On Tue, May 14, 2024 at 6:33 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 14 May 2024 at 18:16, David Rowley <dgrowleyml@gmail.com> wrote:
> > I think for v17, we should consider adding a macro to explain.c to
> > calculate the KB from bytes.  There are other inconsistencies that it
> > would be good to address. We normally round up to the nearest kilobyte
> > with (bytes + 1023) / 1024, but if you look at what 06286709e did, it
> > seems to be rounding to the nearest without any apparent justification
> > as to why.  It does (metrics->bytesSent + 512) / 1024.
> >
> > show_memory_counters() could be modified to use the macro and show
> > kilobytes rather than bytes.
>
> Here's a patch for that.
>
> I checked the EXPLAIN SERIALIZE thread and didn't see any mention of
> the + 512 thing. It seems Tom added it just before committing and no
> patch ever made it to the mailing list with + 512. The final patch on
> the list is in [1].
>
> For the EXPLAIN MEMORY part, the bytes vs kB wasn't discussed.  The
> closest the thread came to that was what Abhijit mentioned in [2].



static void
show_memory_counters(ExplainState *es, const MemoryContextCounters
*mem_counters)
{
if (es->format == EXPLAIN_FORMAT_TEXT)
{
ExplainIndentText(es);
appendStringInfo(es->str,
"Memory: used=%zukB  allocated=%zukB",
BYTES_TO_KILOBYTES(mem_counters->totalspace - mem_counters->freespace),
BYTES_TO_KILOBYTES(mem_counters->totalspace));
appendStringInfoChar(es->str, '\n');
}
else
{
ExplainPropertyInteger("Memory Used", "bytes",
   mem_counters->totalspace - mem_counters->freespace,
   es);
ExplainPropertyInteger("Memory Allocated", "bytes",
   mem_counters->totalspace, es);
}
}

the "else" branch, also need to apply BYTES_TO_KILOBYTES marco?
otherwise, it's inconsistent?

> I also adjusted some inconsistencies around spaces between the digits
> and kB.  In other places in EXPLAIN we write "100kB" not "100 kB".  I
> see we print times with a space ("Execution Time: 1.719 ms"), so we're
> not very consistent overall, but since the EXPLAIN MEMORY is new, it
> makes sense to change it now to be aligned to the other kB stuff in
> explain.c
>
> The patch does change a long into an int64 in show_hash_info(). I
> wondered if that part should be backpatched.  It does not seem very
> robust to me to divide a Size by 1024 and expect it to fit into a
> long. With MSVC 64 bit, sizeof(Size) == 8 and sizeof(long) == 4.  I
> understand work_mem is limited to 2GB on that platform, but it does
> not seem like a good reason to use a long.
>

I also checked output
from function show_incremental_sort_group_info and show_sort_info,
the "kB" usage is consistent.



On Wed, 15 May 2024 at 01:18, jian he <jian.universality@gmail.com> wrote:
> else
> {
> ExplainPropertyInteger("Memory Used", "bytes",
>    mem_counters->totalspace - mem_counters->freespace,
>    es);
> ExplainPropertyInteger("Memory Allocated", "bytes",
>    mem_counters->totalspace, es);
> }
> }
>
> the "else" branch, also need to apply BYTES_TO_KILOBYTES marco?

Yeah, I missed that. Here's another patch.

Thanks for looking.

David

Attachment

Re: explain format json, unit for serialize and memory are different.

From
Michael Paquier
Date:
On Wed, May 15, 2024 at 02:01:21AM +1200, David Rowley wrote:
> Yeah, I missed that. Here's another patch.
>
> Thanks for looking.

Thanks for bringing in a patch that makes the whole picture more
consistent across the board.  When it comes to MEMORY, I can get
behind your suggestion to use kB and call it a day, while SERIALIZE
would apply the same conversion at 1023b.

It would be nice to document the units implied in the non-text
formats, rather than have the users guess what these are.

Perhaps Alvaro and Tom would like to chime in, as committers of
respectively 5de890e3610d and 06286709ee06?
--
Michael

Attachment
Michael Paquier <michael@paquier.xyz> writes:
> Perhaps Alvaro and Tom would like to chime in, as committers of
> respectively 5de890e3610d and 06286709ee06?

No objection here.  In a green field I might argue for
round-to-nearest instead of round-up, but it looks like we
have several precedents for round-up, so let's avoid changing
that existing behavior.

            regards, tom lane



explain (format json, analyze, wal, buffers, memory, serialize) insert
into tenk1 select * from tenk1 limit 1;
                  QUERY PLAN
-----------------------------------------------
 [
   {
     "Plan": {
       "Node Type": "ModifyTable",
       "Operation": "Insert",
       "Parallel Aware": false,
       "Async Capable": false,
       "Relation Name": "tenk1",
       "Alias": "tenk1",
       "Startup Cost": 0.00,
       "Total Cost": 0.04,
       "Plan Rows": 0,
       "Plan Width": 0,
       "Actual Startup Time": 0.030,
       "Actual Total Time": 0.030,
       "Actual Rows": 0,
       "Actual Loops": 1,
       "Shared Hit Blocks": 3,
       "Shared Read Blocks": 0,
       "Shared Dirtied Blocks": 0,
       "Shared Written Blocks": 0,
       "Local Hit Blocks": 0,
       "Local Read Blocks": 0,
       "Local Dirtied Blocks": 0,
       "Local Written Blocks": 0,
       "Temp Read Blocks": 0,
       "Temp Written Blocks": 0,
       "WAL Records": 1,
       "WAL FPI": 0,
       "WAL Bytes": 299,
       "Plans": [
         {
           "Node Type": "Limit",
           "Parent Relationship": "Outer",
           "Parallel Aware": false,
           "Async Capable": false,
           "Startup Cost": 0.00,
           "Total Cost": 0.04,
           "Plan Rows": 1,
           "Plan Width": 244,
           "Actual Startup Time": 0.011,
           "Actual Total Time": 0.011,
           "Actual Rows": 1,
           "Actual Loops": 1,
           "Shared Hit Blocks": 2,
           "Shared Read Blocks": 0,
           "Shared Dirtied Blocks": 0,
           "Shared Written Blocks": 0,
           "Local Hit Blocks": 0,
           "Local Read Blocks": 0,
           "Local Dirtied Blocks": 0,
           "Local Written Blocks": 0,
           "Temp Read Blocks": 0,
           "Temp Written Blocks": 0,
           "WAL Records": 0,
           "WAL FPI": 0,
           "WAL Bytes": 0,
           "Plans": [
             {
               "Node Type": "Seq Scan",
               "Parent Relationship": "Outer",
               "Parallel Aware": false,
               "Async Capable": false,
               "Relation Name": "tenk1",
               "Alias": "tenk1_1",
               "Startup Cost": 0.00,
               "Total Cost": 445.00,
               "Plan Rows": 10000,
               "Plan Width": 244,
               "Actual Startup Time": 0.009,
               "Actual Total Time": 0.009,
               "Actual Rows": 1,
               "Actual Loops": 1,
               "Shared Hit Blocks": 2,
               "Shared Read Blocks": 0,
               "Shared Dirtied Blocks": 0,
               "Shared Written Blocks": 0,
               "Local Hit Blocks": 0,
               "Local Read Blocks": 0,
               "Local Dirtied Blocks": 0,
               "Local Written Blocks": 0,
               "Temp Read Blocks": 0,
               "Temp Written Blocks": 0,
               "WAL Records": 0,
               "WAL FPI": 0,
               "WAL Bytes": 0
             }
           ]
         }
       ]
     },
     "Planning": {
       "Shared Hit Blocks": 0,
       "Shared Read Blocks": 0,
       "Shared Dirtied Blocks": 0,
       "Shared Written Blocks": 0,
       "Local Hit Blocks": 0,
       "Local Read Blocks": 0,
       "Local Dirtied Blocks": 0,
       "Local Written Blocks": 0,
       "Temp Read Blocks": 0,
       "Temp Written Blocks": 0,
       "Memory Used": 68080,
       "Memory Allocated": 131072
     },
     "Planning Time": 0.659,
     "Triggers": [
     ],
     "Serialization": {
       "Time": 0.000,
       "Output Volume": 0,
       "Format": "text",
       "Shared Hit Blocks": 0,
       "Shared Read Blocks": 0,
       "Shared Dirtied Blocks": 0,
       "Shared Written Blocks": 0,
       "Local Hit Blocks": 0,
       "Local Read Blocks": 0,
       "Local Dirtied Blocks": 0,
       "Local Written Blocks": 0,
       "Temp Read Blocks": 0,
       "Temp Written Blocks": 0
     },
     "Execution Time": 0.065
   }
 ]

-------
       "Shared Hit Blocks": 0,
       "Shared Read Blocks": 0,
       "Shared Dirtied Blocks": 0,
       "Shared Written Blocks": 0,
       "Local Hit Blocks": 0,
       "Local Read Blocks": 0,
       "Local Dirtied Blocks": 0,
       "Local Written Blocks": 0,
       "Temp Read Blocks": 0,
       "Temp Written Blocks": 0

these information duplicated for json key "Serialization" and json key
"Planning"
i am not sure this is intended?



On Wed, 15 May 2024 at 13:44, jian he <jian.universality@gmail.com> wrote:
>        "Shared Hit Blocks": 0,
>        "Shared Read Blocks": 0,
>        "Shared Dirtied Blocks": 0,
>        "Shared Written Blocks": 0,
>        "Local Hit Blocks": 0,
>        "Local Read Blocks": 0,
>        "Local Dirtied Blocks": 0,
>        "Local Written Blocks": 0,
>        "Temp Read Blocks": 0,
>        "Temp Written Blocks": 0
>
> these information duplicated for json key "Serialization" and json key
> "Planning"
> i am not sure this is intended?

Looks ok to me.  Buffers used during planning are independent from the
buffers used when outputting rows to the client.

David



On Wed, May 15, 2024 at 10:13 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 15 May 2024 at 13:44, jian he <jian.universality@gmail.com> wrote:
> >        "Shared Hit Blocks": 0,
> >        "Shared Read Blocks": 0,
> >        "Shared Dirtied Blocks": 0,
> >        "Shared Written Blocks": 0,
> >        "Local Hit Blocks": 0,
> >        "Local Read Blocks": 0,
> >        "Local Dirtied Blocks": 0,
> >        "Local Written Blocks": 0,
> >        "Temp Read Blocks": 0,
> >        "Temp Written Blocks": 0
> >
> > these information duplicated for json key "Serialization" and json key
> > "Planning"
> > i am not sure this is intended?
>
> Looks ok to me.  Buffers used during planning are independent from the
> buffers used when outputting rows to the client.
>

looking at serializeAnalyzeReceive.
I am not sure which part of serializeAnalyzeReceive will update pgBufferUsage.

I am looking for an example where this information under json key
"Serialization" is not zero.
So far I have tried:

create table s(a text);
insert into s select repeat('a', 1024) from generate_series(1,1024);
explain (format json, analyze, wal, buffers, memory, serialize, timing
off)  select * from s;



On Wed, 15 May 2024 at 15:40, jian he <jian.universality@gmail.com> wrote:
> I am looking for an example where this information under json key
> "Serialization" is not zero.
> So far I have tried:

Something that requires detoasting.

> create table s(a text);
> insert into s select repeat('a', 1024) from generate_series(1,1024);
> explain (format json, analyze, wal, buffers, memory, serialize, timing
> off)  select * from s;

Something bigger than 1024 bytes or use SET STORAGE EXTERNAL or EXTENDED.

create table s(a text);
insert into s select repeat('a', 1024*1024) from generate_series(1,10);
explain (format text, analyze, buffers, serialize, timing off)  select * from s;

 Serialization: output=10241kB  format=text
   Buffers: shared hit=36

David



On Wed, 15 May 2024 at 13:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Michael Paquier <michael@paquier.xyz> writes:
> > Perhaps Alvaro and Tom would like to chime in, as committers of
> > respectively 5de890e3610d and 06286709ee06?
>
> No objection here.  In a green field I might argue for
> round-to-nearest instead of round-up, but it looks like we
> have several precedents for round-up, so let's avoid changing
> that existing behavior.

Thanks. I've pushed the patch now.

David