Thread: explain format json, unit for serialize and memory are different.
hi. explain(analyze, format json, serialize, memory, costs off, Timing off) select * from tenk1; QUERY PLAN --------------------------------- [ { "Plan": { "Node Type": "Seq Scan", "Parallel Aware": false, "Async Capable": false, "Relation Name": "tenk1", "Alias": "tenk1", "Actual Rows": 10000, "Actual Loops": 1 }, "Planning": { "Memory Used": 23432, "Memory Allocated": 65536 }, "Planning Time": 0.290, "Triggers": [ ], "Serialization": { "Output Volume": 1143, "Format": "text" }, "Execution Time": 58.814 } ] explain(analyze, format text, serialize, memory, costs off, Timing off) select * from tenk1; QUERY PLAN --------------------------------------------------- Seq Scan on tenk1 (actual rows=10000 loops=1) Planning: Memory: used=23432 bytes allocated=65536 bytes Planning Time: 0.289 ms Serialization: output=1143kB format=text Execution Time: 58.904 ms under format json, "Output Volume": 1143, 1143 is kiB unit, and is not the same as "Memory Used" or "Memory Allocated" byte unit. Do we need to convert it to byte for the non-text format option for EXPLAIN?
> On 13 May 2024, at 11:16, jian he <jian.universality@gmail.com> wrote: > under format json, "Output Volume": 1143, > 1143 is kiB unit, and is not the same as "Memory Used" or "Memory > Allocated" byte unit. Nice catch. > Do we need to convert it to byte for the non-text format option for EXPLAIN? Since json (and yaml/xml) is intended to be machine-readable I think we use a single unit for all values, and document this fact. -- Daniel Gustafsson
On Mon, May 13, 2024 at 11:22:08AM +0200, Daniel Gustafsson wrote: > Since json (and yaml/xml) is intended to be machine-readable I think we use a > single unit for all values, and document this fact. Agreed with the documentation gap. Another thing that could be worth considering is to add the units aside with the numerical values, say: "Memory Used": {"value": 23432, "units": "bytes"} That would require changing ExplainProperty() so as the units are showed in some shape, while still being readable when parsed. I wouldn't recommend doing that in v17, but perhaps v18 could do better? Units are also ignored for the XML and yaml outputs. -- Michael
Attachment
On Tue, 14 May 2024 at 17:40, Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, May 13, 2024 at 11:22:08AM +0200, Daniel Gustafsson wrote: > > Since json (and yaml/xml) is intended to be machine-readable I think we use a > > single unit for all values, and document this fact. > > Agreed with the documentation gap. Another thing that could be worth > considering is to add the units aside with the numerical values, say: > "Memory Used": {"value": 23432, "units": "bytes"} > > That would require changing ExplainProperty() so as the units are > showed in some shape, while still being readable when parsed. I > wouldn't recommend doing that in v17, but perhaps v18 could do better? I think for v17, we should consider adding a macro to explain.c to calculate the KB from bytes. There are other inconsistencies that it would be good to address. We normally round up to the nearest kilobyte with (bytes + 1023) / 1024, but if you look at what 06286709e did, it seems to be rounding to the nearest without any apparent justification as to why. It does (metrics->bytesSent + 512) / 1024. show_memory_counters() could be modified to use the macro and show kilobytes rather than bytes. David
On Tue, 14 May 2024 at 18:16, David Rowley <dgrowleyml@gmail.com> wrote: > I think for v17, we should consider adding a macro to explain.c to > calculate the KB from bytes. There are other inconsistencies that it > would be good to address. We normally round up to the nearest kilobyte > with (bytes + 1023) / 1024, but if you look at what 06286709e did, it > seems to be rounding to the nearest without any apparent justification > as to why. It does (metrics->bytesSent + 512) / 1024. > > show_memory_counters() could be modified to use the macro and show > kilobytes rather than bytes. Here's a patch for that. I checked the EXPLAIN SERIALIZE thread and didn't see any mention of the + 512 thing. It seems Tom added it just before committing and no patch ever made it to the mailing list with + 512. The final patch on the list is in [1]. For the EXPLAIN MEMORY part, the bytes vs kB wasn't discussed. The closest the thread came to that was what Abhijit mentioned in [2]. I also adjusted some inconsistencies around spaces between the digits and kB. In other places in EXPLAIN we write "100kB" not "100 kB". I see we print times with a space ("Execution Time: 1.719 ms"), so we're not very consistent overall, but since the EXPLAIN MEMORY is new, it makes sense to change it now to be aligned to the other kB stuff in explain.c The patch does change a long into an int64 in show_hash_info(). I wondered if that part should be backpatched. It does not seem very robust to me to divide a Size by 1024 and expect it to fit into a long. With MSVC 64 bit, sizeof(Size) == 8 and sizeof(long) == 4. I understand work_mem is limited to 2GB on that platform, but it does not seem like a good reason to use a long. David [1] https://www.postgresql.org/message-id/CAEze2WhopAFRS4xdNtma6XtYCRqydPWAg83jx8HZTowpeXzOyg%40mail.gmail.com [2] https://www.postgresql.org/message-id/ZaF1fB_hMqycSq-S%40toroid.org
Attachment
On Tue, May 14, 2024 at 6:33 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Tue, 14 May 2024 at 18:16, David Rowley <dgrowleyml@gmail.com> wrote: > > I think for v17, we should consider adding a macro to explain.c to > > calculate the KB from bytes. There are other inconsistencies that it > > would be good to address. We normally round up to the nearest kilobyte > > with (bytes + 1023) / 1024, but if you look at what 06286709e did, it > > seems to be rounding to the nearest without any apparent justification > > as to why. It does (metrics->bytesSent + 512) / 1024. > > > > show_memory_counters() could be modified to use the macro and show > > kilobytes rather than bytes. > > Here's a patch for that. > > I checked the EXPLAIN SERIALIZE thread and didn't see any mention of > the + 512 thing. It seems Tom added it just before committing and no > patch ever made it to the mailing list with + 512. The final patch on > the list is in [1]. > > For the EXPLAIN MEMORY part, the bytes vs kB wasn't discussed. The > closest the thread came to that was what Abhijit mentioned in [2]. static void show_memory_counters(ExplainState *es, const MemoryContextCounters *mem_counters) { if (es->format == EXPLAIN_FORMAT_TEXT) { ExplainIndentText(es); appendStringInfo(es->str, "Memory: used=%zukB allocated=%zukB", BYTES_TO_KILOBYTES(mem_counters->totalspace - mem_counters->freespace), BYTES_TO_KILOBYTES(mem_counters->totalspace)); appendStringInfoChar(es->str, '\n'); } else { ExplainPropertyInteger("Memory Used", "bytes", mem_counters->totalspace - mem_counters->freespace, es); ExplainPropertyInteger("Memory Allocated", "bytes", mem_counters->totalspace, es); } } the "else" branch, also need to apply BYTES_TO_KILOBYTES marco? otherwise, it's inconsistent? > I also adjusted some inconsistencies around spaces between the digits > and kB. In other places in EXPLAIN we write "100kB" not "100 kB". I > see we print times with a space ("Execution Time: 1.719 ms"), so we're > not very consistent overall, but since the EXPLAIN MEMORY is new, it > makes sense to change it now to be aligned to the other kB stuff in > explain.c > > The patch does change a long into an int64 in show_hash_info(). I > wondered if that part should be backpatched. It does not seem very > robust to me to divide a Size by 1024 and expect it to fit into a > long. With MSVC 64 bit, sizeof(Size) == 8 and sizeof(long) == 4. I > understand work_mem is limited to 2GB on that platform, but it does > not seem like a good reason to use a long. > I also checked output from function show_incremental_sort_group_info and show_sort_info, the "kB" usage is consistent.
On Wed, 15 May 2024 at 01:18, jian he <jian.universality@gmail.com> wrote: > else > { > ExplainPropertyInteger("Memory Used", "bytes", > mem_counters->totalspace - mem_counters->freespace, > es); > ExplainPropertyInteger("Memory Allocated", "bytes", > mem_counters->totalspace, es); > } > } > > the "else" branch, also need to apply BYTES_TO_KILOBYTES marco? Yeah, I missed that. Here's another patch. Thanks for looking. David
Attachment
On Wed, May 15, 2024 at 02:01:21AM +1200, David Rowley wrote: > Yeah, I missed that. Here's another patch. > > Thanks for looking. Thanks for bringing in a patch that makes the whole picture more consistent across the board. When it comes to MEMORY, I can get behind your suggestion to use kB and call it a day, while SERIALIZE would apply the same conversion at 1023b. It would be nice to document the units implied in the non-text formats, rather than have the users guess what these are. Perhaps Alvaro and Tom would like to chime in, as committers of respectively 5de890e3610d and 06286709ee06? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > Perhaps Alvaro and Tom would like to chime in, as committers of > respectively 5de890e3610d and 06286709ee06? No objection here. In a green field I might argue for round-to-nearest instead of round-up, but it looks like we have several precedents for round-up, so let's avoid changing that existing behavior. regards, tom lane
explain (format json, analyze, wal, buffers, memory, serialize) insert into tenk1 select * from tenk1 limit 1; QUERY PLAN ----------------------------------------------- [ { "Plan": { "Node Type": "ModifyTable", "Operation": "Insert", "Parallel Aware": false, "Async Capable": false, "Relation Name": "tenk1", "Alias": "tenk1", "Startup Cost": 0.00, "Total Cost": 0.04, "Plan Rows": 0, "Plan Width": 0, "Actual Startup Time": 0.030, "Actual Total Time": 0.030, "Actual Rows": 0, "Actual Loops": 1, "Shared Hit Blocks": 3, "Shared Read Blocks": 0, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0, "WAL Records": 1, "WAL FPI": 0, "WAL Bytes": 299, "Plans": [ { "Node Type": "Limit", "Parent Relationship": "Outer", "Parallel Aware": false, "Async Capable": false, "Startup Cost": 0.00, "Total Cost": 0.04, "Plan Rows": 1, "Plan Width": 244, "Actual Startup Time": 0.011, "Actual Total Time": 0.011, "Actual Rows": 1, "Actual Loops": 1, "Shared Hit Blocks": 2, "Shared Read Blocks": 0, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0, "WAL Records": 0, "WAL FPI": 0, "WAL Bytes": 0, "Plans": [ { "Node Type": "Seq Scan", "Parent Relationship": "Outer", "Parallel Aware": false, "Async Capable": false, "Relation Name": "tenk1", "Alias": "tenk1_1", "Startup Cost": 0.00, "Total Cost": 445.00, "Plan Rows": 10000, "Plan Width": 244, "Actual Startup Time": 0.009, "Actual Total Time": 0.009, "Actual Rows": 1, "Actual Loops": 1, "Shared Hit Blocks": 2, "Shared Read Blocks": 0, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0, "WAL Records": 0, "WAL FPI": 0, "WAL Bytes": 0 } ] } ] }, "Planning": { "Shared Hit Blocks": 0, "Shared Read Blocks": 0, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0, "Memory Used": 68080, "Memory Allocated": 131072 }, "Planning Time": 0.659, "Triggers": [ ], "Serialization": { "Time": 0.000, "Output Volume": 0, "Format": "text", "Shared Hit Blocks": 0, "Shared Read Blocks": 0, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0 }, "Execution Time": 0.065 } ] ------- "Shared Hit Blocks": 0, "Shared Read Blocks": 0, "Shared Dirtied Blocks": 0, "Shared Written Blocks": 0, "Local Hit Blocks": 0, "Local Read Blocks": 0, "Local Dirtied Blocks": 0, "Local Written Blocks": 0, "Temp Read Blocks": 0, "Temp Written Blocks": 0 these information duplicated for json key "Serialization" and json key "Planning" i am not sure this is intended?
On Wed, 15 May 2024 at 13:44, jian he <jian.universality@gmail.com> wrote: > "Shared Hit Blocks": 0, > "Shared Read Blocks": 0, > "Shared Dirtied Blocks": 0, > "Shared Written Blocks": 0, > "Local Hit Blocks": 0, > "Local Read Blocks": 0, > "Local Dirtied Blocks": 0, > "Local Written Blocks": 0, > "Temp Read Blocks": 0, > "Temp Written Blocks": 0 > > these information duplicated for json key "Serialization" and json key > "Planning" > i am not sure this is intended? Looks ok to me. Buffers used during planning are independent from the buffers used when outputting rows to the client. David
On Wed, May 15, 2024 at 10:13 AM David Rowley <dgrowleyml@gmail.com> wrote: > > On Wed, 15 May 2024 at 13:44, jian he <jian.universality@gmail.com> wrote: > > "Shared Hit Blocks": 0, > > "Shared Read Blocks": 0, > > "Shared Dirtied Blocks": 0, > > "Shared Written Blocks": 0, > > "Local Hit Blocks": 0, > > "Local Read Blocks": 0, > > "Local Dirtied Blocks": 0, > > "Local Written Blocks": 0, > > "Temp Read Blocks": 0, > > "Temp Written Blocks": 0 > > > > these information duplicated for json key "Serialization" and json key > > "Planning" > > i am not sure this is intended? > > Looks ok to me. Buffers used during planning are independent from the > buffers used when outputting rows to the client. > looking at serializeAnalyzeReceive. I am not sure which part of serializeAnalyzeReceive will update pgBufferUsage. I am looking for an example where this information under json key "Serialization" is not zero. So far I have tried: create table s(a text); insert into s select repeat('a', 1024) from generate_series(1,1024); explain (format json, analyze, wal, buffers, memory, serialize, timing off) select * from s;
On Wed, 15 May 2024 at 15:40, jian he <jian.universality@gmail.com> wrote: > I am looking for an example where this information under json key > "Serialization" is not zero. > So far I have tried: Something that requires detoasting. > create table s(a text); > insert into s select repeat('a', 1024) from generate_series(1,1024); > explain (format json, analyze, wal, buffers, memory, serialize, timing > off) select * from s; Something bigger than 1024 bytes or use SET STORAGE EXTERNAL or EXTENDED. create table s(a text); insert into s select repeat('a', 1024*1024) from generate_series(1,10); explain (format text, analyze, buffers, serialize, timing off) select * from s; Serialization: output=10241kB format=text Buffers: shared hit=36 David
On Wed, 15 May 2024 at 13:23, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Michael Paquier <michael@paquier.xyz> writes: > > Perhaps Alvaro and Tom would like to chime in, as committers of > > respectively 5de890e3610d and 06286709ee06? > > No objection here. In a green field I might argue for > round-to-nearest instead of round-up, but it looks like we > have several precedents for round-up, so let's avoid changing > that existing behavior. Thanks. I've pushed the patch now. David