Explain: Duplicate key "Workers" in JSON format - Mailing list pgsql-hackers

From Pierre Giraud
Subject Explain: Duplicate key "Workers" in JSON format
Msg-id 41ee53a5-a36e-cc8f-1bee-63f6565bb1ee@dalibo.com
List pgsql-hackers
Hi,

I'm currently working on a tool to visualize an execution plan [1]. For
those who know PEV, it is a fork of that great tool, which hasn't been
active for more than two years.

Among other things, I'd like to show information about parallel
queries, starting with information about the workers.

I'm facing a problem when I am trying to parse a plan in the JSON
format. The "Workers" key may be duplicated.

While it's not invalid to have the same key several times at the same
level of a JSON object, it makes it almost impossible to get the full
info once the plan is parsed. Most parsers keep only the last
occurrence of a duplicated key, so part of the information is lost.
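
To illustrate the data loss, here is a minimal JavaScript sketch. The trimmed-down object is made up for the example (it borrows a few fields from the Sort node in the plan below); only `JSON.parse` itself is standard.

```javascript
// A JSON object in which the "Workers" key appears twice,
// as in the Sort node of the plan (fields trimmed for brevity).
const text = `{
  "Node Type": "Sort",
  "Workers": [{"Worker Number": 0, "Sort Method": "external merge"}],
  "Workers": [{"Worker Number": 0, "Actual Rows": 2692973}]
}`;

const node = JSON.parse(text);

// JSON.parse keeps the last value for a duplicated key,
// so the per-worker sort details are silently dropped.
console.log(node["Workers"].length);              // 1
console.log("Sort Method" in node["Workers"][0]); // false
console.log(node["Workers"][0]["Actual Rows"]);   // 2692973
```

No error or warning is raised, which is why the loss is easy to miss in a tool that consumes the plan.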

JSON validators warn us with the following message: "Duplicate key,
names should be unique."

Here's an example of a plan in VERBOSE mode.

[
  {
    "Plan": {
      "Node Type": "Gather Merge",
      "Parallel Aware": false,
      "Actual Startup Time": 1720.052,
      "Actual Total Time": 4252.290,
      "Actual Rows": 10000000,
      "Actual Loops": 1,
      "Output": ["c1", "c2"],
      "Workers Planned": 2,
      "Workers Launched": 2,
      "Plans": [
        {
          "Node Type": "Sort",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "Actual Startup Time": 1558.638,
          "Actual Total Time": 2127.522,
          "Actual Rows": 3333333,
          "Actual Loops": 3,
          "Output": ["c1", "c2"],
          "Sort Key": ["t1.c1"],
          "Sort Method": "external merge",
          "Sort Space Used": 126152,
          "Sort Space Type": "Disk",
          "Workers": [
            {
              "Worker Number": 0,
              "Sort Method": "external merge",
              "Sort Space Used": 73552,
              "Sort Space Type": "Disk"
            },
            {
              "Worker Number": 1,
              "Sort Method": "external merge",
              "Sort Space Used": 73320,
              "Sort Space Type": "Disk"
            }
          ],
          "Workers": [
            {
              "Worker Number": 0,
              "Actual Startup Time": 1487.846,
              "Actual Total Time": 1996.879,
              "Actual Rows": 2692973,
              "Actual Loops": 1
            },
            {
              "Worker Number": 1,
              "Actual Startup Time": 1468.256,
              "Actual Total Time": 2012.744,
              "Actual Rows": 2684443,
              "Actual Loops": 1
            }
          ],
          "Plans": [
            {
              "Node Type": "Seq Scan",
              "Parent Relationship": "Outer",
              "Parallel Aware": true,
              "Relation Name": "t1",
              "Schema": "public",
              "Alias": "t1",
              "Actual Startup Time": 0.211,
              "Actual Total Time": 372.858,
              "Actual Rows": 3333333,
              "Actual Loops": 3,
              "Output": ["c1", "c2"],
              "Workers": [
                {
                  "Worker Number": 0,
                  "Actual Startup Time": 0.029,
                  "Actual Total Time": 368.356,
                  "Actual Rows": 2692973,
                  "Actual Loops": 1
                },
                {
                  "Worker Number": 1,
                  "Actual Startup Time": 0.033,
                  "Actual Total Time": 368.874,
                  "Actual Rows": 2684443,
                  "Actual Loops": 1
                }
              ]
            }
          ]
        }
      ]
    },
    "Planning Time": 0.170,
    "Triggers": [
    ],
    "Execution Time": 4695.141
  }
]

As you can see, the "Workers" key is duplicated in the Sort node.

Here's the equivalent in TEXT format:

---------------------------------
 Gather Merge  (cost=735306.27..1707599.95 rows=8333364 width=17) (actual time=1560.468..3749.583 rows=10000000 loops=1)
   Output: c1, c2
   Workers Planned: 2
   Workers Launched: 2
   ->  Sort  (cost=734306.25..744722.95 rows=4166682 width=17) (actual time=1474.182..1967.788 rows=3333333 loops=3)
         Output: c1, c2
         Sort Key: t1.c1
         Sort Method: external merge  Disk: 125168kB
         Worker 0:  Sort Method: external merge  Disk: 73768kB
         Worker 1:  Sort Method: external merge  Disk: 74088kB
         Worker 0: actual time=1431.136..1883.370 rows=2700666 loops=1
         Worker 1: actual time=1431.175..1891.630 rows=2712505 loops=1
         ->  Parallel Seq Scan on public.t1  (cost=0.00..105264.82 rows=4166682 width=17) (actual time=0.214..386.014 rows=3333333 loops=3)
               Output: c1, c2
               Worker 0: actual time=0.027..382.325 rows=2700666 loops=1
               Worker 1: actual time=0.038..384.951 rows=2712505 loops=1
 Planning Time: 0.180 ms
 Execution Time: 4166.867 ms
(18 rows)
---------------------------------

I think that the text format should stay as is.

For the JSON format, however, it would be better in my opinion if the
"Workers" data were merged. Parsing should not require anything more
than "var myObj = JSON.parse(theJsonString);".

What do you think?
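
As a rough sketch of what "merged" could mean for the Sort node, the two "Workers" arrays can be combined entry by entry on "Worker Number". The helper name `mergeWorkers` is hypothetical, and the resulting shape is only one possible layout, not something PostgreSQL emits; the values are copied from the JSON plan above.

```javascript
// Hypothetical helper: combine several per-worker arrays into one,
// merging entries that share the same "Worker Number".
function mergeWorkers(...workerLists) {
  const byNumber = new Map();
  for (const list of workerLists) {
    for (const worker of list) {
      const n = worker["Worker Number"];
      byNumber.set(n, { ...(byNumber.get(n) || {}), ...worker });
    }
  }
  // Return the merged entries ordered by worker number.
  return [...byNumber.values()].sort(
    (a, b) => a["Worker Number"] - b["Worker Number"]
  );
}

// The two "Workers" arrays of the Sort node, taken from the plan above.
const sortInfo = [
  { "Worker Number": 0, "Sort Method": "external merge",
    "Sort Space Used": 73552, "Sort Space Type": "Disk" },
  { "Worker Number": 1, "Sort Method": "external merge",
    "Sort Space Used": 73320, "Sort Space Type": "Disk" },
];
const timingInfo = [
  { "Worker Number": 0, "Actual Startup Time": 1487.846,
    "Actual Total Time": 1996.879, "Actual Rows": 2692973, "Actual Loops": 1 },
  { "Worker Number": 1, "Actual Startup Time": 1468.256,
    "Actual Total Time": 2012.744, "Actual Rows": 2684443, "Actual Loops": 1 },
];

const merged = mergeWorkers(sortInfo, timingInfo);
console.log(JSON.stringify(merged, null, 2));
```

With a single "Workers" array per node shaped like `merged`, each worker entry carries both its sort details and its timing, and a plain `JSON.parse` call is enough to read it.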

Thanks.

[1] https://dalibo.github.io/pev2/


