Thread: nested xml/json to table

nested xml/json to table

From

Wim Bertels

Date:

17 March 2023, 10:05:18

Hello,

in python pandas there is for example a json_normalize function,
i didn't find something similar or better in postgresql?

what would be the general idea: "easily" convert an hierarchical
structure like json or xml to a table; for example creating columns by
appending the key-names when going doing down the three, using null for
empty values, adding more columns as needed by the given structure.
(1-way operation)

a few conceptual gists:
jsonX=
{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                                "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Sta.."; 
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                                    "para": "A meta-m..",
                        "GlossSeeAlso": 
                            ["GML", "XML"]
                                },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

select *
from json_to_table('jsonX');
-- generated columns with no data/only nulls could be removed..
-- arrays could be unnested in the process as well

glossary | glossary.title   | glossary.title.GlossDiv.title | .. 
-------------------------------------------------------------
null     | example glossary | S                        |
..

the last column: glossary.title.GlossDiv.GlossList.GlossEntry.GlossSee
with value "markup"

---

what if there are different structures that need to be combined?
(they could be added in the same manner as before)

jsonY=
{
s1:[{
    "f1": "a",
    "f2": "b",
    "f3": { "f3.1": "c",
        "f3.2": "d"}
   },
   { 
    "f1": "e",
    "f4": "g"
   }
   ]
}
           
select *
from json_to_table('jsonY');
-- generated columns with no data/only nulls could be removed..
-- separator sign is untrusted

s1  | s1.f1 | s1.f2 | s1.f3 | s1.f3.f3.1 | s1.f3.f3.2 | s1.f4
-------------------------------------------------------------
null| a     | b     | null  | c          | d          | null
null| e     | null  | null  | null       | null       | g


any ideas or suggestions (apart from plpython)? 
Wim

Re: nested xml/json to table

From

Thomas Kellerer

Date:

17 March 2023, 10:21:23

Wim Bertels schrieb am 17.03.2023 um 11:05:
> what would be the general idea: "easily" convert an hierarchical
> structure like json or xml to a table; for example creating columns by
> appending the key-names when going doing down the three, using null for
> empty values, adding more columns as needed by the given structure.
> (1-way operation)
>
> a few conceptual gists:
> jsonX=
> {
>     "glossary": {
>         "title": "example glossary",
>         "GlossDiv": {
>             "title": "S",
>             "GlossList": {
>                 "GlossEntry": {
>                                 "ID": "SGML",
>                     "SortAs": "SGML",
>                     "GlossTerm": "Sta..";
>                     "Acronym": "SGML",
>                     "Abbrev": "ISO 8879:1986",
>                     "GlossDef": {
>                                     "para": "A meta-m..",
>                         "GlossSeeAlso": 
>                             ["GML", "XML"]
>                                 },
>                     "GlossSee": "markup"
>                 }
>             }
>         }
>     }
> }
>
> select *
> from json_to_table('jsonX');
> -- generated columns with no data/only nulls could be removed..
> -- arrays could be unnested in the process as well
>
> glossary | glossary.title   | glossary.title.GlossDiv.title | ..
> -------------------------------------------------------------
> null     | example glossary | S                        |
> ..
>
> the last column: glossary.title.GlossDiv.GlossList.GlossEntry.GlossSee
> with value "markup"
>
> ---
>
> what if there are different structures that need to be combined?
> (they could be added in the same manner as before)
>
> jsonY=
> {
> s1:[{
>     "f1": "a",
>     "f2": "b",
>     "f3": { "f3.1": "c",
>         "f3.2": "d"}
>    },
>    {
>     "f1": "e",
>     "f4": "g"
>    }
>    ]
> }
>
> select *
> from json_to_table('jsonY');
> -- generated columns with no data/only nulls could be removed..
> -- separator sign is untrusted
>
> s1  | s1.f1 | s1.f2 | s1.f3 | s1.f3.f3.1 | s1.f3.f3.2 | s1.f4
> -------------------------------------------------------------
> null| a     | b     | null  | c          | d          | null
> null| e     | null  | null  | null       | null       | g


You can't have a function that returns a different set of columns each time you call it
(without specifying the output columns - which you don't want).

I have once written a function to flatten a JSON hierarchy to multiple rows.

Applied to your first example it would return the following:

path                                             | key          | value
-------------------------------------------------+--------------+-----------------
/glossary                                        | title        | example glossary
/glossary/GlossDiv                               | title        | S
/glossary/GlossDiv/GlossList/GlossEntry          | ID           | SGML
/glossary/GlossDiv/GlossList/GlossEntry          | Abbrev       | ISO 8879:1986
/glossary/GlossDiv/GlossList/GlossEntry          | SortAs       | SGML
/glossary/GlossDiv/GlossList/GlossEntry          | Acronym      | SGML
/glossary/GlossDiv/GlossList/GlossEntry          | GlossSee     | markup
/glossary/GlossDiv/GlossList/GlossEntry          | GlossTerm    | Sta..
/glossary/GlossDiv/GlossList/GlossEntry/GlossDef | para         | A meta-m..
/glossary/GlossDiv/GlossList/GlossEntry/GlossDef | GlossSeeAlso | ["GML", "XML"]

And the following for the second example:

path   | key  | value
-------+------+------
/s1    | f1   | a
/s1    | f2   | b
/s1/f3 | f3.1 | c
/s1/f3 | f3.2 | d
/s1    | f1   | e
/s1    | f4   | g


Thomas

Attachment

flatten_json.sql

Re: nested xml/json to table

From

Wim Bertels

Date:

17 March 2023, 11:04:42

Thomas Kellerer schreef op vr 17-03-2023 om 11:21 [+0100]:
> Wim Bertels schrieb am 17.03.2023 um 11:05:
> > what would be the general idea: "easily" convert an hierarchical
> > structure like json or xml to a table; for example creating columns
> > by
> > appending the key-names when going doing down the three, using null
> > for
> > empty values, adding more columns as needed by the given structure.
> > (1-way operation)
> 
> 
> You can't have a function that returns a different set of columns
> each time you call it
> (without specifying the output columns - which you don't want).

Hello Thomas,

thanks for the feedback,
i was wondering in the likes of existing built-in functions or
extensions (not CREATE FUNCTION)

mvg,
Wim