Thread: nested xml/json to table
Hello, in python pandas there is for example a json_normalize function, i didn't find something similar or better in postgresql? what would be the general idea: "easily" convert an hierarchical structure like json or xml to a table; for example creating columns by appending the key-names when going doing down the three, using null for empty values, adding more columns as needed by the given structure. (1-way operation) a few conceptual gists: jsonX= { "glossary": { "title": "example glossary", "GlossDiv": { "title": "S", "GlossList": { "GlossEntry": { "ID": "SGML", "SortAs": "SGML", "GlossTerm": "Sta.."; "Acronym": "SGML", "Abbrev": "ISO 8879:1986", "GlossDef": { "para": "A meta-m..", "GlossSeeAlso": ["GML", "XML"] }, "GlossSee": "markup" } } } } } select * from json_to_table('jsonX'); -- generated columns with no data/only nulls could be removed.. -- arrays could be unnested in the process as well glossary | glossary.title | glossary.title.GlossDiv.title | .. ------------------------------------------------------------- null | example glossary | S | .. the last column: glossary.title.GlossDiv.GlossList.GlossEntry.GlossSee with value "markup" --- what if there are different structures that need to be combined? (they could be added in the same manner as before) jsonY= { s1:[{ "f1": "a", "f2": "b", "f3": { "f3.1": "c", "f3.2": "d"} }, { "f1": "e", "f4": "g" } ] } select * from json_to_table('jsonY'); -- generated columns with no data/only nulls could be removed.. -- separator sign is untrusted s1 | s1.f1 | s1.f2 | s1.f3 | s1.f3.f3.1 | s1.f3.f3.2 | s1.f4 ------------------------------------------------------------- null| a | b | null | c | d | null null| e | null | null | null | null | g any ideas or suggestions (apart from plpython)? Wim
Wim Bertels schrieb am 17.03.2023 um 11:05: > what would be the general idea: "easily" convert an hierarchical > structure like json or xml to a table; for example creating columns by > appending the key-names when going doing down the three, using null for > empty values, adding more columns as needed by the given structure. > (1-way operation) > > a few conceptual gists: > jsonX= > { > "glossary": { > "title": "example glossary", > "GlossDiv": { > "title": "S", > "GlossList": { > "GlossEntry": { > "ID": "SGML", > "SortAs": "SGML", > "GlossTerm": "Sta.."; > "Acronym": "SGML", > "Abbrev": "ISO 8879:1986", > "GlossDef": { > "para": "A meta-m..", > "GlossSeeAlso": > ["GML", "XML"] > }, > "GlossSee": "markup" > } > } > } > } > } > > select * > from json_to_table('jsonX'); > -- generated columns with no data/only nulls could be removed.. > -- arrays could be unnested in the process as well > > glossary | glossary.title | glossary.title.GlossDiv.title | .. > ------------------------------------------------------------- > null | example glossary | S | > .. > > the last column: glossary.title.GlossDiv.GlossList.GlossEntry.GlossSee > with value "markup" > > --- > > what if there are different structures that need to be combined? > (they could be added in the same manner as before) > > jsonY= > { > s1:[{ > "f1": "a", > "f2": "b", > "f3": { "f3.1": "c", > "f3.2": "d"} > }, > { > "f1": "e", > "f4": "g" > } > ] > } > > select * > from json_to_table('jsonY'); > -- generated columns with no data/only nulls could be removed.. > -- separator sign is untrusted > > s1 | s1.f1 | s1.f2 | s1.f3 | s1.f3.f3.1 | s1.f3.f3.2 | s1.f4 > ------------------------------------------------------------- > null| a | b | null | c | d | null > null| e | null | null | null | null | g You can't have a function that returns a different set of columns each time you call it (without specifying the output columns - which you don't want). I have once written a function to flatten a JSON hierarchy to multiple rows. Applied to your first example it would return the following: path | key | value -------------------------------------------------+--------------+----------------- /glossary | title | example glossary /glossary/GlossDiv | title | S /glossary/GlossDiv/GlossList/GlossEntry | ID | SGML /glossary/GlossDiv/GlossList/GlossEntry | Abbrev | ISO 8879:1986 /glossary/GlossDiv/GlossList/GlossEntry | SortAs | SGML /glossary/GlossDiv/GlossList/GlossEntry | Acronym | SGML /glossary/GlossDiv/GlossList/GlossEntry | GlossSee | markup /glossary/GlossDiv/GlossList/GlossEntry | GlossTerm | Sta.. /glossary/GlossDiv/GlossList/GlossEntry/GlossDef | para | A meta-m.. /glossary/GlossDiv/GlossList/GlossEntry/GlossDef | GlossSeeAlso | ["GML", "XML"] And the following for the second example: path | key | value -------+------+------ /s1 | f1 | a /s1 | f2 | b /s1/f3 | f3.1 | c /s1/f3 | f3.2 | d /s1 | f1 | e /s1 | f4 | g Thomas
Attachment
Thomas Kellerer schreef op vr 17-03-2023 om 11:21 [+0100]: > Wim Bertels schrieb am 17.03.2023 um 11:05: > > what would be the general idea: "easily" convert an hierarchical > > structure like json or xml to a table; for example creating columns > > by > > appending the key-names when going doing down the three, using null > > for > > empty values, adding more columns as needed by the given structure. > > (1-way operation) > > > You can't have a function that returns a different set of columns > each time you call it > (without specifying the output columns - which you don't want). Hello Thomas, thanks for the feedback, i was wondering in the likes of existing built-in functions or extensions (not CREATE FUNCTION) mvg, Wim