Thread: Using JSONB with nested key-value structures, and skipping/wildcardingthe key
Using JSONB with nested key-value structures, and skipping/wildcardingthe key
From
Shaheed Haque
Date:
Hi,
I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB column called 'snapshot'. In Python terms, each row's 'snapshot' looks like this:
======================
snapshot = {
'pay_definition' : {
'1234': {..., 'name': 'foo', ...},
'99': {..., 'name': 'bar', ...},
}
======================
I'd like to find all unique values of 'name' in all rows of MyModel. I have this working using native JSON functions from the ORM like this:
=====================
class PayDef(Func):
function='to_jsonb'
template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"
MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef', flat=True)
=====================
So, skipping the ordering/distinct/ORM parts, the core looks like this:
My question is if this the best way to solve this problem? The way my current logic works, reading from inside out is, I think:
To provide context on what "better" might be:
Thanks, Shaheed
P.S. I posted a Django-centric version of this to the relevant mailing list but got no replies; nevertheless, apologies for the cross post.
I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB column called 'snapshot'. In Python terms, each row's 'snapshot' looks like this:
======================
snapshot = {
'pay_definition' : {
'1234': {..., 'name': 'foo', ...},
'99': {..., 'name': 'bar', ...},
}
======================
I'd like to find all unique values of 'name' in all rows of MyModel. I have this working using native JSON functions from the ORM like this:
=====================
class PayDef(Func):
function='to_jsonb'
template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"
MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef', flat=True)
=====================
So, skipping the ordering/distinct/ORM parts, the core looks like this:
to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name')
My question is if this the best way to solve this problem? The way my current logic works, reading from inside out is, I think:
- Pass in the 'snapshot'.
- Since 'snapshot' is a JSON field, "->'pay_definition'" traverses this key.
- To skip the unknown numeric keys, "jsonb_each()" turns each key, value pair into an inner row like ['1234', {...}].
- To get to the value column of the inner row "row_to_json()->'value'".
- To get the name field's value "->'name'".
- A final call to "to_jsonb" in the PayDefs class. This bit is clearly Django-specific.
To provide context on what "better" might be:
- Snapshot JSONs might easily be 20MB in size.
- Each 'pay_definition' is probablyonly about 1kB in size, and there might be 50 of them in a snapshot.
- There might be 1000 MyModel instances in a given query.
- I'm using PostgreSQL 12
Thanks, Shaheed
P.S. I posted a Django-centric version of this to the relevant mailing list but got no replies; nevertheless, apologies for the cross post.
Re: Using JSONB with nested key-value structures, andskipping/wildcarding the key
From
Alban Hertroys
Date:
> On 1 Jun 2020, at 20:18, Shaheed Haque <shaheedhaque@gmail.com> wrote: > > Hi, > > I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB column called 'snapshot'. In Python terms,each row's 'snapshot' looks like this: > > ====================== > snapshot = { > 'pay_definition' : { > '1234': {..., 'name': 'foo', ...}, > '99': {..., 'name': 'bar', ...}, > } > ====================== > > I'd like to find all unique values of 'name' in all rows of MyModel. I have this working using native JSON functions fromthe ORM like this: > > ===================== > class PayDef(Func): > function='to_jsonb' > template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')" > > MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef', flat=True) > ===================== > > So, skipping the ordering/distinct/ORM parts, the core looks like this: > > to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name’) I do something like this to get a set of sub-paths in a JSONB field (no idea how to write that in Django): select snapshot->’pay_definition’->k.value->’name’ from MyModel join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on true I don’t know how that compares performance-wise to using jsonb_each, but perhaps worth a try. Obviously, the way it’s writtenabove it doesn’t return distinct values of ’name’ yet, but that’s fairly easy to remedy. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Re: Using JSONB with nested key-value structures, andskipping/wildcarding the key
From
Shaheed Haque
Date:
Hi,
On Mon, 1 Jun 2020 at 23:50, Alban Hertroys <haramrae@gmail.com> wrote:
> On 1 Jun 2020, at 20:18, Shaheed Haque <shaheedhaque@gmail.com> wrote:
>
> Hi,
>
> I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB column called 'snapshot'. In Python terms, each row's 'snapshot' looks like this:
>
> ======================
> snapshot = {
> 'pay_definition' : {
> '1234': {..., 'name': 'foo', ...},
> '99': {..., 'name': 'bar', ...},
> }
> ======================
>
> I'd like to find all unique values of 'name' in all rows of MyModel. I have this working using native JSON functions from the ORM like this:
>
> =====================
> class PayDef(Func):
> function='to_jsonb'
> template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"
>
> MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef', flat=True)
> =====================
>
> So, skipping the ordering/distinct/ORM parts, the core looks like this:
>
> to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name’)
I do something like this to get a set of sub-paths in a JSONB field (no idea how to write that in Django):
select snapshot->’pay_definition’->k.value->’name’
from MyModel
join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on true
... AS table_name(column_name)
except that I don't see any clues in the docs that jsonb_object_keys() is a "table function". Can you kindly clarify?
I don’t know how that compares performance-wise to using jsonb_each, but perhaps worth a try. Obviously, the way it’s written above it doesn’t return distinct values of ’name’ yet, but that’s fairly easy to remedy.
Indeed; this is what I managed to get to:
SELECT DISTINCT snapshot -> 'pay_definition' -> k.value -> 'name' AS name
FROM paiyroll_payrun
JOIN LATERAL jsonb_object_keys(snapshot -> 'pay_definition') AS k(value) ON true
ORDER BY name;
At any rate, I'll have to ponder the "raw" route absent some way to "JOIN LATERAL".
Thanks, Shaheed
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
Re: Using JSONB with nested key-value structures, andskipping/wildcarding the key
From
Thomas Kellerer
Date:
> One question: as a novice here, I think I understand the right hand > side of your JOIN "... k(value)" is shorthand for: > > ... AS table_name(column_name) > > except that I don't see any clues in the docs that > jsonb_object_keys() is a "table function".> Can you kindly clarify? The clue is in the column "return type" which states: "setof text" for jsonb_object_keys() A function returning "setof" is the same as a "table function"
Re: Using JSONB with nested key-value structures, andskipping/wildcarding the key
From
Alban Hertroys
Date:
> On 2 Jun 2020, at 9:30, Shaheed Haque <shaheedhaque@gmail.com> wrote: > > >> I do something like this to get a set of sub-paths in a JSONB field (no idea how to write that in Django): >> >> select snapshot->’pay_definition’->k.value->’name’ >> from MyModel >> join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on true >> > I was unaware of the LATERAL keyword, so thanks. After a bit of Googling however, it seems that it is tricky/impossibleto use from the ORM (barring a full scale escape to a "raw" query). One question: as a novice here, I thinkI understand the right hand side of your JOIN "... k(value)" is shorthand for: > > ... AS table_name(column_name) > > except that I don't see any clues in the docs that jsonb_object_keys() is a "table function". Can you kindly clarify? Correct. Thomas already explained the return type, but the plural form of the function name is also an indication that itreturns multiple results. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.