Thread: Using JSONB with nested key-value structures, and skipping/wildcardingthe key

Hi,

I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB column called 'snapshot'. In Python terms, each row's 'snapshot' looks like this:

======================
snapshot = {
    'pay_definition' : {
        '1234': {..., 'name': 'foo', ...},
        '99': {..., 'name': 'bar', ...},
}
======================

I'd like to find all unique values of 'name' in all rows of MyModel. I have this working using native JSON functions from the ORM like this:

=====================
class PayDef(Func):
    function='to_jsonb'
    template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"

MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef', flat=True)
=====================

So, skipping the ordering/distinct/ORM parts, the core looks like this:

to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name')

My question is if this the best way to solve this problem? The way my current logic works, reading from inside out is, I think:
  1. Pass in the 'snapshot'.
  2. Since 'snapshot' is a JSON field, "->'pay_definition'" traverses this key.
  3. To skip the unknown numeric keys, "jsonb_each()" turns each key, value pair into an inner row like ['1234', {...}].
  4. To get to the value column of the inner row "row_to_json()->'value'".
  5. To get the name field's value "->'name'".
  6. A final call to "to_jsonb" in the PayDefs class. This bit is clearly Django-specific.
For example, I think the pair of calls row_to_json(jsonb_each()) is needed because there is no jsonb_object_values() to complement jsonb_object_keys(). Likewise, since all I care about is the string value of 'name', is there a way to get rid of the PayDefs class, and its invocation of to_jsonb (this is probably Django-specific)?

To provide context on what "better" might be:
  • Snapshot JSONs might easily be 20MB in size.
  • Each 'pay_definition' is probablyonly about 1kB in size, and there might be 50 of them in a snapshot.
  • There might be 1000 MyModel instances in a given query.
  • I'm using PostgreSQL 12
so my concern is not have the database server or Django perform extraneous work converting between strings and JSON for example.

Thanks, Shaheed

P.S. I posted a Django-centric version of this to the relevant mailing list but got no replies; nevertheless, apologies for the cross post.
> On 1 Jun 2020, at 20:18, Shaheed Haque <shaheedhaque@gmail.com> wrote:
>
> Hi,
>
> I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB column called 'snapshot'. In Python
terms,each row's 'snapshot' looks like this: 
>
> ======================
> snapshot = {
>     'pay_definition' : {
>         '1234': {..., 'name': 'foo', ...},
>         '99': {..., 'name': 'bar', ...},
> }
> ======================
>
> I'd like to find all unique values of 'name' in all rows of MyModel. I have this working using native JSON functions
fromthe ORM like this: 
>
> =====================
> class PayDef(Func):
>     function='to_jsonb'
>     template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"
>
> MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef', flat=True)
> =====================
>
> So, skipping the ordering/distinct/ORM parts, the core looks like this:
>
> to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name’)


I do something like this to get a set of sub-paths in a JSONB field (no idea how to write that in Django):

select snapshot->’pay_definition’->k.value->’name’
  from MyModel
  join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on true

I don’t know how that compares performance-wise to using jsonb_each, but perhaps worth a try. Obviously, the way it’s
writtenabove it doesn’t return distinct values of ’name’ yet, but that’s fairly easy to remedy. 

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Hi,

On Mon, 1 Jun 2020 at 23:50, Alban Hertroys <haramrae@gmail.com> wrote:

> On 1 Jun 2020, at 20:18, Shaheed Haque <shaheedhaque@gmail.com> wrote:
>
> Hi,
>
> I'm using Django's ORM to access Postgres12. My "MyModel" table has a JSONB column called 'snapshot'. In Python terms, each row's 'snapshot' looks like this:
>
> ======================
> snapshot = {
>     'pay_definition' : {
>         '1234': {..., 'name': 'foo', ...},
>         '99': {..., 'name': 'bar', ...},
> }
> ======================
>
> I'd like to find all unique values of 'name' in all rows of MyModel. I have this working using native JSON functions from the ORM like this:
>
> =====================
> class PayDef(Func):
>     function='to_jsonb'
>     template="%(function)s(row_to_json(jsonb_each(%(expressions)s->'pay_definition'))->'value'->'name')"
>
> MyModel.objects.annotate(paydef=PayDef(F('snapshot'))).order_by().distinct('paydef').values_list('paydef', flat=True)
> =====================
>
> So, skipping the ordering/distinct/ORM parts, the core looks like this:
>
> to_jsonb(row_to_json(jsonb_each('snapshot'->'pay_definition'))->'value'->'name’)


I do something like this to get a set of sub-paths in a JSONB field (no idea how to write that in Django):

select snapshot->’pay_definition’->k.value->’name’
  from MyModel
  join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on true

I was unaware of the LATERAL keyword, so thanks. After a bit of Googling however, it seems that it is tricky/impossible to use from the ORM (barring a full scale escape to a "raw" query). One question: as a novice here, I think I understand the right hand side of your JOIN "... k(value)" is shorthand for:

... AS table_name(column_name)

except that I don't see any clues in the docs that jsonb_object_keys() is a "table function". Can you kindly clarify?
I don’t know how that compares performance-wise to using jsonb_each, but perhaps worth a try. Obviously, the way it’s written above it doesn’t return distinct values of ’name’ yet, but that’s fairly easy to remedy.
Indeed; this is what I managed to get to:
SELECT DISTINCT snapshot -> 'pay_definition' -> k.value -> 'name' AS name
FROM paiyroll_payrun
JOIN LATERAL jsonb_object_keys(snapshot -> 'pay_definition') AS k(value) ON true
ORDER BY name;
At any rate, I'll have to ponder the "raw" route absent some way to "JOIN LATERAL".

Thanks, Shaheed


Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

> One question: as a novice here, I think I understand the right hand
> side of your JOIN "... k(value)" is shorthand for:
>
> ... AS table_name(column_name)
>
> except that I don't see any clues in the docs that
> jsonb_object_keys() is a "table function".> Can you kindly clarify?

The clue is in the column "return type" which states: "setof text" for jsonb_object_keys()
A function returning "setof" is the same as a "table function"




> On 2 Jun 2020, at 9:30, Shaheed Haque <shaheedhaque@gmail.com> wrote:
>
>
>> I do something like this to get a set of sub-paths in a JSONB field (no idea how to write that in Django):
>>
>> select snapshot->’pay_definition’->k.value->’name’
>>   from MyModel
>>   join lateral jsonb_object_keys(snapshot->’pay_definition’) k(value) on true
>>
> I was unaware of the LATERAL keyword, so thanks. After a bit of Googling however, it seems that it is
tricky/impossibleto use from the ORM (barring a full scale escape to a "raw" query). One question: as a novice here, I
thinkI understand the right hand side of your JOIN "... k(value)" is shorthand for: 
>
> ... AS table_name(column_name)
>
> except that I don't see any clues in the docs that jsonb_object_keys() is a "table function". Can you kindly clarify?

Correct. Thomas already explained the return type, but the plural form of the function name is also an indication that
itreturns multiple results. 

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.