[PERFORM] Ordering on GIN Index - Mailing list pgsql-performance

From Rory
Subject [PERFORM] Ordering on GIN Index
Date
Msg-id CANwWpaghGzMrwW1S85HdCZF59o3SwwiyJUmB+2jGF5SZnnYbwQ@mail.gmail.com
Whole thread Raw
List pgsql-performance
Hello mailing list!

We have a JSONB column in a table that has a key (string) => value (int) structure. We want to be able to create a btree index that accepts an arbitrary key to allowing sorting by key.

Table Example:
+----+------------------------------------------+
| ID |                  JSONB                   |
+----+------------------------------------------+
|  1 | {"key_1": 20, "key_2": 30, "key_52": -1} |
|  2 | {"key_1": 10}                            |
+----+------------------------------------------+

Here is the kind of query we want to run:
select id, (jsonb ->> 'key_1')::int as sort_key
from my_table 
where (jsonb ? 'key_1' and (jsonb ->> 'key_1')::int > 0) and (jsonb ? 'key_2' and jsonb ->> (jsonb ->> 'key_2')::int > 50)
order by sort_key desc
limit 100;

We know that we can create indexes for each individual key (create index my_table_key_1_idx on my_table using btree((jsonb -> 'key_1')) or using a partial index including the ? operator) but the issue is that there are around 5000 potential keys, which means 5000 indexes.

We tried doing the relational thing, and splitting the JSONB table into it's own separate table, which is great because we can use a simple btree index, but unfortunately this forces us to use weird queries such as:
select id, max(value) filter (where key = 'key_1') as sort_key
from my_table_split
where (
    (key = 'key_1' and value > 0) or
    (key = 'key_2' and value > 50)
)
group by id having count(*) = 2
order by sort_key desc 
limit 100;

Such a query takes a disappointing long time to aggregate. This also has the disadvantage that if we wanted to expand my_table we'd have to do an inner join further decreasing performance.

I see that in 2013 there was a talk (http://www.sai.msu.su/~megera/postgres/talks/Next%20generation%20of%20GIN.pdf) about ordered GIN indexes which seems perfect for our case, but I can't see any progress or updates on that.

Does anyone have any ideas on how to approach this in a for performant way with the Postgres we have today?

Thank you,
Rory.

pgsql-performance by date:

Previous
From: Daniel Blanch Bataller
Date:
Subject: Re: [PERFORM] Querying with multicolumn index
Next
From: Dinesh Chandra 12108
Date:
Subject: [PERFORM] Size of Temporary tablespace is increasing very much in postgresql9.1.