Home > mailing lists

[PERFORM] Ordering on GIN Index - Mailing list pgsql-performance

From	Rory
Subject	[PERFORM] Ordering on GIN Index
Date	December 14, 2016 05:18:09
Msg-id	CANwWpaghGzMrwW1S85HdCZF59o3SwwiyJUmB+2jGF5SZnnYbwQ@mail.gmail.com Whole thread Raw
List	pgsql-performance

Tree view

Hello mailing list!

We have a JSONB column in a table that has a key (string) => value (int) structure. We want to be able to create a btree index that accepts an arbitrary key to allowing sorting by key.

Table Example:

+----+------------------------------------------+
| ID | JSONB |
+----+------------------------------------------+
| 1 | {"key_1": 20, "key_2": 30, "key_52": -1} |
| 2 | {"key_1": 10} |
+----+------------------------------------------+

Here is the kind of query we want to run:

select id, (jsonb ->> 'key_1')::int as sort_key
from my_table
where (jsonb ? 'key_1' and (jsonb ->> 'key_1')::int > 0) and (jsonb ? 'key_2' and jsonb ->> (jsonb ->> 'key_2')::int > 50)

order by sort_key desc
limit 100;

We know that we can create indexes for each individual key (create index my_table_key_1_idx on my_table using btree((jsonb -> 'key_1')) or using a partial index including the ? operator) but the issue is that there are around 5000 potential keys, which means 5000 indexes.

We tried doing the relational thing, and splitting the JSONB table into it's own separate table, which is great because we can use a simple btree index, but unfortunately this forces us to use weird queries such as:

select id, max(value) filter (where key = 'key_1') as sort_key
from my_table_split
where (
(key = 'key_1' and value > 0) or
(key = 'key_2' and value > 50)
)
group by id having count(*) = 2

order by sort_key desc

limit 100;

Such a query takes a disappointing long time to aggregate. This also has the disadvantage that if we wanted to expand my_table we'd have to do an inner join further decreasing performance.

I see that in 2013 there was a talk (http://www.sai.msu.su/~megera/postgres/talks/Next%20generation%20of%20GIN.pdf) about ordered GIN indexes which seems perfect for our case, but I can't see any progress or updates on that.

Does anyone have any ideas on how to approach this in a for performant way with the Postgres we have today?

Thank you,

Rory.

pgsql-performance by date:

From: Daniel Blanch Bataller
Date: 14 December 2016, 00:14:44
Subject: Re: [PERFORM] Querying with multicolumn index

From: Dinesh Chandra 12108
Date: 15 December 2016, 13:28:39
Subject: [PERFORM] Size of Temporary tablespace is increasing very much in postgresql9.1.

[PERFORM] Ordering on GIN Index - Mailing list pgsql-performance

Previous

Next