DISTINCT on jsonb fields and Indexes - Mailing list pgsql-general

From Sankar P
Subject DISTINCT on jsonb fields and Indexes
Date
Msg-id CAMSEaH6ri7hH3-MywK+Vrb=RpkhzMufU2xXFdSKV9zubmniieg@mail.gmail.com
Whole thread Raw
Responses Re: DISTINCT on jsonb fields and Indexes  (David Rowley <dgrowleyml@gmail.com>)
Re: DISTINCT on jsonb fields and Indexes  (Michael Lewis <mlewis@entrata.com>)
List pgsql-general
I have a table with the schema:

CREATE TABLE fluent (id BIGSERIAL, record JSONB);

Then I created a couple of indexes:
1) CREATE INDEX idx_records ON fluent USING GIN (record);
2) CREATE INDEX idx_btree_records ON fluent USING BTREE ((record ->>
'destinationServiceName'));

Now, if I run a query to look up the distinct values of the field
`destinationServiceName`, via:

select distinct record ->> 'destinationServiceName' from fluent;

This query takes a lot of time, and does full table scan. The query planning is:

# explain analyze select distinct record ->> 'destinationServiceName'
from fluent;
                                                            QUERY PLAN
                                              Unique
(cost=1103803.97..1118803.97 rows=3000000 width=32) (actual
time=77282.528..78549.877 rows=10 loops=1)
   ->  Sort  (cost=1103803.97..1111303.97 rows=3000000 width=32)
(actual time=77282.525..78046.992 rows=3000000 loops=1)
         Sort Key: ((record ->> 'destinationServiceName'::text))
         Sort Method: external merge  Disk: 117456kB
         ->  Seq Scan on fluent  (cost=0.00..637500.00 rows=3000000
width=32) (actual time=14.440..69545.867 rows=3000000 loops=1)
 Planning Time: 0.187 ms
 Execution Time: 78574.221 ms

I see that none of the indexes are used. I want to do a few
aggregations, like "what are the distinct pairs of
`destinationServiceName` and `sourceServiceName` etc. " in these
records. Now, is such a querying possible at all without doing full
table scans ? I get such kind of aggregation support in elasticsearch
+ kibana, without doing full-table scan (I believe so, but I do not
have data to back this claim) and I am trying to see if this is
possible with any other extra index creation in postgres.

Any suggestions ? Thanks.

-- 
Sankar P
http://psankar.blogspot.com



pgsql-general by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Definition of REPEATABLE READ
Next
From: raf
Date:
Subject: Re: Hiding a GUC from SQL