I have a table with the schema:
CREATE TABLE fluent (id BIGSERIAL, record JSONB);
Then I created a couple of indexes:
1) CREATE INDEX idx_records ON fluent USING GIN (record);
2) CREATE INDEX idx_btree_records ON fluent USING BTREE ((record ->>
'destinationServiceName'));
Now, if I run a query to look up the distinct values of the field
`destinationServiceName`, via:
select distinct record ->> 'destinationServiceName' from fluent;
This query takes a lot of time, and does full table scan. The query planning is:
# explain analyze select distinct record ->> 'destinationServiceName'
from fluent;
QUERY PLAN
Unique
(cost=1103803.97..1118803.97 rows=3000000 width=32) (actual
time=77282.528..78549.877 rows=10 loops=1)
-> Sort (cost=1103803.97..1111303.97 rows=3000000 width=32)
(actual time=77282.525..78046.992 rows=3000000 loops=1)
Sort Key: ((record ->> 'destinationServiceName'::text))
Sort Method: external merge Disk: 117456kB
-> Seq Scan on fluent (cost=0.00..637500.00 rows=3000000
width=32) (actual time=14.440..69545.867 rows=3000000 loops=1)
Planning Time: 0.187 ms
Execution Time: 78574.221 ms
I see that none of the indexes are used. I want to do a few
aggregations, like "what are the distinct pairs of
`destinationServiceName` and `sourceServiceName` etc. " in these
records. Now, is such a querying possible at all without doing full
table scans ? I get such kind of aggregation support in elasticsearch
+ kibana, without doing full-table scan (I believe so, but I do not
have data to back this claim) and I am trying to see if this is
possible with any other extra index creation in postgres.
Any suggestions ? Thanks.
--
Sankar P
http://psankar.blogspot.com