Re: [GENERAL] jsonb case insensitive search - Mailing list pgsql-general
From: armand pirvu <armand.pirvu@gmail.com>
Subject: Re: [GENERAL] jsonb case insensitive search
Msg-id: 03C23B57-C3C7-4EB3-8A18-30E90A45BBF7@gmail.com
In response to: Re: [GENERAL] jsonb case insensitive search (armand pirvu <armand.pirvu@gmail.com>)
Responses: Re: [GENERAL] jsonb case insensitive search (Karl Czajkowski <karlcz@isi.edu>)
List: pgsql-general
I apologize in advance for replying to my own reply; I know it is frowned upon. My inline comments are below.

> On Jun 1, 2017, at 2:05 PM, armand pirvu <armand.pirvu@gmail.com> wrote:
>
> Thank you Karl and David
>
> Ideally, as far as I can tell, the index would need to be show_id, file_id, lower(…)
>
> The question is whether this is possible?
>
> Thanks
> Armand
>
>> On Jun 1, 2017, at 12:24 PM, Karl Czajkowski <karlcz@isi.edu> wrote:
>>
>> On May 31, armand pirvu modulated:
>>
>>> The idea is that I would like to avoid having an index for each key
>>> I might possibly want to search, but rather have the whole record, then
>>> search by the key and get the key value, thus having one index serving
>>> multiple purposes, so to speak.
>>
>> First, benchmarking would be important to figure out if any proposed
>> indexing actually speeds up the kinds of queries you want to perform.
>> With the recently added parallel query features, a simpler indexing
>> scheme with some brute-force search might be adequate?

I am not sure what you mean by benchmarking, but I think comparative timings, i.e. 2 seconds versus a couple of milliseconds, are quite a difference. A table scan, while acceptable in certain cases, will become a problem where there is heavy usage on the same part/area.

>> But, you could use a search idiom like this:
>>
>> (lower(json_column::text)::json) -> lower('key') = 'value'::json
>>
>> This will down-convert the case on all values and keys. The left-hand
>> parenthetic expression could be precomputed in an expression index to
>> avoid repeated case conversion. But, typical searches will still have
>> to scan the whole index to perform the projection and match the final
>> value tests on the right-hand side.
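As a minimal sketch of the expression-index idea Karl describes, the case-folded document can be precomputed in a GIN index and queried with jsonb containment. Table name `events` and column `doc` are hypothetical, and the cast targets `jsonb` (rather than `json`) because GIN indexing requires it:

```sql
-- Sketch only: "events" and "doc" are hypothetical names.
-- Precompute the lower-cased document once in an expression index.
CREATE INDEX events_doc_lower_gin
    ON events USING gin ( (lower(doc::text)::jsonb) );

-- Containment queries written against the same expression can use the index;
-- the key and value in the pattern must already be lower case.
SELECT *
FROM events
WHERE lower(doc::text)::jsonb @> '{"key": "value"}';
```

Note that the planner only considers the index when the query repeats the indexed expression verbatim.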
>> If you want to do things like substring matching on field values, you
>> might stick with text and use regexp matches:
>>
>> (lower(json_column::text)) ~ 'valuepattern'

In this case a regular index will be ignored, even though, in my opinion, it should be scanned to get the needed information. The criteria I am after return at most 9 rows out of 100k+ records, so the restriction is very good. Wouldn't that be a case for the optimizer to pick the path of least resistance, i.e. the best restriction? Granted, the search uses a lower() function, and the text column being searched is the third in the index, so the match does not really start from the left. But the index starts with show_id and file_id, and those are always part of the key. I can see that once show_id, file_id is no longer a good restriction, the last column will make the difference. Either way, shouldn't that translate into an index scan? Or, for the index to be considered here, does even the last-column search have to match left to right, i.e. no in-between search?

>> or more structural searches:
>>
>> (lower(json_column::text)) ~ '"key": "[^"]*substring[^"]*"'
>>
>> Here, the left-hand expression could be trigram indexed to help with
>> sparse, substring matching without a full index scan. We've had good
>> luck using trigram indexing with regexp matching, though I've honestly
>> never used it for the purpose sketched above...
>>
>> Karl

It seems to me trigram could be the answer, since I got some decent results once I applied it; more to dig into.

Overall, could it be that the optimizer blatantly ignores an index scan that is cheaper than a table scan? Or does the jsonb implementation still have a long way to go, or is the way I am using it simply not what it was designed for?

thanks
Armand
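For reference, the trigram approach Karl sketches can be set up roughly as follows, assuming the `pg_trgm` extension is available; table `events` and column `doc` are again hypothetical names:

```sql
-- Sketch only, assuming pg_trgm and hypothetical names.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram-index the lower-cased text form of the document.
CREATE INDEX events_doc_trgm
    ON events USING gin ( (lower(doc::text)) gin_trgm_ops );

-- Regexp and LIKE substring matches on the same expression
-- can then use the trigram index instead of a full scan.
SELECT *
FROM events
WHERE lower(doc::text) ~ '"key": "[^"]*substring[^"]*"';
```

Trigram GIN indexes support both `LIKE` and regular-expression operators, which is what makes them a fit for the substring-style searches discussed above.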