Thread: Indexing fragments of a column's value ?
I'm asking about the possibility of indexing portions of a column's value where the column has a static field format. Example, a char(8) which contains all hex values (basically a hex number that's always 8 chars wide, leading zeros if needed). Someone might want to select all recs where the first 2 digits are 'ff' or maybe the last 4 hex digits match regexp_match '00[cdef]{2}' or maybe a match of the entire string... "0dd63a87".
If I know the placement and width of the fields that need to be indexed, can indices be defined to facilitate queries ? Example...
- match all 8 chars
- match the 3rd and 4th chars
- match the last 4 chars
I suppose I could fragment the thing into multiple columns for the purposes of a search. So add a column called "last4" as a char(4) that matches the last 4 chars of that column. Then index that. etc... But inquiring to see if there is something more elegant.
Thanks !
David Gauthier <dfgpostgres@gmail.com> writes: > I'm asking about the possibility of indexing portions of a column's value > where the column has a static field format. GIN indexes are meant for exactly that. You might have to write your own opclass to break up the input values in the way you want though. A less difficult answer would be to write a function that breaks up the input into (say) an array of text and then use the existing GIN array support. But you'd pay for that by needing to write more complicated queries to use the index. regards, tom lane
Le ven. 3 nov. 2023 à 21:01, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
David Gauthier <dfgpostgres@gmail.com> writes:
> I'm asking about the possibility of indexing portions of a column's value
> where the column has a static field format.
GIN indexes are meant for exactly that. You might have to write your
own opclass to break up the input values in the way you want though.
A less difficult answer would be to write a function that breaks up
the input into (say) an array of text and then use the existing
GIN array support. But you'd pay for that by needing to write more
complicated queries to use the index.
regards, tom lane
Hello David,
Reading your mail, it seems that your data column contains 3 different kinds of atomic information:
* Characters 1-2
* Characters 3-4
* Characters 5-8
Does it make sense to split this data into 3 separate columns?
Each one could be indexed, and you can rebuild the original thanks to a generated column: https://www.postgresql.org/docs/current/ddl-generated-columns.html
HTH, Thomas
On Fri, Nov 3, 2023 at 3:34 PM David Gauthier <dfgpostgres@gmail.com> wrote:
I'm asking about the possibility of indexing portions of a column's value where the column has a static field format. Example, a char(8) which contains all hex values (basically a hex number that's always 8 chars wide, leading zeros if needed). Someone might want to select all recs where the first 2 digits are 'ff' or maybe the last 4 hex digits match regexp_match '00[cdef]{2}' or maybe a match of the entire string... "0dd63a87".If I know the placement and width of the fields that need to be indexed, can indices be defined to facilitate queries ? Example...- match all 8 chars- match the 3rd and 4th chars- match the last 4 charsI suppose I could fragment the thing into multiple columns for the purposes of a search. So add a column called "last4" as a char(4) that matches the last 4 chars of that column. Then index that. etc... But inquiring to see if there is something more elegant.
Just curious why not store it as a HEX string (functional index, even).
And allow searching against the stringified hex, then your regex works?
And allow searching against the stringified hex, then your regex works?
Kirk
You can index on expressions, and these will be recognised by the query generator. drop schema tmp CASCADE; create schema tmp; set search_path = tmp; CREATE TABLE bagger ( eight CHAR(8) NOT NULL PRIMARY KEY , more text ); CREATE INDEX bagger_idx_12 ON bagger (substr(eight,1,2)); CREATE INDEX bagger_idx_34 ON bagger (substr(eight,3,2)); CREATE INDEX bagger_idx_58 ON bagger (substr(eight,5,4)); INSERT INTO bagger(eight, more) SELECT translate(to_hex( gs), ' ' , '0') , gs::text FROM generate_series(0,4000000000, 64999) gs ; VACUUM ANALYZE bagger; EXPLAIN ANALYZE SELECT * FROM bagger WHERE 1=1 -- AND eight >= '00' AND eight < '05' AND substr(eight, 1,2) >= '30' AND substr(eight,1,2) < '05' AND substr(eight, 3,2) >= '90' AND substr(eight,3,2) < 'A5' AND substr(eight, 5,4) >= '3333' AND substr(eight,5,4) < '4444' ; /*** The optimiser is smart enough to ignore one of the indexes. */ QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on bagger (cost=41.46..59.62 rows=1 width=19) (actual time=0.049..0.056 rows=0 loops=1) Recheck Cond: ((substr((eight)::text, 1, 2) >= '30'::text) AND (substr((eight)::text, 1, 2) < '05'::text) AND (substr((eight)::text, 3, 2) >= '90'::text) AND (substr((eight)::text, 3, 2) < 'A5'::text)) Filter: ((substr((eight)::text, 5, 4) >= '3333'::text) AND (substr((eight)::text, 5, 4) < '4444'::text)) -> BitmapAnd (cost=41.46..41.46 rows=16 width=0) (actual time=0.042..0.047 rows=0 loops=1) -> Bitmap Index Scan on bagger_idx_12 (cost=0.00..4.47 rows=308 width=0) (actual time=0.039..0.039 rows=0 loops=1) Index Cond: ((substr((eight)::text, 1, 2) >= '30'::text) AND (substr((eight)::text, 1, 2) < '05'::text)) -> Bitmap Index Scan on bagger_idx_34 (cost=0.00..36.74 rows=3205 width=0) (never executed) Index Cond: ((substr((eight)::text, 3, 2) >= '90'::text) AND (substr((eight)::text, 3, 2) < 'A5'::text)) Planning Time: 5.487 ms Execution Time: 0.310 ms (10 rows) HTH, AvK