Thread: Indexing fragments of a column's value ?

Indexing fragments of a column's value ?

From
David Gauthier
Date:
I'm asking about the possibility of indexing portions of a column's value where the column has a static field format.  Example, a char(8) which contains all hex values (basically a hex number that's always 8 chars wide, leading zeros if needed). Someone might want to select all recs where the first 2 digits are 'ff' or maybe the last 4 hex digits match regexp_match '00[cdef]{2}' or maybe a match of the entire string... "0dd63a87".

If I know the placement and width of the fields that need to be indexed, can indices be defined to facilitate queries ?  Example...
- match all 8 chars
- match the 3rd and 4th chars
- match the last 4 chars

I suppose I could fragment the thing into multiple columns for the purposes of a search.  So add a column called "last4" as a char(4) that matches the last 4 chars of that column.  Then index that.  etc... But inquiring to see if there is something more elegant.

Thanks !

Re: Indexing fragments of a column's value ?

From
Tom Lane
Date:
David Gauthier <dfgpostgres@gmail.com> writes:
> I'm asking about the possibility of indexing portions of a column's value
> where the column has a static field format.

GIN indexes are meant for exactly that.  You might have to write your
own opclass to break up the input values in the way you want though.

A less difficult answer would be to write a function that breaks up
the input into (say) an array of text and then use the existing
GIN array support.  But you'd pay for that by needing to write more
complicated queries to use the index.

            regards, tom lane



Re: Indexing fragments of a column's value ?

From
Thomas Boussekey
Date:


Le ven. 3 nov. 2023 à 21:01, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
David Gauthier <dfgpostgres@gmail.com> writes:
> I'm asking about the possibility of indexing portions of a column's value
> where the column has a static field format.

GIN indexes are meant for exactly that.  You might have to write your
own opclass to break up the input values in the way you want though.

A less difficult answer would be to write a function that breaks up
the input into (say) an array of text and then use the existing
GIN array support.  But you'd pay for that by needing to write more
complicated queries to use the index.

                        regards, tom lane


Hello David,

Reading your mail, it seems that your data column contains 3 different kinds of atomic information:

* Characters 1-2
* Characters 3-4
* Characters 5-8

Does it make sense to split this data into 3 separate columns?
Each one could be indexed, and you can rebuild the original thanks to a generated column: https://www.postgresql.org/docs/current/ddl-generated-columns.html

HTH, Thomas

Re: Indexing fragments of a column's value ?

From
Kirk Wolak
Date:
On Fri, Nov 3, 2023 at 3:34 PM David Gauthier <dfgpostgres@gmail.com> wrote:
I'm asking about the possibility of indexing portions of a column's value where the column has a static field format.  Example, a char(8) which contains all hex values (basically a hex number that's always 8 chars wide, leading zeros if needed). Someone might want to select all recs where the first 2 digits are 'ff' or maybe the last 4 hex digits match regexp_match '00[cdef]{2}' or maybe a match of the entire string... "0dd63a87".

If I know the placement and width of the fields that need to be indexed, can indices be defined to facilitate queries ?  Example...
- match all 8 chars
- match the 3rd and 4th chars
- match the last 4 chars

I suppose I could fragment the thing into multiple columns for the purposes of a search.  So add a column called "last4" as a char(4) that matches the last 4 chars of that column.  Then index that.  etc... But inquiring to see if there is something more elegant.

Just curious why not store it as a HEX string (functional index, even).
And allow searching against the stringified hex, then your regex works?

Kirk 

Re: Indexing fragments of a column's value ?

From
dld
Date:
You can index on expressions, and these will be recognised by the query 
generator.

drop schema tmp CASCADE;
create schema tmp;
set search_path = tmp;

CREATE TABLE bagger
         ( eight CHAR(8) NOT NULL PRIMARY KEY
         , more text
         );
CREATE INDEX bagger_idx_12 ON bagger (substr(eight,1,2));
CREATE INDEX bagger_idx_34 ON bagger (substr(eight,3,2));
CREATE INDEX bagger_idx_58 ON bagger (substr(eight,5,4));

INSERT INTO bagger(eight, more)
SELECT translate(to_hex( gs), ' ' , '0')
         , gs::text
FROM generate_series(0,4000000000, 64999) gs
         ;

VACUUM ANALYZE bagger;

EXPLAIN ANALYZE
SELECT * FROM bagger
WHERE 1=1
         -- AND eight >= '00' AND eight < '05'
         AND substr(eight, 1,2) >= '30' AND substr(eight,1,2) < '05'
         AND substr(eight, 3,2) >= '90' AND substr(eight,3,2) < 'A5'
         AND substr(eight, 5,4) >= '3333' AND substr(eight,5,4) < '4444'
         ;

/*** The optimiser is smart enough to ignore one of the indexes. */
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on bagger  (cost=41.46..59.62 rows=1 width=19) 
(actual time=0.049..0.056 rows=0 loops=1)
    Recheck Cond: ((substr((eight)::text, 1, 2) >= '30'::text) AND 
(substr((eight)::text, 1, 2) < '05'::text) AND (substr((eight)::text, 3, 
2) >= '90'::text) AND (substr((eight)::text, 3, 2) < 'A5'::text))
    Filter: ((substr((eight)::text, 5, 4) >= '3333'::text) AND 
(substr((eight)::text, 5, 4) < '4444'::text))
    ->  BitmapAnd  (cost=41.46..41.46 rows=16 width=0) (actual 
time=0.042..0.047 rows=0 loops=1)
          ->  Bitmap Index Scan on bagger_idx_12 (cost=0.00..4.47 
rows=308 width=0) (actual time=0.039..0.039 rows=0 loops=1)
                Index Cond: ((substr((eight)::text, 1, 2) >= '30'::text) 
AND (substr((eight)::text, 1, 2) < '05'::text))
          ->  Bitmap Index Scan on bagger_idx_34 (cost=0.00..36.74 
rows=3205 width=0) (never executed)
                Index Cond: ((substr((eight)::text, 3, 2) >= '90'::text) 
AND (substr((eight)::text, 3, 2) < 'A5'::text))
  Planning Time: 5.487 ms
  Execution Time: 0.310 ms
(10 rows)


HTH,

AvK