Thread: Indexing fragments of a column's value ?

Indexing fragments of a column's value ?

From

David Gauthier

Date:

03 November 2023, 19:33:51

I'm asking about the possibility of indexing portions of a column's value where the column has a static field format. Example, a char(8) which contains all hex values (basically a hex number that's always 8 chars wide, leading zeros if needed). Someone might want to select all recs where the first 2 digits are 'ff' or maybe the last 4 hex digits match regexp_match '00[cdef]{2}' or maybe a match of the entire string... "0dd63a87".

If I know the placement and width of the fields that need to be indexed, can indices be defined to facilitate queries ? Example...

- match all 8 chars

- match the 3rd and 4th chars

- match the last 4 chars

I suppose I could fragment the thing into multiple columns for the purposes of a search. So add a column called "last4" as a char(4) that matches the last 4 chars of that column. Then index that. etc... But inquiring to see if there is something more elegant.

Thanks !

Re: Indexing fragments of a column's value ?

From

Tom Lane

Date:

03 November 2023, 20:01:30

David Gauthier <dfgpostgres@gmail.com> writes:
> I'm asking about the possibility of indexing portions of a column's value
> where the column has a static field format.

GIN indexes are meant for exactly that.  You might have to write your
own opclass to break up the input values in the way you want though.

A less difficult answer would be to write a function that breaks up
the input into (say) an array of text and then use the existing
GIN array support.  But you'd pay for that by needing to write more
complicated queries to use the index.

            regards, tom lane

Re: Indexing fragments of a column's value ?

From

Thomas Boussekey

Date:

03 November 2023, 21:07:58

Le ven. 3 nov. 2023 à 21:01, Tom Lane <tgl@sss.pgh.pa.us> a écrit :

David Gauthier <dfgpostgres@gmail.com> writes:
> I'm asking about the possibility of indexing portions of a column's value
> where the column has a static field format.

GIN indexes are meant for exactly that. You might have to write your
own opclass to break up the input values in the way you want though.

A less difficult answer would be to write a function that breaks up
the input into (say) an array of text and then use the existing
GIN array support. But you'd pay for that by needing to write more
complicated queries to use the index.

regards, tom lane

Hello David,

Reading your mail, it seems that your data column contains 3 different kinds of atomic information:

* Characters 1-2

* Characters 3-4

* Characters 5-8

Does it make sense to split this data into 3 separate columns?

Each one could be indexed, and you can rebuild the original thanks to a generated column: https://www.postgresql.org/docs/current/ddl-generated-columns.html

HTH, Thomas

Re: Indexing fragments of a column's value ?

From

Kirk Wolak

Date:

13 November 2023, 19:33:22

On Fri, Nov 3, 2023 at 3:34 PM David Gauthier <dfgpostgres@gmail.com> wrote:

I'm asking about the possibility of indexing portions of a column's value where the column has a static field format. Example, a char(8) which contains all hex values (basically a hex number that's always 8 chars wide, leading zeros if needed). Someone might want to select all recs where the first 2 digits are 'ff' or maybe the last 4 hex digits match regexp_match '00[cdef]{2}' or maybe a match of the entire string... "0dd63a87".

If I know the placement and width of the fields that need to be indexed, can indices be defined to facilitate queries ? Example...
- match all 8 chars
- match the 3rd and 4th chars
- match the last 4 chars

I suppose I could fragment the thing into multiple columns for the purposes of a search. So add a column called "last4" as a char(4) that matches the last 4 chars of that column. Then index that. etc... But inquiring to see if there is something more elegant.

Just curious why not store it as a HEX string (functional index, even).
And allow searching against the stringified hex, then your regex works?

Kirk

Re: Indexing fragments of a column's value ?

From

dld

Date:

14 November 2023, 15:49:46

You can index on expressions, and these will be recognised by the query 
generator.

drop schema tmp CASCADE;
create schema tmp;
set search_path = tmp;

CREATE TABLE bagger
         ( eight CHAR(8) NOT NULL PRIMARY KEY
         , more text
         );
CREATE INDEX bagger_idx_12 ON bagger (substr(eight,1,2));
CREATE INDEX bagger_idx_34 ON bagger (substr(eight,3,2));
CREATE INDEX bagger_idx_58 ON bagger (substr(eight,5,4));

INSERT INTO bagger(eight, more)
SELECT translate(to_hex( gs), ' ' , '0')
         , gs::text
FROM generate_series(0,4000000000, 64999) gs
         ;

VACUUM ANALYZE bagger;

EXPLAIN ANALYZE
SELECT * FROM bagger
WHERE 1=1
         -- AND eight >= '00' AND eight < '05'
         AND substr(eight, 1,2) >= '30' AND substr(eight,1,2) < '05'
         AND substr(eight, 3,2) >= '90' AND substr(eight,3,2) < 'A5'
         AND substr(eight, 5,4) >= '3333' AND substr(eight,5,4) < '4444'
         ;

/*** The optimiser is smart enough to ignore one of the indexes. */
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Bitmap Heap Scan on bagger  (cost=41.46..59.62 rows=1 width=19) 
(actual time=0.049..0.056 rows=0 loops=1)
    Recheck Cond: ((substr((eight)::text, 1, 2) >= '30'::text) AND 
(substr((eight)::text, 1, 2) < '05'::text) AND (substr((eight)::text, 3, 
2) >= '90'::text) AND (substr((eight)::text, 3, 2) < 'A5'::text))
    Filter: ((substr((eight)::text, 5, 4) >= '3333'::text) AND 
(substr((eight)::text, 5, 4) < '4444'::text))
    ->  BitmapAnd  (cost=41.46..41.46 rows=16 width=0) (actual 
time=0.042..0.047 rows=0 loops=1)
          ->  Bitmap Index Scan on bagger_idx_12 (cost=0.00..4.47 
rows=308 width=0) (actual time=0.039..0.039 rows=0 loops=1)
                Index Cond: ((substr((eight)::text, 1, 2) >= '30'::text) 
AND (substr((eight)::text, 1, 2) < '05'::text))
          ->  Bitmap Index Scan on bagger_idx_34 (cost=0.00..36.74 
rows=3205 width=0) (never executed)
                Index Cond: ((substr((eight)::text, 3, 2) >= '90'::text) 
AND (substr((eight)::text, 3, 2) < 'A5'::text))
  Planning Time: 5.487 ms
  Execution Time: 0.310 ms
(10 rows)


HTH,

AvK