Thread: Pattern matching ints

Pattern matching ints

From
Tim Smith
Date:
Hi,

Is there a more efficient way to pattern match integer columns other
than something like :

where cast(mynumber as text) ~ '.*123.*'


I also seem to recall you can't create indexes on casts either ?


Thx

Tim


Re: Pattern matching ints

From
Albe Laurenz
Date:
Tim Smith wrote:
> Is there a more efficient way to pattern match integer columns other
> than something like :
> 
> where cast(mynumber as text) ~ '.*123.*'
> 
> 
> I also seem to recall you can't create indexes on casts either ?

I don't think you can do this without converting the column to a string.

I guess that you can create an index if the cast function is immutable;
at any rate you can create an index on intcolumn::text.

But such an index would not help you with a query like the one you
show above.  The only thing that could speed up such a query would
be a trigram index on the string representation of the value.

Yours,
Laurenz Albe

Re: Pattern matching ints

From
Ian Barwick
Date:
On 26/01/15 20:32, Tim Smith wrote:
> Hi,
>
> Is there a more efficient way to pattern match integer columns other
> than something like :
>
> where cast(mynumber as text) ~ '.*123.*'
>
>
> I also seem to recall you can't create indexes on casts either ?

This is perfectly possible:

    postgres=# CREATE TABLE foo (id INT);
    CREATE TABLE
    postgres=# CREATE INDEX idx ON foo (CAST(id AS TEXT) text_pattern_ops);
    CREATE INDEX
    postgres=# INSERT INTO foo values(generate_series(1,1000000));
    INSERT 0 1000000

but not necessarily useful...

    postgres=# explain analyze SELECT * from foo where id::text ~ '.*12345.*';
                                                  QUERY PLAN
    -------------------------------------------------------------------------------------------------------
     Seq Scan on foo  (cost=0.00..21925.00 rows=100 width=4) (actual time=17.331..961.384 rows=20 loops=1)
       Filter: ((id)::text ~ '.*12345.*'::text)
       Rows Removed by Filter: 999980
     Planning time: 0.296 ms
     Execution time: 961.411 ms
    (5 rows)

However you might find the pg_trgm extension [1] useful:

    postgres=# CREATE TABLE foo (id INT);
    CREATE TABLE
    postgres=# CREATE INDEX trgm_idx ON foo using gist(cast(id as text) gist_trgm_ops);
    CREATE INDEX
    postgres=# INSERT INTO foo values(generate_series(1,1000000));
    INSERT 0 1000000
    postgres=# explain analyze SELECT * from foo where id::text ~ '.*12345.*';
                                                           QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on foo  (cost=200.47..4938.11 rows=5184 width=4) (actual time=61.163..61.211 rows=20 loops=1)
       Recheck Cond: ((id)::text ~ '.*12345.*'::text)
       Heap Blocks: exact=11
       ->  Bitmap Index Scan on trgm_idx  (cost=0.00..199.17 rows=5184 width=0) (actual time=61.140..61.140 rows=20
loops=1)
             Index Cond: ((id)::text ~ '.*12345.*'::text)
     Planning time: 0.241 ms
     Execution time: 61.257 ms
    (7 rows)


[1] http://www.postgresql.org/docs/current/interactive/pgtrgm.html


Regards


Ian Barwick

--
 Ian Barwick                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Pattern matching ints

From
Tim Smith
Date:
Ian,

Re: However you might find the pg_trgm extension [1] useful:

Indeed... pretty awesome.

Thanks !

On 26 January 2015 at 12:55, Ian Barwick <ian@2ndquadrant.com> wrote:
> On 26/01/15 20:32, Tim Smith wrote:
>> Hi,
>>
>> Is there a more efficient way to pattern match integer columns other
>> than something like :
>>
>> where cast(mynumber as text) ~ '.*123.*'
>>
>>
>> I also seem to recall you can't create indexes on casts either ?
>
> This is perfectly possible:
>
>     postgres=# CREATE TABLE foo (id INT);
>     CREATE TABLE
>     postgres=# CREATE INDEX idx ON foo (CAST(id AS TEXT) text_pattern_ops);
>     CREATE INDEX
>     postgres=# INSERT INTO foo values(generate_series(1,1000000));
>     INSERT 0 1000000
>
> but not necessarily useful...
>
>     postgres=# explain analyze SELECT * from foo where id::text ~ '.*12345.*';
>                                                   QUERY PLAN
>     -------------------------------------------------------------------------------------------------------
>      Seq Scan on foo  (cost=0.00..21925.00 rows=100 width=4) (actual time=17.331..961.384 rows=20 loops=1)
>        Filter: ((id)::text ~ '.*12345.*'::text)
>        Rows Removed by Filter: 999980
>      Planning time: 0.296 ms
>      Execution time: 961.411 ms
>     (5 rows)
>
> However you might find the pg_trgm extension [1] useful:
>
>     postgres=# CREATE TABLE foo (id INT);
>     CREATE TABLE
>     postgres=# CREATE INDEX trgm_idx ON foo using gist(cast(id as text) gist_trgm_ops);
>     CREATE INDEX
>     postgres=# INSERT INTO foo values(generate_series(1,1000000));
>     INSERT 0 1000000
>     postgres=# explain analyze SELECT * from foo where id::text ~ '.*12345.*';
>                                                            QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------
>      Bitmap Heap Scan on foo  (cost=200.47..4938.11 rows=5184 width=4) (actual time=61.163..61.211 rows=20 loops=1)
>        Recheck Cond: ((id)::text ~ '.*12345.*'::text)
>        Heap Blocks: exact=11
>        ->  Bitmap Index Scan on trgm_idx  (cost=0.00..199.17 rows=5184 width=0) (actual time=61.140..61.140 rows=20
loops=1)
>              Index Cond: ((id)::text ~ '.*12345.*'::text)
>      Planning time: 0.241 ms
>      Execution time: 61.257 ms
>     (7 rows)
>
>
> [1] http://www.postgresql.org/docs/current/interactive/pgtrgm.html
>
>
> Regards
>
>
> Ian Barwick
>
> --
>  Ian Barwick                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services