using regular expression queries with regular expression indexes - Mailing list pgsql-admin

From Mark Steben
Subject using regular expression queries with regular expression indexes
Date
Msg-id CADyzmyykegZ3ZSVm9hYYKirJYGUYYAx_cK1VQYPCz1q+EpQ89g@mail.gmail.com
Whole thread Raw
Responses Re: using regular expression queries with regular expression indexes
Re: using regular expression queries with regular expression indexes
List pgsql-admin
Good afternoon,
We are currently running postgres version 12.2. I have a query that uses regexp code to confirm if a text field is a date or not.  It is resource intensive and long running.  I have made a regexp index (actually several flavors of regexp indexes) to aid in this queries performance. However I cannot seem to 'convince' the planner to use the indexes I create. 
 Following is table/index construction for your reference:

 \d customertab
                                                Table "public.customertab"
          Column           |            Type             | Collation | Nullable |                     Default                    
---------------------------+-----------------------------+-----------+----------+-------------------------------------------------
 id                        | bigint                      |           | not null | nextval('customer_attributes_id_seq'::regclass)
 customer_attribute_key_id | integer                     |           |          |
 attributable_id           | integer                     |           |          |
 attributable_type         | character varying(255)      |           |          |
 value                     | character varying(255)      |           |          |
 created_at                | timestamp without time zone |           |          |
 updated_at                | timestamp without time zone |           |          |
 group_num                 | character varying(255)      |           |          |
 customer_transaction_id   | integer                     |           |          |
 account_id                | integer                     |           |          |
Indexes:
    "primets_custattr_pkey" PRIMARY KEY, btree (id)
    "custattr_account_id" btree (account_id)
    "custattr_value" btree (value) WHERE value::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text AND value::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text AND account_id IS NOT NULL
   
    "custattr_value_acctid_keyid" btree (value, account_id, customer_attribute_key_id) WHERE value::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text AND value::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text AND account_id IS NOT NULL
    
"custattr_value_includes" btree (value) INCLUDE (id, account_id, group_num) WHERE value::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text AND value::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text AND account_id IS NOT NULL

The query I am attempting to run:

explain select * FROM customertab ca
                 where ca.value::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
                 AND ca.value::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text
                   AND ca.account_id IS NOT NULL

and the explain:
 Bitmap Heap Scan on customertab ca  (cost=9907.40..903793.74 rows=479041 width=95)
   Recheck Cond: (((value)::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text) AND ((value)::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0
-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text) AND (accou
nt_id IS NOT NULL))
   ->  Bitmap Index Scan on custattr_value  (cost=0.00..9787.64 rows=479041 width=0)
(3 rows)


Finally if I remove the 'ca.account_id is not null' expression

  explain select * FROM customertab ca
                 where ca.value::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text
                 AND ca.value::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text  

This is the explain:
 Gather  (cost=1000.00..1641792.30 rows=479041 width=95)
   Workers Planned: 2

   ->  Parallel Seq Scan on customertab ca  (cost=0.00..1592888.20 rows=199600 width=95)
         Filter: (((value)::text ~ '^([0-9][0-9][0-9][0-9])-(([0]?[1-9])|([1][0-2]))-(([0-2]?[0-9])|([3][0-1]))$'::text) AND ((value)::text ~ '([0-9][0-9][0-9][0-9])-(0[13578]|1[02])-(0[1-9]|[12][0-9]|3[01])|([0-9][0-9][0
-9][0-9])-(0[469]|11])-(0[1-9]|[12][0-9]|30)|([0-9][0-9][0248][048]|[0-9][0-9][13579][26])-(02)-(0[1-9]|1[0-9]|2[0-9])|([0-9][0-9][0248][1235679]|[0-9][0-9][13579][01345789])-(02)-(0[1-9]|1[0-9]|2[0-8])'::text))
(4 rows)


Is there something else I can try, or am I missing something?  Comments welcome.





   

--
Mark Steben
 Database Administrator
@utoRevenue | Autobase 
  CRM division of Dominion Dealer Solutions 
95D Ashley Ave.
West Springfield, MA 01089

t: 413.327-3045
f: 413.383-9567

www.fb.com/DominionDealerSolutions
www.twitter.com/DominionDealer
 www.drivedominion.com





pgsql-admin by date:

Previous
From: Thomas Kellerer
Date:
Subject: Re: Performance of SELECT directly and from application
Next
From: Tom Lane
Date:
Subject: Re: using regular expression queries with regular expression indexes