Thread: Custom Operator for citext LIKE predicates question

Custom Operator for citext LIKE predicates question

From
"Efrain J. Berdecia"
Date:
After attempting to use gin and gist indexes for our queries that run against citext columns, our team has come up with the following to make our queries run from 2 mins to 25ms;
CREATE EXTENSION pg_trgm
CREATE EXTENSION btree_gin --may not be needed, checking

CREATE OPERATOR CLASS gin_trgm_ops_ci_new
FOR TYPE citext USING gin
AS
OPERATOR 1 % (text, text),
FUNCTION 1 btint4cmp (int4, int4),
FUNCTION 2 gin_extract_value_trgm (text, internal),
FUNCTION 3 gin_extract_query_trgm (text, internal, int2, internal, internal, internal, internal),
FUNCTION 4 gin_trgm_consistent (internal,int2, text, int4, internal, internal, internal, internal),
STORAGE int4;

ALTER OPERATOR FAMILY gin_trgm_ops_ci_new USING gin ADD
OPERATOR 3 ~~ (citext, citext),
OPERATOR 4 ~~* (citext, citext);
ALTER OPERATOR FAMILY gin_trgm_ops_ci_new USING gin ADD
OPERATOR 7 %> (text, text),
FUNCTION 6 (text,text) gin_trgm_triconsistent (internal, int2, text, int4, internal, internal, internal);


Our question is, does anyone see any flaw on this? 

Also, could this not be incorporated into postgres natively?

I'm posting the old and new explain plans;

New explain;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------
 Aggregate  (cost=874327.76..874327.77 rows=1 width=8) (actual time=21.952..21.954 rows=1 loops=1)
->  Nested Loop  (cost=1620.95..874284.13 rows=17449 width=0) (actual time=6.259..21.948 rows=9 loops=1)
->  Bitmap Heap Scan on t775 b1  (cost=1620.39..525029.25 rows=45632 width=35) (actual time=6.212..8.189 rows=13 loops=1)
Recheck Cond: ((c240001002 ~~ 'smp%'::citext) OR (c200000020 ~~ 'smp%'::citext) OR (c200000001 ~~ 'smp%'::citext))
Rows Removed by Index Recheck: 259
Filter: ((c400079600 <> 'ABC_BUSINESSSERVICE'::citext) AND (c400127400 = 'ABC.ASSET'::citext) AND ((c1000000001 = 'Mrictton Global'::citext) OR (c1000000001 = 'ABCOpsMonitoring'::citext) OR (c100000
0001 = 'Mrictton'::citext) OR (c1000000001 = 'Mrictton EITTE'::citext) OR (c1000000001 = 'Mrictton Finance'::citext) OR (c1000000001 = 'Mrictton Generic Services and Support'::citext) OR (c1000000001 = 'Mrictton G
lobal'::citext) OR (c1000000001 = 'Mrictton Global Demo Solutions'::citext) OR (c1000000001 = 'Mrictton HR Direct'::citext) OR (c1000000001 = 'Mrictton Marketing and Communications'::citext) OR (c1000000001 = 'Eri
csson Master Data Management'::citext) OR (c1000000001 = 'Mrictton OHS'::citext) OR (c1000000001 = 'Mrictton Patents and Licensing'::citext) OR (c1000000001 = 'Mrictton Sales'::citext) OR (c1000000001 = 'Mrictton
Security'::citext) OR (c1000000001 = 'Mrictton Shared Services'::citext) OR (c1000000001 = 'Mrictton Sourcing'::citext) OR (c1000000001 = 'Mrictton Supply ROD'::citext) OR (c1000000001 = 'Mrictton SW Supply Operat
ions'::citext) OR (c1000000001 = 'Remedy,a ABC Software Company'::citext)) AND (c400079600 = ANY ('{ABC_DATABASE,ABC_ACCOUNT,ABC_MEDIA,ABC.CORE:ABC_CONCRETECOLLECTION,ABC_PACKAGE,ABC_BIOS,ABC_SYSTEMSOFTWARE,ABC_KEYBOARD,ABC_LAN,ABC_LOGICALSYSTEMCOMPONENT,ABC_LNSGROUP,ABC_PHYSICALLOCATION,ABC_FLOPPYDRIVE,ABC_DOCUMENT,ABC_BUSINESSSERVICE,ABC_DATABASESTORAGE,ABC_NETWORKPORT,ABC_VIRTUALSYSTEMENABLER,ABC_POINTINGDEVICE,ABC_PRINTER,ABC_SYSTEMRESOURCE,ABC_CONNECTIVITYSEGMENT,ABC.CORE:ABC_BUSINESSPROCESS,ABC_PROTOCOLENDPOINT,ABC_TRANSACTION,ABC_APPLICATIONINFRASTRUCTURE,ABC_SOFTWARESERVER,ABC_UPS,ABC_ACTIVITY,ABC_CDROMDRIVE,ABC.CORE:ABC_RASD,ABC_PRODUCT,ABC_REMOTEFILESYSTEM,ABC_IPENDPOINT,ABC_LOCALFILESYSTEM,ABC_APPLICATION,ABC_IPCONNECTIVITYSUBNET,ABC_CLUSTER,ABC_CHASSIS,ABC_WAN,ABC_PATCH,ABC_ADMINDOMAIN,ABC.CORE:ABC_RESOURCEPOOL,ABC_IPXCONNECTIVITYNETWORK,ABC_HARDWARESYSTEMCOMPONENT,ABC_FILESYSTEM,ABC_MONITOR,ABC_CONNECTIVITYGROUP,ABC_EQUIPMENT,ABC_MAINFRAME,ABC_RACK,ABC_OPERATINGSYSTEM,ABC_PROCESSOR,ABC_SHARE,ABC_LANENDPOINT,ABC_HARDWAREPACKAGE,ABC_TAPEDRIVE,ABC_COMMUNICATIONENDPOINT,ABC_APPLICATIONSYSTEM,ABC_CARD,ABC_DISKPARTITION,ABC.CORE:ABC_VIRTUALSYSTEMSETTINGDATA,ABC_MEMORY,ABC_NTDOMAIN,ABC_COMPUTERSYSTEM,ABC_DISKDRIVE,ABC_SERVICEOFFERINGINSTANCE,ABC_ROLE,ABC_APPLICATIONSERVICE}'::citext[])))
Rows Removed by Filter: 62
Heap Blocks: exact=313
->  BitmapOr  (cost=1620.39..1620.39 rows=163489 width=0) (actual time=5.703..5.704 rows=0 loops=1)
->  Bitmap Index Scan on oto2  (cost=0.00..528.72 rows=54496 width=0) (actual time=0.724..0.724 rows=41 loops=1)
Index Cond: (c240001002 ~~ 'smp%'::citext)
->  Bitmap Index Scan on oto3  (cost=0.00..528.72 rows=54496 width=0) (actual time=4.852..4.852 rows=331 loops=1)
Index Cond: (c200000020 ~~ 'smp%'::citext)
->  Bitmap Index Scan on oto4  (cost=0.00..528.72 rows=54496 width=0) (actual time=0.127..0.127 rows=0 loops=1)
Index Cond: (c200000001 ~~ 'smp%'::citext)
->  Index Scan using i1279_0_400129200_t1279 on t1279 b2  (cost=0.56..7.64 rows=1 width=35) (actual time=1.057..1.058 rows=1 loops=13)
Index Cond: (c400129200 = b1.c400129200)
Filter: ((c7 <> 6) AND (c7 <> 8))
Rows Removed by Filter: 0
Planning Time: 2.478 ms
Execution Time: 22.059 ms
(21 rows)

Time: 26.510 ms

Old explain with slow plan;

QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1926420.44..1926420.70 rows=102 width=1199) (actual time=16396.091..16569.194 rows=9 loops=1)
->  Sort  (cost=1926420.44..1926458.76 rows=15326 width=1199) (actual time=16396.089..16569.190 rows=9 loops=1)
Sort Key: b1.c200000020 NULLS FIRST, ((concat((concat(b1.c1, '|'))::citext, COALESCE(b2.c1, ''::citext)))::citext)
Sort Method: quicksort  Memory: 29kB
->  WindowAgg  (cost=1000.56..1925832.51 rows=15326 width=1199) (actual time=16396.025..16569.138 rows=9 loops=1)
->  Gather  (cost=1000.56..1925564.30 rows=15326 width=1191) (actual time=4288.742..16569.068 rows=9 loops=1)
Workers Planned: 6
Workers Launched: 6
->  Nested Loop  (cost=0.56..1923031.70 rows=2554 width=1191) (actual time=9430.362..16387.794 rows=1 loops=7)
->  Parallel Seq Scan on t1279 b2  (cost=0.00..530806.15 rows=416134 width=910) (actual time=0.016..575.311 rows=353200 loops=7)
Filter: ((c7 <> 6) AND (c7 <> 8))
Rows Removed by Filter: 574840
->  Index Scan using efrain_test_ix_t775_2 on t775 b1  (cost=0.56..3.34 rows=1 width=316) (actual time=0.044..0.044 rows=0 loops=2472402)
Index Cond: ((c400129200 = b2.c400129200) AND (c400127400 = 'ABC.ASSET'::citext))
Filter: ((c400079600 <> 'ABC_BUSINESSSERVICE'::citext) AND ((c240001002 ~~ 'smp%'::citext) OR (c200000020 ~~ 'smp%'::citext) OR (c200000001 ~~ 'smp%'::citext)) AND ((c1000000001 =
'Mrictton Global'::citext) OR (c1000000001 = 'ABCOpsMonitoring'::citext) OR (c1000000001 = 'Mrictton'::citext) OR (c1000000001 = 'Mrictton EITTE'::citext) OR (c1000000001 = 'Mrictton Finance'::citext) OR (c1000000001 = 'Mrictton Generic Services and Support'::citext) OR (c1000000001 = 'Mrictton Global'::citext) OR (c1000000001 = 'Mrictton Global Demo Solutions'::citext) OR (c1000000001 = 'Mrictton HR Direct'::citext) OR (c1000000001 = 'Mrictton Marketing and Communications'::citext) OR (c1000000001 = 'Mrictton Master Data Management'::citext) OR (c1000000001 = 'Mrictton OHS'::citext) OR (c1000000001 = 'Mrictton Patents and Licensing'::citext) OR (c1000000001 = 'Mrictton Sales'::citext) OR (c1000000001 = 'Mrictton Security'::citext) OR (c1000000001 = 'Mrictton Shared Services'::citext) OR (c1000000001 = 'Mrictton Sourcing'::citext) OR (c1000000001 = 'Mrictton Supply ROD'::citext) OR (c1000000001 = 'Mrictton SW Supply Operations'::citext) OR (c1000000001 = 'Remedy,a ABC Software Company'::citext)) AND (c400079600 = ANY ('{ABC_DATABASE,ABC_ACCOUNT,ABC_MEDIA,ABC.CORE:ABC_CONCRETECOLLECTION,ABC_PACKAGE,ABC_BIOS,ABC_SYSTEMSOFTWARE,ABC_KEYBOARD,ABC_LAN,ABC_LOGICALSYSTEMCOMPONENT,ABC_LNSGROUP,ABC_PHYSICALLOCATION,ABC_FLOPPYDRIVE,ABC_DOCUMENT,ABC_BUSINESSSERVICE,ABC_DATABASESTORAGE,ABC_NETWORKPORT,ABC_VIRTUALSYSTEMENABLER,ABC_POINTINGDEVICE,ABC_PRINTER,ABC_SYSTEMRESOURCE,ABC_CONNECTIVITYSEGMENT,ABC.CORE:ABC_BUSINESSPROCESS,ABC_PROTOCOLENDPOINT,ABC_TRANSACTION,ABC_APPLICATIONINFRASTRUCTURE,ABC_SOFTWARESERVER,ABC_UPS,ABC_ACTIVITY,ABC_CDROMDRIVE,ABC.CORE:ABC_RASD,ABC_PRODUCT,ABC_REMOTEFILESYSTEM,ABC_IPENDPOINT,ABC_LOCALFILESYSTEM,ABC_APPLICATION,ABC_IPCONNECTIVITYSUBNET,ABC_CLUSTER,ABC_CHASSIS,ABC_WAN,ABC_PATCH,ABC_ADMINDOMAIN,ABC.CORE:ABC_RESOURCEPOOL,ABC_IPXCONNECTIVITYNETWORK,ABC_HARDWARESYSTEMCOMPONENT,ABC_FILESYSTEM,ABC_MONITOR,ABC_CONNECTIVITYGROUP,ABC_EQUIPMENT,ABC_MAINFRAME,ABC_RACK,ABC_OPERATINGSYSTEM,ABC_PROCESSOR,ABC_SHARE,ABC_LANENDPOINT,ABC_HARDWAREPACKAGE,ABC_TAPEDRIVE,ABC_COMMUNICATIONENDPOINT,ABC_APPLICATIONSYSTEM,ABC_CARD,ABC_DISKPARTITION,ABC.CORE:ABC_VIRTUALSYSTEMSETTINGDATA,ABC_MEMORY,
ABC_NTDOMAIN,ABC_COMPUTERSYSTEM,ABC_DISKDRIVE,ABC_SERVICEOFFERINGINSTANCE,ABC_ROLE,ABC_APPLICATIONSERVICE}'::citext[])))
Rows Removed by Filter: 1
Planning Time: 3.205 ms
Execution Time: 16569.351 ms
(18 rows)

Time: 16577.806 ms (00:16.578)

Products
PostgreSQL Community Edition
Product Version
PostgreSQL 12


Thanks.

Re: Custom Operator for citext LIKE predicates question

From
Tom Lane
Date:
"Efrain J. Berdecia" <ejberdecia@yahoo.com> writes:
> After attempting to use gin and gist indexes for our queries that run against citext columns, our team has come up
withthe following to make our queries run from 2 mins to 25ms;CREATE EXTENSION pg_trgmCREATE EXTENSION btree_gin --may
notbe needed, checking 
> CREATE OPERATOR CLASS gin_trgm_ops_ci_newFOR TYPE citext USING ginASOPERATOR 1 % (text, text),FUNCTION 1 btint4cmp
(int4,int4),FUNCTION 2 gin_extract_value_trgm (text, internal),FUNCTION 3 gin_extract_query_trgm (text, internal, int2,
internal,internal, internal, internal),FUNCTION 4 gin_trgm_consistent (internal,int2, text, int4, internal, internal,
internal,internal),STORAGE int4; 
> ALTER OPERATOR FAMILY gin_trgm_ops_ci_new USING gin ADDOPERATOR 3 ~~ (citext, citext),OPERATOR 4 ~~* (citext,
citext);ALTEROPERATOR FAMILY gin_trgm_ops_ci_new USING gin ADDOPERATOR 7 %> (text, text),FUNCTION 6 (text,text)
gin_trgm_triconsistent(internal, int2, text, int4, internal, internal, internal); 

> Our question is, does anyone see any flaw on this? 

Umm ... does it actually work?  I'd expect that you get case-sensitive
comparison behavior in such an index, because those support functions
are for plain text and they're not going to know that you'd like
case-insensitive behavior.

You generally can't make a new gin or gist opclass without actually
writing some C code, because the support functions embody all
the semantics of the operators.

            regards, tom lane



Re: Custom Operator for citext LIKE predicates question

From
"Efrain J. Berdecia"
Date:
Thank you for the feedback.

In our setup it has actually worked per the explains provided making the query run in milliseconds instead of seconds.

We weren't sure if this should be something that could be added natively with future Postgres deployments.

Thanks,
Efrain J. Berdecia


On Thursday, January 13, 2022, 12:58:27 AM EST, Tom Lane <tgl@sss.pgh.pa.us> wrote:


"Efrain J. Berdecia" <ejberdecia@yahoo.com> writes:

> After attempting to use gin and gist indexes for our queries that run against citext columns, our team has come up with the following to make our queries run from 2 mins to 25ms;CREATE EXTENSION pg_trgmCREATE EXTENSION btree_gin --may not be needed, checking
> CREATE OPERATOR CLASS gin_trgm_ops_ci_newFOR TYPE citext USING ginASOPERATOR 1 % (text, text),FUNCTION 1 btint4cmp (int4, int4),FUNCTION 2 gin_extract_value_trgm (text, internal),FUNCTION 3 gin_extract_query_trgm (text, internal, int2, internal, internal, internal, internal),FUNCTION 4 gin_trgm_consistent (internal,int2, text, int4, internal, internal, internal, internal),STORAGE int4;
> ALTER OPERATOR FAMILY gin_trgm_ops_ci_new USING gin ADDOPERATOR 3 ~~ (citext, citext),OPERATOR 4 ~~* (citext, citext);ALTER OPERATOR FAMILY gin_trgm_ops_ci_new USING gin ADDOPERATOR 7 %> (text, text),FUNCTION 6 (text,text) gin_trgm_triconsistent (internal, int2, text, int4, internal, internal, internal);

> Our question is, does anyone see any flaw on this? 


Umm ... does it actually work?  I'd expect that you get case-sensitive
comparison behavior in such an index, because those support functions
are for plain text and they're not going to know that you'd like
case-insensitive behavior.

You generally can't make a new gin or gist opclass without actually
writing some C code, because the support functions embody all
the semantics of the operators.

            regards, tom lane

Re: Custom Operator for citext LIKE predicates question

From
Tom Lane
Date:
"Efrain J. Berdecia" <ejberdecia@yahoo.com> writes:
> In our setup it has actually worked per the explains provided making the query run in milliseconds instead of
seconds.

To me, "work" includes "get the right answer".  I do not think you
are getting the same answers that citext would normally provide.
If you don't care about case-insensitivity, why don't you just
use plain text?

            regards, tom lane



Re: Custom Operator for citext LIKE predicates question

From
"Efrain J. Berdecia"
Date:
Good points. At least on the limited testing we did, we were able to get the same answer back with both executions; at least for the use cases we tested. 

We are still doing more thourough testing.

This is an application that is been ported from MS SQL server to postgres and apparently the migration dba team determined citext was the way to go to maintain MSSQL existing usage of the data in the columns.


Thanks,
Efrain J. Berdecia


On Thursday, January 13, 2022, 10:10:38 AM EST, Tom Lane <tgl@sss.pgh.pa.us> wrote:


"Efrain J. Berdecia" <ejberdecia@yahoo.com> writes:
> In our setup it has actually worked per the explains provided making the query run in milliseconds instead of seconds.

To me, "work" includes "get the right answer".  I do not think you
are getting the same answers that citext would normally provide.
If you don't care about case-insensitivity, why don't you just
use plain text?


            regards, tom lane