Re: citext performance - Mailing list pgsql-performance

From Nandakumar M
Subject Re: citext performance
Date
Msg-id CANcFUu6vPXrWNtBXYw-usj2OsRtFrAKnP=s-QZ0teeFcN4gk5g@mail.gmail.com
Whole thread Raw
In response to citext performance  (Deepak Somaiya <deepsom@yahoo.com>)
Responses Re: citext performance
List pgsql-performance
Hi,

I have also faced the same problem with citext extension. It does not
use index when thereby making it almost unusable. The problem has to
do with how collation is handled from what I have read in old threads
in postgres mailing list (please refer
https://dba.stackexchange.com/questions/105244/index-on-column-with-data-type-citext-not-used/105250#105250
).

Regards,
Nanda

On Fri, Apr 6, 2018 at 10:21 PM, Deepak Somaiya <deepsom@yahoo.com> wrote:
>
> Folks,
>  I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is
notperforming better than lower.Am I missing something? help is appreciated. 
>
> PostgreSQL: Documentation: 9.6: citext
>
>
>
>
> "citext is not as efficient as text because the operator functions and the B-tree comparison functions must make
copiesof the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using
lowerto get case-insensitive matching." 
>
>
> Here is what I have done
>
> drop table test;
> drop table testci;
>
> CREATE TABLE test (
> id INTEGER PRIMARY KEY,
> name character varying(254)
> );
> CREATE TABLE testci (
> id INTEGER PRIMARY KEY,
> name citext
>
> );
>
> INSERT INTO test(id, name)
> SELECT generate_series(1000001,2000000), (md5(random()::text));
>
> INSERT INTO testci(id, name)
> SELECT generate_series(1,1000000), (md5(random()::text));
>
>
> Now, I have done sequential search
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
> - Plan:
>     Node Type: "Seq Scan"
>     Parallel Aware: false
>     Relation Name: "test"
>     Alias: "test"
>     Startup Cost: 0.00
>     Total Cost: 23334.00
>     Plan Rows: 5000
>     Plan Width: 37
>     Actual Startup Time: 0.016
>     Actual Total Time: 680.199
>     Actual Rows: 1
>     Actual Loops: 1
>     Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>     Rows Removed by Filter: 999999
>   Planning Time: 0.045
>   Triggers:
>   Execution Time: 680.213
>
>
> explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
> - Plan:
>     Node Type: "Seq Scan"
>     Parallel Aware: false
>     Relation Name: "testci"
>     Alias: "testci"
>     Startup Cost: 0.00
>     Total Cost: 20834.00
>     Plan Rows: 1
>     Plan Width: 37
>     Actual Startup Time: 0.017
>     Actual Total Time: 1184.485
>     Actual Rows: 1
>     Actual Loops: 1
>     Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>     Rows Removed by Filter: 999999
>   Planning Time: 0.029
>   Triggers:
>   Execution Time: 1184.496
>
>
>
> You can see sequential searches with lower working twice as fast as citext.
>
> Now I added index on citext and equivalent functional index (lower) on text.
>
>
> CREATE INDEX textlowerindex ON test (lower(name));
> create index textindex on test(name);
>
>
> Index creation took longer with citext v/s creating lower functional index.
>
>
> Now here comes execution with indexes
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>     Node Type: "Bitmap Heap Scan"
>     Parallel Aware: false
>     Relation Name: "test"
>     Alias: "test"
>     Startup Cost: 187.18
>     Total Cost: 7809.06
>     Plan Rows: 5000
>     Plan Width: 37
>     Actual Startup Time: 0.020
>     Actual Total Time: 0.020
>     Actual Rows: 1
>     Actual Loops: 1
>     Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>     Rows Removed by Index Recheck: 0
>     Exact Heap Blocks: 1
>     Lossy Heap Blocks: 0
>     Plans:
>       - Node Type: "Bitmap Index Scan"
>         Parent Relationship: "Outer"
>         Parallel Aware: false
>         Index Name: "textlowerindex"
>         Startup Cost: 0.00
>         Total Cost: 185.93
>         Plan Rows: 5000
>         Plan Width: 0
>         Actual Startup Time: 0.016
>         Actual Total Time: 0.016
>         Actual Rows: 1
>         Actual Loops: 1
>         Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>   Planning Time: 0.051
>   Triggers:
>   Execution Time: 0.035
>
>
>
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>     Node Type: "Index Scan"
>     Parallel Aware: false
>     Scan Direction: "Forward"
>     Index Name: "citextindex"
>     Relation Name: "testci"
>     Alias: "testci"
>     Startup Cost: 0.42
>     Total Cost: 8.44
>     Plan Rows: 1
>     Plan Width: 37
>     Actual Startup Time: 0.049
>     Actual Total Time: 0.050
>     Actual Rows: 1
>     Actual Loops: 1
>     Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>     Rows Removed by Index Recheck: 0
>   Planning Time: 0.051
>   Triggers:
>   Execution Time: 0.064
>
>
> Deepak


pgsql-performance by date:

Previous
From: Deepak Somaiya
Date:
Subject: citext performance
Next
From: Deepak Somaiya
Date:
Subject: Re: citext performance