Thread: citext performance

citext performance

From
Deepak Somaiya
Date:
Folks,
 I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is not performing better than lower.Am I missing something? help is appreciated.




"citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using lower to get case-insensitive matching."


Here is what I have done 

drop table test;
drop table testci;

CREATE TABLE test (
id INTEGER PRIMARY KEY,
name character varying(254)
);
CREATE TABLE testci (
id INTEGER PRIMARY KEY,
name citext

);

INSERT INTO test(id, name)
SELECT generate_series(1000001,2000000), (md5(random()::text));

INSERT INTO testci(id, name)
SELECT generate_series(1,1000000), (md5(random()::text));


Now, I have done sequential search

explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
- Plan: 
    Node Type: "Seq Scan"
    Parallel Aware: false
    Relation Name: "test"
    Alias: "test"
    Startup Cost: 0.00
    Total Cost: 23334.00
    Plan Rows: 5000
    Plan Width: 37
    Actual Startup Time: 0.016
    Actual Total Time: 680.199
    Actual Rows: 1
    Actual Loops: 1
    Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
    Rows Removed by Filter: 999999
  Planning Time: 0.045
  Triggers: 
  Execution Time: 680.213


explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
- Plan: 
    Node Type: "Seq Scan"
    Parallel Aware: false
    Relation Name: "testci"
    Alias: "testci"
    Startup Cost: 0.00
    Total Cost: 20834.00
    Plan Rows: 1
    Plan Width: 37
    Actual Startup Time: 0.017
    Actual Total Time: 1184.485
    Actual Rows: 1
    Actual Loops: 1
    Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
    Rows Removed by Filter: 999999
  Planning Time: 0.029
  Triggers: 
  Execution Time: 1184.496



You can see sequential searches with lower working twice as fast as citext.

Now I added index on citext and equivalent functional index (lower) on text.


CREATE INDEX textlowerindex ON test (lower(name));
create index textindex on test(name);


Index creation took longer with citext v/s creating lower functional index.


Now here comes execution with indexes

explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');

- Plan: 
    Node Type: "Bitmap Heap Scan"
    Parallel Aware: false
    Relation Name: "test"
    Alias: "test"
    Startup Cost: 187.18
    Total Cost: 7809.06
    Plan Rows: 5000
    Plan Width: 37
    Actual Startup Time: 0.020
    Actual Total Time: 0.020
    Actual Rows: 1
    Actual Loops: 1
    Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
    Rows Removed by Index Recheck: 0
    Exact Heap Blocks: 1
    Lossy Heap Blocks: 0
    Plans: 
      - Node Type: "Bitmap Index Scan"
        Parent Relationship: "Outer"
        Parallel Aware: false
        Index Name: "textlowerindex"
        Startup Cost: 0.00
        Total Cost: 185.93
        Plan Rows: 5000
        Plan Width: 0
        Actual Startup Time: 0.016
        Actual Total Time: 0.016
        Actual Rows: 1
        Actual Loops: 1
        Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
  Planning Time: 0.051
  Triggers: 
  Execution Time: 0.035




explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');

- Plan: 
    Node Type: "Index Scan"
    Parallel Aware: false
    Scan Direction: "Forward"
    Index Name: "citextindex"
    Relation Name: "testci"
    Alias: "testci"
    Startup Cost: 0.42
    Total Cost: 8.44
    Plan Rows: 1
    Plan Width: 37
    Actual Startup Time: 0.049
    Actual Total Time: 0.050
    Actual Rows: 1
    Actual Loops: 1
    Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
    Rows Removed by Index Recheck: 0
  Planning Time: 0.051
  Triggers: 
  Execution Time: 0.064


Deepak

Re: citext performance

From
Nandakumar M
Date:
Hi,

I have also faced the same problem with citext extension. It does not
use index when thereby making it almost unusable. The problem has to
do with how collation is handled from what I have read in old threads
in postgres mailing list (please refer
https://dba.stackexchange.com/questions/105244/index-on-column-with-data-type-citext-not-used/105250#105250
).

Regards,
Nanda

On Fri, Apr 6, 2018 at 10:21 PM, Deepak Somaiya <deepsom@yahoo.com> wrote:
>
> Folks,
>  I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is
notperforming better than lower.Am I missing something? help is appreciated. 
>
> PostgreSQL: Documentation: 9.6: citext
>
>
>
>
> "citext is not as efficient as text because the operator functions and the B-tree comparison functions must make
copiesof the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using
lowerto get case-insensitive matching." 
>
>
> Here is what I have done
>
> drop table test;
> drop table testci;
>
> CREATE TABLE test (
> id INTEGER PRIMARY KEY,
> name character varying(254)
> );
> CREATE TABLE testci (
> id INTEGER PRIMARY KEY,
> name citext
>
> );
>
> INSERT INTO test(id, name)
> SELECT generate_series(1000001,2000000), (md5(random()::text));
>
> INSERT INTO testci(id, name)
> SELECT generate_series(1,1000000), (md5(random()::text));
>
>
> Now, I have done sequential search
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
> - Plan:
>     Node Type: "Seq Scan"
>     Parallel Aware: false
>     Relation Name: "test"
>     Alias: "test"
>     Startup Cost: 0.00
>     Total Cost: 23334.00
>     Plan Rows: 5000
>     Plan Width: 37
>     Actual Startup Time: 0.016
>     Actual Total Time: 680.199
>     Actual Rows: 1
>     Actual Loops: 1
>     Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>     Rows Removed by Filter: 999999
>   Planning Time: 0.045
>   Triggers:
>   Execution Time: 680.213
>
>
> explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
> - Plan:
>     Node Type: "Seq Scan"
>     Parallel Aware: false
>     Relation Name: "testci"
>     Alias: "testci"
>     Startup Cost: 0.00
>     Total Cost: 20834.00
>     Plan Rows: 1
>     Plan Width: 37
>     Actual Startup Time: 0.017
>     Actual Total Time: 1184.485
>     Actual Rows: 1
>     Actual Loops: 1
>     Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>     Rows Removed by Filter: 999999
>   Planning Time: 0.029
>   Triggers:
>   Execution Time: 1184.496
>
>
>
> You can see sequential searches with lower working twice as fast as citext.
>
> Now I added index on citext and equivalent functional index (lower) on text.
>
>
> CREATE INDEX textlowerindex ON test (lower(name));
> create index textindex on test(name);
>
>
> Index creation took longer with citext v/s creating lower functional index.
>
>
> Now here comes execution with indexes
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>     Node Type: "Bitmap Heap Scan"
>     Parallel Aware: false
>     Relation Name: "test"
>     Alias: "test"
>     Startup Cost: 187.18
>     Total Cost: 7809.06
>     Plan Rows: 5000
>     Plan Width: 37
>     Actual Startup Time: 0.020
>     Actual Total Time: 0.020
>     Actual Rows: 1
>     Actual Loops: 1
>     Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>     Rows Removed by Index Recheck: 0
>     Exact Heap Blocks: 1
>     Lossy Heap Blocks: 0
>     Plans:
>       - Node Type: "Bitmap Index Scan"
>         Parent Relationship: "Outer"
>         Parallel Aware: false
>         Index Name: "textlowerindex"
>         Startup Cost: 0.00
>         Total Cost: 185.93
>         Plan Rows: 5000
>         Plan Width: 0
>         Actual Startup Time: 0.016
>         Actual Total Time: 0.016
>         Actual Rows: 1
>         Actual Loops: 1
>         Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>   Planning Time: 0.051
>   Triggers:
>   Execution Time: 0.035
>
>
>
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>     Node Type: "Index Scan"
>     Parallel Aware: false
>     Scan Direction: "Forward"
>     Index Name: "citextindex"
>     Relation Name: "testci"
>     Alias: "testci"
>     Startup Cost: 0.42
>     Total Cost: 8.44
>     Plan Rows: 1
>     Plan Width: 37
>     Actual Startup Time: 0.049
>     Actual Total Time: 0.050
>     Actual Rows: 1
>     Actual Loops: 1
>     Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>     Rows Removed by Index Recheck: 0
>   Planning Time: 0.051
>   Triggers:
>   Execution Time: 0.064
>
>
> Deepak


Re: citext performance

From
Deepak Somaiya
Date:
It is using index here , it is just that performance i.e query that use functional index (one with lower) is performing better then index created on citext column.

Deepak

On Sunday, April 8, 2018, 3:13:26 AM PDT, Nandakumar M <m.nanda92@gmail.com> wrote:


Hi,

I have also faced the same problem with citext extension. It does not
use index when thereby making it almost unusable. The problem has to
do with how collation is handled from what I have read in old threads
in postgres mailing list (please refer
https://dba.stackexchange.com/questions/105244/index-on-column-with-data-type-citext-not-used/105250#105250
).

Regards,
Nanda

On Fri, Apr 6, 2018 at 10:21 PM, Deepak Somaiya <deepsom@yahoo.com> wrote:
>
> Folks,
>  I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is not performing better than lower.Am I missing something? help is appreciated.
>
> PostgreSQL: Documentation: 9.6: citext
>
>
>
>
> "citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using lower to get case-insensitive matching."
>
>
> Here is what I have done
>
> drop table test;
> drop table testci;
>
> CREATE TABLE test (
> id INTEGER PRIMARY KEY,
> name character varying(254)
> );
> CREATE TABLE testci (
> id INTEGER PRIMARY KEY,
> name citext
>
> );
>
> INSERT INTO test(id, name)
> SELECT generate_series(1000001,2000000), (md5(random()::text));
>
> INSERT INTO testci(id, name)
> SELECT generate_series(1,1000000), (md5(random()::text));
>
>
> Now, I have done sequential search
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
> - Plan:
>    Node Type: "Seq Scan"
>    Parallel Aware: false
>    Relation Name: "test"
>    Alias: "test"
>    Startup Cost: 0.00
>    Total Cost: 23334.00
>    Plan Rows: 5000
>    Plan Width: 37
>    Actual Startup Time: 0.016
>    Actual Total Time: 680.199
>    Actual Rows: 1
>    Actual Loops: 1
>    Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>    Rows Removed by Filter: 999999
>  Planning Time: 0.045
>  Triggers:
>  Execution Time: 680.213
>
>
> explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
> - Plan:
>    Node Type: "Seq Scan"
>    Parallel Aware: false
>    Relation Name: "testci"
>    Alias: "testci"
>    Startup Cost: 0.00
>    Total Cost: 20834.00
>    Plan Rows: 1
>    Plan Width: 37
>    Actual Startup Time: 0.017
>    Actual Total Time: 1184.485
>    Actual Rows: 1
>    Actual Loops: 1
>    Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>    Rows Removed by Filter: 999999
>  Planning Time: 0.029
>  Triggers:
>  Execution Time: 1184.496
>
>
>
> You can see sequential searches with lower working twice as fast as citext.
>
> Now I added index on citext and equivalent functional index (lower) on text.
>
>
> CREATE INDEX textlowerindex ON test (lower(name));
> create index textindex on test(name);
>
>
> Index creation took longer with citext v/s creating lower functional index.
>
>
> Now here comes execution with indexes
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>    Node Type: "Bitmap Heap Scan"
>    Parallel Aware: false
>    Relation Name: "test"
>    Alias: "test"
>    Startup Cost: 187.18
>    Total Cost: 7809.06
>    Plan Rows: 5000
>    Plan Width: 37
>    Actual Startup Time: 0.020
>    Actual Total Time: 0.020
>    Actual Rows: 1
>    Actual Loops: 1
>    Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>    Rows Removed by Index Recheck: 0
>    Exact Heap Blocks: 1
>    Lossy Heap Blocks: 0
>    Plans:
>      - Node Type: "Bitmap Index Scan"
>        Parent Relationship: "Outer"
>        Parallel Aware: false
>        Index Name: "textlowerindex"
>        Startup Cost: 0.00
>        Total Cost: 185.93
>        Plan Rows: 5000
>        Plan Width: 0
>        Actual Startup Time: 0.016
>        Actual Total Time: 0.016
>        Actual Rows: 1
>        Actual Loops: 1
>        Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>  Planning Time: 0.051
>  Triggers:
>  Execution Time: 0.035
>
>
>
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>    Node Type: "Index Scan"
>    Parallel Aware: false
>    Scan Direction: "Forward"
>    Index Name: "citextindex"
>    Relation Name: "testci"
>    Alias: "testci"
>    Startup Cost: 0.42
>    Total Cost: 8.44
>    Plan Rows: 1
>    Plan Width: 37
>    Actual Startup Time: 0.049
>    Actual Total Time: 0.050
>    Actual Rows: 1
>    Actual Loops: 1
>    Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>    Rows Removed by Index Recheck: 0
>  Planning Time: 0.051
>  Triggers:
>  Execution Time: 0.064
>
>
> Deepak