Thread: citext performance

citext performance

From

Deepak Somaiya

Date:

06 April 2018, 22:51:14

Folks,

I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is not performing better than lower.Am I missing something? help is appreciated.

PostgreSQL: Documentation: 9.6: citext

"citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using lower to get case-insensitive matching."

Here is what I have done

drop table test;

drop table testci;

CREATE TABLE test (

id INTEGER PRIMARY KEY,

name character varying(254)

);

CREATE TABLE testci (

id INTEGER PRIMARY KEY,

name citext

);

INSERT INTO test(id, name)

SELECT generate_series(1000001,2000000), (md5(random()::text));

INSERT INTO testci(id, name)

SELECT generate_series(1,1000000), (md5(random()::text));

Now, I have done sequential search

explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');

- Plan:

Node Type: "Seq Scan"

Parallel Aware: false

Relation Name: "test"

Alias: "test"

Startup Cost: 0.00

Total Cost: 23334.00

Plan Rows: 5000

Plan Width: 37

Actual Startup Time: 0.016

Actual Total Time: 680.199

Actual Rows: 1

Actual Loops: 1

Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"

Rows Removed by Filter: 999999

Planning Time: 0.045

Triggers:

Execution Time: 680.213

explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';

- Plan:

Node Type: "Seq Scan"

Parallel Aware: false

Relation Name: "testci"

Alias: "testci"

Startup Cost: 0.00

Total Cost: 20834.00

Plan Rows: 1

Plan Width: 37

Actual Startup Time: 0.017

Actual Total Time: 1184.485

Actual Rows: 1

Actual Loops: 1

Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"

Rows Removed by Filter: 999999

Planning Time: 0.029

Triggers:

Execution Time: 1184.496

You can see sequential searches with lower working twice as fast as citext.

Now I added index on citext and equivalent functional index (lower) on text.

CREATE INDEX textlowerindex ON test (lower(name));

create index textindex on test(name);

Index creation took longer with citext v/s creating lower functional index.

Now here comes execution with indexes

explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');

- Plan:

Node Type: "Bitmap Heap Scan"

Parallel Aware: false

Relation Name: "test"

Alias: "test"

Startup Cost: 187.18

Total Cost: 7809.06

Plan Rows: 5000

Plan Width: 37

Actual Startup Time: 0.020

Actual Total Time: 0.020

Actual Rows: 1

Actual Loops: 1

Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"

Rows Removed by Index Recheck: 0

Exact Heap Blocks: 1

Lossy Heap Blocks: 0

Plans:

- Node Type: "Bitmap Index Scan"

Parent Relationship: "Outer"

Parallel Aware: false

Index Name: "textlowerindex"

Startup Cost: 0.00

Total Cost: 185.93

Plan Rows: 5000

Plan Width: 0

Actual Startup Time: 0.016

Actual Total Time: 0.016

Actual Rows: 1

Actual Loops: 1

Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"

Planning Time: 0.051

Triggers:

Execution Time: 0.035

explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');

- Plan:

Node Type: "Index Scan"

Parallel Aware: false

Scan Direction: "Forward"

Index Name: "citextindex"

Relation Name: "testci"

Alias: "testci"

Startup Cost: 0.42

Total Cost: 8.44

Plan Rows: 1

Plan Width: 37

Actual Startup Time: 0.049

Actual Total Time: 0.050

Actual Rows: 1

Actual Loops: 1

Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"

Rows Removed by Index Recheck: 0

Planning Time: 0.051

Triggers:

Execution Time: 0.064

Deepak

Re: citext performance

From

Nandakumar M

Date:

08 April 2018, 16:12:46

Hi,

I have also faced the same problem with citext extension. It does not
use index when thereby making it almost unusable. The problem has to
do with how collation is handled from what I have read in old threads
in postgres mailing list (please refer
https://dba.stackexchange.com/questions/105244/index-on-column-with-data-type-citext-not-used/105250#105250
).

Regards,
Nanda

On Fri, Apr 6, 2018 at 10:21 PM, Deepak Somaiya <deepsom@yahoo.com> wrote:
>
> Folks,
>  I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is
notperforming better than lower.Am I missing something? help is appreciated. 
>
> PostgreSQL: Documentation: 9.6: citext
>
>
>
>
> "citext is not as efficient as text because the operator functions and the B-tree comparison functions must make
copiesof the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using
lowerto get case-insensitive matching." 
>
>
> Here is what I have done
>
> drop table test;
> drop table testci;
>
> CREATE TABLE test (
> id INTEGER PRIMARY KEY,
> name character varying(254)
> );
> CREATE TABLE testci (
> id INTEGER PRIMARY KEY,
> name citext
>
> );
>
> INSERT INTO test(id, name)
> SELECT generate_series(1000001,2000000), (md5(random()::text));
>
> INSERT INTO testci(id, name)
> SELECT generate_series(1,1000000), (md5(random()::text));
>
>
> Now, I have done sequential search
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
> - Plan:
>     Node Type: "Seq Scan"
>     Parallel Aware: false
>     Relation Name: "test"
>     Alias: "test"
>     Startup Cost: 0.00
>     Total Cost: 23334.00
>     Plan Rows: 5000
>     Plan Width: 37
>     Actual Startup Time: 0.016
>     Actual Total Time: 680.199
>     Actual Rows: 1
>     Actual Loops: 1
>     Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>     Rows Removed by Filter: 999999
>   Planning Time: 0.045
>   Triggers:
>   Execution Time: 680.213
>
>
> explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
> - Plan:
>     Node Type: "Seq Scan"
>     Parallel Aware: false
>     Relation Name: "testci"
>     Alias: "testci"
>     Startup Cost: 0.00
>     Total Cost: 20834.00
>     Plan Rows: 1
>     Plan Width: 37
>     Actual Startup Time: 0.017
>     Actual Total Time: 1184.485
>     Actual Rows: 1
>     Actual Loops: 1
>     Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>     Rows Removed by Filter: 999999
>   Planning Time: 0.029
>   Triggers:
>   Execution Time: 1184.496
>
>
>
> You can see sequential searches with lower working twice as fast as citext.
>
> Now I added index on citext and equivalent functional index (lower) on text.
>
>
> CREATE INDEX textlowerindex ON test (lower(name));
> create index textindex on test(name);
>
>
> Index creation took longer with citext v/s creating lower functional index.
>
>
> Now here comes execution with indexes
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>     Node Type: "Bitmap Heap Scan"
>     Parallel Aware: false
>     Relation Name: "test"
>     Alias: "test"
>     Startup Cost: 187.18
>     Total Cost: 7809.06
>     Plan Rows: 5000
>     Plan Width: 37
>     Actual Startup Time: 0.020
>     Actual Total Time: 0.020
>     Actual Rows: 1
>     Actual Loops: 1
>     Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>     Rows Removed by Index Recheck: 0
>     Exact Heap Blocks: 1
>     Lossy Heap Blocks: 0
>     Plans:
>       - Node Type: "Bitmap Index Scan"
>         Parent Relationship: "Outer"
>         Parallel Aware: false
>         Index Name: "textlowerindex"
>         Startup Cost: 0.00
>         Total Cost: 185.93
>         Plan Rows: 5000
>         Plan Width: 0
>         Actual Startup Time: 0.016
>         Actual Total Time: 0.016
>         Actual Rows: 1
>         Actual Loops: 1
>         Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
>   Planning Time: 0.051
>   Triggers:
>   Execution Time: 0.035
>
>
>
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
>     Node Type: "Index Scan"
>     Parallel Aware: false
>     Scan Direction: "Forward"
>     Index Name: "citextindex"
>     Relation Name: "testci"
>     Alias: "testci"
>     Startup Cost: 0.42
>     Total Cost: 8.44
>     Plan Rows: 1
>     Plan Width: 37
>     Actual Startup Time: 0.049
>     Actual Total Time: 0.050
>     Actual Rows: 1
>     Actual Loops: 1
>     Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
>     Rows Removed by Index Recheck: 0
>   Planning Time: 0.051
>   Triggers:
>   Execution Time: 0.064
>
>
> Deepak

Re: citext performance

From

Deepak Somaiya

Date:

09 April 2018, 02:14:29

It is using index here , it is just that performance i.e query that use functional index (one with lower) is performing better then index created on citext column.

Deepak

On Sunday, April 8, 2018, 3:13:26 AM PDT, Nandakumar M <m.nanda92@gmail.com> wrote:

Hi,

I have also faced the same problem with citext extension. It does not
use index when thereby making it almost unusable. The problem has to
do with how collation is handled from what I have read in old threads
in postgres mailing list (please refer
https://dba.stackexchange.com/questions/105244/index-on-column-with-data-type-citext-not-used/105250#105250
).

Regards,
Nanda

On Fri, Apr 6, 2018 at 10:21 PM, Deepak Somaiya <deepsom@yahoo.com> wrote:
>
> Folks,
> I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is not performing better than lower.Am I missing something? help is appreciated.
>
> PostgreSQL: Documentation: 9.6: citext
>
>
>
>
> "citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using lower to get case-insensitive matching."
>
>
> Here is what I have done
>
> drop table test;
> drop table testci;
>
> CREATE TABLE test (
> id INTEGER PRIMARY KEY,
> name character varying(254)
> );
> CREATE TABLE testci (
> id INTEGER PRIMARY KEY,
> name citext
>
> );
>
> INSERT INTO test(id, name)
> SELECT generate_series(1000001,2000000), (md5(random()::text));
>
> INSERT INTO testci(id, name)
> SELECT generate_series(1,1000000), (md5(random()::text));
>
>
> Now, I have done sequential search
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
> - Plan:
> Node Type: "Seq Scan"
> Parallel Aware: false
> Relation Name: "test"
> Alias: "test"
> Startup Cost: 0.00
> Total Cost: 23334.00
> Plan Rows: 5000
> Plan Width: 37
> Actual Startup Time: 0.016
> Actual Total Time: 680.199
> Actual Rows: 1
> Actual Loops: 1
> Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
> Rows Removed by Filter: 999999
> Planning Time: 0.045
> Triggers:
> Execution Time: 680.213
>
>
> explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
> - Plan:
> Node Type: "Seq Scan"
> Parallel Aware: false
> Relation Name: "testci"
> Alias: "testci"
> Startup Cost: 0.00
> Total Cost: 20834.00
> Plan Rows: 1
> Plan Width: 37
> Actual Startup Time: 0.017
> Actual Total Time: 1184.485
> Actual Rows: 1
> Actual Loops: 1
> Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
> Rows Removed by Filter: 999999
> Planning Time: 0.029
> Triggers:
> Execution Time: 1184.496
>
>
>
> You can see sequential searches with lower working twice as fast as citext.
>
> Now I added index on citext and equivalent functional index (lower) on text.
>
>
> CREATE INDEX textlowerindex ON test (lower(name));
> create index textindex on test(name);
>
>
> Index creation took longer with citext v/s creating lower functional index.
>
>
> Now here comes execution with indexes
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
> Node Type: "Bitmap Heap Scan"
> Parallel Aware: false
> Relation Name: "test"
> Alias: "test"
> Startup Cost: 187.18
> Total Cost: 7809.06
> Plan Rows: 5000
> Plan Width: 37
> Actual Startup Time: 0.020
> Actual Total Time: 0.020
> Actual Rows: 1
> Actual Loops: 1
> Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
> Rows Removed by Index Recheck: 0
> Exact Heap Blocks: 1
> Lossy Heap Blocks: 0
> Plans:
> - Node Type: "Bitmap Index Scan"
> Parent Relationship: "Outer"
> Parallel Aware: false
> Index Name: "textlowerindex"
> Startup Cost: 0.00
> Total Cost: 185.93
> Plan Rows: 5000
> Plan Width: 0
> Actual Startup Time: 0.016
> Actual Total Time: 0.016
> Actual Rows: 1
> Actual Loops: 1
> Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
> Planning Time: 0.051
> Triggers:
> Execution Time: 0.035
>
>
>
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
> Node Type: "Index Scan"
> Parallel Aware: false
> Scan Direction: "Forward"
> Index Name: "citextindex"
> Relation Name: "testci"
> Alias: "testci"
> Startup Cost: 0.42
> Total Cost: 8.44
> Plan Rows: 1
> Plan Width: 37
> Actual Startup Time: 0.049
> Actual Total Time: 0.050
> Actual Rows: 1
> Actual Loops: 1
> Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
> Rows Removed by Index Recheck: 0
> Planning Time: 0.051
> Triggers:
> Execution Time: 0.064
>
>
> Deepak