Thread: citext performance
Folks,
I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is not performing better than lower.Am I missing something? help is appreciated.
"citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using
lower
to get case-insensitive matching."Here is what I have done
drop table test;
drop table testci;
CREATE TABLE test (
id INTEGER PRIMARY KEY,
name character varying(254)
);
CREATE TABLE testci (
id INTEGER PRIMARY KEY,
name citext
);
INSERT INTO test(id, name)
SELECT generate_series(1000001,2000000), (md5(random()::text));
INSERT INTO testci(id, name)
SELECT generate_series(1,1000000), (md5(random()::text));
Now, I have done sequential search
explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
- Plan:
Node Type: "Seq Scan"
Parallel Aware: false
Relation Name: "test"
Alias: "test"
Startup Cost: 0.00
Total Cost: 23334.00
Plan Rows: 5000
Plan Width: 37
Actual Startup Time: 0.016
Actual Total Time: 680.199
Actual Rows: 1
Actual Loops: 1
Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
Rows Removed by Filter: 999999
Planning Time: 0.045
Triggers:
Execution Time: 680.213
explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
- Plan:
Node Type: "Seq Scan"
Parallel Aware: false
Relation Name: "testci"
Alias: "testci"
Startup Cost: 0.00
Total Cost: 20834.00
Plan Rows: 1
Plan Width: 37
Actual Startup Time: 0.017
Actual Total Time: 1184.485
Actual Rows: 1
Actual Loops: 1
Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
Rows Removed by Filter: 999999
Planning Time: 0.029
Triggers:
Execution Time: 1184.496
You can see sequential searches with lower working twice as fast as citext.
Now I added index on citext and equivalent functional index (lower) on text.
CREATE INDEX textlowerindex ON test (lower(name));
create index textindex on test(name);
Index creation took longer with citext v/s creating lower functional index.
Now here comes execution with indexes
explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
- Plan:
Node Type: "Bitmap Heap Scan"
Parallel Aware: false
Relation Name: "test"
Alias: "test"
Startup Cost: 187.18
Total Cost: 7809.06
Plan Rows: 5000
Plan Width: 37
Actual Startup Time: 0.020
Actual Total Time: 0.020
Actual Rows: 1
Actual Loops: 1
Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
Rows Removed by Index Recheck: 0
Exact Heap Blocks: 1
Lossy Heap Blocks: 0
Plans:
- Node Type: "Bitmap Index Scan"
Parent Relationship: "Outer"
Parallel Aware: false
Index Name: "textlowerindex"
Startup Cost: 0.00
Total Cost: 185.93
Plan Rows: 5000
Plan Width: 0
Actual Startup Time: 0.016
Actual Total Time: 0.016
Actual Rows: 1
Actual Loops: 1
Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
Planning Time: 0.051
Triggers:
Execution Time: 0.035
explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
- Plan:
Node Type: "Index Scan"
Parallel Aware: false
Scan Direction: "Forward"
Index Name: "citextindex"
Relation Name: "testci"
Alias: "testci"
Startup Cost: 0.42
Total Cost: 8.44
Plan Rows: 1
Plan Width: 37
Actual Startup Time: 0.049
Actual Total Time: 0.050
Actual Rows: 1
Actual Loops: 1
Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
Rows Removed by Index Recheck: 0
Planning Time: 0.051
Triggers:
Execution Time: 0.064
Deepak
Hi, I have also faced the same problem with citext extension. It does not use index when thereby making it almost unusable. The problem has to do with how collation is handled from what I have read in old threads in postgres mailing list (please refer https://dba.stackexchange.com/questions/105244/index-on-column-with-data-type-citext-not-used/105250#105250 ). Regards, Nanda On Fri, Apr 6, 2018 at 10:21 PM, Deepak Somaiya <deepsom@yahoo.com> wrote: > > Folks, > I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is notperforming better than lower.Am I missing something? help is appreciated. > > PostgreSQL: Documentation: 9.6: citext > > > > > "citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copiesof the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using lowerto get case-insensitive matching." > > > Here is what I have done > > drop table test; > drop table testci; > > CREATE TABLE test ( > id INTEGER PRIMARY KEY, > name character varying(254) > ); > CREATE TABLE testci ( > id INTEGER PRIMARY KEY, > name citext > > ); > > INSERT INTO test(id, name) > SELECT generate_series(1000001,2000000), (md5(random()::text)); > > INSERT INTO testci(id, name) > SELECT generate_series(1,1000000), (md5(random()::text)); > > > Now, I have done sequential search > > explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de'); > - Plan: > Node Type: "Seq Scan" > Parallel Aware: false > Relation Name: "test" > Alias: "test" > Startup Cost: 0.00 > Total Cost: 23334.00 > Plan Rows: 5000 > Plan Width: 37 > Actual Startup Time: 0.016 > Actual Total Time: 680.199 > Actual Rows: 1 > Actual Loops: 1 > Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)" > Rows Removed by Filter: 999999 > Planning Time: 0.045 > Triggers: > Execution Time: 680.213 > > > explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad'; > - Plan: > Node Type: "Seq Scan" > Parallel Aware: false > Relation Name: "testci" > Alias: "testci" > Startup Cost: 0.00 > Total Cost: 20834.00 > Plan Rows: 1 > Plan Width: 37 > Actual Startup Time: 0.017 > Actual Total Time: 1184.485 > Actual Rows: 1 > Actual Loops: 1 > Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)" > Rows Removed by Filter: 999999 > Planning Time: 0.029 > Triggers: > Execution Time: 1184.496 > > > > You can see sequential searches with lower working twice as fast as citext. > > Now I added index on citext and equivalent functional index (lower) on text. > > > CREATE INDEX textlowerindex ON test (lower(name)); > create index textindex on test(name); > > > Index creation took longer with citext v/s creating lower functional index. > > > Now here comes execution with indexes > > explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de'); > > - Plan: > Node Type: "Bitmap Heap Scan" > Parallel Aware: false > Relation Name: "test" > Alias: "test" > Startup Cost: 187.18 > Total Cost: 7809.06 > Plan Rows: 5000 > Plan Width: 37 > Actual Startup Time: 0.020 > Actual Total Time: 0.020 > Actual Rows: 1 > Actual Loops: 1 > Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)" > Rows Removed by Index Recheck: 0 > Exact Heap Blocks: 1 > Lossy Heap Blocks: 0 > Plans: > - Node Type: "Bitmap Index Scan" > Parent Relationship: "Outer" > Parallel Aware: false > Index Name: "textlowerindex" > Startup Cost: 0.00 > Total Cost: 185.93 > Plan Rows: 5000 > Plan Width: 0 > Actual Startup Time: 0.016 > Actual Total Time: 0.016 > Actual Rows: 1 > Actual Loops: 1 > Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)" > Planning Time: 0.051 > Triggers: > Execution Time: 0.035 > > > > > explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de'); > > - Plan: > Node Type: "Index Scan" > Parallel Aware: false > Scan Direction: "Forward" > Index Name: "citextindex" > Relation Name: "testci" > Alias: "testci" > Startup Cost: 0.42 > Total Cost: 8.44 > Plan Rows: 1 > Plan Width: 37 > Actual Startup Time: 0.049 > Actual Total Time: 0.050 > Actual Rows: 1 > Actual Loops: 1 > Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)" > Rows Removed by Index Recheck: 0 > Planning Time: 0.051 > Triggers: > Execution Time: 0.064 > > > Deepak
It is using index here , it is just that performance i.e query that use functional index (one with lower) is performing better then index created on citext column.
Deepak
On Sunday, April 8, 2018, 3:13:26 AM PDT, Nandakumar M <m.nanda92@gmail.com> wrote:
Hi,
I have also faced the same problem with citext extension. It does not
use index when thereby making it almost unusable. The problem has to
do with how collation is handled from what I have read in old threads
in postgres mailing list (please refer
https://dba.stackexchange.com/questions/105244/index-on-column-with-data-type-citext-not-used/105250#105250
).
Regards,
Nanda
On Fri, Apr 6, 2018 at 10:21 PM, Deepak Somaiya <deepsom@yahoo.com> wrote:
>
> Folks,
> I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is not performing better than lower.Am I missing something? help is appreciated.
>
> PostgreSQL: Documentation: 9.6: citext
>
>
>
>
> "citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using lower to get case-insensitive matching."
>
>
> Here is what I have done
>
> drop table test;
> drop table testci;
>
> CREATE TABLE test (
> id INTEGER PRIMARY KEY,
> name character varying(254)
> );
> CREATE TABLE testci (
> id INTEGER PRIMARY KEY,
> name citext
>
> );
>
> INSERT INTO test(id, name)
> SELECT generate_series(1000001,2000000), (md5(random()::text));
>
> INSERT INTO testci(id, name)
> SELECT generate_series(1,1000000), (md5(random()::text));
>
>
> Now, I have done sequential search
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
> - Plan:
> Node Type: "Seq Scan"
> Parallel Aware: false
> Relation Name: "test"
> Alias: "test"
> Startup Cost: 0.00
> Total Cost: 23334.00
> Plan Rows: 5000
> Plan Width: 37
> Actual Startup Time: 0.016
> Actual Total Time: 680.199
> Actual Rows: 1
> Actual Loops: 1
> Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
> Rows Removed by Filter: 999999
> Planning Time: 0.045
> Triggers:
> Execution Time: 680.213
>
>
> explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
> - Plan:
> Node Type: "Seq Scan"
> Parallel Aware: false
> Relation Name: "testci"
> Alias: "testci"
> Startup Cost: 0.00
> Total Cost: 20834.00
> Plan Rows: 1
> Plan Width: 37
> Actual Startup Time: 0.017
> Actual Total Time: 1184.485
> Actual Rows: 1
> Actual Loops: 1
> Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
> Rows Removed by Filter: 999999
> Planning Time: 0.029
> Triggers:
> Execution Time: 1184.496
>
>
>
> You can see sequential searches with lower working twice as fast as citext.
>
> Now I added index on citext and equivalent functional index (lower) on text.
>
>
> CREATE INDEX textlowerindex ON test (lower(name));
> create index textindex on test(name);
>
>
> Index creation took longer with citext v/s creating lower functional index.
>
>
> Now here comes execution with indexes
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
> Node Type: "Bitmap Heap Scan"
> Parallel Aware: false
> Relation Name: "test"
> Alias: "test"
> Startup Cost: 187.18
> Total Cost: 7809.06
> Plan Rows: 5000
> Plan Width: 37
> Actual Startup Time: 0.020
> Actual Total Time: 0.020
> Actual Rows: 1
> Actual Loops: 1
> Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
> Rows Removed by Index Recheck: 0
> Exact Heap Blocks: 1
> Lossy Heap Blocks: 0
> Plans:
> - Node Type: "Bitmap Index Scan"
> Parent Relationship: "Outer"
> Parallel Aware: false
> Index Name: "textlowerindex"
> Startup Cost: 0.00
> Total Cost: 185.93
> Plan Rows: 5000
> Plan Width: 0
> Actual Startup Time: 0.016
> Actual Total Time: 0.016
> Actual Rows: 1
> Actual Loops: 1
> Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
> Planning Time: 0.051
> Triggers:
> Execution Time: 0.035
>
>
>
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
> Node Type: "Index Scan"
> Parallel Aware: false
> Scan Direction: "Forward"
> Index Name: "citextindex"
> Relation Name: "testci"
> Alias: "testci"
> Startup Cost: 0.42
> Total Cost: 8.44
> Plan Rows: 1
> Plan Width: 37
> Actual Startup Time: 0.049
> Actual Total Time: 0.050
> Actual Rows: 1
> Actual Loops: 1
> Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
> Rows Removed by Index Recheck: 0
> Planning Time: 0.051
> Triggers:
> Execution Time: 0.064
>
>
> Deepak
I have also faced the same problem with citext extension. It does not
use index when thereby making it almost unusable. The problem has to
do with how collation is handled from what I have read in old threads
in postgres mailing list (please refer
https://dba.stackexchange.com/questions/105244/index-on-column-with-data-type-citext-not-used/105250#105250
).
Regards,
Nanda
On Fri, Apr 6, 2018 at 10:21 PM, Deepak Somaiya <deepsom@yahoo.com> wrote:
>
> Folks,
> I read following (PostgreSQL: Documentation: 9.6: citext) and it does not hold true in my testing.. i.e citext is not performing better than lower.Am I missing something? help is appreciated.
>
> PostgreSQL: Documentation: 9.6: citext
>
>
>
>
> "citext is not as efficient as text because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. It is, however, slightly more efficient than using lower to get case-insensitive matching."
>
>
> Here is what I have done
>
> drop table test;
> drop table testci;
>
> CREATE TABLE test (
> id INTEGER PRIMARY KEY,
> name character varying(254)
> );
> CREATE TABLE testci (
> id INTEGER PRIMARY KEY,
> name citext
>
> );
>
> INSERT INTO test(id, name)
> SELECT generate_series(1000001,2000000), (md5(random()::text));
>
> INSERT INTO testci(id, name)
> SELECT generate_series(1,1000000), (md5(random()::text));
>
>
> Now, I have done sequential search
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
> - Plan:
> Node Type: "Seq Scan"
> Parallel Aware: false
> Relation Name: "test"
> Alias: "test"
> Startup Cost: 0.00
> Total Cost: 23334.00
> Plan Rows: 5000
> Plan Width: 37
> Actual Startup Time: 0.016
> Actual Total Time: 680.199
> Actual Rows: 1
> Actual Loops: 1
> Filter: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
> Rows Removed by Filter: 999999
> Planning Time: 0.045
> Triggers:
> Execution Time: 680.213
>
>
> explain (analyze on, format yaml) select * from testci where name='956d692092f0b9f85f36bf2b2501f3ad';
> - Plan:
> Node Type: "Seq Scan"
> Parallel Aware: false
> Relation Name: "testci"
> Alias: "testci"
> Startup Cost: 0.00
> Total Cost: 20834.00
> Plan Rows: 1
> Plan Width: 37
> Actual Startup Time: 0.017
> Actual Total Time: 1184.485
> Actual Rows: 1
> Actual Loops: 1
> Filter: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
> Rows Removed by Filter: 999999
> Planning Time: 0.029
> Triggers:
> Execution Time: 1184.496
>
>
>
> You can see sequential searches with lower working twice as fast as citext.
>
> Now I added index on citext and equivalent functional index (lower) on text.
>
>
> CREATE INDEX textlowerindex ON test (lower(name));
> create index textindex on test(name);
>
>
> Index creation took longer with citext v/s creating lower functional index.
>
>
> Now here comes execution with indexes
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
> Node Type: "Bitmap Heap Scan"
> Parallel Aware: false
> Relation Name: "test"
> Alias: "test"
> Startup Cost: 187.18
> Total Cost: 7809.06
> Plan Rows: 5000
> Plan Width: 37
> Actual Startup Time: 0.020
> Actual Total Time: 0.020
> Actual Rows: 1
> Actual Loops: 1
> Recheck Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
> Rows Removed by Index Recheck: 0
> Exact Heap Blocks: 1
> Lossy Heap Blocks: 0
> Plans:
> - Node Type: "Bitmap Index Scan"
> Parent Relationship: "Outer"
> Parallel Aware: false
> Index Name: "textlowerindex"
> Startup Cost: 0.00
> Total Cost: 185.93
> Plan Rows: 5000
> Plan Width: 0
> Actual Startup Time: 0.016
> Actual Total Time: 0.016
> Actual Rows: 1
> Actual Loops: 1
> Index Cond: "(lower((name)::text) = 'f6d7d5be1d0bed1cca11540d3a2667de'::text)"
> Planning Time: 0.051
> Triggers:
> Execution Time: 0.035
>
>
>
>
> explain (analyze on, format yaml) select * from test where lower(name)=lower('f6d7d5be1d0bed1cca11540d3a2667de');
>
> - Plan:
> Node Type: "Index Scan"
> Parallel Aware: false
> Scan Direction: "Forward"
> Index Name: "citextindex"
> Relation Name: "testci"
> Alias: "testci"
> Startup Cost: 0.42
> Total Cost: 8.44
> Plan Rows: 1
> Plan Width: 37
> Actual Startup Time: 0.049
> Actual Total Time: 0.050
> Actual Rows: 1
> Actual Loops: 1
> Index Cond: "(name = '956d692092f0b9f85f36bf2b2501f3ad'::citext)"
> Rows Removed by Index Recheck: 0
> Planning Time: 0.051
> Triggers:
> Execution Time: 0.064
>
>
> Deepak