Performance degradation in Index searches with special characters - Mailing list pgsql-performance

From Andrey Stikheev
Subject Performance degradation in Index searches with special characters
Date
Msg-id CALM5VP9ePykYMnz+0xf_+OKjV_raeHNWzerCBwE5ykRSCcd+AA@mail.gmail.com
Whole thread Raw
Responses Re: Performance degradation in Index searches with special characters
Re: Performance degradation in Index searches with special characters
List pgsql-performance

Dear PostgreSQL Community,

I am facing significant performance issues when executing queries that involve string comparisons with special characters, such as <, #, !, @, etc., especially when dealing with long strings. The query execution time increases drastically when these characters are used, whereas queries with alphabetic characters do not show such degradation. This behavior is observed both on macOS (using the official postgres:17 image via Docker) and on an Ubuntu 20.04 server running PostgreSQL in an LXC container.

Here is a minimal example:

testdb=# SELECT version();

                                                          version                                                          

---------------------------------------------------------------------------------------------------------------------------

 PostgreSQL 17.0 (Debian 17.0-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit

(1 row)

testdb=# CREATE TABLE test (value VARCHAR(10) NOT NULL);
CREATE TABLE
Time: 3.562 ms

testdb=# CREATE INDEX idx_test ON test (value);
CREATE INDEX
Time: 3.080 ms

testdb=# INSERT INTO test (value) VALUES ('<');
INSERT 0 1
Time: 3.365 ms

testdb=# SELECT 1 FROM test WHERE value = repeat('<', 65536);
 ?column?
----------
(0 rows)
Time: 4454.535 ms (00:04.455)

testdb=# SELECT 1 FROM test WHERE value = repeat('a', 65536);
 ?column?
----------
(0 rows)
Time: 3.772 ms

testdb=# SELECT 1 FROM test WHERE value = repeat('<', 65536) || 'a';
 ?column?
----------
(0 rows)
Time: 4.352 ms


Time: 9.503 ms

testdb=# SELECT 1 FROM test WHERE value = repeat('<', 65536) COLLATE "C";

 ?column? 

----------

(0 rows)


Time: 3.299 ms


testdb=# SELECT 1 FROM test WHERE value = repeat('@', 8192);

 ?column? 

----------

(0 rows)


Time: 77.171 ms

testdb=# SELECT 1 FROM test WHERE value = repeat('@', 16384);

 ?column? 

----------

(0 rows)


Time: 325.190 ms

testdb=# SELECT 1 FROM test WHERE value = repeat('@', 32768);

 ?column? 

----------

(0 rows)


Time: 1154.850 ms (00:01.155)

testdb=# SELECT 1 FROM test WHERE value = repeat('@', 65536);

 ?column? 

----------

(0 rows)


Time: 4490.206 ms (00:04.490)


testdb=# explain (analyze, verbose, buffers, costs, settings, timing, wal) SELECT 1 FROM test WHERE value = repeat('<', 65000);

---------------------------------------------------------------------------------------------
 Bitmap Heap Scan on public.test  (cost=4.20..13.67 rows=6 width=4) (actual time=4425.459..4425.459 rows=0 loops=1)
   Output: 1
   Recheck Cond: ((test.value)::text = '<<<<<<<<<<<<<<<<<<<<<<<<<<<< ... a lot symbols ...
   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<'::text)
   Buffers: shared hit=1
   ->  Bitmap Index Scan on idx_test  (cost=0.00..4.20 rows=6 width=0) (actual time=4425.432..4425.432 rows=0 loops=1)
         Index Cond: ((test.value)::text = '<<<<<<<<<<<<<<<<<<<<<<<<<<<< ... a lot symbols ...<<<<<<<<<<<<<<<<<<<<<<<<<<'::text)
         Buffers: shared hit=1
 Planning Time: 1.082 ms
 Execution Time: 4425.602 ms
(9 rows)

Time: 4433.001 ms (00:04.433)

Observations:

  • The performance degradation occurs with certain special characters like < , !,  >, @ , #, ... .
  • Queries using alphabetic characters or appending/prepending characters execute much faster.
  • The execution time increases exponentially with the length of the string composed of special characters.
  • Changing the collation to 'C' in the query significantly improves performance.

Questions:

  • Is this performance degradation expected due to collation handling of certain special characters in PostgreSQL?
  • Are there any recommendations to improve performance without changing the column or database collation?
--
Best regards,
Andrey Stikheev

pgsql-performance by date:

Previous
From: Marcelo Zabani
Date:
Subject: Re: Partition pruning with array-contains check and current_setting function
Next
From: Tom Lane
Date:
Subject: Re: Performance degradation in Index searches with special characters