Home > mailing lists

bad query plans for ~ "^string" (and like "string%") (8.3.6) - Mailing list pgsql-performance

From	Marinos Yannikos
Subject	bad query plans for ~ "^string" (and like "string%") (8.3.6)
Date	April 8, 2009 10:42:38
Msg-id	49DCA9B4.4010501@geizhals.at Whole thread Raw
Responses	Re: bad query plans for ~ "^string" (and like "string%") (8.3.6) Re: bad query plans for ~ "^string" (and like "string%") (8.3.6)
List	pgsql-performance

Tree view

It seems that ANALYZE does not really sample text column values as much
as it could. We have some very bad query plans resulting from this:

...
          ->  Bitmap Index Scan on m_pkey  (cost=0.00..28.61 rows=102
width=0) (actual time=171.824..171.824 rows=683923 loops=1)
                Index Cond: ((e >= 'ean'::text) AND (e < 'eao'::text)

This gets even worse for longer strings, where we know that many
matching rows exist:

# explain analyze select substring(e,5) from m where id=257421 and e ~
'^ean=';
                                                         QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------
  Index Scan using m_pkey on m  (cost=0.00..12.50 rows=1 width=60)
(actual time=1623.795..1703.958 rows=18 loops=1)
    Index Cond: ((e >= 'ean='::text) AND (e < 'ean>'::text))
    Filter: ((e ~ '^ean='::text) AND (id = 257421))
  Total runtime: 1703.991 ms
(4 rows)

Here it would be much better to use the existing index on "id" (btree)
first because the current index condition selects 683k rows whereas the
result contains 18 rows. Using the index on id would yield 97 rows to
filter.

Is it possible to work around this problem somehow, other than adding
partial indexes for the ~ / LIKE condition (when it's constant) or a
2-dimensional index?

(what exactly does ANALYZE look at for text columns? in our case, about
7% of the rows match the index condition, so it seems that left-anchored
regexp/like matches are not evaluated using the gathered
most-common-value list at all)

Regards,
  Marinos

pgsql-performance by date:

From: Marinos Yannikos
Date: 08 April 2009, 08:45:37
Subject: Re: Best replication solution?

From: Marinos Yannikos
Date: 08 April 2009, 10:51:26
Subject: Re: bad query plans for ~ "^string" (and like "string%") (8.3.6)

bad query plans for ~ "^string" (and like "string%") (8.3.6) - Mailing list pgsql-performance

Previous

Next