Thread: starts_with, ^@ and index usage
Greetings hackers, I'm seeing some odd behavior around string prefix searching - hopefully I've missed something here (thanks to Nino Floris for originally flagging this). In PostgreSQL 11, a starts_with function and a ^@ operators were added for string prefix checking, as an alternative to LIKE 'foo%' [1] [2]. I've ran a few scenarios and have seen the following behavior: Queries tested: 1. EXPLAIN SELECT * FROM data WHERE name LIKE 'foo10%'; 2. EXPLAIN SELECT * FROM data WHERE name ^@ 'foo10'; 3. EXPLAIN SELECT * FROM data WHERE starts_with(name, 'foo10'); ... running against a table with 500k rows and enable_seqscan turned off. Results: Index | Operator class | LIKE 'X%' | ^@ | starts_with ------ | ---------------- | ----------------- | ----------------- | ----------- btree | text_ops | Parallel seq scan | Parallel seq scan | Seq scan btree | text_pattern_ops | Index scan | Parallel seq scan | Seq scan spgist | | Index scan | Index Scan | Seq scan First, starts_with doesn't seem to use SP-GIST indexes, contrary to the patch description (and also doesn't trigger a parallel seq scan) - is this intentional? The function is listed front-and-center on the string functions and operators page[3], and receives mention on the pattern matching page[4], without any mention of it being so problematic. Note that ^@ isn't documented on the string functions and operators, so it's not very discoverable; if added to the docs, I'd recommend adding a note on SP-GIST being required, since uninformed new users would probably expect a default btree index to work as well. Shay [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=710d90da1fd8c1d028215ecaf7402062079e99e9 [2] https://www.postgresql.org/message-id/flat/03300255-cff2-b508-50f4-f00cca0a57a1%40sigaev.ru#38d2020edf92f96d204cd2679d362c38 [3] https://www.postgresql.org/docs/current/functions-string.html [4] https://www.postgresql.org/docs/current/functions-matching.html
Shay Rojansky <roji@roji.org> writes: > In PostgreSQL 11, a starts_with function and a ^@ operators were added > for string prefix checking, as an alternative to LIKE 'foo%' [1] [2]. > First, starts_with doesn't seem to use SP-GIST indexes, contrary to > the patch description (and also doesn't trigger a parallel seq scan) - > is this intentional? The function is listed front-and-center on the > string functions and operators page[3], and receives mention on the > pattern matching page[4], without any mention of it being so > problematic. It seems like it didn't occur to anybody to tie starts_with() into the machinery for derived index operators. That wouldn't be hard, but it wasn't done. Before (I think) v12, function invocations never could be converted to indexquals anyway, so it's not surprising that a v11-era patch wouldn't have thought it needed to address that point. I do see that starts_with() is marked parallel safe, so it's not clear why it wouldn't be amenable to a parallel seqscan. The function (as opposed to the operator) isn't tied into selectivity estimation either, so maybe that has something to do with using a default selectivity estimate for it? But said estimate would almost always be too high, which doesn't seem like the direction that would discourage parallelism. > Note that ^@ isn't documented on the string functions and operators, That's another oversight. It seems clear that the original patch author was pretty narrowly focused on use of the operator with SP-GIST, and didn't think about how it should fit into the larger ecosystem. regards, tom lane
I wrote: > Shay Rojansky <roji@roji.org> writes: >> First, starts_with doesn't seem to use SP-GIST indexes, contrary to >> the patch description (and also doesn't trigger a parallel seq scan) - >> is this intentional? > It seems like it didn't occur to anybody to tie starts_with() into > the machinery for derived index operators. That wouldn't be hard, > but it wasn't done. I've started another thread with a patch for that [1]. >> Note that ^@ isn't documented on the string functions and operators, > That's another oversight. Well, "oversight" might be too strong a word. AFAICS from a quick look in pg_operator, most operators on type text are comparisons, pattern match, or text search, none of which do I want to fold into section 9.4. The only exceptions are ||, which we do document there under SQL operators, and ^@. Commit 710d90da1 apparently decided to treat ^@ as a pattern match operator, which I guess it could be if you hold your head at the right angle, but I doubt most people would think to look for it in section 9.7. I guess the most practical answer is to rename table 9.10 from "Other String Functions" to "Other String Functions and Operators", which is more parallel to table 9.9 anyway. Just as in 9.9, it would look weird to have a one-entry table of operators. (Maybe someday in the far future it'd make sense to split 9.10 into two tables.) regards, tom lane [1] https://www.postgresql.org/message-id/232599.1633800229%40sss.pgh.pa.us