Thread: Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Daniel Kalchev
Date:
Further to my cyrillic charset (--enable-locale) introduced woes, I would like 
to add, that when recompiling Postgres 6.5 without --enable-locale now the 
query runs reasonably fast and explain gives:

Index Scan using w_k_words_i on words  (cost=1112.13 rows=1 width=12)

instead of 

Index Scan using w_k_words_i on words  (cost=3335.38 rows=1 width=12)

The problem as it seems is that the restrictions to use indexes when locale is 
enabled are not consistently applied - explain shows that indices will be 
used, and the behavior with indexes and without indexes is different (with 
indexes it's noticeably slower :-) so indexes are apparently being used...

Apparently (for my current tests at least) the Bulgarian cyrillic 
(windows-1251) is handled reasonably well without locale support - untill now 
~* didn't work anyway.

Daniel Kalchev



Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Tom Lane
Date:
Daniel Kalchev <daniel@digsys.bg> writes:
> The problem as it seems is that the restrictions to use indexes when
> locale is enabled are not consistently applied - explain shows that
> indices will be used, and the behavior with indexes and without
> indexes is different (with indexes it's noticeably slower :-) so
> indexes are apparently being used...

Right, but what EXPLAIN doesn't show you (unless you can read the
much uglier EXPLAIN VERBOSE output) is what index restrictions are
being used.

LIKE doesn't know anything about indexes, nor vice versa.  What the
index-scan machinery *does* know about is <, <=, etc.  If you have
WHERE clauses like "x >= 33" and "x <= 54" then an index scan knows
to only scan the part of the index from 33 to 54.  So you never even
visit a large fraction of the table.  This is why an index scan can
be faster than a sequential scan even though the per-tuple overhead
of consulting the index is larger.

So, there is a special hack in the parser for LIKE (also for regexp
matches): if the parser sees that the match pattern has a fixed initial
substring, it inserts some >= and <= clauses that are designed to
exploit what the index scanner can do.

Our immediate problem is that we had to drop the <= clause in non-ASCII
locales because it was wrong.  So now an index scan driven by LIKE
restrictions is not nearly as restrictive as it was, and has to visit
many tuples (about half the table on average) whereas before it was
likely to visit only a few, if you had a reasonably long fixed initial
string.

There are some other problems (notably, that the extra clauses are
inserted even if there's no index and thus no way that they will be
helpful) but those we know how to fix, and I hope to address them for
6.6.  Fixing the <= problem requires knowledge about non-ASCII character
sets, and I for one don't know enough to fix it...
        regards, tom lane


Re: [HACKERS] Postgres 6.5 beta2 and beta3 problem

From
Bruce Momjian
Date:
> LIKE doesn't know anything about indexes, nor vice versa.  What the
> index-scan machinery *does* know about is <, <=, etc.  If you have
> WHERE clauses like "x >= 33" and "x <= 54" then an index scan knows
> to only scan the part of the index from 33 to 54.  So you never even
> visit a large fraction of the table.  This is why an index scan can
> be faster than a sequential scan even though the per-tuple overhead
> of consulting the index is larger.
> 

You know, everyone beats me up for that LIKE indexing hack, but every
month that goes by where someone does not come up with a better solution
makes me feel a little better about the criticism.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026