Thread: 9.2.4: Strange behavior when wildcard is on left side of search string

Hi,

this is with 9.2.4_PGDG / FC18 / 64bit upgraded from 9.1.8 via
dump/restore, settings kept for the most part.

Table has 1.5M records, the varchar(100) field in question has a *
varchar_ops* and a *varchar_pattern_ops* btree index.

3 Cases:

   - "MYFIELD" like 'BLA BLA *%*': *OK, about 7 msec*
   - "MYFIELD" like 'BLA *%* BLA': *OK, about 20 msec*
   - "MYFIELD" like '*%* BLA BLA': *NOT OK*

In the third case, the query will take anywhere between 4200ms and over
83Kms ms to deliver 2 results, in one case I broke it off after 10 MINUTES.
I never noticed this sort of behavior in the 9.x series.

According to explain, the query resolves into an index-only scan. I tried
to turn this off to
see how it behaves but the toggle in the .conf apparently has no effect.

What I find interesting is that, whereas with the default it would resolve
into an index-only scan on the *varchar_pattern_ops* index, after setting
indexscan_only=off, it would resolve into
an index-only scan on the *varchar_ops* index.

Another peculiarity is that the Explain for the bad case does not display
the "Sort" icon for the order-by clause, whereas the OK cases do display it.

Also, the problem does not appear to be a resource problem, as I have the
settings and resources to pull that whole table plus indexes into RAM, yet
still, there is constant disk activity during the query in the bad case.

And, yes, I DID reindex and/or *vacuum verbose analyze* the table after
each relevant change.

Thanks for any feedback on this. If you need any further info I'll be happy
to help as possible.

RD

Re: 9.2.4: Strange behavior when wildcard is on left side of search string

From
David Johnston
Date:
r d-3 wrote
> Hi,
>
> this is with 9.2.4_PGDG / FC18 / 64bit upgraded from 9.1.8 via
> dump/restore, settings kept for the most part.
>
> Table has 1.5M records, the varchar(100) field in question has a *
> varchar_ops* and a *varchar_pattern_ops* btree index.
>
> 3 Cases:
>
>    - "MYFIELD" like 'BLA BLA *%*': *OK, about 7 msec*
>    - "MYFIELD" like 'BLA *%* BLA': *OK, about 20 msec*
>    - "MYFIELD" like '*%* BLA BLA': *NOT OK*
>
> In the third case, the query will take anywhere between 4200ms and over
> 83Kms ms to deliver 2 results, in one case I broke it off after 10
> MINUTES.
> I never noticed this sort of behavior in the 9.x series.
>
> According to explain, the query resolves into an index-only scan. I tried
> to turn this off to
> see how it behaves but the toggle in the .conf apparently has no effect.
>
> What I find interesting is that, whereas with the default it would resolve
> into an index-only scan on the *varchar_pattern_ops* index, after setting
> indexscan_only=off, it would resolve into
> an index-only scan on the *varchar_ops* index.
>
> Another peculiarity is that the Explain for the bad case does not display
> the "Sort" icon for the order-by clause, whereas the OK cases do display
> it.
>
> Also, the problem does not appear to be a resource problem, as I have the
> settings and resources to pull that whole table plus indexes into RAM, yet
> still, there is constant disk activity during the query in the bad case.
>
> And, yes, I DID reindex and/or *vacuum verbose analyze* the table after
> each relevant change.
>
> Thanks for any feedback on this. If you need any further info I'll be
> happy
> to help as possible.
>
> RD

Going from slightly dated memory here but...

The further left the wildcard the larger the portion of the index that has
to be scanned.  In the worse case, your third example, everything has to be
scanned and basically the index is only acting as a table surrogate as
opposed to an index.  If you really want to anchor the start you should
index and search on the reverse of the string so you can write it as a
postfix wildcard.  The other option to index words via the full-text search
capabilities.

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/9-2-4-Strange-behavior-when-wildcard-is-on-left-side-of-search-string-tp5751086p5751095.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

Re: Re: 9.2.4: Strange behavior when wildcard is on left side of search string

From
Kevin Grittner
Date:
David Johnston <polobo@yahoo.com> wrote:=0A=0A> varchar(100)=0A=0A> The oth=
er option to index words via the full-text search capabilities.=0A=0AOr for=
 columns this short,=A0 a similarity search on a trigram index.=0A=0Ahttp:/=
/www.postgresql.org/docs/current/interactive/pgtrgm.html=0A=0A--=0AKevin Gr=
ittner=0AEnterpriseDB: http://www.enterprisedb.com=0AThe Enterprise Postgre=
SQL Company

Re: Re: 9.2.4: Strange behavior when wildcard is on left side of search string

From
Kevin Grittner
Date:
ERR ORR <rd0002@gmail.com> wrote:=0A=0A> the fact that in 9.2.4 it is so ba=
d that a simple query sometimes=0A> does not finish after 10+ minutes with =
incessant disk activity=0A> tends to look like a problem to me.=0A=0ATry ru=
nning a VACUUM FREEZE ANALYZE; command on the database, under=0Aa database =
superuser ID.=A0 If you upgraded with pg_dump (or some=0Aother logical popu=
lation of the data, versus a pg_upgrade run) you=0Aare probably getting bit=
ten by the initial setting of hint bits.=0A=0Ahttp://wiki.postgresql.org/wi=
ki/Hint_Bits=0A=0ASee if it is still slow after that....=0A=0A--=0AKevin Gr=
ittner=0AEnterpriseDB: http://www.enterprisedb.com=0AThe Enterprise Postgre=
SQL Company

Re: 9.2.4: Strange behavior when wildcard is on left side of search string

From
Christopher Browne
Date:
This doesn't seem either buggy or strange...

An index on the ordering of that column is not helpful in handling a
leading wildcard, and so the query optimizer will, in such cases, revert,
correctly, to using a sequential scan and filtering the results.

If you have cases where this sort of wildcard needs to be fast, a
functional index could help.

Create index foo on tbl (reverse(col));

And reverse the wildcard so the index is usable:

Select * from tbl where reverse(col) like 'esrever%';

That query can harness the reversed index.

Unfortunately, no ordered index helps with

Select * from too where col like '%something%';

For that, a full text search index can help, but that is a longer story.

At any rate, what you are observing is no surprise, and consistent with
what many database systems do.
ERR ORR <rd0002@gmail.com> writes:
> My understanding from your replies is that this behavior with *b-tree
> indices* is not considered a bug but rather a case of "works as designed",
> yet still and apart from the solution of my particular problem, the fact
> that in 9.2.4 it is so bad that a simple query sometimes does not finish
> after 10+ minutes with incessant disk activity tends to look like a problem
> to me.

It is no better or worse in 9.2.4 than in any prior release.

            regards, tom lane