Home > mailing lists

Re: TODO item: Implement Boyer-Moore searching in LIKE queries - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: TODO item: Implement Boyer-Moore searching in LIKE queries
Date	August 2, 2016 22:14:01
Msg-id	8747.1470165235@sss.pgh.pa.us Whole thread Raw
In response to	Re: TODO item: Implement Boyer-Moore searching in LIKE queries (Karan Sikka <karanssikka@gmail.com>)
Responses	Re: TODO item: Implement Boyer-Moore searching in LIKE queries (Karan Sikka <karanssikka@gmail.com>)
List	pgsql-hackers

Tree view

Karan Sikka <karanssikka@gmail.com> writes:
> Just for the record, I'm leaning to the side of "not worth it". My
> thoughts are, if you are at a scale where you care about this 20%
> speedup, you would want to go all the way to an indexed structure,
> because the gains you would realize would exceed 20%, and 20% may not be
> a sufficient speedup anyway.

If I'm reading your test case correctly, 20% is actually a rather
impressive number, because it's 20% *overall* gain on queries that will
also involve TOAST fetch and decompress on the source data.  (Decompress
definitely, and I'm guessing those 5K strings don't compress well enough
to avoid getting pushed out-of-line; though it might be worth repeating
the test with chunks of 10K or 20K to be sure.)  So the percentage
improvement in the LIKE test proper must have been a lot more than that.

However, I'm dubious that LIKE patterns with long fixed substrings are a
common use-case, so I'm afraid that this might be quite a lot of work for
something that won't much benefit most users.  I'm also worried that the
setup costs might be enough to make it a net loss in many cases.

There are probably ways to amortize the setup costs, since typical
scenarios involve the same LIKE pattern across many rows, but implementing
that would add even more work.  (Having said that, I've had a bee in my
bonnet for a long time about removing per-row setup cost for repetitive
regex matches, and whatever infrastructure that needs would work for this
too.  And for strpos' B-M-H setup, looks like.  So this might be something
to look into with a suitably wide view of what the problem is.)

Not sure what advice to give you here.  I think this is in the grey zone
where it's hard to be sure whether it's worth putting work into.
        regards, tom lane

pgsql-hackers by date:

From: Vladimir Sitnikov
Date: 02 August 2016, 22:09:20
Subject: Re: Slowness of extended protocol

From: Chapman Flack
Date: 02 August 2016, 22:43:50
Subject: Re: AdvanceXLInsertBuffer vs. WAL segment compressibility

Re: TODO item: Implement Boyer-Moore searching in LIKE queries - Mailing list pgsql-hackers

Previous

Next