Re: WIP: index support for regexp search - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: WIP: index support for regexp search
Msg-id CAPpHfdsitdJZNyQk5UCK0sAh1F08147pP7DkDH_Gh_Men8ofxw@mail.gmail.com
In response to Re: WIP: index support for regexp search  ("Erikjan Rijkers" <er@xs4all.nl>)
List pgsql-hackers
On Wed, Apr 3, 2013 at 11:10 AM, Erikjan Rijkers <er@xs4all.nl> wrote:
> On Tue, April 2, 2013 23:54, Alexander Korotkov wrote:
>
> > [trgm-regexp-0.15.patch.gz]
>
> Yes, it does look good now; Attached a list of measurements. Most of the searches that I put in
> that test-program are now speeded up very much.
>
> There still are a few regressions, for example:
>
> HEAD          azjunk6  x[aeiou]{4,5}q          83  Seq Scan          1393.465 ms
> trgm_regex15  azjunk6  x[aeiou]{4,5}q          83  Bitmap Heap Scan  1728.319 ms
>
> HEAD          azjunk7  x[aeiou]{1,3}q      190031  Seq Scan         16819.555 ms
> trgm_regex15  azjunk7  x[aeiou]{1,3}q      190031  Bitmap Heap Scan 21286.804 ms
>
> Not exactly negligible, and ideally those regressions would be removed but with the huge
> advantages for other cases I'd say it's worth it.

Thank you for testing!
Looking at the results in more detail, I found version 13 to be buggy. That version is a dead end; we have quite a different API now. Could you use v12 instead of v13 in the comparison, please?
Sometimes we see a regression in comparison with HEAD, for two reasons:
1) We select an index scan in both cases, but with the patch we spend more time on analysis. That is an inevitable disadvantage of any index. We can only take care that the analysis doesn't take too long. The current testing results don't show this reason to be significant.
2) Sometimes we select an index scan where a sequential scan would be faster. That is also an inevitable disadvantage until we have relevant statistics. We already have a similar situation with, for example, in-core geometric search and LIKE/ILIKE search in pg_trgm. However, the situation could probably be improved somehow even without such statistics. But I don't think we can draw such a conclusion from synthetic testing, because improvements for synthetic cases could turn out to be a worsening for real-life cases.
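
To put the two quoted regressions in proportion, here is a small Python sketch that computes the relative slowdown of the patched run against HEAD. It is not part of the patch or of the test program; the timings are copied from the measurements quoted above, and the names `measurements` and `relative_slowdown` are made up for illustration.

```python
# Timings (ms) copied from the measurements quoted above.
# Each entry: (table, regex, HEAD Seq Scan ms, patched Bitmap Heap Scan ms)
measurements = [
    ("azjunk6", "x[aeiou]{4,5}q", 1393.465, 1728.319),
    ("azjunk7", "x[aeiou]{1,3}q", 16819.555, 21286.804),
]


def relative_slowdown(head_ms, patched_ms):
    """Fraction by which the patched run is slower than HEAD (0.10 == 10% slower)."""
    return patched_ms / head_ms - 1.0


# Map each (table, regex) pair to its relative slowdown.
slowdowns = {
    (table, regex): relative_slowdown(head_ms, patched_ms)
    for table, regex, head_ms, patched_ms in measurements
}

for (table, regex), slowdown in slowdowns.items():
    print(f"{table}  {regex}  {slowdown:+.1%}")
```

In both quoted cases HEAD used a Seq Scan and the patched build used a Bitmap Heap Scan, so they look like instances of reason 2 rather than reason 1.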

------
With best regards,
Alexander Korotkov.
