Re: WIP: index support for regexp search - Mailing list pgsql-hackers

From Erik Rijkers
Subject Re: WIP: index support for regexp search
Date
Msg-id d9141ddf8c4e763f57f41661c9af699d.squirrel@webmail.xs4all.nl
Whole thread Raw
In response to Re: WIP: index support for regexp search  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: WIP: index support for regexp search  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
On Sat, January 21, 2012 06:26, Alexander Korotkov wrote:
> Hi!
>
> Thank you for your feedback!
>
> On Fri, Jan 20, 2012 at 3:33 AM, Erik Rijkers <er@xs4all.nl> wrote:
>
>> The patch yields spectacular speedups with small, simple-enough regexen.
>>  But it does not do a
>> good enough job when guessing where to use the index and where fall back
>> to Seq Scan.  This can
>> lead to (also spectacular) slow-downs, compared to Seq Scan.
>>
> Could you give some examples of regexes where index scan becomes slower
> than seq scan?
>


x[aeio]+q takes many minutes, uninterruptible.  It's now running for almost 30 minutes, I'm sure
it will come back eventually, but I think I'll kill it later on; I suppose you get the point  ;-)

Even with {n,m} quantifiers it's easy to hit slowdowns:
  (table azjunk6 is 112 MB, the index 693 MB.)  (MAX_COLOR_CHARS=4  <= your original patch)


instance           table    regex                  plan               time

HEAD               azjunk6  x[aeio]{1,3}q          Seq Scan           3566.088 ms
HEAD               azjunk6  x[aeio]{1,3}q          Seq Scan           3540.606 ms
HEAD               azjunk6  x[aeio]{1,3}q          Seq Scan           3495.034 ms
HEAD               azjunk6  x[aeio]{1,3}q          Seq Scan           3510.403 ms
trgm_regex         azjunk6  x[aeio]{1,3}q          Bitmap Heap Scan   3724.131 ms
trgm_regex         azjunk6  x[aeio]{1,3}q          Bitmap Heap Scan   3844.999 ms
trgm_regex         azjunk6  x[aeio]{1,3}q          Bitmap Heap Scan   3835.190 ms
trgm_regex         azjunk6  x[aeio]{1,3}q          Bitmap Heap Scan   3724.016 ms


HEAD               azjunk6  x[aeio]{1,4}q          Seq Scan           3617.997 ms
HEAD               azjunk6  x[aeio]{1,4}q          Seq Scan           3644.215 ms
HEAD               azjunk6  x[aeio]{1,4}q          Seq Scan           3636.976 ms
HEAD               azjunk6  x[aeio]{1,4}q          Seq Scan           3625.493 ms
trgm_regex         azjunk6  x[aeio]{1,4}q          Bitmap Heap Scan   7885.247 ms
trgm_regex         azjunk6  x[aeio]{1,4}q          Bitmap Heap Scan   8799.082 ms
trgm_regex         azjunk6  x[aeio]{1,4}q          Bitmap Heap Scan   7754.152 ms
trgm_regex         azjunk6  x[aeio]{1,4}q          Bitmap Heap Scan   7721.332 ms


This is with your patch as is; in instances compiled with higher MAX_COLOR_CHARS (I did 6 and 9),
it is of course even easier to dream up a slow regex...



Erik Rijkers







pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: WIP: index support for regexp search
Next
From: Hitoshi Harada
Date:
Subject: Re: Finer Extension dependencies