Re: Question about POSIX Regular Expressions performance on large dataset. - Mailing list pgsql-sql

From Scott Marlowe
Subject Re: Question about POSIX Regular Expressions performance on large dataset.
Date
Msg-id AANLkTinLvxuYR+aN4mCDr3dfCt0AU8yOg5Jwd-Szi5ez@mail.gmail.com
Whole thread Raw
In response to Re: Question about POSIX Regular Expressions performance on large dataset.  (Jose Ildefonso Camargo Tolosa <ildefonso.camargo@gmail.com>)
List pgsql-sql
You can do something similar on the same machine if you can come up
with a common way to partition your data.  Then you split your 1B rows
up into chunks of 10M or so and put each on a table and hit the right
table.  You can use partitioning / table inheritance if you want to,
or just know the table name ahead of time.

We did something similar with mnogo search.  We break it up into a few
hundred different schemas and hit the one for a particular site to
keep the individual mnogo search tables small and fast.

On Tue, Aug 17, 2010 at 8:30 PM, Jose Ildefonso Camargo Tolosa
<ildefonso.camargo@gmail.com> wrote:
> Hi, again,
>
> I just had this wacky idea, and wanted to share it:
>
> what do you think of having the dataset divided among several servers,
> and sending the query to all of them, and then just have the
> application "unify" the results from all the servers?
>
> Would that work for this kind of *one table* search? (there are no
> joins, and will never be).  I think it should, but: what do you think?
>
> Ildefonso.
>
> On Tue, Aug 17, 2010 at 9:51 PM, Jose Ildefonso Camargo Tolosa
> <ildefonso.camargo@gmail.com> wrote:
>> Hi!
>>
>> I'm analyzing the possibility of using PostgreSQL to store a huge
>> amount of data (around 1000M records, or so....), and these, even
>> though are short (each record just have a timestamp, and a string that
>> is less than 128 characters in length), the strings will be matched
>> against POSIX Regular Expressions (different regexps, and maybe
>> complex).
>>
>> Because I don't have a system large enough to test this here, I have
>> to ask you (I may borrow a medium-size server, but it would take a
>> week or more, so I decided to ask here first).  How is the performance
>> of Regexp matching in PostgreSQL?  Can it use indexes? My guess is:
>> no, because I don't see a way of generally indexing to match regexp :(
>> , so, tablescans for this huge dataset.....
>>
>> What do you think of this?
>>
>> Sincerely,
>>
>> Ildefonso Camargo
>>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>



--
To understand recursion, one must first understand recursion.


pgsql-sql by date:

Previous
From: Jose Ildefonso Camargo Tolosa
Date:
Subject: Re: Question about POSIX Regular Expressions performance on large dataset.
Next
From: Pavel Stehule
Date:
Subject: Re: plpgsql out parameter with select into