Re: Working with huge amount of data. - Mailing list pgsql-general

From Mario Lopez
Subject Re: Working with huge amount of data.
Date
Msg-id 47B07741.70104@lar3d.com
In response to Re: Working with huge amount of data.  (Erik Jones <erik@myemma.com>)
List pgsql-general
Erik,

Thanks for your answers. This is actually a workable solution, because my
data is not updated very frequently (every 24 hours). The problem is
that I would like a more advanced version of this; there must be
something more I can do, so I am going to try what Hubert Depesz
explained in his articles.
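For reference, a minimal sketch of the trigram approach those articles
point toward (this assumes the pg_trgm contrib module is installed; on
recent PostgreSQL versions its GIN operator class can serve unanchored
LIKE patterns, and the index name here is illustrative):

-- hypothetical sketch, not tuned for this dataset
CREATE EXTENSION pg_trgm;
CREATE INDEX names_name_trgm_idx ON names USING gin (name gin_trgm_ops);
-- with the index in place the planner can use a bitmap index scan for:
SELECT * FROM names WHERE name LIKE '%keyword%';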

Thanks :)

>
> On Feb 11, 2008, at 9:37 AM, Mario Lopez wrote:
>
>> Hi guys :-), I am working on a personal project in which I am trying
>> to make sense of a huge (at least for me) amount of data. I have
>> approximately 150 million rows of unique words (they are not exactly
>> words; that is just to explain the situation).
>>
>> The table I am inserting this into is quite simple, something like
>> this:
>>
>> CREATE TABLE "public"."names" (
>> "id" SERIAL,
>> "name" VARCHAR(255)
>> ) WITHOUT OIDS;
>>
>> It is a requirement that I can search the varchar with queries of
>> the following form:
>>
>> SELECT * FROM names WHERE name LIKE 'keyword%'
>> Or
>> SELECT * FROM names WHERE name LIKE '%keyword%'
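>>
>> (As an aside: a plain btree index with the pattern operator class
>> can also serve the anchored form; a sketch, with an illustrative
>> index name:)
>>
>> CREATE INDEX names_name_prefix_idx ON names (name varchar_pattern_ops);
>> -- the planner can then use an index range scan for:
>> SELECT * FROM names WHERE name LIKE 'keyword%';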
>>
>> I optimized the first type of query by making partitions on every
>> letter that a name can begin with:
>>
>> CREATE TABLE "public"."names_a" (
>> CONSTRAINT "names_a_check" CHECK (("name")::text ~~ 'a%'::text)
>> ) INHERITS ("public"."names")
>> WITHOUT OIDS;
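>>
>> (A query can then be aimed at one partition directly; a sketch with
>> an illustrative keyword:)
>>
>> SELECT * FROM names_a WHERE name LIKE 'apple%';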
>>
>> The problem arises with the second type of query, where no such
>> partitioning applies and the search keywords are not known in
>> advance. I have tried indexes on the last letter, and indexes
>> specifying that a name contains a given letter, but none of them
>> help: the planner only makes sequential scans over the table.
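>>
>> (For the special case of a pattern anchored only at the end, one
>> known workaround is an expression index on the reversed string; a
>> sketch, assuming a reverse() function is available, which it is on
>> recent PostgreSQL versions:)
>>
>> CREATE INDEX names_name_rev_idx ON names (reverse(name) text_pattern_ops);
>> -- '%keyword' on name becomes a prefix search on reverse(name):
>> SELECT * FROM names WHERE reverse(name) LIKE 'drowyek%';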
>>
>> For the moment the quickest scan I have been able to make is with
>> grep! Surprisingly enough, grep searches a whole 2 GB plain-text
>> file, one name per line, in about 20 seconds on average, while
>> PostgreSQL takes around 50 seconds for the first type of query, and
>> the second type can take up to two minutes, which is completely
>> unacceptable for an online search engine that has to serve users
>> querying this information.
>>
>> How do the big search engines, say Google, pull this off? I am
>> amazed at how quickly they search this amount of information. Is
>> there any approach I could take? I am open-minded, so anything is
>> acceptable, not necessarily only PostgreSQL-based solutions
>> (although I would prefer one). By the way, full-text search in
>> PostgreSQL is ruled out, because what I am looking at are not names
>> that can be decomposed into lexemes; let's say this varchar is
>> composed of random garbage.
>
> Actually, a friend of mine did exactly what you've tried: grep. He
> had a cron job that would update the txt file from the table's data
> every five minutes, and his app would shell out to run those kinds
> of queries. Of course, with a setup like that your results can be a
> little out of date (by up to the interval between cron runs) but, if
> you can deal with that, it's a pretty simple solution that doesn't
> take much setup.
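>
> A minimal sketch of that export step (the path is illustrative, and
> COPY ... TO a server-side file needs appropriate privileges):
>
> -- run from cron, e.g. every five minutes:
> COPY (SELECT name FROM names) TO '/tmp/names.txt';
> -- the application then shells out to something like:
> --   grep -- 'keyword' /tmp/names.txt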
>
> Erik Jones
>
> DBA | Emma®
> erik@myemma.com
> 800.595.4401 or 615.292.5888
> 615.292.0777 (fax)
>
> Emma helps organizations everywhere communicate & market in style.
> Visit us online at http://www.myemma.com
>

