Re: Working with huge amount of data. - Mailing list pgsql-general
From: Mario Lopez
Subject: Re: Working with huge amount of data.
Date:
Msg-id: 47B07741.70104@lar3d.com
In response to: Re: Working with huge amount of data. (Erik Jones <erik@myemma.com>)
List: pgsql-general
Erik,

Thanks for your answers; this is actually a workable solution, because my data does not get updated very frequently (every 24 hours). The problem is that I would like a more advanced version of this; there must be something I can do. I am going to try what Hubert "depesz" Lubaczewski explained in his articles.

Thanks :)

> On Feb 11, 2008, at 9:37 AM, Mario Lopez wrote:
>
>> Hi guys :-), I am working on a personal project in which I am trying
>> to make sense of a huge (at least for me) amount of data. I have
>> approximately 150 million rows of unique words (they are not exactly
>> words; that is just for explaining the situation).
>>
>> The table I am inserting these into is quite simple, something like
>> this:
>>
>> CREATE TABLE "public"."names" (
>>   "id" SERIAL,
>>   "name" VARCHAR(255)
>> ) WITHOUT OIDS;
>>
>> It is a requirement that I can search the varchar with queries that
>> look like the following:
>>
>> SELECT * FROM names WHERE name LIKE 'keyword%'
>>
>> or
>>
>> SELECT * FROM names WHERE name LIKE '%keyword%'
>>
>> I optimized the first type of query by making partitions for every
>> letter a name can begin with:
>>
>> CREATE TABLE "public"."names_a" (
>>   CONSTRAINT "names_a_check" CHECK (("name")::text ~~ 'a%'::text)
>> ) INHERITS ("public"."names")
>> WITHOUT OIDS;
>>
>> The problem arises with the second type of query, where no
>> partitioning is possible and the search keywords are not known in
>> advance. I have tried indexes on the letter a name ends with, and
>> indexes specifying which letters it contains, but none of them work:
>> the planner only ever makes sequential scans over the table.
>>
>> For the moment the quickest scan I have been able to get is with
>> grep!! Surprisingly enough, grep searches a whole 2 GB plain-text
>> file (one name per line) in about 20 seconds on average, while
>> PostgreSQL takes about 50 seconds for the first type of query, and
>> the second type can take up to two minutes, which is completely
>> unacceptable for an online search engine that has to serve users
>> querying this information.
>>
>> How do the big search engines, say Google, manage this? I am amazed
>> at how quickly they search this amount of information. Is there any
>> approach I could take? I am open-minded, so anything is acceptable,
>> not necessarily only PostgreSQL-based solutions (although I would
>> prefer one). By the way, full-text search in PostgreSQL is ruled
>> out, because what I am looking at are not names that can be
>> decomposed into lexemes; let's say the varchar is composed of random
>> garbage.
>
> Actually, a friend of mine did exactly what you've tried: grep. He
> had a cron job that would update the txt file from the table's data
> every five minutes, and then his app would shell out to run those
> kinds of queries. Of course, with a setup like that your results can
> be a little out of date (the period between runs of the cron job)
> but, if you can deal with that, it's actually a pretty simple
> solution that doesn't take too much setup.
>
> Erik Jones
>
> DBA | Emma®
> erik@myemma.com
> 800.595.4401 or 615.292.5888
> 615.292.0777 (fax)
>
> Emma helps organizations everywhere communicate & market in style.
> Visit us online at http://www.myemma.com
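For the unanchored '%keyword%' case that defeats the planner above, one standard answer on later PostgreSQL releases is a trigram index from the contrib module pg_trgm. A minimal sketch against the names table from the thread, assuming PostgreSQL 9.1 or later (where the GIN trigram operator class can serve LIKE directly); the index name is made up for illustration:

-- Assumes the pg_trgm contrib module is available (9.1+ syntax).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Index every three-character substring of "name"; the planner can
-- then answer unanchored patterns with a bitmap index scan instead
-- of a sequential scan.
CREATE INDEX names_name_trgm_idx ON names USING gin (name gin_trgm_ops);

-- The same query as in the thread, now indexable:
SELECT * FROM names WHERE name LIKE '%keyword%';

Since trigrams do not depend on word boundaries, this also works for the "random garbage" strings that rule out full-text search.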
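For the anchored 'keyword%' form, the per-letter partitions are not strictly necessary either: a single b-tree index declared with the text_pattern_ops operator class makes prefix LIKE indexable even under a non-C locale. A sketch under the same assumptions (hypothetical index name):

-- B-tree index usable for left-anchored LIKE regardless of locale.
CREATE INDEX names_name_prefix_idx ON names (name text_pattern_ops);

-- The planner turns the anchored pattern into a range scan,
-- conceptually: name >= 'keyword' AND name < 'keywore'.
SELECT * FROM names WHERE name LIKE 'keyword%';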