Re: using text search - Mailing list pgsql-general
From | Rafał Pietrak |
---|---|
Subject | Re: using text search |
Date | |
Msg-id | 51756F83.3090503@zorro.isa-geek.com Whole thread Raw |
In response to | Re: using text search (Alfonso Afonso <aafonsoc@gmail.com>) |
Responses |
Re: using text search
|
List | pgsql-general |
Hi, W dniu 04/22/2013 05:57 PM, Alfonso Afonso pisze: > Hi Rafal > > Maybe you should think or consider to have normalized database to do this kind of select. > > I mean that if the keyword is a key you could add a column to table orders and have it ordered and indexed without havingto reevaluate the query every time. I wouldn't like to do that for the following reasons: 1. postgres is actually a "slave" database, where data is copied for www publication from "root" database. the copying is achieved by means of files produced by an export program, over which I don't have any control; not to mention, that I don't have any control over the layour/design/access to the "root" database. Under such circumstances I'd rather keep postgres schema as close to the file format, not to the "things" I do with the data afterwords - just in case "they" choose to change the format. 2. The whole lot is reinitialized by night, so normalization would have to be rerun as frequently. 3. The actual text-in-text search is not going to be very frequent - like once a week, or after every import; I can materialize a VIEW for that purpose, but it would save me an ocasional second query-run, when I spot errors in results of the first. Not much of a gain, but I keep that in mind for the future. > > About your question, if you are searching a text inside another text you could obtain bizarre results and consume a lotof resources, but my approximation to your problem Yes. Now I can see the "lot of resources": the query (like you suggest below) currently runs for 5 minutes and haven't ended, yet. And this is really bad, since my "volumes" are the following: 1. currently KEYWORDS table contains just 4 (four) rows. 2. currently ORDERS contains c.a. 1mln records. 3. currently SELECT * from ORDERS where info ~~ ('%' || 'some-test-pattern' || '%'); complets in 1sec. 4. let's (for the purpose of this emial clearity) assing XX*PAT*XX to: " info ~~ ('%' || 'some-test-pattern' || '%')", for different PATTERNS 5. currently SELECT * from ORDERS where (XX*PAT1*XX) or (XX*PAT2*XX); executes in 1.7sec. So putting 'some-test-pattern' into a table makes things "nonelinearly" worse. I can provide EXPLAIN ANALYSE of both cases If that would help, but that will need some time, since the "two table" variant haven't finished in 5min, and I killed it before knowing how long it takes to complete. -R > would be doing direct select instead of join select (in fact it is almost the same, but more readable on this cases), like: > > select o.* from orders as o, keywords as k > where o.info like ('%' || k.phrase || '%') > > Hope help you. > > Bye for now > > El 22/04/2013, a las 14:15, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió: > >> ... or not (I'm not quite sure) >> >> Hello, >> >> I have the following tables: >> >> CREATE TABLE orders (info text, ....); >> CREATE TABLE keywords (phrase text, .....); >> >> And I need to find all the ORDERS rows, which conain a PHRASE present in the info column ... like so: >> SELECT o.* from orders o join keywords k on (o.info ~~ '%' || k.phrase || '%'); >> >> ... only this does not work, since: >> ERROR: argument of JOIN/ON must be type boolean, not type text >> >> is this possible in SQL? Or may be this a job for "ts_something()" (havent' learned to use them, yet)??? >> >> -R >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general > Alfonso Afonso > (personal) > > > > > > >
pgsql-general by date: