Thread: using text search
... or not (I'm not quite sure) Hello, I have the following tables: CREATE TABLE orders (info text, ....); CREATE TABLE keywords (phrase text, .....); And I need to find all the ORDERS rows, which conain a PHRASE present in the info column ... like so: SELECT o.* from orders o join keywords k on (o.info ~~ '%' || k.phrase || '%'); ... only this does not work, since: ERROR: argument of JOIN/ON must be type boolean, not type text is this possible in SQL? Or may be this a job for "ts_something()" (havent' learned to use them, yet)??? -R
Hi Rafal Maybe you should think or consider to have normalized database to do this kind of select. I mean that if the keyword is a key you could add a column to table orders and have it ordered and indexed without havingto reevaluate the query every time. About your question, if you are searching a text inside another text you could obtain bizarre results and consume a lot ofresources, but my approximation to your problem would be doing direct select instead of join select (in fact it is almostthe same, but more readable on this cases), like: select o.* from orders as o, keywords as k where o.info like ('%' || k.phrase || '%') Hope help you. Bye for now El 22/04/2013, a las 14:15, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió: > ... or not (I'm not quite sure) > > Hello, > > I have the following tables: > > CREATE TABLE orders (info text, ....); > CREATE TABLE keywords (phrase text, .....); > > And I need to find all the ORDERS rows, which conain a PHRASE present in the info column ... like so: > SELECT o.* from orders o join keywords k on (o.info ~~ '%' || k.phrase || '%'); > > ... only this does not work, since: > ERROR: argument of JOIN/ON must be type boolean, not type text > > is this possible in SQL? Or may be this a job for "ts_something()" (havent' learned to use them, yet)??? > > -R > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general Alfonso Afonso (personal)
Hi, W dniu 04/22/2013 05:57 PM, Alfonso Afonso pisze: > Hi Rafal > > Maybe you should think or consider to have normalized database to do this kind of select. > > I mean that if the keyword is a key you could add a column to table orders and have it ordered and indexed without havingto reevaluate the query every time. I wouldn't like to do that for the following reasons: 1. postgres is actually a "slave" database, where data is copied for www publication from "root" database. the copying is achieved by means of files produced by an export program, over which I don't have any control; not to mention, that I don't have any control over the layour/design/access to the "root" database. Under such circumstances I'd rather keep postgres schema as close to the file format, not to the "things" I do with the data afterwords - just in case "they" choose to change the format. 2. The whole lot is reinitialized by night, so normalization would have to be rerun as frequently. 3. The actual text-in-text search is not going to be very frequent - like once a week, or after every import; I can materialize a VIEW for that purpose, but it would save me an ocasional second query-run, when I spot errors in results of the first. Not much of a gain, but I keep that in mind for the future. > > About your question, if you are searching a text inside another text you could obtain bizarre results and consume a lotof resources, but my approximation to your problem Yes. Now I can see the "lot of resources": the query (like you suggest below) currently runs for 5 minutes and haven't ended, yet. And this is really bad, since my "volumes" are the following: 1. currently KEYWORDS table contains just 4 (four) rows. 2. currently ORDERS contains c.a. 1mln records. 3. currently SELECT * from ORDERS where info ~~ ('%' || 'some-test-pattern' || '%'); complets in 1sec. 4. let's (for the purpose of this emial clearity) assing XX*PAT*XX to: " info ~~ ('%' || 'some-test-pattern' || '%')", for different PATTERNS 5. currently SELECT * from ORDERS where (XX*PAT1*XX) or (XX*PAT2*XX); executes in 1.7sec. So putting 'some-test-pattern' into a table makes things "nonelinearly" worse. I can provide EXPLAIN ANALYSE of both cases If that would help, but that will need some time, since the "two table" variant haven't finished in 5min, and I killed it before knowing how long it takes to complete. -R > would be doing direct select instead of join select (in fact it is almost the same, but more readable on this cases), like: > > select o.* from orders as o, keywords as k > where o.info like ('%' || k.phrase || '%') > > Hope help you. > > Bye for now > > El 22/04/2013, a las 14:15, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió: > >> ... or not (I'm not quite sure) >> >> Hello, >> >> I have the following tables: >> >> CREATE TABLE orders (info text, ....); >> CREATE TABLE keywords (phrase text, .....); >> >> And I need to find all the ORDERS rows, which conain a PHRASE present in the info column ... like so: >> SELECT o.* from orders o join keywords k on (o.info ~~ '%' || k.phrase || '%'); >> >> ... only this does not work, since: >> ERROR: argument of JOIN/ON must be type boolean, not type text >> >> is this possible in SQL? Or may be this a job for "ts_something()" (havent' learned to use them, yet)??? >> >> -R >> >> >> -- >> Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-general > Alfonso Afonso > (personal) > > > > > > >
And using substring function? Do you have the same results?
I understand your arguments about normalization but, as you know, BNF is the best SQL Developer friend ;)
I used to avoid text fields and search/filter them (when it is possible), mainly because this consumption issues.... if the query has been on execution for more than 5 minutes, the virtual table that postgresql is preparing in memory should be huge.
Bye for now and good luck :)
El 22/04/2013, a las 18:12, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió:
About your question, if you are searching a text inside another text you could obtain bizarre results and consume a lot of resources, but my approximation to your problem
Alfonso Afonso
(personal)
I forgot to say that the function is "position ( txtseach in txtcomplete)" :)
Bye
El 22/04/2013, a las 19:36, Alfonso Afonso <aafonsoc@gmail.com> escribió:
And using substring function? Do you have the same results?I understand your arguments about normalization but, as you know, BNF is the best SQL Developer friend ;)I used to avoid text fields and search/filter them (when it is possible), mainly because this consumption issues.... if the query has been on execution for more than 5 minutes, the virtual table that postgresql is preparing in memory should be huge.Bye for now and good luck :)El 22/04/2013, a las 18:12, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió:About your question, if you are searching a text inside another text you could obtain bizarre results and consume a lot of resources, but my approximation to your problemAlfonso Afonso(personal)
Alfonso Afonso
(personal)
W dniu 04/22/2013 08:43 PM, Alfonso Afonso pisze:
I forgot to say that the function is "position ( txtseach in txtcomplete)" :)Bye
Alfonso, thenx
But if I may: How can I use that function? In a context of my problem?
then again. At the edge of desperation, I'm thinking of writing a function, that will fetch all the KEYWORDS in one query, then cook explicit WHERE clause by string operations, and then EXECUTE it. With (currently) four keywords, I'd expect such function to return results within 5 seconds at most.
but I'd expect that there should be a way to "tell this" to postgresql SQL directly. Isn't it?
-R
Hi Rafal This function returns the position where the substring is found, so you could do a query with clause position(table1.fieldin table2.field) The 0 result is not found and maybe, in your case, is faster the use of internal text functions instead of like comparison...hope helps. Bye El 23/04/2013, a las 11:24, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió: > W dniu 04/22/2013 08:43 PM, Alfonso Afonso pisze: >> I forgot to say that the function is "position ( txtseach in txtcomplete)" :) >> >> Bye > > > Alfonso, thenx > > But if I may: How can I use that function? In a context of my problem? > > then again. At the edge of desperation, I'm thinking of writing a function, that will fetch all the KEYWORDS in one query,then cook explicit WHERE clause by string operations, and then EXECUTE it. With (currently) four keywords, I'd expectsuch function to return results within 5 seconds at most. > > but I'd expect that there should be a way to "tell this" to postgresql SQL directly. Isn't it? > > > -R > > Alfonso Afonso (personal)
Ha! Got it! for enybody whois interested: with tst(regexp) as (SELECT '(' || array_to_string(array_agg(phrase), '|') || ')' from KEYWORDS) select o.* from ORDERS o, tst t where o.info ~ t.regexp; execution time: 6400ms. (keywords=4, orders=1mln) BTW: does anybody know if there is an index, that could improve the performence of the above regexp? -R W dniu 04/23/2013 02:30 PM, Alfonso Afonso pisze: > Hi Rafal > > This function returns the position where the substring is found, so you could do a query with clause position(table1.fieldin table2.field) > > The 0 result is not found and maybe, in your case, is faster the use of internal text functions instead of like comparison...hope helps. > > Bye > > El 23/04/2013, a las 11:24, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió: > >> W dniu 04/22/2013 08:43 PM, Alfonso Afonso pisze: >>> I forgot to say that the function is "position ( txtseach in txtcomplete)" :) >>> >>> Bye >> >> Alfonso, thenx >> >> But if I may: How can I use that function? In a context of my problem? >> >> then again. At the edge of desperation, I'm thinking of writing a function, that will fetch all the KEYWORDS in one query,then cook explicit WHERE clause by string operations, and then EXECUTE it. With (currently) four keywords, I'd expectsuch function to return results within 5 seconds at most. >> >> but I'd expect that there should be a way to "tell this" to postgresql SQL directly. Isn't it? >> >> >> -R >> >> > Alfonso Afonso > (personal) > > > > > > >