Home > mailing lists

Re: using text search - Mailing list pgsql-general

From	Rafał Pietrak
Subject	Re: using text search
Date	April 22, 2013 17:12:52
Msg-id	51756F83.3090503@zorro.isa-geek.com Whole thread Raw
In response to	Re: using text search (Alfonso Afonso <aafonsoc@gmail.com>)
Responses	Re: using text search
List	pgsql-general

Tree view

Hi,

W dniu 04/22/2013 05:57 PM, Alfonso Afonso pisze:
> Hi Rafal
>
> Maybe you should think or consider to have normalized database to do this kind of select.
>
> I mean that if the keyword is a key you could add a column to table orders and have it ordered and indexed without
havingto reevaluate the query every time. 
I wouldn't like to do that for the following reasons:
1. postgres is actually a "slave" database, where data is copied for www
publication from "root" database. the copying is achieved by means of
files produced by an export program, over which I don't have any
control; not to mention, that I don't have any control over the
layour/design/access to the "root" database. Under such circumstances
I'd rather keep postgres schema as close to the file format, not to the
"things" I do with the data afterwords - just in case "they" choose to
change the format.
2. The whole lot is reinitialized by night, so normalization would have
to be rerun as frequently.
3. The actual text-in-text search is not going to be very frequent -
like once a week, or after every import; I can materialize a VIEW for
that purpose, but it would save me an ocasional second query-run, when I
spot errors in results of the first. Not much of a gain, but I keep that
in mind for the future.

>
> About your question, if you are searching a text inside another text you could obtain bizarre results and consume a
lotof resources, but my approximation to your problem 
Yes. Now I can see the "lot of resources": the query (like you suggest
below) currently runs for 5 minutes and haven't ended, yet. And this is
really bad, since my "volumes" are the following:
1. currently KEYWORDS table contains just 4 (four) rows.
2. currently ORDERS contains c.a. 1mln records.
3. currently SELECT * from ORDERS where info ~~ ('%' ||
'some-test-pattern' || '%'); complets in 1sec.
4. let's (for the purpose of this emial clearity) assing XX*PAT*XX to: "
info ~~ ('%' || 'some-test-pattern' || '%')", for different PATTERNS
5. currently SELECT * from ORDERS where (XX*PAT1*XX) or (XX*PAT2*XX);
executes in 1.7sec.

So putting 'some-test-pattern' into a table makes things "nonelinearly"
worse.

I can provide EXPLAIN ANALYSE of both cases If that would help, but that
will need some time, since the "two table" variant haven't finished in
5min, and I killed it before knowing how long it takes to complete.


-R

> would be doing direct select instead of join select (in fact it is almost the same, but more readable on this cases),
like:
>
> select o.* from orders as o, keywords as k
> where o.info like ('%' || k.phrase || '%')
>
> Hope help you.
>
> Bye for now
>
> El 22/04/2013, a las 14:15, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió:
>
>> ... or not (I'm not quite sure)
>>
>> Hello,
>>
>> I have the following tables:
>>
>> CREATE TABLE orders (info text, ....);
>> CREATE TABLE keywords (phrase text, .....);
>>
>> And I need to find all the ORDERS rows, which conain a PHRASE present in the info column ... like so:
>> SELECT o.* from orders o join keywords k on (o.info ~~ '%' || k.phrase || '%');
>>
>> ... only this does not work, since:
>> ERROR:  argument of JOIN/ON must be type boolean, not type text
>>
>> is this possible in SQL? Or may be this a job for "ts_something()" (havent' learned to use them, yet)???
>>
>> -R
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
> Alfonso Afonso
> (personal)
>
>
>
>
>
>
>

pgsql-general by date:

From: Moshe Jacobson
Date: 22 April 2013, 17:05:57
Subject: Re: Need solution for weekly database "snapshot"

From: hamann.w@t-online.de
Date: 22 April 2013, 17:26:47
Subject: how can this get faster

Re: using text search - Mailing list pgsql-general

Previous

Next