Thread: grep -f keyword data query

grep -f keyword data query

From
Hiroyuki Sato
Date:
Hello.

I would like to run a query like the following.
It works, but it is extremely slow.
Is it possible to make this query perform like the command ``grep -f keyword data``?

What kind of index should I create on the url_lists table?

Details

SELECT
  u.url 
FROM 
  url_lists u,
  keywords k
WHERE
  u.url @~ k.url
AND
  k.name = 'esc_url';

Explain output.

 Nested Loop  (cost=0.45..1570856063.28 rows=57122000000 width=57)
   ->  Index Scan using ix_name_keywords on keywords k  (cost=0.28..221.78 rows=5000 width=30)
         Index Cond: ((name)::text = 'esc_url'::text)
   ->  Index Only Scan using ix_url_url_lists on url_lists u  (cost=0.17..199927.17 rows=11424400 width=57)
         Index Cond: (url @~ k.url)

Best regards.

--
Hiroyuki Sato.


Re: grep -f keyword data query

From
Andreas Kretschmer
Date:
Hiroyuki Sato <hiroysato@gmail.com> wrote:

> Hello.
>
> I would like to create the query like the following.
> It work well, but extreme slow.
> Is it possible to improve this query like the command ``grep -f keyword data``?
>
> What kind of Index should I create on url_lists table?

Can you show us the CREATE TABLE and CREATE INDEX statements?
And the EXPLAIN ANALYSE of your query? And, finally, the PG version?

You got an index scan; that seems okay to me.



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°


Re: grep -f keyword data query

From
Tom Lane
Date:
Hiroyuki Sato <hiroysato@gmail.com> writes:
> I would like to create the query like the following.
> It work well, but extreme slow.
> ...
> Explain output.

>  Nested Loop  (cost=0.45..1570856063.28 rows=57122000000 width=57)
>    ->  Index Scan using ix_name_keywords on keywords k  (cost=0.28..221.78 rows=5000 width=30)
>          Index Cond: ((name)::text = 'esc_url'::text)
>    ->  Index Only Scan using ix_url_url_lists on url_lists u  (cost=0.17..199927.17 rows=11424400 width=57)
>          Index Cond: (url @~ k.url)

What in the world is this @~ operator?  And what sort of index are
you using now, that can accept it?  Are the rowcount estimates in
the EXPLAIN output accurate?  (If they are, it's hardly surprising
that the query takes a long time.)

            regards, tom lane


Re: grep -f keyword data query

From
Andreas Kretschmer
Date:

> Tom Lane <tgl@sss.pgh.pa.us> hat am 27. Dezember 2015 um 19:11 geschrieben:
>
>
> Hiroyuki Sato <hiroysato@gmail.com> writes:
> > I would like to create the query like the following.
> > It work well, but extreme slow.
> > ...
> > Explain output.
>
> >  Nested Loop  (cost=0.45..1570856063.28 rows=57122000000 width=57)
> >    ->  Index Scan using ix_name_keywords on keywords k  (cost=0.28..221.78
> > rows=5000 width=30)
> >          Index Cond: ((name)::text = 'esc_url'::text)
> >    ->  Index Only Scan using ix_url_url_lists on url_lists u
> >  (cost=0.17..199927.17 rows=11424400 width=57)
> >          Index Cond: (url @~ k.url)
>
> What in the world is this @~ operator?  And what sort of index are
> you using now, that can accept it?  Are the rowcount estimates in
> the EXPLAIN output accurate?  (If they are, it's hardly surprising
> that the query takes a long time.)

In a private mail he mentioned another operator: ~~. I think the @~ is an error.

Here are the tables and indexes (from his private mail):

create table url_lists (
  id int not null primary key,
  url text not null
);

create table keywords (
  id int not null primary key,
  name varchar(40) not null,
  url text not null
);

create index ix_url_url_lists on url_lists(url);

create index ix_url_keywords on keywords(url);
create index ix_name_keywords on keywords(name);




And he gave me a new EXPLAIN, but not an EXPLAIN ANALYSE (it was the same
plan as above).

Regards, Andreas.


Re: grep -f keyword data query

From
Tom Lane
Date:
Andreas Kretschmer <andreas@a-kretschmer.de> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> hat am 27. Dezember 2015 um 19:11 geschrieben:
>> What in the world is this @~ operator?  And what sort of index are
>> you using now, that can accept it?  Are the rowcount estimates in
>> the EXPLAIN output accurate?  (If they are, it's hardly surprising
>> that the query takes a long time.)

> in a privat mail he called an other operator: ~~. I think, the @~ is an error.

Well, ~~ isn't directly indexable by btree indexes either, so there's
still something wrong with either the EXPLAIN output or the claimed
index definitions.

            regards, tom lane


Re: grep -f keyword data query

From
Hiroyuki Sato
Date:
Hello Andreas and Tom

Thank you for replying.

Sorry, I have re-created my question; I pasted the wrong query log in the previous message.
(The @~ operator is from the PGroonga extension (http://pgroonga.github.io).)
Please ignore it.

Best regards.

1, Problem.
  (1) The following query is extremely slow (478 sec).
    SELECT
      u.url 
    FROM 
      url_lists4 u,
      keywords4 k
    WHERE
      u.url like k.url
    AND
      k.name = 'esc_url';


  (2) grep -f kwd.txt sample.txt (execution time under 1 sec)

2, Questions

  (1) Is it possible to make this query as fast as the command ``grep -f keyword data``?
  (2) What kind of index should I create on the url_lists table?
    
3, Environment 
  OS: CentOS7
  PostgreSQL 9.4

4, sample source
  https://github.com/hiroyuki-sato/postgres_like_test

5, Create table 

    drop table if exists url_lists4;
    create table url_lists4 (
      id int not null primary key,
      url text not null
    );
    create index ix_url_url_lists4 on url_lists4(url);
      
    drop table if exists keywords4;
    create table keywords4 (
      id int not null primary key,
      name varchar(40) not null,
      url text not null
    );

    create index ix_url_keywords4 on keywords4(url);
    create index ix_name_keywords4 on keywords4(name);
      

    \copy url_lists4(id,url) from 'sample.txt' with delimiter ',';
    \copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';

    vacuum url_lists4;
    vacuum keywords4;
    analyze url_lists4;
    analyze keywords4;

6, Query

    EXPLAIN SELECT
      u.url 
    FROM 
      url_lists4 u,
      keywords4 k
    WHERE
      u.url like k.url
    AND
      k.name = 'esc_url';

    EXPLAIN ANALYZE SELECT
      u.url 
    FROM 
      url_lists4 u,
      keywords4 k
    WHERE
      u.url like k.url
    AND
      k.name = 'esc_url';


    SELECT
      u.url 
    FROM 
      url_lists4 u,
      keywords4 k
    WHERE
      u.url like k.url
    AND
      k.name = 'esc_url';

7, EXPLAIN

                                     QUERY PLAN                                  
    -----------------------------------------------------------------------------
     Nested Loop  (cost=0.00..37510799.00 rows=12500000 width=57)
       Join Filter: (u.url ~~ k.url)
       ->  Seq Scan on url_lists4 u  (cost=0.00..10682.00 rows=500000 width=57)
       ->  Materialize  (cost=0.00..129.50 rows=5000 width=28)
             ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28)
                   Filter: ((name)::text = 'esc_url'::text)
    (6 rows)

8, EXPLAIN ANALYZE
                                                             QUERY PLAN                                                          
    -----------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.00..37510799.00 rows=12500000 width=57) (actual time=6011.642..478011.117 rows=4850 loops=1)
       Join Filter: (u.url ~~ k.url)
       Rows Removed by Join Filter: 2499995150
       ->  Seq Scan on url_lists4 u  (cost=0.00..10682.00 rows=500000 width=57) (actual time=0.034..192.646 rows=500000 loops=1)
       ->  Materialize  (cost=0.00..129.50 rows=5000 width=28) (actual time=0.000..0.261 rows=5000 loops=500000)
             ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28) (actual time=0.021..1.705 rows=5000 loops=1)
                   Filter: ((name)::text = 'esc_url'::text)
     Planning time: 0.061 ms
     Execution time: 478011.773 ms
    (9 rows)



Re: grep -f keyword data query

From
Tom Lane
Date:
Hiroyuki Sato <hiroysato@gmail.com> writes:
> Sorry, I re-created my questions. I was mis-pasted query log on previous
> question.
> (@~ operator is PGroonga extension (http://pgroonga.github.io))
> [ "like" is actually the operator of interest ]

Ah.  You might get some good results with trigram indexes (ie,
contrib/pg_trgm and a GIST or GIN index).  See

http://www.postgresql.org/docs/9.4/static/pgtrgm.html

            regards, tom lane
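
Tom's trigram suggestion translates to something like the following minimal sketch (the index names are illustrative, not from the thread, and the extension's files must be installed on the server):

```sql
-- load the trigram extension (once per database)
create extension if not exists pg_trgm;

-- build a trigram index on the column searched with LIKE.
-- GiST variant:
create index ix_url_trgm_gist on url_lists4 using gist (url gist_trgm_ops);
-- or the GIN variant:
-- create index ix_url_trgm_gin on url_lists4 using gin (url gin_trgm_ops);
```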


Re: grep -f keyword data query

From
Hiroyuki Sato
Date:

Hello Tom.

Thank you for replying.

I re-created the index with pg_trgm.
Execution time is 210 sec.
Yes, it is faster than the btree index, but it is still slow.
Is it possible to improve this query's speed?
Should I use another query or index?

Best regards.

1, query

    SELECT
      u.url 
    FROM 
      url_lists4 u,
      keywords4 k
    WHERE
      u.url like k.url
    AND
      k.name = 'esc_url';


2, create table (gist_trgm_ops)

    drop table if exists url_lists4;
    create table url_lists4 (
      id int not null primary key,
      url text not null
    );
    --create index ix_url_url_lists4 on url_lists4(url);
    --
    -- change index 
    -- 
    create index ix_url_url_lists4 on url_lists4 using gist(url gist_trgm_ops);
      
    drop table if exists keywords4;
    create table keywords4 (
      id int not null primary key,
      name varchar(40) not null,
      url text not null
    );

    create index ix_url_keywords4 on keywords4(url);
    create index ix_name_keywords4 on keywords4(name);
      

    \copy url_lists4(id,url) from 'sample.txt' with delimiter ',';
    \copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';

    vacuum url_lists4;
    vacuum keywords4;
    analyze url_lists4;
    analyze keywords4;

3, EXPLAIN

                                               QUERY PLAN                                            
    -------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.41..432684.50 rows=12500000 width=57)
       ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28)
             Filter: ((name)::text = 'esc_url'::text)
       ->  Index Scan using ix_url_url_lists4 on url_lists4 u  (cost=0.41..61.52 rows=2500 width=57)
             Index Cond: (url ~~ k.url)
    (5 rows)

4, EXPLAIN ANALYZE
                                                                       QUERY PLAN                                                                   
    ------------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.41..432684.50 rows=12500000 width=57) (actual time=1048.428..210641.655 rows=4850 loops=1)
       ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28) (actual time=0.033..28.878 rows=5000 loops=1)
             Filter: ((name)::text = 'esc_url'::text)
       ->  Index Scan using ix_url_url_lists4 on url_lists4 u  (cost=0.41..61.52 rows=2500 width=57) (actual time=41.426..42.115 rows=1 loops=5000)
             Index Cond: (url ~~ k.url)
             Rows Removed by Index Recheck: 0
     Planning time: 0.138 ms
     Execution time: 210642.896 ms
    (8 rows)



Re: grep -f keyword data query

From
Tom Lane
Date:
Hiroyuki Sato <hiroysato@gmail.com> writes:
> I re-created index with pg_trgm.
> Execution time is 210sec.
> Yes It is faster than btree index. But still slow.
> It is possible to improve this query speed?
> Should I use another query or idex?

Did you try a GIN index?

            regards, tom lane


Re: grep -f keyword data query

From
Jeff Janes
Date:
On Sun, Dec 27, 2015 at 3:53 PM, Hiroyuki Sato <hiroysato@gmail.com> wrote:
> Hello Andreas and Tom
>
> Thank you for replying.
>
> Sorry, I re-created my questions. I was mis-pasted query log on previous
> question.
> (@~ operator is PGroonga extension (http://pgroonga.github.io))
> Please ignore it.
>
> Best regards.
>
> 1, Problem.
>   (1) Following query is exteme slow. (478sec)
>     SELECT
>       u.url
>     FROM
>       url_lists4 u,
>       keywords4 k
>     WHERE
>       u.url like k.url
>     AND
>       k.name = 'esc_url';
>
>
>   (2) grep -f kwd.txt sample.txt (exec time under 1sec)

Certainly not in my hands.  The best I can get is 9 seconds.

>
> 2, Questions
>
>   (1) Is it possible to improve this query like the command ``grep -f
> keyword data``?

You will not get a general tool to match a specialized tool in the
specialized tool's own domain.  fgrep is good at what fgrep does.


Since your queries all have constant text strings at the beginning,
they could use the index.  But if you are not using the C collation,
then you need to build a special index:

create index on url_lists4 (url text_pattern_ops);

But, the planner refuses to use this index for your query anyway,
because it can't see that the patterns are all left-anchored.

Really, your best bet is to refactor your url data so it is stored with a
url_prefix and url_suffix column.  Then you can do exact matching
rather than pattern matching.

Cheers,

Jeff
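
A sketch of the refactoring Jeff describes might look like the following. The added column names and the regular expressions are illustrative assumptions, not taken from the thread:

```sql
-- split each url into a host part and a path part once, at load time
alter table url_lists4 add column url_host text, add column url_path text;
update url_lists4
   set url_host = substring(url from '^[a-z]+://([^/]+)'),
       url_path = substring(url from '^[a-z]+://[^/]+(/.*)$');
create index on url_lists4 (url_host);
-- with the keywords table split the same way, the join can use plain
-- equality on url_host (hash/merge joinable) and restrict any LIKE
-- pattern matching to the short url_path column only
```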


Re: grep -f keyword data query

From
Hiroyuki Sato
Date:
Hello Tom.

Thank you for replying.

This is the GIN index result.

It is slow, too.

Best regards.

--
Hiroyuki Sato

1, create.sql

    drop table if exists url_lists4;
    create table url_lists4 (
      id int not null primary key,
      url text not null
    );
    --create index ix_url_url_lists4 on url_lists4(url);
    create index ix_url_url_lists4 on url_lists4 using gin(url gin_trgm_ops);
      
    drop table if exists keywords4;
    create table keywords4 (
      id int not null primary key,
      name varchar(40) not null,
      url text not null
    );

    create index ix_url_keywords4 on keywords4(url);
    create index ix_name_keywords4 on keywords4(name);
      

    \copy url_lists4(id,url) from 'sample.txt' with delimiter ',';
    \copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';


2, EXPLAIN

                                            QUERY PLAN                                        
    ------------------------------------------------------------------------------------------
     Nested Loop  (cost=22.55..433522.66 rows=12500000 width=57)
       ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28)
             Filter: ((name)::text = 'esc_url'::text)
       ->  Bitmap Heap Scan on url_lists4 u  (cost=22.55..61.68 rows=2500 width=57)
             Recheck Cond: (url ~~ k.url)
             ->  Bitmap Index Scan on ix_url_url_lists4  (cost=0.00..21.92 rows=2500 width=0)
                   Index Cond: (url ~~ k.url)
    (7 rows)

3, EXPLAIN ANALYZE
                                                                    QUERY PLAN                                                                 
    -------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=22.55..433522.66 rows=12500000 width=57) (actual time=7227.210..1753163.751 rows=4850 loops=1)
       ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28) (actual time=0.035..16.577 rows=5000 loops=1)
             Filter: ((name)::text = 'esc_url'::text)
       ->  Bitmap Heap Scan on url_lists4 u  (cost=22.55..61.68 rows=2500 width=57) (actual time=350.625..350.626 rows=1 loops=5000)
             Recheck Cond: (url ~~ k.url)
             Rows Removed by Index Recheck: 0
             Heap Blocks: exact=159
             ->  Bitmap Index Scan on ix_url_url_lists4  (cost=0.00..21.92 rows=2500 width=0) (actual time=350.618..350.618 rows=1 loops=5000)
                   Index Cond: (url ~~ k.url)
     Planning time: 0.169 ms
     Execution time: 1753165.329 ms
    (11 rows)




Re: grep -f keyword data query

From
Hiroyuki Sato
Date:
Hello Jeff

Thank you for replying.

On Tue, Dec 29, 2015 at 4:35, Jeff Janes <jeff.janes@gmail.com> wrote:
On Sun, Dec 27, 2015 at 3:53 PM, Hiroyuki Sato <hiroysato@gmail.com> wrote:
> Hello Andreas and Tom
>
> Thank you for replying.
>
> Sorry, I re-created my questions. I was mis-pasted query log on previous
> question.
> (@~ operator is PGroonga extension (http://pgroonga.github.io))
> Please ignore it.
>
> Best regards.
>
> 1, Problem.
>   (1) Following query is exteme slow. (478sec)
>     SELECT
>       u.url
>     FROM
>       url_lists4 u,
>       keywords4 k
>     WHERE
>       u.url like k.url
>     AND
>       k.name = 'esc_url';
>
>
>   (2) grep -f kwd.txt sample.txt (exec time under 1sec)

Certainly not in my hands.  The best I can get is 9 seconds.

>
> 2, Questions
>
>   (1) Is it possible to improve this query like the command ``grep -f
> keyword data``?

You will not get a general tool to match a specialized tool in the
specialized tool's own domain.  fgrep is good at what fgrep does.

This is just sample data; the real data is a forward proxy log.
There is no specialized domain here; it contains 20,000,000 log entries per day.
 


Since your queries all have constant text strings at the beginning,
they could use the index.  But if you are not using the C collation,
then you need build a special index:

create index on url_lists4 (url text_pattern_ops);

Currently I use the C collation (ASCII) only.
 

But, the planner refuses to use this index for your query anyway,
because it can't see that the patterns are all left-anchored.

Really, your best bet is refactor your url data so it is stored with a
url_prefix and url_suffix column.  Then you can do exact matching
rather than pattern matching.

I see, exact matching is faster than pattern matching.
But I need pattern matching on the path part;
I would like to pattern-match the '/a/b/c' part.

That's why I asked this question.

If it is impossible to improve the join speed, I will dump the data once and
match it with grep or a similar tool.

Thanks.

 

Cheers,

Jeff

Re: grep -f keyword data query

From
David Rowley
Date:
On 30 December 2015 at 04:21, Hiroyuki Sato <hiroysato@gmail.com> wrote:
On Tue, Dec 29, 2015 at 4:35, Jeff Janes <jeff.janes@gmail.com> wrote:
 

But, the planner refuses to use this index for your query anyway,
because it can't see that the patterns are all left-anchored.

Really, your best bet is refactor your url data so it is stored with a
url_prefix and url_suffix column.  Then you can do exact matching
rather than pattern matching.

I see, exact matching faster than pattern matting. 
But I need pattern match in path part 
 I would like to pattern match '/a/b/c' part. 

If your pattern matching is as simple as that, then why not split the /a/b/c/ part out as mentioned by Jeff? Alternatively you could just write a function which splits that out for you and returns it, then index that function, and then just include a call to that function in the join condition matching with the equality operator. That'll allow hash and merge joins to be possible again. 
 
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
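
David's "alternatively" part could be sketched like this; the function name and the regular expression are illustrative assumptions, not from the thread:

```sql
-- a function that strips the protocol://host prefix, leaving the path
create or replace function url_path(u text) returns text
  language sql immutable
as $$ select substring(u from '://[^/]+(/.*)$') $$;

-- index the function result so joins on the extracted path can use it
create index ix_url_path on url_lists4 (url_path(url));

-- equality on the extracted path makes hash and merge joins possible
select u.url
  from url_lists4 u
  join keywords4 k on url_path(u.url) = url_path(k.url)
 where k.name = 'esc_url';
```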

Re: grep -f keyword data query

From
Hiroyuki Sato
Date:
Hello David

Thank you for replying.

On Wed, Dec 30, 2015 at 6:04, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 30 December 2015 at 04:21, Hiroyuki Sato <hiroysato@gmail.com> wrote:
On Tue, Dec 29, 2015 at 4:35, Jeff Janes <jeff.janes@gmail.com> wrote:
 
But, the planner refuses to use this index for your query anyway,
because it can't see that the patterns are all left-anchored.

Really, your best bet is refactor your url data so it is stored with a
url_prefix and url_suffix column.  Then you can do exact matching
rather than pattern matching.
I see, exact matching faster than pattern matting. 
But I need pattern match in path part 
 I would like to pattern match '/a/b/c' part. 

If your pattern matching is as simple as that, then why not split the /a/b/c/ part out as mentioned by Jeff? Alternatively you could just write a function which splits that out for you and returns it, then index that function, and then just include a call to that function in the join condition matching with the equality operator. That'll allow hash and merge joins to be possible again. 

Could you tell me more detail about the "alternatively" part?

It is a good idea to split the host and path.
I'll try it.

My matching patterns are the following:
1, http://www.yahoo.com/a/b/% (host equal, path like)
2, http://%.yahoo.com/a/b/%   (host and path like)

Can I use the equality operator in those cases?

Best regards.
 
 
--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: grep -f keyword data query

From
David Rowley
Date:
On 30 December 2015 at 13:56, Hiroyuki Sato <hiroysato@gmail.com> wrote:
On Wed, Dec 30, 2015 at 6:04, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 30 December 2015 at 04:21, Hiroyuki Sato <hiroysato@gmail.com> wrote:
On Tue, Dec 29, 2015 at 4:35, Jeff Janes <jeff.janes@gmail.com> wrote:
 
But, the planner refuses to use this index for your query anyway,
because it can't see that the patterns are all left-anchored.

Really, your best bet is refactor your url data so it is stored with a
url_prefix and url_suffix column.  Then you can do exact matching
rather than pattern matching.
I see, exact matching faster than pattern matting. 
But I need pattern match in path part 
 I would like to pattern match '/a/b/c' part. 

If your pattern matching is as simple as that, then why not split the /a/b/c/ part out as mentioned by Jeff? Alternatively you could just write a function which splits that out for you and returns it, then index that function, and then just include a call to that function in the join condition matching with the equality operator. That'll allow hash and merge joins to be possible again. 

Could you tell me more detail about Alternatively part?

It is good idea to split host and part.
I'll try it. 

My matching pattern is the following
1, http://www.yahoo.com/a/b/% (host equal, path like)
2, http://%.yahoo.com/a/b/%   (host and path like )

It seems I misunderstood your pattern matching. The example you supplied earlier indicated you just needed to match the document part (/a/b/c/) and just ignore the protocol://host part, in which case you could have written a function which took a text parameter, say: "http://www.yahoo.com/a/b/c/", and returned "/a/b/c", then performed: create index on yourtable (thatfunction(yourcolumn)); However that method won't help you, as it seems your pattern matching is more complex than the previous example that you supplied. 

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: grep -f keyword data query

From
Arjen Nienhuis
Date:


On Dec 28, 2015 00:55, "Hiroyuki Sato" <hiroysato@gmail.com> wrote:
>
> Hello Andreas and Tom
>
> Thank you for replying.
>
> Sorry, I re-created my questions. I was mis-pasted query log on previous question.
> (@~ operator is PGroonga extension (http://pgroonga.github.io))
> Please ignore it. 
>
> Best regards.
>
> 1, Problem.
>   (1) Following query is exteme slow. (478sec)
>     SELECT
>       u.url 
>     FROM 
>       url_lists4 u,
>       keywords4 k
>     WHERE
>       u.url like k.url
>     AND
>       k.name = 'esc_url';
>
>
>   (2) grep -f kwd.txt sample.txt (exec time under 1sec)
>

These are not the same 'query'. Grep will match any of the patterns for each url while postgres finds all matching combinations. You need to use '... OR ... OR ...' or 'ANY()' in postgres to get the same result.
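
The difference Arjen points out can also be removed on the SQL side by writing the join as a semi-join with EXISTS, which emits each url at most once no matter how many keywords match it (a sketch only; its performance on this data set is not verified in the thread):

```sql
-- semi-join: stop probing keywords as soon as one pattern matches
select u.url
  from url_lists4 u
 where exists (
         select 1
           from keywords4 k
          where k.name = 'esc_url'
            and u.url like k.url
       );
```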

> 2, Questions
>
>   (1) Is it possible to improve this query like the command ``grep -f keyword data``?

I get the best results by using OR of all the different patterns:

SELECT url FROM url_lists4
WHERE
    url LIKE 'http://ak.yahoo.co.jp/xwv/%'
    OR url LIKE 'http://ao.yahoo.co.jp/wdl/%'
    OR ...

In theory you could use:

... WHERE url LIKE ANY(ARRAY(SELECT url FROM keywords4 k WHERE k.name = 'esc_url'));

but that's very slow.

>   (2) What kind of Index should I create on url_lists table?

Both a btree with text_pattern_ops and a gin trigram index on the URL.
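
Those two indexes would be created roughly as follows (index names are illustrative; pg_trgm must be installed for the second):

```sql
-- btree with text_pattern_ops: usable for left-anchored LIKE patterns
-- regardless of collation
create index ix_url_pattern on url_lists4 (url text_pattern_ops);
-- trigram GIN: usable even for LIKE patterns with leading wildcards
create index ix_url_trgm on url_lists4 using gin (url gin_trgm_ops);
```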

>     
> 3, Environment 
>   OS: CentOS7
>   PostgreSQL 9.4
>
> 4, sample source
>   https://github.com/hiroyuki-sato/postgres_like_test
>     
>
> 5, Create table 
>
>     drop table if exists url_lists4;
>     create table url_lists4 (
>       id int not null primary key,
>       url text not null
>     );
>     create index ix_url_url_lists4 on url_lists4(url);
>       
>     drop table if exists keywords4;
>     create table keywords4 (
>       id int not null primary key,
>       name varchar(40) not null,
>       url text not null
>     );
>
>     create index ix_url_keywords4 on keywords4(url);
>     create index ix_name_keywords4 on keywords4(name);
>       
>
>     \copy url_lists4(id,url) from 'sample.txt' with delimiter ',';
>     \copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';
>
>     vacuum url_lists4;
>     vacuum keywords4;
>     analyze url_lists4;
>     analyze keywords4;
>
> 6, Query
>
>     EXPLAIN SELECT
>       u.url 
>     FROM 
>       url_lists4 u,
>       keywords4 k
>     WHERE
>       u.url like k.url
>     AND
>       k.name = 'esc_url';
>
>     EXPLAIN ANALYZE SELECT
>       u.url 
>     FROM 
>       url_lists4 u,
>       keywords4 k
>     WHERE
>       u.url like k.url
>     AND
>       k.name = 'esc_url';
>
>
>     SELECT
>       u.url 
>     FROM 
>       url_lists4 u,
>       keywords4 k
>     WHERE
>       u.url like k.url
>     AND
>       k.name = 'esc_url';
>
> 7, EXPLAIN
>
>                                      QUERY PLAN                                  
>     -----------------------------------------------------------------------------
>      Nested Loop  (cost=0.00..37510799.00 rows=12500000 width=57)
>        Join Filter: (u.url ~~ k.url)
>        ->  Seq Scan on url_lists4 u  (cost=0.00..10682.00 rows=500000 width=57)
>        ->  Materialize  (cost=0.00..129.50 rows=5000 width=28)
>              ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28)
>                    Filter: ((name)::text = 'esc_url'::text)
>     (6 rows)
>
> 8, EXPLAIN ANALYZE
>                                                              QUERY PLAN                                                          
>     -----------------------------------------------------------------------------------------------------------------------------
>      Nested Loop  (cost=0.00..37510799.00 rows=12500000 width=57) (actual time=6011.642..478011.117 rows=4850 loops=1)
>        Join Filter: (u.url ~~ k.url)
>        Rows Removed by Join Filter: 2499995150
>        ->  Seq Scan on url_lists4 u  (cost=0.00..10682.00 rows=500000 width=57) (actual time=0.034..192.646 rows=500000 loops=1)
>        ->  Materialize  (cost=0.00..129.50 rows=5000 width=28) (actual time=0.000..0.261 rows=5000 loops=500000)
>              ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28) (actual time=0.021..1.705 rows=5000 loops=1)
>                    Filter: ((name)::text = 'esc_url'::text)
>      Planning time: 0.061 ms
>      Execution time: 478011.773 ms
>     (9 rows)
>
>
> 2015年12月28日(月) 3:39 Tom Lane <tgl@sss.pgh.pa.us>:
>>
>> Andreas Kretschmer <andreas@a-kretschmer.de> writes:
>> >> Tom Lane <tgl@sss.pgh.pa.us> hat am 27. Dezember 2015 um 19:11 geschrieben:
>> >> What in the world is this @~ operator?  And what sort of index are
>> >> you using now, that can accept it?  Are the rowcount estimates in
>> >> the EXPLAIN output accurate?  (If they are, it's hardly surprising
>> >> that the query takes a long time.)
>>
>> > in a privat mail he called an other operator: ~~. I think, the @~ is an error.
>>
>> Well, ~~ isn't directly indexable by btree indexes either, so there's
>> still something wrong with either the EXPLAIN output or the claimed
>> index definitions.
>>
>>                         regards, tom lane

Re: grep -f keyword data query

From
Hiroyuki Sato
Date:
Hello David

Thank you for your comment.

On Wed, Dec 30, 2015 at 10:15, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 30 December 2015 at 13:56, Hiroyuki Sato <hiroysato@gmail.com> wrote:
On Wed, Dec 30, 2015 at 6:04, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 30 December 2015 at 04:21, Hiroyuki Sato <hiroysato@gmail.com> wrote:
On Tue, Dec 29, 2015 at 4:35, Jeff Janes <jeff.janes@gmail.com> wrote:
 
But, the planner refuses to use this index for your query anyway,
because it can't see that the patterns are all left-anchored.

Really, your best bet is refactor your url data so it is stored with a
url_prefix and url_suffix column.  Then you can do exact matching
rather than pattern matching.
I see, exact matching faster than pattern matting. 
But I need pattern match in path part 
 I would like to pattern match '/a/b/c' part. 

If your pattern matching is as simple as that, then why not split the /a/b/c/ part out as mentioned by Jeff? Alternatively you could just write a function which splits that out for you and returns it, then index that function, and then just include a call to that function in the join condition matching with the equality operator. That'll allow hash and merge joins to be possible again. 

Could you tell me more detail about Alternatively part?

It is good idea to split host and part.
I'll try it. 

My matching pattern is the following
1, http://www.yahoo.com/a/b/% (host equal, path like)
2, http://%.yahoo.com/a/b/%   (host and path like )

It seems I misunderstood your pattern matching. The example you supplied earlier indicated you just needed to match the document part (/a/b/c/) and just ignore the protocol://host part, in which case you could have written a function which took a text parameter, say: "http://www.yahoo.com/a/b/c/", and returned "/a/b/c", then performed: create index on yourtable (thatfunction(yourcolumn)); However that method won't help you, as it seems your pattern matching is more complex than the previous example that you supplied. 

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: grep -f keyword data query

From
Hiroyuki Sato
Date:
Hello Arjen

Thank you for replying.

On Mon, Jan 4, 2016 at 16:49, Arjen Nienhuis <a.g.nienhuis@gmail.com> wrote:


On Dec 28, 2015 00:55, "Hiroyuki Sato" <hiroysato@gmail.com> wrote:
>
> Hello Andreas and Tom
>
> Thank you for replying.
>
> Sorry, I re-created my question. I mis-pasted the query log in my previous question.
> (@~ operator is PGroonga extension (http://pgroonga.github.io))
> Please ignore it. 
>
> Best regards.
>
> 1, Problem.
>   (1) The following query is extremely slow (478 sec):
>     SELECT
>       u.url 
>     FROM 
>       url_lists4 u,
>       keywords4 k
>     WHERE
>       u.url like k.url
>     AND
>       k.name = 'esc_url';
>
>
>   (2) grep -f kwd.txt sample.txt (exec time under 1sec)
>

These are not the same 'query'. Grep will match any of the patterns for each url while postgres finds all matching combinations. You need to use '... OR ... OR ...' or 'ANY()' in postgres to get the same result.
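A sketch of the two semantics side by side, using the tables from the question (the join form emits one row per matching (url, pattern) pair; the ANY form emits each url at most once, like grep):

```sql
-- Join form: one output row per (url, pattern) match -- duplicates possible
SELECT u.url
FROM url_lists4 u, keywords4 k
WHERE u.url LIKE k.url AND k.name = 'esc_url';

-- grep-like form: each url appears at most once if ANY pattern matches
SELECT u.url
FROM url_lists4 u
WHERE u.url LIKE ANY (ARRAY(
  SELECT k.url FROM keywords4 k WHERE k.name = 'esc_url'));
```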

I thought the following queries had the same meaning.
Could you show me some examples of the difference between these two?

  (1) u.url like k.url
  (2) u.url like 'k.url1', or u.url like 'k.url2' ...

> 2, Questions
>
>   (1) Is it possible to improve this query like the command ``grep -f keyword data``?

I get the best results by using OR of all the different patterns:

SELECT url FROM url_lists4
WHERE
    url LIKE 'http://ak.yahoo.co.jp/xwv/%'
    OR url LIKE 'http://ao.yahoo.co.jp/wdl/%'
    OR ...


I'll try it.

BTW, do you know how many ORs I can use?
I have 5000 URLs.

In theory you could use:

... WHERE url LIKE ANY(ARRAY(SELECT url FROM keywords4 k WHERE k.name = 'esc_url'));

but that's very slow.

It's very interesting. It seems like the same query.
I'll check the EXPLAIN output.

 

>   (2) What kind of Index should I create on url_lists table?

Both btree text_pattern_ops and gin trigram on the URL.
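A sketch of those two indexes (note that pg_trgm is a contrib extension and must be installed before the trigram index can be created):

```sql
-- btree with text_pattern_ops: helps left-anchored LIKE 'prefix%' patterns
CREATE INDEX ix_url_pattern ON url_lists4 (url text_pattern_ops);

-- GIN trigram index: helps unanchored LIKE '%infix%' patterns
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ix_url_trgm ON url_lists4 USING gin (url gin_trgm_ops);
```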

>     
> 3, Environment 
>   OS: CentOS7
>   PostgreSQL 9.4
>
> 4, sample source
>   https://github.com/hiroyuki-sato/postgres_like_test
>     
>
> 5, Create table 
>
>     drop table if exists url_lists4;
>     create table url_lists4 (
>       id int not null primary key,
>       url text not null
>     );
>     create index ix_url_url_lists4 on url_lists4(url);
>       
>     drop table if exists keywords4;
>     create table keywords4 (
>       id int not null primary key,
>       name varchar(40) not null,
>       url text not null
>     );
>
>     create index ix_url_keywords4 on keywords4(url);
>     create index ix_name_keywords4 on keywords4(name);
>       
>
>     \copy url_lists4(id,url) from 'sample.txt' with delimiter ',';
>     \copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';
>
>     vacuum url_lists4;
>     vacuum keywords4;
>     analyze url_lists4;
>     analyze keywords4;
>
> 6, Query
>
>     EXPLAIN SELECT
>       u.url 
>     FROM 
>       url_lists4 u,
>       keywords4 k
>     WHERE
>       u.url like k.url
>     AND
>       k.name = 'esc_url';
>
>     EXPLAIN ANALYZE SELECT
>       u.url 
>     FROM 
>       url_lists4 u,
>       keywords4 k
>     WHERE
>       u.url like k.url
>     AND
>       k.name = 'esc_url';
>
>
>     SELECT
>       u.url 
>     FROM 
>       url_lists4 u,
>       keywords4 k
>     WHERE
>       u.url like k.url
>     AND
>       k.name = 'esc_url';
>
> 7, EXPLAIN
>
>                                      QUERY PLAN                                  
>     -----------------------------------------------------------------------------
>      Nested Loop  (cost=0.00..37510799.00 rows=12500000 width=57)
>        Join Filter: (u.url ~~ k.url)
>        ->  Seq Scan on url_lists4 u  (cost=0.00..10682.00 rows=500000 width=57)
>        ->  Materialize  (cost=0.00..129.50 rows=5000 width=28)
>              ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28)
>                    Filter: ((name)::text = 'esc_url'::text)
>     (6 rows)
>
> 8, EXPLAIN ANALYZE
>                                                              QUERY PLAN                                                          
>     -----------------------------------------------------------------------------------------------------------------------------
>      Nested Loop  (cost=0.00..37510799.00 rows=12500000 width=57) (actual time=6011.642..478011.117 rows=4850 loops=1)
>        Join Filter: (u.url ~~ k.url)
>        Rows Removed by Join Filter: 2499995150
>        ->  Seq Scan on url_lists4 u  (cost=0.00..10682.00 rows=500000 width=57) (actual time=0.034..192.646 rows=500000 loops=1)
>        ->  Materialize  (cost=0.00..129.50 rows=5000 width=28) (actual time=0.000..0.261 rows=5000 loops=500000)
>              ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28) (actual time=0.021..1.705 rows=5000 loops=1)
>                    Filter: ((name)::text = 'esc_url'::text)
>      Planning time: 0.061 ms
>      Execution time: 478011.773 ms
>     (9 rows)
>
>
> On Mon, Dec 28, 2015 at 3:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Andreas Kretschmer <andreas@a-kretschmer.de> writes:
>> >> Tom Lane <tgl@sss.pgh.pa.us> hat am 27. Dezember 2015 um 19:11 geschrieben:
>> >> What in the world is this @~ operator?  And what sort of index are
>> >> you using now, that can accept it?  Are the rowcount estimates in
>> >> the EXPLAIN output accurate?  (If they are, it's hardly surprising
>> >> that the query takes a long time.)
>>
>> > in a private mail he used another operator: ~~. I think the @~ is an error.
>>
>> Well, ~~ isn't directly indexable by btree indexes either, so there's
>> still something wrong with either the EXPLAIN output or the claimed
>> index definitions.
>>
>>                         regards, tom lane


Best regards.
 

Re: grep -f keyword data query

From
Arjen Nienhuis
Date:


On Jan 4, 2016 09:45, "Hiroyuki Sato" <hiroysato@gmail.com> wrote:
> I thought the following query are same meaning.
> Could you point me same examples about two differences?

If one URL matches two patterns, grep will return the match only once. In your SQL query it will match multiple times and will be in the result multiple times.

You can test this by putting both 'http' and 'yahoo' in the wordlist.

> BTW, do you know how many ORs I can use?
> I have 5000 URLs.

There is no real limit. I tried with your test file (5000 patterns). I got 6 seconds planning time and 12 seconds execution time.

(I also tried making one big regular expression with url1|url2|... but that did fail)


Re: grep -f keyword data query

From
Hiroyuki Sato
Date:
Hello Arjen.

Thank you for replying.

I'll try the OR query in my environment.
Thanks.


