Re: grep -f keyword data query - Mailing list pgsql-general

From Hiroyuki Sato
Subject Re: grep -f keyword data query
Date
Msg-id CA+Tq-RpTFPoM8O5RTdwA8iYAuH4vP4LN=TCFGEV-KCLBUa4-6w@mail.gmail.com
Whole thread Raw
In response to Re: grep -f keyword data query  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: grep -f keyword data query  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: grep -f keyword data query  (Jeff Janes <jeff.janes@gmail.com>)
Re: grep -f keyword data query  (Arjen Nienhuis <a.g.nienhuis@gmail.com>)
List pgsql-general
Hello Andreas and Tom

Thank you for replying.

Sorry, I re-created my questions. I was mis-pasted query log on previous question.
(@~ operator is PGroonga extension (http://pgroonga.github.io))
Please ignore it. 

Best regards.

1, Problem.
  (1) Following query is exteme slow. (478sec)
    SELECT
      u.url 
    FROM 
      url_lists4 u,
      keywords4 k
    WHERE
      u.url like k.url
    AND
      k.name = 'esc_url';


  (2) grep -f kwd.txt sample.txt (exec time under 1sec)

2, Questions

  (1) Is it possible to improve this query like the command ``grep -f keyword data``?
  (2) What kind of Index should I create on url_lists table?
    
3, Environment 
  OS: CentOS7
  PostgreSQL 9.4

4, sample source
    

5, Create table 

    drop table if exists url_lists4;
    create table url_lists4 (
      id int not null primary key,
      url text not null
    );
    create index ix_url_url_lists4 on url_lists4(url);
      
    drop table if exists keywords4;
    create table keywords4 (
      id int not null primary key,
      name varchar(40) not null,
      url text not null
    );

    create index ix_url_keywords4 on keywords4(url);
    create index ix_name_keywords4 on keywords4(name);
      

    \copy url_lists4(id,url) from 'sample.txt' with delimiter ',';
    \copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';

    vacuum url_lists4;
    vacuum keywords4;
    analyze url_lists4;
    analyze keywords4;

6, Query

    EXPLAIN SELECT
      u.url 
    FROM 
      url_lists4 u,
      keywords4 k
    WHERE
      u.url like k.url
    AND
      k.name = 'esc_url';

    EXPLAIN ANALYZE SELECT
      u.url 
    FROM 
      url_lists4 u,
      keywords4 k
    WHERE
      u.url like k.url
    AND
      k.name = 'esc_url';


    SELECT
      u.url 
    FROM 
      url_lists4 u,
      keywords4 k
    WHERE
      u.url like k.url
    AND
      k.name = 'esc_url';

7, EXPLAIN

                                     QUERY PLAN                                  
    -----------------------------------------------------------------------------
     Nested Loop  (cost=0.00..37510799.00 rows=12500000 width=57)
       Join Filter: (u.url ~~ k.url)
       ->  Seq Scan on url_lists4 u  (cost=0.00..10682.00 rows=500000 width=57)
       ->  Materialize  (cost=0.00..129.50 rows=5000 width=28)
             ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28)
                   Filter: ((name)::text = 'esc_url'::text)
    (6 rows)

8, EXPLAIN ANALYZE
                                                             QUERY PLAN                                                          
    -----------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=0.00..37510799.00 rows=12500000 width=57) (actual time=6011.642..478011.117 rows=4850 loops=1)
       Join Filter: (u.url ~~ k.url)
       Rows Removed by Join Filter: 2499995150
       ->  Seq Scan on url_lists4 u  (cost=0.00..10682.00 rows=500000 width=57) (actual time=0.034..192.646 rows=500000 loops=1)
       ->  Materialize  (cost=0.00..129.50 rows=5000 width=28) (actual time=0.000..0.261 rows=5000 loops=500000)
             ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28) (actual time=0.021..1.705 rows=5000 loops=1)
                   Filter: ((name)::text = 'esc_url'::text)
     Planning time: 0.061 ms
     Execution time: 478011.773 ms
    (9 rows)


2015年12月28日(月) 3:39 Tom Lane <tgl@sss.pgh.pa.us>:
Andreas Kretschmer <andreas@a-kretschmer.de> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> hat am 27. Dezember 2015 um 19:11 geschrieben:
>> What in the world is this @~ operator?  And what sort of index are
>> you using now, that can accept it?  Are the rowcount estimates in
>> the EXPLAIN output accurate?  (If they are, it's hardly surprising
>> that the query takes a long time.)

> in a privat mail he called an other operator: ~~. I think, the @~ is an error.

Well, ~~ isn't directly indexable by btree indexes either, so there's
still something wrong with either the EXPLAIN output or the claimed
index definitions.

                        regards, tom lane

pgsql-general by date:

Previous
From: Andreas Kretschmer
Date:
Subject: Re: Options for complex materialized views sharing most of the same logic?
Next
From: Tom Lane
Date:
Subject: Re: grep -f keyword data query