Re: grep -f keyword data query - Mailing list pgsql-general

From Hiroyuki Sato
Subject Re: grep -f keyword data query
Msg-id CA+Tq-Ro_CHUHUizX-PwmEij-dS0Leekd-6yxx2G9D6D63rb7kA@mail.gmail.com
In response to Re: grep -f keyword data query  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Hello Tom.

Thank you for replying.

Here is the result with a GIN index (gin_trgm_ops).

It is slow too: execution took about 1753 seconds.

Best regards.

--
Hiroyuki Sato

1, create.sql

    drop table if exists url_lists4;
    create table url_lists4 (
      id int not null primary key,
      url text not null
    );
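    --create index ix_url_url_lists4 on url_lists4(url);
    -- gin_trgm_ops is provided by the pg_trgm extension, so make sure it is installed.
    create extension if not exists pg_trgm;
    create index ix_url_url_lists4 on url_lists4 using gin(url gin_trgm_ops);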
      
    drop table if exists keywords4;
    create table keywords4 (
      id int not null primary key,
      name varchar(40) not null,
      url text not null
    );

    create index ix_url_keywords4 on keywords4(url);
    create index ix_name_keywords4 on keywords4(name);
      

    \copy url_lists4(id,url) from 'sample.txt' with delimiter ',';
    \copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';


2, EXPLAIN

                                            QUERY PLAN                                        
    ------------------------------------------------------------------------------------------
     Nested Loop  (cost=22.55..433522.66 rows=12500000 width=57)
       ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28)
             Filter: ((name)::text = 'esc_url'::text)
       ->  Bitmap Heap Scan on url_lists4 u  (cost=22.55..61.68 rows=2500 width=57)
             Recheck Cond: (url ~~ k.url)
             ->  Bitmap Index Scan on ix_url_url_lists4  (cost=0.00..21.92 rows=2500 width=0)
                   Index Cond: (url ~~ k.url)
    (7 rows)
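
The query text is not included in this message. Judging only from the plan above (the filter on name and the `url ~~ k.url` condition, where `~~` is the LIKE operator), it presumably had roughly the following shape. This is a reconstruction, not the original statement: the select list and the join syntax are assumptions.

```sql
-- Hypothetical reconstruction of the query behind the plan.
-- Taken from the plan: the LIKE join condition and the name filter.
-- Assumed: the select list (u.*) and the explicit JOIN syntax.
explain
select u.*
from keywords4 k
join url_lists4 u on u.url like k.url
where k.name = 'esc_url';
```

Replacing `explain` with `explain analyze` would execute the query and report actual timings, as in the next section.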

3, EXPLAIN ANALYZE
                                                                    QUERY PLAN                                                                 
    -------------------------------------------------------------------------------------------------------------------------------------------
     Nested Loop  (cost=22.55..433522.66 rows=12500000 width=57) (actual time=7227.210..1753163.751 rows=4850 loops=1)
       ->  Seq Scan on keywords4 k  (cost=0.00..104.50 rows=5000 width=28) (actual time=0.035..16.577 rows=5000 loops=1)
             Filter: ((name)::text = 'esc_url'::text)
       ->  Bitmap Heap Scan on url_lists4 u  (cost=22.55..61.68 rows=2500 width=57) (actual time=350.625..350.626 rows=1 loops=5000)
             Recheck Cond: (url ~~ k.url)
             Rows Removed by Index Recheck: 0
             Heap Blocks: exact=159
             ->  Bitmap Index Scan on ix_url_url_lists4  (cost=0.00..21.92 rows=2500 width=0) (actual time=350.618..350.618 rows=1 loops=5000)
                   Index Cond: (url ~~ k.url)
     Planning time: 0.169 ms
     Execution time: 1753165.329 ms
    (11 rows)
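
Reading the plan above: almost all of the time is spent in the inner Bitmap Index Scan on ix_url_url_lists4, which takes about 350 ms per loop and runs once for each of the 5000 keyword rows. A quick check of that arithmetic, using only numbers from the plan:

```sql
-- loops * actual time per loop (ms), converted to seconds
select 5000 * 350.626 / 1000 as inner_scan_seconds;
```

That comes to roughly 1753 seconds, which accounts for essentially the whole execution time of 1753165 ms. The plan also shows that the planner estimated 2500 rows per loop, while each loop actually returned about 1 row (4850 rows in total).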



On Tue, Dec 29, 2015 at 2:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hiroyuki Sato <hiroysato@gmail.com> writes:
> I re-created the index with pg_trgm.
> Execution time is 210 sec.
> Yes, it is faster than the btree index, but still slow.
> Is it possible to improve this query's speed?
> Should I use another query or index?

Did you try a GIN index?

                        regards, tom lane
