Home > mailing lists

Re: grep -f keyword data query - Mailing list pgsql-general

From	Hiroyuki Sato
Subject	Re: grep -f keyword data query
Date	December 28, 2015 05:01:14
Msg-id	CA+Tq-RqhjYYMopJpcCBDyS6ijvXj_GOVmFX5nijmOkM-V__PVg@mail.gmail.com Whole thread Raw
In response to	Re: grep -f keyword data query (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: grep -f keyword data query (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-general

Tree view

Hello Tom.

Thank you for replying.

I re-created index with pg_trgm.

Execution time is 210sec.

Yes It is faster than btree index. But still slow.

It is possible to improve this query speed?

Should I use another query or idex?

Best regards.

1, query

SELECT

u.url

FROM

url_lists4 u,

keywords4 k

WHERE

u.url like k.url

AND

k.name = 'esc_url';

2, create table (gist grgm_ops)

drop table if exists url_lists4;

create table url_lists4 (

id int not null primary key,

url text not null

);

--create index ix_url_url_lists4 on url_lists4(url);

-- change index

create index ix_url_url_lists4 on url_lists4 using gist(url gist_trgm_ops);

drop table if exists keywords4;

create table keywords4 (

id int not null primary key,

name varchar(40) not null,

url text not null

);

create index ix_url_keywords4 on keywords4(url);

create index ix_name_keywords4 on keywords4(name);

\copy url_lists4(id,url) from 'sample.txt' with delimiter ',';

\copy keywords4(id,name,url) from 'keyword.txt' with delimiter ',';

vacuum url_lists4;

vacuum keywords4;

analyze url_lists4;

analyze keywords4;

3, EXPLAIN

QUERY PLAN

-------------------------------------------------------------------------------------------------

Nested Loop (cost=0.41..432684.50 rows=12500000 width=57)

-> Seq Scan on keywords4 k (cost=0.00..104.50 rows=5000 width=28)

Filter: ((name)::text = 'esc_url'::text)

-> Index Scan using ix_url_url_lists4 on url_lists4 u (cost=0.41..61.52 rows=2500 width=57)

Index Cond: (url ~~ k.url)

(5 rows)

4, EXPLAIN ANALYZE

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.41..432684.50 rows=12500000 width=57) (actual time=1048.428..210641.655 rows=4850 loops=1)

-> Seq Scan on keywords4 k (cost=0.00..104.50 rows=5000 width=28) (actual time=0.033..28.878 rows=5000 loops=1)

Filter: ((name)::text = 'esc_url'::text)

-> Index Scan using ix_url_url_lists4 on url_lists4 u (cost=0.41..61.52 rows=2500 width=57) (actual time=41.426..42.115 rows=1 loops=5000)

Index Cond: (url ~~ k.url)

Rows Removed by Index Recheck: 0

Planning time: 0.138 ms

Execution time: 210642.896 ms

(8 rows)

2015年12月28日(月) 9:36 Tom Lane <tgl@sss.pgh.pa.us>:

Hiroyuki Sato <hiroysato@gmail.com> writes:
> Sorry, I re-created my questions. I was mis-pasted query log on previous
> question.
> (@~ operator is PGroonga extension (http://pgroonga.github.io))
> [ "like" is actually the operator of interest ]

Ah. You might get some good results with trigram indexes (ie,
contrib/pg_trgm and a GIST or GIN index). See

http://www.postgresql.org/docs/9.4/static/pgtrgm.html

regards, tom lane

pgsql-general by date:

From: Tom Lane
Date: 28 December 2015, 03:36:56
Subject: Re: grep -f keyword data query

From: Andreas Kretschmer
Date: 28 December 2015, 14:23:22
Subject: Re: Calling function (table_name, schema_name) within event trigger

Re: grep -f keyword data query - Mailing list pgsql-general

Previous

Next