Thread: Huge table searching optimization
Hi, I have table with just on column named url (it's not real url, just random string for testing purposes), type text. I have lots of entries in it (it's dynamic, i add and remove them on the fly), 100 000 and more. I've created index on this table to optimize "searching". I just want to test if some "url" is in in the table, so i am using this request: select url from test2 where url ~* '^URLVALUE\\s*$'; there's \\s* because of padding. Here is the analyze: postgres=# explain analyze select url from test2 where url ~* '^zyxel\\s*$'; WARNING: nonstandard use of \\ in a string literal LINE 1: ...plain analyze select url from test2 where url ~* '^zyxel\\s... ^ HINT: Use the escape string syntax for backslashes, e.g., E'\\'. QUERY PLAN ------------------------------------------------------------------------------------------------------- Seq Scan on test2 (cost=0.00..1726.00 rows=10 width=9) (actual time=156.489..156.502 rows=1 loops=1) Filter: (url ~* '^zyxel\\s*$'::text) Total runtime: 156.538 ms (3 rows) It takes 156 ms, it's too much for my purposes, so i want to decrease it. So what can I use for optimizing this request? Again, I just want to test if "url" ("zyxel" in this examlpe) is in the table. Some info: version(): PostgreSQL 8.4.2 on i486-slackware-linux-gnu, compiled by GCC gcc (GCC) 4.3.3, 32-bit Ram: 500 MB CPU: 2.6 Ghz (it's kvm virtualized, i don't know exact type, it's one core cpu) Thank you.
On Mon, Apr 05, 2010 at 04:28:35PM +0200, Oliver Kindernay wrote: > Hi, I have table with just on column named url (it's not real url, > just random string for testing purposes), type text. I have lots of > entries in it (it's dynamic, i add and remove them on the fly), 100 > 000 and more. I've created index on this table to optimize > "searching". I just want to test if some "url" is in in the table, so > i am using this request: > > select url from test2 where url ~* '^URLVALUE\\s*$'; > > there's \\s* because of padding. Here is the analyze: > > postgres=# explain analyze select url from test2 where url ~* '^zyxel\\s*$'; > WARNING: nonstandard use of \\ in a string literal > LINE 1: ...plain analyze select url from test2 where url ~* '^zyxel\\s... > ^ > HINT: Use the escape string syntax for backslashes, e.g., E'\\'. > QUERY PLAN > ------------------------------------------------------------------------------------------------------- > Seq Scan on test2 (cost=0.00..1726.00 rows=10 width=9) (actual > time=156.489..156.502 rows=1 loops=1) > Filter: (url ~* '^zyxel\\s*$'::text) > Total runtime: 156.538 ms > (3 rows) > It takes 156 ms, it's too much for my purposes, so i want to decrease > it. So what can I use for optimizing this request? Again, I just want > to test if "url" ("zyxel" in this examlpe) is in the table. add trigger to remove spaces from end of string on insert and update, and then use normal = operator. Best regards, depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
Hi, On Monday 05 April 2010 16:28:35 Oliver Kindernay wrote: > Hi, I have table with just on column named url (it's not real url, > just random string for testing purposes), type text. I have lots of > entries in it (it's dynamic, i add and remove them on the fly), 100 > 000 and more. I've created index on this table to optimize > "searching". I just want to test if some "url" is in in the table, so > i am using this request: > > select url from test2 where url ~* '^URLVALUE\\s*$'; > > there's \\s* because of padding. Here is the analyze: > > postgres=# explain analyze select url from test2 where url ~* > '^zyxel\\s*$'; WARNING: nonstandard use of \\ in a string literal > LINE 1: ...plain analyze select url from test2 where url ~* '^zyxel\\s... > ^ > HINT: Use the escape string syntax for backslashes, e.g., E'\\'. > QUERY PLAN > --------------------------------------------------------------------------- > ---------------------------- Seq Scan on test2 (cost=0.00..1726.00 rows=10 > width=9) (actual > time=156.489..156.502 rows=1 loops=1) > Filter: (url ~* '^zyxel\\s*$'::text) > Total runtime: 156.538 ms > (3 rows) > > It takes 156 ms, it's too much for my purposes, so i want to decrease > it. So what can I use for optimizing this request? Again, I just want > to test if "url" ("zyxel" in this examlpe) is in the table. > Depending on your locale it might be sensible to create a text_pattern_ops index - see the following link: http://www.postgresql.org/docs/current/static/indexes-opclass.html Like suggested by depesz it would be far better to remove the padding and do exact lookups though. Andres
Andres Freund <andres@anarazel.de> writes: > On Monday 05 April 2010 16:28:35 Oliver Kindernay wrote: >> i am using this request: >> select url from test2 where url ~* '^URLVALUE\\s*$'; > Depending on your locale it might be sensible to create a text_pattern_ops > index - see the following link: > http://www.postgresql.org/docs/current/static/indexes-opclass.html text_pattern_ops won't help for a case-insensitive search. The best bet here would be to index on a case-folded, blank-removed version of the url, viz create index ... on (normalize(url)) select ... where normalize(url) = normalize('URLVALUE') where normalize() is a suitably defined function. Or if it's okay to only store the normalized form of the string, you could simplify that a bit. regards, tom lane
Thanks to all, now it is 0.061 ms :) 2010/4/5 Tom Lane <tgl@sss.pgh.pa.us>: > Andres Freund <andres@anarazel.de> writes: >> On Monday 05 April 2010 16:28:35 Oliver Kindernay wrote: >>> i am using this request: >>> select url from test2 where url ~* '^URLVALUE\\s*$'; > >> Depending on your locale it might be sensible to create a text_pattern_ops >> index - see the following link: >> http://www.postgresql.org/docs/current/static/indexes-opclass.html > > text_pattern_ops won't help for a case-insensitive search. The best bet > here would be to index on a case-folded, blank-removed version of the > url, viz > > create index ... on (normalize(url)) > > select ... where normalize(url) = normalize('URLVALUE') > > where normalize() is a suitably defined function. > > Or if it's okay to only store the normalized form of the string, > you could simplify that a bit. > > regards, tom lane >