Re: looking for a faster way to do that - Mailing list pgsql-general
From:           hamann.w@t-online.de
Subject:        Re: looking for a faster way to do that
Msg-id:         4E7C392F.mailGNP11KAXQ@amadeus3.local
In response to: looking for a faster way to do that (hamann.w@t-online.de)
Responses:      Re: looking for a faster way to do that (3)
List:           pgsql-general
Alban Hertroys <haramrae@gmail.com> wrote:
>> What is the output of explain?
>>
>> You say 'the other table', so presumably we're dealing with a foreign key
>> here. Is there an index on that column?

Albe Laurenz wrote:
>> Is the index used for "where code ~ '^ABC3563'"?
>>
>> If not, then the result is fast only because the table is scanned only once,
>> and it's just the factor of 3000 that's killing you.
>>
>> The second query (where code ~ wantcode) can never use an index because
>> the pattern "wantcode" is unknown at query planning time.
>>
>> Yours,
>> Laurenz Albe

Here I created a subset (just number and code matching a certain prefix):

\d items
        Table "pg_temp_1.items"
 Column |         Type          | Modifiers
--------+-----------------------+-----------
 num    | integer               |
 code   | character varying(40) |

create index itemsc on items (code);

select count(*) from items;
 count
-------
  9614

A single anchored query

select * from items where code ~ '^ABC';

does indeed use the index to retrieve data.

Next I copied a file of wanted codes:

create temp table n (wantcode text);
\copy n from /tmp/rmartin.tmp

The file contains plain names, i.e. unanchored matches.

explain analyze select num, n.wantcode from items, n where items.code ~ n.wantcode;

 Nested Loop  (cost=20.00..216502.14 rows=48070 width=36) (actual time=148.479..336280.488 rows=2871 loops=1)
   Join Filter: (("outer".code)::text ~ "inner".wantcode)
   ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 width=42) (actual time=0.048..38.666 rows=9614 loops=1)
   ->  Materialize  (cost=20.00..30.00 rows=1000 width=32) (actual time=0.001..1.049 rows=815 loops=9614)
         ->  Seq Scan on n  (cost=0.00..20.00 rows=1000 width=32) (actual time=0.003..1.839 rows=815 loops=1)
 Total runtime: 336286.692 ms

An exact match ("where items.code = n.wantcode") on the same data completes in 40 ms.

BTW: indexing the second table does not affect the query plan or the runtime; it just shows the actual row count rather than an estimate.
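Since a per-row pattern defeats index use at plan time, one workaround is to express the prefix test as a range comparison whose bounds come from the outer row, so a nested loop can probe the btree index on items.code once per row of n. This is only a sketch under loud assumptions: wantcode holds literal prefixes such as 'ABC3563' (no regex metacharacters), the codes are plain ASCII, and the crude upper bound 'zzzz' sorts above every character that can follow the prefix; the LIKE clause rechecks the range hits exactly.

```sql
-- Sketch: prefix match rewritten as a range test so the index on items.code
-- can be probed per outer row. Assumes wantcode holds plain literal prefixes
-- (no regex metacharacters) over ASCII codes; 'zzzz' is a crude upper bound.
-- With a non-C locale, an extra text_pattern_ops index may be needed:
--   create index itemsc_pattern on items (code text_pattern_ops);
select i.num, n.wantcode
from n
join items i
  on i.code >= n.wantcode
 and i.code <  n.wantcode || 'zzzz'      -- range for the index scan
where i.code like n.wantcode || '%';     -- exact prefix recheck
```

Whether the planner actually chooses the parameterized index scan depends on version and statistics; EXPLAIN ANALYZE on the rewritten query would show it.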
This is, of course, bad; an anchored match could be faster and is also more appropriate to the scenario. So I change the contents of the second table

update n set wantcode = textcat('^', wantcode);

and try again, with similar results:

 Nested Loop  (cost=14.15..176478.01 rows=39178 width=36) (actual time=125.114..308831.697 rows=2871 loops=1)
   Join Filter: (("outer".code)::text ~ "inner".wantcode)
   ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 width=42) (actual time=0.061..2034.572 rows=9614 loops=1)
   ->  Materialize  (cost=14.15..22.30 rows=815 width=32) (actual time=0.001..1.095 rows=815 loops=9614)
         ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32) (actual time=0.114..1.893 rows=815 loops=1)
 Total runtime: 308837.746 ms

I am aware that this is unlikely to work fast: the planner would perhaps need a hint in the query, rather than in the data column, to choose an anchored match algorithm (in case there is such an algorithm, of course).

So I wonder whether there might be a different approach to this problem rather than pattern matching. I recall I had a similar problem before with a "contacts" column possibly containing one or more email addresses. There, searches would also scale as number of people times number of requests. I finally ended up with an @@ match (contrib/tsquery) and a supporting GiST index, but that only supports exact match, not prefix match.

Regards
Wolfgang Hamann
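If all the wanted codes happen to have the same length, the prefix join can be reduced to a plain equality join, which the planner can run as a hash join in one pass over each table, in the same ballpark as the 40 ms exact match. A sketch, assuming wantcode still holds the plain codes (before the '^' was prepended) and an illustrative fixed prefix length of 7, as in 'ABC3563':

```sql
-- Sketch: with fixed-length prefixes, prefix matching becomes equality,
-- so a hash join applies. The length 7 is an assumed, illustrative value.
select i.num, n.wantcode
from items i
join n on substr(i.code, 1, 7) = n.wantcode;
```

With mixed prefix lengths this does not apply directly: substr(i.code, 1, length(n.wantcode)) references both tables, so it is no longer a hashable equality key.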