Re: looking for a faster way to do that - Mailing list pgsql-general

From Alban Hertroys
Subject Re: looking for a faster way to do that
Msg-id CAF-3MvNz1CSwnPh14grK80pQV=bNhTFya_zt0BnQ5s0um38zag@mail.gmail.com
In response to Re: looking for a faster way to do that  (hamann.w@t-online.de)
Responses Re: looking for a faster way to do that
List pgsql-general
On 23 September 2011 09:45, <hamann.w@t-online.de> wrote:
> Alban Hertroys <haramrae@gmail.com> wrote:

>> What is the output of explain?
>>
>> You say 'the other table', so presumably we're dealing with a foreign key
>> here. Is there an index on that column?

> Albe Laurenz wrote:

>> Is the index used for "where code ~ '^ABC3563'"?
>>
>> If not, then the result is fast only because the table is scanned only once,
>> and it's just the factor of 3000 that's killing you.
>>
>> The second query (where code ~ wantcode) can never use an index because
>> the pattern "wantcode" is unknown at query planning time.
>>
>> Yours,
>> Laurenz Albe


> Here I created a subset (just number and code matching a certain prefix):
>
> \d items
>          Table "pg_temp_1.items"
>  Column |         Type          | Modifiers
> --------+-----------------------+-----------
>  num    | integer               |
>  code   | character varying(40) |
> create index itemsc on items (code);
>
> select count(*) from items;
>  count
> -------
>  9614

> A single anchored query
> select * from items where code ~ '^ABC';
> does indeed use the index to retrieve data.

> Next I copied a file of wanted codes:
>
> create temp table n (wantcode text);
> \copy n from /tmp/rmartin.tmp
>
> The file contains plain names, i.e. unanchored matches.

> explain analyze select num, n.wantcode from items, n where items.code ~ n.wantcode;
>  Nested Loop  (cost=20.00..216502.14 rows=48070 width=36) (actual time=148.479..336280.488 rows=2871 loops=1)
>   Join Filter: (("outer".code)::text ~ "inner".wantcode)
>   ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 width=42) (actual time=0.048..38.666 rows=9614 loops=1)
>   ->  Materialize  (cost=20.00..30.00 rows=1000 width=32) (actual time=0.001..1.049 rows=815 loops=9614)
>         ->  Seq Scan on n  (cost=0.00..20.00 rows=1000 width=32) (actual time=0.003..1.839 rows=815 loops=1)
>  Total runtime: 336286.692 ms

So you're comparing a variable field value to a variable pattern - yeah, that's going to hurt. There's no way you could index exactly that.

Perhaps there's some way you can transform the problem so that you get something indexable?
For example, if your match patterns follow a certain pattern themselves, you could add a column storing the longest pattern that matches each string. Then you can simply query for records whose new column equals the pattern you're looking for, which is an exact, indexable comparison, and voila!

Whether something like that is possible depends strongly on what kind of match patterns you're using, of course.
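A rough sketch of that transformation (my own illustration, not something from this thread): if each pattern is a ^-anchored regex, you can extract its guaranteed literal prefix and store that in the extra column. The `literal_prefix` helper below is hypothetical, just to show the idea:

```python
def literal_prefix(pattern):
    """Longest guaranteed literal prefix of a ^-anchored regex pattern.

    Hypothetical helper: returns '' for an unanchored pattern, since an
    unanchored match has no indexable prefix at all.
    """
    if not pattern.startswith('^'):
        return ''
    metas = '.^$*+?()[]{}|\\'
    prefix = []
    for ch in pattern[1:]:
        if ch in metas:
            # '*' or '?' makes the preceding literal optional, so it
            # cannot be part of a guaranteed prefix
            if ch in '*?' and prefix:
                prefix.pop()
            break
        prefix.append(ch)
    return ''.join(prefix)

# The original regex would still be applied as a recheck, e.g.
#   WHERE code LIKE prefix || '%' AND code ~ pattern
print(literal_prefix('^ABC3563'))  # -> ABC3563
print(literal_prefix('^ABC*'))     # -> AB
print(literal_prefix('ABC'))       # -> '' (unanchored)
```

The prefix comparison is exact and indexable; the regex recheck then only runs on the few rows the index returns instead of on every row pair.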
 
> An exact match "where items.code = n.wantcode" on the same data completes in 40 ms.

That's an exact string match, of course that will be fast ;)

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
