Re: looking for a faster way to do that - Mailing list pgsql-general

From hamann.w@t-online.de
Subject Re: looking for a faster way to do that
Msg-id 4E7C9186.mailHCM11PW54@amadeus3.local
In response to looking for a faster way to do that  (hamann.w@t-online.de)
List pgsql-general
>> >
>> > Hi Alban,
>> >
>> > I already did that - the test set is just all records from the real table (about a million
>> > entries) that match the common 'ABC' prefix
>>
>> I think you misunderstood what I wrote. Notice the difference between
>> "which strings match the pattern" and "which records have the match
>> pattern (in that new column)" - the first is a regular expression
>> match (unindexable), while the second is a string equality match
>> (indexable).
>>
>> What I'm suggesting is to add a column, which for the string 'ABCDEFG'
>> would contain 'ABC%'.
>> Data would look like:
>>
>> SELECT str, pattern FROM tbl;
>>  str     | pattern
>> ---------+---------
>>  ABCDEFG | ABC%
>>  ABCDEF  | ABC%
>>  BCDEFGH | BCD%
>>  etc.
>>
>> (can't format this properly in webmail, sorry)
>>
>> When you look for records that match the pattern 'ABC%', you would
>> normally perform a query like:
>>
>> SELECT str FROM tbl WHERE str LIKE 'ABC%';
>>
>> But with this new column, you would query:
>>
>> SELECT str FROM tbl WHERE pattern = 'ABC%';
>>
>> As I said, it depends a lot on your pattern needs whether this
>> solution would work at all for you. If you only ever use a few
>> patterns, it will work. If you use many different patterns or don't
>> know before-hand which patterns will be used, it won't work well at
>> all.
>>
>> > The main difference is: the fast query looks like
>> >
>> > explain select items.num, wantcode from items, n where code = wantcode;
>> >  Merge Join  (cost=53.56..1104.02 rows=39178 width=36)
>> >    Merge Cond: (("outer".code)::text = "inner".wantcode)
>> >    ->  Index Scan using itemsc on items  (cost=0.00..438.75 rows=9614 width=42)
>> >    ->  Sort  (cost=53.56..55.60 rows=815 width=32)
>> >          Sort Key: n.wantcode
>> >          ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32)
>>
>> Is there an index on wantcode? If you have a million or more records,
>> I would expect an index scan for a measly 815 matches...
>>
>> > and the slow ones looks like that one:
>> >
>> >  Nested Loop  (cost=14.15..176478.01 rows=39178 width=36)
>> >    Join Filter: (("outer".code)::text ~ "inner".wantcode)
>> >
>> > So the database takes an entirely different approach to retrieving the entries.
>>
>> Yes, because you're still using ~ there, with a pattern that's unknown
>> at query planning time. That will only be fast under some fairly rare
>> circumstances.

Hi,

the problem is that I read the patterns from a file as part of the script, so they are not
known separately. It seems that creating the extra column is just the same effort as
running the original query, since the processing is one-time in nature.
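
For reference, the patterns are read into the second table with something along
these lines in the script (the file name and the temp table definition here are
just an illustration, not the exact code):

    -- load the pattern list read from the file into the lookup table
    CREATE TEMP TABLE n (wantcode text);
    \copy n (wantcode) from 'patterns.txt'
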
The one thing I can do is select a range of items on a common prefix, if all the
codes in the second table share some leading characters.
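
Roughly what I have in mind, assuming the shared prefix is 'ABC' (just a sketch,
using the table and column names from the explain output above):

    -- restrict to the common prefix first; with a text_pattern_ops index
    -- (or C locale) the LIKE can use an index scan, and the expensive
    -- regex join then only has to look at the remaining rows
    SELECT items.num, n.wantcode
    FROM items, n
    WHERE items.code LIKE 'ABC%'
      AND items.code::text ~ n.wantcode;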

>> Is there an index on wantcode? If you have a million or more records,
>> I would expect an index scan for a measly 815 matches..

When I ran a test, there was no real difference whether wantcode was indexed or not.
It was interesting to try another suggestion and to note the difference between comparison
functions, with identical (lack of) use of an index.
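
In case anyone wants to repeat that comparison, this is roughly how I checked it
(the index name is made up):

    -- compare plans with and without an index on the pattern column
    CREATE INDEX n_wantcode_idx ON n (wantcode);
    ANALYZE n;
    EXPLAIN SELECT items.num, wantcode FROM items, n WHERE code ~ wantcode;
    DROP INDEX n_wantcode_idx;
    EXPLAIN SELECT items.num, wantcode FROM items, n WHERE code ~ wantcode;
    -- both runs kept the nested loop with the regex join filter, i.e. no index use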

Regards
Wolfgang Hamann

