Re: looking for a faster way to do that - Mailing list pgsql-general

From Alban Hertroys
Subject Re: looking for a faster way to do that
Date
Msg-id 2AB2CD21-A84B-4E7C-8C93-F2C4749EB4CF@gmail.com
Whole thread Raw
In response to looking for a faster way to do that  (hamann.w@t-online.de)
List pgsql-general
You forgot to include the list ;)

On 26 Sep 2011, at 6:06, hamann.w@t-online.de wrote:

> Alban Hertroys wrote:
>>>
>>> To me it sounds a little bit like you're comparing every item in a =
>>> warehouse to a set of descriptions to see what type of item it is, which =
>>> is something you would be much better off storing as a property of the =
>>> item. If an item is a fruit, store that it's a fruit!
>>> But I'm guessing at what you're trying to accomplish, so here's a few =
>>> other options...
>>>
>>> I guess you could create 2781 expression indexes to do what you want (is =
>>> there a limit that prevents this?). Query planning would probably become =
>>> kind of slow and the indices will take up a considerable fraction of the =
>>> total table storage required - that's a pretty outlandish approach.
>>>
>>> CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string1'));
>>> CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string2'));
>>> ...
>>> CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string2781'));
>
> Hi,
>
> the strings are not really known before.
> Let me explain the scenario; there is one table about products, and code is the
> manufacturer's (or resellers') product id.
> So, if ABC were a maker of laptops, ABC123 and ABC456 might be two different machines,
> and ABC123G might have G3 mobile installed, or ABC123X might be the same thing
> with extra memory. Obviously these device variants all look the same.
> Now reseller sends us a collection of product images, so there would be ABC123.jpg
> and ABC456.jpg
> The database task at hand is matching products to images (and then inserting the image
> name into a column of the products table).


I guessed right then. The origin of your problem is that you have similar items in your database, but the database
doesn'tknow they are similar. I'd fix that first, it makes the problem a whole lot easier to handle. 

For example, if an image comes in named ABC123G.jpg, you look up the product and manufacturer and update its image.
Thenyou query for products of the same manufacturer that are similar to ABC123G (result: ABC123 and ABC123X) and update
theirimages as well (if appropriate; perhaps they have a recent enough image of their own?). 


As another whacky alternative to your regular expressions; I think it would be possible to abuse the text-search
functionalityin Postgres to match product id's. Those id's are basically a language per manufacturer describing product
details.

If you can split the product id's up into lexemes that describe the base product id and it's options, then you can use
full-textsearch to match up expressions similar to the lexemes derived from the image name. 

For example:
 productid | lexemes
-----------+------------------
 ABC123    | {'ABC' '123'}
 ABC123G   | {'ABC' '123' 'G'}
 ABC123X   | {'ABC' '123' 'X'}
 ABC456    | {'ABC' '456'}

I'm not really sure if that's possible, or how much work it would be per manufacturer - I haven't used FTS much.

I'd first see if I couldn't add that similarity information to the products table, though ;)

Alban Hertroys

--
The scale of a problem often equals the size of an ego.



pgsql-general by date:

Previous
From: Albretch Mueller
Date:
Subject: Re: (another ;-)) PostgreSQL-derived project ...
Next
From: sunpeng
Date:
Subject: hi, is that the standard:ISO/IEC 13249-6:2006 if I investigate the data mining extension for SQL language?