Re: how to optimize my c-extension functions - Mailing list pgsql-general

From Pierre-Frédéric Caillaud
Subject Re: how to optimize my c-extension functions
Date
Msg-id opska7qowgcq72hf@musicbox
In response to how to optimize my c-extension functions  (TJ O'Donnell <tjo@acm.org>)
Responses Re: how to optimize my c-extension functions
Re: how to optimize my c-extension functions
List pgsql-general
    I gather your program uses two steps, let's call them:
    - parse( smiles ) -> data
    - search( data ) -> result

    You can create a functional index on your smiles column, but I don't know
if this will help you; you can do things like CREATE INDEX ... ON
mytable( lower( myfield ) ), then SELECT ... FROM mytable WHERE
lower(myfield) = something. BUT in your case I gather your search function,
which processes the parsed data, does a lot more than some simple character
match, so creating a functional index on parse(smiles) would be useless for
selecting on search(parse(smiles))...
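
    To illustrate (table and column names below are just placeholders, and
note that PostgreSQL only allows IMMUTABLE functions in a functional index):

```sql
-- A functional index helps only when the query uses the exact same
-- expression the index was built on:
CREATE INDEX mytable_lower_idx ON mytable ( lower(myfield) );
SELECT * FROM mytable WHERE lower(myfield) = 'something';  -- can use the index

-- An index on parse(smiles) would NOT help a query like the one below,
-- because the planner sees search(parse(smiles), ...), not parse(smiles):
-- SELECT * FROM mols WHERE search(parse(smiles), 'COCC');
```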

    So, in any case, if the parsing phase is slow, you can store the
preparsed data in a text or bytea column and search it directly, but this
will not be indexed.
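
    A sketch of what that could look like, assuming a table mols and a
parse_smiles() wrapper around your C parsing function (both names are made
up here), with a trigger keeping the preparsed column in sync as you
suggested:

```sql
-- Hypothetical: store the preparsed form next to the smiles column
ALTER TABLE mols ADD COLUMN smiles_ob bytea;

-- Trigger function: reparse whenever smiles is inserted or updated
CREATE OR REPLACE FUNCTION mols_parse_trig() RETURNS trigger AS $$
BEGIN
    NEW.smiles_ob := parse_smiles(NEW.smiles);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mols_parse BEFORE INSERT OR UPDATE ON mols
    FOR EACH ROW EXECUTE PROCEDURE mols_parse_trig();
```

This way users never touch smiles_ob themselves; they just write smiles and
the trigger does the rest.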

    If you can map a subset of your searchable properties to simple
datatypes, you could do a first search for related matches, as you said.
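
    For instance (n_carbon and n_oxygen below are hypothetical precomputed
columns, maintained the same way as the preparsed data):

```sql
-- Cheap, indexable properties rule out impossible rows before the
-- expensive full match runs on the survivors:
CREATE INDEX mols_counts_idx ON mols (n_carbon, n_oxygen);

SELECT * FROM mols
WHERE n_carbon >= 3 AND n_oxygen >= 1        -- fast index prefilter
  AND oe_matches(smiles, 'COCC');            -- expensive check on survivors
```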

    You say nothing about how your system works internally; what kind of
representation do you use, and what kind of searches do you actually do?


On Sat, 08 Jan 2005 15:50:06 -0800, TJ O'Donnell <tjo@acm.org> wrote:

> I've written a c-language extension to postgresql to implement a
> chemical search of a varchar column (named smiles, typically).
> It might be called as:
>   oe_matches(smiles,'COCC') where 'COCC' is a typical search string.
> This uses 3rd party functions to parse the varchar inputs into c++
> objects.  In order to optimize this search, I'd like to parse the whole
> table containing smiles just once, store it and use the parsed data
> instead of the varchar smiles.
>
> So, I could create another column, say smiles_ob and have the
> oe_matches function take that column.  I'd have to be sure the smiles_ob
> column was related (by a trigger?) to the smiles column.
> But I was thinking I might be able to hide these parsed objects from the
> user by somehow incorporating the parsed objects into a type of index.
> I'd like also to use additional columns (like molecular formula) in
> the match function to "triage" the table to quickly rule out impossible
> matches, before doing a full search/match.
>
> Am I way off the track here?  Is it a bad idea to clutter the index
> with things like this?  Is it possible?   Is there another
> approach that could hide some of these details from the user - meaning
> they would not have to create/update these additional columns?
>
> Thanks,
> TJ O'Donnell
>


