
Re: how to optimize my c-extension functions - Mailing list pgsql-general

From	Pierre-Frédéric Caillaud
Subject	Re: how to optimize my c-extension functions
Date	January 10, 2005 20:44:32
Msg-id	opskemlbpjcq72hf@musicbox
In response to	Re: how to optimize my c-extension functions (TJ O'Donnell <tjo@acm.org>)
List	pgsql-general


    That's not what I meant...

    I meant, what does 'c1ccccc1C(=O)N' mean?
    If the search operation is too slow, you can narrow it using standard
postgres tools and then hand it down to your C functions. Let me explain,
I have no clue about this 'c1ccccc1C(=O)N' syntax, but I'll suppose you
will be searching for things like :

    1- molecule has N atoms of (whatever) element
    2- molecule has N single or double or triple covalent bonds
    3- molecule has such and such property

    Then, if you can parse the 'c1ccccc1C(=O)N' string and tell that all
molecules that match it will also satisfy, for instance, condition 2 above,
then you can keep some fast searchable attributes in your database that
mark all molecules satisfying condition 2, and you'll only need to
run the C search function on these to get the real matches.
    The idea is basically to narrow down the search to avoid calling the
expensive operator on all rows.
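
    A minimal sketch of this two-step search, in Python rather than SQL so
it can be run standalone. Everything in it is an assumption made for
illustration: expensive_match is a stand-in for the C function (a plain
substring test, which is not real substructure matching), and the
has_double_bond flag plays the role of a precomputed attribute for
condition 2.

```python
# Hypothetical stand-in for the expensive C function (oe_matches in the
# quoted query). A substring test is NOT real substructure matching; it
# only gives the sketch something to call and count.
calls = 0

def expensive_match(smiles, query):
    global calls
    calls += 1
    return query in smiles

# Each row: (id, smiles, has_double_bond). The flag is the cheap,
# precomputed attribute; the values here are made up for the example.
rows = [
    (1, "c1ccccc1C(=O)N", True),
    (2, "CCO", False),
    (3, "CC(=O)O", True),
    (4, "CCCC", False),
]

def search(rows, query, query_needs_double_bond):
    # Step 1: cheap filter, which an index could answer in the database.
    candidates = [r for r in rows if r[2] or not query_needs_double_bond]
    # Step 2: expensive check only on the rows that survived step 1.
    return [r[0] for r in candidates if expensive_match(r[1], query)]

matches = search(rows, "C(=O)N", query_needs_double_bond=True)
print(matches, calls)
```

    Here the expensive function runs on two rows instead of four; the gain
depends entirely on how selective the cheap attribute is.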

    If A and B are strings like your 'c1ccccc1C(=O)N', and all molecules
satisfying B also satisfy A (thus B=>A, or "B c A": B is contained in A in
set notation), then you can first grab the molecules that satisfy A very
quickly (with an index). If these are a significantly smaller number than
the whole set, you'll speed up your search a lot.
    If you can find some more A's, so that B c A1, B c A2, B c A3, then B c
(intersection of A1, A2, A3), which maps neatly to a GiST index on an
integer array.
    So you could have a set of basic conditions, maybe a hundred or so, which
would all be tested against the search string to see which ones must hold
for any molecule this search string would find; then you translate the
result into a GiST query.
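
    The integer-array idea can be sketched as follows, again in Python so it
runs standalone. The three conditions and their ids are hypothetical, and
the predicates are naive string tests, not real chemistry. The sketch also
assumes every condition is of the "contains X" kind, so that whatever the
search string satisfies, every molecule it matches satisfies too. The
subset test stands in for an array containment query on the indexed column.

```python
# Hypothetical basic conditions, each with an integer id. The predicates
# are naive string tests invented for this sketch.
CONDITIONS = {
    1: lambda s: "N" in s,   # contains a nitrogen atom
    2: lambda s: "=" in s,   # contains a double bond
    3: lambda s: "c1" in s,  # contains an aromatic ring
}

def features(smiles):
    """Ids of all basic conditions this string satisfies."""
    return frozenset(i for i, test in CONDITIONS.items() if test(smiles))

molecules = {
    10: "c1ccccc1C(=O)N",
    11: "CC(=O)O",
    12: "CCN",
    13: "c1ccccc1",
}

# Computed once per molecule; in the database this would be the indexed
# integer-array column.
stored = {mid: features(s) for mid, s in molecules.items()}

def candidates(query):
    # A molecule stays a candidate only if it satisfies every condition
    # the search string implies (the intersection of A1, A2, A3, ...).
    required = features(query)
    return sorted(mid for mid, have in stored.items() if required <= have)

result = candidates("c1ccccc1C(=O)N")
print(result)
```

    Only the ids returned by candidates() would then be handed to the
expensive C function for the real match.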

    Are my explanations making it clearer or just more obscure?



> The only type of search will be of the type:
>
> Select smiles,id from structure where
> oe_matches(smiles,'c1ccccc1C(=O)N');
>
> or joins with other tables e.g.
