On Fri, 19 Nov 2004 14:38:20 +0200, Hannu Krosing wrote:
>> Part of my current code concerns packing DNA characters: As the alphabet
>> of DNA strings is very small (four characters), it seems like a
>> straightforward optimization to store each character in two bits.
>
> My advice would be to get it to work first, optimize later.
Valid point. However, I needed something fairly basic to work on, both to
learn C and to learn how PostgreSQL handles user-defined types. But if
packing proves to be a problem when I implement the interesting parts,
then yes, thanks: packing should be an afterthought.
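
For reference, a minimal sketch of the two-bit packing mentioned above. It is plain C with no PostgreSQL glue, and the function names (dna_code, dna_packed_size, dna_pack, dna_get) as well as the A=0, C=1, G=2, T=3 encoding are assumptions made for illustration, not the code from my actual type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a base to its 2-bit code; returns -1 for anything outside ACGT.
 * The particular code assignment is an arbitrary choice for this sketch. */
static int dna_code(char base)
{
    switch (base) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Bytes needed to hold n bases at four bases per byte. */
static size_t dna_packed_size(size_t n)
{
    return (n + 3) / 4;
}

/*
 * Pack n bases from src into dst, which must have room for
 * dna_packed_size(n) bytes. Base i goes into byte i / 4, at bit
 * offset (i % 4) * 2. Returns 0 on success, or -1 if src contains
 * a character outside ACGT.
 */
static int dna_pack(const char *src, size_t n, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    memset(dst, 0, dna_packed_size(n));
    for (size_t i = 0; i < n; i++) {
        int code = dna_code(src[i]);
        if (code < 0)
            return -1;
        dst[i / 4] |= (uint8_t)(code << ((i % 4) * 2));
    }
    return 0;
}

/* Read base number i back out of a packed buffer. */
static char dna_get(const uint8_t *packed, size_t i)
{
    static const char bases[] = "ACGT";
    return bases[(packed[i / 4] >> ((i % 4) * 2)) & 3];
}
```

Note that the base count has to be stored alongside the packed bytes, because the last byte may be only partly used and unused slots are indistinguishable from 'A' under this encoding.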
>> My first and most immediate goal is to support efficient answering of a
>> question like "which rows contain the sequence TTGACCACTTG in column foo?".
>
> If you store your sequences as strings, you may try to use trigrams (or
> modify them to 4,5,6 or 7-grams ;) to get some feel how that works.
>
> The trigram module is in contrib/pg_trgm.
(/me Printing readme.) Thanks.
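
To get a feel for the idea before reading the module, here is a rough sketch of k-gram filtering in plain C. It is not how pg_trgm is implemented, and the names (kgram_count, has_kgram, kgram_filter) are hypothetical. The point is only that every k-gram of the search string must occur in a row's sequence, so a row lacking any of them can be rejected without a full substring comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of overlapping k-grams in a string of length n (0 if n < k). */
static size_t kgram_count(size_t n, size_t k)
{
    assert(k > 0);
    return n < k ? 0 : n - k + 1;
}

/* Return 1 if the k characters starting at gram occur anywhere in seq. */
static int has_kgram(const char *seq, size_t n, const char *gram, size_t k)
{
    for (size_t i = 0; i + k <= n; i++)
        if (memcmp(seq + i, gram, k) == 0)
            return 1;
    return 0;
}

/*
 * Necessary-condition filter: returns 0 if some k-gram of query is
 * missing from seq (so seq cannot contain query), and 1 otherwise.
 * A result of 1 is only a "maybe": the k-grams could be present in a
 * different order, so an exact substring check is still required.
 */
static int kgram_filter(const char *seq, size_t n,
                        const char *query, size_t m, size_t k)
{
    size_t grams = kgram_count(m, k);
    for (size_t i = 0; i < grams; i++)
        if (!has_kgram(seq, n, query + i, k))
            return 0;
    return 1;
}
```

This sketch scans the sequence for each k-gram, which is no faster than a direct search. The gain would come from an index that records which k-grams each row contains, so that the filter becomes a lookup instead of a scan.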
--
Greetings from Troels Arvin, Copenhagen, Denmark