Re: genomic locus - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: genomic locus
Date
Msg-id 8995e58b-80a9-7f8a-f552-a12d77550a74@sigaev.ru
In response to Re: genomic locus  (Gene Selkov <selkovjr@gmail.com>)
List pgsql-hackers

> I think I can wrangle this type into GiST just by tweaking consistent(), 
> union(), and picksplit(), if I manage to express my needs in C without breaking 
> too many things. My first attempt segfaulted.
Actually, consistent() can determine the actual query data type from the
strategy number. See the examples in ltree and intarray.
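
To illustrate the strategy-number dispatch, here is a minimal standalone sketch. It is not the real GiST support function signature: the strategy numbers, the LocusKey and PosQuery structs, and the closed-interval semantics are all assumptions made for the example. In a real opclass the query would arrive as a Datum and the strategy number would come from the operator class definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical strategy numbers; real ones are assigned in CREATE OPERATOR CLASS. */
enum
{
    LOCUS_OVERLAP_STRATEGY = 1,      /* query is another locus */
    LOCUS_CONTAINS_POS_STRATEGY = 2  /* query is a single position */
};

/* Hypothetical index key: a closed interval [start, end] on one contig. */
typedef struct LocusKey
{
    const char *contig;
    int32_t     start;
    int32_t     end;
} LocusKey;

/* Hypothetical second query type: one position on one contig. */
typedef struct PosQuery
{
    const char *contig;
    int32_t     pos;
} PosQuery;

/*
 * The query arrives as an untyped pointer; the strategy number tells us
 * which type it really is, so each case casts it differently.
 */
static bool
locus_consistent(const LocusKey *key, const void *query, int strategy)
{
    assert(key != NULL && query != NULL);

    switch (strategy)
    {
        case LOCUS_OVERLAP_STRATEGY:
        {
            const LocusKey *q = query;

            return strcmp(key->contig, q->contig) == 0 &&
                   key->start <= q->end && q->start <= key->end;
        }
        case LOCUS_CONTAINS_POS_STRATEGY:
        {
            const PosQuery *q = query;

            return strcmp(key->contig, q->contig) == 0 &&
                   key->start <= q->pos && q->pos <= key->end;
        }
        default:
            /* a real consistent() would report an error here */
            return false;
    }
}
```

The same key is checked against two different query types, and only the strategy number decides how the query pointer is interpreted.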


> If all goes to plan, I will end up with an index tree partitioned by contig at 
> the top level and geometrically down from there. That will be as close as I can 
> get to an array of config-specific indices, without having to store data in 
> separate tables.
> 
> What do you think of that?

I doubt that you can distinguish the root page, but it is possible to
distinguish leaf pages; intarray and tsearch do that.

Reading your plan, I got an idea for GIN: the GIN key is a pair (contig,
one genome position). Then any search with an intersect operation will actually
be a range search from (contig, start) to (contig, end).
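
A rough standalone sketch of that idea, using a sorted array in place of the GIN entry tree. The LocusKey struct and the helper names are hypothetical, and how a locus is broken down into indexed positions is left open here, as it is above. The point is only that ordering keys by contig first and position second turns the lookup into one contiguous range scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical composite key: (contig, one genome position). */
typedef struct LocusKey
{
    const char *contig;
    int32_t     pos;
} LocusKey;

/* Order by contig name first, then by position. */
static int
locus_key_cmp(const LocusKey *a, const LocusKey *b)
{
    int c;

    assert(a != NULL && b != NULL);
    c = strcmp(a->contig, b->contig);
    if (c != 0)
        return c;
    return (a->pos > b->pos) - (a->pos < b->pos);
}

/* Index of the first key >= target in a sorted array (binary search). */
static size_t
lower_bound(const LocusKey *keys, size_t n, const LocusKey *target)
{
    size_t lo = 0;
    size_t hi = n;

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (locus_key_cmp(&keys[mid], target) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Count keys in the closed range (contig, start) .. (contig, end). */
static size_t
range_count(const LocusKey *keys, size_t n,
            const char *contig, int32_t start, int32_t end)
{
    LocusKey lo_key = {contig, start};
    LocusKey hi_key = {contig, end};
    size_t   i = lower_bound(keys, n, &lo_key);
    size_t   count = 0;

    /* walk forward until we pass the upper bound */
    while (i < n && locus_key_cmp(&keys[i], &hi_key) <= 0)
    {
        count++;
        i++;
    }
    return count;
}
```

Because the contig is the leading part of the key, entries from other contigs never fall inside the scanned range.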


> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> I have a low-level technical question. Because I can’t anticipate the maximum 
> length of contig names (and do not want to waste space), I have made the new 
> locus type a varlena, like this:
> 
> #include "utils/varlena.h"
> 
> typedef struct LOCUS
> {
>    int32 l_len_; /* varlena header (do not touch directly!) */
>    int32 start;
>    int32 end;
>    char  contig[FLEXIBLE_ARRAY_MEMBER];
> } LOCUS;
> 
> #define LOCUS_SIZE(str) (offsetof(LOCUS, contig) + sizeof(str))
sizeof? Or strlen? If str is a char *, sizeof(str) is the pointer size, not the
string length.

> 
> That flexible array member messes with me every time I need to copy it while 
> deriving a new locus object from an existing one (or from a pair). What I ended 
> up doing is this:
> 
>    LOCUS  *l = PG_GETARG_LOCUS_P(0);
>    LOCUS  *new_locus;
>    char   *contig;
>    int    size;
>    new_locus = (LOCUS *) palloc0(sizeof(*new_locus));
>    contig = pstrdup(l->contig); // need this to determine the length of contig 
l->contig must be null-terminated for pstrdup() to work, but if it is, you
don't need to pstrdup() it: you can use l->contig directly below. BTW,
LOCUS_SIZE should add 1 byte for the '\0' character in this case.


> name at runtime
>    size = LOCUS_SIZE(contig);
>    SET_VARSIZE(new_locus, size);
>    strcpy(new_locus->contig, contig);
> 
> Is there a more direct way to clone a varlena structure (possibly assigning an 
> differently-sized contig to it)? One that is also memory-safe?
Store the length of contig in the LOCUS struct.
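
Something along these lines, as a standalone sketch rather than backend code: malloc stands in for palloc, a plain int32 total_size field stands in for the varlena header (which in PostgreSQL must be set with SET_VARSIZE), and the names Locus, contig_len, locus_make and locus_clone are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Standalone model of the locus type with the contig length stored. */
typedef struct Locus
{
    int32_t total_size;  /* stands in for the varlena header */
    int32_t start;
    int32_t end;
    int32_t contig_len;  /* length of contig, not counting the '\0' */
    char    contig[];    /* flexible array member, null-terminated */
} Locus;

/* Bytes needed for a locus whose contig name has contig_len characters. */
static size_t
locus_size(size_t contig_len)
{
    return offsetof(Locus, contig) + contig_len + 1;  /* +1 for '\0' */
}

/* Build a locus from a contig name of any length. */
static Locus *
locus_make(const char *contig, int32_t start, int32_t end)
{
    size_t len;
    Locus *l;

    assert(contig != NULL);
    len = strlen(contig);
    l = malloc(locus_size(len));
    if (l == NULL)
        return NULL;
    l->total_size = (int32_t) locus_size(len);
    l->start = start;
    l->end = end;
    l->contig_len = (int32_t) len;
    memcpy(l->contig, contig, len + 1);  /* copies the '\0' as well */
    return l;
}

/* Exact clone: the stored size makes this a single memcpy. */
static Locus *
locus_clone(const Locus *src)
{
    Locus *copy;

    assert(src != NULL);
    copy = malloc((size_t) src->total_size);
    if (copy != NULL)
        memcpy(copy, src, (size_t) src->total_size);
    return copy;
}
```

To derive a locus with a differently-sized contig, call locus_make() with the new name and the old start and end; the allocation is sized from the new name, so nothing is written past the end of the struct.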


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

