Home > mailing lists

Re: UNIQUE constraint on character sequences - Mailing list pgsql-general

From	David Johnston
Subject	Re: UNIQUE constraint on character sequences
Date	May 14, 2011 14:11:09
Msg-id	1BCCE44F-9D72-49C6-825C-07799136B56E@yahoo.com Whole thread Raw
In response to	UNIQUE constraint on character sequences (InterRob <rob.marjot@gmail.com>)
List	pgsql-general

Tree view

Could you index the reverse of the string so the unique part appears first?

On May 14, 2011, at 11:20, InterRob <rob.marjot@gmail.com> wrote:

> Dear list,
>
> I would be pleased if you could share some thoughts with me on the following: say I wish to maintain a table with all
distinctcharacter sequences (variable length) showing series with strong similarities. Example: 
> "abbbabacccdef"
> "abbbabaccdcdf"
> "abbbabaccdcgf"
> ...
> "qwtrhdffdd"
> ...
> "qwtrhdffdds"
> ...
> "qwtrhdffddsspp"
> "qwtrhdffddsspf"
> "qwtrhdffddssph"
> "qwtrhdffddsspL"
> etc.
>
> Think of them as ordered values (array-like), as a set having many values in common, in the same "elements" (that is:
positions;in my application, each position has some particular meaning -- the sequence represents a set of particular
settings)
>
> CREATE TABLE textseqs(txtseq TEXT)
>
> What would be an efficient approach in enforcing a UNIQUE constraint?
>
> I was thinking of using hashbuckets in a b-tree:
> CREATE UNIQUE INDEX ON textseqs USING BTREE( hashtext(txtseq), txtseq )
>
> This index would "cache" hashes for each row. Upon inserting of a new row, traversing the index involves the
comparisonof two single integers for each node in the b-tree, until the actual hash value (if it exists) was reached.
Then,only within that bucket (the hashes won't be unique), the more expensive string comparing is required; involving a
sequencialcomparison of (potentially maaaany) characters. Yet, within that bucket, the character series may be expected
toshow stronger differences than a plain sorted list of all values in the table would have, indexed by a b-tree index.
Wouldn'ttraversing such a plain (non-composite, single column, on: "txtseq") b-tree index involve a sequencial
comparisonof (potentially many) characters *at EACH NODE* of the tree ? Or am I mistaken that each node is filled with
actualvalues from the txtseq column? 
>
> Thank you for your input!
>
> Cheers,
> Rob

pgsql-general by date:

From: David W Noon
Date: 14 May 2011, 12:43:09
Subject: Re: Duplicated Fields

From: InterRob
Date: 14 May 2011, 15:25:22
Subject: Re: UNIQUE constraint on KEY VALUE PAIRS

Re: UNIQUE constraint on character sequences - Mailing list pgsql-general

Previous

Next