UNIQUE constraint on character sequences - Mailing list pgsql-general

From InterRob
Subject UNIQUE constraint on character sequences
Date
Msg-id BANLkTikmZJ0-P6f5ax-b_4mzCLmDSS36Pg@mail.gmail.com
Whole thread Raw
Responses Re: UNIQUE constraint on character sequences  (David Johnston <polobo@yahoo.com>)
List pgsql-general
Dear list,

I would be pleased if you could share some thoughts with me on the following: say I wish to maintain a table with all distinct character sequences (variable length) showing series with strong similarities. Example:
"abbbabacccdef"
"abbbabaccdcdf"
"abbbabaccdcgf"
...
"qwtrhdffdd"
...
"qwtrhdffdds"
...
"qwtrhdffddsspp"
"qwtrhdffddsspf"
"qwtrhdffddssph"
"qwtrhdffddsspL"
etc.

Think of them as ordered values (array-like), as a set having many values in common, in the same "elements" (that is: positions; in my application, each position has some particular meaning -- the sequence represents a set of particular settings)

CREATE TABLE textseqs(txtseq TEXT)

What would be an efficient approach in enforcing a UNIQUE constraint?

I was thinking of using hashbuckets in a b-tree:
CREATE UNIQUE INDEX ON textseqs USING BTREE( hashtext(txtseq), txtseq )

This index would "cache" hashes for each row. Upon inserting of a new row, traversing the index involves the comparison of two single integers for each node in the b-tree, until the actual hash value (if it exists) was reached. Then, only within that bucket (the hashes won't be unique), the more expensive string comparing is required; involving a sequencial comparison of (potentially maaaany) characters. Yet, within that bucket, the character series may be expected to show stronger differences than a plain sorted list of all values in the table would have, indexed by a b-tree index. Wouldn't traversing such a plain (non-composite, single column, on: "txtseq") b-tree index involve a sequencial comparison of (potentially many) characters *at EACH NODE* of the tree ? Or am I mistaken that each node is filled with actual values from the txtseq column?

Thank you for your input!

Cheers,
Rob

pgsql-general by date:

Previous
From: Daron Ryan
Date:
Subject: Duplicated Fields
Next
From: David W Noon
Date:
Subject: Re: Duplicated Fields