Thread: Hierarchical numeric data type

Hierarchical numeric data type

From
Derek Poon
Date:
Hi,

I'm looking for a data type to store numerically labelled hierarchical data, such as section.subsection.paragraph
numbers(e.g. '1.5.3', '1.10.2', '6.30'). 

The closest data type that I have found is ltree.  However, the collation order is inappropriate: it would put '1.10.2'
before'1.5.3', since it performs a naïve memcmp() at each level.[1] 

One way to get the desired sort order would be to use the semver extension.  However, that's not really appropriate, as
Idon't want to store version numbers, and my data do not fit server's mandatory X.Y.Z three-level scheme. 

Of course, I could define a hierarchy-of-integers data type and implement my own comparison functions.  I'm reluctant
tocause a proliferation of data types, though, as ltree is semantically the type I want.  I'm just unhappy with its
sortorder. 

Therefore, I would like to suggest that ltree be modified to use a smart comparator that recognizes numbers within
stringsand sorts them in a human-friendly way.  Apple[2] and recent versions of Windows[3] handle filenames this way.
Onesample implementation of such a comparator is natsort.[4] 

The performance impact of the enhanced comparator would probably be negligible, compared to I/O bottlenecks.  A bigger
issuewould be backwards compatibility, especially for ltrees with existing btree indexes. 

Feedback?  Suggestions?

Derek


[1]: http://doxygen.postgresql.org/ltree__op_8c.html#a635600ad7aad78addf3c14a6e2d67fed

[2]:
https://developer.apple.com/LIBRARY/IOS/#documentation/FileManagement/Conceptual/FileSystemProgrammingGUide/FileSystemDetails/FileSystemDetails.html

[3]: http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html

[4]: http://sourcefrog.net/projects/natsort/


Re: Hierarchical numeric data type

From
Sergey Konoplev
Date:
On Tue, Aug 6, 2013 at 2:36 PM, Derek Poon <derekp+pgsql@ece.ubc.ca> wrote:
> The performance impact of the enhanced comparator would probably be negligible, compared to I/O bottlenecks.  A
biggerissue would be backwards compatibility, especially for ltrees with existing btree indexes. 
>
> Feedback?  Suggestions?

Use integer arrays. It works just like you need

select array_to_string(c, '.')
from (values (array[1,10,2]), (array[1,5,3])) as sq(c)
order by c;

 array_to_string
-----------------
 1.5.3
 1.10.2

and it is pretty fast when indexed.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com


Re: Hierarchical numeric data type

From
Chris Travers
Date:
This depends on exactly what you need.  In the end usually you are going to want to convert things into integer arrays.

In LedgerSMB we use integers and foreign keys to handle hierarchies, and then convert them to int arrays via WITH RECURSIVE CTE's and text strings .

This is one of those things where hierarchical data design can depend heavily on the use case.


--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.