Thread: Hierarchical numeric data type
Hi, I'm looking for a data type to store numerically labelled hierarchical data, such as section.subsection.paragraph numbers(e.g. '1.5.3', '1.10.2', '6.30'). The closest data type that I have found is ltree. However, the collation order is inappropriate: it would put '1.10.2' before'1.5.3', since it performs a naïve memcmp() at each level.[1] One way to get the desired sort order would be to use the semver extension. However, that's not really appropriate, as Idon't want to store version numbers, and my data do not fit server's mandatory X.Y.Z three-level scheme. Of course, I could define a hierarchy-of-integers data type and implement my own comparison functions. I'm reluctant tocause a proliferation of data types, though, as ltree is semantically the type I want. I'm just unhappy with its sortorder. Therefore, I would like to suggest that ltree be modified to use a smart comparator that recognizes numbers within stringsand sorts them in a human-friendly way. Apple[2] and recent versions of Windows[3] handle filenames this way. Onesample implementation of such a comparator is natsort.[4] The performance impact of the enhanced comparator would probably be negligible, compared to I/O bottlenecks. A bigger issuewould be backwards compatibility, especially for ltrees with existing btree indexes. Feedback? Suggestions? Derek [1]: http://doxygen.postgresql.org/ltree__op_8c.html#a635600ad7aad78addf3c14a6e2d67fed [2]: https://developer.apple.com/LIBRARY/IOS/#documentation/FileManagement/Conceptual/FileSystemProgrammingGUide/FileSystemDetails/FileSystemDetails.html [3]: http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html [4]: http://sourcefrog.net/projects/natsort/
On Tue, Aug 6, 2013 at 2:36 PM, Derek Poon <derekp+pgsql@ece.ubc.ca> wrote: > The performance impact of the enhanced comparator would probably be negligible, compared to I/O bottlenecks. A biggerissue would be backwards compatibility, especially for ltrees with existing btree indexes. > > Feedback? Suggestions? Use integer arrays. It works just like you need select array_to_string(c, '.') from (values (array[1,10,2]), (array[1,5,3])) as sq(c) order by c; array_to_string ----------------- 1.5.3 1.10.2 and it is pretty fast when indexed. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA Profile: http://www.linkedin.com/in/grayhemp Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979 Skype: gray-hemp Jabber: gray.ru@gmail.com
This depends on exactly what you need. In the end usually you are going to want to convert things into integer arrays.
In LedgerSMB we use integers and foreign keys to handle hierarchies, and then convert them to int arrays via WITH RECURSIVE CTE's and text strings .
In LedgerSMB we use integers and foreign keys to handle hierarchies, and then convert them to int arrays via WITH RECURSIVE CTE's and text strings .
This is one of those things where hierarchical data design can depend heavily on the use case.
--
Best Wishes,
Chris Travers
Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.