Table Design for Hierarchical Data - Mailing list pgsql-sql
From | Lee Hachadoorian |
---|---|
Subject | Table Design for Hierarchical Data |
Date | |
Msg-id | n2i5ab13581004061033j4ed25734xd0504cb95a1fc092@mail.gmail.com Whole thread Raw |
Responses |
Re: Table Design for Hierarchical Data
(Michael Glaesemann <grzm@seespotcode.net>)
Re: Table Design for Hierarchical Data (Steve Crawford <scrawford@pinpointresearch.com>) Re: Table Design for Hierarchical Data (silly sad <sad@bankir.ru>) Re: Table Design for Hierarchical Data (silly sad <sad@bankir.ru>) Re: Table Design for Hierarchical Data (Achilleas Mantzios <achill@matrix.gatewaynet.com>) Re: Table Design for Hierarchical Data (Sergey Konoplev <gray.ru@gmail.com>) |
List | pgsql-sql |
Please point me to another listserv or forum if this question is more appropriately addressed elsewhere.<br /><br />I amtrying to come up with a structure to store employment data by NAICS (North American Industrial Classification System).The data uses a hierarchical encoding scheme ranging between 2 and 5 digits. That is, each 2-digit code includesall industries beginning with the same two digits. 61 includes 611 which includes 6111, 6112, 6113, etc. A portionof the hierarchy is shown after the sig.<br /><br />A standard way to store hierarchical data is the adjacency listmodel, where each node's parent appears as an attribute (table column). So 6111 would list 611 as its parent. Since NAICSuses a hierarchical encoding scheme, the node's name is the same as the node's id, and the parent can always be derivedfrom the node's id. Storing the parent id separately would seem to violate a normal form (because of the redundancy).<br/><br />One way to store this data would be to store at the most granular level (5-digit NAICS) and then aggregateup if I wanted employment at the 4-, 3-, or 2-digit level. The problem is that because of nondisclosure rules, thedata is sometimes censored at the more specific level. I might, for example, have data for 6114, but not 61141, 61142,61143. For a different branch of the tree, I might have data at the 5-digit level while for yet another branch I mighthave data only to the 3-digit level (not 4 or 5). I think that means I have to store all data at multiple levels, evenif some of the higher-level data could be reconstructed from other, lower-level data.<br /><br />Specifically I'd liketo know if this should be a single table or should there be a separate table for each level of the hierarchy (four inall)? If one table, should the digits be broken into separate columns? Should parent ids be stored in each node?<br /><br/>More generally, what questions should I be asking to help decide what structure makes the most sense? Are there anywebsites, forums, or books that cover this kind of problem?<br /><br />Regards,<br />--Lee<br /><br />-- <br />Lee Hachadoorian<br/> PhD Student, Geography<br />Program in Earth & Environmental Sciences<br />CUNY Graduate Center<br/><br />A Portion of the NAICS scheme<br /><br />61 Educational Services<br /> 611 Educational Services<br/> 6111 Elementary and Secondary Schools<br /> 61111 Elementary and Secondary Schools<br /> 6112 JuniorColleges<br /> 61121 Junior Colleges<br /> 6113 Colleges, Universities, and Professional Schools<br /> 61131 Colleges, Universities, and Professional Schools<br /> 6114 Business Schools and Computer and Management Training<br/> 61141 Business and Secretarial Schools<br /> 61142 Computer Training<br /> 61143 Professional andManagement Development Training<br /> etc…<br /><br />