Thread: Summing & Grouping in a Hierarchical Structure

Summing & Grouping in a Hierarchical Structure

From
Don Parris
Date:
Hi all,

I posted to this list some time ago about working with a hierarchical category structure.   I had great difficulty with my problem and gave up for a time.  I recently returned to it and resolved a big part of it.  I have one step left to go, but at least I have solved this part.

Here is the original thread (or one of them):
http://www.postgresql.org/message-id/CAJ-7yonw4_qDCp-ZNYwEkR2jdLKeL8nfGc+-TLLSW=Rmo1VkbA@mail.gmail.com

Here is my recent blog post about how I managed to show my expenses summed and grouped by a mid-level category:
http://dcparris.net/2013/02/13/hierarchical-categories-rdbms/


Specifically, I wanted to sum and group expenses according to categories, not just at the bottom tier, but at higher tiers, so as to show more summarized information.  A CEO primarily wants to know the sum total for all the business units, yet have the ability to drill down to more detailed levels if something is unusually high or low.  In my case, I could see the details, but not the summary.  Well now I can summarize by what I refer to as the 2nd-level categories.

Anyway, I hope this helps someone, as I have come to appreciate - and I mean really appreciate - the challenge of working with hierarchical structures in a 2-dimensional RDBMS.  If anyone sees something I should explain better or in more depth, please let me know.

Regards,
Don
--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
"Alexander Gataric"
Date:

I would try a recursive query to determine the category structure and aggregate as you go. I had a similar problem with a hierarchical structure for an organization structure. Another thing you might try is to create a separate CTE for each category and then aggregate the individual CTEs.

 

 

From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Don Parris
Sent: Thursday, February 14, 2013 7:58 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] Summing & Grouping in a Hierarchical Structure

 

Hi all,

I posted to this list some time ago about working with a hierarchical category structure.   I had great difficulty with my problem and gave up for a time.  I recently returned to it and resolved a big part of it.  I have one step left to go, but at least I have solved this part.

Here is the original thread (or one of them):
http://www.postgresql.org/message-id/CAJ-7yonw4_qDCp-ZNYwEkR2jdLKeL8nfGc+-TLLSW=Rmo1VkbA@mail.gmail.com

 

Here is my recent blog post about how I managed to show my expenses summed and grouped by a mid-level category:
http://dcparris.net/2013/02/13/hierarchical-categories-rdbms/


Specifically, I wanted to sum and group expenses according to categories, not just at the bottom tier, but at higher tiers, so as to show more summarized information.  A CEO primarily wants to know the sum total for all the business units, yet have the ability to drill down to more detailed levels if something is unusually high or low.  In my case, I could see the details, but not the summary.  Well now I can summarize by what I refer to as the 2nd-level categories.

Anyway, I hope this helps someone, as I have come to appreciate - and I mean really appreciate - the challenge of working with hierarchical structures in a 2-dimensional RDBMS.  If anyone sees something I should explain better or in more depth, please let me know.

 

Regards,

Don

--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate

GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
"Relyea, Mike"
Date:
> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Don Parris
> Sent: Thursday, February 14, 2013 8:58 PM
> To: pgsql-sql@postgresql.org
> Subject: [SQL] Summing & Grouping in a Hierarchical Structure
>
> Hi all,
> I posted to this list some time ago about working with a hierarchical category structure.   I had great difficulty
withmy problem and gave up for a time.   
> I recently returned to it and resolved a big part of it.  I have one step left to go, but at least I have solved this
part.
>
> Here is the original thread (or one of them):
> http://www.postgresql.org/message-id/CAJ-7yonw4_qDCp-ZNYwEkR2jdLKeL8nfGc+-TLLSW=Rmo1VkbA@mail.gmail.com
>
>
> Here is my recent blog post about how I managed to show my expenses summed and grouped by a mid-level category:
> http://dcparris.net/2013/02/13/hierarchical-categories-rdbms/
>
>
> Specifically, I wanted to sum and group expenses according to categories, not just at the bottom tier, but at higher
tiers,so as to show more summarized information.   
> A CEO primarily wants to know the sum total for all the business units, yet have the ability to drill down to more
detailedlevels if something is unusually high or low.   
> In my case, I could see the details, but not the summary.  Well now I can summarize by what I refer to as the
2nd-levelcategories. 
> Anyway, I hope this helps someone, as I have come to appreciate - and I mean really appreciate - the challenge of
workingwith hierarchical structures in a 2-dimensional RDBMS.   
> If anyone sees something I should explain better or in more depth, please let me know.
>
> Regards,
> Don
> --
> D.C. Parris, FMP, Linux+, ESL Certificate
> Minister, Security/FM Coordinator, Free Software Advocate
> http://dcparris.net/
> GPG Key ID: F5E179BE

My two cents would be to actually use a  different tool for the job of presenting this data.  I'd have used a pivot
tablein Microsoft Excel.  Not sure what your environment or requirements are but pivot tables are widely used in
business,easy to share, can be formatted, and give the user the ability to drill down and navigate to the data they
wantto see. 
I'd set up a query to pull the raw data you need with all of the categories and associated data you need.  Then bring
thatdata to Excel to present and summarize it. 

Mike



Re: Summing & Grouping in a Hierarchical Structure

From
Don Parris
Date:
Hi Alexander,

I appreciate you taking time to reply to my post.  I like the idea of the WITH RECURSIVE query, but...  The two examples in the link you offered are not so helpful to me.  For example, the initial WITH query shown uses a single table, and I wander how that might apply in my case, where the relevant information is actually found in two tables, one of them a recursive table.

The second example, which applies the WITH RECURSIVE clause, is even less so.  I wonder if there is a good tutorial somewhere on this that shows some other examples?  That might help me catch on a little better.  I'll search for that today.


On Thu, Feb 14, 2013 at 11:30 PM, Alexander Gataric <gataric@usa.net> wrote:

I would try a recursive query to determine the category structure and aggregate as you go. I had a similar problem with a hierarchical structure for an organization structure. Another thing you might try is to create a separate CTE for each category and then aggregate the individual CTEs.

 

 

From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Don Parris
Sent: Thursday, February 14, 2013 7:58 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] Summing & Grouping in a Hierarchical Structure

 

Hi all,

I posted to this list some time ago about working with a hierarchical category structure.   I had great difficulty with my problem and gave up for a time.  I recently returned to it and resolved a big part of it.  I have one step left to go, but at least I have solved this part.

Here is the original thread (or one of them):
http://www.postgresql.org/message-id/CAJ-7yonw4_qDCp-ZNYwEkR2jdLKeL8nfGc+-TLLSW=Rmo1VkbA@mail.gmail.com

 

Here is my recent blog post about how I managed to show my expenses summed and grouped by a mid-level category:
http://dcparris.net/2013/02/13/hierarchical-categories-rdbms/


Specifically, I wanted to sum and group expenses according to categories, not just at the bottom tier, but at higher tiers, so as to show more summarized information.  A CEO primarily wants to know the sum total for all the business units, yet have the ability to drill down to more detailed levels if something is unusually high or low.  In my case, I could see the details, but not the summary.  Well now I can summarize by what I refer to as the 2nd-level categories.

Anyway, I hope this helps someone, as I have come to appreciate - and I mean really appreciate - the challenge of working with hierarchical structures in a 2-dimensional RDBMS.  If anyone sees something I should explain better or in more depth, please let me know.

 

Regards,

Don

--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate

GPG Key ID: F5E179BE




--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
Don Parris
Date:
Hi Mike,

No Excel here.  This is a strictly libre systems environment.  I believe LibreOffice Calc has a similar tool though.  And your suggestion might be a great one.


On Mon, Feb 18, 2013 at 10:24 AM, Relyea, Mike <Mike.Relyea@xerox.com> wrote:
> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Don Parris
> Sent: Thursday, February 14, 2013 8:58 PM
> To: pgsql-sql@postgresql.org
> Subject: [SQL] Summing & Grouping in a Hierarchical Structure
>
> Hi all,
> I posted to this list some time ago about working with a hierarchical category structure.   I had great difficulty with my problem and gave up for a time. 
> I recently returned to it and resolved a big part of it.  I have one step left to go, but at least I have solved this part.
>
> Here is the original thread (or one of them):
> http://www.postgresql.org/message-id/CAJ-7yonw4_qDCp-ZNYwEkR2jdLKeL8nfGc+-TLLSW=Rmo1VkbA@mail.gmail.com
>
>
> Here is my recent blog post about how I managed to show my expenses summed and grouped by a mid-level category:
> http://dcparris.net/2013/02/13/hierarchical-categories-rdbms/
>
>
> Specifically, I wanted to sum and group expenses according to categories, not just at the bottom tier, but at higher tiers, so as to show more summarized information. 
> A CEO primarily wants to know the sum total for all the business units, yet have the ability to drill down to more detailed levels if something is unusually high or low. 
> In my case, I could see the details, but not the summary.  Well now I can summarize by what I refer to as the 2nd-level categories.
> Anyway, I hope this helps someone, as I have come to appreciate - and I mean really appreciate - the challenge of working with hierarchical structures in a 2-dimensional RDBMS. 
> If anyone sees something I should explain better or in more depth, please let me know.
>
> Regards,
> Don
> --
> D.C. Parris, FMP, Linux+, ESL Certificate
> Minister, Security/FM Coordinator, Free Software Advocate
> http://dcparris.net/
> GPG Key ID: F5E179BE

My two cents would be to actually use a  different tool for the job of presenting this data.  I'd have used a pivot table in Microsoft Excel.  Not sure what your environment or requirements are but pivot tables are widely used in business, easy to share, can be formatted, and give the user the ability to drill down and navigate to the data they want to see.
I'd set up a query to pull the raw data you need with all of the categories and associated data you need.  Then bring that data to Excel to present and summarize it.

Mike



--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
"Alexander Gataric"
Date:

I would use the recursive CTE to gather the hierarchical portion of the data you need and then join that CTE to another table or CTE with the other data you need. I had a situation like this at my job were organization info was in a hierarchal table and I needed to join it to two other tables. I created a CTE with the combined data from the non-hierarchical tables and left joined it to the recursive CTE.

 

If you’re having trouble with this, I suggest looking into CTEs and the different types of joins.

 

 

 

From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Don Parris
Sent: Thursday, February 21, 2013 4:38 AM
To: pgsql-sql@postgresql.org
Subject: Re: [SQL] Summing & Grouping in a Hierarchical Structure

 

Hi Alexander,

I appreciate you taking time to reply to my post.  I like the idea of the WITH RECURSIVE query, but...  The two examples in the link you offered are not so helpful to me.  For example, the initial WITH query shown uses a single table, and I wander how that might apply in my case, where the relevant information is actually found in two tables, one of them a recursive table.

The second example, which applies the WITH RECURSIVE clause, is even less so.  I wonder if there is a good tutorial somewhere on this that shows some other examples?  That might help me catch on a little better.  I'll search for that today.

 

On Thu, Feb 14, 2013 at 11:30 PM, Alexander Gataric <gataric@usa.net> wrote:

I would try a recursive query to determine the category structure and aggregate as you go. I had a similar problem with a hierarchical structure for an organization structure. Another thing you might try is to create a separate CTE for each category and then aggregate the individual CTEs.

 

 

From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Don Parris
Sent: Thursday, February 14, 2013 7:58 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] Summing & Grouping in a Hierarchical Structure

 

Hi all,

I posted to this list some time ago about working with a hierarchical category structure.   I had great difficulty with my problem and gave up for a time.  I recently returned to it and resolved a big part of it.  I have one step left to go, but at least I have solved this part.

Here is the original thread (or one of them):
http://www.postgresql.org/message-id/CAJ-7yonw4_qDCp-ZNYwEkR2jdLKeL8nfGc+-TLLSW=Rmo1VkbA@mail.gmail.com

 

Here is my recent blog post about how I managed to show my expenses summed and grouped by a mid-level category:
http://dcparris.net/2013/02/13/hierarchical-categories-rdbms/


Specifically, I wanted to sum and group expenses according to categories, not just at the bottom tier, but at higher tiers, so as to show more summarized information.  A CEO primarily wants to know the sum total for all the business units, yet have the ability to drill down to more detailed levels if something is unusually high or low.  In my case, I could see the details, but not the summary.  Well now I can summarize by what I refer to as the 2nd-level categories.

Anyway, I hope this helps someone, as I have come to appreciate - and I mean really appreciate - the challenge of working with hierarchical structures in a 2-dimensional RDBMS.  If anyone sees something I should explain better or in more depth, please let me know.

 

Regards,

Don

--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate

GPG Key ID: F5E179BE




--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate

GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
Misa Simic
Date:
Hi,


Have you considered maybe ltree datatype?


I think it solves a lot of problems in topic....

Kind regards,

Misa

On Friday, February 15, 2013, Don Parris wrote:
Hi all,

I posted to this list some time ago about working with a hierarchical category structure.   I had great difficulty with my problem and gave up for a time.  I recently returned to it and resolved a big part of it.  I have one step left to go, but at least I have solved this part.

Here is the original thread (or one of them):
http://www.postgresql.org/message-id/CAJ-7yonw4_qDCp-ZNYwEkR2jdLKeL8nfGc+-TLLSW=Rmo1VkbA@mail.gmail.com

Here is my recent blog post about how I managed to show my expenses summed and grouped by a mid-level category:
http://dcparris.net/2013/02/13/hierarchical-categories-rdbms/


Specifically, I wanted to sum and group expenses according to categories, not just at the bottom tier, but at higher tiers, so as to show more summarized information.  A CEO primarily wants to know the sum total for all the business units, yet have the ability to drill down to more detailed levels if something is unusually high or low.  In my case, I could see the details, but not the summary.  Well now I can summarize by what I refer to as the 2nd-level categories.

Anyway, I hope this helps someone, as I have come to appreciate - and I mean really appreciate - the challenge of working with hierarchical structures in a 2-dimensional RDBMS.  If anyone sees something I should explain better or in more depth, please let me know.

Regards,
Don
--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
Don Parris
Date:
Hi Misa,

I decided to try out ltree, and have made some progress with it.  If I understand correctly how to use it, I simply insert the 'path' column into my table, using ltree as the data type.  That eliminates the need for a category table, if I understand correctly.  I just need to ensure the category path is correct for each line item in the transaction details table.

However, I have difficulty figuring out how to sum the amounts as I would like:
test_ltree=> SELECT path, SUM(trans_amt) FROM testcat WHERE path ~ 'TOP.*{2}' GROUP BY path;
 path | sum
------+-----
(0 rows)


This works fine:
test_ltree=> SELECT path, trans_amt FROM testcat;
                  path                   | trans_amt
-----------------------------------------+-----------
 TOP.Transportation.Auto.Fuel            |     50.00
 TOP.Transportation.Auto.Maintenance     |     30.00
 TOP.Transportation.Auto.Fuel            |     25.00
 TOP.Transportation.Bicycle.Gear         |     40.00
 TOP.Transportation.Bicycle.Gear         |     20.00
 TOP.Transportation.Fares.Bus            |     10.00
 TOP.Transportation.Fares.Train          |      5.00
 TOP.Groceries.Food.Beverages            |     30.00
 TOP.Groceries.Food.Fruit_Veggies        |     40.00
 TOP.Groceries.Food.Meat_Fish            |     80.00
 TOP.Groceries.Food.Grains_Cereals       |     30.00
 TOP.Groceries.Beverages.Alcohol.Beer    |     25.00
 TOP.Groceries.Beverages.Alcohol.Spirits |     10.00
 TOP.Groceries.Beverages.Alcohol.Wine    |     50.00
 TOP.Groceries.Beverages.Juice           |     45.00
 TOP.Groceries.Beverages.Other           |     15.00
(16 rows)


But I can do that in my sleep with any regular query.  This also works great (sum the trans_amt column at level 3 (not counting "TOP"):
test_ltree=> SELECT path, SUM(trans_amt) FROM testcat WHERE path ~ 'TOP.*{3}' GROUP BY path;
                path                 |  sum 
-------------------------------------+-------
 TOP.Groceries.Beverages.Juice       | 45.00
 TOP.Groceries.Beverages.Other       | 15.00
 TOP.Groceries.Food.Beverages        | 30.00
 TOP.Groceries.Food.Fruit_Veggies    | 40.00
 TOP.Groceries.Food.Grains_Cereals   | 30.00
 TOP.Groceries.Food.Meat_Fish        | 80.00
 TOP.Transportation.Auto.Fuel        | 75.00
 TOP.Transportation.Auto.Maintenance | 30.00
 TOP.Transportation.Bicycle.Gear     | 60.00
 TOP.Transportation.Fares.Bus        | 10.00
 TOP.Transportation.Fares.Train      |  5.00
(11 rows)

So if I want to see:
TOP.Groceries        | 240.00
TOP.Transportation | 180.00

How do I get this?  Can you help?

I am running PostgreSQL 9.1 on Kubuntu 12.10, in case that makes any difference.



On Fri, Feb 22, 2013 at 4:00 AM, Misa Simic <misa.simic@gmail.com> wrote:
Hi,


Have you considered maybe ltree datatype?


I think it solves a lot of problems in topic....

Kind regards,

Misa


On Friday, February 15, 2013, Don Parris wrote:
Hi all,

I posted to this list some time ago about working with a hierarchical category structure.   I had great difficulty with my problem and gave up for a time.  I recently returned to it and resolved a big part of it.  I have one step left to go, but at least I have solved this part.

Here is the original thread (or one of them):
http://www.postgresql.org/message-id/CAJ-7yonw4_qDCp-ZNYwEkR2jdLKeL8nfGc+-TLLSW=Rmo1VkbA@mail.gmail.com

Here is my recent blog post about how I managed to show my expenses summed and grouped by a mid-level category:
http://dcparris.net/2013/02/13/hierarchical-categories-rdbms/


Specifically, I wanted to sum and group expenses according to categories, not just at the bottom tier, but at higher tiers, so as to show more summarized information.  A CEO primarily wants to know the sum total for all the business units, yet have the ability to drill down to more detailed levels if something is unusually high or low.  In my case, I could see the details, but not the summary.  Well now I can summarize by what I refer to as the 2nd-level categories.

Anyway, I hope this helps someone, as I have come to appreciate - and I mean really appreciate - the challenge of working with hierarchical structures in a 2-dimensional RDBMS.  If anyone sees something I should explain better or in more depth, please let me know.

Regards,
Don
--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE



--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
Don Parris
Date:
Ok, I managed to accomplish my goal with the ltree:

test_ltree=> SELECT subltree(path,0,2), sum(trans_amt) FROM testcat GROUP BY subltree;
      subltree      |  sum  
--------------------+--------
 TOP.Groceries      | 325.00
 TOP.Transportation | 180.00
(2 rows)


It took a bit of experimenting with the various functions of ltree, but I got what I wanted.

What I did was keep the ltree path column in the same table as the line-item amount, but this requires building the entire path statement in every row.  That's ok for a quick test to figure out how ltree works, but I think I do need to keep the category structure in a separate table:

Category (category_id, path)

Anyway, thanks for suggesting ltree.


On Fri, Feb 22, 2013 at 7:15 PM, Don Parris <parrisdc@gmail.com> wrote:
Hi Misa,

I decided to try out ltree, and have made some progress with it.  If I understand correctly how to use it, I simply insert the 'path' column into my table, using ltree as the data type.  That eliminates the need for a category table, if I understand correctly.  I just need to ensure the category path is correct for each line item in the transaction details table.

However, I have difficulty figuring out how to sum the amounts as I would like:
test_ltree=> SELECT path, SUM(trans_amt) FROM testcat WHERE path ~ 'TOP.*{2}' GROUP BY path;
 path | sum
------+-----
(0 rows)


This works fine:
test_ltree=> SELECT path, trans_amt FROM testcat;
                  path                   | trans_amt
-----------------------------------------+-----------
 TOP.Transportation.Auto.Fuel            |     50.00
 TOP.Transportation.Auto.Maintenance     |     30.00
 TOP.Transportation.Auto.Fuel            |     25.00
 TOP.Transportation.Bicycle.Gear         |     40.00
 TOP.Transportation.Bicycle.Gear         |     20.00
 TOP.Transportation.Fares.Bus            |     10.00
 TOP.Transportation.Fares.Train          |      5.00
 TOP.Groceries.Food.Beverages            |     30.00
 TOP.Groceries.Food.Fruit_Veggies        |     40.00
 TOP.Groceries.Food.Meat_Fish            |     80.00
 TOP.Groceries.Food.Grains_Cereals       |     30.00
 TOP.Groceries.Beverages.Alcohol.Beer    |     25.00
 TOP.Groceries.Beverages.Alcohol.Spirits |     10.00
 TOP.Groceries.Beverages.Alcohol.Wine    |     50.00
 TOP.Groceries.Beverages.Juice           |     45.00
 TOP.Groceries.Beverages.Other           |     15.00
(16 rows)


But I can do that in my sleep with any regular query.  This also works great (sum the trans_amt column at level 3 (not counting "TOP"):
test_ltree=> SELECT path, SUM(trans_amt) FROM testcat WHERE path ~ 'TOP.*{3}' GROUP BY path;
                path                 |  sum 
-------------------------------------+-------
 TOP.Groceries.Beverages.Juice       | 45.00
 TOP.Groceries.Beverages.Other       | 15.00
 TOP.Groceries.Food.Beverages        | 30.00
 TOP.Groceries.Food.Fruit_Veggies    | 40.00
 TOP.Groceries.Food.Grains_Cereals   | 30.00
 TOP.Groceries.Food.Meat_Fish        | 80.00
 TOP.Transportation.Auto.Fuel        | 75.00
 TOP.Transportation.Auto.Maintenance | 30.00
 TOP.Transportation.Bicycle.Gear     | 60.00
 TOP.Transportation.Fares.Bus        | 10.00
 TOP.Transportation.Fares.Train      |  5.00
(11 rows)

So if I want to see:
TOP.Groceries        | 240.00
TOP.Transportation | 180.00

How do I get this?  Can you help?

I am running PostgreSQL 9.1 on Kubuntu 12.10, in case that makes any difference.



On Fri, Feb 22, 2013 at 4:00 AM, Misa Simic <misa.simic@gmail.com> wrote:
Hi,


Have you considered maybe ltree datatype?


I think it solves a lot of problems in topic....

Kind regards,

Misa


On Friday, February 15, 2013, Don Parris wrote:
Hi all,

I posted to this list some time ago about working with a hierarchical category structure.   I had great difficulty with my problem and gave up for a time.  I recently returned to it and resolved a big part of it.  I have one step left to go, but at least I have solved this part.

Here is the original thread (or one of them):
http://www.postgresql.org/message-id/CAJ-7yonw4_qDCp-ZNYwEkR2jdLKeL8nfGc+-TLLSW=Rmo1VkbA@mail.gmail.com

Here is my recent blog post about how I managed to show my expenses summed and grouped by a mid-level category:
http://dcparris.net/2013/02/13/hierarchical-categories-rdbms/


Specifically, I wanted to sum and group expenses according to categories, not just at the bottom tier, but at higher tiers, so as to show more summarized information.  A CEO primarily wants to know the sum total for all the business units, yet have the ability to drill down to more detailed levels if something is unusually high or low.  In my case, I could see the details, but not the summary.  Well now I can summarize by what I refer to as the 2nd-level categories.

Anyway, I hope this helps someone, as I have come to appreciate - and I mean really appreciate - the challenge of working with hierarchical structures in a 2-dimensional RDBMS.  If anyone sees something I should explain better or in more depth, please let me know.

Regards,
Don
--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE



--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE



--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
Bryan L Nuse
Date:


This works fine:
test_ltree=> SELECT path, trans_amt FROM testcat;
                  path                   | trans_amt
-----------------------------------------+-----------
 TOP.Transportation.Auto.Fuel            |     50.00
 TOP.Transportation.Auto.Maintenance     |     30.00
 TOP.Transportation.Auto.Fuel            |     25.00
 TOP.Transportation.Bicycle.Gear         |     40.00
 TOP.Transportation.Bicycle.Gear         |     20.00
 TOP.Transportation.Fares.Bus            |     10.00
 TOP.Transportation.Fares.Train          |      5.00
 TOP.Groceries.Food.Beverages            |     30.00
 TOP.Groceries.Food.Fruit_Veggies        |     40.00
 TOP.Groceries.Food.Meat_Fish            |     80.00
 TOP.Groceries.Food.Grains_Cereals       |     30.00
 TOP.Groceries.Beverages.Alcohol.Beer    |     25.00
 TOP.Groceries.Beverages.Alcohol.Spirits |     10.00
 TOP.Groceries.Beverages.Alcohol.Wine    |     50.00
 TOP.Groceries.Beverages.Juice           |     45.00
 TOP.Groceries.Beverages.Other           |     15.00
(16 rows)


So if I want to see:
TOP.Groceries        | 240.00
TOP.Transportation | 180.00



Hello Don,

Perhaps I am missing something about what your constraints are, or what you're trying to achieve, but is there any reason you could not use a series of joined tables indicating parent-child relationships?  The following example follows that in your previous posts.  Note that this approach (as given) will not work if branches stemming from the same node are different lengths.  That is, if you have costs associated with "Transportation.Bicycle.Gear", you could not also have a category "Transportation.Bicycle.Gear.Chain_ring".  (To add the latter category, you'd have to put costs from the former under something like "Transportation.Bicycle.Gear.General" -- or modify the approach.)  However, lengthening the "Alcohol" branches, e.g., by tacking on a level5 table would be easy.  Notice that level3 and level4 are not true look-up tables, since they may contain duplicate cat values.

If I'm off base, by all means specify just how.

Regards,
Bryan

--------------------------------------------------

CREATE TABLE level1 (
  cat   text  PRIMARY KEY
);

CREATE TABLE level2 (
   cat   text   PRIMARY KEY,
   parent   text   REFERENCES level1(cat)
);

CREATE TABLE level3 (
   cat   text,
   parent   text   REFERENCES level2(cat),
   cost   numeric(6,2)
);

CREATE TABLE level4 (
   cat   text,
   parent   text,
   cost   numeric(6,2)
);


INSERT INTO level1
  VALUES ('Transportation'),
         ('Groceries');

INSERT INTO level2
  VALUES ('Auto', 'Transportation'),
         ('Bicycle', 'Transportation'),
         ('Fares', 'Transportation'),
         ('Food', 'Groceries'),
         ('Beverages', 'Groceries');

INSERT INTO level3
  VALUES ('Fuel', 'Auto', 50.00),
         ('Maintenance', 'Auto', 30.00),
         ('Fuel', 'Auto', 25.00), 
         ('Gear', 'Bicycle', 40.00),
         ('Gear', 'Bicycle', 20.00),
         ('Bus', 'Fares', 10.00),
         ('Train', 'Fares', 5.00),
         ('Beverages', 'Food', 30.00),
         ('Fruit_Veg', 'Food', 40.00),
         ('Meat_Fish', 'Food', 80.00),
         ('Grains_Cereals', 'Food', 30.00), 
         ('Alcohol', 'Beverages', NULL),
         ('Juice', 'Beverages', 45.00), 
         ('Other', 'Beverages', 15.00); 

INSERT INTO level4
  VALUES ('Beer', 'Alcohol', 25.00),
         ('Spirits', 'Alcohol', 10.00),
         ('Wine', 'Alcohol', 50.00);


CREATE VIEW all_cats AS (
SELECT a.cat AS level4,
       b.cat AS level3,
       c.cat AS level2,
       d.cat AS level1,
       CASE WHEN a.cost IS NULL THEN 0
            WHEN a.cost IS NOT NULL THEN a.cost
                 END
       + CASE WHEN b.cost IS NULL THEN 0
              WHEN b.cost IS NOT NULL THEN b.cost
         END AS cost
  FROM level4 a
    FULL JOIN
    level3 b
    ON (a.parent = b.cat)
      FULL JOIN
      level2 c
      ON (b.parent = c.cat)
        FULL JOIN
        level1 d
        ON (c.parent = d.cat)
  ORDER BY level1, level2, level3, level4
);



SELECT * FROM all_cats;

     level4  |     level3     |  level2   |     level1     | cost  
    ---------+----------------+-----------+----------------+-------
     Beer    | Alcohol        | Beverages | Groceries      | 25.00
     Spirits | Alcohol        | Beverages | Groceries      | 10.00
     Wine    | Alcohol        | Beverages | Groceries      | 50.00
             | Juice          | Beverages | Groceries      | 45.00
             | Other          | Beverages | Groceries      | 15.00
             | Beverages      | Food      | Groceries      | 30.00
             | Fruit_Veg      | Food      | Groceries      | 40.00
             | Grains_Cereals | Food      | Groceries      | 30.00
             | Meat_Fish      | Food      | Groceries      | 80.00
             | Fuel           | Auto      | Transportation | 50.00
             | Fuel           | Auto      | Transportation | 25.00
             | Maintenance    | Auto      | Transportation | 30.00
             | Gear           | Bicycle   | Transportation | 20.00
             | Gear           | Bicycle   | Transportation | 40.00
             | Bus            | Fares     | Transportation | 10.00
             | Train          | Fares     | Transportation |  5.00
    (16 rows)




SELECT level1,
       count(cost) AS num_branches,
       sum(cost) AS total_cost 
  FROM all_cats
  GROUP BY level1
  ORDER BY level1;

         level1     | num_branches | total_cost 
    ----------------+--------------+------------
     Groceries      |            9 |     325.00
     Transportation |            7 |     180.00
    (2 rows)

Re: Summing & Grouping in a Hierarchical Structure

From
Misa Simic
Date:
Hi Don,

Yes, its better to use it in category table...

Bryan, how many levels there will be - we dont know...

With one table - and ltree is solved all cases...

To add new subcategory user just picks the parent category... So it is easy to add chain ring to gear... As category...

In another transaction table is category_id, amount...


Don already posted query for sum... In these case just category and transaction table should be joined  sum amount, group by functions on lpath....(depending what is the goal...)

Kind Regards,

Misa



On Saturday, February 23, 2013, Bryan L Nuse wrote:


This works fine:
test_ltree=> SELECT path, trans_amt FROM testcat;
                  path                   | trans_amt
-----------------------------------------+-----------
 TOP.Transportation.Auto.Fuel            |     50.00
 TOP.Transportation.Auto.Maintenance     |     30.00
 TOP.Transportation.Auto.Fuel            |     25.00
 TOP.Transportation.Bicycle.Gear         |     40.00
 TOP.Transportation.Bicycle.Gear         |     20.00
 TOP.Transportation.Fares.Bus            |     10.00
 TOP.Transportation.Fares.Train          |      5.00
 TOP.Groceries.Food.Beverages            |     30.00
 TOP.Groceries.Food.Fruit_Veggies        |     40.00
 TOP.Groceries.Food.Meat_Fish            |     80.00
 TOP.Groceries.Food.Grains_Cereals       |     30.00
 TOP.Groceries.Beverages.Alcohol.Beer    |     25.00
 TOP.Groceries.Beverages.Alcohol.Spirits |     10.00
 TOP.Groceries.Beverages.Alcohol.Wine    |     50.00
 TOP.Groceries.Beverages.Juice           |     45.00
 TOP.Groceries.Beverages.Other           |     15.00
(16 rows)


So if I want to see:
TOP.Groceries        | 240.00
TOP.Transportation | 180.00



Hello Don,

Perhaps I am missing something about what your constraints are, or what you're trying to achieve, but is there any reason you could not use a series of joined tables indicating parent-child relationships?  The following example follows that in your previous posts.  Note that this approach (as given) will not work if branches stemming from the same node are different lengths.  That is, if you have costs associated with "Transportation.Bicycle.Gear", you could not also have a category "Transportation.Bicycle.Gear.Chain_ring".  (To add the latter category, you'd have to put costs from the former under something like "Transportation.Bicycle.Gear.General" -- or modify the approach.)  However, lengthening the "Alcohol" branches, e.g., by tacking on a level5 table would be easy.  Notice that level3 and level4 are not true look-up tables, since they may contain duplicate cat values.

If I'm off base, by all means specify just how.

Regards,
Bryan

--------------------------------------------------

CREATE TABLE level1 (
  cat   text  PRIMARY KEY
);

CREATE TABLE level2 (
   cat   text   PRIMARY KEY,
   parent   text   REFERENCES level1(cat)
);

CREATE TABLE level3 (
   cat   text,
   parent   text   REFERENCES level2(cat),
   cost   numeric(6,2)
);

CREATE TABLE level4 (
   cat   text,
   parent   text,
   cost   numeric(6,2)
);


INSERT INTO level1
  VALUES ('Transportation'),
         ('Groceries');

INSERT INTO level2
  VALUES ('Auto', 'Transportation'),
         ('Bicycle', 'Transportation'),
         ('Fares', 'Transportation'),
         ('Food', 'Groceries'),
         ('Beverages', 'Groceries');

INSERT INTO level3
  VALUES ('Fuel', 'Auto', 50.00),
         ('Maintenance', 'Auto', 30.00),
         ('Fuel', 'Auto', 25.00), 
         ('Gear', 'Bicycle', 40.00),
         ('Gear', 'Bicycle', 20.00),
         ('Bus', 'Fares', 10.00),
         ('Train', 'Fares', 5.00),
         ('Beverages', 'Food', 30.00),
         ('Fruit_Veg', 'Food', 40.00),
         ('Meat_Fish', 'Food', 80.00),
         ('Grains_Cereals', 'Food', 30.00), 
         ('Alcohol', 'Beverages', NULL),
         ('Juice', 'Beverages', 45.00), 
         ('Other', 'Beverages', 15.00); 

INSERT INTO level4
  VALUES ('Beer', 'Alcohol', 25.00),
         ('Spirits', 'Alcohol', 10.00),
         ('Wine', 'Alcohol', 50.00);


CREATE VIEW all_cats AS (
SELECT a.cat AS level4,
       b.cat AS level3,
       c.cat AS level2,
       d.cat AS level1,
       CASE WHEN a.cost IS NULL THEN 0
            WHEN a.cost IS NOT NULL THEN a.cost
                 END
       + CASE WHEN b.cost IS NULL THEN 0
              WHEN b.cost IS NOT NULL THEN b.cost
         END AS cost
  FROM level4 a
    FULL JOIN
    level3 b
    ON (a.parent = b.cat)
      FULL JOIN
      level2 c
      ON (b.parent = c.cat)
        FULL JOIN
        level1 d
        ON (c.parent = d.cat)
  ORDER BY level1, level2, level3, level4
);



SELECT * FROM all_cats;

     level4  |     level3     |  level2   |     level1     | cost  
    ---------+----------------+-----------+----------------+-------
     Beer    | Alcohol        | Beverages | Groceries      | 25.00
     Spirits | Alcohol        | Beverages | Groceries      | 10.00
     Wine    | Alcohol        | Beverages | Groceries      | 50.00
             | Juice          | Beverages | Groceries      | 45.00
             | Other          | Beverages | Groceries      | 15.00
             | Beverages      | Food      | Groceries      | 30.00
             | Fruit_Veg      | Food      | Groceries      | 40.00
             | Grains_Cereals | Food      | Groceries      | 30.00
             | Meat_Fish      | Food      | Groceries      | 80.00
             | Fuel           | Auto      | Transportation | 50.00
             | Fuel           | Auto      | Transportation | 25.00
             | Maintenance    | Auto      | Transportation | 30.00
             | Gear           | Bicycle   | Transportation | 20.00
             | Gear           | Bicycle   | Transportation | 40.00
             | Bus            | Fares     | Transportation | 10.00
             | Train          | Fares     | Transportation |  5.00
    (16 rows)




SELECT level1,
       count(cost) AS num_branches,
       sum(cost) AS total_cost 
  FROM all_cats
  GROUP BY level1
  ORDER BY level1;

         level1     | num_branches | total_cost 
    ----------------+--------------+------------
     Groceries      |            9 |     325.00
     Transportation |            7 |     180.00
    (2 rows)

Re: Summing & Grouping in a Hierarchical Structure

From
Don Parris
Date:
Hi Bryan,

My current (maybe about to be deprecated now) category table is a recursive table that I join on the line-item table.  Your suggestion of using a table for each level of category is probably a bit easier to manage in some respects.  However, as you pointed out, it becomes unwieldy when various branches of the category tree have different lengths.  One effectively has to have the same number of levels in all branches.  If only reality were so refined!  :-)  That is why I chose the recursive table structure that I gather is fairly common where the depth of a tree is not necessarily "even" or known.

However, I like the view you created - I have not really used views very much so far, so it is good to see this use case for a view.  That said, now that I have finally gotten the chance to try ltree, I think I like it a lot.  I wonder how portable it is, but it seems to do the trick extremely well.  Mind you, I am not necessarily looking to change DBMSes anytime soon, but it is still good to know how to handle recursive/hierarchical structures... ummm... the old-fashioned way(?).  ;-)

I think using WITH RECURSIVE would probably do the trick for my recursive category table, but - so far - the WITH construct makes my eyes glaze over.  I feel like I will have returned the *one ring* if I get that.


On Fri, Feb 22, 2013 at 11:54 PM, Bryan L Nuse <nuse@uga.edu> wrote:
<SNIP>

Hello Don,

Perhaps I am missing something about what your constraints are, or what you're trying to achieve, but is there any reason you could not use a series of joined tables indicating parent-child relationships?  The following example follows that in your previous posts.  Note that this approach (as given) will not work if branches stemming from the same node are different lengths.  That is, if you have costs associated with "Transportation.Bicycle.Gear", you could not also have a category "Transportation.Bicycle.Gear.Chain_ring".  (To add the latter category, you'd have to put costs from the former under something like "Transportation.Bicycle.Gear.General" -- or modify the approach.)  However, lengthening the "Alcohol" branches, e.g., by tacking on a level5 table would be easy.  Notice that level3 and level4 are not true look-up tables, since they may contain duplicate cat values.

If I'm off base, by all means specify just how.

Regards,
Bryan

--------------------------------------------------

CREATE TABLE level1 (
  cat   text  PRIMARY KEY
);

CREATE TABLE level2 (
   cat   text   PRIMARY KEY,
   parent   text   REFERENCES level1(cat)
);

CREATE TABLE level3 (
   cat   text,
   parent   text   REFERENCES level2(cat),
   cost   numeric(6,2)
);

CREATE TABLE level4 (
   cat   text,
   parent   text,
   cost   numeric(6,2)
);


INSERT INTO level1
  VALUES ('Transportation'),
         ('Groceries');

INSERT INTO level2
  VALUES ('Auto', 'Transportation'),
         ('Bicycle', 'Transportation'),
         ('Fares', 'Transportation'),
         ('Food', 'Groceries'),
         ('Beverages', 'Groceries');

INSERT INTO level3
  VALUES ('Fuel', 'Auto', 50.00),
         ('Maintenance', 'Auto', 30.00),
         ('Fuel', 'Auto', 25.00), 
         ('Gear', 'Bicycle', 40.00),
         ('Gear', 'Bicycle', 20.00),
         ('Bus', 'Fares', 10.00),
         ('Train', 'Fares', 5.00),
         ('Beverages', 'Food', 30.00),
         ('Fruit_Veg', 'Food', 40.00),
         ('Meat_Fish', 'Food', 80.00),
         ('Grains_Cereals', 'Food', 30.00), 
         ('Alcohol', 'Beverages', NULL),
         ('Juice', 'Beverages', 45.00), 
         ('Other', 'Beverages', 15.00); 

INSERT INTO level4
  VALUES ('Beer', 'Alcohol', 25.00),
         ('Spirits', 'Alcohol', 10.00),
         ('Wine', 'Alcohol', 50.00);


CREATE VIEW all_cats AS (
SELECT a.cat AS level4,
       b.cat AS level3,
       c.cat AS level2,
       d.cat AS level1,
       CASE WHEN a.cost IS NULL THEN 0
            WHEN a.cost IS NOT NULL THEN a.cost
                 END
       + CASE WHEN b.cost IS NULL THEN 0
              WHEN b.cost IS NOT NULL THEN b.cost
         END AS cost
  FROM level4 a
    FULL JOIN
    level3 b
    ON (a.parent = b.cat)
      FULL JOIN
      level2 c
      ON (b.parent = c.cat)
        FULL JOIN
        level1 d
        ON (c.parent = d.cat)
  ORDER BY level1, level2, level3, level4
);



SELECT * FROM all_cats;

     level4  |     level3     |  level2   |     level1     | cost  
    ---------+----------------+-----------+----------------+-------
     Beer    | Alcohol        | Beverages | Groceries      | 25.00
     Spirits | Alcohol        | Beverages | Groceries      | 10.00
     Wine    | Alcohol        | Beverages | Groceries      | 50.00
             | Juice          | Beverages | Groceries      | 45.00
             | Other          | Beverages | Groceries      | 15.00
             | Beverages      | Food      | Groceries      | 30.00
             | Fruit_Veg      | Food      | Groceries      | 40.00
             | Grains_Cereals | Food      | Groceries      | 30.00
             | Meat_Fish      | Food      | Groceries      | 80.00
             | Fuel           | Auto      | Transportation | 50.00
             | Fuel           | Auto      | Transportation | 25.00
             | Maintenance    | Auto      | Transportation | 30.00
             | Gear           | Bicycle   | Transportation | 20.00
             | Gear           | Bicycle   | Transportation | 40.00
             | Bus            | Fares     | Transportation | 10.00
             | Train          | Fares     | Transportation |  5.00
    (16 rows)




SELECT level1,
       count(cost) AS num_branches,
       sum(cost) AS total_cost 
  FROM all_cats
  GROUP BY level1
  ORDER BY level1;

         level1     | num_branches | total_cost 
    ----------------+--------------+------------
     Groceries      |            9 |     325.00
     Transportation |            7 |     180.00
    (2 rows)




--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
Bryan L Nuse
Date:
>  That said, now that I have finally gotten the chance to try ltree, I think I like it a lot.

Hello Don,

Yes, after looking at ltree --which I had not done before-- I have to agree with Misa that it looks like the right
solutionfor your problem.  That is not to say that "brute force" SQL couldn't provide a workable arrangement; but ltree
looksvery flexible, especially as it allows you to assign cost values to non-terminal nodes.  If it were me, though,
I'dstill make use of VIEWs to report results of the workhorse queries:  staring at a list of items like
"Transportation.Bicycle.Gear.Chain_ring"sounds like headache.  That's a matter of taste, of course. 

Bryan



Re: Summing & Grouping in a Hierarchical Structure

From
Don Parris
Date:
Hi Bryan,


The detailed category listing is dead simple to create: SELECT cat_id, cat_name, sum(amount) FROM category, line-item GROUP BY cat_name, cat_id;
But... as you say, the result is a LOT of numbers over the course of 160-ish categories of all levels.  Which is why it is so important to be able to summarize.  I want to summarize by the highest-levels of the category tree, precisely so as not to get a headache looking at the details. But it is still important to be able to drill down when I want more details.


On Sat, Feb 23, 2013 at 11:15 AM, Bryan L Nuse <nuse@uga.edu> wrote:
>  That said, now that I have finally gotten the chance to try ltree, I think I like it a lot.

Hello Don,

Yes, after looking at ltree --which I had not done before-- I have to agree with Misa that it looks like the right solution for your problem.  That is not to say that "brute force" SQL couldn't provide a workable arrangement; but ltree looks very flexible, especially as it allows you to assign cost values to non-terminal nodes.  If it were me, though, I'd still make use of VIEWs to report results of the workhorse queries:  staring at a list of items like "Transportation.Bicycle.Gear.Chain_ring" sounds like headache.  That's a matter of taste, of course.

Bryan



--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
Don Parris
Date:
Misa,

Is it possible to use spaces in the ltree path, like so:  TOP.Groceries.Food.Herbs & Spices

Or do the elements of the path have to use underscores and dashes?


On Sat, Feb 23, 2013 at 7:19 AM, Misa Simic <misa.simic@gmail.com> wrote:
Hi Don,

Yes, its better to use it in category table...

Bryan, how many levels there will be - we dont know...

With one table - and ltree is solved all cases...

To add new subcategory user just picks the parent category... So it is easy to add chain ring to gear... As category...

In another transaction table is category_id, amount...


Don already posted query for sum... In these case just category and transaction table should be joined  sum amount, group by functions on lpath....(depending what is the goal...)

Kind Regards,

Misa



On Saturday, February 23, 2013, Bryan L Nuse wrote:


This works fine:
test_ltree=> SELECT path, trans_amt FROM testcat;
                  path                   | trans_amt
-----------------------------------------+-----------
 TOP.Transportation.Auto.Fuel            |     50.00
 TOP.Transportation.Auto.Maintenance     |     30.00
 TOP.Transportation.Auto.Fuel            |     25.00
 TOP.Transportation.Bicycle.Gear         |     40.00
 TOP.Transportation.Bicycle.Gear         |     20.00
 TOP.Transportation.Fares.Bus            |     10.00
 TOP.Transportation.Fares.Train          |      5.00
 TOP.Groceries.Food.Beverages            |     30.00
 TOP.Groceries.Food.Fruit_Veggies        |     40.00
 TOP.Groceries.Food.Meat_Fish            |     80.00
 TOP.Groceries.Food.Grains_Cereals       |     30.00
 TOP.Groceries.Beverages.Alcohol.Beer    |     25.00
 TOP.Groceries.Beverages.Alcohol.Spirits |     10.00
 TOP.Groceries.Beverages.Alcohol.Wine    |     50.00
 TOP.Groceries.Beverages.Juice           |     45.00
 TOP.Groceries.Beverages.Other           |     15.00
(16 rows)


So if I want to see:
TOP.Groceries        | 240.00
TOP.Transportation | 180.00



Hello Don,

Perhaps I am missing something about what your constraints are, or what you're trying to achieve, but is there any reason you could not use a series of joined tables indicating parent-child relationships?  The following example follows that in your previous posts.  Note that this approach (as given) will not work if branches stemming from the same node are different lengths.  That is, if you have costs associated with "Transportation.Bicycle.Gear", you could not also have a category "Transportation.Bicycle.Gear.Chain_ring".  (To add the latter category, you'd have to put costs from the former under something like "Transportation.Bicycle.Gear.General" -- or modify the approach.)  However, lengthening the "Alcohol" branches, e.g., by tacking on a level5 table would be easy.  Notice that level3 and level4 are not true look-up tables, since they may contain duplicate cat values.

If I'm off base, by all means specify just how.

Regards,
Bryan

--------------------------------------------------

CREATE TABLE level1 (
  cat   text  PRIMARY KEY
);

CREATE TABLE level2 (
   cat   text   PRIMARY KEY,
   parent   text   REFERENCES level1(cat)
);

CREATE TABLE level3 (
   cat   text,
   parent   text   REFERENCES level2(cat),
   cost   numeric(6,2)
);

CREATE TABLE level4 (
   cat   text,
   parent   text,
   cost   numeric(6,2)
);


INSERT INTO level1
  VALUES ('Transportation'),
         ('Groceries');

INSERT INTO level2
  VALUES ('Auto', 'Transportation'),
         ('Bicycle', 'Transportation'),
         ('Fares', 'Transportation'),
         ('Food', 'Groceries'),
         ('Beverages', 'Groceries');

INSERT INTO level3
  VALUES ('Fuel', 'Auto', 50.00),
         ('Maintenance', 'Auto', 30.00),
         ('Fuel', 'Auto', 25.00), 
         ('Gear', 'Bicycle', 40.00),
         ('Gear', 'Bicycle', 20.00),
         ('Bus', 'Fares', 10.00),
         ('Train', 'Fares', 5.00),
         ('Beverages', 'Food', 30.00),
         ('Fruit_Veg', 'Food', 40.00),
         ('Meat_Fish', 'Food', 80.00),
         ('Grains_Cereals', 'Food', 30.00), 
         ('Alcohol', 'Beverages', NULL),
         ('Juice', 'Beverages', 45.00), 
         ('Other', 'Beverages', 15.00); 

INSERT INTO level4
  VALUES ('Beer', 'Alcohol', 25.00),
         ('Spirits', 'Alcohol', 10.00),
         ('Wine', 'Alcohol', 50.00);


CREATE VIEW all_cats AS (
SELECT a.cat AS level4,
       b.cat AS level3,
       c.cat AS level2,
       d.cat AS level1,
       CASE WHEN a.cost IS NULL THEN 0
            WHEN a.cost IS NOT NULL THEN a.cost
                 END
       + CASE WHEN b.cost IS NULL THEN 0
              WHEN b.cost IS NOT NULL THEN b.cost
         END AS cost
  FROM level4 a
    FULL JOIN
    level3 b
    ON (a.parent = b.cat)
      FULL JOIN
      level2 c
      ON (b.parent = c.cat)
        FULL JOIN
        level1 d
        ON (c.parent = d.cat)
  ORDER BY level1, level2, level3, level4
);



SELECT * FROM all_cats;

     level4  |     level3     |  level2   |     level1     | cost  
    ---------+----------------+-----------+----------------+-------
     Beer    | Alcohol        | Beverages | Groceries      | 25.00
     Spirits | Alcohol        | Beverages | Groceries      | 10.00
     Wine    | Alcohol        | Beverages | Groceries      | 50.00
             | Juice          | Beverages | Groceries      | 45.00
             | Other          | Beverages | Groceries      | 15.00
             | Beverages      | Food      | Groceries      | 30.00
             | Fruit_Veg      | Food      | Groceries      | 40.00
             | Grains_Cereals | Food      | Groceries      | 30.00
             | Meat_Fish      | Food      | Groceries      | 80.00
             | Fuel           | Auto      | Transportation | 50.00
             | Fuel           | Auto      | Transportation | 25.00
             | Maintenance    | Auto      | Transportation | 30.00
             | Gear           | Bicycle   | Transportation | 20.00
             | Gear           | Bicycle   | Transportation | 40.00
             | Bus            | Fares     | Transportation | 10.00
             | Train          | Fares     | Transportation |  5.00
    (16 rows)




SELECT level1,
       count(cost) AS num_branches,
       sum(cost) AS total_cost 
  FROM all_cats
  GROUP BY level1
  ORDER BY level1;

         level1     | num_branches | total_cost 
    ----------------+--------------+------------
     Groceries      |            9 |     325.00
     Transportation |            7 |     180.00
    (2 rows)




--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
Misa Simic
Date:
Hi Don,

To be honest with you - i dont know - but think it is not...

We use it to solve the problem with hierarchy relations - but it is nowhere "visible" to users in the app...

Our internal rule is to use ids in ltree structure to solve many others problems, actually to easy get, actual category info... From any point...

So if needed, it is easy from 1.2.3.4, get: TOP.Groceries.Food.Herbs & Spices if needed... Each of them are actually category names in the table...

Kind regards,

Misa


On Saturday, February 23, 2013, Don Parris wrote:
Misa,

Is it possible to use spaces in the ltree path, like so:  TOP.Groceries.Food.Herbs & Spices

Or do the elements of the path have to use underscores and dashes?


On Sat, Feb 23, 2013 at 7:19 AM, Misa Simic <misa.simic@gmail.com> wrote:
Hi Don,

Yes, its better to use it in category table...

Bryan, how many levels there will be - we dont know...

With one table - and ltree is solved all cases...

To add new subcategory user just picks the parent category... So it is easy to add chain ring to gear... As category...

In another transaction table is category_id, amount...


Don already posted query for sum... In these case just category and transaction table should be joined  sum amount, group by functions on lpath....(depending what is the goal...)

Kind Regards,

Misa



On Saturday, February 23, 2013, Bryan L Nuse wrote:


This works fine:
test_ltree=> SELECT path, trans_amt FROM testcat;
                  path                   | trans_amt
-----------------------------------------+-----------
 TOP.Transportation.Auto.Fuel            |     50.00
 TOP.Transportation.Auto.Maintenance     |     30.00
 TOP.Transportation.Auto.Fuel            |     25.00
 TOP.Transportation.Bicycle.Gear         |     40.00
 TOP.Transportation.Bicycle.Gear         |     20.00
 TOP.Transportation.Fares.Bus            |     10.00
 TOP.Transportation.Fares.Train          |      5.00
 TOP.Groceries.Food.Beverages            |     30.00
 TOP.Groceries.Food.Fruit_Veggies        |     40.00
 TOP.Groceries.Food.Meat_Fish            |     80.00
 TOP.Groceries.Food.Grains_Cereals       |     30.00
 TOP.Groceries.Beverages.Alcohol.Beer    |     25.00
 TOP.Groceries.Beverages.Alcohol.Spirits |     10.00
 TOP.Groceries.Beverages.Alcohol.Wine    |     50.00
 TOP.Groceries.Beverages.Juice           |     45.00
 TOP.Groceries.Beverages.Other           |     15.00
(16 rows)


So if I want to see:
TOP.Groceries        | 240.00
TOP.Transportation | 180.00



Hello Don,

Perhaps I am missing something about what your constraints are, or what you're trying to achieve, but is there any reason you could not use a series of joined tables indicating parent-child relationships?  The following example follows that in your previous posts.  Note that this approach (as given) will not work if branches stemming from the same node are different lengths.  That is, if you have costs associated with "Transportation.Bicycle.Gear", you could not also have a category "Transportation.Bicycle.Gear.Chain_ring".  (To add the latter category, you'd have to put costs from the former under something like "Transportation.Bicycle.Gear.General" -- or modify the approach.)  However, lengthening the "Alcohol" branches, e.g., by tacking on a level5 table would be easy.  Notice that level3 and level4 are not true look-up tables, since they may contain duplicate cat values.

If I'm off base, by all means specify just how.

Regards,
Bryan

--------------------------------------------------

CREATE TABLE level1 (
  cat   text  PRIMARY KEY
);

CREATE TABLE level2 (
   cat   text   PRIMARY KEY,
   parent   text   REFERENCES level1(cat)
);

--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Summing & Grouping in a Hierarchical Structure

From
Ben Morrow
Date:
Quoth parrisdc@gmail.com (Don Parris):
> 
> Is it possible to use spaces in the ltree path, like so:
> TOP.Groceries.Food.Herbs & Spices
> 
> Or do the elements of the path have to use underscores and dashes?

From the docs:

|  A label is a sequence of alphanumeric characters and underscores (for
|  example, in C locale the characters A-Za-z0-9_ are allowed). Labels
|  must be less than 256 bytes long. [...]
| 
| A label path is a sequence of zero or more labels separated by dots,
| for example L1.L2.L3, representing a path from the root of a
| hierarchical tree to a particular node. The length of a label path
| must be less than 65Kb, but keeping it under 2Kb is preferable.

If you need to store non-alphanumeric labels, one answer (as long as
they aren't too long) would be to use URL-encoding, like
   TOP.Groceries.Food.Herbs_20_26_20Spices

Of course, you would need to encode _ as well, and you would need to be
sure the labels weren't going to come out too long. Another alternative
would be to MD5 each label and use (say) the first 10 bytes of that MD5
in hex as the ltree label. (Annoyingly there's only one
non-alphanumeric, so you can't use base64.) If you were going to do that
you would need to consider the possibility of an attacker arranging a
hash collision: I don't know where you're labels come from, so I don't
know if this would be an issue.

Ben




Re: Summing & Grouping in a Hierarchical Structure

From
Don Parris
Date:
Misa & Ben,

Thanks.  I guess I overlooked that in the docs.  I did see the part about the 65KB.  I don't see my labels getting overly long.  I'll be fine with Herbs_Spices.  I make up the labels myself, so it's really no big deal.

Alexander & Bryan - Many thanks for the help!



On Sat, Feb 23, 2013 at 3:13 PM, Ben Morrow <ben@morrow.me.uk> wrote:
Quoth parrisdc@gmail.com (Don Parris):
>
> Is it possible to use spaces in the ltree path, like so:
> TOP.Groceries.Food.Herbs & Spices
>
> Or do the elements of the path have to use underscores and dashes?

From the docs:

|  A label is a sequence of alphanumeric characters and underscores (for
|  example, in C locale the characters A-Za-z0-9_ are allowed). Labels
|  must be less than 256 bytes long. [...]
|
| A label path is a sequence of zero or more labels separated by dots,
| for example L1.L2.L3, representing a path from the root of a
| hierarchical tree to a particular node. The length of a label path
| must be less than 65Kb, but keeping it under 2Kb is preferable.

If you need to store non-alphanumeric labels, one answer (as long as
they aren't too long) would be to use URL-encoding, like

    TOP.Groceries.Food.Herbs_20_26_20Spices

Of course, you would need to encode _ as well, and you would need to be
sure the labels weren't going to come out too long. Another alternative
would be to MD5 each label and use (say) the first 10 bytes of that MD5
in hex as the ltree label. (Annoyingly there's only one
non-alphanumeric, so you can't use base64.) If you were going to do that
you would need to consider the possibility of an attacker arranging a
hash collision: I don't know where you're labels come from, so I don't
know if this would be an issue.

Ben




--
D.C. Parris, FMP, Linux+, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE