Thread: Stuck Up In My Category Tree

Stuck Up In My Category Tree

From
Don Parris
Date:
Hi all,

I created a category table like so:
cat_id(serial)  |  cat_name(varchar)  |  parent_id(int)  |  lineage(varchar)  |  deep(int)
1                         root_cat_a                 Null                   1                          1
2                         sub_cat_1                 1                       1-2                       2
3                         sub_sub_cat_a          2                      1-2-3                     3

I use this to categorize transactions, and use the most appropriate subcategory for any given transation item in a table called trans_details.  I can easily show transaction amounts by sub-category  (SELECT cat_name, sum(amount) FROM category, trans_details WHERE category_cat_id = trans_details.cat_id):

cat_name                             |    amount
Transportation: Auto: Fuel     |      $100
Transportation: Auto: Maint    |      $150
Transportation: Fares: Bus    |      $40

but what I cannot figure out is how to create a summary where I show
cat_name                     |    amount
Transportation: Auto:     |      $250

or, what I *really* want:
cat_name            |    amount
Transportation     |      $290


Can anyone help me work through this?  Frankly, I'm not even sure where to begin to solve the problem.  I have been trying the WITH RECURSIVE feature, but I do not understand how to apply it in my case.  The example query I have brings up an empty result set and I don't have a clue how I could modify it to make it work.  I'm not even sure it's the best method, but it seems like a reasonable approach.

I found the below e-mail from a thread on this kind of topic (apparently in 2007, I only copied the relevant text).  The author created a very similar table to mine, but talks more about how to select the child categories, not the root.  I just don't yet have the experience to understand this well yet.  :-)

[begin copied e-mail text]
If the data is static (or if you can get away with running a cron job
every now and then), you can write a recursive pl/pgslq function to
get level information for each node on the tree and assign a specific
"incremental" node_id for each record. Due to the nature of the
recursive function, a node_id is assigned to the children of a
specific node instead of its siblings. You should end up with data as
illustrated below.

id      info    parent_id level node_id
1       Name1   Null    1       1
2       Name2   1       2       2
3       Name3   2       3       3
4      Name4    3       4       4
5      Name5    4       5       5
6      Name5    1       2       6
7      Name6    6       3       7
8      Name7    1       2       8

Then you can simply retrieve the children of node (N) on level (L)
with a single statement.

SELECT * FROM table WHERE node_id > N AND node_id < (SELECT node_id
FROM table WHERE level = L AND node_id > N ORDER BY node_id LIMIT 1
OFFSET 0);

Refrain from using MIN() as performance suffers unbelievably.
[end copied e-mail text]

Thanks in advance!
--
D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Stuck Up In My Category Tree

From
Chetan Suttraway
Date:


On Sat, Aug 6, 2011 at 12:10 AM, Don Parris <parrisdc@gmail.com> wrote:
Hi all,

I created a category table like so:
cat_id(serial)  |  cat_name(varchar)  |  parent_id(int)  |  lineage(varchar)  |  deep(int)
1                         root_cat_a                 Null                   1                          1
2                         sub_cat_1                 1                       1-2                       2
3                         sub_sub_cat_a          2                      1-2-3                     3

I use this to categorize transactions, and use the most appropriate subcategory for any given transation item in a table called trans_details.  I can easily show transaction amounts by sub-category  (SELECT cat_name, sum(amount) FROM category, trans_details WHERE category_cat_id = trans_details.cat_id):

cat_name                             |    amount
Transportation: Auto: Fuel     |      $100
Transportation: Auto: Maint    |      $150
Transportation: Fares: Bus    |      $40

but what I cannot figure out is how to create a summary where I show
cat_name                     |    amount
Transportation: Auto:     |      $250

or, what I *really* want:
cat_name            |    amount
Transportation     |      $290


Can anyone help me work through this?  Frankly, I'm not even sure where to begin to solve the problem.  I have been trying the WITH RECURSIVE feature, but I do not understand how to apply it in my case.  The example query I have brings up an empty result set and I don't have a clue how I could modify it to make it work.  I'm not even sure it's the best method, but it seems like a reasonable approach.

I found the below e-mail from a thread on this kind of topic (apparently in 2007, I only copied the relevant text).  The author created a very similar table to mine, but talks more about how to select the child categories, not the root.  I just don't yet have the experience to understand this well yet.  :-)

[begin copied e-mail text]
If the data is static (or if you can get away with running a cron job
every now and then), you can write a recursive pl/pgslq function to
get level information for each node on the tree and assign a specific
"incremental" node_id for each record. Due to the nature of the
recursive function, a node_id is assigned to the children of a
specific node instead of its siblings. You should end up with data as
illustrated below.

id      info    parent_id level node_id
1       Name1   Null    1       1
2       Name2   1       2       2
3       Name3   2       3       3
4      Name4    3       4       4
5      Name5    4       5       5
6      Name5    1       2       6
7      Name6    6       3       7
8      Name7    1       2       8

Then you can simply retrieve the children of node (N) on level (L)
with a single statement.

SELECT * FROM table WHERE node_id > N AND node_id < (SELECT node_id
FROM table WHERE level = L AND node_id > N ORDER BY node_id LIMIT 1
OFFSET 0);

Refrain from using MIN() as performance suffers unbelievably.
[end copied e-mail text]

Thanks in advance!
--
D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE


Could you please post a self contained test case here?

Regards,
Chetan

--
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

EnterpriseDB Blog : http://blogs.enterprisedb.com




Re: Stuck Up In My Category Tree

From
Don Parris
Date:
On Wed, Aug 17, 2011 at 05:24, Chetan Suttraway <chetan.suttraway@enterprisedb.com> wrote:

Could you please post a self contained test case here?

Regards,
Chetan


Hi Chetan,

Thanks.  I will post something first chance I get - could be tomorrow morning though.

I've not made much progress to date, so still need the help.  :-(

Regards,
Don
--
D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Stuck Up In My Category Tree

From
Don Parris
Date:
On Wed, Aug 17, 2011 at 16:20, Don Parris <parrisdc@gmail.com> wrote:
On Wed, Aug 17, 2011 at 05:24, Chetan Suttraway <chetan.suttraway@enterprisedb.com> wrote:

Could you please post a self contained test case here?

Regards,
Chetan


Hi Chetan,

Thanks.  I will post something first chance I get - could be tomorrow morning though.

I've not made much progress to date, so still need the help.  :-(


Ok, realized I have a subset of test data with me...  The following creates the 2 tables required for my desired query.

CREATE TABLE category(
category_id serial NOT NULL,
category_name varchar(60),
parent_id int,
lineage varchar(12),
deep int
);

CREATE TABLE transdetails(
transdetails_id serial NOT NULL,
transaction_id int, -- refers to transaction table which is not at all part of the query result I want
category_id int,
transdetails_item varchar(50),
transdetails_amount numeric(7, 2)
);

The data sets below include enough data to give one a good idea of the working data.  Transactions are categorized at the lowest possible level.  Some are categorized at the root level, while others are categorized at levels 2 or 3.

Thus summing transactions by category is actually a very simple task, simply matching: 
SELECT categoy_name, sum(transdetails_amount) 
FROM category, transdetails
WHERE category.category_id = transdetails

But try showing these details to someone who hates numbers.  I want to summarize, so that I only see the top-level category, regardless of how deeply a given transaction is categorized... something along the lines of:
SHOW ME only the top categories and the sums of all their subcategories.


[The subset of TABLE category data]
"category_id","category_name","parent_id","lineage","deep"
"category_id","category_name","parent_id","lineage","deep"
1,"Income",,"1",1
2,"Tax",,"2",1
3,"Groceries",,"3",1
4,"Entertainment",,"4",1
5,"Clothing",,"5",1
6,"Gross Salary",1,"1-6",2
7,"Federal Tax",2,"2-7",2
8,"Social Security",2,"2-8",2
9,"Medicare",2,"2-9",2
10,"State",2,"2-10",2
11,"Sales Tax",2,"2-11",2
12,"General Groceries",3,"3-12",2
13,"Food",3,"3-13",2
14,"Grains & Cereals",13,"3-13-14",3
15,"Fruits & Veggies",13,"3-13-15",3
16,"Fish & Meats",13,"3-13-16",3
17,"Snacks & Beverages",13,"3-13-17",
18,"Dairy",13,"3-13-18",3
19,"Beverages (alcohol)",3,"3-19",2
20,"Books",4,"4-20",2
21,"Music",4,"4-21",2
22,"Dining",4,"4-22",2
23,"Discounts",,"23",1
24,"Credit Card",,"24",1
25,"Insurance",,"25-26",1
26,"Car",25,"25-26",2
27,"Medical",25,"25-27",2

[The subset of TABLE transdetails data]
"transdetails_id","transaction_id","category_id","transdetails_item","transdetails_amount"
1,1,22,"Lunch",-6.92
2,2,22,"Dinner",-45.43
3,3,12,"Groceries",-24.04
4,4,12,"Groceries",-26.02
5,5,5,"Baby Clothes",-14.99
6,5,5,"Baby Clothes",-14.99
7,5,11,"Sales Tax (7.25%)",-2.17
8,6,16,"Chicken Drumsticks",-8.56
9,6,23,"SuperMarket Discount",3.1
10,6,11,"Sales Tax (2.00%)",-0.11
11,7,17,"Vanilla Wafers",-4.19
12,7,72,"Brown Sugar",-1.77
13,7,23,"SuperMarket Discount",0.3
14,7,15,"Pineapple",-1.49
15,7,15,"Yellow Bananas",-2.26
16,7,72,"Whipped Cream",-1.69
17,7,18,"2% Milk",-2.69
18,7,11,"Sales Tax (2.00%)",-0.28
19,8,20,"Grep Pocket Reference",-8.65
20,9,24,"Credit Card Payment",-50.00
21,10,6,"Gross Salary",1200.00
22,10,7,"Federal Income Tax",75.00
23,10,8,"Social Security",15.00
24,10,9,"Medicare",10.00
25,10,10,"State Income Tax",30.00
26,10,27,"Insurance",70.00
[End of DATA]

My actual category table has @ 135 rows and could grow a bit more, but is mostly stable.  I don't really foresee a need to go deeper, but that is a possibility.  I do seem to encounter the need to add a new category every so often.  I can also see where I might need to manage this issue in other areas.

Most of the tutorials I have seen show techniques for creating web crumbs for forums, etc.  Their focus is on showing a root, plus all its branch categories.  Maybe I've overlooked something that explains how to achieve this:
SHOW ME all top_categories AND count(threads_in_all_subcategories), which would be similar.  Thus a person could see how many total threads were in a given forum, with its sub-forums and sub-sub-forums.  And maybe there are threads at each level.

I have a (fairly dated) book on database design that effectively doesn't even address the issue at all, and "Beginning Databases with PostgreSQL" (Matthew/Stones) only barely skims the issue.

Regards,
Don
--
D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Stuck Up In My Category Tree

From
Don Parris
Date:
On Wed, Aug 17, 2011 at 19:31, Don Parris <parrisdc@gmail.com> wrote:
On Wed, Aug 17, 2011 at 16:20, Don Parris <parrisdc@gmail.com> wrote:
On Wed, Aug 17, 2011 at 05:24, Chetan Suttraway <chetan.suttraway@enterprisedb.com> wrote:

Could you please post a self contained test case here?

Regards,
Chetan


Hi Chetan,

Thanks.  I will post something first chance I get - could be tomorrow morning though.

I've not made much progress to date, so still need the help.  :-(
 
I now have a work-around.  I have a view called amntby_cat that contains the sums by category, which I pull into a LibreOffice Calc sheet.  I then use a second sheet with a list of the top-level categories, and a series of formulas to sum all the sub-categories for each top-level category,

I am now curious as to how business accounting databases approach transaction categories?  If I am selling widgets to all 7 continents (as i learned them), I might have sales as follows:

Antartica - expeditioners love our widgets!
Europe
  Germany
    Berlin, Mainz
  England
    London
    Brighton
North America
  USA
    New York
    Atlanta
    LA
  Mexico
    Mexico City
etc... ad nauseum.

But I may only want to see sales by continent.

Or more closely related:
Sales
  Advertising
  Communications
Operations
  parts
  labor
  Equipment

But I May only want to see the figures summed up at the top-level. 

How is this issue approached in other cases?  Have I taken a bad approach to my category structure?  It appears to be a common approach to categories in general, but is there a solution that better fits my case?

Thanks,
Don
--
D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE

Re: Stuck Up In My Category Tree

From
Brent Dombrowski
Date:
On Aug 23, 2011, at 11:17 AM, Don Parris wrote:

> On Wed, Aug 17, 2011 at 19:31, Don Parris <parrisdc@gmail.com> wrote:
>> I now have a work-around.  I have a view called amntby_cat that contains
> the sums by category, which I pull into a LibreOffice Calc sheet.  I then
> use a second sheet with a list of the top-level categories, and a series of
> formulas to sum all the sub-categories for each top-level category,
>
> I am now curious as to how business accounting databases approach
> transaction categories?  If I am selling widgets to all 7 continents (as i
> learned them), I might have sales as follows:

An approach I've seen before is to encode things into a department id. All North American departments start with 1,
etc.The rollups are then done by truncating the id down to the granularity required. At least this was the approach my
previousemployer took. They were more interested in particular customers, so the encoding was based on that. This was
ona relatively small scale basis (~500 employees and few enough customers to count by hand). I'm not sure how well that
wouldscale. It definitely takes some planning and forethought. 

Brent D.