Thread: Stuck Up In My Category Tree
Hi all,
I created a category table like so:
cat_id(serial) | cat_name(varchar) | parent_id(int) | lineage(varchar) | deep(int)
1 root_cat_a Null 1 1
2 sub_cat_1 1 1-2 2
3 sub_sub_cat_a 2 1-2-3 3
I use this to categorize transactions, and use the most appropriate subcategory for any given transation item in a table called trans_details. I can easily show transaction amounts by sub-category (SELECT cat_name, sum(amount) FROM category, trans_details WHERE category_cat_id = trans_details.cat_id):
cat_name | amount
Transportation: Auto: Fuel | $100
Transportation: Auto: Maint | $150
Transportation: Fares: Bus | $40
but what I cannot figure out is how to create a summary where I show
cat_name | amount
Transportation: Auto: | $250
or, what I *really* want:
cat_name | amount
Transportation | $290
Can anyone help me work through this? Frankly, I'm not even sure where to begin to solve the problem. I have been trying the WITH RECURSIVE feature, but I do not understand how to apply it in my case. The example query I have brings up an empty result set and I don't have a clue how I could modify it to make it work. I'm not even sure it's the best method, but it seems like a reasonable approach.
I found the below e-mail from a thread on this kind of topic (apparently in 2007, I only copied the relevant text). The author created a very similar table to mine, but talks more about how to select the child categories, not the root. I just don't yet have the experience to understand this well yet. :-)
[begin copied e-mail text]
If the data is static (or if you can get away with running a cron job
every now and then), you can write a recursive pl/pgslq function to
get level information for each node on the tree and assign a specific
"incremental" node_id for each record. Due to the nature of the
recursive function, a node_id is assigned to the children of a
specific node instead of its siblings. You should end up with data as
illustrated below.
id info parent_id level node_id
1 Name1 Null 1 1
2 Name2 1 2 2
3 Name3 2 3 3
4 Name4 3 4 4
5 Name5 4 5 5
6 Name5 1 2 6
7 Name6 6 3 7
8 Name7 1 2 8
Then you can simply retrieve the children of node (N) on level (L)
with a single statement.
SELECT * FROM table WHERE node_id > N AND node_id < (SELECT node_id
FROM table WHERE level = L AND node_id > N ORDER BY node_id LIMIT 1
OFFSET 0);
Refrain from using MIN() as performance suffers unbelievably.
[end copied e-mail text]
Thanks in advance!
--
D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
I created a category table like so:
cat_id(serial) | cat_name(varchar) | parent_id(int) | lineage(varchar) | deep(int)
1 root_cat_a Null 1 1
2 sub_cat_1 1 1-2 2
3 sub_sub_cat_a 2 1-2-3 3
I use this to categorize transactions, and use the most appropriate subcategory for any given transation item in a table called trans_details. I can easily show transaction amounts by sub-category (SELECT cat_name, sum(amount) FROM category, trans_details WHERE category_cat_id = trans_details.cat_id):
cat_name | amount
Transportation: Auto: Fuel | $100
Transportation: Auto: Maint | $150
Transportation: Fares: Bus | $40
but what I cannot figure out is how to create a summary where I show
cat_name | amount
Transportation: Auto: | $250
or, what I *really* want:
cat_name | amount
Transportation | $290
Can anyone help me work through this? Frankly, I'm not even sure where to begin to solve the problem. I have been trying the WITH RECURSIVE feature, but I do not understand how to apply it in my case. The example query I have brings up an empty result set and I don't have a clue how I could modify it to make it work. I'm not even sure it's the best method, but it seems like a reasonable approach.
I found the below e-mail from a thread on this kind of topic (apparently in 2007, I only copied the relevant text). The author created a very similar table to mine, but talks more about how to select the child categories, not the root. I just don't yet have the experience to understand this well yet. :-)
[begin copied e-mail text]
If the data is static (or if you can get away with running a cron job
every now and then), you can write a recursive pl/pgslq function to
get level information for each node on the tree and assign a specific
"incremental" node_id for each record. Due to the nature of the
recursive function, a node_id is assigned to the children of a
specific node instead of its siblings. You should end up with data as
illustrated below.
id info parent_id level node_id
1 Name1 Null 1 1
2 Name2 1 2 2
3 Name3 2 3 3
4 Name4 3 4 4
5 Name5 4 5 5
6 Name5 1 2 6
7 Name6 6 3 7
8 Name7 1 2 8
Then you can simply retrieve the children of node (N) on level (L)
with a single statement.
SELECT * FROM table WHERE node_id > N AND node_id < (SELECT node_id
FROM table WHERE level = L AND node_id > N ORDER BY node_id LIMIT 1
OFFSET 0);
Refrain from using MIN() as performance suffers unbelievably.
[end copied e-mail text]
Thanks in advance!
--
D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE
On Sat, Aug 6, 2011 at 12:10 AM, Don Parris <parrisdc@gmail.com> wrote:
Hi all,
I created a category table like so:
cat_id(serial) | cat_name(varchar) | parent_id(int) | lineage(varchar) | deep(int)
1 root_cat_a Null 1 1
2 sub_cat_1 1 1-2 2
3 sub_sub_cat_a 2 1-2-3 3
I use this to categorize transactions, and use the most appropriate subcategory for any given transation item in a table called trans_details. I can easily show transaction amounts by sub-category (SELECT cat_name, sum(amount) FROM category, trans_details WHERE category_cat_id = trans_details.cat_id):
cat_name | amount
Transportation: Auto: Fuel | $100
Transportation: Auto: Maint | $150
Transportation: Fares: Bus | $40
but what I cannot figure out is how to create a summary where I show
cat_name | amount
Transportation: Auto: | $250
or, what I *really* want:
cat_name | amount
Transportation | $290
Can anyone help me work through this? Frankly, I'm not even sure where to begin to solve the problem. I have been trying the WITH RECURSIVE feature, but I do not understand how to apply it in my case. The example query I have brings up an empty result set and I don't have a clue how I could modify it to make it work. I'm not even sure it's the best method, but it seems like a reasonable approach.
I found the below e-mail from a thread on this kind of topic (apparently in 2007, I only copied the relevant text). The author created a very similar table to mine, but talks more about how to select the child categories, not the root. I just don't yet have the experience to understand this well yet. :-)
[begin copied e-mail text]
If the data is static (or if you can get away with running a cron job
every now and then), you can write a recursive pl/pgslq function to
get level information for each node on the tree and assign a specific
"incremental" node_id for each record. Due to the nature of the
recursive function, a node_id is assigned to the children of a
specific node instead of its siblings. You should end up with data as
illustrated below.
id info parent_id level node_id
1 Name1 Null 1 1
2 Name2 1 2 2
3 Name3 2 3 3
4 Name4 3 4 4
5 Name5 4 5 5
6 Name5 1 2 6
7 Name6 6 3 7
8 Name7 1 2 8
Then you can simply retrieve the children of node (N) on level (L)
with a single statement.
SELECT * FROM table WHERE node_id > N AND node_id < (SELECT node_id
FROM table WHERE level = L AND node_id > N ORDER BY node_id LIMIT 1
OFFSET 0);
Refrain from using MIN() as performance suffers unbelievably.
[end copied e-mail text]
Thanks in advance!
--
D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software AdvocateGPG Key ID: F5E179BE
Could you please post a self contained test case here?
Regards,
Chetan
--
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
The Enterprise PostgreSQL Company
Website: www.enterprisedb.com
EnterpriseDB Blog : http://blogs.enterprisedb.com
Follow us on Twitter : http://www.twitter.com/enterprisedb
On Wed, Aug 17, 2011 at 05:24, Chetan Suttraway <chetan.suttraway@enterprisedb.com> wrote:
Hi Chetan,Could you please post a self contained test case here?
Regards,
Chetan
Thanks. I will post something first chance I get - could be tomorrow morning though.
I've not made much progress to date, so still need the help. :-(
D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
Regards,
Don
-- D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE
On Wed, Aug 17, 2011 at 16:20, Don Parris <parrisdc@gmail.com> wrote:
Hi Chetan,On Wed, Aug 17, 2011 at 05:24, Chetan Suttraway <chetan.suttraway@enterprisedb.com> wrote:Could you please post a self contained test case here?
Regards,
ChetanThanks. I will post something first chance I get - could be tomorrow morning though.I've not made much progress to date, so still need the help. :-(
Ok, realized I have a subset of test data with me... The following creates the 2 tables required for my desired query.
CREATE TABLE category(
category_id serial NOT NULL,
category_name varchar(60),
parent_id int,
lineage varchar(12),
deep int
);
CREATE TABLE transdetails(
transdetails_id serial NOT NULL,
transaction_id int, -- refers to transaction table which is not at all part of the query result I want
category_id int,
transdetails_item varchar(50),
transdetails_amount numeric(7, 2)
);
The data sets below include enough data to give one a good idea of the working data. Transactions are categorized at the lowest possible level. Some are categorized at the root level, while others are categorized at levels 2 or 3.
Thus summing transactions by category is actually a very simple task, simply matching:
SELECT categoy_name, sum(transdetails_amount)
FROM category, transdetails
WHERE category.category_id = transdetails
But try showing these details to someone who hates numbers. I want to summarize, so that I only see the top-level category, regardless of how deeply a given transaction is categorized... something along the lines of:
SHOW ME only the top categories and the sums of all their subcategories.
[The subset of TABLE category data]
"category_id","category_name","parent_id","lineage","deep"
"category_id","category_name","parent_id","lineage","deep"
1,"Income",,"1",1
2,"Tax",,"2",1
3,"Groceries",,"3",1
4,"Entertainment",,"4",1
5,"Clothing",,"5",1
6,"Gross Salary",1,"1-6",2
7,"Federal Tax",2,"2-7",2
8,"Social Security",2,"2-8",2
9,"Medicare",2,"2-9",2
10,"State",2,"2-10",2
11,"Sales Tax",2,"2-11",2
12,"General Groceries",3,"3-12",2
13,"Food",3,"3-13",2
14,"Grains & Cereals",13,"3-13-14",3
15,"Fruits & Veggies",13,"3-13-15",3
16,"Fish & Meats",13,"3-13-16",3
17,"Snacks & Beverages",13,"3-13-17",
18,"Dairy",13,"3-13-18",3
19,"Beverages (alcohol)",3,"3-19",2
20,"Books",4,"4-20",2
21,"Music",4,"4-21",2
22,"Dining",4,"4-22",2
23,"Discounts",,"23",1
24,"Credit Card",,"24",1
25,"Insurance",,"25-26",1
26,"Car",25,"25-26",2
27,"Medical",25,"25-27",2
[The subset of TABLE transdetails data]
"transdetails_id","transaction_id","category_id","transdetails_item","transdetails_amount"
1,1,22,"Lunch",-6.92
2,2,22,"Dinner",-45.43
3,3,12,"Groceries",-24.04
4,4,12,"Groceries",-26.02
5,5,5,"Baby Clothes",-14.99
6,5,5,"Baby Clothes",-14.99
7,5,11,"Sales Tax (7.25%)",-2.17
8,6,16,"Chicken Drumsticks",-8.56
9,6,23,"SuperMarket Discount",3.1
10,6,11,"Sales Tax (2.00%)",-0.11
11,7,17,"Vanilla Wafers",-4.19
12,7,72,"Brown Sugar",-1.77
13,7,23,"SuperMarket Discount",0.3
14,7,15,"Pineapple",-1.49
15,7,15,"Yellow Bananas",-2.26
16,7,72,"Whipped Cream",-1.69
17,7,18,"2% Milk",-2.69
18,7,11,"Sales Tax (2.00%)",-0.28
19,8,20,"Grep Pocket Reference",-8.65
20,9,24,"Credit Card Payment",-50.00
21,10,6,"Gross Salary",1200.00
22,10,7,"Federal Income Tax",75.00
23,10,8,"Social Security",15.00
24,10,9,"Medicare",10.00
25,10,10,"State Income Tax",30.00
26,10,27,"Insurance",70.00
[End of DATA]
My actual category table has @ 135 rows and could grow a bit more, but is mostly stable. I don't really foresee a need to go deeper, but that is a possibility. I do seem to encounter the need to add a new category every so often. I can also see where I might need to manage this issue in other areas.
Most of the tutorials I have seen show techniques for creating web crumbs for forums, etc. Their focus is on showing a root, plus all its branch categories. Maybe I've overlooked something that explains how to achieve this:
SHOW ME all top_categories AND count(threads_in_all_subcategories), which would be similar. Thus a person could see how many total threads were in a given forum, with its sub-forums and sub-sub-forums. And maybe there are threads at each level.
I have a (fairly dated) book on database design that effectively doesn't even address the issue at all, and "Beginning Databases with PostgreSQL" (Matthew/Stones) only barely skims the issue.
Regards,
Don
-- D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE
On Wed, Aug 17, 2011 at 19:31, Don Parris <parrisdc@gmail.com> wrote:
I now have a work-around. I have a view called amntby_cat that contains the sums by category, which I pull into a LibreOffice Calc sheet. I then use a second sheet with a list of the top-level categories, and a series of formulas to sum all the sub-categories for each top-level category,On Wed, Aug 17, 2011 at 16:20, Don Parris <parrisdc@gmail.com> wrote:Hi Chetan,On Wed, Aug 17, 2011 at 05:24, Chetan Suttraway <chetan.suttraway@enterprisedb.com> wrote:Could you please post a self contained test case here?
Regards,
ChetanThanks. I will post something first chance I get - could be tomorrow morning though.I've not made much progress to date, so still need the help. :-(
I am now curious as to how business accounting databases approach transaction categories? If I am selling widgets to all 7 continents (as i learned them), I might have sales as follows:
Antartica - expeditioners love our widgets!
Europe
Germany
Berlin, Mainz
England
London
Brighton
North America
USA
New York
Atlanta
LA
Mexico
Mexico City
etc... ad nauseum.
But I may only want to see sales by continent.
Or more closely related:
Sales
Advertising
Communications
Operations
parts
labor
Equipment
But I May only want to see the figures summed up at the top-level.
How is this issue approached in other cases? Have I taken a bad approach to my category structure? It appears to be a common approach to categories in general, but is there a solution that better fits my case?
Thanks,
Don
--
D.C. Parris, FMP, LEED AP O+M, ESL Certificate
Minister, Security/FM Coordinator, Free Software Advocate
GPG Key ID: F5E179BE
On Aug 23, 2011, at 11:17 AM, Don Parris wrote: > On Wed, Aug 17, 2011 at 19:31, Don Parris <parrisdc@gmail.com> wrote: >> I now have a work-around. I have a view called amntby_cat that contains > the sums by category, which I pull into a LibreOffice Calc sheet. I then > use a second sheet with a list of the top-level categories, and a series of > formulas to sum all the sub-categories for each top-level category, > > I am now curious as to how business accounting databases approach > transaction categories? If I am selling widgets to all 7 continents (as i > learned them), I might have sales as follows: An approach I've seen before is to encode things into a department id. All North American departments start with 1, etc.The rollups are then done by truncating the id down to the granularity required. At least this was the approach my previousemployer took. They were more interested in particular customers, so the encoding was based on that. This was ona relatively small scale basis (~500 employees and few enough customers to count by hand). I'm not sure how well that wouldscale. It definitely takes some planning and forethought. Brent D.