Thread: SQL (Venn diagram type of logic)
Hello everyone. I have a customer purchase table that keeps track of transaction details. I'm looking sum up total spent by each customer and classify each customer based on which products they bought. Data looks like... account_id, date, product_id, sale_price ----------------------------------------- cust1, 03/21/2005, prod1, 50 cust1, 03/22/2005, prod4, 35 cust1, 05/08/2005, prod2, 50 cust2, 04/21/2005, prod16, 20 cust3, 04/16/2005, prod1, 50 etc...... Setup: I'm picturing a Venn Diagram in my head but I need help with the decision logic to classify each customer. Example: Category A: Prod1, Prod2 Category B: Prod3, Prod4 Category C: All products Not in Class A or Class B -A customer who has bought Prod1, Prod2 would be in the A only category. -A customer who has bought Prod1, Prod3 would be in the Class A&B category -A customer who has bought Prod18 would be in the C category -A customer who has bought Prod4, Prod16 would be in the B&C category -A customer who has bought Prod1, Prod4, Prod15 would be in the A&B&C category -etc... Then for each comination of categories (7 in total?), i will need of number of accounts in that category and total spent by those accounts. Any help or direction would be greatly appreciated. Thank you in advance. -Vince __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
You can start by creating 3 views for your 3 categories:
CREATE OR REPLACE VIEW cat_a (
account_id,
sales_cat_a
) AS
SELECT
account_id,
sum(sale_price) as sales_cat_a
FROM
sales -- Assuming sales is the table with the data
WHERE
product_id in ('prod1', 'prod2')
GROUP BY
account_id
Then do the same with a cat_b and cat_c view.
This will give you subtotals for each category.
As for your Venn Diagram, these views may be helpful or may not. To determine what to do, please provide a little more information:
1. What do you want your final result to look like - what columns?
2. Will each row returned represent one account, one of the 7 sections of the diagram or a combination of these two?
3. When you say total amount do you total amount spent in that section of the diagram or the total amount spent by that person?
CREATE OR REPLACE VIEW cat_a (
account_id,
sales_cat_a
) AS
SELECT
account_id,
sum(sale_price) as sales_cat_a
FROM
sales -- Assuming sales is the table with the data
WHERE
product_id in ('prod1', 'prod2')
GROUP BY
account_id
Then do the same with a cat_b and cat_c view.
This will give you subtotals for each category.
As for your Venn Diagram, these views may be helpful or may not. To determine what to do, please provide a little more information:
1. What do you want your final result to look like - what columns?
2. Will each row returned represent one account, one of the 7 sections of the diagram or a combination of these two?
3. When you say total amount do you total amount spent in that section of the diagram or the total amount spent by that person?
On 6/29/06, Vinnie Ma <nyubambam@yahoo.com> wrote:
Hello everyone.
I have a customer purchase table that keeps track of
transaction details. I'm looking sum up total spent
by each customer and classify each customer based on
which products they bought.
Data looks like...
account_id, date, product_id, sale_price
-----------------------------------------
cust1, 03/21/2005, prod1, 50
cust1, 03/22/2005, prod4, 35
cust1, 05/08/2005, prod2, 50
cust2, 04/21/2005, prod16, 20
cust3, 04/16/2005, prod1, 50
etc......
Setup:
I'm picturing a Venn Diagram in my head but I need
help with the decision logic to classify each
customer.
Example:
Category A: Prod1, Prod2
Category B: Prod3, Prod4
Category C: All products Not in Class A or Class B
-A customer who has bought Prod1, Prod2 would be in
the A only category.
-A customer who has bought Prod1, Prod3 would be in
the Class A&B category
-A customer who has bought Prod18 would be in the C
category
-A customer who has bought Prod4, Prod16 would be in
the B&C category
-A customer who has bought Prod1, Prod4, Prod15 would
be in the A&B&C category
-etc...
Then for each comination of categories (7 in total?),
i will need of number of accounts in that category and
total spent by those accounts.
Any help or direction would be greatly appreciated.
Thank you in advance.
-Vince
On 7/11/06, Vinnie Ma <nyubambam@yahoo.com> wrote:
Just doing a reply to all should send it to the mailing list but the from address must match the email address you registered on the list.
What I have done for situations like this is to first create a report schema. Then I have a daily process run that updates reporting tables every night. These reporting tables keep full history and we only update the latest information.
The main reason for doing it this way is to:
Hope this helps!
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
==================================================================
Hello Aaron,
Thank you for the help. My apologies for the direct
email but i couldn't figure out how to reply to the
thread on the postresql site.
Just doing a reply to all should send it to the mailing list but the from address must match the email address you registered on the list.
Once i get the purchases seperated by categories, I
plan to track the month on month status of each
account.
For example, in may 2006, based on her previous
purchaes, customer1 was in category A but in june
2006, she bought something in category B. This would
make her in bucket 2.
bucket 1: A
bucket 2: A&B
bucket 3: A&C
bucket 4: B
bucket 5: B&C
bucket 6: C
bucket 7: A&B&C
(these are the combinations i was thinking about when
i had the venn diagram in my head.)
To follow with the example above:
In May, she would be counted as part of bucket 1 and
her total spending (life to may) will be part of
running total for bucket 1
In June, she would be counted as part of bucket 2 and
her total spending (life to june) will be part of
running total for bucket 2
In the end, we will have count(customers),
sum(all_purchases) for each of the 7 buckets for each
month. From a business perpective, we hope this
exercise will show the general migration patterns of
customers from one bucket to another over time.
I figured out a way to do it but it is no where near
efficient.
i added three boolean fields to the account table to
indicate catA, catB, and catC. i update the table 3
times for each of the categories, turning on the
boolean fields where applicable.
then i select the count, and sum fields from an inner
join on account and purchase tables for each of the 7
buckets. it works, but i would manually do it for
each month.
any thoughts would be most appreciated. thanks and
please have a great day.
What I have done for situations like this is to first create a report schema. Then I have a daily process run that updates reporting tables every night. These reporting tables keep full history and we only update the latest information.
The main reason for doing it this way is to:
- Remove the need to worry about performance of the query to build the report - it is done once a day during low activity on the server
- Segment permissions so people have rights to run reports but not dig through the main database (the tech savy analysts tend to write bad queries and run them on production)
- Keep history without having to run the report on everything (my refreshes update this month and last month - this provides enough overlap that I don't have to worry about changing months, leap year, etc.)
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
==================================================================