From: "David Johnston" <polobo@yahoo.com> To: pgsql-sql@postgresql.org Sent: Thursday, February 27, 2014 4:09:34 PM Subject: Re: [SQL] how to effectively SELECT new "customers"
Jan Ostrochovsky wrote > Hello, I am solving following task and it seems hard to me to find > effective solution. Maybe somebody knows how to help me: We have table > "purchases" and each one record is identified by "customer_id". We want to > know not only how many different customers did at least one purchase per > time period, grouped by time periods (easy task: "COUNT(DISTINCT > customer_id)" with "GROUP BY period"), but also to know how many NEW > customers there were. We define new customer as customer_id, which had > first record in table "purchases" after 12 month of inactivity (no record > in table "purchases" previous 12 months). I have found one solution, but > it is very slow and ugly. I tried several other concepts, but without > success. Any hint could be helpful for me. Thanks in advance! Jano
Without incorporating additional meta-data about the purchases onto the customer table the most basic solution would be:
SELECT DISTINCT customer_id FROM products WHERE date > (now() - '12 months'::interval) EXCEPT SELECT DISTINCT customer_id FROM products WHERE date <= (now() - '12 months'::interval)
---
Another solution: WHERE ... >12 AND NOT EXISTS (SELECT ... WHERE <= 12)
---
Depending on the frequency that you need to run this query it may be worthwhile to create a materialized view that captures the necessary data and then whenever a new sale is generated you simply update that view by changing the attributes of that single customer. At any point you can quickly determine, using the view, which customers were active at the time of last purchases and which ones were dormant or non-existent.