Thread: Design / Implementation problem
This is possibly not a DB only problem, the solution may involve application logic as well. But PG users are the smartest bunch I know. Ass kissing aside, here are the details: *** The Scenario *** We are running a customer loyalty program whereby customers earn points for purchasing products. Each product has a value of points that are earned by purchasing it, and a value of points required to redeem it. In order to prevent customers from stockpiling points, we want to place an expiry date on points so that unused points expire and are lost if they are not redeemed within a certain period of time. This will be calculated on a FIFO basis, I.e., the oldest points will expire first. We will assume the expiry period is 12 months. *** The Problem *** Ascertaining which points to expire is fairly conceptually simple. At any given point in time, the points expired is simply the balance on hand at the start of the period, less redemptions in that period. If the redemptions is less than the balance at open, not all points that were available on that date were used, and the difference is the expiry. This can be done periodically, say, at the start of every month. However there are a few problems with doing it periodically 1. The runs are likely to be too large to be manageable. A DB with tens of thousands of customers and many hundreds of thousands or even millions of sales in the records tables will require several queries and some application calculation to compute. If it takes 2 seconds to compute each balance of a 20,000 strong customer base, that's over 11 hours of heavy lifting in the DB, which will likely result in severely degraded performance during those hours. This problem can only get worse as time goes on, and hardware upgrade requirements just to accommodate a 12 hour window once a month is the sign of an app not designed to scale well. 2. Calculating the balance on the fly would be more effective, as it is unlikley that many customers will check their balance on a regular basis. It is likely that a small fraction of customers will check their balance in a given month, meaning that calculating it on the fly would both spread the load over time as well as reduce the total load, even if on the fly calculation results in significantly higher per-customer calculation time. 3. The app is a web app, and it would be preferable to contain business logic within the database itself or the current app codebase. Spreading application logic into an external mechanism such as cron or an external daemon would be undesirable unless there was no other way. *** A Possible Solution *** Calculating the balance on the fly can be done easily if it is done at the time the customer seeks to redeem a voucher. Expired points are only relevant at these times, and so the expired points would be calculated with an application function that did the following: 1. Get the balance as it was 12 months ago by getting total points earned less redemptions and expiries up to that date. 2. Subtract from it redemptions and expiries since then. The value obtained, if it is positive, is the value of points to expire. 3. Log the expiry entry, and then calculate the balance of points to the current date by subtracting total points redeemed and expired from total points earned. This procedure has a few practical problems, however: 1. Customers, when they view their running total, will not be aware that some of the points included in it will have expired, as the expiry will only happen when the application attempts to log a redemption. 2. A customer may attempt to redeem a product they do not have enoughh points for, and be told at the last minute that they do not have enough points, leading to acrimony. The solution is then to calculate it on only on redemptions, but also whenever the customer attempts to view their balance. This will ensure that expired points will never be shown in the current balance of available points. This, however, means the calculation may be done many times by each customer in a single catalog browsing session. *** The Question *** Is there a way to design the DB schema as well as the query in such a manner that calculating the point balance on the fly is not an unfeasibly heavy duty calculation to be done at every page view? This problem does not appear to be solved comprehensively by anyone. When I log into my credit card company web site to check my points, I get a message "please come back in an hour, and your points will be calculated" if I haven't logged in for over a week. So obviously they calculate the balance and put it in a table that acts as a "cached balance". Emirates has a different solution, they do bi-annual runs, so points expire every March and September for them. Neither of these solutions appeals to me, and there must be A Better Way(tm).
Here it is again with more sensible wrapping: *** The Scenario *** We are running a customer loyalty program whereby customers earn points for purchasing products. Each product has a value of points that are earned by purchasing it, and a value of points required to redeem it. In order to prevent customers from stockpiling points, we want to place an expiry date on points so that unused points expire and are lost if they are not redeemed within a certain period of time. This will be calculated on a FIFO basis, I.e., the oldest points will expire first. We will assume the expiry period is 12 months. *** The Problem *** Ascertaining which points to expire is fairly conceptually simple. At any given point in time, the points expired is simply the balance on hand at the start of the period, less redemptions in that period. If the redemptions is less than the balance at open, not all points that were available on that date were used, and the difference is the expiry. This can be done periodically, say, at the start of every month. However there are a few problems with doing it periodically 1. The runs are likely to be too large to be manageable. A DB with tens of thousands of customers and many hundreds of thousands or even millions of sales in the records tables will require several queries and some application calculation to compute. If it takes 2 seconds to compute each balance of a 20,000 strong customer base, that's over 11 hours of heavy lifting in the DB, which will likely result in severely degraded performance during those hours. This problem can only get worse as time goes on, and hardware upgrade requirements just to accommodate a 12 hour window once a month is the sign of an app not designed to scale well. 2. Calculating the balance on the fly would be more effective, as it is unlikley that many customers will check their balance on a regular basis. It is likely that a small fraction of customers will check their balance in a given month, meaning that calculating it on the fly would both spread the load over time as well as reduce the total load, even if on the fly calculation results in significantly higher per-customer calculation time. 3. The app is a web app, and it would be preferable to contain business logic within the database itself or the current app codebase. Spreading application logic into an external mechanism such as cron or an external daemon would be undesirable unless there was no other way. *** A Possible Solution *** Calculating the balance on the fly can be done easily if it is done at the time the customer seeks to redeem a voucher. Expired points are only relevant at these times, and so the expired points would be calculated with an application function that did the following: 1. Get the balance as it was 12 months ago by getting total points earned less redemptions and expiries up to that date. 2. Subtract from it redemptions and expiries since then. The value obtained, if it is positive, is the value of points to expire. 3. Log the expiry entry, and then calculate the balance of points to the current date by subtracting total points redeemed and expired from total points earned. This procedure has a few practical problems, however: 1. Customers, when they view their running total, will not be aware that some of the points included in it will have expired, as the expiry will only happen when the application attempts to log a redemption. 2. A customer may attempt to redeem a product they do not have enoughh points for, and be told at the last minute that they do not have enough points, leading to acrimony. The solution is then to calculate it on only on redemptions, but also whenever the customer attempts to view their balance. This will ensure that expired points will never be shown in the current balance of available points. This, however, means the calculation may be done many times by each customer in a single catalog browsing session. *** The Question *** Is there a way to design the DB schema as well as the query in such a manner that calculating the point balance on the fly is not an unfeasibly heavy duty calculation to be done at every page view? This problem does not appear to be solved comprehensively by anyone. When I log into my credit card company web site to check my points, I get a message "please come back in an hour, and your points will be calculated" if I haven't logged in for over a week. So obviously they calculate the balance and put it in a table that acts as a "cached balance". Emirates has a different solution, they do bi-annual runs, so points expire every March and September for them. Neither of these solutions appeals, and there must be A Better Way(tm).
Naz Gassiep <naz@mira.net> writes: > that calculating the point > balance on the fly is not an unfeasibly heavy duty calculation to be done at > every page view? One alternative to calculate it everytime is calculating it once a day. If there are calculations for today, then just read the calculated value. Otherwise calculate it. If the user earns more points, make an entry at the raw table (for the expiration process) and increments today points. Do the same for points spent. -- Jorge Godoy <jgodoy@gmail.com>
Naz First, there is nothing you can do about the computational load except make your code as efficient as possible. Get it right and then make it fast. But there is only so much you can do. If a "calculation" requires an integer sum and an integer difference, you inevitably have two integer operations per "calculation". For a given complex calculation, you generally have to resort to a little algebra to put the calculation into a form that requires the least number of atomic operations (think of a mean and variance, for a simple example where a brute force approach will be correct but much more expensive than one based on a little algebraic manipulation. That leaves making your application appear to be more responsive by distributing the computational load differently. This time think of some basic data structures and the work required to obtain sorted output. Considering the standard C++ STL, one could use a vector, with contents in random order, and sort that, or one could use a map and have sorted output at any time without an explicit sort operation before getting it. But think what has happened. Element access, including especially inserts or additions and reading container elements, will be very fast with the vector, faster than with a map. But all the computational load for obtaining sorted output is carried at the moment the request for such output is made. On the other hand, getting sorted output from the map will appear to be blindingly fast relative to getting the same output from the vector. Why? Because getting sorted output requires a certain number of comparisons. This can not be avoided. But with the map, these are done on insert, so your data is actually stored sorted. The price of the sort is paid in both the case of using a vector and in the case of using the map, but the user may not notice that cost in the case of a map where he would in the case of the vector (if the dataset is large). I suspect you are aware of some of this already, but I say it just to make certain of it. I say all the above so that you don't get too distracted by your calculations of the potential costs of supporting your program. You need to have an idea of how expensive it wll be, and how you might distribute the costs, but if you have millions of clients, and each operation requires a hald a dozen basic operations, there is no avoiding that. You reduce as much as possible, but you will always get to a point where it can not be reduced further. At that point, all you can do is throw more iron at the application. Now, an option to consider. Create two tables, one for points accrued and one for points used. In each case, you'll have fields to identify the customer, his transactions, etc., but the important fields will be one for the number of points (acquired or used) on a given day, and the date. It may also be prudent to have a field to represent expiry date since how long points can be stored before they have to be used is a policy decision, one that could change at any time, and ought therefore be a data item stored rather than hardcoded into your code. The points acquired table will need a field to represent the points acquired on that date that remain to be used, so that transactions using points can use the oldest points acquired by the customer before using new ones. You may well want this for auditing purposes in addition to the core functionality you're after. Now add a third which keeps track of the number of points available at a given time. Again, you'll have fields to identify the customer, the number of available points, and maybe the date (it depends on how you want to do it and what else your database needs to support). Now, update the points acquired table and the summary table in each transaction in which the customer can gain points, and update the points used table and summary table in each transaction where the customer can use points. And you can update all three in transactions where a customer can use points for part of the transaction and other resources for the rest. At least until your unit testing is done, I'd suggest an extra table or two that allows you to store information about which points a customer is using for a given transaction, just to make absolutely certain that in fact your customer is using the oldest points that he has, in a given transaction, before the ones he acquired today. Finally, you may want to create archive tables and a scheduled task that moves records dealing with expired points from the table you're always working with to the archive tables (in enough detail that you can reconstruct precisely how and when points were acquired and used by each customer), but you might need extra code to bring them back should a policy make decide that points can have a three year lifespan instead of one. I'd put all the required code to support all of this into a suite of stored procedures. Note, this should allow your customers to get their information almost instantly since the amount of data that would be processed for them would be very small. If they want to know how many points they have, and even how many points they have that will expire in the next day, they're looking at dirt simple queries with at most one substraction. Note, I did not spend the time to refine this to minimize the total computational load or the data storage requirements. That could take days, or even weeks, depending on your attention to detail and how concerned you are about efficiency. I am sure there are some computational efficiencies that can be gained, with additional analysis, perhaps with some tradeoffs regarding the detail of the data stored available for audit purposes, but I'll leave that as an exercise for you. :-) Note, I do not see how, from what you'd written, how your proposed solution would ensure that customers used their oldest points first. Maybe it does, and you didn't describe that aspect well, but that is something you'll have to be careful about if you want to avoid upset customers. Unit testing, followed by integration tests, are your friends! HTH Ted
>-----Original Message----- >From: pgsql-general-owner@postgresql.org >[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Naz Gassiep >Sent: zondag 18 maart 2007 14:45 >To: Naz Gassiep >Cc: pgsql-general@postgresql.org >Subject: Re: [GENERAL] Design / Implementation problem > >Here it is again with more sensible wrapping: > > >*** The Scenario *** > >We are running a customer loyalty program whereby customers >earn points for purchasing products. Each product has a value >of points that are earned by purchasing it, and a value of >points required to redeem it. > >In order to prevent customers from stockpiling points, we want >to place an expiry date on points so that unused points expire >and are lost if they are not redeemed within a certain period >of time. This will be calculated on a FIFO basis, I.e., the >oldest points will expire first. > >We will assume the expiry period is 12 months. > > >*** The Problem *** > >Ascertaining which points to expire is fairly conceptually >simple. At any given point in time, the points expired is >simply the balance on hand at the start of the period, less >redemptions in that period. If the redemptions is less than >the balance at open, not all points that were available on >that date were used, and the difference is the expiry. > >This can be done periodically, say, at the start of every >month. However there are a few problems with doing it periodically > >1. The runs are likely to be too large to be manageable. A DB >with tens of thousands of customers and many hundreds of >thousands or even millions of sales in the records tables will >require several queries and some application calculation to >compute. If it takes 2 seconds to compute each balance of a >20,000 strong customer base, that's over 11 hours of heavy >lifting in the DB, which will likely result in severely >degraded performance during those hours. This problem can only >get worse as time goes on, and hardware upgrade requirements >just to accommodate a 12 hour window once a month is the sign >of an app not designed to scale well. > >2. Calculating the balance on the fly would be more effective, >as it is unlikley that many customers will check their balance >on a regular basis. >It is likely that a small fraction of customers will check >their balance in a given month, meaning that calculating it on >the fly would both spread the load over time as well as reduce >the total load, even if on the fly calculation results in >significantly higher per-customer calculation time. > >3. The app is a web app, and it would be preferable to contain >business logic within the database itself or the current app >codebase. Spreading application logic into an external >mechanism such as cron or an external daemon would be >undesirable unless there was no other way. > > >*** A Possible Solution *** > [snip] > >*** The Question *** > >Is there a way to design the DB schema as well as the query in >such a manner that calculating the point balance on the fly is >not an unfeasibly heavy duty calculation to be done at every page view? *** My Answer *** I could think of a simple solution that might work, at the cost of a little storage space. This gives an advantage in computational overhead. For every time you award points, track two things: * Awarded points... * Points remaining from the awarded ones. Obviously equal to awarded points at insertion time * Date they are awarded (or the expirary date, that doesn't matter). When you are subtracting points just update the the non-expired remaining points, with the oldest first. From the problem I think you can do it on-the-fly without too much overhead. You can plug in your scheme how to account the points: Per-order, add to the order table... Per-period, add a table for the points only... Of course it really depends on how much data you are expecting. Overhead will be 'fixed' for per-period and otherwise scale with orders/customer. [snip] Maybe this helps a bit, - Joris
hmmm. just a general notice: A customer loyalty program, which expires earned points, not to let the customer "win" anything valuable? If I were your client, I wouldn't be happy with this. 2007/3/18, Naz Gassiep <naz@mira.net>: > > We are running a customer loyalty program whereby customers earn points > for purchasing products. Each > product has a value of points that are earned by purchasing it, and a > value of points required to > redeem it. > > In order to prevent customers from stockpiling points, we want to place > an expiry date on points so > that unused points expire and are lost if they are not redeemed within a > certain period of time. This > will be calculated on a FIFO basis, I.e., the oldest points will expire > first. > > We will assume the expiry period is 12 months. > -- Filip Rembiałkowski
"Filip Rembiałkowski" <plk.zuber@gmail.com> writes: > hmmm. just a general notice: > > A customer loyalty program, which expires earned points, not to let > the customer "win" anything valuable? > If I were your client, I wouldn't be happy with this. On the other hand, having the possibility is better than having nothing... This is common to "force" the customer to buy more and more often. You have 12 months to earn 3000 points. If you have 2850 points, then you'll consider buying thing to earn 150 more points to win something... But if you don't have any incentive, then why should you care buying something "now"? This is very common with miles for flights. If you fly often, you get upgrades, discounts, etc. If you don't, then you pay the fare as everybody else. -- Jorge Godoy <jgodoy@gmail.com>