Re: Design / Implementation problem - Mailing list pgsql-general
From | Ted Byers |
---|---|
Subject | Re: Design / Implementation problem |
Date | |
Msg-id | 02c501c76971$9ca46970$6501a8c0@RnDworkstation Whole thread Raw |
In response to | Design / Implementation problem (Naz Gassiep <naz@mira.net>) |
List | pgsql-general |
Naz First, there is nothing you can do about the computational load except make your code as efficient as possible. Get it right and then make it fast. But there is only so much you can do. If a "calculation" requires an integer sum and an integer difference, you inevitably have two integer operations per "calculation". For a given complex calculation, you generally have to resort to a little algebra to put the calculation into a form that requires the least number of atomic operations (think of a mean and variance, for a simple example where a brute force approach will be correct but much more expensive than one based on a little algebraic manipulation. That leaves making your application appear to be more responsive by distributing the computational load differently. This time think of some basic data structures and the work required to obtain sorted output. Considering the standard C++ STL, one could use a vector, with contents in random order, and sort that, or one could use a map and have sorted output at any time without an explicit sort operation before getting it. But think what has happened. Element access, including especially inserts or additions and reading container elements, will be very fast with the vector, faster than with a map. But all the computational load for obtaining sorted output is carried at the moment the request for such output is made. On the other hand, getting sorted output from the map will appear to be blindingly fast relative to getting the same output from the vector. Why? Because getting sorted output requires a certain number of comparisons. This can not be avoided. But with the map, these are done on insert, so your data is actually stored sorted. The price of the sort is paid in both the case of using a vector and in the case of using the map, but the user may not notice that cost in the case of a map where he would in the case of the vector (if the dataset is large). I suspect you are aware of some of this already, but I say it just to make certain of it. I say all the above so that you don't get too distracted by your calculations of the potential costs of supporting your program. You need to have an idea of how expensive it wll be, and how you might distribute the costs, but if you have millions of clients, and each operation requires a hald a dozen basic operations, there is no avoiding that. You reduce as much as possible, but you will always get to a point where it can not be reduced further. At that point, all you can do is throw more iron at the application. Now, an option to consider. Create two tables, one for points accrued and one for points used. In each case, you'll have fields to identify the customer, his transactions, etc., but the important fields will be one for the number of points (acquired or used) on a given day, and the date. It may also be prudent to have a field to represent expiry date since how long points can be stored before they have to be used is a policy decision, one that could change at any time, and ought therefore be a data item stored rather than hardcoded into your code. The points acquired table will need a field to represent the points acquired on that date that remain to be used, so that transactions using points can use the oldest points acquired by the customer before using new ones. You may well want this for auditing purposes in addition to the core functionality you're after. Now add a third which keeps track of the number of points available at a given time. Again, you'll have fields to identify the customer, the number of available points, and maybe the date (it depends on how you want to do it and what else your database needs to support). Now, update the points acquired table and the summary table in each transaction in which the customer can gain points, and update the points used table and summary table in each transaction where the customer can use points. And you can update all three in transactions where a customer can use points for part of the transaction and other resources for the rest. At least until your unit testing is done, I'd suggest an extra table or two that allows you to store information about which points a customer is using for a given transaction, just to make absolutely certain that in fact your customer is using the oldest points that he has, in a given transaction, before the ones he acquired today. Finally, you may want to create archive tables and a scheduled task that moves records dealing with expired points from the table you're always working with to the archive tables (in enough detail that you can reconstruct precisely how and when points were acquired and used by each customer), but you might need extra code to bring them back should a policy make decide that points can have a three year lifespan instead of one. I'd put all the required code to support all of this into a suite of stored procedures. Note, this should allow your customers to get their information almost instantly since the amount of data that would be processed for them would be very small. If they want to know how many points they have, and even how many points they have that will expire in the next day, they're looking at dirt simple queries with at most one substraction. Note, I did not spend the time to refine this to minimize the total computational load or the data storage requirements. That could take days, or even weeks, depending on your attention to detail and how concerned you are about efficiency. I am sure there are some computational efficiencies that can be gained, with additional analysis, perhaps with some tradeoffs regarding the detail of the data stored available for audit purposes, but I'll leave that as an exercise for you. :-) Note, I do not see how, from what you'd written, how your proposed solution would ensure that customers used their oldest points first. Maybe it does, and you didn't describe that aspect well, but that is something you'll have to be careful about if you want to avoid upset customers. Unit testing, followed by integration tests, are your friends! HTH Ted
pgsql-general by date: