Thread: Design / Implementation problem

Design / Implementation problem

From
Naz Gassiep
Date:
This is possibly not a DB only problem, the solution may involve
application logic as well. But PG users
are the smartest bunch I know. Ass kissing aside, here are the details:

*** The Scenario ***

We are running a customer loyalty program whereby customers earn points
for purchasing products. Each
product has a value of points that are earned by purchasing it, and a
value of points required to
redeem it.

In order to prevent customers from stockpiling points, we want to place
an expiry date on points so
that unused points expire and are lost if they are not redeemed within a
certain period of time. This
will be calculated on a FIFO basis, I.e., the oldest points will expire
first.

We will assume the expiry period is 12 months.

*** The Problem ***

Ascertaining which points to expire is fairly conceptually simple. At
any given point in time, the
points expired is simply the balance on hand at the start of the period,
less redemptions in that
period. If the redemptions is less than the balance at open, not all
points that were available on
that date were used, and the difference is the expiry.

This can be done periodically, say, at the start of every month. However
there are a few problems
with doing it periodically

1. The runs are likely to be too large to be manageable. A DB with tens
of thousands of customers
   and many hundreds of thousands or even millions of sales in the
records tables will require several
   queries and some application calculation to compute. If it takes 2
seconds to compute each balance
   of a 20,000 strong customer base, that's over 11 hours of heavy
lifting in the DB, which will
   likely result in severely degraded performance during those hours.
This problem can only get
   worse as time goes on, and hardware upgrade requirements just to
accommodate a 12 hour window
   once a month is the sign of an app not designed to scale well.

2. Calculating the balance on the fly would be more effective, as it is
unlikley that many customers
   will check their balance on a regular basis. It is likely that a
small fraction of customers will
   check their balance in a given month, meaning that calculating it on
the fly would both spread
   the load over time as well as reduce the total load, even if on the
fly calculation results in
   significantly higher per-customer calculation time.

3. The app is a web app, and it would be preferable to contain business
logic within the database
   itself or the current app codebase. Spreading application logic into
an external mechanism such
   as cron or an external daemon would be undesirable unless there was
no other way.

*** A Possible Solution ***

Calculating the balance on the fly can be done easily if it is done at
the time the customer seeks
to redeem a voucher. Expired points are only relevant at these times,
and so the expired points
would be calculated with an application function that did the following:

1. Get the balance as it was 12 months ago by getting total points
earned less redemptions and expiries
   up to that date.
2. Subtract from it redemptions and expiries since then. The value
obtained, if it is positive, is the
   value of points to expire.
3. Log the expiry entry, and then calculate the balance of points to the
current date by subtracting
   total points redeemed and expired from total points earned.

This procedure has a few practical problems, however:
1. Customers, when they view their running total, will not be aware that
some of the points included
   in it will have expired, as the expiry will only happen when the
application attempts to log a
   redemption.
2. A customer may attempt to redeem a product they do not have enoughh
points for, and be told at
   the last minute that they do not have enough points, leading to acrimony.

The solution is then to calculate it on only on redemptions, but also
whenever the customer attempts
to view their balance. This will ensure that expired points will never
be shown in the current balance
of available points. This, however, means the calculation may be done
many times by each customer in
a single catalog browsing session.

*** The Question ***

Is there a way to design the DB schema as well as the query in such a
manner that calculating the point
balance on the fly is not an unfeasibly heavy duty calculation to be
done at every page view?

This problem does not appear to be solved comprehensively by anyone.
When I log into my credit card
company web site to check my points, I get a message "please come back
in an hour, and your points
will be calculated" if I haven't logged in for over a week. So obviously
they calculate the balance
and put it in a table that acts as a "cached balance".

Emirates has a different solution, they do bi-annual runs, so points
expire every March and September
for them.

Neither of these solutions appeals to me, and there must be A Better
Way(tm).

Re: Design / Implementation problem

From
Naz Gassiep
Date:
Here it is again with more sensible wrapping:


*** The Scenario ***

We are running a customer loyalty program whereby customers earn points
for purchasing products. Each product has a value of points that are
earned by purchasing it, and a value of points required to redeem it.

In order to prevent customers from stockpiling points, we want to place
an expiry date on points so that unused points expire and are lost if
they are not redeemed within a certain period of time. This will be
calculated on a FIFO basis, I.e., the oldest points will expire first.

We will assume the expiry period is 12 months.


*** The Problem ***

Ascertaining which points to expire is fairly conceptually simple. At any
given point in time, the points expired is simply the balance on hand at
the start of the period, less redemptions in that period. If the
redemptions is less than the balance at open, not all points that were
available on that date were used, and the difference is the expiry.

This can be done periodically, say, at the start of every month. However
there are a few problems with doing it periodically

1. The runs are likely to be too large to be manageable. A DB with tens of
thousands of customers and many hundreds of thousands or even millions of
sales in the records tables will require several queries and some
application calculation to compute. If it takes 2 seconds to compute
each balance of a 20,000 strong customer base, that's over 11 hours of
heavy lifting in the DB, which will likely result in severely degraded
performance during those hours. This problem can only get worse as time
goes on, and hardware upgrade requirements just to accommodate a 12 hour
window once a month is the sign of an app not designed to scale well.

2. Calculating the balance on the fly would be more effective, as it is
unlikley that many customers will check their balance on a regular basis.
It is likely that a small fraction of customers will check their balance
in a given month, meaning that calculating it on the fly would both spread
the load over time as well as reduce the total load, even if on the fly
calculation results in significantly higher per-customer calculation time.

3. The app is a web app, and it would be preferable to contain business
logic within the database itself or the current app codebase. Spreading
application logic into an external mechanism such as cron or an external
daemon would be undesirable unless there was no other way.


*** A Possible Solution ***

Calculating the balance on the fly can be done easily if it is done at the
time the customer seeks to redeem a voucher. Expired points are only
relevant at these times, and so the expired points would be calculated
with an application function that did the following:

1. Get the balance as it was 12 months ago by getting total points earned
less redemptions and expiries up to that date.
2. Subtract from it redemptions and expiries since then. The value
obtained, if it is positive, is the value of points to expire.
3. Log the expiry entry, and then calculate the balance of points to the
current date by subtracting total points redeemed and expired from total
points earned.

This procedure has a few practical problems, however:
1. Customers, when they view their running total, will not be aware that
some of the points included in it will have expired, as the expiry will
only happen when the application attempts to log a redemption.
2. A customer may attempt to redeem a product they do not have enoughh
points for, and be told at the last minute that they do not have enough
points, leading to acrimony.

The solution is then to calculate it on only on redemptions, but also
whenever the customer attempts to view their balance. This will ensure
that expired points will never be shown in the current balance of
available points. This, however, means the calculation may be done many
times by each customer in a single catalog browsing session.


*** The Question ***

Is there a way to design the DB schema as well as the query in such a
manner that calculating the point balance on the fly is not an unfeasibly
heavy duty calculation to be done at every page view?

This problem does not appear to be solved comprehensively by anyone. When
I log into my credit card company web site to check my points, I get a
message "please come back in an hour, and your points will be calculated"
if I haven't logged in for over a week. So obviously they calculate the
balance and put it in a table that acts as a "cached balance".

Emirates has a different solution, they do bi-annual runs, so points
expire every March and September for them.

Neither of these solutions appeals, and there must be A Better Way(tm).

Re: Design / Implementation problem

From
Jorge Godoy
Date:
Naz Gassiep <naz@mira.net> writes:

> that calculating the point
> balance on the fly is not an unfeasibly heavy duty calculation to be done at
> every page view?

One alternative to calculate it everytime is calculating it once a day.  If
there are calculations for today, then just read the calculated value.
Otherwise calculate it.

If the user earns more points, make an entry at the raw table (for the
expiration process) and increments today points.  Do the same for points
spent.



--
Jorge Godoy      <jgodoy@gmail.com>

Re: Design / Implementation problem

From
"Ted Byers"
Date:
Naz

First, there is nothing you can do about the computational load except make
your code as efficient as possible.  Get it right and then make it fast.
But there is only so much you can do.  If a "calculation" requires an
integer sum and an integer difference, you inevitably have two integer
operations per "calculation".  For a given complex calculation, you
generally have to resort to a little algebra to put the calculation into a
form that requires the least number of atomic operations (think of a mean
and variance, for a simple example where a brute force approach will be
correct but much more expensive than one based on a little algebraic
manipulation.  That leaves making your application appear to be more
responsive by distributing the computational load differently.  This time
think of some basic data structures and the work required to obtain sorted
output.  Considering the standard C++ STL, one could use a  vector, with
contents in random order, and sort that, or one could use a map and have
sorted output at any time without an explicit sort operation before getting
it.  But think what has happened.  Element access, including especially
inserts or additions and reading container elements, will be very fast with
the vector, faster than with a map.  But all the computational load for
obtaining sorted output is carried at the moment the request for such output
is made.  On the other hand, getting sorted output from the map will appear
to be blindingly fast relative to getting the same output from the vector.
Why?  Because getting sorted output requires a certain number of
comparisons.  This can not be avoided.  But with the map, these are done on
insert, so your data is actually stored sorted.  The price of the sort is
paid in both the case of using a vector and in the case of using the map,
but the user may not notice that cost in the case of a map where he would in
the case of the vector (if the dataset is large).  I suspect you are aware
of some of this already, but I say it just to make certain of it.

I say all the above so that you don't get too distracted by your
calculations of the potential costs of supporting your program.  You need to
have an idea of how expensive it wll be, and how you might distribute the
costs, but if you have millions of clients, and each operation requires a
hald a dozen basic operations, there is no avoiding that.  You reduce as
much as possible, but you will always get to a point where it can not be
reduced further.  At that point, all you can do is throw more iron at the
application.

Now, an option to consider.  Create two tables, one for points accrued and
one for points used.  In each case, you'll have fields to identify the
customer, his transactions, etc., but the important fields will be one for
the number of points (acquired or used) on a given day, and the date.  It
may also be prudent to have a field to represent expiry date since how long
points can be stored before they have to be used is a policy decision, one
that could change at any time, and ought therefore be a data item stored
rather than hardcoded into your code.  The points acquired table will need a
field to represent the points acquired on that date that remain to be used,
so that transactions using points can use the oldest points acquired by the
customer before using new ones.  You may well want this for auditing
purposes in addition to the core functionality you're after.  Now add a
third which keeps track of the number of points available at a given time.
Again, you'll have fields to identify the customer, the number of available
points, and maybe the date (it depends on how you want to do it and what
else your database needs to support).  Now, update the points acquired table
and the summary table in each transaction in which the customer can gain
points, and update the points used table and summary table in each
transaction where the customer can use points.  And you can update all three
in transactions where a customer can use points for part of the transaction
and other resources for the rest.  At least until your unit testing is done,
I'd suggest an extra table or two that allows you to store information about
which points a customer is using for a given transaction, just to make
absolutely certain that in fact your customer is using the oldest points
that he has, in a given transaction, before the ones he acquired today.
Finally, you may want to create archive tables and a scheduled task that
moves records dealing with expired points from the table you're always
working with to the archive tables (in enough detail that you can
reconstruct precisely how and when points were acquired and used by each
customer), but you might need extra code to bring them back should a policy
make decide that points can have a three year lifespan instead of one.  I'd
put all the required code to support all of this into a suite of stored
procedures.

Note, this should allow your customers to get their information almost
instantly since the amount of data that would be processed for them would be
very small.  If they want to know how many points they have, and even how
many points they have that will expire in the next day, they're looking at
dirt simple queries with at most one substraction.

Note, I did not spend the time to refine this to minimize the total
computational load or the data storage requirements.  That could take days,
or even weeks, depending on your attention to detail and how concerned you
are about efficiency.  I am sure there are some computational efficiencies
that can be gained, with additional analysis, perhaps with some tradeoffs
regarding the detail of the data stored available for audit purposes, but
I'll leave that as an exercise for you.  :-)

Note, I do not see how, from what you'd written, how your proposed solution
would ensure that customers used their oldest points first.  Maybe it does,
and you didn't describe that aspect well, but that is something you'll have
to be careful about if you want to avoid upset customers.  Unit testing,
followed by integration tests, are your friends!

HTH

Ted



Re: Design / Implementation problem

From
"Joris Dobbelsteen"
Date:
>-----Original Message-----
>From: pgsql-general-owner@postgresql.org
>[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Naz Gassiep
>Sent: zondag 18 maart 2007 14:45
>To: Naz Gassiep
>Cc: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] Design / Implementation problem
>
>Here it is again with more sensible wrapping:
>
>
>*** The Scenario ***
>
>We are running a customer loyalty program whereby customers
>earn points for purchasing products. Each product has a value
>of points that are earned by purchasing it, and a value of
>points required to redeem it.
>
>In order to prevent customers from stockpiling points, we want
>to place an expiry date on points so that unused points expire
>and are lost if they are not redeemed within a certain period
>of time. This will be calculated on a FIFO basis, I.e., the
>oldest points will expire first.
>
>We will assume the expiry period is 12 months.
>
>
>*** The Problem ***
>
>Ascertaining which points to expire is fairly conceptually
>simple. At any given point in time, the points expired is
>simply the balance on hand at the start of the period, less
>redemptions in that period. If the redemptions is less than
>the balance at open, not all points that were available on
>that date were used, and the difference is the expiry.
>
>This can be done periodically, say, at the start of every
>month. However there are a few problems with doing it periodically
>
>1. The runs are likely to be too large to be manageable. A DB
>with tens of thousands of customers and many hundreds of
>thousands or even millions of sales in the records tables will
>require several queries and some application calculation to
>compute. If it takes 2 seconds to compute each balance of a
>20,000 strong customer base, that's over 11 hours of heavy
>lifting in the DB, which will likely result in severely
>degraded performance during those hours. This problem can only
>get worse as time goes on, and hardware upgrade requirements
>just to accommodate a 12 hour window once a month is the sign
>of an app not designed to scale well.
>
>2. Calculating the balance on the fly would be more effective,
>as it is unlikley that many customers will check their balance
>on a regular basis.
>It is likely that a small fraction of customers will check
>their balance in a given month, meaning that calculating it on
>the fly would both spread the load over time as well as reduce
>the total load, even if on the fly calculation results in
>significantly higher per-customer calculation time.
>
>3. The app is a web app, and it would be preferable to contain
>business logic within the database itself or the current app
>codebase. Spreading application logic into an external
>mechanism such as cron or an external daemon would be
>undesirable unless there was no other way.
>
>
>*** A Possible Solution ***
>
[snip]
>
>*** The Question ***
>
>Is there a way to design the DB schema as well as the query in
>such a manner that calculating the point balance on the fly is
>not an unfeasibly heavy duty calculation to be done at every page view?

*** My Answer ***

I could think of a simple solution that might work, at the cost of a
little storage space. This gives an advantage in computational overhead.

For every time you award points, track two things:
* Awarded points...
* Points remaining from the awarded ones.
  Obviously equal to awarded points at insertion time
* Date they are awarded (or the expirary date, that doesn't matter).

When you are subtracting points just update the the non-expired
remaining points, with the oldest first.

From the problem I think you can do it on-the-fly without too much
overhead. You can plug in your scheme how to account the points:
Per-order, add to the order table...
Per-period, add a table for the points only...

Of course it really depends on how much data you are expecting. Overhead
will be 'fixed' for per-period and otherwise scale with orders/customer.

[snip]

Maybe this helps a bit,

- Joris


Re: Design / Implementation problem

From
"Filip Rembiałkowski"
Date:
hmmm. just a general notice:

A customer loyalty program, which expires earned points, not to let
the customer "win" anything valuable?
If I were your client, I wouldn't be happy with this.


2007/3/18, Naz Gassiep <naz@mira.net>:
>
> We are running a customer loyalty program whereby customers earn points
> for purchasing products. Each
> product has a value of points that are earned by purchasing it, and a
> value of points required to
> redeem it.
>
> In order to prevent customers from stockpiling points, we want to place
> an expiry date on points so
> that unused points expire and are lost if they are not redeemed within a
> certain period of time. This
> will be calculated on a FIFO basis, I.e., the oldest points will expire
> first.
>
> We will assume the expiry period is 12 months.
>


--
Filip Rembiałkowski

Re: Design / Implementation problem

From
Jorge Godoy
Date:
"Filip Rembiałkowski" <plk.zuber@gmail.com> writes:

> hmmm. just a general notice:
>
> A customer loyalty program, which expires earned points, not to let
> the customer "win" anything valuable?
> If I were your client, I wouldn't be happy with this.

On the other hand, having the possibility is better than having nothing...
This is common to "force" the customer to buy more and more often.  You have
12 months to earn 3000 points.  If you have 2850 points, then you'll consider
buying thing to earn 150 more points to win something...  But if you don't
have any incentive, then why should you care buying something "now"?

This is very common with miles for flights.  If you fly often, you get
upgrades, discounts, etc.  If you don't, then you pay the fare as everybody
else.

--
Jorge Godoy      <jgodoy@gmail.com>