Re: Design / Implementation problem - Mailing list pgsql-general

From Ted Byers
Subject Re: Design / Implementation problem
Msg-id 02c501c76971$9ca46970$6501a8c0@RnDworkstation
In response to Design / Implementation problem  (Naz Gassiep <naz@mira.net>)
List pgsql-general
Naz,

First, there is nothing you can do about the computational load except make
your code as efficient as possible.  Get it right, and then make it fast.
But there is only so much you can do.  If a "calculation" requires an
integer sum and an integer difference, you inevitably have two integer
operations per "calculation".  For a given complex calculation, you
generally have to resort to a little algebra to put the calculation into a
form that requires the fewest atomic operations (think of a mean and
variance, a simple example where a brute-force approach will be correct but
much more expensive than one based on a little algebraic manipulation).

That leaves making your application appear more responsive by distributing
the computational load differently.  This time, think of some basic data
structures and the work required to obtain sorted output.  Considering the
standard C++ STL, one could use a vector, with contents in random order,
and sort it when output is requested, or one could use a map and have
sorted output at any time without an explicit sort operation.  But think
about what has happened.  Element access, especially inserting and reading
container elements, will be very fast with the vector, faster than with the
map, but all the computational load for obtaining sorted output is carried
at the moment the request for that output is made.  Getting sorted output
from the map, on the other hand, will appear blindingly fast relative to
getting the same output from the vector.  Why?  Because producing sorted
output requires a certain number of comparisons, and that cannot be
avoided.  With the map, those comparisons are done on insert, so your data
is actually stored sorted.  The price of the sort is paid whether you use
the vector or the map, but the user may not notice that cost in the case of
the map where he would in the case of the vector (if the dataset is large).
I suspect you are aware of some of this already, but I say it just to make
certain of it.
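
The same tradeoff shows up directly in SQL.  As a minimal sketch (the table
and column names here are my own, invented for illustration), you can
aggregate on demand and pay at read time, or maintain a running balance and
pay a little on every write:

-- Pay at read time: scan and sum every accrual row for the customer.
SELECT COALESCE(SUM(points_remaining), 0)
  FROM points_accrued
 WHERE customer_id = 42;

-- Pay at write time: one single-row update per transaction maintains a
-- precomputed balance, so the later read is a trivial lookup.
UPDATE points_summary
   SET available = available + 10
 WHERE customer_id = 42;

The summary table I suggest below is exactly the second option.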

I say all the above so that you don't get too distracted by your
calculations of the potential costs of supporting your program.  You need
to have an idea of how expensive it will be, and how you might distribute
the costs, but if you have millions of clients, and each operation requires
half a dozen basic operations, there is no avoiding that.  You reduce as
much as possible, but you will always reach a point where it cannot be
reduced further.  At that point, all you can do is throw more iron at the
application.

Now, an option to consider.  Create two tables, one for points accrued and
one for points used.  In each case, you'll have fields to identify the
customer, his transactions, etc., but the important fields will be one for
the number of points (acquired or used) on a given day, and the date.  It
may also be prudent to have a field for the expiry date, since how long
points can be held before they must be used is a policy decision, one that
could change at any time, and it ought therefore to be stored as data
rather than hardcoded.  The points-accrued table will also need a field for
the points acquired on that date that remain to be used, so that
transactions spending points can consume the oldest points acquired by the
customer before using new ones.  You may well want this for auditing
purposes in addition to the core functionality you're after.
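
A minimal sketch of what those two tables might look like (all names and
types are my assumptions, not something from your post):

CREATE TABLE points_accrued (
    accrual_id       serial  PRIMARY KEY,
    customer_id      integer NOT NULL,
    txn_id           integer NOT NULL,  -- the transaction that earned them
    points           integer NOT NULL,  -- points acquired that day
    points_remaining integer NOT NULL,  -- the portion not yet spent
    accrued_on       date    NOT NULL,
    expires_on       date    NOT NULL   -- policy lives in the data, not the code
);

CREATE TABLE points_used (
    usage_id    serial  PRIMARY KEY,
    customer_id integer NOT NULL,
    txn_id      integer NOT NULL,       -- the transaction that spent them
    points      integer NOT NULL,       -- points used that day
    used_on     date    NOT NULL
);
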
Now add a third table that keeps track of the number of points available at
a given time.  Again, you'll have fields to identify the customer, the
number of available points, and maybe the date (it depends on how you want
to do it and what else your database needs to support).  Now, update the
points-accrued table and the summary table in each transaction in which the
customer can gain points, and update the points-used table and the summary
table in each transaction in which the customer can use points.  And you
can update all three in transactions where a customer uses points for part
of the transaction and other resources for the rest.
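
A rough sketch of that write path, with the oldest-points-first rule
enforced in a stored procedure (again, every name here is my own
assumption, and this is a sketch, not a finished implementation):

CREATE TABLE points_summary (
    customer_id integer PRIMARY KEY,
    available   integer NOT NULL DEFAULT 0
);

-- Earning points: record the accrual and bump the running balance,
-- inside the same transaction as the sale itself.  The one-year
-- expiry here is only for illustration.
INSERT INTO points_accrued
       (customer_id, txn_id, points, points_remaining, accrued_on, expires_on)
VALUES (42, 1001, 50, 50, CURRENT_DATE, CURRENT_DATE + 365);
UPDATE points_summary SET available = available + 50 WHERE customer_id = 42;

-- Spending points: drain the oldest unexpired accruals first.
CREATE OR REPLACE FUNCTION spend_points(cust integer, txn integer, amount integer)
RETURNS void AS $$
DECLARE
    r         RECORD;
    remaining integer := amount;
    take      integer;
BEGIN
    FOR r IN SELECT accrual_id, points_remaining
               FROM points_accrued
              WHERE customer_id = cust
                AND points_remaining > 0
                AND expires_on >= CURRENT_DATE
              ORDER BY accrued_on
                FOR UPDATE
    LOOP
        EXIT WHEN remaining <= 0;
        take := LEAST(r.points_remaining, remaining);
        UPDATE points_accrued
           SET points_remaining = points_remaining - take
         WHERE accrual_id = r.accrual_id;
        remaining := remaining - take;
    END LOOP;
    IF remaining > 0 THEN
        RAISE EXCEPTION 'customer % has too few points', cust;
    END IF;
    -- Record the spend and keep the summary in step, all in one transaction.
    INSERT INTO points_used (customer_id, txn_id, points, used_on)
    VALUES (cust, txn, amount, CURRENT_DATE);
    UPDATE points_summary
       SET available = available - amount
     WHERE customer_id = cust;
END;
$$ LANGUAGE plpgsql;
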
At least until your unit testing is done, I'd suggest an extra table or two
that stores information about which points a customer used in a given
transaction, just to make absolutely certain that your customer is in fact
using the oldest points he has, in a given transaction, before the ones he
acquired today.

Finally, you may want to create archive tables and a scheduled task that
moves records dealing with expired points from the tables you're always
working with to the archive tables (in enough detail that you can
reconstruct precisely how and when points were acquired and used by each
customer), but you might need extra code to bring them back should a policy
maker decide that points can have a three-year lifespan instead of one.
I'd put all the code required to support this into a suite of stored
procedures.
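
That scheduled step might look like this (run nightly from cron or similar;
the archive table simply mirrors the live one, and every name is again my
assumption):

CREATE TABLE points_accrued_archive (LIKE points_accrued);

BEGIN;
-- Deduct whatever unspent balance is about to expire from the summary.
UPDATE points_summary s
   SET available = available - e.expired
  FROM (SELECT customer_id, SUM(points_remaining) AS expired
          FROM points_accrued
         WHERE expires_on < CURRENT_DATE
         GROUP BY customer_id) e
 WHERE s.customer_id = e.customer_id;

-- Copy the expired accruals to the archive, then drop them from the
-- live table.
INSERT INTO points_accrued_archive
SELECT * FROM points_accrued WHERE expires_on < CURRENT_DATE;
DELETE FROM points_accrued WHERE expires_on < CURRENT_DATE;
COMMIT;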

Note, this should allow your customers to get their information almost
instantly, since the amount of data that would be processed for them is
very small.  If they want to know how many points they have, or even how
many points they have that will expire in the next day, they're looking at
dirt-simple queries with at most one subtraction.
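
For instance, against the hypothetical tables above:

-- Current balance: a single-row lookup in the summary table.
SELECT available FROM points_summary WHERE customer_id = 42;

-- Points that will expire within the next day.
SELECT COALESCE(SUM(points_remaining), 0)
  FROM points_accrued
 WHERE customer_id = 42
   AND expires_on <= CURRENT_DATE + 1;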

Note, I did not spend the time to refine this to minimize the total
computational load or the data storage requirements.  That could take days,
or even weeks, depending on your attention to detail and how concerned you
are about efficiency.  I am sure there are some computational efficiencies
to be gained with additional analysis, perhaps with some tradeoffs in the
level of detail stored for audit purposes, but I'll leave that as an
exercise for you.  :-)

Note, I do not see, from what you'd written, how your proposed solution
would ensure that customers use their oldest points first.  Maybe it does,
and you simply didn't describe that aspect well, but it is something you'll
have to be careful about if you want to avoid upset customers.  Unit tests,
followed by integration tests, are your friends!

HTH

Ted


