Thread: Optimizations

Optimizations

From
Ogden
Date:
We run a student scoring system with PostgreSQL as a backend. After the results for each student are inputted into the
system,we display many reports for them. We haven't had a problem with efficiency or speed, but it has come up that
perhapsstoring the rolled up scores of each student may be better than calculating their score on the fly. I have
alwayscoded the SQL to calculate on the fly and do not see any benefit from calculating on the fly. For a test with
over100 questions and with 950 students having taken it, it calculates all their relevant score information in less
thanhalf a second. Would there be any obvious benefit to caching the results? 

I would greatly appreciate any thoughts on this.

Here is the structure:

A database to store and calculate student results.


Information about the student and which test they took:

\d test_registration;
                      Table "public.test_registration"
       Column        |            Type             |       Modifiers
---------------------+-----------------------------+------------------------
 id                  | uuid                        | not null
 sid                 | character varying(36)       | not null
 created_date        | timestamp without time zone | not null default now()
 modified_date       | timestamp without time zone | not null
 test_administration | uuid                        | not null


The actual results (what the student marked):

\d test_registration_result (linked to test_registration.id above)


        Table "public.test_registration_result"
      Column       |         Type          | Modifiers
-------------------+-----------------------+-----------
 test_registration | uuid                  | not null
 question          | uuid                  | not null
 answer            | character varying(15) |


\d question (information on each question)

                  Table "public.question"
      Column       |          Type          |   Modifiers
-------------------+------------------------+---------------
 id                | uuid                   | not null
 test              | uuid                   | not null
 question          | integer                | not null
 weight            | double precision       |


\d question_answer (the answers for the question)
        Table "public.question_answer"
  Column  |         Type          | Modifiers
----------+-----------------------+-----------
 question | uuid                  | not null
 answer   | character varying(15) | not null



With a SQL query:

SELECT sid, raw_score, weighted_score, number_questions, total_weights,
( weighted_score / total_weights ) * 100.00 as mp_percentage,
total_weights
    FROM
    (
        SELECT
        tr.sid as sid,
        sum (
              (
                SELECT (case when a.answer = r.answer then 1 else 0 end )
              )
             ) as raw_score,
         sum (
               (
                 SELECT (case when a.answer = r.answer THEN q.weight end )
                 )
              ) as weighted_score,
.....


For 953 students on a test with 145 questions, this takes less than half a second to calculate. Is is worth storing the
score?

\d score_set
                           Table "public.score_set"
         Column         |            Type             |       Modifiers
------------------------+-----------------------------+------------------------
 id                     | uuid                        | not null
 sid                    | uuid                        | not null
 test_registration_id   | uuid                        | not null
 test_administration_id | uuid                        | not null
 score                  | double precision            | not null

Will it be much faster? I know more storage will be needed.


Thank you

Ogden



Re: Optimizations

From
Craig Ringer
Date:
Ogden wrote:
> We run a student scoring system with PostgreSQL as a backend. After the results for each student are inputted into
thesystem, we display many reports for them. We haven't had a problem with efficiency or speed, but it has come up that
perhapsstoring the rolled up scores of each student may be better than calculating their score on the fly. I have
alwayscoded the SQL to calculate on the fly and do not see any benefit from calculating on the fly. For a test with
over100 questions and with 950 students having taken it, it calculates all their relevant score information in less
thanhalf a second. Would there be any obvious benefit to caching the results? 

Caching the results would mean storing the same information in two
places (individual scores, and aggregates calculated from them). That's
room for error if they're permitted to get out of sync in any way for
any reason. For that reason, and because it's complexity you don't need,
I'd avoid it unless I had a reason not to.

On the other hand if you expect the number of students you have to
report on to grow vastly then it's worth considering.

If you do go ahead with it, first restructure all queries that use that
information so they go view a view that calculates that data on the fly.

Then look at replacing that view with a table that's automatically
updated by triggers when the data source is updated (say, a student has
a new score recorded).

--
Craig Ringer

Re: Optimizations

From
Ogden
Date:
On Mar 5, 2010, at 2:26 AM, Craig Ringer wrote:

> Ogden wrote:
>> We run a student scoring system with PostgreSQL as a backend. After the results for each student are inputted into
thesystem, we display many reports for them. We haven't had a problem with efficiency or speed, but it has come up that
perhapsstoring the rolled up scores of each student may be better than calculating their score on the fly. I have
alwayscoded the SQL to calculate on the fly and do not see any benefit from calculating on the fly. For a test with
over100 questions and with 950 students having taken it, it calculates all their relevant score information in less
thanhalf a second. Would there be any obvious benefit to caching the results? 
>
> Caching the results would mean storing the same information in two
> places (individual scores, and aggregates calculated from them). That's
> room for error if they're permitted to get out of sync in any way for
> any reason. For that reason, and because it's complexity you don't need,
> I'd avoid it unless I had a reason not to.
>
> On the other hand if you expect the number of students you have to
> report on to grow vastly then it's worth considering.
>
> If you do go ahead with it, first restructure all queries that use that
> information so they go view a view that calculates that data on the fly.
>
> Then look at replacing that view with a table that's automatically
> updated by triggers when the data source is updated (say, a student has
> a new score recorded).

Craig,

Thank you for the response and insight.

While it sounds good in practice, I know storing the results will vastly increase the size (the table holding the
resultsis over 5Gb in one case) and calculating results from it takes not more than a second for a huge data set.  

Would searching a huge table be as fast as calculating or about the same? I'll have to run some tests on my end but I
amvery impressed by the speed of which PostgreSQL executes aggregate functions.  

Do you suggest looking at this option when we see the reporting to slow down? At that point do you suggest we go back
tothe drawing board? 

Thank you

Ogden

Re: Optimizations

From
Craig Ringer
Date:
On 5/03/2010 10:09 PM, Ogden wrote:

> Would searching a huge table be as fast as calculating or about the same? I'll have to run some tests on my end but I
amvery impressed by the speed of which PostgreSQL executes aggregate functions. 

I'm not sure what you're asking.

> Do you suggest looking at this option when we see the reporting to slow down? At that point do you suggest we go back
tothe drawing board? 

If it ain't broke, don't fix it. However, it's a good idea to make it
easy to fix later - for example, wrap your score calculations up into a
view (see CREATE VIEW) so that you can replace it with a materialized
view later if you start seeing performance issues.

--
Craig Ringer