When to store data that could be derived - Mailing list pgsql-general

From Frank
Subject When to store data that could be derived
Date
Msg-id a2b19f30-9e1d-aea5-7b33-427047522461@chagford.com
Whole thread Raw
Responses Re: When to store data that could be derived  (Ron <ronljohnsonjr@gmail.com>)
Re: When to store data that could be derived  (Tony Shelver <tshelver@gmail.com>)
List pgsql-general
Hi all

As I understand it, a  general rule of thumb is that you should never 
create a physical column if the data could be derived from existing 
columns. A possible reason for breaking this rule is for performance 
reasons.

I have a situation where I am considering breaking the rule, but I am 
not experienced enough in SQL to know if my reason is valid. I would 
appreciate it if someone could glance at my 'before' and 'after' 
scenarios and see if, from a 'gut-feel' point of view, I should proceed.

I have a VIEW constructed as follows -

CREATE VIEW view_name AS
[select statement 1]
UNION ALL
[select statement 2]
etc.

This is one of the select statements. I will give the 'after' scenario 
first -

SELECT
     'arec' AS tran_type, a.row_id AS tran_row_id,
     a.tran_number AS tran_number, a.cust_row_id AS cust_row_id,
     CASE
         WHEN a.tran_type = 'ar_rec' THEN y.tran_date
         WHEN a.tran_type = 'cb_rec' THEN w.tran_date
     END AS tran_date,
     CASE
         WHEN a.tran_type = 'ar_rec' THEN y.text
         WHEN a.tran_type = 'cb_rec' THEN w.text
     END AS text,
     0 - a.arec_cust AS amount_cust,
     0 - a.arec_local AS amount_local
FROM ar_rec_subtran a
LEFT JOIN ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
LEFT JOIN ar_tran_rec y ON y.row_id = z.tran_row_id
LEFT JOIN cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
LEFT JOIN cb_tran_rec w ON w.row_id = x.tran_row_id
WHERE
     CASE
         WHEN a.tran_type = 'ar_rec' THEN y.posted
         WHEN a.tran_type = 'cb_rec' THEN w.posted
     END = '1'

The two columns a.arec_cust and a.arec_local *can* be derived from other 
columns, and in fact that is how it is working at the moment, so here is 
the 'before' scenario -

SELECT
     'arec' AS tran_type, a.row_id AS tran_row_id,
     a.tran_number AS tran_number, a.cust_row_id AS cust_row_id,
     CASE
         WHEN a.tran_type = 'ar_rec' THEN y.tran_date
         WHEN a.tran_type = 'cb_rec' THEN w.tran_date
     END AS tran_date,
     CASE
         WHEN a.tran_type = 'ar_rec' THEN y.text
         WHEN a.tran_type = 'cb_rec' THEN w.text
     END AS text,
     ROUND(0 - (ROUND(a.arec_amount / CASE
         WHEN a.tran_type = 'ar_rec' THEN y.tran_exch_rate
         WHEN a.tran_type = 'cb_rec' THEN w.tran_exch_rate
     END * a.cust_exch_rate, u.scale)), 2) AS amount_cust,
     ROUND(0 - (ROUND(a.arec_amount / CASE
         WHEN a.tran_type = 'ar_rec' THEN y.tran_exch_rate
         WHEN a.tran_type = 'cb_rec' THEN w.tran_exch_rate
     END, s.scale)), 2) AS amount_local
FROM ar_rec_subtran a
LEFT JOIN ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
LEFT JOIN ar_tran_rec y ON y.row_id = z.tran_row_id
LEFT JOIN cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
LEFT JOIN cb_tran_rec w ON w.row_id = x.tran_row_id
LEFT JOIN ar_customers v ON v.row_id = a.cust_row_id
LEFT JOIN adm_currencies u ON u.row_id = v.currency_id
LEFT JOIN adm_params t ON t.row_id = 1
LEFT JOIN adm_currencies s ON s.row_id = t.local_curr_id
WHERE
     CASE
         WHEN a.tran_type = 'ar_rec' THEN y.posted
         WHEN a.tran_type = 'cb_rec' THEN w.posted
     END = '1'

As you can see, complexity has increased and there are four additional 
JOINs.

I am expecting the VIEW to be used extensively for query purposes, and 
my gut-feel says that the second one is likely to lead to performance 
problems in a system with a lot of data and a lot of users.

I am not looking for an answer - I know that I should create dummy data 
and run some timing tests. I was just wondering if someone more 
experienced would wince when they look at the second SELECT, or if they 
would shrug and think that it looks fine.

Any input will be appreciated.

Frank Millman




pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: regr_slope returning NULL
Next
From: Ron
Date:
Subject: Re: When to store data that could be derived