Thread: Shall I apply normalization in the following case?

Shall I apply normalization in the following case?

From

Yan Cheng Cheok

Date:

03 February 2010, 21:21:04

For example, for the following table,


measurement (without normalization)
===========
id | value | measurement_unit | measurement_type
------------------------------------------------
1   0.23     mm                    width
2   0.38     mm                    width
2   0.72     mm                    width


If I normalize to the following format, I will encounter several problem compared to table without normalization



measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1   0.23     1                    1
2   0.38     1                    1
2   0.72     1                    1


measurement_unit_id
===================
id | value
----------
1  | mm


measurement_type_id
===================
id | value
----------
1  | width


(1) When rows grow to few millions in table measurement, the join operation on normalization table, is *much* slower
comparedto non-normalization table.  

One of the most frequent used query, To retrieve "value", "measurement_unit" and "measurement_type",  I need to join
measurement+ measurement_unit_id + measurement_type_id. 

For non-normalization table, I need NOT to join.

Right now, I cannot justify why I need to use normalization. I afraid I miss out several important points when I turn
intoun-normalize solution. Please advice if you realize I miss out several important points. 

Thanks
Yan Cheng CHEOK

Re: Shall I apply normalization in the following case?

From

Jorge Godoy

Date:

03 February 2010, 21:57:13

You can always normalize and not use an artificial key.

You'd end up with:

measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------

1 0.23 mm    width
2 0.38 mm    width
2 0.72 mm    width

measurement_unit_id
===================
value
----------
mm

m
cm
in
cm²
m³

measurement_type_id
===================
value
----------
width

area
depth

volume

And so on. You'd benefit from a normalized structure, you'd have constraints checking for valid units and types and you wouldn't need join to get the resulting information.

--
Jorge Godoy <jgodoy@gmail.com>

On Wed, Feb 3, 2010 at 23:20, Yan Cheng Cheok <yccheok@yahoo.com> wrote:

For example, for the following table,

measurement (without normalization)
===========
id | value | measurement_unit | measurement_type
------------------------------------------------
1 0.23 mm width
2 0.38 mm width
2 0.72 mm width

If I normalize to the following format, I will encounter several problem compared to table without normalization

measurement (normalization)
===========
id | value | measurement_unit_id | measurement_type_id
------------------------------------------------------
1 0.23 1 1
2 0.38 1 1
2 0.72 1 1

measurement_unit_id
===================
id | value
----------
1 | mm

measurement_type_id
===================
id | value
----------
1 | width

(1) When rows grow to few millions in table measurement, the join operation on normalization table, is *much* slower compared to non-normalization table.

One of the most frequent used query, To retrieve "value", "measurement_unit" and "measurement_type", I need to join measurement + measurement_unit_id + measurement_type_id.

For non-normalization table, I need NOT to join.

Right now, I cannot justify why I need to use normalization. I afraid I miss out several important points when I turn into un-normalize solution. Please advice if you realize I miss out several important points.

Thanks
Yan Cheng CHEOK

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Shall I apply normalization in the following case?

From

Sim Zacks

Date:

04 February 2010, 03:03:27

> For example, for the following table,
>
>
> measurement (without normalization)
> ===========
> id | value | measurement_unit | measurement_type
> ------------------------------------------------
> 1   0.23     mm                    width
> 2   0.38     mm                    width
> 2   0.72     mm                    width
>
>
> If I normalize to the following format, I will encounter several problem compared to table without normalization
>
>
>
> measurement (normalization)
> ===========
> id | value | measurement_unit_id | measurement_type_id
> ------------------------------------------------------
> 1   0.23     1                    1
> 2   0.38     1                    1
> 2   0.72     1                    1
>
>
> measurement_unit_id
> ===================
> id | value
> ----------
> 1  | mm
>
>
> measurement_type_id
> ===================
> id | value
> ----------
> 1  | width
>
1) foreign key constraints are important, so you don't have things
misspelled or spelled differently and to define the "official" value.
2) querying on an int is quicker then querying on a string, so if you
query on the values without the join you will have better performance.
3) You might want to have more information in the other tables one day,
such as unit conversion information or descriptions, etc..
4) depending on the size of the string, it might take less space for an
int. Though a varchar with mm only takes 3 bytes, width takes 6 bytes,
while a regular int takes 4.
5) As Jorge mentioned you can make the value your pk instead of a serial
int and then you have it normalized and readable.

For the specific design that you are showing, there is no real benefit
to normalization, other then it would make it more scalable.


Sim

Re: Shall I apply normalization in the following case?

From

Lew

Date:

04 February 2010, 11:31:11

Sim Zacks wrote:
> 2) querying on an int is quicker then querying on a string, so if you
> query on the values without the join you will have better performance.

The point of Jorge's approach is that it allows a query without a join.

With the OP's normalization, using integer keys, a join was required,
obviating any advantage to the integer key.

The assertion that a query involving integer keys is always faster than one
with string keys is unsafe at best.  The first rule of performance
optimization is that optimizations don't.

You have to test and measure to find out when and if they do.

Misuse of surrogate keys to obtain optimization is a prime example.  I've seen
situations in the field several times when such abuse forces extra joins in
queries and an increase in the number of indexes to maintain.  Without
measurement under representative workloads, especially concurrent activity,
it's impossible to know whether the cost outweighs the benefit.

--
Lew

Re: Shall I apply normalization in the following case?

From

Jeff Davis

Date:

04 February 2010, 18:37:00

On Wed, 2010-02-03 at 17:20 -0800, Yan Cheng Cheok wrote:
> For example, for the following table,
>
>
> measurement (without normalization)
> ===========
> id | value | measurement_unit | measurement_type
> ------------------------------------------------
> 1   0.23     mm                    width
> 2   0.38     mm                    width
> 2   0.72     mm                    width

That looks normalized to me. Can you describe the redundancy that I
don't see? Do units of "mm" always imply "width" (I would think they
might also be used for "length").

Regards,
    Jeff Davis

Re: Shall I apply normalization in the following case?

From

Jeff Davis

Date:

04 February 2010, 18:39:56

On Wed, 2010-02-03 at 23:56 -0200, Jorge Godoy wrote:
> You can always normalize and not use an artificial key.
>
> You'd end up with:
>
>
> measurement (normalization)
> ===========
> id | value | measurement_unit_id | measurement_type_id
> ------------------------------------------------------
>         1   0.23     mm                    width
>         2   0.38     mm                    width
>         2   0.72     mm                    width

How is this different? It looks like you just added some FKs, which
don't change the normal form.

Regards,
    Jeff Davis