Thread: Normalization and regexp

Normalization and regexp

From

Date:

22 December 2003, 19:17:42

Hi,

Since Canada Post hasn't programmed their automated web tools for calculating transportation costs to work with
anythingoutside of Windows Internet Explorer, I'm obliged to write a web based program optimized for all those *other*
browsers,myself. Part of this task requires that I set up tables in postgresql that match postal codes to
transportationcost.  

Canada Post provides a booklet for calculating transportation cost based on package weight and the first three
charactersof the postal code sequence. For instance, if I want to send a package to an address that includes G8F 1X1 as
thepostal code, I take the first 3 characters G8F and look them up in table 1. 

Table 1

PostalCode    Tarrif number
---------------------------
G4V               14
G8E-G8G           14
G4R-G4S           13

Since G8F falls in the G8E-G8G range, I now know that the tarrif number is 14. Taking the number 14, I go to table 2

Table 2

   For tarrif Code 14
Weight(kg)       Price
----------------------
  1.0            5.37
  1.5            5.61
  2.0            5.82

If the weight of my package is 1kg, the price is 5.37 to send the package to the address bearing G8F as the first 3
charactersof the postal code. 

To render this in the database, I have done the following:

_____________________________
           p_code            |
=============================
  pcode_id   |     tarrif    |
-----------------------------
    G4V      |       14      |
-----------------------------
    G8E      |       14      |
-----------------------------
    G8F      |       14      |
-----------------------------
    G8G      |       14      |
-----------------------------
    G4R      |       13      |
-----------------------------
    G4S      |       13      |
-----------------------------

__________________________________
           price_weight           |
==================================
   tarrif   | weight(kg)|  price  |
----------------------------------
     14     |    1.0    |   5.37  |
----------------------------------
     14     |    1.5    |   5.61  |
----------------------------------
     14     |    2.0    |   5.82  |
----------------------------------
     13     |    1.0    |   5.20  |
----------------------------------
     13     |    1.5    |   5.32  |
----------------------------------
     13     |    2.0    |   5.42  |


Therefore my sql statement would look something like this:

SELECT price FROM price_weight
WHERE p_code.tarrif = price_weight.tarrif
AND pcode = 'G8F'
AND weight = '1.0';

I think this will work, but before I proceed, I'd like to ask 2 questions:

1.
Is it possible to further normalize the data in the p_code and price_weight tables above?

2.
Is it possible to abbreviate the number of records in the p_code table using regular expressions. For instance, to
avoidrepetition, I thought I'd use regular expressions, so that instead of entering the postal code into separate rows
assuch: 

G4V    14
G8E    14
G8F    14
G8G    14

I could do something like this:

(G4V | G8[E-G])    14

Somehow I don't think this is possible, but I'm looking for any way to minimize the number of postal codes that I have
toenter, since there's a lot of them. 

Anyway, I realize these questions may have more to do with database design than postgresql per se. If there's a better
placeto ask them, please point me in the right direction. 

Thanks,

Mark

Re: Normalization and regexp

From

"John Sidney-Woollett"

Date:

23 December 2003, 04:53:34

In order to reduce the number of rows in the p_code table, you could
modify it so that it had the following columns:

create table p_code (
pCode_Start   varchar(3),
pCode_End     varchar(3),
tariff        integer,
primary key(pCode_Start, pCode_End)
);

If you need to insert a single code, use the following:

insert into pCode('G4V','G4V',14);

and a range as follows:

insert into pCode('G8E', 'G8G',14);

Your select statement is now:

SELECT price FROM price_weight pw, p_code pc
WHERE pc.tarrif = pw.tarrif
AND pCode_Start <= 'G8F'
AND pCode_End >= 'G8F'
AND weight = '1.0';

I think that should work OK...

John Sidney-Woollett

MT said:
> Hi,
>
> Since Canada Post hasn't programmed their automated web tools for
> calculating transportation costs to work with anything outside of Windows
> Internet Explorer, I'm obliged to write a web based program optimized for
> all those *other* browsers, myself. Part of this task requires that I set
> up tables in postgresql that match postal codes to transportation cost.
>
> Canada Post provides a booklet for calculating transportation cost based
> on package weight and the first three characters of the postal code
> sequence. For instance, if I want to send a package to an address that
> includes G8F 1X1 as the postal code, I take the first 3 characters G8F and
> look them up in table 1.
>
> Table 1
>
> PostalCode    Tarrif number
> ---------------------------
> G4V               14
> G8E-G8G           14
> G4R-G4S           13
>
> Since G8F falls in the G8E-G8G range, I now know that the tarrif number is
> 14. Taking the number 14, I go to table 2
>
> Table 2
>
>    For tarrif Code 14
> Weight(kg)       Price
> ----------------------
>   1.0            5.37
>   1.5            5.61
>   2.0            5.82
>
> If the weight of my package is 1kg, the price is 5.37 to send the package
> to the address bearing G8F as the first 3 characters of the postal code.
>
> To render this in the database, I have done the following:
>
> _____________________________
>            p_code            |
> =============================
>   pcode_id   |     tarrif    |
> -----------------------------
>     G4V      |       14      |
> -----------------------------
>     G8E      |       14      |
> -----------------------------
>     G8F      |       14      |
> -----------------------------
>     G8G      |       14      |
> -----------------------------
>     G4R      |       13      |
> -----------------------------
>     G4S      |       13      |
> -----------------------------
>
> __________________________________
>            price_weight           |
> ==================================
>    tarrif   | weight(kg)|  price  |
> ----------------------------------
>      14     |    1.0    |   5.37  |
> ----------------------------------
>      14     |    1.5    |   5.61  |
> ----------------------------------
>      14     |    2.0    |   5.82  |
> ----------------------------------
>      13     |    1.0    |   5.20  |
> ----------------------------------
>      13     |    1.5    |   5.32  |
> ----------------------------------
>      13     |    2.0    |   5.42  |
>
>
> Therefore my sql statement would look something like this:
>
> SELECT price FROM price_weight
> WHERE p_code.tarrif = price_weight.tarrif
> AND pcode = 'G8F'
> AND weight = '1.0';
>
> I think this will work, but before I proceed, I'd like to ask 2 questions:
>
> 1.
> Is it possible to further normalize the data in the p_code and
> price_weight tables above?
>
> 2.
> Is it possible to abbreviate the number of records in the p_code table
> using regular expressions. For instance, to avoid repetition, I thought
> I'd use regular expressions, so that instead of entering the postal code
> into separate rows as such:
>
> G4V    14
> G8E    14
> G8F    14
> G8G    14
>
> I could do something like this:
>
> (G4V | G8[E-G])    14
>
> Somehow I don't think this is possible, but I'm looking for any way to
> minimize the number of postal codes that I have to enter, since there's a
> lot of them.
>
> Anyway, I realize these questions may have more to do with database design
> than postgresql per se. If there's a better place to ask them, please
> point me in the right direction.
>
> Thanks,
>
> Mark
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

Re: Normalization and regexp

From

"ezra epstein"

Date:

24 December 2003, 20:53:04

There's no reason you couldn't store a regex string into a column.  The
trick has only to do with proper escaping for the various steps, namely the
loading method (regular SQL has some escaping requirements and so you need
double back-slashes and escaped quotes).  A work-around for this is to use
other input methods like COPY.  Then there's the escaping required for the
regex tool of choice: either client-side or server(db)-side.

If you're doing the regex processing client-side this might be easier.
Depends on usage.

In short, the principal of what you propose seems completely sound.
However, there might be tricks to handling character escaping correctly.

== Ezra Epstein

"MT" <m_tessier@sympatico.ca> wrote in message
news:20031222181239.476cfa47.m_tessier@sympatico.ca...
> Hi,
>
> Since Canada Post hasn't programmed their automated web tools for
calculating transportation costs to work with anything outside of Windows
Internet Explorer, I'm obliged to write a web based program optimized for
all those *other* browsers, myself. Part of this task requires that I set up
tables in postgresql that match postal codes to transportation cost.
>
> Canada Post provides a booklet for calculating transportation cost based
on package weight and the first three characters of the postal code
sequence. For instance, if I want to send a package to an address that
includes G8F 1X1 as the postal code, I take the first 3 characters G8F and
look them up in table 1.
>
> Table 1
>
> PostalCode    Tarrif number
> ---------------------------
> G4V               14
> G8E-G8G           14
> G4R-G4S           13
>
> Since G8F falls in the G8E-G8G range, I now know that the tarrif number is
14. Taking the number 14, I go to table 2
>
> Table 2
>
>    For tarrif Code 14
> Weight(kg)       Price
> ----------------------
>   1.0            5.37
>   1.5            5.61
>   2.0            5.82
>
> If the weight of my package is 1kg, the price is 5.37 to send the package
to the address bearing G8F as the first 3 characters of the postal code.
>
> To render this in the database, I have done the following:
>
> _____________________________
>            p_code            |
> =============================
>   pcode_id   |     tarrif    |
> -----------------------------
>     G4V      |       14      |
> -----------------------------
>     G8E      |       14      |
> -----------------------------
>     G8F      |       14      |
> -----------------------------
>     G8G      |       14      |
> -----------------------------
>     G4R      |       13      |
> -----------------------------
>     G4S      |       13      |
> -----------------------------
>
> __________________________________
>            price_weight           |
> ==================================
>    tarrif   | weight(kg)|  price  |
> ----------------------------------
>      14     |    1.0    |   5.37  |
> ----------------------------------
>      14     |    1.5    |   5.61  |
> ----------------------------------
>      14     |    2.0    |   5.82  |
> ----------------------------------
>      13     |    1.0    |   5.20  |
> ----------------------------------
>      13     |    1.5    |   5.32  |
> ----------------------------------
>      13     |    2.0    |   5.42  |
>
>
> Therefore my sql statement would look something like this:
>
> SELECT price FROM price_weight
> WHERE p_code.tarrif = price_weight.tarrif
> AND pcode = 'G8F'
> AND weight = '1.0';
>
> I think this will work, but before I proceed, I'd like to ask 2 questions:
>
> 1.
> Is it possible to further normalize the data in the p_code and
price_weight tables above?
>
> 2.
> Is it possible to abbreviate the number of records in the p_code table
using regular expressions. For instance, to avoid repetition, I thought I'd
use regular expressions, so that instead of entering the postal code into
separate rows as such:
>
> G4V 14
> G8E 14
> G8F 14
> G8G 14
>
> I could do something like this:
>
> (G4V | G8[E-G]) 14
>
> Somehow I don't think this is possible, but I'm looking for any way to
minimize the number of postal codes that I have to enter, since there's a
lot of them.
>
> Anyway, I realize these questions may have more to do with database design
than postgresql per se. If there's a better place to ask them, please point
me in the right direction.
>
> Thanks,
>
> Mark
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>