Thread: User defined types -- Social Security number...

User defined types -- Social Security number...

From
"Greg Patnude"
Date:
Anyone have a good pre-built user-defined type definition for creating /
maintaining / manipulating a SSN ... where valid chars are in the range
000-00-0000 through 999-99-9999.

I imagine that the storage column is probably varchar(11) -- I am looking
for a type definition that
   1) checks that all numbers are in the range [0..9]   2) inserts the dashes at position   3) accepts either
'000123456'OR '000-12-3456' as the input   4) leading zeroes (as in the above) must be stored -- hence varchar(11)
 
instead of numeric   5) always outputs in thje format '000-12-3456'

Either that or the question is: How can I coerce postgreSQL into using an
input / output "mask"...

-- 
Greg Patnude / The Digital Demention
2916 East Upper Hayden Lake Road
Hayden Lake, ID 83835
(208) 762-0762




Re: User defined types -- Social Security number...

From
Josh Berkus
Date:
Greg,

> Anyone have a good pre-built user-defined type definition for creating /
> maintaining / manipulating a SSN ... where valid chars are in the range
> 000-00-0000 through 999-99-9999.

Actually, the range is more narrowly defined than that.  I'm not sure of the 
exact rules, but you will never see a leading 0 or a -00- in an SSN.

> I imagine that the storage column is probably varchar(11) -- I am looking
> for a type definition that

Use DOMAINs, not a custom type.  It's less work.

> Either that or the question is: How can I coerce postgreSQL into using an
> input / output "mask"...

After you've created your DOMAIN, based on the TEXT type, you can overload the 
input and output functions to format correctly.   Beware, though: input & 
output functions pretty much have to be written in C.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: User defined types -- Social Security number...

From
Christopher Browne
Date:
In the last exciting episode, "Greg Patnude" <gpatnude@hotmail.com> wrote:
> Thanks Josh -- I understand that there are valid and invalid SSN's -- 
> similar rules apply to zip codes and area codes...
>
> I tried this:
>
> SELECT to_char(123456789, '000-00-0000');
> which yields 123-45-6789 -- nicely, I might add...
>
> the trick is getting postgreSQL to do this without having to create an ON
> SELECT and ON UPDATE TRIGGER...
>
> an ON UPDATE, SELECT, and INSERT re-write RULE might do the trick...
>
> SSN's CAN in fact start with a leading 0 -- mine does -- "041-xx-xxxx" --
>
> I do agree that there are valid ranges -- my main concern is being able to
> store any leading zeros - I just need to make sure that something "looks"
> like a valid SSN in the formattig
> (nnn-nn-nnnn) and that I can store / retrieve it with the approoriate
> format -- what I am really trying to accomplish is an "input mask"...
>
> I hadn't considered using a Domain.... have to look at that....

Strongly recommended; that allows applying the validation in many
places without having to repeat validation "code."

If you will be using really a lot of these values, and indexing on
them, it even may be worth looking at a custom type.

A performance "win" would come in using a compact data type.  For
instance, for 9 digit national ID numbers, you can do a LOT better
than an 11 byte string.  (Aside: Anything bigger than 34 bits would
do, demonstrating that it is a regrettable loss that 36 bit computer
systems went the way of the dodo...)
-- 
output = reverse("gro.gultn" "@" "enworbbc")
http://www.ntlug.org/~cbbrowne/internet.html
I found out why  cats drink out of the toilet. My  mother told me it's
because it's cold in there. And I'm like: How did my mother know THAT?
--Wendy Liebman


Re: User defined types -- Social Security number...

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Greg,
> 
> > Anyone have a good pre-built user-defined type definition for creating /
> > maintaining / manipulating a SSN ... where valid chars are in the range
> > 000-00-0000 through 999-99-9999.
> 
> Actually, the range is more narrowly defined than that.  I'm not sure of the 
> exact rules, but you will never see a leading 0 or a -00- in an SSN.

Actually I know someone who's SSN starts as 003.

Agreed on the DOMAIN suggestion.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: User defined types -- Social Security number...

From
"Greg Patnude"
Date:
Thanks Josh -- I understand that there are valid and invalid SSN's -- 
similar rules apply to zip codes and area codes...

I tried this:

SELECT to_char(123456789, '000-00-0000');
which yields 123-45-6789 -- nicely, I might add...

the trick is getting postgreSQL to do this without having to create an ON
SELECT and ON UPDATE TRIGGER...

an ON UPDATE, SELECT, and INSERT re-write RULE might do the trick...

SSN's CAN in fact start with a leading 0 -- mine does -- "041-xx-xxxx" --

I do agree that there are valid ranges -- my main concern is being able to
store any leading zeros - I just need to make sure that something "looks"
like a valid SSN in the formattig
(nnn-nn-nnnn) and that I can store / retrieve it with the approoriate
format -- what I am really trying to accomplish is an "input mask"...

I hadn't considered using a Domain.... have to look at that....

-- 
Greg Patnude / The Digital Demention
2916 East Upper Hayden Lake Road
Hayden Lake, ID 83835
(208) 762-0762

"Josh Berkus" <josh@agliodbs.com> wrote in message
news:200402221132.50311.josh@agliodbs.com...
> Greg,
>
> > Anyone have a good pre-built user-defined type definition for creating /
> > maintaining / manipulating a SSN ... where valid chars are in the range
> > 000-00-0000 through 999-99-9999.
>
> Actually, the range is more narrowly defined than that.  I'm not sure of
the
> exact rules, but you will never see a leading 0 or a -00- in an SSN.
>
> > I imagine that the storage column is probably varchar(11) -- I am
looking
> > for a type definition that
>
> Use DOMAINs, not a custom type.  It's less work.
>
> > Either that or the question is: How can I coerce postgreSQL into using
an
> > input / output "mask"...
>
> After you've created your DOMAIN, based on the TEXT type, you can overload
the
> input and output functions to format correctly.   Beware, though: input &
> output functions pretty much have to be written in C.
>
> -- 
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>




Re: User defined types -- Social Security number...

From
Michael Chaney
Date:
On Sun, Feb 22, 2004 at 04:45:51PM -0800, Greg Patnude wrote:
> Thanks Josh -- I understand that there are valid and invalid SSN's --
> similar rules apply to zip codes and area codes...
>
> I tried this:
>
> SELECT to_char(123456789, '000-00-0000');
> which yields 123-45-6789 -- nicely, I might add...
>
> the trick is getting postgreSQL to do this without having to create an
> ON
> SELECT and ON UPDATE TRIGGER...
>
> an ON UPDATE, SELECT, and INSERT re-write RULE might do the trick...
>
> SSN's CAN in fact start with a leading 0 -- mine does -- "041-xx-xxxx"
> --
>
> I do agree that there are valid ranges -- my main concern is being
> able to
> store any leading zeros - I just need to make sure that something
> "looks"
> like a valid SSN in the formattig
> (nnn-nn-nnnn) and that I can store / retrieve it with the approoriate
> format -- what I am really trying to accomplish is an "input mask"...

Look, you're thinking way too hard on this.  An SSN is a 9-digit number,
nothing more.  There are some 9-digit numbers which aren't valid SSN's,
and you might want to get fancy and create a constraint for that.

Regardless, you are making a *major* mistake of confusing data storage
with
rendering.  It is common to *render* an SSN as xxx-xx-xxxx and its
cousin the FETID (Federal Employers Tax ID) as xx-xxxxxxx.  To store the
dashes makes no sense.  They're in the same place each time, it's wasted
data.

Store the SSN as an "integer".  When you begin to think about this
correctly, the "leading zeros" problem disappears since that is also a
*rendering* issue.

When you pull the data out, either fix it up in your programming
language to the format that you wish, or use the to_char function as
shown above in your select statements.

To help you think about this whole issue, consider the timestamp
datatype.  Timestamps are stored as a Julian date internally.  I suspect
that they use a double-floating point as the actual format, but
regardless the point is that it's a number.  Rather than storing

2004-02-29 21:14:27.030434-06

We store:

2453065.88503472

It's easier to use that as a basic format from which we can render it in
any way we wish.

The same applies to your SSN.

Michael

-- 
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/


Re: User defined types -- Social Security number...

From
Christopher Browne
Date:
The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney) wrote:
> Look, you're thinking way too hard on this.  An SSN is a 9-digit number,
> nothing more.  There are some 9-digit numbers which aren't valid SSN's,
> and you might want to get fancy and create a constraint for that.
>
> Regardless, you are making a *major* mistake of confusing data
> storage with rendering.  It is common to *render* an SSN as
> xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID) as
> xx-xxxxxxx.  To store the dashes makes no sense.  They're in the
> same place each time, it's wasted data.
>
> Store the SSN as an "integer".  When you begin to think about this
> correctly, the "leading zeros" problem disappears since that is also a
> *rendering* issue.

Well put.

The one thing that is a bit unfortunate is that 32 bit ints aren't
quite big enough for this.  You need 1 extra digit :-(.

> When you pull the data out, either fix it up in your programming
> language to the format that you wish, or use the to_char function as
> shown above in your select statements.

Using a view to hide the "physical" representation is also an idea.

A full scale type definition could make for an even more efficient
approach that makes the implementation appear invisible.

> To help you think about this whole issue, consider the timestamp
> datatype.  Timestamps are stored as a Julian date internally.  I
> suspect that they use a double-floating point as the actual format,
> but regardless the point is that it's a number.  Rather than storing

Actually, it's an "int64"; a 64 bit integer, on platforms that support
that type.  It's a "double" only on platforms that do not support that
type.

> It's easier to use that as a basic format from which we can render
> it in any way we wish.

Indeed.
-- 
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://cbbrowne.com/info/spiritual.html
"I owe the government $3400 in  taxes.  So I sent them two hammers and
a toilet seat."  -- Michael McShane


Re: User defined types -- Social Security number...

From
Dana Hudes
Date:
I would represent an SSN as numeric(9,0).
an int 32 would work though.
2**31 is > 999999999

On Sun, 29 Feb 2004, Christopher Browne wrote:

> The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney) wrote:
> > Look, you're thinking way too hard on this.  An SSN is a 9-digit number,
> > nothing more.  There are some 9-digit numbers which aren't valid SSN's,
> > and you might want to get fancy and create a constraint for that.
> >
> > Regardless, you are making a *major* mistake of confusing data
> > storage with rendering.  It is common to *render* an SSN as
> > xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID) as
> > xx-xxxxxxx.  To store the dashes makes no sense.  They're in the
> > same place each time, it's wasted data.
> >
> > Store the SSN as an "integer".  When you begin to think about this
> > correctly, the "leading zeros" problem disappears since that is also a
> > *rendering* issue.
> 
> Well put.
> 
> The one thing that is a bit unfortunate is that 32 bit ints aren't
> quite big enough for this.  You need 1 extra digit :-(.
> 
> > When you pull the data out, either fix it up in your programming
> > language to the format that you wish, or use the to_char function as
> > shown above in your select statements.
> 
> Using a view to hide the "physical" representation is also an idea.
> 
> A full scale type definition could make for an even more efficient
> approach that makes the implementation appear invisible.
> 
> > To help you think about this whole issue, consider the timestamp
> > datatype.  Timestamps are stored as a Julian date internally.  I
> > suspect that they use a double-floating point as the actual format,
> > but regardless the point is that it's a number.  Rather than storing
> 
> Actually, it's an "int64"; a 64 bit integer, on platforms that support
> that type.  It's a "double" only on platforms that do not support that
> type.
> 
> > It's easier to use that as a basic format from which we can render
> > it in any way we wish.
> 
> Indeed.
> 


Re: User defined types -- Social Security number...

From
Steve Crawford
Date:
On Monday 01 March 2004 8:54 am, Dana Hudes wrote:
> I would represent an SSN as numeric(9,0).
> an int 32 would work though.
> 2**31 is > 999999999
>
> On Sun, 29 Feb 2004, Christopher Browne wrote:
> > The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney) 
wrote:
> > > Look, you're thinking way too hard on this.  An SSN is a
> > > 9-digit number, nothing more.  There are some 9-digit numbers
> > > which aren't valid SSN's, and you might want to get fancy and
> > > create a constraint for that.
> > >
> > > Regardless, you are making a *major* mistake of confusing data
> > > storage with rendering.  It is common to *render* an SSN as
> > > xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID)
> > > as xx-xxxxxxx.  To store the dashes makes no sense.  They're in
> > > the same place each time, it's wasted data.

I missed the start of this thread but will chime in with a comment 
anyway.

My rule is to select an appropriate numeric type of data if you will 
be doing numeric types of things to it, character types if you will 
be doing character manipulations, etc.

I don't know of any good reasons to need to know SSN/6.9, sqrt(SSN), 
SSN+7.86 but there are plenty of good reasons to need the first three 
characters (the "area number"), the middle two characters (the "group  
number", and the last 4 characters (the "serial number", often 
(ab)used as a password for banking and other purposes).

While the dashes certainly don't need to be stored, they are not in 
arbitrary positions as they delimit the parts of the SSN noted above. 

One might even want to store the SSN in three columns, the first 
linked to a table of valid area numbers, the second error checked so 
"00" is not valid and so on or get even more fancy and error check 
against: http://www.ssa.gov/employer/highgroup.txt. It all depends on 
one's specific requirements.

Google and you will find SSN info pages such as: 
http://proagency.tripod.com/usasssearch.html
http://www.ssa.gov/foia/stateweb.html

Cheers,
Steve



Re: User defined types -- Social Security number...

From
Michael Chaney
Date:
On Sun, Feb 29, 2004 at 11:11:31PM -0500, Christopher Browne wrote:
> The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney) wrote:
> > Look, you're thinking way too hard on this.  An SSN is a 9-digit number,
> > nothing more.  There are some 9-digit numbers which aren't valid SSN's,
> > and you might want to get fancy and create a constraint for that.
> >
> > Regardless, you are making a *major* mistake of confusing data
> > storage with rendering.  It is common to *render* an SSN as
> > xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID) as
> > xx-xxxxxxx.  To store the dashes makes no sense.  They're in the
> > same place each time, it's wasted data.
> >
> > Store the SSN as an "integer".  When you begin to think about this
> > correctly, the "leading zeros" problem disappears since that is also a
> > *rendering* issue.
> 
> Well put.
> 
> The one thing that is a bit unfortunate is that 32 bit ints aren't
> quite big enough for this.  You need 1 extra digit :-(.

For what?  The largest SSN is 999,999,999, a signed 32-bit int goes to
just over 2,000,000,000.  Ever hear of a "4GB limit"?

Michael
-- 
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/


Re: User defined types -- Social Security number...

From
Michael Chaney
Date:
On Mon, Mar 01, 2004 at 09:42:48AM -0800, Steve Crawford wrote:
> I missed the start of this thread but will chime in with a comment 
> anyway.
> 
> My rule is to select an appropriate numeric type of data if you will 
> be doing numeric types of things to it, character types if you will 
> be doing character manipulations, etc.
> 
> I don't know of any good reasons to need to know SSN/6.9, sqrt(SSN), 
> SSN+7.86 but there are plenty of good reasons to need the first three 
> characters (the "area number"), the middle two characters (the "group  
> number", and the last 4 characters (the "serial number", often 
> (ab)used as a password for banking and other purposes).

Another excellent point.  I often store zip codes as text for this
reason.

The only other thing that I would mention is that if the SSN field in
the db will be a key of some sort, which is often the case, then it
might be more efficient to store it as an integer.  It might be more
efficient to store it as a character string.  The author should test
in this case to determine the most efficient way.

As for character vs. integer manipulations, in most scripting style
languages, which is pretty much exlusively what I use, there's no
need to think about types, and something like an SSN will silently
change between being character or integer depending on what operations
are being performed on it.

Michael
-- 
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/


Re: User defined types -- Social Security number...

From
Chris Browne
Date:
"Greg Patnude" <gpatnude@hotmail.com> writes:
> It would be really sweet in postgreSQL if we could apply the
> equivalent of a printf(columnname) to the table definition -- MS
> Access has what they call an "input mask" and it comes in really
> handy -- however -- I havent used Access for anthing serious for
> about 4 years...

Hmm.  

% man "create type"

NAME      CREATE TYPE - define a new data type

SYNOPSIS      CREATE TYPE name AS          ( attribute_name data_type [, ... ] )
      CREATE TYPE name (          INPUT = input_function,          OUTPUT = output_function          [ , RECEIVE =
receive_function]          [ , SEND = send_function ]          [ , INTERNALLENGTH = { internallength | VARIABLE } ]
    [ , PASSEDBYVALUE ]          [ , ALIGNMENT = alignment ]          [ , STORAGE = storage ]          [ , DEFAULT =
default]          [ , ELEMENT = element ]          [ , DELIMITER = delimiter ]      )
 

"input_function" and "output_function" sure _sound_ like what you're
looking for.

They would make your imagined issue of "missing leading zeros" go
away.

I think that CREATE TYPE has been around for a goodly few years now;
it is not impossible that this functionality has been around longer
than MS Access (which is saying something!).
-- 
output = reverse("gro.mca" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/x.html
Rules of the Evil Overlord #116.  "If I capture the hero's starship, I
will keep it in  the landing bay with the ramp down,  only a few token
guards on  duty and a ton  of explosives set to  go off as  soon as it
clears the blast-range." <http://www.eviloverlord.com/>