Thread: User defined types -- Social Security number...
Anyone have a good pre-built user-defined type definition for creating / maintaining / manipulating a SSN ... where valid chars are in the range 000-00-0000 through 999-99-9999. I imagine that the storage column is probably varchar(11) -- I am looking for a type definition that 1) checks that all numbers are in the range [0..9] 2) inserts the dashes at position 3) accepts either '000123456'OR '000-12-3456' as the input 4) leading zeroes (as in the above) must be stored -- hence varchar(11) instead of numeric 5) always outputs in thje format '000-12-3456' Either that or the question is: How can I coerce postgreSQL into using an input / output "mask"... -- Greg Patnude / The Digital Demention 2916 East Upper Hayden Lake Road Hayden Lake, ID 83835 (208) 762-0762
Greg, > Anyone have a good pre-built user-defined type definition for creating / > maintaining / manipulating a SSN ... where valid chars are in the range > 000-00-0000 through 999-99-9999. Actually, the range is more narrowly defined than that. I'm not sure of the exact rules, but you will never see a leading 0 or a -00- in an SSN. > I imagine that the storage column is probably varchar(11) -- I am looking > for a type definition that Use DOMAINs, not a custom type. It's less work. > Either that or the question is: How can I coerce postgreSQL into using an > input / output "mask"... After you've created your DOMAIN, based on the TEXT type, you can overload the input and output functions to format correctly. Beware, though: input & output functions pretty much have to be written in C. -- Josh Berkus Aglio Database Solutions San Francisco
In the last exciting episode, "Greg Patnude" <gpatnude@hotmail.com> wrote: > Thanks Josh -- I understand that there are valid and invalid SSN's -- > similar rules apply to zip codes and area codes... > > I tried this: > > SELECT to_char(123456789, '000-00-0000'); > which yields 123-45-6789 -- nicely, I might add... > > the trick is getting postgreSQL to do this without having to create an ON > SELECT and ON UPDATE TRIGGER... > > an ON UPDATE, SELECT, and INSERT re-write RULE might do the trick... > > SSN's CAN in fact start with a leading 0 -- mine does -- "041-xx-xxxx" -- > > I do agree that there are valid ranges -- my main concern is being able to > store any leading zeros - I just need to make sure that something "looks" > like a valid SSN in the formattig > (nnn-nn-nnnn) and that I can store / retrieve it with the approoriate > format -- what I am really trying to accomplish is an "input mask"... > > I hadn't considered using a Domain.... have to look at that.... Strongly recommended; that allows applying the validation in many places without having to repeat validation "code." If you will be using really a lot of these values, and indexing on them, it even may be worth looking at a custom type. A performance "win" would come in using a compact data type. For instance, for 9 digit national ID numbers, you can do a LOT better than an 11 byte string. (Aside: Anything bigger than 34 bits would do, demonstrating that it is a regrettable loss that 36 bit computer systems went the way of the dodo...) -- output = reverse("gro.gultn" "@" "enworbbc") http://www.ntlug.org/~cbbrowne/internet.html I found out why cats drink out of the toilet. My mother told me it's because it's cold in there. And I'm like: How did my mother know THAT? --Wendy Liebman
Josh Berkus wrote: > Greg, > > > Anyone have a good pre-built user-defined type definition for creating / > > maintaining / manipulating a SSN ... where valid chars are in the range > > 000-00-0000 through 999-99-9999. > > Actually, the range is more narrowly defined than that. I'm not sure of the > exact rules, but you will never see a leading 0 or a -00- in an SSN. Actually I know someone who's SSN starts as 003. Agreed on the DOMAIN suggestion. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Thanks Josh -- I understand that there are valid and invalid SSN's -- similar rules apply to zip codes and area codes... I tried this: SELECT to_char(123456789, '000-00-0000'); which yields 123-45-6789 -- nicely, I might add... the trick is getting postgreSQL to do this without having to create an ON SELECT and ON UPDATE TRIGGER... an ON UPDATE, SELECT, and INSERT re-write RULE might do the trick... SSN's CAN in fact start with a leading 0 -- mine does -- "041-xx-xxxx" -- I do agree that there are valid ranges -- my main concern is being able to store any leading zeros - I just need to make sure that something "looks" like a valid SSN in the formattig (nnn-nn-nnnn) and that I can store / retrieve it with the approoriate format -- what I am really trying to accomplish is an "input mask"... I hadn't considered using a Domain.... have to look at that.... -- Greg Patnude / The Digital Demention 2916 East Upper Hayden Lake Road Hayden Lake, ID 83835 (208) 762-0762 "Josh Berkus" <josh@agliodbs.com> wrote in message news:200402221132.50311.josh@agliodbs.com... > Greg, > > > Anyone have a good pre-built user-defined type definition for creating / > > maintaining / manipulating a SSN ... where valid chars are in the range > > 000-00-0000 through 999-99-9999. > > Actually, the range is more narrowly defined than that. I'm not sure of the > exact rules, but you will never see a leading 0 or a -00- in an SSN. > > > I imagine that the storage column is probably varchar(11) -- I am looking > > for a type definition that > > Use DOMAINs, not a custom type. It's less work. > > > Either that or the question is: How can I coerce postgreSQL into using an > > input / output "mask"... > > After you've created your DOMAIN, based on the TEXT type, you can overload the > input and output functions to format correctly. Beware, though: input & > output functions pretty much have to be written in C. > > -- > Josh Berkus > Aglio Database Solutions > San Francisco > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend >
On Sun, Feb 22, 2004 at 04:45:51PM -0800, Greg Patnude wrote: > Thanks Josh -- I understand that there are valid and invalid SSN's -- > similar rules apply to zip codes and area codes... > > I tried this: > > SELECT to_char(123456789, '000-00-0000'); > which yields 123-45-6789 -- nicely, I might add... > > the trick is getting postgreSQL to do this without having to create an > ON > SELECT and ON UPDATE TRIGGER... > > an ON UPDATE, SELECT, and INSERT re-write RULE might do the trick... > > SSN's CAN in fact start with a leading 0 -- mine does -- "041-xx-xxxx" > -- > > I do agree that there are valid ranges -- my main concern is being > able to > store any leading zeros - I just need to make sure that something > "looks" > like a valid SSN in the formattig > (nnn-nn-nnnn) and that I can store / retrieve it with the approoriate > format -- what I am really trying to accomplish is an "input mask"... Look, you're thinking way too hard on this. An SSN is a 9-digit number, nothing more. There are some 9-digit numbers which aren't valid SSN's, and you might want to get fancy and create a constraint for that. Regardless, you are making a *major* mistake of confusing data storage with rendering. It is common to *render* an SSN as xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID) as xx-xxxxxxx. To store the dashes makes no sense. They're in the same place each time, it's wasted data. Store the SSN as an "integer". When you begin to think about this correctly, the "leading zeros" problem disappears since that is also a *rendering* issue. When you pull the data out, either fix it up in your programming language to the format that you wish, or use the to_char function as shown above in your select statements. To help you think about this whole issue, consider the timestamp datatype. Timestamps are stored as a Julian date internally. I suspect that they use a double-floating point as the actual format, but regardless the point is that it's a number. Rather than storing 2004-02-29 21:14:27.030434-06 We store: 2453065.88503472 It's easier to use that as a basic format from which we can render it in any way we wish. The same applies to your SSN. Michael -- Michael Darrin Chaney mdchaney@michaelchaney.com http://www.michaelchaney.com/
The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney) wrote: > Look, you're thinking way too hard on this. An SSN is a 9-digit number, > nothing more. There are some 9-digit numbers which aren't valid SSN's, > and you might want to get fancy and create a constraint for that. > > Regardless, you are making a *major* mistake of confusing data > storage with rendering. It is common to *render* an SSN as > xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID) as > xx-xxxxxxx. To store the dashes makes no sense. They're in the > same place each time, it's wasted data. > > Store the SSN as an "integer". When you begin to think about this > correctly, the "leading zeros" problem disappears since that is also a > *rendering* issue. Well put. The one thing that is a bit unfortunate is that 32 bit ints aren't quite big enough for this. You need 1 extra digit :-(. > When you pull the data out, either fix it up in your programming > language to the format that you wish, or use the to_char function as > shown above in your select statements. Using a view to hide the "physical" representation is also an idea. A full scale type definition could make for an even more efficient approach that makes the implementation appear invisible. > To help you think about this whole issue, consider the timestamp > datatype. Timestamps are stored as a Julian date internally. I > suspect that they use a double-floating point as the actual format, > but regardless the point is that it's a number. Rather than storing Actually, it's an "int64"; a 64 bit integer, on platforms that support that type. It's a "double" only on platforms that do not support that type. > It's easier to use that as a basic format from which we can render > it in any way we wish. Indeed. -- If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me http://cbbrowne.com/info/spiritual.html "I owe the government $3400 in taxes. So I sent them two hammers and a toilet seat." -- Michael McShane
I would represent an SSN as numeric(9,0). an int 32 would work though. 2**31 is > 999999999 On Sun, 29 Feb 2004, Christopher Browne wrote: > The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney) wrote: > > Look, you're thinking way too hard on this. An SSN is a 9-digit number, > > nothing more. There are some 9-digit numbers which aren't valid SSN's, > > and you might want to get fancy and create a constraint for that. > > > > Regardless, you are making a *major* mistake of confusing data > > storage with rendering. It is common to *render* an SSN as > > xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID) as > > xx-xxxxxxx. To store the dashes makes no sense. They're in the > > same place each time, it's wasted data. > > > > Store the SSN as an "integer". When you begin to think about this > > correctly, the "leading zeros" problem disappears since that is also a > > *rendering* issue. > > Well put. > > The one thing that is a bit unfortunate is that 32 bit ints aren't > quite big enough for this. You need 1 extra digit :-(. > > > When you pull the data out, either fix it up in your programming > > language to the format that you wish, or use the to_char function as > > shown above in your select statements. > > Using a view to hide the "physical" representation is also an idea. > > A full scale type definition could make for an even more efficient > approach that makes the implementation appear invisible. > > > To help you think about this whole issue, consider the timestamp > > datatype. Timestamps are stored as a Julian date internally. I > > suspect that they use a double-floating point as the actual format, > > but regardless the point is that it's a number. Rather than storing > > Actually, it's an "int64"; a 64 bit integer, on platforms that support > that type. It's a "double" only on platforms that do not support that > type. > > > It's easier to use that as a basic format from which we can render > > it in any way we wish. > > Indeed. >
On Monday 01 March 2004 8:54 am, Dana Hudes wrote: > I would represent an SSN as numeric(9,0). > an int 32 would work though. > 2**31 is > 999999999 > > On Sun, 29 Feb 2004, Christopher Browne wrote: > > The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney) wrote: > > > Look, you're thinking way too hard on this. An SSN is a > > > 9-digit number, nothing more. There are some 9-digit numbers > > > which aren't valid SSN's, and you might want to get fancy and > > > create a constraint for that. > > > > > > Regardless, you are making a *major* mistake of confusing data > > > storage with rendering. It is common to *render* an SSN as > > > xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID) > > > as xx-xxxxxxx. To store the dashes makes no sense. They're in > > > the same place each time, it's wasted data. I missed the start of this thread but will chime in with a comment anyway. My rule is to select an appropriate numeric type of data if you will be doing numeric types of things to it, character types if you will be doing character manipulations, etc. I don't know of any good reasons to need to know SSN/6.9, sqrt(SSN), SSN+7.86 but there are plenty of good reasons to need the first three characters (the "area number"), the middle two characters (the "group number", and the last 4 characters (the "serial number", often (ab)used as a password for banking and other purposes). While the dashes certainly don't need to be stored, they are not in arbitrary positions as they delimit the parts of the SSN noted above. One might even want to store the SSN in three columns, the first linked to a table of valid area numbers, the second error checked so "00" is not valid and so on or get even more fancy and error check against: http://www.ssa.gov/employer/highgroup.txt. It all depends on one's specific requirements. Google and you will find SSN info pages such as: http://proagency.tripod.com/usasssearch.html http://www.ssa.gov/foia/stateweb.html Cheers, Steve
On Sun, Feb 29, 2004 at 11:11:31PM -0500, Christopher Browne wrote: > The world rejoiced as mdchaney@michaelchaney.com (Michael Chaney) wrote: > > Look, you're thinking way too hard on this. An SSN is a 9-digit number, > > nothing more. There are some 9-digit numbers which aren't valid SSN's, > > and you might want to get fancy and create a constraint for that. > > > > Regardless, you are making a *major* mistake of confusing data > > storage with rendering. It is common to *render* an SSN as > > xxx-xx-xxxx and its cousin the FETID (Federal Employers Tax ID) as > > xx-xxxxxxx. To store the dashes makes no sense. They're in the > > same place each time, it's wasted data. > > > > Store the SSN as an "integer". When you begin to think about this > > correctly, the "leading zeros" problem disappears since that is also a > > *rendering* issue. > > Well put. > > The one thing that is a bit unfortunate is that 32 bit ints aren't > quite big enough for this. You need 1 extra digit :-(. For what? The largest SSN is 999,999,999, a signed 32-bit int goes to just over 2,000,000,000. Ever hear of a "4GB limit"? Michael -- Michael Darrin Chaney mdchaney@michaelchaney.com http://www.michaelchaney.com/
On Mon, Mar 01, 2004 at 09:42:48AM -0800, Steve Crawford wrote: > I missed the start of this thread but will chime in with a comment > anyway. > > My rule is to select an appropriate numeric type of data if you will > be doing numeric types of things to it, character types if you will > be doing character manipulations, etc. > > I don't know of any good reasons to need to know SSN/6.9, sqrt(SSN), > SSN+7.86 but there are plenty of good reasons to need the first three > characters (the "area number"), the middle two characters (the "group > number", and the last 4 characters (the "serial number", often > (ab)used as a password for banking and other purposes). Another excellent point. I often store zip codes as text for this reason. The only other thing that I would mention is that if the SSN field in the db will be a key of some sort, which is often the case, then it might be more efficient to store it as an integer. It might be more efficient to store it as a character string. The author should test in this case to determine the most efficient way. As for character vs. integer manipulations, in most scripting style languages, which is pretty much exlusively what I use, there's no need to think about types, and something like an SSN will silently change between being character or integer depending on what operations are being performed on it. Michael -- Michael Darrin Chaney mdchaney@michaelchaney.com http://www.michaelchaney.com/
"Greg Patnude" <gpatnude@hotmail.com> writes: > It would be really sweet in postgreSQL if we could apply the > equivalent of a printf(columnname) to the table definition -- MS > Access has what they call an "input mask" and it comes in really > handy -- however -- I havent used Access for anthing serious for > about 4 years... Hmm. % man "create type" NAME CREATE TYPE - define a new data type SYNOPSIS CREATE TYPE name AS ( attribute_name data_type [, ... ] ) CREATE TYPE name ( INPUT = input_function, OUTPUT = output_function [ , RECEIVE = receive_function] [ , SEND = send_function ] [ , INTERNALLENGTH = { internallength | VARIABLE } ] [ , PASSEDBYVALUE ] [ , ALIGNMENT = alignment ] [ , STORAGE = storage ] [ , DEFAULT = default] [ , ELEMENT = element ] [ , DELIMITER = delimiter ] ) "input_function" and "output_function" sure _sound_ like what you're looking for. They would make your imagined issue of "missing leading zeros" go away. I think that CREATE TYPE has been around for a goodly few years now; it is not impossible that this functionality has been around longer than MS Access (which is saying something!). -- output = reverse("gro.mca" "@" "enworbbc") http://www3.sympatico.ca/cbbrowne/x.html Rules of the Evil Overlord #116. "If I capture the hero's starship, I will keep it in the landing bay with the ramp down, only a few token guards on duty and a ton of explosives set to go off as soon as it clears the blast-range." <http://www.eviloverlord.com/>