Thread: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Craig Ringer
Date:
Hi This must be a fairly common requirement, but either I don't know how to ask Google about it or there's not as much out there as I would've expected. I'm looking for a way to map the output from a monotonically increasing sequence (not necessarily gapless - ie a normal Pg SEQUENCE) into a fairly random different value in the availible space with a 1:1 input->output relationship. In other words, for the input "27" the output will always be the same (say 32 bit) number, and no other input will produce that output. Note that I'm *NOT* looking for a PRNG that takes the previous output as its input. That'd force me to use the same techniques as for a gapless sequence in Pg, with all the associated horror with locking and deadlocks, the performance issues, etc. Does anyone here know of a good algorithm to do this that doesn't just iterate `n' times through a PRNG with the same seed, but instead does a true non-colliding space mapping? If I find something good and there aren't any existing Pl/PgSQL implementations I'll post one for others' use, since I'm pretty sure it must come up a lot. You don't want your database to send out "invoice #1" or "customer #1" after all. (I'm also going to be looking for efficient ways to calculate effective check digits for arbitrary numbers within a certain range, too, and will post something for that, but that comes later). -- Craig Ringer
Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Steve Atkins
Date:
On Apr 30, 2009, at 1:54 AM, Craig Ringer wrote: > Hi > > This must be a fairly common requirement, but either I don't know > how to > ask Google about it or there's not as much out there as I would've > expected. > > I'm looking for a way to map the output from a monotonically > increasing > sequence (not necessarily gapless - ie a normal Pg SEQUENCE) into a > fairly random different value in the availible space with a 1:1 > input->output relationship. In other words, for the input "27" the > output will always be the same (say 32 bit) number, and no other input > will produce that output. > > Note that I'm *NOT* looking for a PRNG that takes the previous > output as > its input. That'd force me to use the same techniques as for a gapless > sequence in Pg, with all the associated horror with locking and > deadlocks, the performance issues, etc. > > Does anyone here know of a good algorithm to do this that doesn't just > iterate `n' times through a PRNG with the same seed, but instead > does a > true non-colliding space mapping? > > If I find something good and there aren't any existing Pl/PgSQL > implementations I'll post one for others' use, since I'm pretty sure > it > must come up a lot. You don't want your database to send out "invoice > #1" or "customer #1" after all. > > (I'm also going to be looking for efficient ways to calculate > effective > check digits for arbitrary numbers within a certain range, too, and > will > post something for that, but that comes later). XOR it with a constant. Or depending on your needs, just add a constant. Or shuffle bits. Or some combination of those. Cheers, Steve
Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Jasen Betts
Date:
On 2009-04-30, Craig Ringer <craig@postnewspapers.com.au> wrote: > Hi > > This must be a fairly common requirement, but either I don't know how to > ask Google about it or there's not as much out there as I would've expected. > > I'm looking for a way to map the output from a monotonically increasing > sequence (not necessarily gapless - ie a normal Pg SEQUENCE) into a > fairly random different value in the availible space with a 1:1 > input->output relationship. In other words, for the input "27" the > output will always be the same (say 32 bit) number, and no other input > will produce that output. so you want DEFAULT magic_func( nextval('foo_id_seq'::regclass) ) where magic_func is the 1:1 mapping and foo_id_seq is the sequence that feeds it. > Note that I'm *NOT* looking for a PRNG that takes the previous output as > its input. That'd force me to use the same techniques as for a gapless > sequence in Pg, with all the associated horror with locking and > deadlocks, the performance issues, etc. any good PRNG will have the 1:1 mapping you want, but fed sequential values they tend to produce predictable output. I suggest for magic_func you use a collection of bit-shifts, adds, and XORs then mask out the bits abouve 31 and use what's left. test and adjust if needed, > If I find something good and there aren't any existing Pl/PgSQL > implementations I'll post one for others' use, since I'm pretty sure it > must come up a lot. You don't want your database to send out "invoice > #1" or "customer #1" after all. to this end was pg_catalog.setvalue( sequence, value,TRUE) invented. > (I'm also going to be looking for efficient ways to calculate effective > check digits for arbitrary numbers within a certain range, too, and will > post something for that, but that comes later).
Re: Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Bill Moran
Date:
In response to Jasen Betts <jasen@xnet.co.nz>: > On 2009-04-30, Craig Ringer <craig@postnewspapers.com.au> wrote: > > Hi > > > > This must be a fairly common requirement, but either I don't know how to > > ask Google about it or there's not as much out there as I would've expected. > > > > I'm looking for a way to map the output from a monotonically increasing > > sequence (not necessarily gapless - ie a normal Pg SEQUENCE) into a > > fairly random different value in the availible space with a 1:1 > > input->output relationship. In other words, for the input "27" the > > output will always be the same (say 32 bit) number, and no other input > > will produce that output. > > so you want > > DEFAULT magic_func( nextval('foo_id_seq'::regclass) ) > > where magic_func is the 1:1 mapping and foo_id_seq is the sequence > that feeds it. Sounds like you're reinventing message digests ... > > If I find something good and there aren't any existing Pl/PgSQL > > implementations I'll post one for others' use, since I'm pretty sure it > > must come up a lot. You don't want your database to send out "invoice > > #1" or "customer #1" after all. Most of the systems I've seen like this do one of a few things: * Start with an arbitrary # like 1000 * Prepend the date (pretty common for invoice #s) like 20090501001 * Just start with #1 ... I mean, what's the big deal? -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
Re: Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Craig Ringer
Date:
Bill Moran wrote: > Sounds like you're reinventing message digests ... Message digests do not guarantee non-colliding output for a given input. They provide a result UNLIKELY to collide for any reasonable set of inputs. They need to produce quite large output values to do this effectively. Truncating the digest to fit the desired data type doesn't help, if you have to do this. A message digest is intended for use where the inputs are arbitrarily large and must be condensed down to a generally fixed-length small-ish value that should be very unlikely to collide for any two given inputs. What I'm looking for is a function that, given an input within a constrained range (say, a 32 bit integer) produces a different output within the same range. For any given input, the output should be the same each time, and for any given output there should only be one input that results in that output. So far, picking a suitable value to xor the input with seems like it'll be better than nothing, and good enough for the casual examination that's all I'm required to care about. So long as I don't call it "xor encryption" ... sigh. > Most of the systems I've seen like this do one of a few things: > * Start with an arbitrary # like 1000 > * Prepend the date (pretty common for invoice #s) like 20090501001 > * Just start with #1 ... I mean, what's the big deal? I'm not the one who cares. Alas, I've been given requirements to satisfy, and one of the major ones is that customer numbers in particular must be non-sequential (as close to random-looking as possible) and allocated across a large range. -- Craig Ringer
Re: Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Bill Moran
Date:
In response to Craig Ringer <craig@postnewspapers.com.au>: > Bill Moran wrote: > > > Sounds like you're reinventing message digests ... [snip your comments about why I was wrong about MDs working] > So long as I don't call it "xor encryption" ... sigh. > > > Most of the systems I've seen like this do one of a few things: > > * Start with an arbitrary # like 1000 > > * Prepend the date (pretty common for invoice #s) like 20090501001 > > * Just start with #1 ... I mean, what's the big deal? > > I'm not the one who cares. Alas, I've been given requirements to > satisfy, and one of the major ones is that customer numbers in > particular must be non-sequential (as close to random-looking as > possible) and allocated across a large range. Why not just grab random values for that one, then? Just generate a random value, check to ensure it doesn't already exist ... rinse and repeat if it's a duplicate. I mean, how often is this system adding new customers? Another trick with the invoice #s is to prepend them with a subset of the client ID. You could also use the same technique ... generate a random value, repeat if it already exists ... -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
Re: Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
"Daniel Verite"
Date:
Craig Ringer wrote: > What I'm looking for is a function that, given an input within a > constrained range (say, a 32 bit integer) produces a different > output within the same range. For any given input, the output > should be the same each time, and for any given output > there should only be one input that results in that output. That's a permutation, as used in symmetric ciphering. A proven way to build one is to use a Feistel network: http://en.wikipedia.org/wiki/Feistel_cipher In principle, the function used to encode the blocks uses a cipher key, but a pseudo-randomizing of the input is good enough when you're not interested in making it crypto-secure. Here is a plpgqsl implementation: CREATE OR REPLACE FUNCTION pseudo_encrypt(value int) returns bigint AS $$ DECLARE l1 int; l2 int; r1 int; r2 int; i int:=0; BEGIN l1:= (value >> 16) & 65535; r1:= value&65535; WHILE i<3 LOOP l2:=r1; r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int; l1:=l2; r1:=r2; i:=i+1; END LOOP; return ((l1::bigint<<16) + r1); END; $$ LANGUAGE plpgsql strict immutable; Note that it returns a bigint because we don't have unsigned integers in PG. If you're OK with getting negative values, the return type can be changed to int. Otherwise if you need a positive result that fits in 32 bits, it's possible to tweak the code to use 15 bits blocks instead of 16, but then the input will have to be less than 2^30. Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
Re: Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Erik Jones
Date:
On May 1, 2009, at 6:06 AM, Craig Ringer wrote: <snip> > What I'm looking for is a function that, given an input within a > constrained range (say, a 32 bit integer) produces a different output > within the same range. For any given input, the output should be the > same each time, and for any given output there should only be one > input > that results in that output. I think you drop the idea of a repeatable mapping you may have some success with the Knuth (aka Fisher-Yates) shuffle algorithm: http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle Why does anything need to be repeatable when you only need to make sure that each number is only generated once? Erik Jones, Database Administrator Engine Yard Support, Scalability, Reliability 866.518.9273 x 260 Location: US/Pacific IRC: mage2k
Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Jasen Betts
Date:
On 2009-05-03, Erik Jones <ejones@engineyard.com> wrote: >> What I'm looking for is a function that, given an input within a >> constrained range (say, a 32 bit integer) produces a different output >> within the same range. For any given input, the output should be the >> same each time, and for any given output there should only be one >> input >> that results in that output. > > I think you drop the idea of a repeatable mapping you may have some > success with the Knuth (aka Fisher-Yates) shuffle algorithm: http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle > Why does anything need to be repeatable when you only need to make > sure that each number is only generated once? That means storing a long list of numbers and doing queries similar to the following to get ne next value for the sequence. select id from idtable order by id limit 1 offset random(0, (select count (*) from idtable) a ramdom-looking 1:1 mapping is potentially much more efficient.
Re: Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Craig Ringer
Date:
Jasen Betts wrote: > That means storing a long list of numbers and doing queries similar to > the following to get ne next value for the sequence. > > select id from idtable > order by id > limit 1 > offset random(0, (select count (*) from idtable) > > a ramdom-looking 1:1 mapping is potentially much more efficient. You'd probably be better off generating it with something like: CREATE TABLE shuffled AS (n integer, s integer) AS SELECT n, NULL FROM generate_series(0, max_value) AS n; SELECT shuffle(); -- sets `s' for each `n' ... then querying it with: SELECT s FROM shuffled WHERE n = <value-wanted>; ... but you still have to generate, shuffle, and store a huge collection of values. -- Craig Ringer
Re: Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Craig Ringer
Date:
Daniel Verite wrote: > That's a permutation, as used in symmetric ciphering. A proven way to > build one is to use a Feistel network: Thanks. That looks to be pretty much what I'm after. > Here is a plpgqsl implementation: Wow. I really appreciate that. I'll have to test it out and chuck it on the wiki (if that's OK with you) for future use, alongside whatever I end up using for my check digit generator and verifier. Come to think of it, check digits are almost unnecessary; after all, in a large numeric space the chances of any misheard, typo'd, etc value happening to be another valid value is pretty minimal. A simple mod-10-of-sum-of-digits would probably be quite sufficient just to help the UI differentiate between "Whoops, that number was incorrectly entered or misheard" and "Customer not found". -- Craig Ringer
Re: Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Craig Ringer
Date:
Just to follow up on this with a look at check digit generation and checkin: Luhn's algorithm should do for the check digit, I think. It need not be anything complex given the chances for collision in the sample space. Additionally, it's commonly used, easily implemented and widely understood since it's used in credit card numbers among many other things. For anyone who later needs it, here's a handy verifier for the Luhn's Algorithm check digit, written as plain SQL functions with all integer-based computation (no string decomposition), along with a corresponding check digit generator and associated utility functions: CREATE OR REPLACE FUNCTION luhn_verify(int8) RETURNS boolean AS $$ SELECT -- Add the digits, doubling odd-numbered digits (counting left with -- least significant as zero), and see if the sum is evenly -- divisible by zero. MOD(SUM( -- Extract digit `n' counting left from least significant as zero MOD( ( $1::int8 / (10^n)::int8 ), 10::int8) -- Double odd-numbered digits * (MOD(n,2) + 1) ), 10) = 0 FROM generate_series(0, ceil(log( $1 ))::integer - 1) AS n; $$ LANGUAGE 'SQL' IMMUTABLE STRICT; COMMENT ON FUNCTION luhn_verify(int8) IS 'Return true iff the last digit of the input is a correct check digit for the rest of the input according to Luhn''s algorithm.' CREATE OR REPLACE FUNCTION luhn_generate_checkdigit(int8) RETURNS int8 AS $$ SELECT -- Add the digits, doubling even-numbered digits (counting left -- with least-significant as zero). Subtract the remainder of -- dividing the sum by 10 from 10, and take the remainder -- of dividing that by 10 in turn. MOD(10 - MOD(SUM( MOD( ($1::int8 / (10^n)::int8), 10::int8 ) * (2 - MOD(n,2)) -- double even digits ),10),10)::int8 FROM generate_series(0, ceil(log($1))::integer - 1) AS n; $$ LANGUAGE 'SQL' IMMUTABLE STRICT; COMMENT ON FUNCTION luhn_generate_checkdigit(int8) IS 'For the input value, generate a check digit according to Luhn''s algorithm'; CREATE OR REPLACE FUNCTION luhn_generate(int8) RETURNS int8 AS $$ SELECT 10 * $1 + luhn_generate_checkdigit($1); $$ LANGUAGE 'SQL' IMMUTABLE STRICT; COMMENT ON FUNCTION luhn_generate(int8) IS 'Append a check digit generated according to Luhn''s algorithm to the input value. The input value must be no greater than (maxbigint/10).'; CREATE OR REPLACE FUNCTION luhn_strip(int8) RETURNS int8 AS $$ SELECT $1 / 10; $$ LANGUAGE 'SQL' IMMUTABLE STRICT; COMMENT ON FUNCTION luhn_strip(int8) IS 'Strip the least significant digit from the input value. Intended for use when stripping the check digit from a number including a Luhn''s algorithm check digit.';
Re: Re: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?
From
Alban Hertroys
Date:
On May 3, 2009, at 9:00 AM, Craig Ringer wrote: > CREATE OR REPLACE FUNCTION luhn_verify(int8) RETURNS boolean AS $$ > SELECT > -- Add the digits, doubling odd-numbered digits (counting left with > -- least significant as zero), and see if the sum is evenly > -- divisible by zero. I think you mean divisible by 10 here, numbers are generally not divisible by zero ;) Regardless, thanks for posting these functions, I'm sure they'll come in handy some time. > > MOD(SUM( > -- Extract digit `n' counting left from least significant as > zero > MOD( ( $1::int8 / (10^n)::int8 ), 10::int8) > -- Double odd-numbered digits > * (MOD(n,2) + 1) > ), 10) = 0 > FROM generate_series(0, ceil(log( $1 ))::integer - 1) AS n; > $$ LANGUAGE 'SQL' > IMMUTABLE > STRICT; > > COMMENT ON FUNCTION luhn_verify(int8) IS 'Return true iff the last > digit > of the input is a correct check digit for the rest of the input > according to Luhn''s algorithm.' Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,49fd82b6129742129210600!
The Luhn algorithm implemention I posted earlier (upthread) is internally consistent and will verify checksums it created, but it is actually not a correct implementation of the Luhn algorithm. The earlier code added the doubled digits directly to the checksum, rather than adding each digit of the the doubled digits. Here's a corrected version that passes tests against other implementations in other languages. -- -- Luhn algorithm implementation by Craig Ringer -- in pure SQL (PostgreSQL function dialect, but -- should be easily adapted to other DBMSs). -- Note that this implementation is purely -- arithmetic; it avoids string manipulation entirely. -- -- See: http://en.wikipedia.org/wiki/Luhn_algorithm -- CREATE OR REPLACE FUNCTION luhn_verify(int8) RETURNS boolean AS $$ -- Take the sum of the -- doubled digits and the even-numbered undoubled digits, and see if -- the sum is evenly divisible by zero. SELECT -- Doubled digits might in turn be two digits. In that case, -- we must add each digit individually rather than adding the -- doubled digit value to the sum. Ie if the original digit was -- `6' the doubled result was `12' and we must add `1+2' to the -- sum rather than `12'. MOD(SUM(doubled_digit / INT8 '10' + doubled_digit % INT8 '10'), 10) = 0 FROM -- Double odd-numbered digits (counting left with -- least significant as zero). If the doubled digits end up -- having values -- > 10 (ie they're two digits), add their digits together. (SELECT -- Extract digit `n' counting left from least significant --as zero MOD( ( $1::int8 / (10^n)::int8 ), 10::int8) -- Double odd-numbered digits * (MOD(n,2) + 1) AS doubled_digit FROM generate_series(0, ceil(log( $1 ))::integer - 1) AS n ) AS doubled_digits; $$ LANGUAGE 'SQL' IMMUTABLE STRICT; COMMENT ON FUNCTION luhn_verify(int8) IS 'Return true iff the last digit of the input is a correct check digit for the rest of the input according to Luhn''s algorithm.'; CREATE OR REPLACE FUNCTION luhn_generate_checkdigit(int8) RETURNS int8 AS $$ SELECT -- Add the digits, doubling even-numbered digits (counting left -- with least-significant as zero). Subtract the remainder of -- dividing the sum by 10 from 10, and take the remainder -- of dividing that by 10 in turn. ((INT8 '10' - SUM(doubled_digit / INT8 '10' + doubled_digit % INT8 '10') % INT8 '10') % INT8 '10')::INT8 FROM (SELECT -- Extract digit `n' counting left from least significant\ -- as zero MOD( ($1::int8 / (10^n)::int8), 10::int8 ) -- double even-numbered digits * (2 - MOD(n,2)) AS doubled_digit FROM generate_series(0, ceil(log($1))::integer - 1) AS n ) AS doubled_digits; $$ LANGUAGE 'SQL' IMMUTABLE STRICT; COMMENT ON FUNCTION luhn_generate_checkdigit(int8) IS 'For the input value, generate a check digit according to Luhn''s algorithm'; CREATE OR REPLACE FUNCTION luhn_generate(int8) RETURNS int8 AS $$ SELECT 10 * $1 + luhn_generate_checkdigit($1); $$ LANGUAGE 'SQL' IMMUTABLE STRICT; COMMENT ON FUNCTION luhn_generate(int8) IS 'Append a check digit generated according to Luhn''s algorithm to the input value. The input value must be no greater than (maxbigint/10).'; CREATE OR REPLACE FUNCTION luhn_strip(int8) RETURNS int8 AS $$ SELECT $1 / 10; $$ LANGUAGE 'SQL' IMMUTABLE STRICT; COMMENT ON FUNCTION luhn_strip(int8) IS 'Strip the least significant digit from the input value. Intended for use when stripping the check digit from a number including a Luhn''s algorithm check digit.';
On Tue, May 12, 2009 at 02:54:29PM +0800, Craig Ringer wrote: > The Luhn algorithm implemention I posted earlier (upthread) is > internally consistent and will verify checksums it created, but it is > actually not a correct implementation of the Luhn algorithm. This looks like a great candidate for inclusion in the Snippets page <http://wiki.postgresql.org/wiki/Snippets> page, or possibly even the docs for SQL functions :) Cheers, David. > > The earlier code added the doubled digits directly to the checksum, > rather than adding each digit of the the doubled digits. > > Here's a corrected version that passes tests against other > implementations in other languages. > > -- > -- Luhn algorithm implementation by Craig Ringer > -- in pure SQL (PostgreSQL function dialect, but > -- should be easily adapted to other DBMSs). > -- Note that this implementation is purely > -- arithmetic; it avoids string manipulation entirely. > -- > -- See: http://en.wikipedia.org/wiki/Luhn_algorithm > -- > > CREATE OR REPLACE FUNCTION luhn_verify(int8) RETURNS boolean AS $$ > -- Take the sum of the > -- doubled digits and the even-numbered undoubled digits, and see if > -- the sum is evenly divisible by zero. > SELECT > -- Doubled digits might in turn be two digits. In that case, > -- we must add each digit individually rather than adding the > -- doubled digit value to the sum. Ie if the original digit was > -- `6' the doubled result was `12' and we must add `1+2' to the > -- sum rather than `12'. > MOD(SUM(doubled_digit / INT8 '10' + doubled_digit % INT8 '10'), > 10) = 0 > FROM > -- Double odd-numbered digits (counting left with > -- least significant as zero). If the doubled digits end up > -- having values > -- > 10 (ie they're two digits), add their digits together. > (SELECT > -- Extract digit `n' counting left from least significant > --as zero > MOD( ( $1::int8 / (10^n)::int8 ), 10::int8) > -- Double odd-numbered digits > * (MOD(n,2) + 1) > AS doubled_digit > FROM generate_series(0, ceil(log( $1 ))::integer - 1) AS n > ) AS doubled_digits; > > $$ LANGUAGE 'SQL' > IMMUTABLE > STRICT; > > COMMENT ON FUNCTION luhn_verify(int8) IS 'Return true iff the last digit > of the input is a correct check digit for the rest of the input > according to Luhn''s algorithm.'; > > CREATE OR REPLACE FUNCTION luhn_generate_checkdigit(int8) RETURNS int8 AS $$ > SELECT > -- Add the digits, doubling even-numbered digits (counting left > -- with least-significant as zero). Subtract the remainder of > -- dividing the sum by 10 from 10, and take the remainder > -- of dividing that by 10 in turn. > ((INT8 '10' - SUM(doubled_digit / INT8 '10' + doubled_digit % INT8 > '10') % INT8 '10') % INT8 '10')::INT8 > FROM (SELECT > -- Extract digit `n' counting left from least significant\ > -- as zero > MOD( ($1::int8 / (10^n)::int8), 10::int8 ) > -- double even-numbered digits > * (2 - MOD(n,2)) > AS doubled_digit > FROM generate_series(0, ceil(log($1))::integer - 1) AS n > ) AS doubled_digits; > > $$ LANGUAGE 'SQL' > IMMUTABLE > STRICT; > > COMMENT ON FUNCTION luhn_generate_checkdigit(int8) IS 'For the input > value, generate a check digit according to Luhn''s algorithm'; > > CREATE OR REPLACE FUNCTION luhn_generate(int8) RETURNS int8 AS $$ > SELECT 10 * $1 + luhn_generate_checkdigit($1); > $$ LANGUAGE 'SQL' > IMMUTABLE > STRICT; > > COMMENT ON FUNCTION luhn_generate(int8) IS 'Append a check digit > generated according to Luhn''s algorithm to the input value. The > input value must be no greater than (maxbigint/10).'; > > CREATE OR REPLACE FUNCTION luhn_strip(int8) RETURNS int8 AS $$ > SELECT $1 / 10; > $$ LANGUAGE 'SQL' > IMMUTABLE > STRICT; > > COMMENT ON FUNCTION luhn_strip(int8) IS 'Strip the least significant > digit from the input value. Intended for use when stripping the check > digit from a number including a Luhn''s algorithm check digit.'; > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Sat, 02 May 2009 11:26:28 +0200 "Daniel Verite" <daniel@manitou-mail.org> wrote: > Note that it returns a bigint because we don't have unsigned > integers in PG. If you're OK with getting negative values, the > return type can be changed to int. > Otherwise if you need a positive result that fits in 32 bits, it's > possible to tweak the code to use 15 bits blocks instead of 16, > but then the input will have to be less than 2^30. I need shorter values (because they should be easier to type. To be sure to modify the function in a sensible way I really would appreciate some pointer. Still if it return To further shrink the length of the result I was planning to to_hex (actually it would be nice to have a fast to_35base [0-9a-z])... but I wasn't able to find a way to convert back an hex string to an int. x'fff' seems to work just for literals. CREATE OR REPLACE FUNCTION pseudo_encrypt(value int) returns bigint AS $$ DECLARE l1 int; l2 int; r1 int; r2 int; i int:=0; BEGIN l1:= (value >> 16) & 65535; -- modifying here seems trivial r1:= value&65535; -- l1:= (value >> 15) & B'111111111111111'::int; -- r1:= value & B'111111111111111'::int; WHILE i<3 LOOP l2:=r1; r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int; -- but what about this? where does it come from? /* r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*B'11111111111111'::int)::int; */ -- ?? l1:=l2; r1:=r2; i:=i+1; END LOOP; return ((l1::bigint<<16) + r1); -- modifying here seems trivial END; $$ LANGUAGE plpgsql strict immutable; Anything else to suggest or copy from? -- Ivan Sergio Borgonovo http://www.webthatworks.it
Ivan Sergio Borgonovo wrote: > I need shorter values (because they should be easier to type. > To be sure to modify the function in a sensible way I really would > appreciate some pointer. > Still if it return What exactly is your desired range of output values? Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
Ivan Sergio Borgonovo wrote: > r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int; > -- but what about this? where does it come from? This function: (1366.0*r1+150889)%714025 implements a known method to get random numbers. I think it comes from "Numerical recipes" by William Press. Note that the algorithm is not tied to that function, it could be replaced by something else (especially one that involves a private key), but it has to be carefully chosen or the end result won't look so random. Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org
On Tue, 07 Jul 2009 12:07:48 +0200 "Daniel Verite" <daniel@manitou-mail.org> wrote: > Ivan Sergio Borgonovo wrote: > > > r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int; > > -- but what about this? where does it come from? > > This function: > (1366.0*r1+150889)%714025 > implements a known method to get random numbers. I think it comes > from "Numerical recipes" by William Press. > Note that the algorithm is not tied to that function, it could be > replaced by something else (especially one that involves a private > key), but it has to be carefully chosen or the end result won't > look so random. I don't get the 1366.0 and the 714025.0. Writing 1366.0 isn't going to use float arithmetic? Is it there just to avoid an overflow? I'm going to see if using bigint is going to make any difference in speed. Finally... if I were (and I'm not) interested in using 30 bit, should I turn that *32767 into a *16383? For shift and bit mask it looks more obvious. Do you remember the name of this particular F? Since I don't see anything other than to_hex that could "shorten" an int to a string easily and quickly... it seems that returning a signed integer is OK. Everything else seems to need more processing at no real added value. Turning the int into base 32 [0-9A-N] with plpgsql looks expensive just to shorten the string to 4 char. Thanks. -- Ivan Sergio Borgonovo http://www.webthatworks.it
Ivan Sergio Borgonovo wrote: > I don't get the 1366.0 and the 714025.0. > Writing 1366.0 isn't going to use float arithmetic? Yes, that's on purpose. Now that you mention it, I think that 1366.0 could be an integer instead, but the division by 714025 and multiplication by 32767 have to be floating point operations. > I'm going to see if using bigint is going to make any difference in > speed. If you're after more speed, using the C language may be the solution. > Finally... if I were (and I'm not) interested in using 30 bit, > should I turn that *32767 into a *16383? > For shift and bit mask it looks more obvious. To generate a 31 bits (positive) result from a 30 bits input, I would modify the initialisation of the 16 bits blocks so that each of them has the most significant bit set to 0, but without loosing any of the 30 bits. The MSB bits have to be kept at 0 throughout the algorithm. So I'd to that: l1:= ((value >> 16) & 16383) | (value&32768); r1:= value&32767; and indeed reduce the output range of the function: r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*16383)::int; the rest being identical excepts that it could now return int (and it would be unsigned) instead of bigint. I haven't tested that variant, though. > Do you remember the name of this particular F? Not sure it has a specific name, but you'll find related stuff by searching for "linear congruential generator". > Everything else seems to need more processing at no real added value. > Turning the int into base 32 [0-9A-N] with plpgsql looks expensive > just to shorten the string to 4 char. Note that 4 chars would cover a range of 32^4, which is only about one million different values. I think you'd need 7 chars to express up to 2^31 in base 32, because 32^7 < 2^31 < 32^6 Best regards, -- Daniel PostgreSQL-powered mail user agent and storage: http://www.manitou-mail.org