Thread: Mapping output from a SEQUENCE into something non-repeating/colliding but random-looking?

Hi

This must be a fairly common requirement, but either I don't know how to
ask Google about it or there's not as much out there as I would've expected.

I'm looking for a way to map the output from a monotonically increasing
sequence (not necessarily gapless - ie a normal Pg SEQUENCE) into a
fairly random different value in the availible space with a 1:1
input->output relationship. In other words, for the input "27" the
output will always be the same (say 32 bit) number, and no other input
will produce that output.

Note that I'm *NOT* looking for a PRNG that takes the previous output as
its input. That'd force me to use the same techniques as for a gapless
sequence in Pg, with all the associated horror with locking and
deadlocks, the performance issues, etc.

Does anyone here know of a good algorithm to do this that doesn't just
iterate `n' times through a PRNG with the same seed, but instead does a
true non-colliding space mapping?

If I find something good and there aren't any existing Pl/PgSQL
implementations I'll post one for others' use, since I'm pretty sure it
must come up a lot. You don't want your database to send out "invoice
#1" or "customer #1" after all.

(I'm also going to be looking for efficient ways to calculate effective
check digits for arbitrary numbers within a certain range, too, and will
post something for that, but that comes later).

--
Craig Ringer

On Apr 30, 2009, at 1:54 AM, Craig Ringer wrote:

> Hi
>
> This must be a fairly common requirement, but either I don't know
> how to
> ask Google about it or there's not as much out there as I would've
> expected.
>
> I'm looking for a way to map the output from a monotonically
> increasing
> sequence (not necessarily gapless - ie a normal Pg SEQUENCE) into a
> fairly random different value in the availible space with a 1:1
> input->output relationship. In other words, for the input "27" the
> output will always be the same (say 32 bit) number, and no other input
> will produce that output.
>
> Note that I'm *NOT* looking for a PRNG that takes the previous
> output as
> its input. That'd force me to use the same techniques as for a gapless
> sequence in Pg, with all the associated horror with locking and
> deadlocks, the performance issues, etc.
>
> Does anyone here know of a good algorithm to do this that doesn't just
> iterate `n' times through a PRNG with the same seed, but instead
> does a
> true non-colliding space mapping?
>
> If I find something good and there aren't any existing Pl/PgSQL
> implementations I'll post one for others' use, since I'm pretty sure
> it
> must come up a lot. You don't want your database to send out "invoice
> #1" or "customer #1" after all.
>
> (I'm also going to be looking for efficient ways to calculate
> effective
> check digits for arbitrary numbers within a certain range, too, and
> will
> post something for that, but that comes later).

XOR it with a constant. Or depending on your needs, just add a constant.
Or shuffle bits. Or some combination of those.

Cheers,
   Steve


On 2009-04-30, Craig Ringer <craig@postnewspapers.com.au> wrote:
> Hi
>
> This must be a fairly common requirement, but either I don't know how to
> ask Google about it or there's not as much out there as I would've expected.
>
> I'm looking for a way to map the output from a monotonically increasing
> sequence (not necessarily gapless - ie a normal Pg SEQUENCE) into a
> fairly random different value in the availible space with a 1:1
> input->output relationship. In other words, for the input "27" the
> output will always be the same (say 32 bit) number, and no other input
> will produce that output.

so you want

  DEFAULT magic_func( nextval('foo_id_seq'::regclass) )

where magic_func is the 1:1 mapping and foo_id_seq is the sequence
that feeds it.

> Note that I'm *NOT* looking for a PRNG that takes the previous output as
> its input. That'd force me to use the same techniques as for a gapless
> sequence in Pg, with all the associated horror with locking and
> deadlocks, the performance issues, etc.

any good PRNG will have the 1:1 mapping you want, but fed sequential
values they tend to produce predictable output.

I suggest for magic_func you use a collection of bit-shifts, adds, and
XORs then mask out the bits abouve 31  and use what's left.

test and adjust if needed,

> If I find something good and there aren't any existing Pl/PgSQL
> implementations I'll post one for others' use, since I'm pretty sure it
> must come up a lot. You don't want your database to send out "invoice
> #1" or "customer #1" after all.

to this end was pg_catalog.setvalue( sequence, value,TRUE)
invented.

> (I'm also going to be looking for efficient ways to calculate effective
> check digits for arbitrary numbers within a certain range, too, and will
> post something for that, but that comes later).


In response to Jasen Betts <jasen@xnet.co.nz>:

> On 2009-04-30, Craig Ringer <craig@postnewspapers.com.au> wrote:
> > Hi
> >
> > This must be a fairly common requirement, but either I don't know how to
> > ask Google about it or there's not as much out there as I would've expected.
> >
> > I'm looking for a way to map the output from a monotonically increasing
> > sequence (not necessarily gapless - ie a normal Pg SEQUENCE) into a
> > fairly random different value in the availible space with a 1:1
> > input->output relationship. In other words, for the input "27" the
> > output will always be the same (say 32 bit) number, and no other input
> > will produce that output.
>
> so you want
>
>   DEFAULT magic_func( nextval('foo_id_seq'::regclass) )
>
> where magic_func is the 1:1 mapping and foo_id_seq is the sequence
> that feeds it.

Sounds like you're reinventing message digests ...

> > If I find something good and there aren't any existing Pl/PgSQL
> > implementations I'll post one for others' use, since I'm pretty sure it
> > must come up a lot. You don't want your database to send out "invoice
> > #1" or "customer #1" after all.

Most of the systems I've seen like this do one of a few things:
* Start with an arbitrary # like 1000
* Prepend the date (pretty common for invoice #s) like 20090501001
* Just start with #1 ... I mean, what's the big deal?

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Bill Moran wrote:

> Sounds like you're reinventing message digests ...

Message digests do not guarantee non-colliding output for a given input.

They provide a result UNLIKELY to collide for any reasonable set of
inputs. They need to produce quite large output values to do this
effectively. Truncating the digest to fit the desired data type doesn't
help, if you have to do this.

A message digest is intended for use where the inputs are arbitrarily
large and must be condensed down to a generally fixed-length small-ish
value that should be very unlikely to collide for any two given inputs.

What I'm looking for is a function that, given an input within a
constrained range (say, a 32 bit integer) produces a different output
within the same range. For any given input, the output should be the
same each time, and for any given output there should only be one input
that results in that output.

So far, picking a suitable value to xor the input with seems like it'll
be better than nothing, and good enough for the casual examination
that's all I'm required to care about.

So long as I don't call it "xor encryption" ... sigh.

> Most of the systems I've seen like this do one of a few things:
> * Start with an arbitrary # like 1000
> * Prepend the date (pretty common for invoice #s) like 20090501001
> * Just start with #1 ... I mean, what's the big deal?

I'm not the one who cares. Alas, I've been given requirements to
satisfy, and one of the major ones is that customer numbers in
particular must be non-sequential (as close to random-looking as
possible) and allocated across a large range.

--
Craig Ringer

In response to Craig Ringer <craig@postnewspapers.com.au>:

> Bill Moran wrote:
>
> > Sounds like you're reinventing message digests ...

[snip your comments about why I was wrong about MDs working]

> So long as I don't call it "xor encryption" ... sigh.
>
> > Most of the systems I've seen like this do one of a few things:
> > * Start with an arbitrary # like 1000
> > * Prepend the date (pretty common for invoice #s) like 20090501001
> > * Just start with #1 ... I mean, what's the big deal?
>
> I'm not the one who cares. Alas, I've been given requirements to
> satisfy, and one of the major ones is that customer numbers in
> particular must be non-sequential (as close to random-looking as
> possible) and allocated across a large range.

Why not just grab random values for that one, then?  Just generate
a random value, check to ensure it doesn't already exist ... rinse
and repeat if it's a duplicate.  I mean, how often is this system
adding new customers?

Another trick with the invoice #s is to prepend them with a subset
of the client ID.  You could also use the same technique ... generate
a random value, repeat if it already exists ...

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

    Craig Ringer wrote:

> What I'm looking for is a function that, given an input within a
> constrained range (say, a 32 bit integer) produces a different
> output within the same range. For any given input, the output
> should be the same each time, and for any given output
> there should only be one input that results in that output.

That's a permutation, as used in symmetric ciphering. A proven way to
build one is to use a Feistel network:
http://en.wikipedia.org/wiki/Feistel_cipher
In principle, the function used to encode the blocks uses a cipher key,
but a pseudo-randomizing of the input is good enough when you're not
interested in making it crypto-secure.
Here is a plpgqsl implementation:

CREATE OR REPLACE FUNCTION pseudo_encrypt(value int) returns bigint AS
$$
DECLARE
 l1 int;
 l2 int;
 r1 int;
 r2 int;
 i int:=0;
BEGIN
  l1:= (value >> 16) & 65535;
  r1:= value&65535;
  WHILE i<3 LOOP
    l2:=r1;
    r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int;
    l1:=l2;
    r1:=r2;
    i:=i+1;
  END LOOP;
  return ((l1::bigint<<16) + r1);
END;
$$ LANGUAGE plpgsql strict immutable;

Note that it returns a bigint because we don't have unsigned integers
in PG. If you're OK with getting negative values, the return type can
be changed to int.
Otherwise if you need a positive result that fits in 32 bits, it's
possible to tweak the code to use 15 bits blocks instead of 16, but
then the input will have to be less than 2^30.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org

On May 1, 2009, at 6:06 AM, Craig Ringer wrote:

<snip>

> What I'm looking for is a function that, given an input within a
> constrained range (say, a 32 bit integer) produces a different output
> within the same range. For any given input, the output should be the
> same each time, and for any given output there should only be one
> input
> that results in that output.

I think you drop the idea of a repeatable mapping you may have some
success with the Knuth (aka Fisher-Yates) shuffle algorithm:  http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

   Why does anything need to be repeatable when you only need to make
sure that each number is only generated once?

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k






On 2009-05-03, Erik Jones <ejones@engineyard.com> wrote:

>> What I'm looking for is a function that, given an input within a
>> constrained range (say, a 32 bit integer) produces a different output
>> within the same range. For any given input, the output should be the
>> same each time, and for any given output there should only be one
>> input
>> that results in that output.
>
> I think you drop the idea of a repeatable mapping you may have some
> success with the Knuth (aka Fisher-Yates) shuffle algorithm:
http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle 
>    Why does anything need to be repeatable when you only need to make
> sure that each number is only generated once?

That means storing a long list of numbers and doing queries similar to
the following to get ne next value for the sequence.

select id from idtable
  order by id
  limit 1
  offset random(0, (select count (*) from idtable)

a ramdom-looking  1:1 mapping is potentially much more efficient.


Jasen Betts wrote:

> That means storing a long list of numbers and doing queries similar to
> the following to get ne next value for the sequence.
>
> select id from idtable
>   order by id
>   limit 1
>   offset random(0, (select count (*) from idtable)
>
> a ramdom-looking  1:1 mapping is potentially much more efficient.

You'd probably be better off generating it with something like:

CREATE TABLE shuffled AS (n integer, s integer)
AS SELECT n, NULL FROM generate_series(0, max_value) AS n;

SELECT shuffle(); -- sets `s' for each `n'

... then querying it with:

SELECT s FROM shuffled WHERE n = <value-wanted>;

... but you still have to generate, shuffle, and store a huge collection
of values.

--
Craig Ringer

Daniel Verite wrote:

> That's a permutation, as used in symmetric ciphering. A proven way to
> build one is to use a Feistel network:

Thanks. That looks to be pretty much what I'm after.

> Here is a plpgqsl implementation:

Wow. I really appreciate that. I'll have to test it out and chuck it on
the wiki (if that's OK with you) for future use, alongside whatever I
end up using for my check digit generator and verifier.

Come to think of it, check digits are almost unnecessary; after all, in
a large numeric space the chances of any misheard, typo'd, etc value
happening to be another valid value is pretty minimal. A simple
mod-10-of-sum-of-digits would probably be quite sufficient just to help
the UI differentiate between "Whoops, that number was incorrectly
entered or misheard" and "Customer not found".

--
Craig Ringer

Just to follow up on this with a look at check digit generation and checkin:

Luhn's algorithm should do for the check digit, I think. It need not be
anything complex given the chances for collision in the sample space.
Additionally, it's commonly used, easily implemented and widely
understood since it's used in credit card numbers among many other things.

For anyone who later needs it, here's a handy verifier for the Luhn's
Algorithm check digit, written as plain SQL functions with all
integer-based computation (no string decomposition), along with a
corresponding check digit generator and associated utility functions:


CREATE OR REPLACE FUNCTION luhn_verify(int8) RETURNS boolean AS $$
SELECT
    -- Add the digits, doubling odd-numbered digits (counting left with
    -- least significant as zero), and see if the sum is evenly
    -- divisible by zero.
    MOD(SUM(
       -- Extract digit `n' counting left from least significant as zero
       MOD( ( $1::int8 / (10^n)::int8 ), 10::int8)
       -- Double odd-numbered digits
       * (MOD(n,2) + 1)
    ), 10) = 0
FROM generate_series(0, ceil(log( $1 ))::integer - 1) AS n;
$$ LANGUAGE 'SQL'
IMMUTABLE
STRICT;

COMMENT ON FUNCTION luhn_verify(int8) IS 'Return true iff the last digit
of the input is a correct check digit for the rest of the input
according to Luhn''s algorithm.'

CREATE OR REPLACE FUNCTION luhn_generate_checkdigit(int8) RETURNS int8 AS $$
SELECT
    -- Add the digits, doubling even-numbered digits (counting left
    -- with least-significant as zero). Subtract the remainder of
    -- dividing the sum by 10 from 10, and take the remainder
    -- of dividing that by 10 in turn.
    MOD(10 - MOD(SUM(
      MOD( ($1::int8 / (10^n)::int8), 10::int8 )
      * (2 - MOD(n,2)) -- double even digits
    ),10),10)::int8
FROM generate_series(0, ceil(log($1))::integer - 1) AS n;
$$ LANGUAGE 'SQL'
IMMUTABLE
STRICT;

COMMENT ON FUNCTION luhn_generate_checkdigit(int8) IS 'For the input
value, generate a check digit according to Luhn''s algorithm';

CREATE OR REPLACE FUNCTION luhn_generate(int8) RETURNS int8 AS $$
SELECT 10 * $1 + luhn_generate_checkdigit($1);
$$ LANGUAGE 'SQL'
IMMUTABLE
STRICT;

COMMENT ON FUNCTION luhn_generate(int8) IS 'Append a check digit
generated according to Luhn''s algorithm to the input value. The
input value must be no greater than (maxbigint/10).';

CREATE OR REPLACE FUNCTION luhn_strip(int8) RETURNS int8 AS $$
SELECT $1 / 10;
$$ LANGUAGE 'SQL'
IMMUTABLE
STRICT;

COMMENT ON FUNCTION luhn_strip(int8) IS 'Strip the least significant
digit from the input value. Intended for use when stripping the check
digit from a number including a Luhn''s algorithm check digit.';

On May 3, 2009, at 9:00 AM, Craig Ringer wrote:

> CREATE OR REPLACE FUNCTION luhn_verify(int8) RETURNS boolean AS $$
> SELECT
>    -- Add the digits, doubling odd-numbered digits (counting left with
>    -- least significant as zero), and see if the sum is evenly
>    -- divisible by zero.

I think you mean divisible by 10 here, numbers are generally not
divisible by zero ;)

Regardless, thanks for posting these functions, I'm sure they'll come
in handy some time.

>
>    MOD(SUM(
>       -- Extract digit `n' counting left from least significant as
> zero
>       MOD( ( $1::int8 / (10^n)::int8 ), 10::int8)
>       -- Double odd-numbered digits
>       * (MOD(n,2) + 1)
>    ), 10) = 0
> FROM generate_series(0, ceil(log( $1 ))::integer - 1) AS n;
> $$ LANGUAGE 'SQL'
> IMMUTABLE
> STRICT;
>
> COMMENT ON FUNCTION luhn_verify(int8) IS 'Return true iff the last
> digit
> of the input is a correct check digit for the rest of the input
> according to Luhn''s algorithm.'

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,49fd82b6129742129210600!



Luhn algorithm (credit card verify / check) implementation - FIX

From
Craig Ringer
Date:
The Luhn algorithm implemention I posted earlier (upthread) is
internally consistent and will verify checksums it created, but it is
actually not a correct implementation of the Luhn algorithm.

The earlier code added the doubled digits directly to the checksum,
rather than adding each digit of the the doubled digits.

Here's a corrected version that passes tests against other
implementations in other languages.

--
-- Luhn algorithm implementation by Craig Ringer
-- in pure SQL (PostgreSQL function dialect, but
-- should be easily adapted to other DBMSs).
-- Note that this implementation is purely
-- arithmetic; it avoids string manipulation entirely.
--
-- See: http://en.wikipedia.org/wiki/Luhn_algorithm
--

CREATE OR REPLACE FUNCTION luhn_verify(int8) RETURNS boolean AS $$
-- Take the sum of the
-- doubled digits and the even-numbered undoubled digits, and see if
-- the sum is evenly divisible by zero.
SELECT
         -- Doubled digits might in turn be two digits. In that case,
         -- we must add each digit individually rather than adding the
         -- doubled digit value to the sum. Ie if the original digit was
         -- `6' the doubled result was `12' and we must add `1+2' to the
         -- sum rather than `12'.
         MOD(SUM(doubled_digit / INT8 '10' + doubled_digit % INT8 '10'),
10) = 0
FROM
-- Double odd-numbered digits (counting left with
-- least significant as zero). If the doubled digits end up
-- having values
-- > 10 (ie they're two digits), add their digits together.
(SELECT
         -- Extract digit `n' counting left from least significant
         --as zero
         MOD( ( $1::int8 / (10^n)::int8 ), 10::int8)
         -- Double odd-numbered digits
         * (MOD(n,2) + 1)
         AS doubled_digit
         FROM generate_series(0, ceil(log( $1 ))::integer - 1) AS n
) AS doubled_digits;

$$ LANGUAGE 'SQL'
IMMUTABLE
STRICT;

COMMENT ON FUNCTION luhn_verify(int8) IS 'Return true iff the last digit
of the input is a correct check digit for the rest of the input
according to Luhn''s algorithm.';

CREATE OR REPLACE FUNCTION luhn_generate_checkdigit(int8) RETURNS int8 AS $$
SELECT
     -- Add the digits, doubling even-numbered digits (counting left
     -- with least-significant as zero). Subtract the remainder of
     -- dividing the sum by 10 from 10, and take the remainder
     -- of dividing that by 10 in turn.
     ((INT8 '10' - SUM(doubled_digit / INT8 '10' + doubled_digit % INT8
'10') % INT8 '10') % INT8 '10')::INT8
FROM (SELECT
         -- Extract digit `n' counting left from least significant\
         -- as zero
         MOD( ($1::int8 / (10^n)::int8), 10::int8 )
         -- double even-numbered digits
         * (2 - MOD(n,2))
         AS doubled_digit
         FROM generate_series(0, ceil(log($1))::integer - 1) AS n
) AS doubled_digits;

$$ LANGUAGE 'SQL'
IMMUTABLE
STRICT;

COMMENT ON FUNCTION luhn_generate_checkdigit(int8) IS 'For the input
value, generate a check digit according to Luhn''s algorithm';

CREATE OR REPLACE FUNCTION luhn_generate(int8) RETURNS int8 AS $$
SELECT 10 * $1 + luhn_generate_checkdigit($1);
$$ LANGUAGE 'SQL'
IMMUTABLE
STRICT;

COMMENT ON FUNCTION luhn_generate(int8) IS 'Append a check digit
generated according to Luhn''s algorithm to the input value. The
input value must be no greater than (maxbigint/10).';

CREATE OR REPLACE FUNCTION luhn_strip(int8) RETURNS int8 AS $$
SELECT $1 / 10;
$$ LANGUAGE 'SQL'
IMMUTABLE
STRICT;

COMMENT ON FUNCTION luhn_strip(int8) IS 'Strip the least significant
digit from the input value. Intended for use when stripping the check
digit from a number including a Luhn''s algorithm check digit.';



Re: Luhn algorithm (credit card verify / check) implementation - FIX

From
David Fetter
Date:
On Tue, May 12, 2009 at 02:54:29PM +0800, Craig Ringer wrote:
> The Luhn algorithm implemention I posted earlier (upthread) is
> internally consistent and will verify checksums it created, but it is
> actually not a correct implementation of the Luhn algorithm.

This looks like a great candidate for inclusion in the Snippets page
<http://wiki.postgresql.org/wiki/Snippets> page, or possibly even the
docs for SQL functions :)

Cheers,
David.
>
> The earlier code added the doubled digits directly to the checksum,
> rather than adding each digit of the the doubled digits.
>
> Here's a corrected version that passes tests against other
> implementations in other languages.
>
> --
> -- Luhn algorithm implementation by Craig Ringer
> -- in pure SQL (PostgreSQL function dialect, but
> -- should be easily adapted to other DBMSs).
> -- Note that this implementation is purely
> -- arithmetic; it avoids string manipulation entirely.
> --
> -- See: http://en.wikipedia.org/wiki/Luhn_algorithm
> --
>
> CREATE OR REPLACE FUNCTION luhn_verify(int8) RETURNS boolean AS $$
> -- Take the sum of the
> -- doubled digits and the even-numbered undoubled digits, and see if
> -- the sum is evenly divisible by zero.
> SELECT
>         -- Doubled digits might in turn be two digits. In that case,
>         -- we must add each digit individually rather than adding the
>         -- doubled digit value to the sum. Ie if the original digit was
>         -- `6' the doubled result was `12' and we must add `1+2' to the
>         -- sum rather than `12'.
>         MOD(SUM(doubled_digit / INT8 '10' + doubled_digit % INT8 '10'),
> 10) = 0
> FROM
> -- Double odd-numbered digits (counting left with
> -- least significant as zero). If the doubled digits end up
> -- having values
> -- > 10 (ie they're two digits), add their digits together.
> (SELECT
>         -- Extract digit `n' counting left from least significant
>         --as zero
>         MOD( ( $1::int8 / (10^n)::int8 ), 10::int8)
>         -- Double odd-numbered digits
>         * (MOD(n,2) + 1)
>         AS doubled_digit
>         FROM generate_series(0, ceil(log( $1 ))::integer - 1) AS n
> ) AS doubled_digits;
>
> $$ LANGUAGE 'SQL'
> IMMUTABLE
> STRICT;
>
> COMMENT ON FUNCTION luhn_verify(int8) IS 'Return true iff the last digit
> of the input is a correct check digit for the rest of the input
> according to Luhn''s algorithm.';
>
> CREATE OR REPLACE FUNCTION luhn_generate_checkdigit(int8) RETURNS int8 AS $$
> SELECT
>     -- Add the digits, doubling even-numbered digits (counting left
>     -- with least-significant as zero). Subtract the remainder of
>     -- dividing the sum by 10 from 10, and take the remainder
>     -- of dividing that by 10 in turn.
>     ((INT8 '10' - SUM(doubled_digit / INT8 '10' + doubled_digit % INT8
> '10') % INT8 '10') % INT8 '10')::INT8
> FROM (SELECT
>         -- Extract digit `n' counting left from least significant\
>         -- as zero
>         MOD( ($1::int8 / (10^n)::int8), 10::int8 )
>         -- double even-numbered digits
>         * (2 - MOD(n,2))
>         AS doubled_digit
>         FROM generate_series(0, ceil(log($1))::integer - 1) AS n
> ) AS doubled_digits;
>
> $$ LANGUAGE 'SQL'
> IMMUTABLE
> STRICT;
>
> COMMENT ON FUNCTION luhn_generate_checkdigit(int8) IS 'For the input
> value, generate a check digit according to Luhn''s algorithm';
>
> CREATE OR REPLACE FUNCTION luhn_generate(int8) RETURNS int8 AS $$
> SELECT 10 * $1 + luhn_generate_checkdigit($1);
> $$ LANGUAGE 'SQL'
> IMMUTABLE
> STRICT;
>
> COMMENT ON FUNCTION luhn_generate(int8) IS 'Append a check digit
> generated according to Luhn''s algorithm to the input value. The
> input value must be no greater than (maxbigint/10).';
>
> CREATE OR REPLACE FUNCTION luhn_strip(int8) RETURNS int8 AS $$
> SELECT $1 / 10;
> $$ LANGUAGE 'SQL'
> IMMUTABLE
> STRICT;
>
> COMMENT ON FUNCTION luhn_strip(int8) IS 'Strip the least significant
> digit from the input value. Intended for use when stripping the check
> digit from a number including a Luhn''s algorithm check digit.';
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Feistel cipher, shorter string and hex to int

From
Ivan Sergio Borgonovo
Date:
On Sat, 02 May 2009 11:26:28 +0200
"Daniel Verite" <daniel@manitou-mail.org> wrote:

> Note that it returns a bigint because we don't have unsigned
> integers in PG. If you're OK with getting negative values, the
> return type can be changed to int.
> Otherwise if you need a positive result that fits in 32 bits, it's
> possible to tweak the code to use 15 bits blocks instead of 16,
> but then the input will have to be less than 2^30.

I need shorter values (because they should be easier to type.
To be sure to modify the function in a sensible way I really would
appreciate some pointer.
Still if it return

To further shrink the length of the result I was planning to to_hex
(actually it would be nice to have a fast to_35base [0-9a-z])... but
I wasn't able to find a way to convert back an hex string to an int.
x'fff' seems to work just for literals.


CREATE OR REPLACE FUNCTION pseudo_encrypt(value int) returns
bigint AS $$
DECLARE
 l1 int;
 l2 int;
 r1 int;
 r2 int;
 i int:=0;
BEGIN
  l1:= (value >> 16) & 65535;
-- modifying here seems trivial
  r1:= value&65535;
--  l1:= (value >> 15) & B'111111111111111'::int;
--  r1:= value & B'111111111111111'::int;
  WHILE i<3 LOOP
    l2:=r1;
    r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int;
-- but what about this? where does it come from?
/*
   r2:=l1 #
   ((((1366.0*r1+150889)%714025)/714025.0)*B'11111111111111'::int)::int;
*/ -- ??
   l1:=l2; r1:=r2; i:=i+1;
  END LOOP;
  return ((l1::bigint<<16) + r1);
-- modifying here seems trivial
END;
$$ LANGUAGE plpgsql strict immutable;


Anything else to suggest or copy from?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Feistel cipher, shorter string and hex to int

From
"Daniel Verite"
Date:
    Ivan Sergio Borgonovo wrote:

> I need shorter values (because they should be easier to type.
> To be sure to modify the function in a sensible way I really would
> appreciate some pointer.
> Still if it return

What exactly is your desired range of output values?

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org

Re: Feistel cipher, shorter string and hex to int

From
"Daniel Verite"
Date:
    Ivan Sergio Borgonovo wrote:

>     r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int;
> -- but what about this? where does it come from?

This function:
(1366.0*r1+150889)%714025
implements a known method to get random numbers. I think it comes from
"Numerical recipes" by William Press.
Note that the algorithm is not tied to that function, it could be
replaced by something else (especially one that involves a private
key), but it has to be carefully chosen or the end result won't look so
random.

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org

Re: Feistel cipher, shorter string and hex to int

From
Ivan Sergio Borgonovo
Date:
On Tue, 07 Jul 2009 12:07:48 +0200
"Daniel Verite" <daniel@manitou-mail.org> wrote:

>     Ivan Sergio Borgonovo wrote:
>
> >     r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*32767)::int;
> > -- but what about this? where does it come from?
>
> This function:
> (1366.0*r1+150889)%714025
> implements a known method to get random numbers. I think it comes
> from "Numerical recipes" by William Press.
> Note that the algorithm is not tied to that function, it could be
> replaced by something else (especially one that involves a private
> key), but it has to be carefully chosen or the end result won't
> look so random.

I don't get the 1366.0 and the 714025.0.
Writing 1366.0 isn't going to use float arithmetic?
Is it there just to avoid an overflow?
I'm going to see if using bigint is going to make any difference in
speed.

Finally... if I were (and I'm not) interested in using 30 bit,
should I turn that *32767 into a *16383?
For shift and bit mask it looks more obvious.
Do you remember the name of this particular F?

Since I don't see anything other than to_hex that could "shorten" an
int to a string easily and quickly... it seems that returning a
signed integer is OK.

Everything else seems to need more processing at no real added value.
Turning the int into base 32 [0-9A-N] with plpgsql looks expensive
just to shorten the string to 4 char.

Thanks.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Feistel cipher, shorter string and hex to int

From
"Daniel Verite"
Date:
    Ivan Sergio Borgonovo wrote:

> I don't get the 1366.0 and the 714025.0.
> Writing 1366.0 isn't going to use float arithmetic?

Yes, that's on purpose. Now that you mention it, I think that 1366.0
could be an integer instead, but the division by 714025 and
multiplication by 32767 have to be floating point operations.

> I'm going to see if using bigint is going to make any difference in
> speed.

If you're after more speed, using  the C language may be the solution.

> Finally... if I were (and I'm not) interested in using 30 bit,
> should I turn that *32767 into a *16383?
> For shift and bit mask it looks more obvious.

To generate a 31 bits (positive) result  from a 30 bits input, I would
modify the initialisation of the 16 bits blocks so that each of them
has the most significant bit set to 0, but without loosing any of the
30 bits. The MSB bits have to be kept at 0 throughout the algorithm.

So I'd to that:

 l1:= ((value >> 16) & 16383) | (value&32768);
 r1:= value&32767;

and indeed reduce the output range of the function:

r2:=l1 # ((((1366.0*r1+150889)%714025)/714025.0)*16383)::int;

the rest being identical excepts that it could now return int (and it
would be unsigned) instead of bigint.
I haven't tested that variant, though.

> Do you remember the name of this particular F?

Not sure it has a specific name, but you'll find related stuff by
searching for "linear congruential generator".

> Everything else seems to need more processing at no real added value.
> Turning the int into base 32 [0-9A-N] with plpgsql looks expensive
> just to shorten the string to 4 char.

Note that 4 chars would cover a range of 32^4, which is only about one
million different values.
I think you'd need 7 chars to express up to 2^31 in base 32, because
32^7  <  2^31  <  32^6

Best regards,
--
Daniel
PostgreSQL-powered mail user agent and storage:
http://www.manitou-mail.org