Thread: replace oidrand() with random_sample()

replace oidrand() with random_sample()

From
Neil Conway
Date:
The "random" regression test uses a function called oidrand(), which
takes two parameters, an OID x and an integer y, and returns "true" with
probability 1/y (the OID argument is ignored). This can be useful -- for
example, it can be used to select a random sampling of the rows in a
table (which is what the "random" regression test uses it for).

This patch removes that function, because it was old and messy. The old
function had the following problems:

- it was undocumented

- it was poorly named

- it was designed to workaround an optimizer bug that no longer exists
(the OID argument is to ensure that the optimizer won't optimize away
calls to the function; AFAIK marking the function as 'volatile' suffices
nowadays)

- it used a different random-number generation technique than the other
PSRNG-related functions in the backend do (it called random() like they
do, but it had its own logic for setting a set and deciding when to
reseed the RNG).

This patch implements a new version of the function, called
"random_sample" (if you think that's not a good name, please suggest an
improvement). It corrects the problems mentioned above.

I also documented the setseed() function: there were some SGML docs for
it, but they were commented out (Peter E. removed them revision 1.41 of
func.sgml, without explaining why).

Finally, I removed the oidsrand() and userfntest() functions: the former
was only for use with oidrand(), and the later has no value AFAICS (I
checked with Bruce, and he agreed with me).

Initdb required.

Cheers,

Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC



Attachment

Re: replace oidrand() with random_sample()

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> The "random" regression test uses a function called oidrand(), which
> takes two parameters, an OID x and an integer y, and returns "true" with
> probability 1/y (the OID argument is ignored). This can be useful -- for
> example, it can be used to select a random sampling of the rows in a
> table (which is what the "random" regression test uses it for).

Do we actually need a separate function at all?  Seems like
    random() < 1.0/y
would accomplish the same result.

I agree that oidrand() is crufty and no longer useful, but I had
hesitated to rip it out, for fear that somebody somewhere might still
be using it.  It's not like it's costing us any maintenance effort
to leave it there.

            regards, tom lane

Re: replace oidrand() with random_sample()

From
Neil Conway
Date:
On Wed, 2003-01-15 at 20:21, Tom Lane wrote:
> Do we actually need a separate function at all?  Seems like
>     random() < 1.0/y
> would accomplish the same result.

It does, although it's arguably less obvious. But I wouldn't mind if we
just got rid of it outright. What does everyone else think -- is it
worth keeping?

> I agree that oidrand() is crufty and no longer useful, but I had
> hesitated to rip it out, for fear that somebody somewhere might still
> be using it.  It's not like it's costing us any maintenance effort
> to leave it there.

Keep in mind that it was undocumented and strangely named -- while there
might be some people using it, I'd wager not many.

Whether we leave it in or not, I'd vote for eventually getting rid of
the current version. So if we keep it, we could add random_sample() and
keep oidrand() as a wrapper over it (that emits an elog(WARNING) to let
people know it's been deprecated). And if we want to get rid of it, we
could probably just add the elog() for 7.4 and then remove it outright
in 7.5.

Cheers,

Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC




Re: replace oidrand() with random_sample()

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
>> I agree that oidrand() is crufty and no longer useful, but I had
>> hesitated to rip it out, for fear that somebody somewhere might still
>> be using it.  It's not like it's costing us any maintenance effort
>> to leave it there.

> Keep in mind that it was undocumented and strangely named -- while there
> might be some people using it, I'd wager not many.

True enough.

> Whether we leave it in or not, I'd vote for eventually getting rid of
> the current version. So if we keep it, we could add random_sample() and
> keep oidrand() as a wrapper over it (that emits an elog(WARNING) to let
> people know it's been deprecated). And if we want to get rid of it, we
> could probably just add the elog() for 7.4 and then remove it outright
> in 7.5.

Well, now you're adding a whole lot of maintenance effort to support
some quite-hypothetical users of a function we all agree is junk ;-)

I think we should either leave it be, or rip it out and be done with it.
I don't actually much care which (does anyone out there have an
opinion?).  But a phased-obsolescence plan is far more work than it
deserves.

            regards, tom lane

Re: replace oidrand() with random_sample()

From
Bruce Momjian
Date:
Tom Lane wrote:
> > Whether we leave it in or not, I'd vote for eventually getting rid of
> > the current version. So if we keep it, we could add random_sample() and
> > keep oidrand() as a wrapper over it (that emits an elog(WARNING) to let
> > people know it's been deprecated). And if we want to get rid of it, we
> > could probably just add the elog() for 7.4 and then remove it outright
> > in 7.5.
>
> Well, now you're adding a whole lot of maintenance effort to support
> some quite-hypothetical users of a function we all agree is junk ;-)
>
> I think we should either leave it be, or rip it out and be done with it.
> I don't actually much care which (does anyone out there have an
> opinion?).  But a phased-obsolescence plan is far more work than it
> deserves.

Yea, just remove it and we can re-add it later if people ask for it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: replace oidrand() with random_sample()

From
Neil Conway
Date:
On Wed, 2003-01-15 at 23:26, Tom Lane wrote:
> I think we should either leave it be, or rip it out and be done with it.
> I don't actually much care which (does anyone out there have an
> opinion?).  But a phased-obsolescence plan is far more work than it
> deserves.

Heh, fair enough. I'll send in a revised patch that removes it tomorrow.

(BTW, this *particular* case may be a bad situation to worry about
phased-obsolescence, but in general I think it's worth paying more
attention to reducing backward-compatibility headaches for users...)

Cheers,

Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC




Re: replace oidrand() with random_sample()

From
"Christopher Kings-Lynne"
Date:
> > Well, now you're adding a whole lot of maintenance effort to support
> > some quite-hypothetical users of a function we all agree is junk ;-)
> >
> > I think we should either leave it be, or rip it out and be done with it.
> > I don't actually much care which (does anyone out there have an
> > opinion?).  But a phased-obsolescence plan is far more work than it
> > deserves.
>
> Yea, just remove it and we can re-add it later if people ask for it.

C'mon guys!  We really have to leave it!  It's of no consequence to us to
leave it there, and backwards compatibility is really, really important!

(Says guy who just spend 3 hours trying to upgrade development server to
7.3!)

Chris


Re: replace oidrand() with random_sample()

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> (BTW, this *particular* case may be a bad situation to worry about
> phased-obsolescence, but in general I think it's worth paying more
> attention to reducing backward-compatibility headaches for users...)

Indeed, we've been beat about the head and shoulders on that point
often enough.  But you can easily bring a development effort to a
standstill by spending all your time on such matters rather than
productive new work.  I think the secret is to understand where it's
worth spending effort on backwards-compatibility issues, and where not.
I can't claim to have a real good track record on making the right
choices...

            regards, tom lane

Re: replace oidrand() with random_sample()

From
Neil Conway
Date:
On Wed, 2003-01-15 at 23:45, Christopher Kings-Lynne wrote:
> C'mon guys!  We really have to leave it!  It's of no consequence to us to
> leave it there

It is, though. Continuing to maintain old, crufty code that is no longer
used *does* have a finite cost. Consider the security holes during
7.2.x: the majority of those were found in code that was written years
ago, and only occaisonally looked at since (e.g. the geometry types).

> backwards compatibility is really, really important!

Agreed, but given that this feature

    (a) was completely undocumented

    (b) strangely named

You would have to try *pretty hard* just to know that it existed. And
even if you decided to use it in an application, it won't take much more
than grepping for 'oidrand' and replacing it with 'random() < 1.0/y' in
order to upgrade to 7.4.

(That's in contrast to things like tightening the rules for varchar
input, which aren't trivial to workaround -- in *that* case, I can see
the need for special arrangements for backward-compatibility...)

Cheers,

Neil

P.S. Is there a reason for keeping nullvalue() and nonnullvalue() in
misc.c?

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC




Re: replace oidrand() with random_sample()

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> You would have to try *pretty hard* just to know that it existed. And
> even if you decided to use it in an application, it won't take much more
> than grepping for 'oidrand' and replacing it with 'random() < 1.0/y' in
> order to upgrade to 7.4.

Or you could do

CREATE FUNCTION oidrand(oid, int) RETURNS bool AS
'SELECT random() < 1.0/$2' LANGUAGE sql;

            regards, tom lane

Re: replace oidrand() with random_sample()

From
Neil Conway
Date:
On Wed, 2003-01-15 at 23:27, Bruce Momjian wrote:
> Yea, just remove it and we can re-add it later if people ask for it.

Ok, this patch removes oidrand(), oidsrand(), and userfntest(), and
improves the SGML docs a little bit (un-commenting the setseed()
documentation).

Cheers,

Neil
--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC



Attachment

Re: replace oidrand() with random_sample()

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------


Neil Conway wrote:
> On Wed, 2003-01-15 at 23:27, Bruce Momjian wrote:
> > Yea, just remove it and we can re-add it later if people ask for it.
>
> Ok, this patch removes oidrand(), oidsrand(), and userfntest(), and
> improves the SGML docs a little bit (un-commenting the setseed()
> documentation).
>
> Cheers,
>
> Neil
> --
> Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
>
>

[ Attachment, skipping... ]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: replace oidrand() with random_sample()

From
Bruce Momjian
Date:
Patch applied.  Thanks.

---------------------------------------------------------------------------


Neil Conway wrote:
> On Wed, 2003-01-15 at 23:27, Bruce Momjian wrote:
> > Yea, just remove it and we can re-add it later if people ask for it.
>
> Ok, this patch removes oidrand(), oidsrand(), and userfntest(), and
> improves the SGML docs a little bit (un-commenting the setseed()
> documentation).
>
> Cheers,
>
> Neil
> --
> Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
>
>

[ Attachment, skipping... ]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073