RE: Statistical Analysis - Mailing list pgsql-general

From Nathan Barnett
Subject RE: Statistical Analysis
Msg-id 71975481CD04D4118E57004033A2596E0DF949@ip205.82.136.216.in-addr.arpa
In response to Statistical Analysis  ("Nathan Barnett" <nbarnett@cellularphones.com>)
List pgsql-general
Tim,
    Hmm... this might just work, because I could filter on myrandfunc() < .2
and then apply a LIMIT to cut it down to 10% or whatever.  That would almost
guarantee the exact number of rows.
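
A minimal sketch of the oversample-then-LIMIT idea, written in Python against the standard-library sqlite3 module rather than PostgreSQL. The table contents, the seed, and the way myrandfunc() is registered (a thin wrapper over Python's random generator returning values on [0, 1)) are all assumptions made for illustration only.

```python
import random
import sqlite3

rng = random.Random(7)  # fixed seed so the sketch is repeatable
conn = sqlite3.connect(":memory:")
# Hypothetical helper: myrandfunc() returns a float on [0, 1), once per row.
conn.create_function("myrandfunc", 0, rng.random)
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO mytable (id) VALUES (?)", ((i,) for i in range(5000))
)

total = conn.execute("SELECT count(*) FROM mytable").fetchone()[0]
target = total // 10  # aim for 10% of the rows

# Oversample at roughly 20%, then cap the result at the target count.
rows = conn.execute(
    "SELECT id FROM mytable WHERE myrandfunc() < 0.2 LIMIT ?", (target,)
).fetchall()
ids = [r[0] for r in rows]
print(len(ids), max(ids))
```

One caveat with this approach: LIMIT stops the scan as soon as the target count is reached, so rows late in the scan order are rarely picked. If the table is physically ordered in a way that correlates with the data, the sample may be skewed.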

-----------------
Nathan Barnett


-----Original Message-----
From: keitt@ulysses.nceas.ucsb.edu
[mailto:keitt@ulysses.nceas.ucsb.edu]On Behalf Of Timothy H. Keitt
Sent: Monday, July 24, 2000 3:41 PM
To: Nathan Barnett
Subject: Re: [GENERAL] Statistical Analysis


You would need to add a pseudorandom number function to postgresql.  If
your function returns numbers on [0, 1), then you could do:

    select * from mytable where myrandfunc() < 0.1;

and get back (asymptotically) 10% of the rows.  If you want exactly n
randomly chosen rows, it's a bit more expensive computationally.
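
As a rough illustration of the two cases above, here is a small Python sketch using the standard-library sqlite3 module instead of PostgreSQL. Everything in it is an assumption for demonstration: mytable is filled with dummy integers, and myrandfunc() is registered as a wrapper around a seeded Python random generator that returns values on [0, 1). The exact-n variant sorts on a random key, which is one simple way to do it and not necessarily the cheapest.

```python
import random
import sqlite3

def make_db(n_rows=10_000, seed=42):
    """Build an in-memory table and register a hypothetical myrandfunc()."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # myrandfunc() returns a float on [0, 1); it is evaluated once per row.
    conn.create_function("myrandfunc", 0, rng.random)
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, val INTEGER)")
    conn.executemany(
        "INSERT INTO mytable (id, val) VALUES (?, ?)",
        ((i, i * 2) for i in range(n_rows)),
    )
    return conn

def sample_fraction(conn, fraction):
    """Keep each row independently with probability `fraction`."""
    return conn.execute(
        "SELECT * FROM mytable WHERE myrandfunc() < ?", (fraction,)
    ).fetchall()

def sample_exact(conn, n):
    """Return exactly n rows by sorting on a random key (needs a full sort)."""
    return conn.execute(
        "SELECT * FROM mytable ORDER BY myrandfunc() LIMIT ?", (n,)
    ).fetchall()

conn = make_db()
approx = sample_fraction(conn, 0.1)  # row count varies around 10% of the table
exact = sample_exact(conn, 100)      # always 100 rows, at the cost of a sort
print(len(approx), len(exact))
```

The filter version touches each row once and returns a sample whose size fluctuates around the requested fraction. The sorted version pins the size exactly but has to assign a random key to every row and order them all, which is where the extra cost comes from.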

Another more involved approach would be to implement random cursors.
This would be great for bootstrapping analysis.

Tim

Nathan Barnett wrote:
>
> I am having to perform a large data analysis query fairly frequently and the
> execution time is not acceptable, so I was looking at doing a statistical
> sample of the data to get fairly accurate results.  Is there a way to
> perform a query on a set number of random rows instead of the whole dataset?
> I have looked through the documentation for a function that would do this,
> but I have not seen any.  If this is an RTFM type question, then feel free to
> tell me so and point me in the right direction because I just haven't been
> able to find any info on it.
>
> Thanks ahead of time.
>
> ---------------
> Nathan Barnett

--
Timothy H. Keitt
National Center for Ecological Analysis and Synthesis
735 State Street, Suite 300, Santa Barbara, CA 93101
Phone: 805-892-2519, FAX: 805-892-2510
http://www.nceas.ucsb.edu/~keitt/

