Re: [HACKERS] Much Ado About COUNT(*) - Mailing list pgsql-general

From Csaba Nagy
Subject Re: [HACKERS] Much Ado About COUNT(*)
Date
Msg-id 1105632432.14493.31.camel@localhost.localdomain
Whole thread Raw
In response to Re: [HACKERS] Much Ado About COUNT(*)  (Greg Stark <gsstark@mit.edu>)
List pgsql-general
[snip]
> See:
>
>   http://www.jlcomp.demon.co.uk/faq/random.html
>
> I think the Oracle syntax looks like
>
>   SELECT * FROM foo SAMPLE (0.1)
>
> I don't think I would have picked this syntax but it seems like a better idea
> to copy the existing practice rather than invent a new one.
>
> There are some details, like what to do when there's a WHERE clause or joins.
> Oracle disallows joins entirely and I'm unclear what the best thing to do
> about where clauses would be.

The where clauses could be applied exactly as for a normal select, with
the sampling being just a pre-filtering condition for what rows to
consider.

If there would be a way to specify the table on which to apply the
sampling, then the whole construct could be replaced automatically by
the inline view the oracle link recommends.
I doubt there would be any benefit in sampling more than one table in a
query, it should just work to sample the biggest table, and join the
result with the others. Sampling is only useful for really big tables
anyway.

So the syntax above could be extended to:

SELECT * FROM foo SAMPLE (foo, 0.1)

and:

SELECT foo.*, bar.*
  FROM foo, bar
  WHERE foo.key = bar.key
  SAMPLE (foo, 0.1)

which means sample foo and join the result with bar.

All this makes sense from a user point of view, I wonder how big a PITA
is to implement it...

Cheers,
Csaba.




pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Trace triggers
Next
From: Alvaro Herrera
Date:
Subject: Re: Réf. : Re: Réf