Thread: random dataset generator for SKYLINE operator

random dataset generator for SKYLINE operator

From
Hannes Eder
Date:
We wrote a little contrib module, which we'd like to share. It can be
used to generate random datasets as they have been used in
[Borzsonyi2001] and related work. The code is based directly on the
code of the authors, thanks to Donald Kossmann for sharing the
code. Donald Kossmann agrees on sharing our implementation with the
community.

We use it primary for profiling and regression testing of our
implementation of the SKYLINE operator in PostgreSQL, which is work in
progress, details will be announced soon.

We use it in two ways:

(1) create table a2d1000 as (select * from pg_rand_dataset(‘anti’, 2,
1000, 0) ds(id int, d1 float, d2 float)). And then work with this table,
e.g. an additional index and query the data or

(2) directly for profiling and regression testing as: select * from
pg_rand_dataset(‘indep’, 3, 1000, 0) ds(id int, d1 float, d2 float, d3
float) skyline by d1 min, d2 min, d3 max;

It might be useful for other purposes.

We ask you for comments and suggestions.

What's the best way to share this module?


Declaration
-----------

CREATE FUNCTION pg_catalog.pg_rand_dataset(disttype text, dim int,
rows int, seed int)
RETURNS setof record
AS '$libdir/randdataset', 'pg_rand_dataset'
LANGUAGE C IMMUTABLE STRICT;


Usage example
-------------

Hannes=# select * from pg_rand_dataset('indep',1,5,0) ds(id int, d1 float);
id | d1
----+--------------------
1 | 0.170828036112165
2 | 0.749901980510867
3 | 0.0963716553972902
4 | 0.870465227342427
5 | 0.577303506702792
(5 rows)

Hannes=# select * from pg_rand_dataset('corr',2,3,0) ds(id int, d1
float, d2 float);
id | d1 | d2
----+-------------------+-------------------
1 | 0.489722997173791 | 0.431007019449241
2 | 0.385624871971487 | 0.645283734523713
3 | 0.754378154805565 | 0.82440492311745
(3 rows)

Hannes=# select * from pg_rand_dataset('anti',3,2,0) ds(id int, d1
float, d2 float, d3 float);
id | d1 | d2 | d3
----+-------------------+--------------------+-------------------
1 | 0.848072555389202 | 0.0868831583602555 | 0.656344708211334
2 | 0.447278565604571 | 0.641176215525044 | 0.523196923510816
(2 rows)


References
----------

[Borzsonyi2001] Börzsönyi, S.; Kossmann, D. & Stocker, K.:
The Skyline Operator, ICDE 2001, 421--432


Best,
-Hannes

Attachment

Re: random dataset generator for SKYLINE operator

From
Hannes Eder
Date:
Hannes Eder worte:
> We wrote a little contrib module, which we'd like to share. It can be
> used to generate random datasets as they have been used in
> [Borzsonyi2001] and related work.
> [snip]
The module was moved to:

http://randdataset.projects.postgresql.org/

resp.

http://randdataset.projects.postgres.org/

Have fun using it,
-Hannes



Re: random dataset generator for SKYLINE operator

From
Hannes Eder
Date:
Hannes Eder worte:
> We wrote a little contrib module, which we'd like to share. It can be
> used to generate random datasets as they have been used in
> [Borzsonyi2001] and related work.
> [snip]
We release a command line version of this module. See:

http://randdataset.projects.postgresql.org/

the source is available as tarball at:

http://pgfoundry.org/frs/?group_id=1000305

or in the SCM.

-Hannes