Thread: How to split a table?

How to split a table?

From
"Felix Zhang"
Date:
Hi,
 
I want to split a table to 2 small tables. The 1st one contains 60% records which are randomly selected from the source table.
How to do it?
 
Regards,
Felix

Re: [GENERAL] How to split a table?

From
Andreas Kretschmer
Date:
Felix Zhang <felix.zhang.2005@gmail.com> schrieb:

> Hi,
>
> I want to split a table to 2 small tables. The 1st one contains 60% records
> which are randomly selected from the source table.
> How to do it?

Why do you want to do this?


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: [GENERAL] How to split a table?

From
"Gregory S. Williamson"
Date:
Perhaps something like:

CREATE TABLE foo2 AS SELECT * FROM foo WHERE (rand() <= 0.60);

?

HTH,

Greg Williamson
DBA
GlobeXplorer LLC


-----Original Message-----
From:    pgsql-sql-owner@postgresql.org on behalf of Andreas Kretschmer
Sent:    Tue 10/17/2006 1:34 AM
To:    pgsql-general@postgresql.org; pgsql-sql@postgresql.org; pgsql-novice@postgresql.org
Cc:
Subject:    Re: [SQL] [GENERAL] How to split a table?

Felix Zhang <felix.zhang.2005@gmail.com> schrieb:

> Hi,
>
> I want to split a table to 2 small tables. The 1st one contains 60% records
> which are randomly selected from the source table.
> How to do it?

Why do you want to do this?


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match


-------------------------------------------------------
Click link below if it is SPAM gsw@globexplorer.com

"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=4534936b271274356172766&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1"
!DSPAM:4534936b271274356172766!
-------------------------------------------------------






Re: [NOVICE] [GENERAL] How to split a table?

From
"Felix Zhang"
Date:
to do some statistics analysis.

2006/10/17, Andreas Kretschmer <akretschmer@spamfence.net>:
Felix Zhang <felix.zhang.2005@gmail.com> schrieb:

> Hi,
>
> I want to split a table to 2 small tables. The 1st one contains 60% records
> which are randomly selected from the source table.
> How to do it?

Why do you want to do this?


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

Re: [GENERAL] How to split a table?

From
Achilleas Mantzios
Date:
Στις Τρίτη 17 Οκτώβριος 2006 11:34, ο/η Andreas Kretschmer έγραψε:
> Felix Zhang <felix.zhang.2005@gmail.com> schrieb:
> > Hi,
> >
> > I want to split a table to 2 small tables. The 1st one contains 60%
> > records which are randomly selected from the source table.
> > How to do it?
>
> Why do you want to do this?

In any case, you could write a program in perl and insert into the second
table rows from the first table, using a techique of generarting random
double precision numbers in a predetermined range (lets say 0 to 2^31-1), and
then converting this number to your "range" like this:
let your number be r.
let your table's cardinality be N.
then let R=(N*r/(2^31))+1

select a source row from your table like:

select * from srctable order by pk offset <R>,
and then insert this row to your desttable.

>
>
> Andreas

--
Achilleas Mantzios


Re: [GENERAL] How to split a table?

From
"A. Kretschmer"
Date:
am  Tue, dem 17.10.2006, um  1:53:35 -0700 mailte Gregory S. Williamson folgendes:
> Perhaps something like:
>
> CREATE TABLE foo2 AS SELECT * FROM foo WHERE (rand() <= 0.60);

Then we have 2 tables: one with 100% data and one with around 60% ;-)
If the table contains a primary key you can delete simple the copied
records from the origin table.

(delete from origin where pk in (select pk from copy);


>
> -----Original Message-----

Please, no top-posting with fullquote below.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47215,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: [GENERAL] How to split a table?

From
"Gregory S. Williamson"
Date:
A crude approach would be to add a column to the original table; then update that based on the rand() call:

update foo set i_am_a_60 = 1 where (rand() <= 0.60);
create table foo_60 as select * from foo where i_am_a_60 = 1;
create table foo_40 as select * from foo where i_am_a_60 <> 1;

The CASE condition might be usable as well but I haven't puzzled it out ...

G


-----Original Message-----
From:    pgsql-sql-owner@postgresql.org on behalf of A. Kretschmer
Sent:    Tue 10/17/2006 2:12 AM
To:    pgsql-sql@postgresql.org; pgsql-novice@postgresql.org
Cc:
Subject:    Re: [SQL] [GENERAL] How to split a table?

am  Tue, dem 17.10.2006, um  1:53:35 -0700 mailte Gregory S. Williamson folgendes:
> Perhaps something like:
>
> CREATE TABLE foo2 AS SELECT * FROM foo WHERE (rand() <= 0.60);

Then we have 2 tables: one with 100% data and one with around 60% ;-)
If the table contains a primary key you can delete simple the copied
records from the origin table.

(delete from origin where pk in (select pk from copy);


>
> -----Original Message-----

Please, no top-posting with fullquote below.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47215,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


-------------------------------------------------------
Click link below if it is SPAM gsw@globexplorer.com

"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=45349cb0275391789821027&user=gsw@globexplorer.com&retrain=spam&template=history&history_page=1"
!DSPAM:45349cb0275391789821027!
-------------------------------------------------------






Re: [GENERAL] How to split a table?

From
Jorge Godoy
Date:
"Gregory S. Williamson" <gsw@globexplorer.com> writes:

> A crude approach would be to add a column to the original table; then update
> that based on the rand() call:
>
> update foo set i_am_a_60 = 1 where (rand() <= 0.60);
> create table foo_60 as select * from foo where i_am_a_60 = 1;
> create table foo_40 as select * from foo where i_am_a_60 <> 1;
>
> The CASE condition might be usable as well but I haven't puzzled it out ...

If he's asking that 60% of the contents of the original table be randomly
selected this won't work.  He'll have to count how many rows the original
table has, then loop randomly selecting rows until he has reached 60% of that
total.  Otherwise he might end up with something completely different from
what he wants.

On the other hand if he wants rows whose randomness factor at the time they
were looked at was bigger than 0.6 then he can use that rand() trick.

--
Jorge Godoy      <jgodoy@gmail.com>

Re: [NOVICE] How to split a table?

From
Christoph Frick
Date:
On Tue, Oct 17, 2006 at 03:39:21PM +0800, Felix Zhang wrote:

> I want to split a table to 2 small tables. The 1st one contains 60%
> records which are randomly selected from the source table.  How to do
> it?

i do my A/B-Group splitting usually by &1 the serial of the table.
assuming, that there are no irregularities in the process of creating
your entries, this should give a 50/50 split, which is reuseable for
future things and there never is a intersection of the two groups.

--
cu

Attachment