Thread: How to insert rows distributed evenly between referenced rows?

How to insert rows distributed evenly between referenced rows?

From
Herouth Maoz
Date:
Basically, I want to do something like this.

I have N rows in table rawData.
I have to create batches from these N rows using tables batches (which has a serial id column and some additional data
columns)and batchContents (which references id in batches), where there will be M rows in batchContent for each row in
tablebatches. 

Example (N=12, M=5, meaning take 12 rows, and create as many batches as needed with a maximum of 5 batchContent rows
perbatch) 

rawData
r01
r02
r03
r04
r05
r06
r07
r08
r09
r10
r11
r12

Expected result:

batches:

id         post
5001   5
5002   5
5003   5

batchContents:

batch_id  datum
5001        r01
5001        r02
5001        r03
5001        r04
5001        r05
5002        r06
5002        r07
5002        r08
5002        r09
5002        r10
5003        r11
5003        r12

The order in which the data are distributed between the batches is not important, but I need to have M data in each
batchexcept the last. 

My starting point was a statement for insertion into batches. If I know what N and M are, I know how many batches I'll
need(B=ceil(N/M)), so I thought of writing 

INSERT INTO batches(post)
SELECT 5 -- All the extra data, like the "post" field, is inserted as literals here
FROM generate_series(1,B)
RETURNING id

This will give me the proper rows in batches, but I don't know how to incorporate this into an insert/select from
rawDatainto batchContent, assuming that I don't want to keep the data programatically and do repeated SELECTs with
OFFSETand LIMIT for each id returned. Maybe there is some elegant solution with window functions? Can there be any sort
ofjoin over a window or something like that? 


TIA,
Herouth




Re: How to insert rows distributed evenly between referenced rows?

From
Erik Darling
Date:
<p dir="ltr">I think NTILE()  will be your friend here. <p dir="ltr"><a
href="http://www.postgresql.org/docs/9.3/static/functions-window.html">http://www.postgresql.org/docs/9.3/static/functions-window.html</a><br
/><divclass="gmail_quote">On Jan 27, 2014 10:11 AM, "Herouth Maoz" <<a
href="mailto:herouth@unicell.co.il">herouth@unicell.co.il</a>>wrote:<br type="attribution" /><blockquote
class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Basically, I want to do
somethinglike this.<br /><br /> I have N rows in table rawData.<br /> I have to create batches from these N rows using
tablesbatches (which has a serial id column and some additional data columns) and batchContents (which references id in
batches),where there will be M rows in batchContent for each row in table batches.<br /><br /> Example (N=12, M=5,
meaningtake 12 rows, and create as many batches as needed with a maximum of 5 batchContent rows per batch)<br /><br />
rawData<br/> r01<br /> r02<br /> r03<br /> r04<br /> r05<br /> r06<br /> r07<br /> r08<br /> r09<br /> r10<br /> r11<br
/>r12<br /><br /> Expected result:<br /><br /> batches:<br /><br /> id         post<br /> 5001   5<br /> 5002   5<br />
5003  5<br /><br /> batchContents:<br /><br /> batch_id  datum<br /> 5001        r01<br /> 5001        r02<br /> 5001  
    r03<br /> 5001        r04<br /> 5001        r05<br /> 5002        r06<br /> 5002        r07<br /> 5002      
 r08<br/> 5002        r09<br /> 5002        r10<br /> 5003        r11<br /> 5003        r12<br /><br /> The order in
whichthe data are distributed between the batches is not important, but I need to have M data in each batch except the
last.<br/><br /> My starting point was a statement for insertion into batches. If I know what N and M are, I know how
manybatches I'll need (B=ceil(N/M)), so I thought of writing<br /><br /> INSERT INTO batches(post)<br /> SELECT 5 --
Allthe extra data, like the "post" field, is inserted as literals here<br /> FROM generate_series(1,B)<br /> RETURNING
id<br/><br /> This will give me the proper rows in batches, but I don't know how to incorporate this into an
insert/selectfrom rawData into batchContent, assuming that I don't want to keep the data programatically and do
repeatedSELECTs with OFFSET and LIMIT for each id returned. Maybe there is some elegant solution with window functions?
Canthere be any sort of join over a window or something like that?<br /><br /><br /> TIA,<br /> Herouth<br /><br /><br
/><br/> --<br /> Sent via pgsql-sql mailing list (<a
href="mailto:pgsql-sql@postgresql.org">pgsql-sql@postgresql.org</a>)<br/> To make changes to your subscription:<br /><a
href="http://www.postgresql.org/mailpref/pgsql-sql"target="_blank">http://www.postgresql.org/mailpref/pgsql-sql</a><br
/></blockquote></div>

Re: How to insert rows distributed evenly between referenced rows?

From
Herouth Maoz
Date:
But is there a way to join the result of the INSERT...RETURNING that I mentioned with a SELECT from rawData?

On 27/01/2014, at 17:13, Erik Darling wrote:

I think NTILE()  will be your friend here.

http://www.postgresql.org/docs/9.3/static/functions-window.html

On Jan 27, 2014 10:11 AM, "Herouth Maoz" <herouth@unicell.co.il> wrote:
Basically, I want to do something like this.

I have N rows in table rawData.
I have to create batches from these N rows using tables batches (which has a serial id column and some additional data columns) and batchContents (which references id in batches), where there will be M rows in batchContent for each row in table batches.

Example (N=12, M=5, meaning take 12 rows, and create as many batches as needed with a maximum of 5 batchContent rows per batch)

rawData
r01
r02
r03
r04
r05
r06
r07
r08
r09
r10
r11
r12

Expected result:

batches:

id         post
5001   5
5002   5
5003   5

batchContents:

batch_id  datum
5001        r01
5001        r02
5001        r03
5001        r04
5001        r05
5002        r06
5002        r07
5002        r08
5002        r09
5002        r10
5003        r11
5003        r12

The order in which the data are distributed between the batches is not important, but I need to have M data in each batch except the last.

My starting point was a statement for insertion into batches. If I know what N and M are, I know how many batches I'll need (B=ceil(N/M)), so I thought of writing

INSERT INTO batches(post)
SELECT 5 -- All the extra data, like the "post" field, is inserted as literals here
FROM generate_series(1,B)
RETURNING id

This will give me the proper rows in batches, but I don't know how to incorporate this into an insert/select from rawData into batchContent, assuming that I don't want to keep the data programatically and do repeated SELECTs with OFFSET and LIMIT for each id returned. Maybe there is some elegant solution with window functions? Can there be any sort of join over a window or something like that?


TIA,
Herouth



--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


--
חרות מעוז
יוניסל פתרונות סלולריים מתקדמים
☎ 03-5181717 שלוחה 742

Re: How to insert rows distributed evenly between referenced rows?

From
Herouth Maoz
Date:
Hi all,

I thought I was making some headway in that, when I decided to insert into batches first, without RETURNING, and then use the data from rawData and batches with a WITH statement for inserting into batchContents.

So the idea was to use (in a stored procedure where post and num_batches are integer variables):

WITH b_id_query AS (
            SELECT id, row_number() OVER () AS batch_num
            FROM batches
            WHERE post_id = post
            AND   status = 1
         ), 
         raw_query AS (
            SELECT datum, row_number() OVER () % num_batches + 1 AS batch_num
            FROM rawData
         )
INSERT INTO batchContent( batch_id, datum)
SELECT b_id_query.id, raw_query.datum
    FROM b_id_query, raw_query
    WHERE b_id_query.batch_num = raw_query.batch_num;

This basically assigns a row number to each batch id in the b_id_query, and a number from 1 to num_batches to each datum in raw_query, and then I can join by that into batchContent;

But... the batchContent table has a table that inherits from it, and there are rules on INSERT and UPDATE for it. I get the following error message:

ERROR:  WITH cannot be used in a query that is rewritten by rules into multiple queries

Any way around that? My rules will direct these records to the top level batchContent table, never to the child tables, because the field that's important for the rules is null, so it's really frustrating that PostgreSQL won't allow me to insert it normally.

TIA,
Herouth

On 27/01/2014, at 17:42, Herouth Maoz wrote:

But is there a way to join the result of the INSERT...RETURNING that I mentioned with a SELECT from rawData?

On 27/01/2014, at 17:13, Erik Darling wrote:

I think NTILE()  will be your friend here.

http://www.postgresql.org/docs/9.3/static/functions-window.html

On Jan 27, 2014 10:11 AM, "Herouth Maoz" <herouth@unicell.co.il> wrote:
Basically, I want to do something like this.

I have N rows in table rawData.
I have to create batches from these N rows using tables batches (which has a serial id column and some additional data columns) and batchContents (which references id in batches), where there will be M rows in batchContent for each row in table batches.

Example (N=12, M=5, meaning take 12 rows, and create as many batches as needed with a maximum of 5 batchContent rows per batch)

rawData
r01
r02
r03
r04
r05
r06
r07
r08
r09
r10
r11
r12

Expected result:

batches:

id         post
5001   5
5002   5
5003   5

batchContents:

batch_id  datum
5001        r01
5001        r02
5001        r03
5001        r04
5001        r05
5002        r06
5002        r07
5002        r08
5002        r09
5002        r10
5003        r11
5003        r12

The order in which the data are distributed between the batches is not important, but I need to have M data in each batch except the last.

My starting point was a statement for insertion into batches. If I know what N and M are, I know how many batches I'll need (B=ceil(N/M)), so I thought of writing

INSERT INTO batches(post)
SELECT 5 -- All the extra data, like the "post" field, is inserted as literals here
FROM generate_series(1,B)
RETURNING id

This will give me the proper rows in batches, but I don't know how to incorporate this into an insert/select from rawData into batchContent, assuming that I don't want to keep the data programatically and do repeated SELECTs with OFFSET and LIMIT for each id returned. Maybe there is some elegant solution with window functions? Can there be any sort of join over a window or something like that?


TIA,
Herouth



--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


--
חרות מעוז
יוניסל פתרונות סלולריים מתקדמים
☎ 03-5181717 שלוחה 742



--
חרות מעוז
יוניסל פתרונות סלולריים מתקדמים
☎ 03-5181717 שלוחה 742