Thread: How to insert rows distributed evenly between referenced rows?
Basically, I want to do something like this. I have N rows in table rawData. I have to create batches from these N rows using tables batches (which has a serial id column and some additional data columns)and batchContents (which references id in batches), where there will be M rows in batchContent for each row in tablebatches. Example (N=12, M=5, meaning take 12 rows, and create as many batches as needed with a maximum of 5 batchContent rows perbatch) rawData r01 r02 r03 r04 r05 r06 r07 r08 r09 r10 r11 r12 Expected result: batches: id post 5001 5 5002 5 5003 5 batchContents: batch_id datum 5001 r01 5001 r02 5001 r03 5001 r04 5001 r05 5002 r06 5002 r07 5002 r08 5002 r09 5002 r10 5003 r11 5003 r12 The order in which the data are distributed between the batches is not important, but I need to have M data in each batchexcept the last. My starting point was a statement for insertion into batches. If I know what N and M are, I know how many batches I'll need(B=ceil(N/M)), so I thought of writing INSERT INTO batches(post) SELECT 5 -- All the extra data, like the "post" field, is inserted as literals here FROM generate_series(1,B) RETURNING id This will give me the proper rows in batches, but I don't know how to incorporate this into an insert/select from rawDatainto batchContent, assuming that I don't want to keep the data programatically and do repeated SELECTs with OFFSETand LIMIT for each id returned. Maybe there is some elegant solution with window functions? Can there be any sort ofjoin over a window or something like that? TIA, Herouth
<p dir="ltr">I think NTILE() will be your friend here. <p dir="ltr"><a href="http://www.postgresql.org/docs/9.3/static/functions-window.html">http://www.postgresql.org/docs/9.3/static/functions-window.html</a><br /><divclass="gmail_quote">On Jan 27, 2014 10:11 AM, "Herouth Maoz" <<a href="mailto:herouth@unicell.co.il">herouth@unicell.co.il</a>>wrote:<br type="attribution" /><blockquote class="gmail_quote"style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Basically, I want to do somethinglike this.<br /><br /> I have N rows in table rawData.<br /> I have to create batches from these N rows using tablesbatches (which has a serial id column and some additional data columns) and batchContents (which references id in batches),where there will be M rows in batchContent for each row in table batches.<br /><br /> Example (N=12, M=5, meaningtake 12 rows, and create as many batches as needed with a maximum of 5 batchContent rows per batch)<br /><br /> rawData<br/> r01<br /> r02<br /> r03<br /> r04<br /> r05<br /> r06<br /> r07<br /> r08<br /> r09<br /> r10<br /> r11<br />r12<br /><br /> Expected result:<br /><br /> batches:<br /><br /> id post<br /> 5001 5<br /> 5002 5<br /> 5003 5<br /><br /> batchContents:<br /><br /> batch_id datum<br /> 5001 r01<br /> 5001 r02<br /> 5001 r03<br /> 5001 r04<br /> 5001 r05<br /> 5002 r06<br /> 5002 r07<br /> 5002 r08<br/> 5002 r09<br /> 5002 r10<br /> 5003 r11<br /> 5003 r12<br /><br /> The order in whichthe data are distributed between the batches is not important, but I need to have M data in each batch except the last.<br/><br /> My starting point was a statement for insertion into batches. If I know what N and M are, I know how manybatches I'll need (B=ceil(N/M)), so I thought of writing<br /><br /> INSERT INTO batches(post)<br /> SELECT 5 -- Allthe extra data, like the "post" field, is inserted as literals here<br /> FROM generate_series(1,B)<br /> RETURNING id<br/><br /> This will give me the proper rows in batches, but I don't know how to incorporate this into an insert/selectfrom rawData into batchContent, assuming that I don't want to keep the data programatically and do repeatedSELECTs with OFFSET and LIMIT for each id returned. Maybe there is some elegant solution with window functions? Canthere be any sort of join over a window or something like that?<br /><br /><br /> TIA,<br /> Herouth<br /><br /><br /><br/> --<br /> Sent via pgsql-sql mailing list (<a href="mailto:pgsql-sql@postgresql.org">pgsql-sql@postgresql.org</a>)<br/> To make changes to your subscription:<br /><a href="http://www.postgresql.org/mailpref/pgsql-sql"target="_blank">http://www.postgresql.org/mailpref/pgsql-sql</a><br /></blockquote></div>
But is there a way to join the result of the INSERT...RETURNING that I mentioned with a SELECT from rawData?
--
On 27/01/2014, at 17:13, Erik Darling wrote:
I think NTILE() will be your friend here.
http://www.postgresql.org/docs/9.3/static/functions-window.html
On Jan 27, 2014 10:11 AM, "Herouth Maoz" <herouth@unicell.co.il> wrote:Basically, I want to do something like this.
I have N rows in table rawData.
I have to create batches from these N rows using tables batches (which has a serial id column and some additional data columns) and batchContents (which references id in batches), where there will be M rows in batchContent for each row in table batches.
Example (N=12, M=5, meaning take 12 rows, and create as many batches as needed with a maximum of 5 batchContent rows per batch)
rawData
r01
r02
r03
r04
r05
r06
r07
r08
r09
r10
r11
r12
Expected result:
batches:
id post
5001 5
5002 5
5003 5
batchContents:
batch_id datum
5001 r01
5001 r02
5001 r03
5001 r04
5001 r05
5002 r06
5002 r07
5002 r08
5002 r09
5002 r10
5003 r11
5003 r12
The order in which the data are distributed between the batches is not important, but I need to have M data in each batch except the last.
My starting point was a statement for insertion into batches. If I know what N and M are, I know how many batches I'll need (B=ceil(N/M)), so I thought of writing
INSERT INTO batches(post)
SELECT 5 -- All the extra data, like the "post" field, is inserted as literals here
FROM generate_series(1,B)
RETURNING id
This will give me the proper rows in batches, but I don't know how to incorporate this into an insert/select from rawData into batchContent, assuming that I don't want to keep the data programatically and do repeated SELECTs with OFFSET and LIMIT for each id returned. Maybe there is some elegant solution with window functions? Can there be any sort of join over a window or something like that?
TIA,
Herouth
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
--
חרות מעוז
יוניסל פתרונות סלולריים מתקדמים
☎ 03-5181717 שלוחה 742
Hi all,
--
I thought I was making some headway in that, when I decided to insert into batches first, without RETURNING, and then use the data from rawData and batches with a WITH statement for inserting into batchContents.
So the idea was to use (in a stored procedure where post and num_batches are integer variables):
WITH b_id_query AS (
SELECT id, row_number() OVER () AS batch_num
FROM batches
WHERE post_id = post
AND status = 1
),
raw_query AS (
SELECT datum, row_number() OVER () % num_batches + 1 AS batch_num
FROM rawData
)
INSERT INTO batchContent( batch_id, datum)
SELECT b_id_query.id, raw_query.datum
FROM b_id_query, raw_query
WHERE b_id_query.batch_num = raw_query.batch_num;
This basically assigns a row number to each batch id in the b_id_query, and a number from 1 to num_batches to each datum in raw_query, and then I can join by that into batchContent;
But... the batchContent table has a table that inherits from it, and there are rules on INSERT and UPDATE for it. I get the following error message:
ERROR: WITH cannot be used in a query that is rewritten by rules into multiple queries
Any way around that? My rules will direct these records to the top level batchContent table, never to the child tables, because the field that's important for the rules is null, so it's really frustrating that PostgreSQL won't allow me to insert it normally.
TIA,
Herouth
On 27/01/2014, at 17:42, Herouth Maoz wrote:
But is there a way to join the result of the INSERT...RETURNING that I mentioned with a SELECT from rawData?On 27/01/2014, at 17:13, Erik Darling wrote:I think NTILE() will be your friend here.
http://www.postgresql.org/docs/9.3/static/functions-window.html
On Jan 27, 2014 10:11 AM, "Herouth Maoz" <herouth@unicell.co.il> wrote:Basically, I want to do something like this.
I have N rows in table rawData.
I have to create batches from these N rows using tables batches (which has a serial id column and some additional data columns) and batchContents (which references id in batches), where there will be M rows in batchContent for each row in table batches.
Example (N=12, M=5, meaning take 12 rows, and create as many batches as needed with a maximum of 5 batchContent rows per batch)
rawData
r01
r02
r03
r04
r05
r06
r07
r08
r09
r10
r11
r12
Expected result:
batches:
id post
5001 5
5002 5
5003 5
batchContents:
batch_id datum
5001 r01
5001 r02
5001 r03
5001 r04
5001 r05
5002 r06
5002 r07
5002 r08
5002 r09
5002 r10
5003 r11
5003 r12
The order in which the data are distributed between the batches is not important, but I need to have M data in each batch except the last.
My starting point was a statement for insertion into batches. If I know what N and M are, I know how many batches I'll need (B=ceil(N/M)), so I thought of writing
INSERT INTO batches(post)
SELECT 5 -- All the extra data, like the "post" field, is inserted as literals here
FROM generate_series(1,B)
RETURNING id
This will give me the proper rows in batches, but I don't know how to incorporate this into an insert/select from rawData into batchContent, assuming that I don't want to keep the data programatically and do repeated SELECTs with OFFSET and LIMIT for each id returned. Maybe there is some elegant solution with window functions? Can there be any sort of join over a window or something like that?
TIA,
Herouth
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
--חרות מעוזיוניסל פתרונות סלולריים מתקדמים☎ 03-5181717 שלוחה 742
--
חרות מעוז
יוניסל פתרונות סלולריים מתקדמים
☎ 03-5181717 שלוחה 742