Thread: Multiple inserts with two levels of foreign keys

Multiple inserts with two levels of foreign keys

From
Dow Drake
Date:
Hi,

I'm trying to write a postgresql script to replicate a hierarchical structure in a live database into my development database, where I can debug and test more easily.  I can extract the data from the live database that needs to be inserted, but I'm having trouble writing the insertion script

Here's a simplified version of the problem I'm trying to solve:
There are three tables: farms, crops and deliveries where a farm has many crops and a crop has many deliveries.

create table farms (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   name character varying(30)
);
create table crops (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   farm_id bigint not null
   name character varying(30)
);
create table deliveries (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   crop_id bigint not null
   ticket character varying(30)
);
I want to insert a farm record, then insert two crops associated with that farm, then insert two deliveries for each of the the two crops so that in the end, my tables look like this:
farms
id     name
1      'Happy Valley Farm'

crops
id     farm_id    name
1        1         'corn'
2        1         'wheat'

delvieries
id       crop_id    ticket
1         1          '3124'
2         2          '3127'
3         1          '3133'
4         2          '3140'

It's important that the deliveries get assigned to the right crops.  I think this post: https://dba.stackexchange.com/questions/199916
gets close to what I need, but I haven't been able to figure out how to adapt it to multiple records.

Thanks for any help on this!


Re: Multiple inserts with two levels of foreign keys

From
Ron
Date:
Frame challenge: why can't you just "\copy to" the dev database tables in the correct order, to satisfy foreign key requirements?

On 10/4/23 18:59, Dow Drake wrote:
Hi,

I'm trying to write a postgresql script to replicate a hierarchical structure in a live database into my development database, where I can debug and test more easily.  I can extract the data from the live database that needs to be inserted, but I'm having trouble writing the insertion script

Here's a simplified version of the problem I'm trying to solve:
There are three tables: farms, crops and deliveries where a farm has many crops and a crop has many deliveries.

create table farms (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   name character varying(30)
);
create table crops (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   farm_id bigint not null
   name character varying(30)
);
create table deliveries (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   crop_id bigint not null
   ticket character varying(30)
);
I want to insert a farm record, then insert two crops associated with that farm, then insert two deliveries for each of the the two crops so that in the end, my tables look like this:
farms
id     name
1      'Happy Valley Farm'

crops
id     farm_id    name
1        1         'corn'
2        1         'wheat'

delvieries
id       crop_id    ticket
1         1          '3124'
2         2          '3127'
3         1          '3133'
4         2          '3140'

It's important that the deliveries get assigned to the right crops.  I think this post: https://dba.stackexchange.com/questions/199916
gets close to what I need, but I haven't been able to figure out how to adapt it to multiple records.

Thanks for any help on this!



--
Born in Arizona, moved to Babylonia.

Re: Multiple inserts with two levels of foreign keys

From
Dow Drake
Date:
Thanks for the reply, Ron!
I'm not sure I see how to make your suggestion work, though.  Suppose I dump the three tables to CSV as you suggest (and write a script to extract the relevant records from those CSV dumps in the correct order).  It might be that in the dev database, the next generated key values are 199 for farm's id, 2145 for crop's id and 10242 for deliveries' id.  The databases are independent.

Just inserting the records in the same order doesn't take care of setting the foreign key values correctly -- does it?  I think I'm really looking for a solution more along the lines of the link in my original post.

Best,
Dow

On Wed, Oct 4, 2023 at 6:26 PM Ron <ronljohnsonjr@gmail.com> wrote:
Frame challenge: why can't you just "\copy to" the dev database tables in the correct order, to satisfy foreign key requirements?

On 10/4/23 18:59, Dow Drake wrote:
Hi,

I'm trying to write a postgresql script to replicate a hierarchical structure in a live database into my development database, where I can debug and test more easily.  I can extract the data from the live database that needs to be inserted, but I'm having trouble writing the insertion script

Here's a simplified version of the problem I'm trying to solve:
There are three tables: farms, crops and deliveries where a farm has many crops and a crop has many deliveries.

create table farms (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   name character varying(30)
);
create table crops (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   farm_id bigint not null
   name character varying(30)
);
create table deliveries (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   crop_id bigint not null
   ticket character varying(30)
);
I want to insert a farm record, then insert two crops associated with that farm, then insert two deliveries for each of the the two crops so that in the end, my tables look like this:
farms
id     name
1      'Happy Valley Farm'

crops
id     farm_id    name
1        1         'corn'
2        1         'wheat'

delvieries
id       crop_id    ticket
1         1          '3124'
2         2          '3127'
3         1          '3133'
4         2          '3140'

It's important that the deliveries get assigned to the right crops.  I think this post: https://dba.stackexchange.com/questions/199916
gets close to what I need, but I haven't been able to figure out how to adapt it to multiple records.

Thanks for any help on this!



--
Born in Arizona, moved to Babylonia.

Re: Multiple inserts with two levels of foreign keys

From
Ron
Date:
Ah.  We'd truncate all of the dev tables, then load a "slice" (for example, accounts 10000 to 19999, and all associated records from downstream tables; lots and lots of views!!) from the prod database.

On 10/4/23 20:50, Dow Drake wrote:
Thanks for the reply, Ron!
I'm not sure I see how to make your suggestion work, though.  Suppose I dump the three tables to CSV as you suggest (and write a script to extract the relevant records from those CSV dumps in the correct order).  It might be that in the dev database, the next generated key values are 199 for farm's id, 2145 for crop's id and 10242 for deliveries' id.  The databases are independent.

Just inserting the records in the same order doesn't take care of setting the foreign key values correctly -- does it?  I think I'm really looking for a solution more along the lines of the link in my original post.

Best,
Dow

On Wed, Oct 4, 2023 at 6:26 PM Ron <ronljohnsonjr@gmail.com> wrote:
Frame challenge: why can't you just "\copy to" the dev database tables in the correct order, to satisfy foreign key requirements?

On 10/4/23 18:59, Dow Drake wrote:
Hi,

I'm trying to write a postgresql script to replicate a hierarchical structure in a live database into my development database, where I can debug and test more easily.  I can extract the data from the live database that needs to be inserted, but I'm having trouble writing the insertion script

Here's a simplified version of the problem I'm trying to solve:
There are three tables: farms, crops and deliveries where a farm has many crops and a crop has many deliveries.

create table farms (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   name character varying(30)
);
create table crops (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   farm_id bigint not null
   name character varying(30)
);
create table deliveries (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   crop_id bigint not null
   ticket character varying(30)
);
I want to insert a farm record, then insert two crops associated with that farm, then insert two deliveries for each of the the two crops so that in the end, my tables look like this:
farms
id     name
1      'Happy Valley Farm'

crops
id     farm_id    name
1        1         'corn'
2        1         'wheat'

delvieries
id       crop_id    ticket
1         1          '3124'
2         2          '3127'
3         1          '3133'
4         2          '3140'

It's important that the deliveries get assigned to the right crops.  I think this post: https://dba.stackexchange.com/questions/199916
gets close to what I need, but I haven't been able to figure out how to adapt it to multiple records.

Thanks for any help on this!



--
Born in Arizona, moved to Babylonia.

--
Born in Arizona, moved to Babylonia.

Re: Multiple inserts with two levels of foreign keys

From
Dow Drake
Date:
I see.  That would definitely work, but part of this for me is to get a better understanding of PostgreSQL's capabilities.  I'm going to keep working on a minimal solution that deletes no records from the dev database, and only inserts the required records.

On Wed, Oct 4, 2023 at 6:58 PM Ron <ronljohnsonjr@gmail.com> wrote:
Ah.  We'd truncate all of the dev tables, then load a "slice" (for example, accounts 10000 to 19999, and all associated records from downstream tables; lots and lots of views!!) from the prod database.

On 10/4/23 20:50, Dow Drake wrote:
Thanks for the reply, Ron!
I'm not sure I see how to make your suggestion work, though.  Suppose I dump the three tables to CSV as you suggest (and write a script to extract the relevant records from those CSV dumps in the correct order).  It might be that in the dev database, the next generated key values are 199 for farm's id, 2145 for crop's id and 10242 for deliveries' id.  The databases are independent.

Just inserting the records in the same order doesn't take care of setting the foreign key values correctly -- does it?  I think I'm really looking for a solution more along the lines of the link in my original post.

Best,
Dow

On Wed, Oct 4, 2023 at 6:26 PM Ron <ronljohnsonjr@gmail.com> wrote:
Frame challenge: why can't you just "\copy to" the dev database tables in the correct order, to satisfy foreign key requirements?

On 10/4/23 18:59, Dow Drake wrote:
Hi,

I'm trying to write a postgresql script to replicate a hierarchical structure in a live database into my development database, where I can debug and test more easily.  I can extract the data from the live database that needs to be inserted, but I'm having trouble writing the insertion script

Here's a simplified version of the problem I'm trying to solve:
There are three tables: farms, crops and deliveries where a farm has many crops and a crop has many deliveries.

create table farms (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   name character varying(30)
);
create table crops (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   farm_id bigint not null
   name character varying(30)
);
create table deliveries (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   crop_id bigint not null
   ticket character varying(30)
);
I want to insert a farm record, then insert two crops associated with that farm, then insert two deliveries for each of the the two crops so that in the end, my tables look like this:
farms
id     name
1      'Happy Valley Farm'

crops
id     farm_id    name
1        1         'corn'
2        1         'wheat'

delvieries
id       crop_id    ticket
1         1          '3124'
2         2          '3127'
3         1          '3133'
4         2          '3140'

It's important that the deliveries get assigned to the right crops.  I think this post: https://dba.stackexchange.com/questions/199916
gets close to what I need, but I haven't been able to figure out how to adapt it to multiple records.

Thanks for any help on this!



--
Born in Arizona, moved to Babylonia.

--
Born in Arizona, moved to Babylonia.

Re: Multiple inserts with two levels of foreign keys

From
Ron
Date:
Have you considered writing a stored procedure to process records that have been written to temporary tables?

0. Create temporary tables tmp_farms, tmp_crops and tmp_deliveries, which don't have id columns.
1. Truncate the three temporary tables
2. Insert into the temp tables a "set" of prod data.
3. Call a stored procedure in the dev database that does INSERT INTO ..., using RETURNING to get the relevant id values for the subsequent tables.
4. goto 1.


On 10/4/23 21:15, Dow Drake wrote:
I see.  That would definitely work, but part of this for me is to get a better understanding of PostgreSQL's capabilities.  I'm going to keep working on a minimal solution that deletes no records from the dev database, and only inserts the required records.

On Wed, Oct 4, 2023 at 6:58 PM Ron <ronljohnsonjr@gmail.com> wrote:
Ah.  We'd truncate all of the dev tables, then load a "slice" (for example, accounts 10000 to 19999, and all associated records from downstream tables; lots and lots of views!!) from the prod database.

On 10/4/23 20:50, Dow Drake wrote:
Thanks for the reply, Ron!
I'm not sure I see how to make your suggestion work, though.  Suppose I dump the three tables to CSV as you suggest (and write a script to extract the relevant records from those CSV dumps in the correct order).  It might be that in the dev database, the next generated key values are 199 for farm's id, 2145 for crop's id and 10242 for deliveries' id.  The databases are independent.

Just inserting the records in the same order doesn't take care of setting the foreign key values correctly -- does it?  I think I'm really looking for a solution more along the lines of the link in my original post.

Best,
Dow

On Wed, Oct 4, 2023 at 6:26 PM Ron <ronljohnsonjr@gmail.com> wrote:
Frame challenge: why can't you just "\copy to" the dev database tables in the correct order, to satisfy foreign key requirements?

On 10/4/23 18:59, Dow Drake wrote:
Hi,

I'm trying to write a postgresql script to replicate a hierarchical structure in a live database into my development database, where I can debug and test more easily.  I can extract the data from the live database that needs to be inserted, but I'm having trouble writing the insertion script

Here's a simplified version of the problem I'm trying to solve:
There are three tables: farms, crops and deliveries where a farm has many crops and a crop has many deliveries.

create table farms (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   name character varying(30)
);
create table crops (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   farm_id bigint not null
   name character varying(30)
);
create table deliveries (
   id bigint NOT NULL GENERATED BY DEFAULT AS IDENTITY,
   crop_id bigint not null
   ticket character varying(30)
);
I want to insert a farm record, then insert two crops associated with that farm, then insert two deliveries for each of the the two crops so that in the end, my tables look like this:
farms
id     name
1      'Happy Valley Farm'

crops
id     farm_id    name
1        1         'corn'
2        1         'wheat'

delvieries
id       crop_id    ticket
1         1          '3124'
2         2          '3127'
3         1          '3133'
4         2          '3140'

It's important that the deliveries get assigned to the right crops.  I think this post: https://dba.stackexchange.com/questions/199916
gets close to what I need, but I haven't been able to figure out how to adapt it to multiple records.

Thanks for any help on this!



--
Born in Arizona, moved to Babylonia.

--
Born in Arizona, moved to Babylonia.

--
Born in Arizona, moved to Babylonia.

Re: Multiple inserts with two levels of foreign keys

From
Alvaro Herrera
Date:
On 2023-Oct-04, Dow Drake wrote:

> I want to insert a farm record, then insert two crops associated with that
> farm, then insert two deliveries for each of the the two crops so that in
> the end, my tables look like this:

If I understand you correctly, for each table you want one CTE with the
data you want to insert, and another CTE with the data actually
inserted, that can be matched later.  Something like this should work:

with newfarms (name) as (values ('Happy Valley Farm')),
     insertedfarms (id, name) as (insert into farms (name)
                                         select newfarms.name
                                           from newfarms
                                      returning id, name),
     newcrops (farm, name) as (values ('Happy Valley Farm', 'corn'),
                                      ('Happy Valley Farm', 'wheat')),
     insertedcrops as (insert into crops (farm_id, name)
                              select (select insertedfarms.id
                                        from insertedfarms
                                       where insertedfarms.name = newcrops.farm),
                                     newcrops.name
                                from newcrops
                           returning id, farm_id, name),
     newdeliveries (farm, name, ticket) as (values ('Happy Valley Farm', 'corn', '3124'),
                                                   ('Happy Valley Farm', 'wheat', '3127'),
                                                   ('Happy Valley Farm', 'corn', '3133'),
                                                   ('Happy Valley Farm', 'wheat', '3140')),
     inserteddeliveries as (insert into deliveries (crop_id, ticket)
                                   select (select ics.id
                                             from insertedfarms ifs join insertedcrops ics on (ifs.id = ics.farm_id)
                                            where ifs.name = newdeliveries.farm and
                                                  ics.name = newdeliveries.name),
                                          ticket
                                     from newdeliveries
                                returning *)
select * from inserteddeliveries;


-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
 Are you not unsure you want to delete Firefox?
       [Not unsure]     [Not not unsure]    [Cancel]
                   http://smylers.hates-software.com/2008/01/03/566e45b2.html



Re: Multiple inserts with two levels of foreign keys

From
Dow Drake
Date:
Yes!  Thanks, Alvaro!  This is exactly the pattern I was trying to work out!  This community is awesome!

> On Oct 5, 2023, at 2:39 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2023-Oct-04, Dow Drake wrote:
>
>> I want to insert a farm record, then insert two crops associated with that
>> farm, then insert two deliveries for each of the the two crops so that in
>> the end, my tables look like this:
>
> If I understand you correctly, for each table you want one CTE with the
> data you want to insert, and another CTE with the data actually
> inserted, that can be matched later.  Something like this should work:
>
> with newfarms (name) as (values ('Happy Valley Farm')),
>     insertedfarms (id, name) as (insert into farms (name)
>                                         select newfarms.name
>                                           from newfarms
>                                      returning id, name),
>     newcrops (farm, name) as (values ('Happy Valley Farm', 'corn'),
>                                      ('Happy Valley Farm', 'wheat')),
>     insertedcrops as (insert into crops (farm_id, name)
>                              select (select insertedfarms.id
>                                        from insertedfarms
>                                       where insertedfarms.name = newcrops.farm),
>                                     newcrops.name
>                                from newcrops
>                           returning id, farm_id, name),
>     newdeliveries (farm, name, ticket) as (values ('Happy Valley Farm', 'corn', '3124'),
>                                                   ('Happy Valley Farm', 'wheat', '3127'),
>                                                   ('Happy Valley Farm', 'corn', '3133'),
>                                                   ('Happy Valley Farm', 'wheat', '3140')),
>     inserteddeliveries as (insert into deliveries (crop_id, ticket)
>                                   select (select ics.id
>                                             from insertedfarms ifs join insertedcrops ics on (ifs.id = ics.farm_id)
>                                            where ifs.name = newdeliveries.farm and
>                                                  ics.name = newdeliveries.name),
>                                          ticket
>                                     from newdeliveries
>                                returning *)
> select * from inserteddeliveries;
>
>
> --
> Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
> Are you not unsure you want to delete Firefox?
>       [Not unsure]     [Not not unsure]    [Cancel]
>                   http://smylers.hates-software.com/2008/01/03/566e45b2.html



Re: Multiple inserts with two levels of foreign keys

From
Ron
Date:
But honestly, the amount of text duplication hurts my "inner programmer".  
And it would have to be generated dynamically, since you don't know how many 
crops were delivered.  #shudder

On 10/5/23 09:33, Dow Drake wrote:
> Yes!  Thanks, Alvaro!  This is exactly the pattern I was trying to work out!  This community is awesome!
>
>> On Oct 5, 2023, at 2:39 AM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>>
>> On 2023-Oct-04, Dow Drake wrote:
>>
>>> I want to insert a farm record, then insert two crops associated with that
>>> farm, then insert two deliveries for each of the the two crops so that in
>>> the end, my tables look like this:
>> If I understand you correctly, for each table you want one CTE with the
>> data you want to insert, and another CTE with the data actually
>> inserted, that can be matched later.  Something like this should work:
>>
>> with newfarms (name) as (values ('Happy Valley Farm')),
>>      insertedfarms (id, name) as (insert into farms (name)
>>                                          select newfarms.name
>>                                            from newfarms
>>                                       returning id, name),
>>      newcrops (farm, name) as (values ('Happy Valley Farm', 'corn'),
>>                                       ('Happy Valley Farm', 'wheat')),
>>      insertedcrops as (insert into crops (farm_id, name)
>>                               select (select insertedfarms.id
>>                                         from insertedfarms
>>                                        where insertedfarms.name = newcrops.farm),
>>                                      newcrops.name
>>                                 from newcrops
>>                            returning id, farm_id, name),
>>      newdeliveries (farm, name, ticket) as (values ('Happy Valley Farm', 'corn', '3124'),
>>                                                    ('Happy Valley Farm', 'wheat', '3127'),
>>                                                    ('Happy Valley Farm', 'corn', '3133'),
>>                                                    ('Happy Valley Farm', 'wheat', '3140')),
>>      inserteddeliveries as (insert into deliveries (crop_id, ticket)
>>                                    select (select ics.id
>>                                              from insertedfarms ifs join insertedcrops ics on (ifs.id =
ics.farm_id)
>>                                             where ifs.name = newdeliveries.farm and
>>                                                   ics.name = newdeliveries.name),
>>                                           ticket
>>                                      from newdeliveries
>>                                 returning *)
>> select * from inserteddeliveries;
>>
>>
>> -- 
>> Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
>> Are you not unsure you want to delete Firefox?
>>        [Not unsure]     [Not not unsure]    [Cancel]
>>                    http://smylers.hates-software.com/2008/01/03/566e45b2.html
>

-- 
Born in Arizona, moved to Babylonia.



Re: Multiple inserts with two levels of foreign keys

From
"Peter J. Holzer"
Date:
On 2023-10-05 09:59:24 -0500, Ron wrote:
> But honestly, the amount of text duplication hurts my "inner programmer". 
> And it would have to be generated dynamically, since you don't know how many
> crops were delivered.  #shudder

Yes, this seems like the kind of problem that I would definitely solve
in a script running outside of the database. Especially since it has to
talk to two databases. If the number of data records isn't too large
(maybe a few tens of thousands), I'd just write three loops to select
from the prod database and insert into the dev database.

If the number of records is too large for that, I'd create some staging
table with an extra column "new_id" filled from the same sequence as the
original table, like this:

create table new_farms(
    id bigint,
    name character varying(30),
    new_id bigint default nextval('farms_id_seq')
)

Then you can just COPY the data into these tables and it will give a
nice mapping from old to new ids which you can use in subsequent
inserts.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment

Re: Multiple inserts with two levels of foreign keys

From
Dow Drake
Date:
Thanks Peter!

I'll take a close look at your suggestion when I get a chance.  But I've already implemented a Python script that solves my actual problem based on the pattern that Alvaro Herrera suggested for the toy problem I described here.  It's working very well to reproduce the farm with several levels of one-to-many dependencies, and should be easy to maintain.  I really like the power of the with clause.

Best,
Dow

On Sun, Oct 8, 2023 at 2:03 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2023-10-05 09:59:24 -0500, Ron wrote:
> But honestly, the amount of text duplication hurts my "inner programmer". 
> And it would have to be generated dynamically, since you don't know how many
> crops were delivered.  #shudder

Yes, this seems like the kind of problem that I would definitely solve
in a script running outside of the database. Especially since it has to
talk to two databases. If the number of data records isn't too large
(maybe a few tens of thousands), I'd just write three loops to select
from the prod database and insert into the dev database.

If the number of records is too large for that, I'd create some staging
table with an extra column "new_id" filled from the same sequence as the
original table, like this:

create table new_farms(
    id bigint,
    name character varying(30),
    new_id bigint default nextval('farms_id_seq')
)

Then you can just COPY the data into these tables and it will give a
nice mapping from old to new ids which you can use in subsequent
inserts.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"