Thread: Tools for moving normalized data around

Tools for moving normalized data around

From
Peter
Date:
Hi,

  imagine I have a database containing normalized data: a whole bunch
of tables all related via foreign keys (i.e. the thing one should
usually have ;) ).

So there is a dependency graph: all records relate to others in
some tree-like fashion (has-many, belongs-to, etc.)

Now I want to grab some part of the data, on a certain condition
(let's say all records belonging to user 'Bob', if there is a "user"
table somewhere at the tree-bottom), and move it to another database
with the very same layout - which is already populated with data
and runs a different instance of the same application.

Grabbing the intended records is just some dead-simple Selects. But
then inserting them into the other database is not fun, because some
primary keys will likely collide.
And if we start to update the primary keys to new unique values, we
must consequently update ALL the foreign keys throughout the entire
tree of tables.

How to do this with two tables connected via one foreign key, that is
explained a dozen times in Stackoverflow. But what if the tree is 50
tables and 120 foreign key columns?
It can be done. But probably not manually.

So, as this seems a very usual use-case for normalized data, is there
any tooling available? Anywhere? (I searched, I didn't find.)

Also, it seems the provided commands are not very supporting. Try to
do an Insert and *always* create a new PK from the sequence, and
return the old and the new PK for the inserted row. It seems this does
not work without either naming all the other columns explicitely in
the insert (impossible, they are always different) or establishing
some trigger functions...


Different example, same general problem: Imagine I have such a tree
of normalized tables, and I want to grab a part of it and roll only
that part back in time, to some state it was X weeks ago.
(That's possible if we store records of changes in an extra column
within the rows themselves. It might even be possible with some
postgres-internal data, which has a knowledge of history - but there
seems no API access to that.)

But I want to have this working on click-button, independent
of the table layouts, only with a basic graph of the dependency tree
and with an unlimited number of involved tables, i.e.: maintainable.


I have searched for solutions (or building blocks to solutions) for
these and similar problems, and didn't find much. I don't understand
that - SQL is old, normalization is old, (even postgres has already
become quite old) and these are genuine issues.
It can't be I'm the only one thinking about such things.

So maybe I'm looking at the wrong place? Any ideas welcome.

Cheers,
PMc



Re: Tools for moving normalized data around

From
Jeremy Smith
Date:





How to do this with two tables connected via one foreign key, that is
explained a dozen times in Stackoverflow. But what if the tree is 50
tables and 120 foreign key columns?
It can be done. But probably not manually.

So, as this seems a very usual use-case for normalized data, is there
any tooling available? Anywhere? (I searched, I didn't find.)


I haven't used it, but this seems like the problem that Jailer is trying to solve: https://github.com/Wisser/Jailer
 

Re: Tools for moving normalized data around

From
Gavan Schneider
Date:
On 19 Jan 2023, at 6:47, Peter wrote:

> Now I want to grab some part of the data, on a certain condition
> (let's say all records belonging to user 'Bob', if there is a "user"
> table somewhere at the tree-bottom), and move it to another database
> with the very same layout - which is already populated with data
> and runs a different instance of the same application.
>
> Grabbing the intended records is just some dead-simple Selects. But
> then inserting them into the other database is not fun, because some
> primary keys will likely collide.
>
A very small sliver in this problem:
The key collision problem could be avoided if the unique and arbitrary keys were UUID
Many of the other keys should be related to their respective “table of truth” so the migration is dependant on these
beingthe same across the locations 

The rest TBA down thread :)

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem —
neat,plausible, and wrong. 
— H. L. Mencken, 1920



Re: Tools for moving normalized data around

From
Dominique Devienne
Date:
On Wed, Jan 18, 2023 at 9:03 PM Jeremy Smith <jeremy@musicsmith.net> wrote:
>> How to do this with two tables connected via one foreign key, that is
>> explained a dozen times in Stackoverflow. But what if the tree is 50
>> tables and 120 foreign key columns?
>> It can be done. But probably not manually.
>>
>> So, as this seems a very usual use-case for normalized data, is there
>> any tooling available? Anywhere? (I searched, I didn't find.)
>
> I haven't used it, but this seems like the problem that Jailer is trying to solve: https://github.com/Wisser/Jailer

Seems to dothe subsetting alright, but it doesn't mention the
conflict-resolution for inserting into an already populated target
schema.

We've done something similar, merging many same-schema DBs into a
single local SQLite DB (with conflicts, thus no constraints),
iteratively resolving the conflicts (SK/PK and NK) per-table in
topological order, propagating PK changes to FKs in child tables
later.
Then load the result into a full-constrained PostgreSQL DB. Clever'er
minds can probably do it all in SQL, but we did it with a combination
of imperative code and SQL. Not exactly your use case Peter, but close
enough I think. I don't think a tool could have done what we did,
it's too ad-hoc and specific to our use case. Took a while, and
required lots of testing (unit tests, and functional QA tests). FWIW.
--DD



Re: Tools for moving normalized data around

From
Rob Sargent
Date:
On 1/18/23 13:15, Gavan Schneider wrote:
On 19 Jan 2023, at 6:47, Peter wrote:

Now I want to grab some part of the data, on a certain condition
(let's say all records belonging to user 'Bob', if there is a "user"
table somewhere at the tree-bottom), and move it to another database
with the very same layout - which is already populated with data
and runs a different instance of the same application.

Grabbing the intended records is just some dead-simple Selects. But
then inserting them into the other database is not fun, because some
primary keys will likely collide.

A very small sliver in this problem:
The key collision problem could be avoided if the unique and arbitrary keys were UUID
Many of the other keys should be related to their respective “table of truth” so the migration is dependant on these being the same across the locations

The rest TBA down thread :)

Regards

Gavan Schneider

You'll have to "know" the tree and order inserts accordingly.  Starting with the independent tables, then their dependants, then their dependants etc.
And if the ids are currently integers, any chance you can negate the in transit?  You'll have to disable any sequence that's in play.