Thread: Tools for moving normalized data around
Hi,

imagine I have a database containing normalized data: a whole bunch of tables all related via foreign keys (i.e. the thing one should usually have ;) ). So there is a dependency graph: all records relate to others in some tree-like fashion (has-many, belongs-to, etc.).

Now I want to grab some part of the data, on a certain condition (let's say all records belonging to user 'Bob', if there is a "user" table somewhere at the tree-bottom), and move it to another database with the very same layout - which is already populated with data and runs a different instance of the same application.

Grabbing the intended records is just some dead-simple SELECTs. But then inserting them into the other database is no fun, because some primary keys will likely collide. And if we start to update the primary keys to new unique values, we must consequently update ALL the foreign keys throughout the entire tree of tables.

How to do this with two tables connected via one foreign key is explained a dozen times on Stack Overflow. But what if the tree is 50 tables and 120 foreign key columns? It can be done. But probably not manually.

So, as this seems a very common use-case for normalized data, is there any tooling available? Anywhere? (I searched, I didn't find.)

Also, the built-in commands don't seem to help much here. Try to do an INSERT that *always* creates a new PK from the sequence and returns both the old and the new PK for the inserted row: it seems this does not work without either naming all the other columns explicitly in the INSERT (impossible, they are different for every table) or setting up trigger functions...

Different example, same general problem: imagine I have such a tree of normalized tables, and I want to grab a part of it and roll only that part back in time, to the state it was in X weeks ago. (That's possible if we store records of changes in an extra column within the rows themselves. It might even be possible with some Postgres-internal data that has knowledge of history - but there seems to be no API access to that.)

But I want to have this working at the click of a button, independent of the table layouts, only with a basic graph of the dependency tree and with an unlimited number of involved tables, i.e.: maintainable.

I have searched for solutions (or building blocks for solutions) to these and similar problems, and didn't find much. I don't understand that - SQL is old, normalization is old (even Postgres has already become quite old), and these are genuine issues. It can't be that I'm the only one thinking about such things. So maybe I'm looking in the wrong place?

Any ideas welcome.

Cheers,
PMc
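[To make the re-keying problem concrete: one per-table approach is to pre-allocate new PKs from the target's sequence into an old-to-new map and insert through that map. A rough, untested sketch - table, column, schema, and sequence names ("orders", "staging", "orders_id_seq") are made up here - which is exactly the thing that becomes unmanageable by hand across 50 tables:

    -- hypothetical table "orders" with a serial PK "id"; the source rows
    -- are assumed to sit in a staging schema pulled from the other database
    CREATE TEMP TABLE orders_map AS
    SELECT s.id AS old_id,
           nextval('orders_id_seq') AS new_id   -- new PK from the target's sequence
    FROM   staging.orders s
    WHERE  s.user_name = 'Bob';

    -- insert under the new ids; orders_map keeps the old/new pairs
    -- so child-table FKs can be rewritten from it afterwards
    INSERT INTO orders (id, user_name, amount)
    SELECT m.new_id, s.user_name, s.amount
    FROM   staging.orders s
    JOIN   orders_map m ON m.old_id = s.id;
]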
> How to do this with two tables connected via one foreign key is
> explained a dozen times on Stack Overflow. But what if the tree is 50
> tables and 120 foreign key columns?
> It can be done. But probably not manually.
>
> So, as this seems a very common use-case for normalized data, is there
> any tooling available? Anywhere? (I searched, I didn't find.)
I haven't used it, but this seems like the problem that Jailer is trying to solve: https://github.com/Wisser/Jailer
On 19 Jan 2023, at 6:47, Peter wrote:

> Now I want to grab some part of the data, on a certain condition
> (let's say all records belonging to user 'Bob', if there is a "user"
> table somewhere at the tree-bottom), and move it to another database
> with the very same layout - which is already populated with data
> and runs a different instance of the same application.
>
> Grabbing the intended records is just some dead-simple SELECTs. But
> then inserting them into the other database is no fun, because some
> primary keys will likely collide.

A very small sliver in this problem:

The key collision problem could be avoided if the unique and arbitrary keys were UUIDs. Many of the other keys should be related to their respective "table of truth", so the migration is dependent on these being the same across the locations.

The rest TBA down thread :)

Regards
Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920
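[For instance - just a sketch with hypothetical table names, assuming PostgreSQL 13+ where gen_random_uuid() is built in (earlier versions need the pgcrypto extension):

    -- keys generated this way are globally unique, so rows can be copied
    -- between instances without re-keying or FK rewriting
    CREATE TABLE users (
        id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
        name text NOT NULL
    );

    CREATE TABLE orders (
        id      uuid PRIMARY KEY DEFAULT gen_random_uuid(),
        user_id uuid NOT NULL REFERENCES users(id),
        amount  numeric
    );
]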
On Wed, Jan 18, 2023 at 9:03 PM Jeremy Smith <jeremy@musicsmith.net> wrote:
>> How to do this with two tables connected via one foreign key is
>> explained a dozen times on Stack Overflow. But what if the tree is 50
>> tables and 120 foreign key columns?
>> It can be done. But probably not manually.
>>
>> So, as this seems a very common use-case for normalized data, is there
>> any tooling available? Anywhere? (I searched, I didn't find.)
>
> I haven't used it, but this seems like the problem that Jailer is trying to solve: https://github.com/Wisser/Jailer

Seems to do the subsetting alright, but it doesn't mention conflict resolution for inserting into an already populated target schema.

We've done something similar: merging many same-schema DBs into a single local SQLite DB (with conflicts, thus no constraints), iteratively resolving the conflicts (SK/PK and NK) per table in topological order, propagating PK changes to FKs in child tables later. Then load the result into a fully constrained PostgreSQL DB.

Cleverer minds can probably do it all in SQL, but we did it with a combination of imperative code and SQL.

Not exactly your use case Peter, but close enough I think. I don't think a tool could have done what we did; it's too ad-hoc and specific to our use case. Took a while, and required lots of testing (unit tests, and functional QA tests).

FWIW. --DD
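[The FK-propagation step described here can be expressed per child table as an UPDATE against the parent's old-to-new key map - a sketch along the lines of the hypothetical orders_map above, with the child table name ("order_items") also made up:

    -- after re-keying "orders", rewrite the referencing column in the
    -- (still staged) child table before it is loaded into the target
    UPDATE order_items i
    SET    order_id = m.new_id
    FROM   orders_map m
    WHERE  i.order_id = m.old_id;

Repeating this for every FK column, in topological order of the tables, is the mechanical part that imperative code can drive from a list of the foreign keys.]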
On 1/18/23 13:15, Gavan Schneider wrote:
> On 19 Jan 2023, at 6:47, Peter wrote:
>> Now I want to grab some part of the data, on a certain condition
>> (let's say all records belonging to user 'Bob', if there is a "user"
>> table somewhere at the tree-bottom), and move it to another database
>> with the very same layout - which is already populated with data
>> and runs a different instance of the same application.
>>
>> Grabbing the intended records is just some dead-simple SELECTs. But
>> then inserting them into the other database is no fun, because some
>> primary keys will likely collide.
>
> A very small sliver in this problem:
> The key collision problem could be avoided if the unique and arbitrary keys were UUIDs. Many of the other keys should be related to their respective "table of truth", so the migration is dependent on these being the same across the locations.
> The rest TBA down thread :)
>
> Regards
> Gavan Schneider
You'll have to "know" the tree and order the inserts accordingly: starting with the independent tables, then their dependents, then those tables' dependents, and so on.
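The tree itself can be read out of the catalogs, e.g. one row per foreign key; a topological sort over the child/parent pairs then gives the insert order:

    SELECT conrelid::regclass  AS child_table,
           confrelid::regclass AS parent_table,
           conname             AS fk_name
    FROM   pg_constraint
    WHERE  contype = 'f';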
And if the ids are currently integers, any chance you can negate them in transit? You'll have to disable any sequence that's in play.
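[Roughly like this - an untested sketch with hypothetical table names, assuming the target's sequences only ever hand out positive ids and that every FK column gets the same treatment, loading parents before children:

    -- load the copied rows under negated ids so they cannot collide with
    -- anything already in the target; the referenced "users" rows are
    -- assumed to have been loaded first, with their ids negated the same way
    INSERT INTO orders (id, user_id, amount)
    SELECT -s.id, -s.user_id, s.amount
    FROM   staging.orders s;
]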