> Also, when I speak of "unique identifier" I'm not speaking of the one in
> your FINAL tables. I assume you would have at least the *_id field as
> PKEY, so nothing else is needed there, but the one in your SOURCE data
> set (it can be anything, like the row number in the original excel).
Francisco/Jeremy,
I'm grateful for your patient help. The 'unique identifier' in the source file has now been provided using nl <https://ss64.com/bash/nl.html>. The syntax I used is:
nl -b a -n ln -s , -v 339 source.txt > out.txt
because the existing organizations table's maximum org_id is 338, so the new IDs start at 339.
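One small thing worth checking with that exact invocation: GNU nl pads the line number to a field width of 6 by default, so with `-n ln` (left-justified) you get trailing spaces between the number and the `,` separator. Adding `-w 1` avoids that. A minimal sketch, using made-up sample rows in place of the real source.txt:

```shell
# Two stand-in source rows (the real source.txt will differ)
printf 'Acme Corp,Alice\nBeta LLC,Bob\n' > source.txt

# Number every line (-b a), left-justified without zero padding (-n ln),
# minimum field width 1 (-w 1) so there is no trailing padding,
# comma separator (-s ,), starting at 339 (-v 339).
nl -b a -n ln -s , -w 1 -v 339 source.txt > out.txt

cat out.txt
# 339,Acme Corp,Alice
# 340,Beta LLC,Bob
```

Without `-w 1`, the first field would be `339   ` (padded to six characters), which can bite you later when gawk compares or prints it.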
I believe this fulfills the need for a known unique ID in the source file. When I parse each row with gawk to create the two files for table input, I can use that ID in both the organizations table (as the PK) and the people table (as the FK referencing the organizations table). I can let postgres assign the unique IDs for the new rows in the people table.
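In case it helps, the split into the two load files could look something like the sketch below. This is only an illustration: I'm assuming the numbered file has three comma-separated fields (id, org name, person name), and the output file names are placeholders; the real column layout will differ. The syntax is plain POSIX awk, so it runs unchanged under gawk.

```shell
# Assumed layout of out.txt: org_id,org_name,person_name (hypothetical)
printf '339,Acme Corp,Alice\n340,Beta LLC,Bob\n' > out.txt

awk -F, 'BEGIN { OFS = "," }
{
    print $1, $2 > "orgs.csv"    # org_id (PK), org_name   -> organizations
    print $3, $1 > "people.csv"  # person_name, org_id (FK) -> people
}' out.txt
```

Each row's nl-assigned number then serves as both the organizations PK and the people FK, while postgres remains free to assign the people table's own serial id on insert.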
Am I still missing something critical?
Are you saying your database already has an organizations table, and this data file is appending to it with all-brand-new organizations and people?
Say you have 800 people. Are you really saying there are also 800 organizations? If so, your solution would work. But I thought multiple people belong to the same organization, which is the entire mapping problem we have been discussing.
The data model doesn't make much sense if that's true.
Also, you said previously that some fields from your data file would be populated in *both tables*. That violates basic normalization: why duplicate the same data in two places?