On Wed, 22 May 2019, Adrian Klaver wrote:
> A sample of the data you are cleaning up.
Adrian, et al.:
I have it working properly now. Both org_id and person_id numbers are
prepended to each row in the appropriate table and they are unique because
each series begins one greater than the max(*_id) in each table.
> I think what people are trying to wrap there head around is how 800 lines
> in the file is being split into two subsets: the organization data and the
> people data. In particular how that is being done to preserve the
> relationship between organizations and people? This is before it ever gets
> to the database.
After cleaning there are 655 lines rather than 800, but the short answer is
that spliting the data preserves the relationship between organization and
its people:
#!/usr/bin/gawk
# Read input file, write fields to both organizations and people
# input files.
BEGIN { FS=OFS="," }
# for organizations table input:
{ print $1, $2, $4, "," $8, $9, $10, "'US'," $11, ",,," "'Opportunity','');" > "z-orgs.sql" }
# for people table input:
{ print $6, $5, "," $1, $3, $4, $5, $7, $8, $9, $10, "'US'," $11, "," $12, "," $13, "'true','');" > "z-people.sql" }
You can see that the org_id field ($1) is used in both files, but in
different columns in the two tables.
Regards,
Rich