Re: Creating composite keys from csv - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Creating composite keys from csv
Date
Msg-id 54FDA1EB.5040200@aklaver.com
Whole thread Raw
In response to Creating composite keys from csv  (Eli Murray <ejmurra2@illinimedia.com>)
Responses Re: Creating composite keys from csv  (Eli Murray <ejmurra2@illinimedia.com>)
List pgsql-general
On 03/08/2015 08:49 PM, Eli Murray wrote:
> Hi all,
>
> I'm a student journalist working on a project for our student paper
> which lists salaries and positions for every staff member at the
> university. We received the data from an FOI request but the university
> is refusing to give us primary keys for the data.
>
> The issue we've run into is that if there are two staff members with the
> same name (and there are) our current web app adds their salaries
> together and considers them one person. Now, luckily, we can create a
> composite key if we combine their name column with their salary column.
> Unfortunately, the format of the data we have makes it more difficult
> than that (of course!) because some employees can hold multiple paying
> positions.
>
> Here's some example data:
>
> Name, Position, Salary,Total Salary, ...
> Jane Doe, Dean, 100.000, 148.000, ...
> John Locke, Custodian, 30.000, 30.000, ...
> Jane Doe, Academic Adviser, 48.000, 148.000, ...
> Jane Doe, Trainer, 46.000, 46.000, ...
>
> Basically, what we'd like to do is create a serial primary key but
> instead of having it increment every row, it needs to check the name and
> total salary columns and only increment if that person doesn't already
> exist. If they do exist, it should just assign the previously created
> number to the column.

Well the above is not going to work, because the id would not be unique
across rows and therefore could not be a primary key. If I am following
what you want is a staff id that identifies a particular staff member
across rows and is derived from the (Name, Total Salary) combination, is
that correct? If so you could use a serial column to generate a
surrogate primary key for each row without worrying about the names and
total salary. Then it becomes an issue of generating the staff id for
unique staff members. For that I would see John McKowns answer.

  However, our team is small and between us we have
> very little experience working with databases and we haven't found a way
> to accomplish this goal yet. In fact, we may be trying to solve this in
> the wrong way entirely.
>
> So, to put it succinctly, how would you approach this problem? What are
> our options? Do we need to write a script to clean the data into
> separate csv tables before we import it to postgres, or is this
> something we can do in postgres? We'd really appreciate any help you all
> may be able to offer.
>
> Best!
> Eli Murray
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: pg_conndefaults Returning empty string
Next
From: pinker
Date:
Subject: Re: VACUUM FULL doesn't reduce table size