Thread: Role for CSV import
I have a database that will be populated solely by CSV import. There are several CSV file formats, all denormalized.

I have created interim tables which match the CSV file formats. An insert trigger distributes the data to the appropriate destination tables. The destination tables themselves have insert/update triggers for automated data clean-up. Any unresolvable inconsistencies are reported in a log table.

I don't want the triggers to fire for every insert/update: there might be situations where I have to perform some data clean-up manually.

So, my idea is to create a role for import, query current_user in the trigger, perform the trigger actions for importuser, and just return the row unadulterated for adminuser. I would grant importuser privileges on the tables being explicitly and implicitly populated.

Is that the best way to organize this?

=====
setup: PostgreSQL 8.4
dbname = schema name = admin name

--
Best Regards,
Tarlika Elisabeth Schmitz
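A minimal sketch of the idea being proposed, with hypothetical names throughout (importuser, an interim table csv_person, a destination table person); the real clean-up logic is only hinted at:

CREATE ROLE importuser LOGIN PASSWORD 'secret';

-- Privileges on the explicitly populated interim table and on the
-- implicitly populated destination table.
GRANT SELECT, INSERT ON csv_person TO importuser;
GRANT SELECT, INSERT, UPDATE ON person TO importuser;

-- Clean-up trigger that only acts for the import role; any other user
-- (e.g. adminuser doing manual fixes) gets the row back unadulterated.
CREATE OR REPLACE FUNCTION person_cleanup() RETURNS trigger AS $$
BEGIN
    IF current_user <> 'importuser' THEN
        RETURN NEW;                  -- skip the automated clean-up
    END IF;
    NEW.name := btrim(NEW.name);     -- placeholder clean-up step
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER person_cleanup_trg
    BEFORE INSERT OR UPDATE ON person
    FOR EACH ROW EXECUTE PROCEDURE person_cleanup();

If more than one account should get the automated behaviour, testing membership with pg_has_role(current_user, 'importgroup', 'member') avoids hard-coding a single role name in the trigger.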
Tarlika Elisabeth Schmitz wrote:
> I have a database that will be populated solely by CSV import.
> There are several CSV file formats, all denormalized.
>
> I have created interim tables which match the CSV file formats. An
> insert trigger distributes the data to the appropriate destination
> tables. The destination tables themselves have insert/update triggers
> for automated data clean-up. Any unresolvable inconsistencies are
> reported in a log table. [...]

You seem to be writing denormalized import records for the sole purpose of writing other normalized records. Have you looked into writing a programme in a relatively high-level, JDBC-friendly language which reads the CSV file, normalizes the data (the code already in your triggers), and flushes every, say, 1000 independent records? The "clean-up" and logging might also be done by the import app (all depending on what's being cleaned up and logged :) )
2011/5/15 Rob Sargent <robjsargent@gmail.com>:
> Tarlika Elisabeth Schmitz wrote:
>> I have a database that will be populated solely by CSV import.
>> There are several CSV file formats, all denormalized. [...]
>
> You seem to be writing denormalized import records for the sole purpose
> of writing other normalized records. Have you looked into writing a
> programme in a relatively high-level, JDBC-friendly language which
> reads the CSV file, normalizes the data (the code already in your
> triggers), and flushes every, say, 1000 independent records? [...]

pgloader may be useful:
http://pgloader.projects.postgresql.org

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
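For comparison, the plain-SQL baseline that pgloader builds on is a COPY into the interim table (hypothetical file and table names); pgloader's main advantage over bare COPY is that it can skip and log rejected rows instead of aborting the whole load on the first bad line:

-- Server-side file path; use psql's \copy for a file on the client machine.
COPY csv_person FROM '/data/import/person.csv' WITH CSV HEADER;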
On Sun, 15 May 2011 11:56:38 -0600
Rob Sargent <robjsargent@gmail.com> wrote:

> Tarlika Elisabeth Schmitz wrote:
>> I have a database that will be populated solely by CSV import.
>> There are several CSV file formats, all denormalized.
>>
>> I have created interim tables which match the CSV file formats. An
>> insert trigger distributes the data to the appropriate destination
>> tables. The destination tables themselves have insert/update triggers
>> for automated data clean-up. Any unresolvable inconsistencies are
>> reported in a log table.
>
> You seem to be writing denormalized import records for the sole
> purpose of writing other normalized records.

The data come in denormalized form; I have no influence on the incoming format. The DB will be populated first with historical data, then by a constant data stream.

> Have you looked into writing a programme in a relatively high-level,
> JDBC-friendly language which reads the CSV file, normalizes the data
> (the code already in your triggers), and flushes every, say, 1000
> independent records?

The historical data import is a one-off, and its performance is practically irrelevant. Even the performance of the daily import is irrelevant, because the total volume will only be 1000 records and it arrives when no-one is accessing the DB anyway. I know it's not just a performance question, but the ER model is pretty simple, with only a dozen active tables.

> The "clean-up" and logging might also be done by the import app (all
> depending on what's being cleaned up and logged :) )

The same entities keep cropping up, and they might get modified over time: "1234 J Smith" might become "1234 John Smith", or a FK reference might change because two entities were consolidated. Those amendments will have to be applied.

The only hairy bit is one piece of data that comes in textual form, from which crucial information has to be extracted and converted to integer and boolean columns, hopefully picking up misspellings etc.

--
Best Regards,
Tarlika Elisabeth Schmitz
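A sketch of the kind of clean-up being described, again with invented table and column names: a lookup table that maps known name variants to their canonical form, and a regexp-based extraction of integer and boolean columns from the free-text field:

-- Known variants -> canonical spelling, applied as amendments arrive.
UPDATE person p
SET    name = a.canonical
FROM   name_alias a
WHERE  p.name = a.variant;

-- Pull structured columns out of a free-text field: a number becomes
-- an integer, a keyword match becomes a boolean (the pattern 'activ'
-- is deliberately loose, to catch some misspellings).
UPDATE person
SET    rating    = NULLIF(substring(details from E'(\\d+)'), '')::int,
       is_active = details ~* 'activ'
WHERE  details IS NOT NULL;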
On Sun, 15 May 2011 20:05:04 +0200
Cédric Villemain <cedric.villemain.debian@gmail.com> wrote:

> 2011/5/15 Rob Sargent <robjsargent@gmail.com>:
>> Tarlika Elisabeth Schmitz wrote:
>>> I have a database that will be populated solely by CSV import.
>>> There are several CSV file formats, all denormalized. [...]
>
> pgloader may be useful:
> http://pgloader.projects.postgresql.org

This looks very promising!