Thread: Role for CSV import
I have a database that will be populated solely by CSV import. There are several CSV file formats, all denormalized.

I have created interim tables which match the CSV file formats. An insert trigger distributes the data to the appropriate destination tables. The destination tables themselves have insert/update triggers for automated data clean-up. Any unresolvable inconsistencies are reported in a log table.

I don't want the triggers to fire for every insert/update: there might be situations where I have to perform some data clean-up manually.

So, my idea is to create a role for import, query current_user in the trigger, perform the trigger actions for importuser, and just return the row unadulterated for adminuser. I would grant importuser privileges on the tables being explicitly and implicitly populated.

Is that the best way to organize this?

=====
setup: PostgreSQL 8.4
dbname = schema name = admin name

--
Best Regards,
Tarlika Elisabeth Schmitz
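A minimal sketch of the idea being proposed, with hypothetical names throughout (importuser, an interim table csv_person, a destination table person); the real clean-up logic is only hinted at:

CREATE ROLE importuser LOGIN PASSWORD 'secret';

-- Privileges on the explicitly populated interim table and on the
-- implicitly populated destination table.
GRANT SELECT, INSERT ON csv_person TO importuser;
GRANT SELECT, INSERT, UPDATE ON person TO importuser;

-- Clean-up trigger that only acts for the import role; any other user
-- (e.g. adminuser doing manual fixes) gets the row back unadulterated.
CREATE OR REPLACE FUNCTION person_cleanup() RETURNS trigger AS $$
BEGIN
    IF current_user <> 'importuser' THEN
        RETURN NEW;                  -- skip the automated clean-up
    END IF;
    NEW.name := btrim(NEW.name);     -- placeholder clean-up step
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER person_cleanup_trg
    BEFORE INSERT OR UPDATE ON person
    FOR EACH ROW EXECUTE PROCEDURE person_cleanup();

If more than one account should get the automated behaviour, testing membership with pg_has_role(current_user, 'importgroup', 'member') avoids hard-coding a single role name in the trigger.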
Tarlika Elisabeth Schmitz wrote:
> I have a database that will be populated solely by CSV import.
> There are several CSV file formats, all denormalized.
>
> I have created interim tables which match the CSV file formats. An
> insert trigger distributes the data to the appropriate destination
> tables. The destination tables themselves have insert/update triggers
> for automated data clean-up. Any unresolvable inconsistencies are
> reported in a log table. [...]

You seem to be writing denormalized import records for the sole purpose of writing other normalized records. Have you looked into writing a programme in a relatively high-level, JDBC-friendly language which reads the CSV file, normalizes the data (the code already in your triggers), and flushes every, say, 1000 independent records? The "clean-up" and logging might also be done by the import app (all depending on what's being cleaned up and logged :) )
2011/5/15 Rob Sargent <robjsargent@gmail.com>:
> Tarlika Elisabeth Schmitz wrote:
>> I have a database that will be populated solely by CSV import.
>> There are several CSV file formats, all denormalized. [...]
>
> You seem to be writing denormalized import records for the sole purpose
> of writing other normalized records. Have you looked into writing a
> programme in a relatively high-level, JDBC-friendly language which
> reads the CSV file, normalizes the data (the code already in your
> triggers), and flushes every, say, 1000 independent records? [...]

pgloader may be useful:
http://pgloader.projects.postgresql.org

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
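For comparison, the plain-SQL baseline that pgloader builds on is a COPY into the interim table (hypothetical file and table names); pgloader's main advantage over bare COPY is that it can skip and log rejected rows instead of aborting the whole load on the first bad line:

-- Server-side file path; use psql's \copy for a file on the client machine.
COPY csv_person FROM '/data/import/person.csv' WITH CSV HEADER;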
On Sun, 15 May 2011 11:56:38 -0600
Rob Sargent <robjsargent@gmail.com> wrote:

> Tarlika Elisabeth Schmitz wrote:
>> I have a database that will be populated solely by CSV import.
>> There are several CSV file formats, all denormalized.
>>
>> I have created interim tables which match the CSV file formats. An
>> insert trigger distributes the data to the appropriate destination
>> tables. The destination tables themselves have insert/update triggers
>> for automated data clean-up. Any unresolvable inconsistencies are
>> reported in a log table.
>
> You seem to be writing denormalized import records for the sole
> purpose of writing other normalized records.

The data come in denormalized form; I have no influence on the incoming format. The DB will be populated first with historical data, then by a constant data stream.

> Have you looked into writing a programme in a relatively high-level,
> JDBC-friendly language which reads the CSV file, normalizes the data
> (the code already in your triggers), and flushes every, say, 1000
> independent records?

The historical data import is a one-off, and its performance is practically irrelevant. Even the performance of the daily import is irrelevant, because the total volume will only be 1000 records and it arrives when no-one is accessing the DB anyway. I know it's not just a performance question, but the ER model is pretty simple, with only a dozen active tables.

> The "clean-up" and logging might also be done by the import app (all
> depending on what's being cleaned up and logged :) )

The same entities keep cropping up, and they might get modified over time: "1234 J Smith" might become "1234 John Smith", or a FK reference might change because two entities were consolidated. Those amendments will have to be applied.

The only hairy bit is one piece of data that comes in textual form, from which crucial information has to be extracted and converted to integer and boolean columns, hopefully picking up misspellings etc.

--
Best Regards,
Tarlika Elisabeth Schmitz
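A sketch of the kind of clean-up being described, again with invented table and column names: a lookup table that maps known name variants to their canonical form, and a regexp-based extraction of integer and boolean columns from the free-text field:

-- Known variants -> canonical spelling, applied as amendments arrive.
UPDATE person p
SET    name = a.canonical
FROM   name_alias a
WHERE  p.name = a.variant;

-- Pull structured columns out of a free-text field: a number becomes
-- an integer, a keyword match becomes a boolean (the pattern 'activ'
-- is deliberately loose, to catch some misspellings).
UPDATE person
SET    rating    = NULLIF(substring(details from E'(\\d+)'), '')::int,
       is_active = details ~* 'activ'
WHERE  details IS NOT NULL;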
On Sun, 15 May 2011 20:05:04 +0200
Cédric Villemain <cedric.villemain.debian@gmail.com> wrote:

> 2011/5/15 Rob Sargent <robjsargent@gmail.com>:
>> Tarlika Elisabeth Schmitz wrote:
>>> I have a database that will be populated solely by CSV import.
>>> There are several CSV file formats, all denormalized. [...]
>
> pgloader may be useful:
> http://pgloader.projects.postgresql.org

This looks very promising!