On Sun, 15 May 2011 11:56:38 -0600
Rob Sargent <robjsargent@gmail.com> wrote:
>Tarlika Elisabeth Schmitz wrote:
>> I have a database that will be populated solely by CSV import.
>> There are several CSV file formats, all denormalized.
>>
>> I have created interim tables which match the CSV file formats. An
>> insert trigger distributes the data to their appropriate destination
>> tables. The destination tables themselves have insert/update triggers
>> for automated data clean-up. Any unresolvable inconsistencies are
>> reported in a log table.
>>
>You seem to be writing denormalized import records for the sole
>purpose of writing other normalized records.
The data come in denormalized form. I have no influence on the incoming
format. The DB will be populated first with historical data, then by a
constant data stream.
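
To make the set-up concrete: each interim table mirrors one CSV
layout, and its insert trigger just routes the columns on, roughly
like this (table and column names are invented for illustration, not
my real schema):

CREATE TABLE csv_format_a (        -- mirrors one of the CSV layouts
    entity_id   integer,
    entity_name text,
    event_date  date,
    event_info  text
);

CREATE OR REPLACE FUNCTION csv_format_a_distribute() RETURNS trigger AS $$
BEGIN
    -- route the denormalized columns to their destination tables;
    -- those tables' own insert/update triggers take care of
    -- recurring entities, clean-up and logging
    INSERT INTO entity (id, name)
    VALUES (NEW.entity_id, NEW.entity_name);
    INSERT INTO event (entity_id, event_date, info)
    VALUES (NEW.entity_id, NEW.event_date, NEW.event_info);
    RETURN NULL;   -- discard the raw row; RETURN NEW would keep it
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER csv_format_a_ins BEFORE INSERT ON csv_format_a
    FOR EACH ROW EXECUTE PROCEDURE csv_format_a_distribute();
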
>Have you looked into
>writing a programme in a relatively high-level jdbc-friendly language
>which reads the csv file, normalizes the data (the code already in
>your triggers) and flushes every, say, 1000 independent records?
The historical data import is a one-off, so import performance is
practically irrelevant. Even the performance of the daily import is
irrelevant: the volume is only 1000 records a day, loaded at a time
when no-one is accessing the DB anyway.
I know it's not just a performance question, but the ER model is
pretty simple, with only a dozen active tables.
>"clean-up" and logging might also be done by the import app (all
>depending on what's being cleaned up and logged).
The same entities keep cropping up and may be modified over time:
"1234 J Smith" might become "1234 John Smith", or a FK reference
might change because two entities were consolidated. Those
amendments have to be applied.
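
In trigger terms that amounts to a BEFORE INSERT trigger on the
destination table which turns the re-import of a known entity into an
update, roughly (names as invented in the sketch above):

CREATE OR REPLACE FUNCTION entity_amend() RETURNS trigger AS $$
BEGIN
    -- "1234 J Smith" arriving again as "1234 John Smith":
    -- amend the existing row instead of inserting a duplicate
    UPDATE entity SET name = NEW.name WHERE id = NEW.id;
    IF FOUND THEN
        RETURN NULL;   -- row already existed and has been amended
    END IF;
    RETURN NEW;        -- genuinely new entity: let the INSERT proceed
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER entity_ins BEFORE INSERT ON entity
    FOR EACH ROW EXECUTE PROCEDURE entity_amend();
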
The only hairy bit is one piece of data that comes in textual form,
from which crucial information has to be extracted and converted
into integer and boolean columns, hopefully catching misspellings
etc. along the way.
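
That will probably boil down to a small parsing function called from
the destination table's trigger, something like this (the extracted
fields and the spelling variants are made-up examples, just to show
the shape):

-- sketch only: pull an integer and a boolean out of a free-text
-- column, tolerating a couple of known misspellings
CREATE OR REPLACE FUNCTION parse_event_info(info text,
                                            OUT qty integer,
                                            OUT flag boolean) AS $$
BEGIN
    qty  := substring(info from '([0-9]+)')::integer;
    flag := CASE
              WHEN lower(info) ~ '(approved|aproved)' THEN true
              WHEN lower(info) ~ '(rejected|rejeted)' THEN false
              ELSE NULL   -- nothing recognizable
            END;
END;
$$ LANGUAGE plpgsql;

The trigger calls it with the text column and writes a row to the log
table whenever nothing usable comes back.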
--
Best Regards,
Tarlika Elisabeth Schmitz