Thread: ...

...

From
Martin Neumann
Date:
Every day I get a plain-vanilla ASCII file containing space-separated
values. I parse the file with a script and turn each row in the file into
an INSERT statement. Sometimes one of these many statements contains
rubbish because the line in the file I was processing contained rubbish.

If I did a separate transaction for every INSERT statement this wouldn't
hurt, because only the one broken statement doesn't get processed.

But I have to do about 100,000 INSERTs every day, and this is _slow_ if I
let PostgreSQL process each statement individually.

So my idea was to use a chained transaction. This works perfectly as long
as all INSERT statements are okay. But if one is broken, PostgreSQL
doesn't process the other statements either.

My goal is to use a chained transaction and not have to bother about
broken INSERT statements. Is that possible, or do I have to check the
statements manually (in the script) before executing?

-- 
`Our head is round so that thinking can change direction.' ~ Francis Picabia


Re: your mail

From
Zalman Stern
Date:
I would either validate the data in the script, or batch the records into
bunches of 100 or 1000 inserts. If one of the batches fails, reprocess
only that batch using individual inserts.

I believe using transactions of moderate size will be faster than using
really really huge ones. There are of course also transactional semantics
issues if you have concurrent access going on.
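
a rough sketch of the batch-then-retry idea in Python (psycopg2 and the
table/file names here are made up; adjust them to your schema):

import psycopg2

conn = psycopg2.connect("dbname=test")   # hypothetical connection string
cur = conn.cursor()

def insert_batch(rows):
    # one transaction per batch; any bad row aborts the whole batch
    for row in rows:
        cur.execute("INSERT INTO tablein (a, b) VALUES (%s, %s)", row)
    conn.commit()

rows = [line.split() for line in open("data.txt")]

SIZE = 1000
for i in range(0, len(rows), SIZE):
    batch = rows[i:i + SIZE]
    try:
        insert_batch(batch)
    except Exception:
        conn.rollback()              # batch is gone; redo it row by row
        for row in batch:
            try:
                insert_batch([row])  # one-row batch = one transaction
            except Exception:
                conn.rollback()      # this is the rubbish line; skip it

that way the common case runs at ~1000 inserts per commit, and you only
pay the per-statement price for the one batch that actually contains a
bad line.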

-Z-


Re: none

From
tolik@aaanet.ru (Anatoly K. Lasareff)
Date:
>>>>> "MN" == Martin Neumann <lists@mne.de> writes:
MN> Every day I get a plain-vanilla ASCII file containing space-separated
MN> values. I parse the file with a script and turn each row in the file
MN> into an INSERT statement. Sometimes one of these many statements
MN> contains rubbish because the line in the file I was processing
MN> contained rubbish.

MN> If I did a separate transaction for every INSERT statement this
MN> wouldn't hurt, because only the one broken statement doesn't get
MN> processed.

MN> But I have to do about 100,000 INSERTs every day, and this is _slow_
MN> if I let PostgreSQL process each statement individually.

MN> So my idea was to use a chained transaction. This works perfectly as
MN> long as all INSERT statements are okay. But if one is broken,
MN> PostgreSQL doesn't process the other statements either.

MN> My goal is to use a chained transaction and not have to bother about
MN> broken INSERT statements. Is that possible, or do I have to check the
MN> statements manually (in the script) before executing?

I think your second way is better: check each statement in the script
before executing.
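
a rough sketch of that check in Python (the two-integer-column format
here is made up; adjust it to your real data):

def valid(fields):
    # expect exactly two integer fields per line
    if len(fields) != 2:
        return False
    try:
        int(fields[0])
        int(fields[1])
    except ValueError:
        return False
    return True

good, bad = [], []
for line in open("data.txt"):
    fields = line.split()
    (good if valid(fields) else bad).append(fields)

# feed "good" to one big transaction; write "bad" to a reject file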

-- 
Anatoly K. Lasareff              Email:       tolik@icomm.ru 


Re:

From
Clayton Cottingham
Date:
martin:
a couple of things:

use something like sed to preprocess the file, taking out the erroneous
data, if you know what the bad data looks like

one thing I use sed for is to escape single quotes and the like;
it should work well for you

the regex for substituting like that is:
s/old/new/g

the sed command takes regexes with an -e switch:
sed -e 's/error/correction/g' errorfile > correctedfile

this takes errorfile in, changes every occurrence of 'error' to
'correction', and prints all the data to correctedfile


also, if it's a flat file you could just use the copy command in psql
instead of INSERTs:

\copy tablein from filename
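
one caveat: copy expects tab-delimited data by default, so for
space-separated values you'd want something like (exact option syntax
depends on your PostgreSQL version):

\copy tablein from filename with delimiter ' '

and note that copy, like one big transaction, aborts the whole load on
the first bad line, so the file still has to be cleaned up first.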