Thread: ...
Every day I get a plain-vanilla ASCII file containing space-separated values. I parse the file with a script and turn each row of the file into an INSERT statement. Sometimes one of these many statements contains rubbish because the line in the file I was processing contained rubbish.

If I did a separate transaction for every INSERT statement, this wouldn't hurt, because only one statement (the broken one) wouldn't get processed. But I have to do about 100,000 INSERTs every day, and this is _slow_ if I let PostgreSQL process each statement individually.

So my idea was to use a chained transaction. This works perfectly as long as all INSERT statements are okay. But if one is broken, PostgreSQL doesn't process the other statements either.

My goal is to use chained transactions and not have to bother about broken INSERT statements. Is that possible, or do I have to check the statements manually (in the script) before executing?

-- `Our head is round so that thinking can change direction.' ~ Francis Picabia
I would either validate the data in the script, or batch the records into bunches of 100 or 1,000 or whatever inserts. If one of the batches fails, reprocess only that batch using individual inserts. I believe transactions of moderate size will be faster than really huge ones. There are of course also transactional-semantics issues if you have concurrent access going on. -Z-
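For what it's worth, here is a minimal sketch of that batch-then-retry idea in Python with psycopg2. The table name, column list, and batch size are assumptions for illustration, not anything from Martin's setup:

    import psycopg2

    BATCH_SIZE = 1000  # moderate batches; tune to taste

    def insert_rows(conn, rows):
        """Insert rows in batches; retry a failed batch row by row."""
        cur = conn.cursor()
        for start in range(0, len(rows), BATCH_SIZE):
            batch = rows[start:start + BATCH_SIZE]
            try:
                for row in batch:
                    cur.execute(
                        "INSERT INTO mytable (a, b, c) VALUES (%s, %s, %s)",
                        row)
                conn.commit()  # one transaction per batch
            except psycopg2.Error:
                conn.rollback()
                # reprocess only this batch with one transaction per
                # INSERT, so just the rubbish rows are skipped
                for row in batch:
                    try:
                        cur.execute(
                            "INSERT INTO mytable (a, b, c) VALUES (%s, %s, %s)",
                            row)
                        conn.commit()
                    except psycopg2.Error:
                        conn.rollback()

When all batches are clean this commits 100 times instead of 100,000 times; only a batch containing a broken row pays the per-row price.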
>>>>> "MN" == Martin Neumann <lists@mne.de> writes: MN> Every day I get a plain-vanilla ASCII-file containg space-separatedMN> values. I parse the file with a script and makeeach row in the file anMN> INSERT-statement. Sometimes one of these many statements containsMN> rubbish becaused theline in the file I was processing containedMN> rubbish. MN> If I did a separate transaction for every INSERT-statement this doesn'tMN> hurt because only one statement (the brokenone) doesn't get processed. MN> But I have to do about 100.000 INSERTs everyday and this is _slow_ if IMN> let PostgreSQL process each statement individually. MN> So my idea was to use a chained transaction. This works perfectly asMN> long as all INSERT-statement are okay. But ifone is broken, PostgreSQLMN> doesn't process the other statements, too. MN> My goal is to use chained transaction and to don't have to botherMN> about broken INSERT-statements. Is that possibleor to do I have toMN> check the statements manually (by the script) before executing? I think your second way is better - check statement in script before executing. -- Anatoly K. Lasareff Email: tolik@icomm.ru
martin: a couple of things.

Use something like sed to preprocess the file, taking out the erroneous data, if you know what the bad data is. One thing I use sed for is to HTML-escape single quotes and the like; it should work well for you. The regex for substituting like that is s/old/new/g, and sed takes regexes with the -e switch:

    sed -e 's/error/correction/g' errorfile > correctedfile

This takes errorfile in, changes any occurrence of 'error' to 'correction', and prints all the data to correctedfile.

Also, since it's a flat file, you could just use the COPY command in psql instead of INSERTs:

    \copy tablein from filename
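Combining the two suggestions in Python rather than sed: filter the rubbish lines out in the script, then load the result with COPY, which is far faster than 100,000 individual INSERTs. The connection string, file name, table name, and the line_is_valid check (from the sketch above) are all assumptions; psycopg2's copy_from streams the data through COPY FROM STDIN, the same mechanism psql's \copy uses:

    import io
    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # placeholder connection string
    cur = conn.cursor()

    # build an in-memory copy of the file with the rubbish lines dropped;
    # COPY aborts on the first bad row, so filtering first is essential
    buf = io.StringIO()
    with open("daily.txt") as f:            # placeholder file name
        for line in f:
            if line_is_valid(line):         # same check as the sketch above
                buf.write(line)
    buf.seek(0)

    # one COPY loads everything in a single transaction
    cur.copy_from(buf, "mytable", sep=" ")
    conn.commit()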