pqxx accumulating a transaction - Mailing list pgsql-general
From | Jeff Abrahamson |
---|---|
Subject | pqxx accumulating a transaction |
Date | |
Msg-id | 104f5ce7-823d-69dd-df32-fbbc1aace857@p27.eu Whole thread Raw |
List | pgsql-general |
I am accumulating a transaction using libpqxx and suspect I'm doing something overly tortuous to get where I'm going.
I don't mean to make this question longer than it needs to be, but I think context will help. So I'll start by explaining briefly how I got to where I am. And then I'll show where I am, which I'm pretty sure is abusive and not how libpqxx means for me to do things.
Intro: the naive approach to doing thousands of inserts
I have some code that wants to write a bunch of stuff to the database, typically INSERT ... ON CONFLICT UPDATE
. Roughly, this code is looping over some container and generating the needed SQL to insert each container object that needs inserting.
The naive way to do this (skipping error/exception handling for the moment) is thus:
pqxx::work txn(conn);
ostringstream sql_stmt;
Container rows;
for (const auto& row : rows) {
if (row.IsDirty()) {
RowToSQL(sql_stmt, row, txn);
txn.exec(sql_stmt.str());
// Clear sql_stmt here.
}
}
txn.commit();
The function RowToSQL() takes a transaction object so that it can quote strings appropriately using txn.quote().
This is inefficient, however: calling exec() over and over turns out to be quite slow.
So instead I build up a bunch of statements in the ostringstream, thus:
pqxx::work txn(conn);
ostringstream sql_stmt;
Container rows;
for (const auto& row : rows) {
if (row.IsDirty()) {
RowToSQL(sql_stmt, row, txn);
if (++counter > kExecuteThreshold || sql_stmt.IsGettingRatherBig()) {
txn.exec(sql_stmt.str());
// Clear sql_stmt, reset counter here.
}
if (count > kCommitThreshold) {
txn.commit();
// Reset statement counter here.
}
}
}
// Final commit here.
I chose the two thresholds based on performance testing in our environment. Think 100 and 10,000 for order of magnitude.
This worked until it didn't, because re-using the transaction this way leads to conflicts.
Attempt to activate transaction<READ COMMITTED> which is already closed.
I mostly understand all this.
This leads me to write something that has weird code smell, and I so suspect I've misunderstood how postgresql / libpqxx intend to be used.
Adding back error/exception handling, I've now written this based on lots of doc reading:
Less naive but weird code smell
// Has weird code smell.
pqxx::nontransaction non_txn;
ostringstream sql_stmt;
Container rows;
Vector<string> transactions;
for (const auto& row : rows) {
if (row.IsDirty()) {
RowToSQL(sql_stmt, row, non_txn);
if (++counter > kExecuteThreshold || sql_stmt.IsGettingRatherBig()) {
transactions.push_back(sql_stmt.str());
// Clear sql_stmt, reset counter here.
}
if (count > kCommitThreshold) {
try {
pqxx::work txn(conn);
for (const string& trans : transactions) {
txn.exec(trans);
}
txn.commit();
transactions.clear();
// Reset statement counter here.
} catch (const exception& e) {
txn.abort(); // Pedantic, happens on destruction.
YellowAlert(); // Something appropriate.
}
}
}
}
// Final commit here.
It seems quite wrong to me that I should build up this vector of things to execute and maintain these custom-tuned execution/transaction thresholds rather than using some facility of libpqxx. That is, this pattern seems common enough to me that the complexity I'm starting to see strikes me as my own misunderstanding.
Any pointers much appreciated.
-- Jeff Abrahamson +33 6 24 40 01 57 +44 7920 594 255 http://p27.eu/jeff/ http://mobilitains.fr/
pgsql-general by date: