Thread: OIDs, CTIDs, updateable cursors and friends
Hi all, I'm in the process of writing an OLE DB provider for postgres. I am, right now, at the point where updating an entry becomes an issue. Ideally, I would open an updateable cursor for a table/view, and use that. Unfortunetly, Postgres doesn't seem to support those at all. As an alternative, I was thinking of using some unique field to identify, with certanity, the line in question. This will allow me to use an "update" command to modify said line, in case there is a need with the OLE DB client. My quetion is this - what can I use? I have tried to find some docs regarding OIDs and CTIDs, but the docs seem scarce. If I understand this correctly - OID is meant to identify a row in a table, though it is not guarenteed to be unique across tables, or even inside a given table. A CTID is meant to identify the physical location at which a row is stored. Are these correct? Would adding "OID" to the rows returned by each "Select" call, and then doing "update blah where oid=xxx" when I'm requested to update the row sound like a reasonable stategy, in lieu of updateable cursors? Can anyone suggest a better way? Shachar -- Shachar Shemesh Lingnu Open Systems Consulting http://www.lingnu.com/
> -----Original Message----- > From: Shachar Shemesh [mailto:psql@shemesh.biz] > Sent: 18 February 2004 13:18 > To: Hackers; PostgreSQL OLE DB development > Subject: [HACKERS] OIDs, CTIDs, updateable cursors and friends > > Would adding "OID" to the rows returned by each "Select" > call, and then doing "update blah where oid=xxx" when I'm > requested to update the row sound like a reasonable stategy, > in lieu of updateable cursors? Can anyone suggest a better way? > Ignoring potential OID wraparound problems (which we do in pgAdmin) this should work, assuming there is an OID column. I would suggest trying the following methods in sequence: 1) Use the tables primary key. 2) Use the OID (and check that only one record will be affected). 3) Build a where clause based on all known original values (and check that only one record will be affected). 4) Fail with an appropriate error. 2 & 3 can potentially affect more than one record, but even Microsoft code runs into that problem from time to time and fails with an appropriate error message. In pgAdmin II we used to ask the user if they wanted to update all matching rows, but of course that is not appropriate in a driver. Regards, Dave.
Dave Page wrote: > > > > >>-----Original Message----- >>From: Shachar Shemesh [mailto:psql@shemesh.biz] >>Sent: 18 February 2004 13:18 >>To: Hackers; PostgreSQL OLE DB development >>Subject: [HACKERS] OIDs, CTIDs, updateable cursors and friends >> >>Would adding "OID" to the rows returned by each "Select" >>call, and then doing "update blah where oid=xxx" when I'm >>requested to update the row sound like a reasonable stategy, >>in lieu of updateable cursors? Can anyone suggest a better way? >> >> >> > >Ignoring potential OID wraparound problems (which we do in pgAdmin) this >should work, assuming there is an OID column. I would suggest trying the >following methods in sequence: > >1) Use the tables primary key. > > I would, except I'm not sure how many queries I would need in order to find what the primary key is. Also, what happens if the primary key is not a part of the fields returned by the query? >2) Use the OID (and check that only one record will be affected). > > That may work. Do a query for "how many would be affected". Then again, I'm currently not inside a transaction. The plan was not to be inside a transaction unless I needed to. I'm not sure how safe this is to perform many queries. >3) Build a where clause based on all known original values (and check >that only one record will be affected). > > Again - what happens when I'm not inside a transaction? >4) Fail with an appropriate error. > >2 & 3 can potentially affect more than one record, but even Microsoft >code runs into that problem from time to time and fails with an >appropriate error message. In pgAdmin II we used to ask the user if they >wanted to update all matching rows, but of course that is not >appropriate in a driver. > >Regards, Dave. > > The doc mentions something about making the OID column unique. Would that not cause other problems? What happens if I define the OID field as unique, and I get a wraparound and an attempt to put a new field in with existing value? Would the OID skip to the next unique per table, or would the insert fail? Shachar -- Shachar Shemesh Lingnu Open Systems Consulting http://www.lingnu.com/
> -----Original Message----- > From: Shachar Shemesh [mailto:psql@shemesh.biz] > Sent: 18 February 2004 14:10 > To: Dave Page > Cc: Hackers; PostgreSQL OLE DB development > Subject: Re: [HACKERS] OIDs, CTIDs, updateable cursors and friends > > > I would, except I'm not sure how many queries I would need in > order to find what the primary key is. Well this is the only safe way to update a specific record. To find the pkey, look for an index on the table in pg_index with indisprimary = true. The indkey column holds an array of pg_attribute.attnum's that are in the index iirc. > Also, what happens if > the primary key is not a part of the fields returned by the query? Add them as you proprosed to do with the OID, or fall back to the next method. ADO etc. normally fail to update rows if the programmer hasn't included a suitable key in the recordset. > >2) Use the OID (and check that only one record will be affected). > > > > > That may work. Do a query for "how many would be affected". > Then again, I'm currently not inside a transaction. The plan > was not to be inside a transaction unless I needed to. I'm > not sure how safe this is to perform many queries. Should be perfectly safe. > >3) Build a where clause based on all known original values > (and check > >that only one record will be affected). > > > > > Again - what happens when I'm not inside a transaction? You might find a new row that wasn;t there before but is now, or vice versa. > The doc mentions something about making the OID column > unique. Would that not cause other problems? What happens if > I define the OID field as unique, and I get a wraparound and > an attempt to put a new field in with existing value? Would > the OID skip to the next unique per table, or would the insert fail? It is not the drivers place to mess with peoples schemas, but yes, it could cause an insert to fail following wraparound. Regards, Dave.
Dave Page wrote: > > > > >>-----Original Message----- >>From: Shachar Shemesh [mailto:psql@shemesh.biz] >>Sent: 18 February 2004 14:10 >>To: Dave Page >>Cc: Hackers; PostgreSQL OLE DB development >>Subject: Re: [HACKERS] OIDs, CTIDs, updateable cursors and friends >> >> >>I would, except I'm not sure how many queries I would need in >>order to find what the primary key is. >> >> > >Well this is the only safe way to update a specific record. To find the >pkey, look for an index on the table in pg_index with indisprimary = >true. The indkey column holds an array of pg_attribute.attnum's that are >in the index iirc. > > > I'll have a look at that. How would updateable cursors do it? By locking the row? >>Also, what happens if >>the primary key is not a part of the fields returned by the query? >> >> > >Add them as you proprosed to do with the OID, or fall back to the next >method. ADO etc. normally fail to update rows if the programmer hasn't >included a suitable key in the recordset. > > So, basically, I would not be able to update a table that has no primary key? > > >>>2) Use the OID (and check that only one record will be affected). >>> >>> >>> >>> >>That may work. Do a query for "how many would be affected". >>Then again, I'm currently not inside a transaction. The plan >>was not to be inside a transaction unless I needed to. I'm >>not sure how safe this is to perform many queries. >> >> > >Should be perfectly safe. > > > What happens if I check how many would be updated, and get "1" as a result. I then actually do it, but between asking and performing, someone added a second row that matches the criteria? >>>3) Build a where clause based on all known original values >>> >>> >>(and check >> >> >>>that only one record will be affected). >>> >>> >>> >>> >>Again - what happens when I'm not inside a transaction? >> >> > >You might find a new row that wasn;t there before but is now, or vice >versa. > > > But what if someone else changes some of the known values of my row? >>The doc mentions something about making the OID column >>unique. Would that not cause other problems? What happens if >>I define the OID field as unique, and I get a wraparound and >>an attempt to put a new field in with existing value? Would >>the OID skip to the next unique per table, or would the insert fail? >> >> > >It is not the drivers place to mess with peoples schemas, but yes, it >could cause an insert to fail following wraparound. > > > Then it's not a good enough solution, even if the driver did have the prorogative to change the table. >Regards, Dave. > > Ok, it seems to me there are several options here. 1. Find out which is the primary key for the table. What happens if the primary key is a multi-row thing? What happens if there is no primary key? 2. If I'm in a transaction, use OID for the insert after checking with a select that I'm only affecting one row. If I'm not in a transaction - perform the update in a generated transaction, and roll it back if there is more than one row affected. I like 1 better, frankly. Dillemas dillemas dillemas. -- Shachar Shemesh Lingnu Open Systems Consulting http://www.lingnu.com/
> -----Original Message----- > From: Shachar Shemesh [mailto:psql@shemesh.biz] > Sent: 18 February 2004 14:56 > To: Dave Page > Cc: Hackers; PostgreSQL OLE DB development > Subject: Re: [HACKERS] OIDs, CTIDs, updateable cursors and friends > > I'll have a look at that. How would updateable cursors do it? > By locking the row? Dunno, we don't have them! > So, basically, I would not be able to update a table that has > no primary key? Yes, unless you feel back to the value matching type update. Realistically though, how can anyone expect to edit data successfully unless they have defined a key to identify rows with? Whilst it's nice to get it to work 100% of the time no matter how brain dead the schema it's not that practical. > What happens if I check how many would be updated, and get > "1" as a result. I then actually do it, but between asking > and performing, someone added a second row that matches the criteria? It'll update both rows unless you do it in one transaction. > But what if someone else changes some of the known values of my row? The update will fail to find any rows. This is almost certainly what happens when MS Access starts pinting #DELETED# in rows of a linked table. > Ok, it seems to me there are several options here. > 1. Find out which is the primary key for the table. What > happens if the primary key is a multi-row thing? What happens > if there is no primary key? I guess you mean multicolumn? No different, you just need all columns in your WHERE clause. If there is no pkey (and I would be inclined to say if there is none in the user's query and not try to add it yourself) then you fail with an error. > 2. If I'm in a transaction, use OID for the insert after > checking with a select that I'm only affecting one row. If > I'm not in a transaction - perform the update in a generated > transaction, and roll it back if there is more than one row affected. > > I like 1 better, frankly. Dillemas dillemas dillemas. 1 is definitely better and is the only way that is guaranteed to be safe. Thinking about it more, that is almost certainly the position a driver should take. In pgAdmin we can afford a little artistic licence (no pun intended) because no one will be using pgAdmin as a driver to connect another program to a database, plus we can ask the user what action to take if we don't know if the result will be exactly what was intended. You do not have that luxury in a driver of course. Regards, Dave
> > Ok, it seems to me there are several options here. > > 1. Find out which is the primary key for the > table. What > > happens if the primary key is a multi-row thing? > What happens > > if there is no primary key? > > I guess you mean multicolumn? No different, you just > need all columns in > your WHERE clause. If there is no pkey (and I would > be inclined to say > if there is none in the user's query and not try to > add it yourself) > then you fail with an error. > > > 2. If I'm in a transaction, use OID for the insert > after > > checking with a select that I'm only affecting one > row. If > > I'm not in a transaction - perform the update in a > generated > > transaction, and roll it back if there is more > than one row affected. > > > > I like 1 better, frankly. Dillemas dillemas > dillemas. > > 1 is definitely better and is the only way that is > guaranteed to be > safe. Thinking about it more, that is almost > certainly the position a > driver should take. In pgAdmin we can afford a > little artistic licence > (no pun intended) because no one will be using > pgAdmin as a driver to > connect another program to a database, plus we can > ask the user what > action to take if we don't know if the result will > be exactly what was > intended. You do not have that luxury in a driver of > course. Just as another datapoint, pgaccess does the same thing (finding the pkey(s), and using those to uniquely identify the row). It is kind of a PITA, but as Dave says, it's the best way to do this. HTH, --brett __________________________________ Do you Yahoo!? Yahoo! Mail SpamGuard - Read only the mail you want. http://antispam.yahoo.com/tools
"Dave Page" <dpage@vale-housing.co.uk> writes: > Ignoring potential OID wraparound problems (which we do in pgAdmin) this > should work, assuming there is an OID column. I would suggest trying the > following methods in sequence: > 1) Use the tables primary key. > 2) Use the OID (and check that only one record will be affected). > 3) Build a where clause based on all known original values (and check > that only one record will be affected). > 4) Fail with an appropriate error. I don't think it's a good idea for driver-level code to depend on OIDs for this; to do that you need the knowledge and cooperation of the database designer. The OID column may not exist at all (CREATE TABLE ... WITHOUT OIDS). If it does exist, it's not guaranteed to be unique unless someone put a unique index on it (and I agree with Dave that a driver has no business installing such an index). Furthermore, if there's not an index on OID then an update specifying "WHERE oid = nnn" is going to be very slow because it will have to seqscan the whole table. I believe the ODBC driver uses CTID for this sort of problem. CTID is guaranteed to exist and to be fast to access (since it's a physical locator). Against this you have the problem that concurrent updates of the record will move it, leaving your CTID invalid. However, that could be a good thing, as it's debatable that you want to blindly apply your update in such a case anyway. If you are willing to hold open a transaction while the user edits the record, you can lock the record with SELECT FOR UPDATE, and then your CTID is guaranteed good for the duration of the transaction. If you don't want to do that, I'd suggest reading both CTID and XMIN when you initially read the tuple. When you are ready to commit changes, do this: BEGIN; SELECT xmin FROM table WHERE ctid = whatever FOR UPDATE; -- check that you get a record and its xmin matches -- what you had; if so, you can go ahead and do UPDATE table SET ... WHERE ctid = whatever; COMMIT; If some other client changed the record behind your back, you'll know it because the xmin changes, and you can then do whatever seems best to cope. regards, tom lane
> I believe the ODBC driver uses CTID for this sort of problem. CTID is > guaranteed to exist and to be fast to access (since it's a physical > locator). Against this you have the problem that concurrent updates > of the record will move it, leaving your CTID invalid. However, that IIRC the ctid access follows the chain up to the currently valid tuple ? I thought the only enemy of ctid access was "vacuum" ? Andreas
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes: > IIRC the ctid access follows the chain up to the currently valid > tuple ? No. I think Hiroshi or someone put in a function you can use to follow the chain, but a simple "WHERE ctid = whatever" won't do it. In any case, if you're not holding an open transaction then you have to be prepared to have the dead tuple vacuumed out from under you, in which case you'd not be able to follow the chain anyway. regards, tom lane
Shachar Shemesh <psql@shemesh.biz> writes: > Would adding "OID" to the rows returned by each "Select" call, and then doing > "update blah where oid=xxx" when I'm requested to update the row sound like a > reasonable stategy, in lieu of updateable cursors? Can anyone suggest a better > way? If you're in control of the database schema and can ensure that all tables will have OIDs enabled and you can add a unique index on OID on all these tables then yes. But it's not ideal. If OID wraps around you'll get errors from unique key violations. A better strategy is to pull the primary key columns from information_schema and use those columns. This would be more work but would work on any table with a primary key. This won't work for tables without primary keys, but in that case, arguably, updating records doesn't really make sense anyways. Something like this, though I'm not really very familiar with the information_schema. db=> SELECT ordinal_position,column_name FROM information_schema.table_constraints AS a JOIN information_schema.key_column_usageAS b USING (constraint_schema,constraint_name) WHERE a.constraint_type = 'PRIMARYKEY' AND a.table_schema = 'public' AND a.table_name = 'country' ORDER BY ordinal_position;ordinal_position| column_name ------------------+-------------- 1 | country_code (1 row) -- greg