Thread: Writable foreign tables: how to identify rows
One of the core problems for a writable-foreign-tables feature is how to identify a previously-fetched row for UPDATE or DELETE actions. In an ordinary Postgres table, we use the ctid system column for that, but a remote table doesn't necessarily have such a thing. (Oracle has a "rowid" that acts a lot like our ctids, but I don't believe the concept is common in other RDBMSes.) Without any magic row identifier such as these, I suppose an FDW would need to insist on knowing the primary key for the remote table, so that it could update based on the values of the pkey column(s). The current writable-foreign-tables patch goes to great lengths to try to cater for magic row identifiers of unspecified data types; which is something I encouraged in the beginning, but now that I see the results I think the messiness-to-usefulness quotient is awfully low. Basically what it's doing is hacking the TupleDesc associated with a foreign table so that the descriptor (sometimes) includes extra columns. I don't think that's workable at all --- there are too many places that assume that relation TupleDescs match up with what's in the catalogs. I think if we're going to support magic row identifiers, they need to be actual system columns, complete with negative attnums and entries in pg_attribute. This would require FDWs to commit to the data type of a magic row identifier at foreign-table creation time, but that doesn't seem like much of a restriction: probably any one FDW would have only one possible way to handle a magic identifier. So I'm envisioning adding an FDW callback function that gets called at table creation and returns an indication of which system columns the foreign table should have, and then we actually make pg_attribute entries for those columns. For postgres_fdw, that would really be enough, since it could just cause a "ctid" column to be created with the usual definition. Then it could put the remote ctid into the usual t_self field in returned tuples. Supporting magic identifiers that aren't of the TID data type is considerably harder, mainly because it's not clear how heap_getsysattr() could know how to fetch the column value. I have some rough ideas about that, but I suggest that we might want to punt on that extension for the time being. Thoughts? regards, tom lane
Tom, > One of the core problems for a writable-foreign-tables feature is how > to identify a previously-fetched row for UPDATE or DELETE actions. > In an ordinary Postgres table, we use the ctid system column for that, > but a remote table doesn't necessarily have such a thing. (Oracle has > a "rowid" that acts a lot like our ctids, but I don't believe the > concept is common in other RDBMSes.) Without any magic row identifier > such as these, I suppose an FDW would need to insist on knowing the > primary key for the remote table, so that it could update based on the > values of the pkey column(s). Well, a good test case for this could be the various key-value stores, where the foreign row identifier (FRI) is an immutable string key of arbitrary size. If we can support that, there's a lot of datastores we can support, and even a rule of "your target must have a single-column key" would be tolerable for non-postgres FDWs for quite a while. > I think if we're going to support magic row identifiers, they need to > be actual system columns, complete with negative attnums and entries > in pg_attribute. This would require FDWs to commit to the data type > of a magic row identifier at foreign-table creation time, but that > doesn't seem like much of a restriction: probably any one FDW would > have only one possible way to handle a magic identifier. So I'm > envisioning adding an FDW callback function that gets called at table > creation and returns an indication of which system columns the foreign > table should have, and then we actually make pg_attribute entries for > those columns. Per the above, it would be good to allow the "system column" to also be a column which is exposed to the user. > Supporting magic identifiers that aren't of the TID data type is > considerably harder, mainly because it's not clear how heap_getsysattr() > could know how to fetch the column value. I have some rough ideas > about that, but I suggest that we might want to punt on that extension > for the time being. Well, if it gets pgsql_fdw in, I'm for it. I'd always assumed that writeable non-postgres targets were going to be hard. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Mar 6, 2013 at 6:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > One of the core problems for a writable-foreign-tables feature is how > to identify a previously-fetched row for UPDATE or DELETE actions. > In an ordinary Postgres table, we use the ctid system column for that, > but a remote table doesn't necessarily have such a thing. (Oracle has > a "rowid" that acts a lot like our ctids, but I don't believe the > concept is common in other RDBMSes.) Without any magic row identifier > such as these, I suppose an FDW would need to insist on knowing the > primary key for the remote table, so that it could update based on the > values of the pkey column(s). > > The current writable-foreign-tables patch goes to great lengths to try > to cater for magic row identifiers of unspecified data types; which is > something I encouraged in the beginning, but now that I see the results > I think the messiness-to-usefulness quotient is awfully low. Basically > what it's doing is hacking the TupleDesc associated with a foreign table > so that the descriptor (sometimes) includes extra columns. I don't > think that's workable at all --- there are too many places that assume > that relation TupleDescs match up with what's in the catalogs. > > I think if we're going to support magic row identifiers, they need to > be actual system columns, complete with negative attnums and entries > in pg_attribute. This would require FDWs to commit to the data type > of a magic row identifier at foreign-table creation time, but that > doesn't seem like much of a restriction: probably any one FDW would > have only one possible way to handle a magic identifier. So I'm > envisioning adding an FDW callback function that gets called at table > creation and returns an indication of which system columns the foreign > table should have, and then we actually make pg_attribute entries for > those columns. > > For postgres_fdw, that would really be enough, since it could just > cause a "ctid" column to be created with the usual definition. Then > it could put the remote ctid into the usual t_self field in returned > tuples. > +1 for adding a new system attribute. We did something similar in Postgres-XC, though problem there was much simpler because we always knew that the remote FDW is a Postgres instance running the same version. So we added a new system column "node_id" which does not get any disk storage, but gets set during execution time depending on which node the tuple belongs to. The ctid system column of course is set to the remote ctid. In the context of postgres_fdw, I am not sure if we need an additional system column like a node_id. Would there be a possibility where tuples to-be-modified are coming from different foreign tables and at runtime we need to decide where to execute the UPDATE/DELETE operation ? If we start supporting inheritance involving foreign tables as child tables, this will become a reality. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
On Wed, Mar 6, 2013 at 12:35 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
-- Postgres-XC, though problem there was much simpler because we always+1 for adding a new system attribute. We did something similar in
knew that the remote FDW is a Postgres instance running the same
version. So we added a new system column "node_id" which does not get
any disk storage, but gets set during execution time depending on
which node the tuple belongs to. The ctid system column of course is
set to the remote ctid.
Michael
2013/3/6 Tom Lane <tgl@sss.pgh.pa.us>: > One of the core problems for a writable-foreign-tables feature is how > to identify a previously-fetched row for UPDATE or DELETE actions. > In an ordinary Postgres table, we use the ctid system column for that, > but a remote table doesn't necessarily have such a thing. (Oracle has > a "rowid" that acts a lot like our ctids, but I don't believe the > concept is common in other RDBMSes.) Without any magic row identifier > such as these, I suppose an FDW would need to insist on knowing the > primary key for the remote table, so that it could update based on the > values of the pkey column(s). > > The current writable-foreign-tables patch goes to great lengths to try > to cater for magic row identifiers of unspecified data types; which is > something I encouraged in the beginning, but now that I see the results > I think the messiness-to-usefulness quotient is awfully low. Basically > what it's doing is hacking the TupleDesc associated with a foreign table > so that the descriptor (sometimes) includes extra columns. I don't > think that's workable at all --- there are too many places that assume > that relation TupleDescs match up with what's in the catalogs. > > I think if we're going to support magic row identifiers, they need to > be actual system columns, complete with negative attnums and entries > in pg_attribute. This would require FDWs to commit to the data type > of a magic row identifier at foreign-table creation time, but that > doesn't seem like much of a restriction: probably any one FDW would > have only one possible way to handle a magic identifier. So I'm > envisioning adding an FDW callback function that gets called at table > creation and returns an indication of which system columns the foreign > table should have, and then we actually make pg_attribute entries for > those columns. > > For postgres_fdw, that would really be enough, since it could just > cause a "ctid" column to be created with the usual definition. Then > it could put the remote ctid into the usual t_self field in returned > tuples. > > Supporting magic identifiers that aren't of the TID data type is > considerably harder, mainly because it's not clear how heap_getsysattr() > could know how to fetch the column value. I have some rough ideas > about that, but I suggest that we might want to punt on that extension > for the time being. > > Thoughts? > Sorry, -1 for me. The proposed design tried to kill two birds with one stone. The reason why the new GetForeignRelWidth() can reserve multiple slot for pseudo-columns is, that intends to push-down complex calculation in target-list to the remote computing resource. For example, please assume the third target-entry below is very complex calculation, thus, it consumes much CPU cycles. SELECT x, y, (x-y)^2 from remote_table; FDW driver will able to replace it with just a reference to a pseudo-column that shall hold the calculation result of (x-y)^2 in remote side. It does not take a calculation in local side, it can assist CPU off-load. If we have a particular system attribute to carry remote "rowid" on foreign- table declaration time, it will loose opportunity of such kind of optimization. When we handle a query including sub-queries, it generates its TupleDesc in run-time without problems. I don't think we have no special reason that we cannot admit foreign-tables to adopt an alternative TupleDesc being constructed in run-time. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On 2013-03-05 19:30:53 -0500, Tom Lane wrote: > One of the core problems for a writable-foreign-tables feature is how > to identify a previously-fetched row for UPDATE or DELETE actions. > In an ordinary Postgres table, we use the ctid system column for that, > but a remote table doesn't necessarily have such a thing. > ... > For postgres_fdw, that would really be enough, since it could just > cause a "ctid" column to be created with the usual definition. Then > it could put the remote ctid into the usual t_self field in returned > tuples. > > Supporting magic identifiers that aren't of the TID data type is > considerably harder, mainly because it's not clear how heap_getsysattr() > could know how to fetch the column value. I have some rough ideas > about that, but I suggest that we might want to punt on that extension > for the time being. What about just making it a bytea in fdw's? Now its not nice to waste space for a (probably 1byte) bytea header, but its not too bad either. The fdw author then needs to ensure only the correct data ends up in that system column. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Mar 6, 2013 at 12:35 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote: > In the context of postgres_fdw, I am not sure if we need an additional > system column like a node_id. Would there be a possibility where > tuples to-be-modified are coming from different foreign tables and at > runtime we need to decide where to execute the UPDATE/DELETE operation > ? If we start supporting inheritance involving foreign tables as child > tables, this will become a reality. Foreign tables have tableoid system column which holds pg_class.oid of the foreign table. It seems sufficient to determine source server. -- Shigeru HANADA
On 06-Mar-2013, at 4:12 PM, Shigeru Hanada <shigeru.hanada@gmail.com> wrote: > On Wed, Mar 6, 2013 at 12:35 PM, Pavan Deolasee > <pavan.deolasee@gmail.com> wrote: >> In the context of postgres_fdw, I am not sure if we need an additional >> system column like a node_id. Would there be a possibility where >> tuples to-be-modified are coming from different foreign tables and at >> runtime we need to decide where to execute the UPDATE/DELETE operation >> ? If we start supporting inheritance involving foreign tables as child >> tables, this will become a reality. > > Foreign tables have tableoid system column which holds pg_class.oid of > the foreign table. It seems sufficient to determine source server. > I think you are right. In the context of foreign tables, tableoid might be enough to get the source. Unfortunately, Postgres-XChad not yet leveraged foreign tables fully and hence special handling was required. Thanks, Pavan > -- > Shigeru HANADA
On Wed, Mar 6, 2013 at 9:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > For postgres_fdw, that would really be enough, since it could just > cause a "ctid" column to be created with the usual definition. Then > it could put the remote ctid into the usual t_self field in returned > tuples. I'm not sure whether postgres_fdw should support, but updatable views have no system column including ctid. So, we need magic identifier, perhaps it would be set of primary key value(s), to support updating remote updatable views via foreign tables. Just a heads up. -- Shigeru HANADA
Shigeru Hanada <shigeru.hanada@gmail.com> writes: > I'm not sure whether postgres_fdw should support, but updatable views > have no system column including ctid. So, we need magic identifier, > perhaps it would be set of primary key value(s), to support updating > remote updatable views via foreign tables. Yeah, I considered that. I thought seriously about proposing that we forget magic row identifiers altogether, and instead make postgres_fdw require a remote primary key for a foreign table to be updatable. That would solve the immediate problem since we'd no longer need any system columns at all. On balance though I think it's better for postgres_fdw to use ctid for this purpose, at least for now: ctid is better-performing and more reliable than a PK (since someone might mis-describe the PK, or change a row's PK value underneath us). Perhaps in a future release we could add a table option to use PK row identification, but I don't want to try to implement both methods in postgres_fdw right now. On the other hand, I don't have a problem with decreeing that non-Postgres FDWs need to use PK row identification in the first release; which would be the consequence if we don't do anything about allowing new system columns in 9.3. We will certainly need that style of row identification to be written and tested anyway. It won't stop us from extending things later. regards, tom lane
Kohei KaiGai <kaigai@kaigai.gr.jp> writes: > 2013/3/6 Tom Lane <tgl@sss.pgh.pa.us>: >> I think if we're going to support magic row identifiers, they need to >> be actual system columns, complete with negative attnums and entries >> in pg_attribute. > Sorry, -1 for me. > The proposed design tried to kill two birds with one stone. > The reason why the new GetForeignRelWidth() can reserve multiple slot > for pseudo-columns is, that intends to push-down complex calculation in > target-list to the remote computing resource. [ shrug... ] That's just pie in the sky: there is no such capability in the submitted patch, nor is it close to being implementable. When weighing that against the very real probability that this patch won't get into 9.3 at all, I have no problem tossing that overboard to try to get to something committable. The larger issue here is that the patch is confusing the set of possibly-computed columns that could be returned by a scan node with the catalog-specified set of columns for a table. I don't think that changing the (representation of the) latter on the fly within the planner is a tenable approach. You've had to put in an unreasonable number of crude hacks already to make that sort-of work, but I have exactly zero confidence that you've hacked everyplace that would have to change. I also have no confidence that there aren't unfixable problems where the same TupleDesc would need to be in both states to satisfy the expectations of different bits of code. (An example of the risks here is that, IIRC, parts of the planner use the relation's TupleDesc as a guide to what "relname.*" means. So I'm pretty suspicious that the existing patch breaks behavior for whole-row Vars in some cases.) Really the idea is a bit broken in this form anyway, because if we did have a plan that involved calculating "(x-y)^2" at the scan level, we'd want that expression to be represented using Vars for x and y, not some made-up Var whose meaning is not apparent from the system catalogs. Also, the right way to deal with this desire is to teach the planner in general, not just FDWs, about pushing expensive calculations down the plan tree --- basically, resurrecting Joe Hellerstein's thesis work, which we ripped out more than ten years ago. I don't think there's that much that would need to change about the planner's data structures, but deciding where is cheapest to evaluate an expression is a pretty hard problem. Trying to handle that locally within FDWs is doomed to failure IMO. So my intention is to get rid of GetForeignRelWidth() and make use of the existing ctid system column for returning remote TIDs in postgres_fdw. (On review I notice that we're creating ctid columns for foreign tables already, so we don't even need the proposed hook to let FDWs control that; though we will certainly want one in future if we are to support non-TID magic row identifiers.) If you find that unacceptable, I'm quite willing to mark this patch Returned With Feedback and get on with dealing with some of the other forty-odd patches in the CF queue. regards, tom lane
2013/3/6 Tom Lane <tgl@sss.pgh.pa.us>: > Also, the right way to deal with this desire is to teach the planner in > general, not just FDWs, about pushing expensive calculations down the > plan tree --- basically, resurrecting Joe Hellerstein's thesis work, > which we ripped out more than ten years ago. I don't think there's that > much that would need to change about the planner's data structures, but > deciding where is cheapest to evaluate an expression is a pretty hard > problem. Trying to handle that locally within FDWs is doomed to failure > IMO. > It also seems to me more attractive direction to support a mechanism to off-load calculation of expensive expression in general, not only foreign-tables. Probably, we have to include the cost to compute expression node to choose a proper plan node. For example, if an extension injected a pseudo scan node to materialize underlying regular table and calculate target entry using SIMD operations, its cost to compute target-entry shall be 25% towards regular ones. So, planner shall choose this plan-node in case of advantage. > So my intention is to get rid of GetForeignRelWidth() and make use of > the existing ctid system column for returning remote TIDs in > postgres_fdw. (On review I notice that we're creating ctid columns > for foreign tables already, so we don't even need the proposed hook > to let FDWs control that; though we will certainly want one in future > if we are to support non-TID magic row identifiers.) > OK. It gets back to the initial implementation that uses TID system column. I'll try to adjust my patches by the next week. Probably, we should allow FDW drivers to specify data type of TID system column in future development, rather than new system columns, because the non-TID magic row-identifier is used exclusively with TID. > If you find that unacceptable, I'm quite willing to mark this patch > Returned With Feedback and get on with dealing with some of the other > forty-odd patches in the CF queue. > A bird is a reasonable outcome with a stone, even though it is not two birds. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On Wed, Mar 6, 2013 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > On the other hand, I don't have a problem with decreeing that > non-Postgres FDWs need to use PK row identification in the first > release; which would be the consequence if we don't do anything about > allowing new system columns in 9.3. We will certainly need that style > of row identification to be written and tested anyway. It won't stop > us from extending things later. Oh, I didn't realize that was how it was going to work out. That seems very reasonable to me. There is a performance problem with forcing DELETE FROM ft WHERE nonkey = 5 to be pushed to the remote side as SELECT pk FROM ft WHERE nonkey = 5 followed by DELETE FROM ft WHERE pk = $1 for each pk value returned by the SELECT, which sounds like it's what will happen under this system. But I don't have any problem leaving that as future work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Mar 6, 2013 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Shigeru Hanada <shigeru.hanada@gmail.com> writes: >> I'm not sure whether postgres_fdw should support, but updatable views >> have no system column including ctid. So, we need magic identifier, >> perhaps it would be set of primary key value(s), to support updating >> remote updatable views via foreign tables. > > Yeah, I considered that. I thought seriously about proposing that we > forget magic row identifiers altogether, and instead make postgres_fdw > require a remote primary key for a foreign table to be updatable. IMO, Utilizing anything but this for remote record identification is an implementation specific optimization. Aren't the semantics different though? If you go: update foo set id = 1 where id = 1; the primary key would not change, but the ctid would. or is that already a handled? merlin
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Mar 6, 2013 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> On the other hand, I don't have a problem with decreeing that >> non-Postgres FDWs need to use PK row identification in the first >> release; which would be the consequence if we don't do anything about >> allowing new system columns in 9.3. We will certainly need that style >> of row identification to be written and tested anyway. It won't stop >> us from extending things later. > Oh, I didn't realize that was how it was going to work out. That > seems very reasonable to me. There is a performance problem with > forcing DELETE FROM ft WHERE nonkey = 5 to be pushed to the remote > side as SELECT pk FROM ft WHERE nonkey = 5 followed by DELETE FROM ft > WHERE pk = $1 for each pk value returned by the SELECT, which sounds > like it's what will happen under this system. But I don't have any > problem leaving that as future work. That performance issue exists for *all* foreign updates/deletes at the moment, with or without a ctid-ish column. As you say, fixing it is material for 9.4 or later. (It's possible that an FDW could dodge this by itself without any additional help from the core code, but I'm not sure. Anyway postgres_fdw hasn't tried yet, and I think there are a number of things that are higher priority to worry about than bulk update performance.) regards, tom lane
Merlin Moncure <mmoncure@gmail.com> writes: > On Wed, Mar 6, 2013 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, I considered that. I thought seriously about proposing that we >> forget magic row identifiers altogether, and instead make postgres_fdw >> require a remote primary key for a foreign table to be updatable. > IMO, Utilizing anything but this for remote record identification is > an implementation specific optimization. Aren't the semantics > different though? If you go: > update foo set id = 1 where id = 1; > the primary key would not change, but the ctid would. or is that > already a handled? In postgres_fdw as it currently stands, the remote ctid would change. I'm not sure we should posit that as a universal property of FDWs though. It's not even a meaningful question for FDWs with no underlying rowid concept. regards, tom lane
On Wed, Mar 13, 2013 at 9:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> On Wed, Mar 6, 2013 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Yeah, I considered that. I thought seriously about proposing that we >>> forget magic row identifiers altogether, and instead make postgres_fdw >>> require a remote primary key for a foreign table to be updatable. > >> IMO, Utilizing anything but this for remote record identification is >> an implementation specific optimization. Aren't the semantics >> different though? If you go: > >> update foo set id = 1 where id = 1; > >> the primary key would not change, but the ctid would. or is that >> already a handled? > > In postgres_fdw as it currently stands, the remote ctid would change. > I'm not sure we should posit that as a universal property of FDWs > though. It's not even a meaningful question for FDWs with no underlying > rowid concept. I just find it odd that rowid concept is used at all without strong guarantee that the record you are referencing is the one you are supposed to be referencing. Basically I'm saying PKEY semantics are the correct ones and that ctid is ok to use iff they match the pkey ones. I don't think this is possible unless you maintain a remote lock on the ctid between when you fetch it and do some other operation. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > I just find it odd that rowid concept is used at all without strong > guarantee that the record you are referencing is the one you are > supposed to be referencing. Basically I'm saying PKEY semantics are > the correct ones and that ctid is ok to use iff they match the pkey > ones. I don't think this is possible unless you maintain a remote > lock on the ctid between when you fetch it and do some other > operation. postgres_fdw does maintain such a lock (it does SELECT FOR UPDATE when scanning an update/delete target table). However, I'm not exactly sure why you think a pkey-based design would be free of such hazards. pkeys aren't immutable either ... regards, tom lane
On 2013-03-13 09:51:59 -0500, Merlin Moncure wrote: > On Wed, Mar 13, 2013 at 9:30 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Merlin Moncure <mmoncure@gmail.com> writes: > >> On Wed, Mar 6, 2013 at 11:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >>> Yeah, I considered that. I thought seriously about proposing that we > >>> forget magic row identifiers altogether, and instead make postgres_fdw > >>> require a remote primary key for a foreign table to be updatable. > > > >> IMO, Utilizing anything but this for remote record identification is > >> an implementation specific optimization. Aren't the semantics > >> different though? If you go: > > > >> update foo set id = 1 where id = 1; > > > >> the primary key would not change, but the ctid would. or is that > >> already a handled? > > > > In postgres_fdw as it currently stands, the remote ctid would change. > > I'm not sure we should posit that as a universal property of FDWs > > though. It's not even a meaningful question for FDWs with no underlying > > rowid concept. > > I just find it odd that rowid concept is used at all without strong > guarantee that the record you are referencing is the one you are > supposed to be referencing. Basically I'm saying PKEY semantics are > the correct ones and that ctid is ok to use iff they match the pkey > ones. I don't think this is possible unless you maintain a remote > lock on the ctid between when you fetch it and do some other > operation. ISTM thats just mostly the same semantics as you have locally. If you locally do an UPDATE it will use the ctid for location of the updated row. If it got updated since (which you can detect by checking whether xmax has committed) the behaviour depends on the transaction isolation level. On repeatable_read upwards it errors out, otherwise it will do the EvalPlanQual magic. Which is to make sure the WHERE condition still fits the row. Thats obviously where the cases start to differ. But thats also the case if you use the primary key since you obviously can have more quals than just that on the origin side. Perhaps pgsql-fdw should make sure the update was performed *without* following the ctid chain to a new valid tuple? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > Perhaps pgsql-fdw should make sure the update was performed *without* > following the ctid chain to a new valid tuple? I did think about these issues before committing the patch ;-) The basic theory in PG's existing design is to postpone locking rows as long as possible; which means that when we do finally lock a target row, we have to check if it's changed since we scanned it, and that leads into the whole EvalPlanQual mess. I considered trying to make FDWs duplicate that behavior, but gave up on it. In the first place, it's hard to see how you even define "did the row change" unless you have something exactly like ctids (including forward update chains). And in the second place, this would mandate yet another round trip to the remote server for each row to be updated. In the patch as committed, the expectation (which is satisfied by postgres_fdw) is that FDWs should lock rows that are candidates for update/delete during the initial scan. This avoids an extra round trip and justifies leaving EvalPlanQual out of the picture altogether. The cost is that we may lock rows that we end up not updating, because they fail locally-checked restriction or join conditions. I think on balance that's a good trade-off. regards, tom lane
On Wed, Mar 13, 2013 at 10:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@2ndquadrant.com> writes: >> Perhaps pgsql-fdw should make sure the update was performed *without* >> following the ctid chain to a new valid tuple? > > I did think about these issues before committing the patch ;-) > > The basic theory in PG's existing design is to postpone locking rows as > long as possible; which means that when we do finally lock a target row, > we have to check if it's changed since we scanned it, and that leads > into the whole EvalPlanQual mess. I considered trying to make FDWs > duplicate that behavior, but gave up on it. In the first place, it's > hard to see how you even define "did the row change" unless you have > something exactly like ctids (including forward update chains). And > in the second place, this would mandate yet another round trip to the > remote server for each row to be updated. > > In the patch as committed, the expectation (which is satisfied by > postgres_fdw) is that FDWs should lock rows that are candidates for > update/delete during the initial scan. This avoids an extra round trip > and justifies leaving EvalPlanQual out of the picture altogether. > The cost is that we may lock rows that we end up not updating, because > they fail locally-checked restriction or join conditions. I think on > balance that's a good trade-off. agreed. As long as lock as held between ctid examination and row modification you are ok. didn't read the patch, just pointing that out (history is full of client side drivers that did not do this properly). I also might have missed some of the finer contextual points of the discussion here: I was thinking that you are identifying rows on the client over fetch transaction A to write back in transaction B. If that is the case, ctid based identification to me is full of issues and probably best used to handle awkward cases like there is no identified primary key. But sure, if you are holding locks you can base it on physical position. with pkey row identification, yes, rows can change from under you, but now you can employ application tricks to guard for that just as you would with any optimistic locking strategy in SQL. This is and expected and well understood problem (use transactions etc). merlin
Merlin Moncure <mmoncure@gmail.com> writes: > As long as lock as held between ctid examination and row modification > you are ok. didn't read the patch, just pointing that out (history is > full of client side drivers that did not do this properly). > I also might have missed some of the finer contextual points of the > discussion here: I was thinking that you are identifying rows on the > client over fetch transaction A to write back in transaction B. If > that is the case, ctid based identification to me is full of issues Absolutely --- you can't rely on ctid across transactions. postgres_fdw isn't doing that though, just using it to update or delete rows that it locked earlier in the same remote transaction. regards, tom lane