Thread: Clarification of FDW API Documentation

Clarification of FDW API Documentation

From
Jason Petersen
Date:
I've been deep in the FDW APIs lately and have come up with a couple of
questions about the [user-facing documentation][1].

# Requirement that DELETE use junk columns

The bit I'm confused by is the parenthetical in this bit at the end of the
section on `AddForeignUpdateTargets`:

> If the AddForeignUpdateTargets pointer is set to NULL, no extra target
> expressions are added. (This will make it impossible to implement DELETE
> operations, though UPDATE may still be feasible if the FDW relies on an
> unchanging primary key to identify rows.)

Later on, the section on `ExecForeignDelete` says (emphasis mine):

> The junk column(s) **must** be used to identify the tuple to be deleted.

Why is this a requirement? At the moment, `postgres_fdw` asks remote machines
for the `ctid` of tuples during a scan so it can use that `ctid` to create a
targeted `DELETE` during `ExecForeignDelete`, but I think an alternative could
avoid the use of the `ctid` junk column altogether...

Imagine if `BeginForeignScan` set up a remote cursor and `IterateForeignScan`
just fetched _one tuple at a time_ (unlike the current behavior where they are
fetched in batches). The tuple would be passed to `ExecForeignDelete` (as is
required), but the remote cursor would remain pointing at that tuple. Couldn't
`ExecForeignDelete` just call `DELETE FROM table WHERE CURRENT OF cursor` to
then delete that tuple?

Even if there is no guarantee that `IterateForeignScan` is called exactly once
before each `ExecForeignDelete` call (which would remove the ability to have
them cooperate using this single cursor), one could easily devise other storage
backends that don't need "junk" columns to perform `DELETE` operations.

So why the strong language around this functionality?

# Examples of `NULL` return after modification

Each of the `ExecForeign`- functions needs to return a tuple representing the
row inserted, deleted, or modified. But each function's documentation contains
an aside similar to this:

> The return value is either a slot containing the data that was actually
> inserted (this might differ from the data supplied, for example as a result
> of trigger actions), or NULL if no row was actually inserted (again,
> typically as a result of triggers).

Is this even accurate in PostgreSQL 9.3? Can triggers fire against foreign
tables? If so, can someone provide an example where the foreign scan has found
a tuple, passed it to `ExecForeignDelete`, and then no delete takes place (i.e.
`ExecForeignDelete` returns `NULL`)? As far as I can reason, if the foreign
scan has found a tuple, the update and delete actions need to do _something_
with it. Maybe I'm missing something.

--Jason

[1]: http://www.postgresql.org/docs/9.3/static/fdw-callbacks.html




Re: Clarification of FDW API Documentation

From
Tom Lane
Date:
Jason Petersen <jason@citusdata.com> writes:
> Imagine if `BeginForeignScan` set up a remote cursor and `IterateForeignScan`
> just fetched _one tuple at a time_ (unlike the current behavior where they are
> fetched in batches). The tuple would be passed to `ExecForeignDelete` (as is
> required), but the remote cursor would remain pointing at that tuple. Couldn't
> `ExecForeignDelete` just call `DELETE FROM table WHERE CURRENT OF cursor` to
> then delete that tuple?

No.  This is not guaranteed (or even likely) to work in join cases: the
tuple to be updated/deleted might no longer be the current one of the scan.
You *must* arrange for the scan to return enough information to uniquely
identify the tuple later, and that generally means adding some resjunk
columns.

> Even if there is no guarantee that `IterateForeignScan` is called exactly once
> before each `ExecForeignDelete` call (which would remove the ability to have
> them cooperate using this single cursor), one could easily devise other storage
> backends that don't need "junk" columns to perform `DELETE` operations.

Such as?  I could imagine having an optimization that works like you
suggest for simple scan cases, but it's not there now, and it could not
be the only mode.

> Each of the `ExecForeign`- functions needs to return a tuple representing the
> row inserted, deleted, or modified. But each function's documentation contains
> an aside similar to this:

>> The return value is either a slot containing the data that was actually
>> inserted (this might differ from the data supplied, for example as a result
>> of trigger actions), or NULL if no row was actually inserted (again,
>> typically as a result of triggers).

> Is this even accurate in PostgreSQL 9.3? Can triggers fire against foreign
> tables?

Any local trigger execution would be handled by the core executor.
What this is on about is that the remote database might have modified or
suppressed the operation as a result of triggers on the remote table;
and we'd like the FDW to return data that reflects what actually got
inserted/updated/deleted remotely.  (I guess a particular FDW might have a
policy of not reporting such things accurately, but the point of the text
is that if you want to tell the truth you can do so.)

Perhaps it would help if these paragraphs said "remote trigger" not
just "trigger".
        regards, tom lane



Re: Clarification of FDW API Documentation

From
Etsuro Fujita
Date:
(2014/06/14 2:46), Tom Lane wrote:
> Jason Petersen <jason@citusdata.com> writes:

>> Even if there is no guarantee that `IterateForeignScan` is called exactly once
>> before each `ExecForeignDelete` call (which would remove the ability to have
>> them cooperate using this single cursor), one could easily devise other storage
>> backends that don't need "junk" columns to perform `DELETE` operations.
> 
> Such as?  I could imagine having an optimization that works like you
> suggest for simple scan cases, but it's not there now, and it could not
> be the only mode.

The optimization in the following comment for postgresPlanForeignModify?

/** postgresPlanForeignModify*      Plan an insert/update/delete operation on a foreign table** Note: currently, the
plantree generated for UPDATE/DELETE will always* include a ForeignScan that retrieves ctids (using SELECT FOR UPDATE)*
andthen the ModifyTable node will have to execute individual remote* UPDATE/DELETE commands.  If there are no local
conditionsor joins* needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING* and then do nothing at
ModifyTable. Room for future optimization ...*/
 

I think this would be very useful.  So, I plan to add a patch for it to
2014-08.

Thanks,

Best regards,
Etsuro Fujita



Re: Clarification of FDW API Documentation

From
Bernd Helmle
Date:

--On 13. Juni 2014 13:46:38 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> Imagine if `BeginForeignScan` set up a remote cursor and
>> `IterateForeignScan` just fetched _one tuple at a time_ (unlike the
>> current behavior where they are fetched in batches). The tuple would be
>> passed to `ExecForeignDelete` (as is required), but the remote cursor
>> would remain pointing at that tuple. Couldn't `ExecForeignDelete` just
>> call `DELETE FROM table WHERE CURRENT OF cursor` to then delete that
>> tuple?
>
> No.  This is not guaranteed (or even likely) to work in join cases: the
> tuple to be updated/deleted might no longer be the current one of the
> scan. You *must* arrange for the scan to return enough information to
> uniquely identify the tuple later, and that generally means adding some
> resjunk columns.

Yeah, this is exactly the trap i ran into while implementing the 
informix_fdw driver. It used an updatable cursor to implement the modify 
actions as you proposed first. Consider a query like

UPDATE remote SET f1 = t.id FROM local t WHERE t.id = f1

The planner might choose a hash join where the hash table is built by 
forwarding the cursor via the foreign scan. You'll end up with the cursor 
positioned at the end and you have no way to get it back "in sync" when the 
modify action is actually called.

-- 
Thanks
Bernd