Re: Problem while updating a foreign table pointing to a partitionedtable on foreign server - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Problem while updating a foreign table pointing to a partitionedtable on foreign server
Date
Msg-id 5B9BB133.1060107@lab.ntt.co.jp
Whole thread Raw
In response to Re: Problem while updating a foreign table pointing to apartitioned table on foreign server  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Problem while updating a foreign table pointing to apartitioned table on foreign server
List pgsql-hackers
(2018/08/24 16:58), Kyotaro HORIGUCHI wrote:
> At Tue, 21 Aug 2018 11:01:32 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI<horiguchi.kyotaro@lab.ntt.co.jp>  wrote
in<20180821.110132.261184472.horiguchi.kyotaro@lab.ntt.co.jp>
>>> You wrote:
>>>>     Several places seems to be assuming that fdw_scan_tlist may be
>>>>     used foreign scan on simple relation but I didn't find that
>>>>     actually happens.
>>>
>>> Yeah, currently, postgres_fdw and file_fdw don't use that list for
>>> simple foreign table scans, but it could be used to improve the
>>> efficiency for those scans, as explained in fdwhandler.sgml:
> ...
>> I'll put more consideration on using fdw_scan_tlist in the
>> documented way.
>
> Done. postgres_fdw now generates full fdw_scan_tlist (as
> documented) for foreign relations with junk columns having a
> small change in core side. However it is far less invasive than
> the previous version and I believe that it dones't harm
> maybe-existing use of fdw_scan_tlist on non-join rels (that is,
> in the case of a subset of relation columns).

Yeah, changes to the core by the new version is really small, which is 
great, but I'm not sure it's a good idea to modify the catalog info on 
the target table on the fly:

@@ -126,8 +173,18 @@ get_relation_info(PlannerInfo *root, Oid 
relationObjectId,\
  bool inhparent,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                  errmsg("cannot access temporary or unlogged relations 
during r\
ecovery")));

+   max_attrnum = RelationGetNumberOfAttributes(relation);
+
+   /* Foreign table may have exanded this relation with junk columns */
+   if (root->simple_rte_array[varno]->relkind == RELKIND_FOREIGN_TABLE)
+   {
+       AttrNumber maxattno = max_varattno(root->parse->targetList, varno);
+       if (max_attrnum < maxattno)
+           max_attrnum = maxattno;
+   }
+
     rel->min_attr = FirstLowInvalidHeapAttributeNumber + 1;
-   rel->max_attr = RelationGetNumberOfAttributes(relation);
+   rel->max_attr = max_attrnum;
     rel->reltablespace = RelationGetForm(relation)->reltablespace;

This breaks the fundamental assumption that rel->max_attr is equal to 
RelationGetNumberOfAttributes of that table.  My concern is: this change 
would probably be a wart, so it would be bug-prone in future versions.

Another thing on the new version:

@@ -1575,6 +1632,19 @@ build_physical_tlist(PlannerInfo *root, 
RelOptInfo *rel)
             relation = heap_open(rte->relid, NoLock);

             numattrs = RelationGetNumberOfAttributes(relation);
+
+           /*
+            * Foreign tables may have expanded with some junk columns. Punt
+            * in the case.
+            */
+           if (numattrs < rel->max_attr)
+           {
+               Assert(root->simple_rte_array[rel->relid]->relkind ==
+                      RELKIND_FOREIGN_TABLE);
+               heap_close(relation, NoLock);
+               break;
+           }

I think this would disable the optimization on projection in foreign 
scans, causing performance regression.

> One arguable behavior change is about wholrow vars. Currently it
> refferes local tuple with all columns but it is explicitly
> fetched as ROW() after this patch applied. This could be fixed
> but not just now.
>
> Part of 0004-:
> -  Output: f1, ''::text, ctid, rem1.*
> -  Remote SQL: SELECT f1, f2, ctid FROM public.loc1 FOR UPDATE
> +  Output: f1, ''::text, tableoid, ctid, rem1.*
> +  Remote SQL: SELECT f1, tableoid, ctid, ROW(f1, f2) FROM public.loc1 FOR UPDATE

That would be also performance regression.  If we go in this direction, 
that should be fixed.

> Since this uses fdw_scan_tlist so it is theoretically
> back-patchable back to 9.6.

IIRC, the fdw_scan_tlist stuff was introduced in PG9.5 as part of join 
pushdown infrastructure, so I think your patch can be back-patched to 
PG9.5, but I don't think that's enough; IIRC, this issue was introduced 
in PG9.3, so a solution for this should be back-patch-able to PG9.3, I 
think.

> Please find the attached three files.

Thanks for the patches!

> 0001-Add-test-for-postgres_fdw-foreign-parition-update.patch
>
>   This should fail for unpatched postgres_fdw. (Just for demonstration)

+CREATE TABLE p1 (a int, b int);
+CREATE TABLE c1 (LIKE p1) INHERITS (p1);
+CREATE TABLE c2 (LIKE p1) INHERITS (p1);
+CREATE FOREIGN TABLE fp1 (a int, b int)
+ SERVER loopback OPTIONS (table_name 'p1');
+INSERT INTO c1 VALUES (0, 1);
+INSERT INTO c2 VALUES (1, 1);
+SELECT tableoid::int - (SELECT min(tableoid) FROM fp1)::int AS 
toiddiff, ctid, * FROM fp1;

Does it make sense to evaluate toiddiff?  I think that should always be 0.

> 0003-Fix-of-foreign-update-bug-of-PgFDW.patch
>
>   Fix of postgres_fdw for this problem.

Sorry, I have not looked at it closely yet, but before that I'd like to 
discuss the direction we go in.  I'm not convinced that your approach is 
the right direction, so as promised, I wrote a patch using the 
Param-based approach, and compared the two approaches.  Attached is a 
WIP patch for that, which includes the 0003 patch.  I don't think there 
would be any warts as discussed above in the Param-based approach for 
now.  (That approach modifies the planner so that the targetrel's tlist 
would contain Params as well as Vars/PHVs, so actually, it breaks the 
planner assumption that a rel's tlist would only include Vars/PHVs, but 
I don't find any issues on that at least for now.  Will look into that 
in more detail.)  And I don't think there would be any concern about 
performance regression, either.  Maybe I'm missing something, though.

What do you think about that?

Note about the attached: I tried to invent a utility for 
generate_new_param like SS_make_initplan_output_param as mentioned in 
[1], but since the FDW API doesn't pass PlannerInfo to the FDW, I think 
the FDW can't call the utility the same way.  Instead, I modified the 
planner so that 1) the FDW adds Params without setting PARAM_EXEC Param 
IDs using a new function, and then 2) the core fixes the IDs.

Sorry for the delay.

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/3919.1527775582%40sss.pgh.pa.us

Attachment

pgsql-hackers by date:

Previous
From: Surafel Temesgen
Date:
Subject: Re: hostorder and failover_timeout for libpq
Next
From: Arthur Zakirov
Date:
Subject: Re: [PATCH] Tab completion for ALTER DATABASE … SET TABLESPACE