RE: POC: postgres_fdw insert batching - Mailing list pgsql-hackers

From tsunakawa.takay@fujitsu.com
Subject RE: POC: postgres_fdw insert batching
Date
Msg-id TYAPR01MB2990A726F963C10DF3B98790FEE00@TYAPR01MB2990.jpnprd01.prod.outlook.com
In response to Re: POC: postgres_fdw insert batching  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: POC: postgres_fdw insert batching  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
From: Tomas Vondra <tomas.vondra@enterprisedb.com>
> Unfortunately, this does not compile for me, because nodeModifyTable calls
> ExecGetTouchedPartitions, which is not defined anywhere. Not sure what's
> that about, so I simply commented-out this. That probably fails the partitioned
> cases, but it allowed me to do some review and testing.

Ouch, sorry.  I'm ashamed to have forgotten to include execPartition.c.


> There are a couple of other smaller changes. E.g. it undoes changes to
> finish_foreign_modify, and instead calls separate functions to prepare the bulk
> statement. It also adds list_make5/list_make6 macros, so as to not have to do
> strange stuff with the parameter lists.

Thanks, I'll gladly take them!  I wonder why I didn't think of separating deallocate_query() from
finish_foreign_modify()... perhaps my brain was dying.  As for list_make5/6(), I saw your first patch avoided adding
them, so I thought you found them ugly (and I felt so, too).  But thinking about it now, there's no reason to hesitate.
 


> A finally, this should probably add a bunch of regression tests.

Sure.


> 1) As I mentioned before, I really don't think we should be doing deparsing in
> execute_foreign_modify - that's something that should happen earlier, and
> should be in a deparse.c function.
...
> The attached patch tries to address both of these points.
> 
> Firstly, it adds a new deparseBulkInsertSql function, that builds a query for the
> "full" batch, and then uses those two queries - when we get a full batch we use
> the bulk query, otherwise we use the single-row query in a loop. IMO this is
> cleaner than deparsing queries ad hoc in the execute_foreign_modify.
...
> Of course, this might be worse when we don't have a full batch, e.g. for a query
> that inserts only 50 rows with batch_size=100. If this case is common, one
> option would be lowering the batch_size accordingly. If we really want to
> improve this case too, I suggest we pass more info than just a position of the
> VALUES clause - that seems a bit too hackish.
...
> Secondly, it adds the batch_size option to server/foreign table, and uses that.
> This is not complete, though. postgresPlanForeignModify passes a
> hard-coded value at the moment; it needs to look up the correct value for the
> server/table from RelOptInfo or something. And I suppose ModifyTable
> infrastructure will need to determine the value in order to pass the correct
> number of slots to the FDW API.

I can sort of understand your feeling, but I'd like to reconstruct the query and prepare it in execute_foreign_modify()
because:

* Some of our customers use bulk insert in ECPG (INSERT ... VALUES (record1), (record2), ...) to insert a variable
number of records per query.  (Oracle's Pro*C has such a feature.)  So, I want to be prepared to enable such a thing with FDWs.
 

* The number of records to insert is not known during planning (in general), so it feels natural, or at least not
unnatural, to prepare during the execution phase.
 

* I wanted to avoid the overhead of building the full query string for a 100-record INSERT statement during query
planning, because it may be a bit costly for the usual 1-record inserts.  (The overhead may be hidden behind the high
communication cost of postgres_fdw, though.)
 

So, in terms of code cleanliness, how about moving my code for rebuilding the query string from execute_foreign_modify() to
a new function in deparse.c?
 


> 2) I think the GUC should be replaced with an server/table option, similar to
> fetch_size.

Hmm, batch_size differs from fetch_size.  fetch_size is a postgres_fdw-specific feature with no relevant FDW routine,
while batch_size is a configuration parameter for all FDWs that implement ExecForeignBulkInsert().  The ideas I can
think of are:
 

1. Follow JDBC/ODBC and add standard FDW properties.  For example, the JDBC standard defines standard connection pool
properties such as maxPoolSize and minPoolSize, and JDBC drivers have to provide them under those defined names.  Likewise,
the FDW interface would require FDW implementors to handle the foreign server option name "max_bulk_insert_tuples" if they
want to provide the bulk insert feature and implement ExecForeignBulkInsert().  The core executor gets that setting from
the FDW by calling a new FDW routine like GetMaxBulkInsertTuples().  Sigh...
 

2. Add a new max_bulk_insert_tuples reloption to CREATE/ALTER FOREIGN TABLE.  The executor gets the value from the Relation
and uses it.  (But is this a table-specific configuration?  I don't think so, sigh...)
 

3. Adopt the current USERSET GUC max_bulk_insert_tuples.  I think this is enough, because the user can change the
setting per session, application, and database.
 



Regards
Takayuki Tsunakawa

