Re: Async execution of postgres_fdw. - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Async execution of postgres_fdw.
Date
Msg-id CAFjFpRdgY0gF72k7vPm0s1LU7J8V1W9T8X-aFpEY3WT81dSmQg@mail.gmail.com
Whole thread Raw
In response to Async execution of postgres_fdw.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: Async execution of postgres_fdw.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Re: Async execution of postgres_fdw.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List pgsql-hackers
Hi Horiguchi-san,
Here are my comments for the patches together

Sanity
1. The patch applies cleanly but has trailing white spaces.
[ashutosh@ubuntu coderoot]git apply /mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch
/mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch:41: trailing whitespace.
        entry->conn =
/mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch:44: trailing whitespace.
       
/mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch:611: trailing whitespace.
           
warning: 3 lines add whitespace errors.

2. The patches compile cleanly.
3. The regression is clean, even in contrib/postgres_fdw and contrib/file_fdw

Tests
-------
We need tests to make sure that the logic remains intact even after further changes in this area. Couple of tests which require multiple foreign scans within the same query fetching rows more than fetch size (100) would be required. Also, some DMLs, which involve multiple foreign scans would test the sanity when UPDATE/DELETE interleave such scans. By multiple foreign scans I mean both multiple scans on a single foreign server and multiple scans spread across multiple foreign servers.

Code
-------
Because previous "conn" is now replaced by "conn->pgconn", the double indirection makes the code a bit ugly and prone to segfaults (conn being NULL or invalid pointer). Can we minimize such code or wrap it under a macro?

We need some comments about the structure definition of PgFdwConn and its members explaining the purpose of this structure and its members.

Same is the case with enum PgFdwConnState. In fact, the state diagram of a connection has become more complicated with the async connections, so it might be better to explain that state diagram at one place in the code (through comments). The definition of the enum might be a good place to do that. Otherwise, the logic of connection maintenance is spread at multiple places and is difficult to understand by looking at the code.

In function GetConnection(), at line
        elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\"",
-            entry->conn, server->servername);
+            entry->conn->pgconn, server->servername);

entry->conn->pgconn may not necessarily be a new connection and may be NULL (as the next line check it for being NULL). So, I think this line should be moved within the following if block after pgconn has been initialised with the new connection.
+   if (entry->conn->pgconn == NULL)
+   {
+       entry->conn->pgconn = connect_pg_server(server, user);
+       entry->conn->nscans = 0;
+       entry->conn->async_state = PGFDW_CONN_IDLE;
+       entry->conn->async_scan = NULL;
+   }

The if condition if (entry->conn == NULL) in GetConnection(), used to track whether there is a PGConn active for the given entry, now it tracks whether it has PgFdwConn for the same.

Please see more comments inline.

On Mon, Dec 15, 2014 at 2:39 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:


* Outline of this patch

>From some consideration after the previous discussion and
comments from others, I judged the original (WIP) patch was
overdone as the first step. So I reduced the patch to minimal
function. The new patch does the following,

- Wrapping PGconn by PgFdwConn in order to handle multiple scans
  on one connection.

- The core async logic was added in fetch_more_data().

It might help if you can explain this logic in this mail as well as in code (as per my comment above).
 

- Invoking remote commands asynchronously in ExecInitForeignScan.

- Canceling async invocation if any other foreign scans,
  modifies, deletes use the same connection.

Cancellation is done by immediately fetching the return of
already-invoked acync command.

* Where this patch will be effective.

With upcoming inheritance-partition feature, this patch enables
stating and running foreign scans asynchronously. It will be more
effective for longer TAT or remote startup times, and larger
number of foreign servers. No negative performance effect on
other situations.


AFAIU, this logic sends only the first query in asynchronous manner not all of them. Is that right? If yes, I think it's a sever limitation of the feature. For a query involving multiple foreign scans, only the first one will be done in async fashion and not the rest. Sorry, if my understanding is wrong.
 
I think, we need some data which shows the speed up by this patch. You may construct a case, where a single query involved multiple foreign scans, and we can check what is the speed up obtained against the number of foreign scans.
 


* Concerns about this patch.

- This breaks the assumption that scan starts at ExecForeignScan,
  not ExecInitForeignScan, which might cause some problem.


This should be fine as long as it doesn't have any side effects like sending query during EXPLAIN (which has been taken care of in this patch.) Do you think, we need any special handling for PREPAREd statements?
 
- error reporting code in do_sql_command is quite ugly..



regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers




--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Next
From: Andres Freund
Date:
Subject: Re: pg_regress writes into source tree