Re: Async execution of postgres_fdw. - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Async execution of postgres_fdw.
Date
Msg-id 20141224.163212.21062518.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Async execution of postgres_fdw.  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
Hello, thank you for the comment, Ashutosh.

I'll return after the New Year holidays. Very sorry not
addressing them sooner but then I will have more time on this.

Have a happy holidays.

regards,

=====
> Hi Horiguchi-san,
> Here are my comments for the patches together
> 
> Sanity
> 1. The patch applies cleanly but has trailing white spaces.
> [ashutosh@ubuntu coderoot]git apply
> /mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch
> /mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch:41: trailing whitespace.
>         entry->conn =
> /mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch:44: trailing whitespace.
> 
> /mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch:611: trailing
> whitespace.
> 
> warning: 3 lines add whitespace errors.
> 
> 2. The patches compile cleanly.
> 3. The regression is clean, even in contrib/postgres_fdw and
> contrib/file_fdw
> 
> Tests
> -------
> We need tests to make sure that the logic remains intact even after further
> changes in this area. Couple of tests which require multiple foreign scans
> within the same query fetching rows more than fetch size (100) would be
> required. Also, some DMLs, which involve multiple foreign scans would test
> the sanity when UPDATE/DELETE interleave such scans. By multiple foreign
> scans I mean both multiple scans on a single foreign server and multiple
> scans spread across multiple foreign servers.
> 
> Code
> -------
> Because previous "conn" is now replaced by "conn->pgconn", the double
> indirection makes the code a bit ugly and prone to segfaults (conn being
> NULL or invalid pointer). Can we minimize such code or wrap it under a
> macro?
> 
> We need some comments about the structure definition of PgFdwConn and its
> members explaining the purpose of this structure and its members.
> 
> Same is the case with enum PgFdwConnState. In fact, the state diagram of a
> connection has become more complicated with the async connections, so it
> might be better to explain that state diagram at one place in the code
> (through comments). The definition of the enum might be a good place to do
> that. Otherwise, the logic of connection maintenance is spread at multiple
> places and is difficult to understand by looking at the code.
> 
> In function GetConnection(), at line
>         elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\"",
> -            entry->conn, server->servername);
> +            entry->conn->pgconn, server->servername);
> 
> entry->conn->pgconn may not necessarily be a new connection and may be NULL
> (as the next line check it for being NULL). So, I think this line should be
> moved within the following if block after pgconn has been initialised with
> the new connection.
> +   if (entry->conn->pgconn == NULL)
> +   {
> +       entry->conn->pgconn = connect_pg_server(server, user);
> +       entry->conn->nscans = 0;
> +       entry->conn->async_state = PGFDW_CONN_IDLE;
> +       entry->conn->async_scan = NULL;
> +   }
> 
> The if condition if (entry->conn == NULL) in GetConnection(), used to track
> whether there is a PGConn active for the given entry, now it tracks whether
> it has PgFdwConn for the same.
> 
> Please see more comments inline.
> 
> On Mon, Dec 15, 2014 at 2:39 PM, Kyotaro HORIGUCHI <
> horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> 
> >
> >
> > * Outline of this patch
> >
> > From some consideration after the previous discussion and
> > comments from others, I judged the original (WIP) patch was
> > overdone as the first step. So I reduced the patch to minimal
> > function. The new patch does the following,
> >
> > - Wrapping PGconn by PgFdwConn in order to handle multiple scans
> >   on one connection.
> >
> > - The core async logic was added in fetch_more_data().
> >
> 
> It might help if you can explain this logic in this mail as well as in code
> (as per my comment above).
> 
> 
> >
> > - Invoking remote commands asynchronously in ExecInitForeignScan.
> >
> 
> > - Canceling async invocation if any other foreign scans,
> >   modifies, deletes use the same connection.
> >
> 
> > Cancellation is done by immediately fetching the return of
> > already-invoked acync command.
> >
> 
> > * Where this patch will be effective.
> >
> > With upcoming inheritance-partition feature, this patch enables
> > stating and running foreign scans asynchronously. It will be more
> > effective for longer TAT or remote startup times, and larger
> > number of foreign servers. No negative performance effect on
> > other situations.
> >
> >
> AFAIU, this logic sends only the first query in asynchronous manner not all
> of them. Is that right? If yes, I think it's a sever limitation of the
> feature. For a query involving multiple foreign scans, only the first one
> will be done in async fashion and not the rest. Sorry, if my understanding
> is wrong.
> 
> I think, we need some data which shows the speed up by this patch. You may
> construct a case, where a single query involved multiple foreign scans, and
> we can check what is the speed up obtained against the number of foreign
> scans.
> 
> 
> 
> > * Concerns about this patch.
> >
> > - This breaks the assumption that scan starts at ExecForeignScan,
> >   not ExecInitForeignScan, which might cause some problem.
> >
> >
> This should be fine as long as it doesn't have any side effects like
> sending query during EXPLAIN (which has been taken care of in this patch.)
> Do you think, we need any special handling for PREPAREd statements?
> 
> 
> > - error reporting code in do_sql_command is quite ugly..
> >
> >
> >
> > regards,
> >
> > --
> > Kyotaro Horiguchi
> > NTT Open Source Software Center

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: alter user set local_preload_libraries.
Next
From: Noah Misch
Date:
Subject: Re: bin checks taking too long.