Home > mailing lists

Re: Async execution of postgres_fdw. - Mailing list pgsql-hackers

From	Kyotaro HORIGUCHI
Subject	Re: Async execution of postgres_fdw.
Date	January 19, 2015 06:25:23
Msg-id	20150119.152416.57998318.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw
In response to	Re: Async execution of postgres_fdw. (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
List	pgsql-hackers

Tree view

Hello, that's a silly mistake. fetch_seize = 10000 in the v4
patch. This v5 patch is fixed at the point.

> But the v4 patch mysteriously accelerates this query, 6.5 seconds.
> 
> > =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
> >    FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
...
> >  Execution time: 6512.043 ms

fetch_size was 10000 at this run. I got about 13.0 seconds for
fetch_size = 100, which is about 19% faster than the original.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

=======
15 17:18:49 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp>
> I revised the patch so that async scan will be done more
> aggressively, and took execution time for two very simple cases.
> 
> As the result, simple seq scan gained 5% and hash join of two
> foreign tables gained 150%. (2.4 times faster).
> 
> While measuring the performance, I noticed that each scan in a
> query runs at once rather than alternating with each other in
> many cases such as hash join or sorted joins and so. So I
> modified the patch so that async fetch is done more
> aggressively. The new v4 patch is attached. The following numbers
> are taken based on it.
> 
> ========
> Simple seq scan for the first test.
> 
> > CREATE TABLE lt1 (a int, b timestamp, c text);
> > CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost');
> > CREATE USER MAPPING FOR PUBLIC SERVER sv1;
> > CREATE FOREIGN TABLE ft1 () SERVER sv1 OPTIONS (table_name 'lt1');
> > INSERT INTO lt1 (SELECT a, now(), repeat('x', 128) FROM generate_series(0, 999999) a);
> 
> On this case, I took the the 10 times average of exec time of the
> following query for both master head and patched version.  The
> fetch size is 100.
> 
> > postgres=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
> >                          QUERY PLAN                  
> > ------------------------------------------------------------------
> >  Foreign Scan on ft1  (actual time=0.79 5..4175.706 rows=1000000 loops=1)
> >  Planning time: 0.060 ms
> >  Execution time: 4276.043 ms
> 
>   master head  : avg = 4256.621,  std dev = 17.099
>   patched pgfdw: avg = 4036.463,  std dev =  2.608
> 
> The patched version is faster by about 5%. This should be pure
> result of asynchronous fetching, not including the effect of
> early starting of remote execution in ExecInit.
> 
> Interestingly, as fetch_count gets larger, the gain raises in
> spite of the decrease of the number of query sending.
> 
>   master head  : avg = 2622.759,  std dev = 38.379
>   patched pgfdw: avg = 2277.622,  std dev = 27.269
> 
> About 15% gain. And for 10000,
> 
>   master head  : avg = 2000.980,  std dev =  6.434
>   patched pgfdw: avg = 1616.793,  std dev = 13.192
> 
> 19%.. It is natural that exec time reduces along with increase of
> fetch size, but I haven't found the reason why the patch's gain
> also increases.
> 
> ======================
> 
> The second case is a simple join of two foreign tables sharing
> one connection.
> 
> The master head runs this query in about 16 seconds with almost
> no fluctuation among multiple tries.
> 
> > =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
> >    FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
> >                                       QUERY PLAN
> > ----------------------------------------------------------------------------
> >  Hash Join (actual time=7541.831..15924.631 rows=1000000 loops=1)
> >    Hash Cond: (x.a = y.a)
> >   ->  Foreign Scan on ft1 x (actual time=1.176..6553.480 rows=1000000 loops=1)
> >   ->  Hash (actual time=7539.761..7539.761 rows=1000000 loops=1)
> >        Buckets: 32768  Batches: 64  Memory Usage: 2829kB
> >    ->  Foreign Scan on ft1 y (actual time=1.067..6529.165 rows=1000000 loops=1)
> >  Planning time: 0.223 ms
> >  Execution time: 15973.916 ms
> 
> But the v4 patch mysteriously accelerates this query, 6.5 seconds.
> 
> > =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
> >    FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
> >                                    QUERY PLAN
> > ----------------------------------------------------------------------------
> >  Hash Join (actual time=2556.977..5812.937 rows=1000000 loops=1)
> >    Hash Cond: (x.a = y.a)
> >   ->  Foreign Scan on ft1 x (actual time=32.689..1936.565 rows=1000000 loops=1)
> >   ->  Hash (actual time=2523.810..2523.810 rows=1000000 loops=1)
> >        Buckets: 32768  Batches: 64  Memory Usage: 2829kB
> >    ->  Foreign Scan on ft1 y (actual time=50.345..1928.411 rows=1000000 loops=1)
> >  Planning time: 0.220 ms
> >  Execution time: 6512.043 ms
> 
> The result data seems not broken. I don't know the reason yet but
> I'll investigate it.

pgsql-hackers by date:

From: Amit Kapila
Date: 19 January 2015, 06:08:39
Subject: Re: TABLESAMPLE patch

From: Abhijit Menon-Sen
Date: 19 January 2015, 06:54:34
Subject: Re: pgaudit - an auditing extension for PostgreSQL

Re: Async execution of postgres_fdw. - Mailing list pgsql-hackers

Previous

Next