Re: Indices and Foreign Tables - Mailing list pgsql-general

From Kohler Manuel (ID SIS)
Subject Re: Indices and Foreign Tables
Date
Msg-id 2C3C059B-73D0-492A-AAF5-8E4F00052F9E@bsse.ethz.ch
Whole thread Raw
In response to Re: Indices and Foreign Tables  (Albe Laurenz <laurenz.albe@wien.gv.at>)
Responses Re: Indices and Foreign Tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Indices and Foreign Tables  (Torsten Förtsch <torsten.foertsch@gmx.net>)
List pgsql-general
Hi,
here are the Query plans. The first plan is on the source database directly. So no fdw involved:

source_db=# EXPLAIN ANALYZE select ds.code, count(*), sum(dsf.size_in_bytes) as
"raw_size",pg_size_pretty(sum(dsf.size_in_bytes))as "size" from data_set_files dsf, data_sets ds where dsf.parent_id is
nulland dsf.dase_id=ds.id group by ds.code order by raw_size desc; 
                                                                                    QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=341248.80..341281.56 rows=13103 width=34) (actual time=695.519..724.286 rows=13839 loops=1)
   Sort Key: (sum(dsf.size_in_bytes))
   Sort Method: quicksort  Memory: 2283kB
   ->  HashAggregate  (cost=340188.93..340352.71 rows=13103 width=34) (actual time=536.229..615.115 rows=13839 loops=1)
         ->  Nested Loop  (cost=0.56..340057.90 rows=13103 width=34) (actual time=0.104..456.139 rows=13839 loops=1)
               ->  Seq Scan on data_sets ds  (cost=0.00..385.39 rows=13839 width=34) (actual time=0.022..40.113
rows=13839loops=1) 
               ->  Index Scan using data_set_files_dase_id_parent_id_idx on data_set_files dsf  (cost=0.56..24.43
rows=11width=16) (actual time=0.015..0.019 rows=1 loops=13839) 
                     Index Cond: (((dase_id)::bigint = ds.id) AND (parent_id IS NULL))
 Total runtime: 752.695 ms
(9 rows)


Here is the same query with fdw:

db=# EXPLAIN VERBOSE select ds.code, count(*), sum(dsf.size_in_bytes) as
"raw_size",pg_size_pretty(sum(dsf.size_in_bytes))as "size" from data_set_files_fdw dsf, data_sets_fdw ds where
dsf.parent_idis null and dsf.dase_id=ds.id group by ds.code order by raw_size desc; 
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Sort  (cost=327.81..327.97 rows=64 width=40)
   Output: ds.code, (count(*)), (sum(dsf.size_in_bytes)), (pg_size_pretty(sum(dsf.size_in_bytes)))
   Sort Key: (sum(dsf.size_in_bytes))
   ->  HashAggregate  (cost=325.09..325.89 rows=64 width=40)
         Output: ds.code, count(*), sum(dsf.size_in_bytes), pg_size_pretty(sum(dsf.size_in_bytes))
         ->  Hash Join  (cost=270.61..324.45 rows=64 width=40)
               Output: dsf.size_in_bytes, ds.code
               Hash Cond: (ds.id = (dsf.dase_id)::bigint)
               ->  Foreign Scan on public.data_sets_fdw ds  (cost=100.00..148.40 rows=1280 width=40)
                     Output: ds.id, ds.code, ds.location
                     Remote SQL: SELECT id, code FROM public.data_sets
               ->  Hash  (cost=170.48..170.48 rows=10 width=16)
                     Output: dsf.size_in_bytes, dsf.dase_id
                     ->  Foreign Scan on public.data_set_files_fdw dsf  (cost=100.00..170.48 rows=10 width=16)
                           Output: dsf.size_in_bytes, dsf.dase_id
                           Filter: (dsf.parent_id IS NULL)
                           Remote SQL: SELECT dase_id, parent_id, size_in_bytes FROM public.data_set_files
(17 rows)

And also with ANALYZE:

db=# EXPLAIN ANALYZE select ds.code, count(*), sum(dsf.size_in_bytes) as
"raw_size",pg_size_pretty(sum(dsf.size_in_bytes))as "size" from data_set_files_fdw dsf, data_sets_fdw ds where
dsf.parent_idis null and dsf.dase_id=ds.id group by ds.code order by raw_size desc; 
                                                                          QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=327.81..327.97 rows=64 width=40) (actual time=219401.864..219414.641 rows=13839 loops=1)
   Sort Key: (sum(dsf.size_in_bytes))
   Sort Method: quicksort  Memory: 2283kB
   ->  HashAggregate  (cost=325.09..325.89 rows=64 width=40) (actual time=219327.664..219363.709 rows=13839 loops=1)
         ->  Hash Join  (cost=270.61..324.45 rows=64 width=40) (actual time=219127.848..219277.308 rows=13839 loops=1)
               Hash Cond: (ds.id = (dsf.dase_id)::bigint)
               ->  Foreign Scan on data_sets_fdw ds  (cost=100.00..148.40 rows=1280 width=40) (actual
time=1.057..77.415rows=13839 loops=1) 
               ->  Hash  (cost=170.48..170.48 rows=10 width=16) (actual time=219126.713..219126.713 rows=13839 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 649kB
                     ->  Foreign Scan on data_set_files_fdw dsf  (cost=100.00..170.48 rows=10 width=16) (actual
time=1082.614..219083.326rows=13839 loops=1) 
                           Filter: (parent_id IS NULL)
                           Rows Removed by Filter: 35726596
 Total runtime: 219438.925 ms
(13 rows)

Regards
Manuel


--
Kohler Manuel (ID SIS)
Research Informatics, Scientific IT Services (ID ETHZ)
Quantitative Genomics Facility (QGF), D-BSSE
ETH Zurich, Mattenstr. 26 (1078 1.02), CH-4058 Basel, +41 61 387 3132

On 04 Apr 2014, at 17:01, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:

> Kohler Manuel wrote:
>> I have a question regarding the use of indices when querying foreign data tables (using postgres_fdw
>> of 9.3 to another postgres DB).
>> Everything works fine beside the fact that the indices which are defined in the foreign DB are not
>> used at all when I do query it through the foreign data wrapper. This leads to an extremely slow query
>> performance.
>>
>> Here is what I did:
>>
>> CREATE EXTENSION postgres_fdw;
>>
>> CREATE SERVER app_db
>>  FOREIGN DATA WRAPPER postgres_fdw
>>  OPTIONS (dbname ‘<DBNAME>', host 'localhost');
>>
>> CREATE USER MAPPING for openbis
>>  SERVER app_db
>>  OPTIONS (user ‘<USRE>', password ‘<PW>');
>>
>> CREATE FOREIGN TABLE data_sets_fdw
>>  (
>>    id bigint,
>>    code code,
>>    location file_path
>>  )
>>  SERVER app_db OPTIONS (table_name 'data_sets’);
>>
>> The data_sets_fdw corresponds to the table in the other DB which is defined as this:
>>
>> db=> \d data_sets
>>                           Table "public.data_sets"
>>  Column  |   Type    |                       Modifiers
>> ----------+-----------+--------------------------------------------------------
>> id       | bigint    | not null default nextval('data_sets_id_seq'::regclass)
>> code     | code      | not null
>> location | file_path | not null
>> Indexes:
>>    "data_sets_pkey" PRIMARY KEY, btree (id)
>>    "data_sets_code_key" UNIQUE CONSTRAINT, btree (code)
>>    "data_sets_code_idx" btree (code)
>>
>> When I  realised that the queries are so slow I ran an EXPLAIN ANALYZE which shows that the indices
>> are not used. I can also post the query plan if it helps.
>>
>> So am I doing something wrong here, or is this not possible that the indices are used by the fdw?
>
> The indices should be used.
>
> Did you ANALYZE the remote table on the remote database?
> What is the remote query (EXPLAIN VERBOSE)?
> What do you get for EXPLAIN ANALYZE of the remote query when executed on the remote database?
>
> Yours,
> Laurenz Albe



pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Why is pg_restore trying to create tables in pg_catalog?
Next
From: Tom Lane
Date:
Subject: Re: Indices and Foreign Tables