WIP Patch: Use sortedness of CSV foreign tables for query planning - Mailing list pgsql-hackers

From Etsuro Fujita
Subject WIP Patch: Use sortedness of CSV foreign tables for query planning
Date
Msg-id 002501cd709e$29e7e760$7db7b620$@lab.ntt.co.jp
Whole thread Raw
Responses Re: WIP Patch: Use sortedness of CSV foreign tables for query planning
Re: WIP Patch: Use sortedness of CSV foreign tables for query planning
List pgsql-hackers
The following is a comment at fileGetForeignPaths() in contrib/file_fdw.c:
   /*    * If data file was sorted, and we knew it somehow, we could insert    * appropriate pathkeys into the
ForeignPathnode to tell the planner    * that.    */
 

To do this, I would like to propose new generic options for a file_fdw foreign
table to specify the sortedness of a data file.  While it is best to allow to
specify the sortedness on multiple columns, the current interface for the
generic options dose not seems to be suitable for doing it.  As a compromise, I
would like to propose single-column sortedness options and insert appropriate
pathkeys into the ForeignPath node based on these information:
sorted: Boolean option to specify whether data file is sorted by a column.key: Specifies the name of a column by which
datafile is sorted.  Required
 
when the above option is set to true.direction: Specifies the sort order: asc or desc.  The default is asc.nulls:
Specifiesthat nulls sort before or after non-nulls: first or last.
 
first is the default when direction option is set desc.  When direction option
is not set desc, last is the default.

Attached is a WIP patch implementing this feature.  I would like to demonstrate
the usefulness of the patch.  Experimental results are shown below.  Here, data
in /home/pgsql/relation.csv is sorted by aid in ascending order.

postgres=# CREATE EXTENSION file_fdw;
CREATE EXTENSION
postgres=# CREATE SERVER fs FOREIGN DATA WRAPPER file_fdw;
CREATE SERVER
postgres=# CREATE FOREIGN TABLE ft (aid INTEGER, bid INTEGER) SERVER fs OPTIONS
(filename '/home/pgsql/relation.csv', format 'csv', delimiter ',');
CREATE FOREIGN TABLE
postgres=# ANALYZE ft;
ANALYZE
postgres=# SELECT count(*) FROM ft; count
----------50000000
(1 row)

postgres=# SELECT count(DISTINCT aid) FROM ft; count
---------1000000
(1 row)

postgres=# EXPLAIN ANALYZE SELECT DISTINCT aid FROM ft ORDER BY aid;
       QUERY PLAN
 

--------------------------------------------------------------------------------
----------------------------------------------------Unique  (cost=14187375.19..14437375.19 rows=1092929 width=4)
(actualtime=48952
 
.602..62788.934 rows=1000000 loops=1)  ->  Sort  (cost=14187375.19..14312375.19 rows=50000000 width=4) (actual time=
48952.601..56635.448 rows=50000000 loops=1)        Sort Key: aid        Sort Method: external sort  Disk: 684272kB
 ->  Foreign Scan on ft  (cost=0.00..5059137.00 rows=50000000 width=4) (
 
actual time=0.073..18324.062 rows=50000000 loops=1)              Foreign File: /home/pgsql/relation.csv
ForeignFile Size: 484444500Total runtime: 63019.868 ms
 
(8 rows)

postgres=# ALTER FOREIGN TABLE ft OPTIONS ( ADD  sorted 'true', key 'aid' );
ALTER FOREIGN TABLE
postgres=# EXPLAIN ANALYZE SELECT DISTINCT aid FROM ft ORDER BY aid;
    QUERY PLAN
 

--------------------------------------------------------------------------------
----------------------------------------------Unique  (cost=0.00..5184137.00 rows=1092929 width=4) (actual
time=0.074..23124.
195 rows=1000000 loops=1)  ->  Foreign Scan on ft  (cost=0.00..5059137.00 rows=50000000 width=4)
(actualtime=0.070..17633.821rows=50000000 loops=1)        Foreign File: /home/pgsql/relation.csv        Foreign File
Size:484444500Total runtime: 23213.909 ms
 
(5 rows)

Any comments and suggestions are welcomed.

Thanks,

Best regards,
Etsuro Fujita

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [patch] libpq one-row-at-a-time API
Next
From: Amit Kapila
Date:
Subject: Re: Help me develop new commit_delay advice