WIP Patch: Use sortedness of CSV foreign tables for query planning - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | WIP Patch: Use sortedness of CSV foreign tables for query planning |
Date | |
Msg-id | 002501cd709e$29e7e760$7db7b620$@lab.ntt.co.jp Whole thread Raw |
Responses |
Re: WIP Patch: Use sortedness of CSV foreign tables
for query planning
Re: WIP Patch: Use sortedness of CSV foreign tables for query planning |
List | pgsql-hackers |
The following is a comment at fileGetForeignPaths() in contrib/file_fdw.c: /* * If data file was sorted, and we knew it somehow, we could insert * appropriate pathkeys into the ForeignPathnode to tell the planner * that. */ To do this, I would like to propose new generic options for a file_fdw foreign table to specify the sortedness of a data file. While it is best to allow to specify the sortedness on multiple columns, the current interface for the generic options dose not seems to be suitable for doing it. As a compromise, I would like to propose single-column sortedness options and insert appropriate pathkeys into the ForeignPath node based on these information: sorted: Boolean option to specify whether data file is sorted by a column.key: Specifies the name of a column by which datafile is sorted. Required when the above option is set to true.direction: Specifies the sort order: asc or desc. The default is asc.nulls: Specifiesthat nulls sort before or after non-nulls: first or last. first is the default when direction option is set desc. When direction option is not set desc, last is the default. Attached is a WIP patch implementing this feature. I would like to demonstrate the usefulness of the patch. Experimental results are shown below. Here, data in /home/pgsql/relation.csv is sorted by aid in ascending order. postgres=# CREATE EXTENSION file_fdw; CREATE EXTENSION postgres=# CREATE SERVER fs FOREIGN DATA WRAPPER file_fdw; CREATE SERVER postgres=# CREATE FOREIGN TABLE ft (aid INTEGER, bid INTEGER) SERVER fs OPTIONS (filename '/home/pgsql/relation.csv', format 'csv', delimiter ','); CREATE FOREIGN TABLE postgres=# ANALYZE ft; ANALYZE postgres=# SELECT count(*) FROM ft; count ----------50000000 (1 row) postgres=# SELECT count(DISTINCT aid) FROM ft; count ---------1000000 (1 row) postgres=# EXPLAIN ANALYZE SELECT DISTINCT aid FROM ft ORDER BY aid; QUERY PLAN -------------------------------------------------------------------------------- ----------------------------------------------------Unique (cost=14187375.19..14437375.19 rows=1092929 width=4) (actualtime=48952 .602..62788.934 rows=1000000 loops=1) -> Sort (cost=14187375.19..14312375.19 rows=50000000 width=4) (actual time= 48952.601..56635.448 rows=50000000 loops=1) Sort Key: aid Sort Method: external sort Disk: 684272kB -> Foreign Scan on ft (cost=0.00..5059137.00 rows=50000000 width=4) ( actual time=0.073..18324.062 rows=50000000 loops=1) Foreign File: /home/pgsql/relation.csv ForeignFile Size: 484444500Total runtime: 63019.868 ms (8 rows) postgres=# ALTER FOREIGN TABLE ft OPTIONS ( ADD sorted 'true', key 'aid' ); ALTER FOREIGN TABLE postgres=# EXPLAIN ANALYZE SELECT DISTINCT aid FROM ft ORDER BY aid; QUERY PLAN -------------------------------------------------------------------------------- ----------------------------------------------Unique (cost=0.00..5184137.00 rows=1092929 width=4) (actual time=0.074..23124. 195 rows=1000000 loops=1) -> Foreign Scan on ft (cost=0.00..5059137.00 rows=50000000 width=4) (actualtime=0.070..17633.821rows=50000000 loops=1) Foreign File: /home/pgsql/relation.csv Foreign File Size:484444500Total runtime: 23213.909 ms (5 rows) Any comments and suggestions are welcomed. Thanks, Best regards, Etsuro Fujita
pgsql-hackers by date: