Proposal: Create index on foreign table - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Proposal: Create index on foreign table
Date
Msg-id 4F62FD69.2060007@lab.ntt.co.jp
Whole thread Raw
Responses Re: Proposal: Create index on foreign table  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Re: Proposal: Create index on foreign table  (Shigeru Hanada <shigeru.hanada@gmail.com>)
Re: Proposal: Create index on foreign table  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
I have a plan to support 'Create index on foreign table' for 9.3.  Here
is my plan.

The index creation is supported for a flat file such as CSV and a remote
table on a RDB e.g., Postgres using CREATE INDEX.  (I thought using a
new statement, CREATE FOREIGN INDEX, at first, but I think that CREATE
INDEX would be sufficient to define an index for the foreign table.)
For a flat file, CREATE INDEX constructs an index in the same way as an
index for a regular table.  On the other hand, for a remote table,
CREATE INDEX collects information about the index on the specified
column(s) for the specified table that was created on the remote table.An index created is stored in pg_class and
pg_indexlike an index for a
 
regular table.  It depends on the wrappers implementation whether it
supports the options such as UNIQUE or WHERE predicates, though I think
that CONCURRENTLY is not supported in common for the foreign tables.
For a flat file, I plan that the user can specify all the options
excluding CONCURRENTLY and UNIQUE.  On the other hand, for a remote
table, I think that the user can specify only the names of the foreign
table and its column(s), using which the wrapper collects information
about all the related indexes created on the remote table.  To do so,
I'd like to propose new FDW callback routines:

CreateIndex(): This is called maybe from DefineIndex(), and does the
similar task to index_create().  For a flat file, this function makes
the catalog entries for the index and actually build the index, while
for a remote table, it just stores the catalog entries collected from
the remote end.
DropIndex(): This is called at DROP INDEX, and does the similar task to
index_drop().

I'd like to build the index physical data file for a flat file using the
index access method of regular tables (ie btree, hash, gin, gist, and
spgist) based on the following transformation between the TID and the
file offset to some data in the file:
   block_number  = file_offset_to_some_data / BLCKSZ   offset_number = file_offset_to_some_data % BLCKSZ

I plan to make use of the above index for better query optimization.
For a flat file, I'd like to realize index scans, index-only scans,
bitmap (like) scans and parametrized scans on the file in the same way
as those on a regular table utilizing the currently revised FDW
infrastructure.  For a remote table, I have to admit that I don't have
any clear idea to make use of the index information stored in the system
catalogs for better query optimization, but I believe that it's useful
for the ORDER BY push down and/or nestloop-with-inner-parametrized-scan
join optimization.

Thoughts?

Best regards,
Etsuro Fujita


pgsql-hackers by date:

Previous
From: Thom Brown
Date:
Subject: Re: Command Triggers, v16
Next
From: Andres Freund
Date:
Subject: Re: Command Triggers, v16