Home > mailing lists

Re: Proposal: Create index on foreign table - Mailing list pgsql-hackers

From	Etsuro Fujita
Subject	Re: Proposal: Create index on foreign table
Date	March 21, 2012 05:45:16
Msg-id	4F6995AF.3070709@lab.ntt.co.jp Whole thread Raw
In response to	Re: Proposal: Create index on foreign table (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Proposal: Create index on foreign table
List	pgsql-hackers

Tree view

(2012/03/21 4:39), Robert Haas wrote:
> On Fri, Mar 16, 2012 at 4:44 AM, Etsuro Fujita
> <fujita.etsuro@lab.ntt.co.jp>  wrote:

>> For a flat file, CREATE INDEX constructs an index in the same way as an
>> index for a regular table.
>
> It seems really weird to have the data half inside the database and
> half outside of it like this.  What is the use case for this feature?
> I would have thought that something like file_fdw would be primarily
> targeted toward use cases where you intend to read the data file only
> a few times, or you always intend to sequential scan it.  If you need
> to index it, why not store the data in a regular table?  That's what
> they are for.  What you're proposing sounds to me like a lot of work
> for no real gain.

This feature is planned to used to save time and space for loading file 
data into Postgres.  As you know, loading data is time-consuming.  In 
addition, it requires twice the disk space in the case where source 
files have to be stored against the time of need.  I think this feature 
is especially useful for data warehouse environments.

>> On the other hand, for a remote table,
>> CREATE INDEX collects information about the index on the specified
>> column(s) for the specified table that was created on the remote table.
>
> I can possibly see the point of this.  Having local information about
> which remote indexes are available seems like it could be useful.  As
> Heikki says, you could cache it on a per-session basis, but that might
> not be very efficient.
>
> I also think that it would not be a very good idea to have CREATE
> INDEX on a foreign table sometimes really create an index and other
> times just define the properties of a remote index.  If we're going to
> have both features at all, I think they should use different syntax.
> I suggest that CREATE INDEX or CREATE FOREIGN INDEX is the right way
> to provide the query planner with information about remote-side
> indexes; and that if we even want to have indexes on flat files, the
> interface to those should be exported via functions bundled in the
> file_fdw extension rather than DDL.  There's no reason to suppose that
> the indexes the FDW supports correspond to PostgreSQL's AMs, so tying
> it into that framework doesn't seem wise.

I did an investigation on DB2 a little bit.  DB2 uses the CREATE INDEX 
SPECIFICATION ONLY statement to define the properties of a remote index.
    CREATE INDEX index_name ON foreintable_name    (column_name) SPECIFICATION ONLY

How about introducing this kind of option?; Using the CREATE INDEX 
statement with the SPECIFICATION ONLY option, a user can just define the 
properties of a remote index.  On the other hand, using the statement 
without this option, he or she can specify more options like the USING 
option and really create an index, which requires that the FDW's AMs 
correspond to Postgres's AMs, as pointed out by you.  If the real index 
of an external data is considered as just a complementary data for 
efficient query processing like stats to be collected for the external 
data by the ANALYZE statement, it doen't seem so weird to use the DDL 
for the external data, create the real index for it, and store the index 
data inside Postgres.

Best regards,
Etsuro Fujita

pgsql-hackers by date:

From: Heikki Linnakangas
Date: 21 March 2012, 04:12:00
Subject: Re: Chronic performance issue with Replication Failover and FSM.

From: Kohei KaiGai
Date: 21 March 2012, 07:07:43
Subject: Re: [v9.2] Add GUC sepgsql.client_label

Re: Proposal: Create index on foreign table - Mailing list pgsql-hackers

Previous

Next