On Wed, Mar 21, 2012 at 4:47 AM, Etsuro Fujita
<fujita.etsuro@lab.ntt.co.jp> wrote:
> I did an investigation on DB2 a little bit. DB2 uses the CREATE INDEX
> SPECIFICATION ONLY statement to define the properties of a remote index.
>
> CREATE INDEX index_name ON foreintable_name
> (column_name) SPECIFICATION ONLY
>
> How about introducing this kind of option?; Using the CREATE INDEX statement
> with the SPECIFICATION ONLY option, a user can just define the properties of
> a remote index. On the other hand, using the statement without this option,
> he or she can specify more options like the USING option and really create
> an index, which requires that the FDW's AMs correspond to Postgres's AMs, as
> pointed out by you. If the real index of an external data is considered as
> just a complementary data for efficient query processing like stats to be
> collected for the external data by the ANALYZE statement, it doen't seem so
> weird to use the DDL for the external data, create the real index for it,
> and store the index data inside Postgres.
I still don't think it's a good idea to introduce the concept of a
PostgreSQL index that indexes data not stored in the database. There
is some pretty serious impedance mismatch there. PostgreSQL indexes
are intended to store CTIDs; you might be able to hack things for
file_fdw to make a byte offset look like a CTID, but in general I
don't think you can count on making that work. There's no guarantee
that a particular FDW provides unique identifiers for every data
element that fit in six bytes and allow for fast retrieval. In fact,
beyond flat files, I suspect that's more the exception than the norm.
I agree with you that our bulk loading isn't fast enough (or
space-efficient enough) but I don't think the right solution is to
contort our index code, which is not designed to do this and probably
won't handle it very gracefully.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company