Thread: Function scan push-down using SQL/MED syntax
I'm working on the SQL/MED foreign table support for the "Function scan
push-down" item:
http://wiki.postgresql.org/wiki/ClusterFeatures#Function_scan_push-down

The current research is described in:
http://wiki.postgresql.org/wiki/SQL/MED
and the code is in:
http://repo.or.cz/w/pgsql-fdw.git

The present code consists of two parts:
1. Foreign table syntax, catalogs, and a connection manager in the core.
2. A contrib/postgresql_fdw extension module.

SQL/MED could take over some of the roles of dblink and PL/Proxy. It also
raises the same question that pgpool faces: where to execute functions,
locally or remotely.

The major part of the proposal is that table functions (SRFs) should be
able to access the ScanState during execution, because the ScanState holds
the filtering conditions. A Foreign Data Wrapper can then reconstruct a
SQL query from those conditions and send it to the external server.

The current design is very similar to an "executor node hook".
ExecXxxForeignScan() almost passes the processing through to each FDW
routine. SQL-based FDWs can push the conditions down to their foreign
servers, while plain-data FDWs (including a CSV FDW) can hand the
filtering over to the default executor.

Comments and suggestions are very welcome on the design and
implementation. There are also some known issues listed in the Open
questions section.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
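To make the split between pushed-down and locally evaluated conditions concrete, here is a minimal Python sketch of the dispatch idea. This is illustrative only; the real code is the C executor, and every class and function name below is hypothetical:

```python
# Illustrative sketch of the ForeignScan dispatch idea, not the actual
# PostgreSQL C API.  Quals are represented as plain SQL-ish strings.

class SqlFdw:
    """A SQL-based FDW: it can translate quals into a remote WHERE clause."""
    def can_push_down(self, qual):
        return True

class CsvFdw:
    """A plain-data FDW: it returns every row and leaves filtering local."""
    def can_push_down(self, qual):
        return False

def plan_foreign_scan(fdw, quals):
    """Split quals into those the FDW will handle remotely and those the
    default executor must still evaluate locally."""
    remote = [q for q in quals if fdw.can_push_down(q)]
    local = [q for q in quals if not fdw.can_push_down(q)]
    return remote, local
```

For a SQL-based FDW every qual lands in the remote list and can be deparsed into the query sent to the foreign server; for the CSV case everything stays local, which is exactly the "hand the filtering over to the default executor" path.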
On 3/3/10 8:52 PM, Takahiro Itagaki wrote:
> The current design is very similar to an "executor node hook".
> ExecXxxForeignScan() almost passes the processing through to each FDW
> routine. SQL-based FDWs can push the conditions down to their foreign
> servers, while plain-data FDWs (including a CSV FDW) can hand the
> filtering over to the default executor.

It would be really good to also have access to the scan criteria as text,
a hash, or some other generally manipulable data structure, for foreign
data which is not SQL-based.

Finally, if you are making the scan more visible, a nice side effect
would be the ability to log all the scan criteria on a particular table
(whether it's an FDW or not); that would help a lot with indexing.

--Josh Berkus
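One hedged sketch of what a "generally manipulable" form of the scan criteria might look like (the record structure and names here are invented for illustration): each qual as a small dict, plus a trivial deparser back to text for logging or for SQL-based wrappers:

```python
# Hypothetical representation of scan criteria as plain data, so that a
# non-SQL FDW (or a logging hook) can inspect them without parsing SQL.

def deparse(quals):
    """Render structured quals back into SQL-ish text."""
    return " AND ".join(
        f"{q['column']} {q['op']} {q['value']!r}" for q in quals
    )

quals = [
    {"column": "price", "op": ">", "value": 100},
    {"column": "region", "op": "=", "value": "JP"},
]
```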
2010/3/4 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:
> I'm working on the SQL/MED foreign table support for the "Function scan
> push-down" item:
>
> Comments and suggestions are very welcome on the design and
> implementation. There are also some known issues listed in the Open
> questions section.

I haven't read any of the code yet, but I have some comments:

* I'm not very happy with the "Getting tuples from the foreign server"
section. The present tuplestore isn't very efficient, and putting all
the tuples into a tuplestore adds devastating overhead. In principle,
materializing all the tuples doesn't match the SQL executor model, so
something like a cursor is needed here.

* The FDW routines section seems to offer only methods to read data, but
I'd suggest covering all CRUD operations. Sometimes dropping foreign
data is so specific that only the FdwRoutines know how to do it.

Regards,
--
Hitoshi Harada
Hitoshi Harada <umi.tanuki@gmail.com> wrote:
> * I'm not very happy with the "Getting tuples from the foreign server"
> section. The present tuplestore isn't very efficient, and putting all
> the tuples into a tuplestore adds devastating overhead. In principle,
> materializing all the tuples doesn't match the SQL executor model, so
> something like a cursor is needed here.

Sure, but your optimization requires some extensions to the libpq
protocol. We could send HeapTuples in binary form if the remote and
local servers use the same format, but the present libpq can return
tuples only as text or in libpq-specific binary form (which is not a
HeapTuple).

> * The FDW routines section seems to offer only methods to read data, but
> I'd suggest covering all CRUD operations. Sometimes dropping foreign
> data is so specific that only the FdwRoutines know how to do it.

There is only SELECT in the SQL standard, where CREATE FOREIGN TABLE
creates what amounts to a symbolic link to the external data. But we
could also support CREATE and DROP in the routines, which would mean
CREATE/DROP FOREIGN TABLE actually creates or drops the external data.
We could add methods for INSERT, UPDATE, and DELETE as well, but that
requires additional consideration of which key to use to identify the
modified tuples: PostgreSQL uses CTID as the physical key, but some FDWs
might have completely different keys.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
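A routine table with optional write-side entry points might look roughly like this (a hypothetical Python stand-in, not the actual FdwRoutine struct): callbacks left as None simply mean the wrapper does not support that operation:

```python
# Hypothetical sketch of an FDW routine table covering read plus the
# optional CRUD callbacks discussed above.  Not the real C struct.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class FdwRoutine:
    # read side: the only part the SQL standard's SELECT needs
    begin_scan: Callable
    iterate: Callable
    end_scan: Callable
    # optional write side; None means "not supported by this FDW"
    insert: Optional[Callable] = None
    update: Optional[Callable] = None
    delete: Optional[Callable] = None
    create_data: Optional[Callable] = None  # for CREATE FOREIGN TABLE
    drop_data: Optional[Callable] = None    # for DROP FOREIGN TABLE
```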
Josh Berkus <josh@agliodbs.com> wrote:
> It would be really good to also have access to the scan criteria as text,
> a hash, or some other generally manipulable data structure, for foreign
> data which is not SQL-based.

We can provide such formatter functions as options. In fact, the present
postgresql_fdw uses deparse_context_for_plan() and deparse_expression()
exported from the core. However, I think the conversion should be
performed in each FDW, because some FDWs don't necessarily need the
criteria. For example, a CSV FDW will just return all of the contents as
tuples, and the common executor routine will filter and modify them.

> Finally, if you are making the scan more visible, a nice side effect
> would be the ability to log all the scan criteria on a particular table
> (whether it's an FDW or not); that would help a lot with indexing.

We might need to provide more generally useful deparse_xxx() functions.
For example, an oracle_fdw would also need to deparse the criteria into
SQL. That would be almost the same as for PostgreSQL, but it would also
need to remap incompatible functions and expressions in its own
preferred way.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
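The remapping mentioned above could look roughly like this (a hedged sketch; the mapping table and function names are invented, and note e.g. that NVL only matches the two-argument form of coalesce):

```python
# Hypothetical deparse helper for a non-PostgreSQL target server: remap
# known function names, and refuse to push down anything unknown so it
# gets evaluated locally instead.
REMOTE_FUNCS = {
    "coalesce": "NVL",    # matches only the two-argument form
    "substr": "SUBSTR",
}

def deparse_call(func, args):
    """Return the remote spelling of a function call, or None when the
    call cannot be pushed down and must stay local."""
    remote = REMOTE_FUNCS.get(func)
    if remote is None:
        return None
    return f"{remote}({', '.join(args)})"
```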
Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp> wrote:
> > * I'm not very happy with the "Getting tuples from the foreign server"
> > section. The present tuplestore isn't very efficient, and putting all
> > the tuples into a tuplestore adds devastating overhead. In principle,
> > materializing all the tuples doesn't match the SQL executor model, so
> > something like a cursor is needed here.
>
> Sure, but your optimization requires some extensions to the libpq
> protocol. We could send HeapTuples in binary form if the remote and
> local servers use the same format, but the present libpq can return
> tuples only as text or in libpq-specific binary form (which is not a
> HeapTuple).

In addition, I believe the tuplestore is required *for performance*,
because per-tuple cursor fetching is very slow when we retrieve tuples
from remote servers. We should fetch tuples in reasonably sized batches.

If we optimize that part, we could remove the PGresult-to-tuplestore
conversion. But we would also need some code to avoid leaking the
PGresult on error, because a PGresult is allocated with malloc, not
palloc. (That is the same bug that was recently fixed in contrib/dblink.)

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
2010/3/8 Takahiro Itagaki <itagaki.takahiro@oss.ntt.co.jp>:
> In addition, I believe the tuplestore is required *for performance*,
> because per-tuple cursor fetching is very slow when we retrieve tuples
> from remote servers. We should fetch tuples in reasonably sized batches.
>
> If we optimize that part, we could remove the PGresult-to-tuplestore
> conversion. But we would also need some code to avoid leaking the
> PGresult on error, because a PGresult is allocated with malloc, not
> palloc. (That is the same bug that was recently fixed in contrib/dblink.)

So, as a first step we implement it with a tuplestore on top of the
present libpq, but for further improvement we will need to refactor or
extend libpq to buffer tuples in batches. Or should we invent another,
more data-fetching-oriented protocol, like the existing COPY?

Regards,
--
Hitoshi Harada
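The "reasonably sized batches" idea can be sketched as follows (illustrative Python; `run_remote` stands in for the network round trip the real code would make over libpq with FETCH on a remote cursor):

```python
def fetch_in_batches(run_remote, cursor_name, batch_size=1000):
    """Yield every row from a remote cursor, doing one FETCH round trip
    per batch_size rows instead of one round trip per tuple."""
    while True:
        rows = run_remote(f"FETCH {batch_size} FROM {cursor_name}")
        if not rows:  # empty result means the cursor is exhausted
            break
        yield from rows

def fake_remote(data):
    """Stand-in for the remote server: serves successive slices of data."""
    state = {"pos": 0}
    def run(command):
        n = int(command.split()[1])
        rows = data[state["pos"]:state["pos"] + n]
        state["pos"] += len(rows)
        return rows
    return run
```

The batch size trades memory for round trips; with batch_size=1 this degenerates into the slow per-tuple fetching described above.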
Hitoshi Harada <umi.tanuki@gmail.com> wrote:
> So, as a first step we implement it with a tuplestore on top of the
> present libpq, but for further improvement we will need to refactor or
> extend libpq to buffer tuples in batches. Or should we invent another,
> more data-fetching-oriented protocol, like the existing COPY?

Before starting such optimization, we need some research into which part
is the performance bottleneck when sending large amounts of data. IMHO,
since the network is typically slower than an in-memory data copy, data
compression and burst transfer would be more effective, even if they
consume local CPU resources to convert the data format.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center
> IMHO, since the network is typically slower than an in-memory data copy,
> data compression and burst transfer would be more effective, even if they
> consume local CPU resources to convert the data format.

I'm not sure. I once tested the performance of rsync on two computers
connected over GbE. With the --compress option, rsync was slower than
without it. Of course, this may vary with different CPU speed/network
speed combinations.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
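A back-of-the-envelope model (with illustrative numbers, not measurements) shows why both observations can be right: compression only wins when the compressor's throughput is comfortably higher than what the link saves in transfer time:

```python
def transfer_seconds(nbytes, link_bps, compress_bps=None, ratio=1.0):
    """Time to move nbytes over a link of link_bps bytes/sec.
    If compress_bps is given, compress first (pessimistically assumed
    not overlapped with the send) and ship nbytes * ratio bytes."""
    if compress_bps is None:
        return nbytes / link_bps
    return nbytes / compress_bps + (nbytes * ratio) / link_bps
```

On GbE (~125 MB/s), a hypothetical fast compressor (400 MB/s, 2:1) comes out ahead, while a slow one (50 MB/s, 2:1) loses to sending the data raw, consistent with the rsync --compress result above.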