Re: [POC] FETCH limited by bytes. - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: [POC] FETCH limited by bytes.
Date
Msg-id CADkLM=fwPC+r8p_LAe12OkgsUbyAQtq77SZ2vEv7LLMQ_mcccg@mail.gmail.com
Whole thread Raw
In response to Re: [POC] FETCH limited by bytes.  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [POC] FETCH limited by bytes.  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Fri, Sep 4, 2015 at 2:45 AM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
Hello,

> > > @@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
> > int eflags)
...
> > > +     def = get_option(table->options, "fetch_size");
>
> > I don't think it's a good idea to make such checks at runtime - and
> > either way it's somethign that should be reported back using an
> > ereport(), not an elog.
>
> > Also, it seems somewhat wrong to determine this at execution
> > time. Shouldn't this rather be done when creating the foreign scan node?
> > And be a part of the scan state?
>
> I agree, that was my original plan, but I wasn't familiar enough with the
> FDW architecture to know where the table struct and the scan struct were
> both exposed in the same function.
>
> What I submitted incorporated some of Kyotaro's feedback (and working
> patch) to my original incomplete patch.

Sorry, it certainly shouldn't be a good place to do such thing. I
easily selected the place in order to avoid adding new similar
member in multiple existing structs (PgFdwRelationInfo and
PgFdwScanState).

Having a new member fetch_size is added in PgFdwRelationInfo and
PgFdwScanState, I think postgresGetForeignRelSize is the best
place to do that, from the point that it collects basic
information needed to calculate scan costs.

# Fetch sizes of foreign join would be the future issue..

> typedef struct PgFdwRelationInfo
> {
  ...
+ int    fetch_size;   /* fetch size for this remote table */

====================
> postgreGetForeignRelSize()
> {
  ...
>     fpinfo->table = GetForeignTable(foreigntableid);
>     fpinfo->server = GetForeignServer(fpinfo->table->serverid);
>
+     def = get_option(table->options, "fetch_size");
+     ..
+     fpinfo->fetch_size = strtod(defGetString...

Also it is doable in postgresGetForeignPlan and doing there
removes redundant copy of fetch_size from PgFdwRelation to
PgFdwScanState but theoretical basis would be weak.

regards,

> > On 2015-02-27 13:50:22 -0500, Corey Huinker wrote:
> > > +static DefElem*
> > > +get_option(List *options, char *optname)
> > > +{
> > > +     ListCell *lc;
> > > +
> > > +     foreach(lc, options)
> > > +     {
> > > +             DefElem *def = (DefElem *) lfirst(lc);
> > > +
> > > +             if (strcmp(def->defname, optname) == 0)
> > > +                     return def;
> > > +     }
> > > +     return NULL;
> > > +}
> >
> >
> > >       /*
> > >        * Do nothing in EXPLAIN (no ANALYZE) case.  node->fdw_state stays
> > NULL.
> > > @@ -915,6 +933,23 @@ postgresBeginForeignScan(ForeignScanState *node,
> > int eflags)
> > >       server = GetForeignServer(table->serverid);
> > >       user = GetUserMapping(userid, server->serverid);
> > >
> > > +     /* Reading table options */
> > > +     fsstate->fetch_size = -1;
> > > +
> > > +     def = get_option(table->options, "fetch_size");
> > > +     if (!def)
> > > +             def = get_option(server->options, "fetch_size");
> > > +
> > > +     if (def)
> > > +     {
> > > +             fsstate->fetch_size = strtod(defGetString(def), NULL);
> > > +             if (fsstate->fetch_size < 0)
> > > +                     elog(ERROR, "invalid fetch size for foreign table
> > \"%s\"",
> > > +                              get_rel_name(table->relid));
> > > +     }
> > > +     else
> > > +             fsstate->fetch_size = 100;
> >
> > I don't think it's a good idea to make such checks at runtime - and
> > either way it's somethign that should be reported back using an
> > ereport(), not an elog.
>
> > Also, it seems somewhat wrong to determine this at execution
> > time. Shouldn't this rather be done when creating the foreign scan node?
> > And be a part of the scan state?
>
> I agree, that was my original plan, but I wasn't familiar enough with the
> FDW architecture to know where the table struct and the scan struct were
> both exposed in the same function.
>
> What I submitted incorporated some of Kyotaro's feedback (and working
> patch) to my original incomplete patch.
>
> > Have you thought about how this option should cooperate with join
> > pushdown once implemented?
> >
>
> I hadn't until now, but I think the only sensible thing would be to
> disregard table-specific settings once a second foreign table is detected,
> and instead consider only the server-level setting.
>
> I suppose one could argue that if ALL the tables in the join had the same
> table-level setting, we should go with that, but I think that would be
> complicated, expensive, and generally a good argument for changing the
> server setting instead.

--
Kyotaro Horiguchi
NTT Open Source Software Center


Ok, with some guidance from RhodiumToad (thanks!) I was able to get the proper RelOptInfo->Plan->Scan handoff.

What I don't know how to do is show that the proper fetch sizes are being used on the remote server with the resources available in the regression test. Suggestions welcome.

This patch works for my original added test-cases, and works for me connecting to a redshift cluster that we have, the queries show up in the console like this:
     FETCH 101 FROM c1
     FETCH 30 FROM c1
     FETCH 50 FROM c1

The (redacted) source of that test is as follows:


begin;
create extension if not exists postgres_fdw;

create server redshift foreign data wrapper postgres_fdw
options (host 'REDACTED', port '5439', dbname 'REDACTED', fetch_size '101');

select * from pg_foreign_server;

create user mapping for public server redshift
options ( user 'REDACTED', password 'REDACTED');

select * from pg_user_mappings;

create foreign table test_table ( date date, tval text )
server redshift
options (table_name 'REDACTED');

select count(*) from ( select tval from test_table where date = 'REDACTED' ) x;

alter server redshift options ( set fetch_size '30' );

select count(*) from ( select tval from test_table where date = 'REDACTED' ) x;

alter foreign table test_table options ( fetch_size '50' );

select count(*) from ( select tval from test_table where date = 'REDACTED' ) x;

rollback;


Attached is the patch / diff against current master.
Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: row_security GUC, BYPASSRLS
Next
From: Noah Misch
Date:
Subject: Re: row_security GUC, BYPASSRLS