Thread: Full JSONb column returned over FDW when only single value needed

Full JSONb column returned over FDW when only single value needed

From
Ed Kurowski
Date:
Query:

SELECT user_id, DATA->>'query' AS query FROM aggregates WHERE guid LIKE
'search-%' AND time >= '2016-01-01' AND time < '2017-01-01' AND
organization_id = 23;


Explain (analyze, verbose):

Foreign Scan on public.aggregates  (cost=1184.50..29250.63 rows=23379
width=816) (actual time=17.497..5657.981 rows=18944 loops=1)

   Output: user_id, (data ->> 'query'::text)

   Remote SQL: SELECT user_id, data FROM public.aggregates WHERE ((guid ~~
'search-%'::text)) AND (("time" >= '2016-01-01 00:00:00'::timestamp without
time zone)) AND (("time" < '2017-01-01 00:00:00'::timestamp without time
zone)) AND ((organization_id = 23))


This runs very quickly directly on the remote postgres server (indexes have
been set appropriately on the foreign server), but is slow when running
over the fdw.  I believe I have tracked this down to the fact that it is
returning the entire data column (which is jsonb, and sometimes large)
instead of the single field from the json my query cares about.

Is there anyway to stop postgres from returning the entire jsonb column?

-Ed

Re: Full JSONb column returned over FDW when only single value needed

From
Tom Lane
Date:
Ed Kurowski <ed.kurowski@gmail.com> writes:
> This runs very quickly directly on the remote postgres server (indexes have
> been set appropriately on the foreign server), but is slow when running
> over the fdw.  I believe I have tracked this down to the fact that it is
> returning the entire data column (which is jsonb, and sometimes large)
> instead of the single field from the json my query cares about.

> Is there anyway to stop postgres from returning the entire jsonb column?

This isn't a consideration that postgres_fdw knows anything about at the
moment.  You could force it by defining a view on the remote server that
only exposes data->>'query' rather than the whole data column, and making
the foreign table reference the view not the underlying table.  Of course
that approach won't scale if there are a lot of different jsonb fields
you may want to ask about.

FWIW, I'm a little suspicious of whether it'd really help that much,
though an experiment with the view approach would probably prove or
disprove things quickly.

            regards, tom lane

Re: Full JSONb column returned over FDW when only single value needed

From
Ed Kurowski
Date:
Testing has shown it improves a lot by removing the data column.  Typically
takes 6 seconds, without the data field, it takes about 10ms.  The 'data'
column here is rather arbitrary and sometimes contains a fair amount of
data.  We considered the view option, but it seems like the planner should
be smart enough to know we only care about a single field and just send
that back instead of all the json.

On Fri, Sep 2, 2016 at 3:06 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Ed Kurowski <ed.kurowski@gmail.com> writes:
> > This runs very quickly directly on the remote postgres server (indexes
> have
> > been set appropriately on the foreign server), but is slow when running
> > over the fdw.  I believe I have tracked this down to the fact that it is
> > returning the entire data column (which is jsonb, and sometimes large)
> > instead of the single field from the json my query cares about.
>
> > Is there anyway to stop postgres from returning the entire jsonb column?
>
> This isn't a consideration that postgres_fdw knows anything about at the
> moment.  You could force it by defining a view on the remote server that
> only exposes data->>'query' rather than the whole data column, and making
> the foreign table reference the view not the underlying table.  Of course
> that approach won't scale if there are a lot of different jsonb fields
> you may want to ask about.
>
> FWIW, I'm a little suspicious of whether it'd really help that much,
> though an experiment with the view approach would probably prove or
> disprove things quickly.
>
>                         regards, tom lane
>