Thread: Performance killed with FDW when using CAST.

Performance killed with FDW when using CAST.

From
Jorge Torralba
Date:
We setup a FDW server for a large table that has an hstore data type as on eof the coums.

When we run the query directly on the FDW server

select * from recent_events where account_id = 1 AND (attributes -> 'account_incident_id')::integer = 2617116 limit 10;

The response is 2ms . which is awesome.

When we run it from the remote server where we have the server defined, the query never returns.

Our server definition is as follows.

(host 'fdwserver', dbname 'mydb', port '5432', fetch_size '1000', use_remote_estimate 'true', extensions 'hstore')

We have been playing with this for hours to no avail. 

Any clues ?

Thanks

--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: Performance killed with FDW when using CAST.

From
Ian Barwick
Date:
On 4/17/19 9:21 AM, Jorge Torralba wrote:
> We setup a FDW server for a large table that has an hstore data type as on eof the coums.
> 
> When we run the query directly on the FDW server
> 
> select * from recent_events where account_id = 1 AND (attributes -> 'account_incident_id')::integer = 2617116 limit
10;
> 
> The response is 2ms . which is awesome.
> 
> When we run it from the remote server where we have the server defined, the query never returns.
> 
> Our server definition is as follows.
> 
> (host 'fdwserver', dbname 'mydb', port '5432', fetch_size '1000', use_remote_estimate 'true', extensions 'hstore')
> 
> We have been playing with this for hours to no avail.
> 
> Any clues ?

At a guess the FDW is not able to push down the entire query to the FDW server
and might be doing something like fetching all the remote rows and casting
them locally, or something along those lines.

It would help to provide the following info:

- PostgreSQL versions of both servers
- "EXPLAIN" output of the query on both servers
- "EXPLAIN ANALYZE" output of the query executed directly on the FDW server
- approximate number of rows in the "recent_events" table


Regards

Ian Barwick

-- 
  Ian Barwick                   https://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: Performance killed with FDW when using CAST.

From
Jorge Torralba
Date:
Thanks for taking the time to look.

Both servers are on ....

                                                             version
----------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.2 (Debian 11.2-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

I have reduced the query to select only two columns for simplicity with the same results.

The table contain about 50 million rows

On the server hosting the table ..... Hiding private data.

alertsdb_recent_events=#  select id, attributes -> 'account_incident_id' from recent_events where account_id = 1 AND (attributes -> 'account_incident_id')::integer = 2617116 limit 5;
                  id                  | ?column?
--------------------------------------+----------
 ***** | 2617116
 ***** | 2617116
 ***** | 2617116
 ***** | 2617116
 ***** | 2617116
(5 rows)

Time: 82.868 ms


# explain  select id, attributes -> 'account_incident_id' from recent_events where account_id = 1 AND (attributes -> 'account_incident_id')::integer = 2617116 limit 5;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.77 rows=5 width=48)
   ->  Index Scan using recent_event_account_id_attributes_account_incident_id_idx on recent_events  (cost=0.56..56.31 rows=87 width=48)
         Index Cond: ((account_id = 1) AND (((attributes -> 'account_incident_id'::text))::integer = 2617116))
(3 rows)

Time: 71.907 ms

# explain  analyze select id, attributes -> 'account_incident_id' from recent_events where account_id = 1 AND (attributes -> 'account_incident_id')::integer = 2617116 limit 5;
                                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.56..3.77 rows=5 width=48) (actual time=0.019..0.025 rows=5 loops=1)
   ->  Index Scan using recent_event_account_id_attributes_account_incident_id_idx on recent_events  (cost=0.56..56.31 rows=87 width=48) (actual time=0.018..0.023 rows=5 loops=1)
         Index Cond: ((account_id = 1) AND (((attributes -> 'account_incident_id'::text))::integer = 2617116))
 Planning Time: 0.124 ms
 Execution Time: 0.038 ms
(5 rows)

Time: 80.782 ms


On the server that communicates with the FDW server .....

# explain  select id, attributes -> 'account_incident_id' from recent_events where account_id = 1 AND (attributes -> 'account_incident_id')::integer = 2617116 limit 5;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=100.00..426.27 rows=5 width=48)
   ->  Foreign Scan on recent_events  (cost=100.00..6663659.61 rows=102117 width=48)
         Filter: (((attributes -> 'account_incident_id'::text))::integer = 2617116)
(3 rows)

Time: 85.276 ms


# explain  analyze select id, attributes -> 'account_incident_id' from recent_events where account_id = 1 AND (attributes -> 'account_incident_id')::integer = 2617116 limit 5;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=100.00..426.27 rows=5 width=48) (actual time=21242.087..26387.257 rows=5 loops=1)
   ->  Foreign Scan on recent_events  (cost=100.00..6663703.90 rows=102117 width=48) (actual time=21242.086..26387.252 rows=5 loops=1)
         Filter: (((attributes -> 'account_incident_id'::text))::integer = 2617116)
         Rows Removed by Filter: 724249
 Planning Time: 1.164 ms
 Execution Time: 26387.851 ms
(6 rows)

Time: 26528.113 ms (00:26.528)

The query killer is the ...

AND (attributes -> 'account_incident_id')::integer = 2617116

Run the query this way ...

select id, attributes -> 'account_incident_id' from recent_events where account_id = 1  limit 5;

and the results is only 10ms slower than on the hosting server directly which is what we are expecting.  It's like the casting of the hstore column is just not playing nice.

Thanks


On Tue, Apr 16, 2019 at 6:56 PM Ian Barwick <ian.barwick@2ndquadrant.com> wrote:
On 4/17/19 9:21 AM, Jorge Torralba wrote:
> We setup a FDW server for a large table that has an hstore data type as on eof the coums.
>
> When we run the query directly on the FDW server
>
> select * from recent_events where account_id = 1 AND (attributes -> 'account_incident_id')::integer = 2617116 limit 10;
>
> The response is 2ms . which is awesome.
>
> When we run it from the remote server where we have the server defined, the query never returns.
>
> Our server definition is as follows.
>
> (host 'fdwserver', dbname 'mydb', port '5432', fetch_size '1000', use_remote_estimate 'true', extensions 'hstore')
>
> We have been playing with this for hours to no avail.
>
> Any clues ?

At a guess the FDW is not able to push down the entire query to the FDW server
and might be doing something like fetching all the remote rows and casting
them locally, or something along those lines.

It would help to provide the following info:

- PostgreSQL versions of both servers
- "EXPLAIN" output of the query on both servers
- "EXPLAIN ANALYZE" output of the query executed directly on the FDW server
- approximate number of rows in the "recent_events" table


Regards

Ian Barwick

--
  Ian Barwick                   https://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: Performance killed with FDW when using CAST.

From
Ron
Date:

On 4/16/19 9:40 PM, Jorge Torralba wrote:
> Thanks for taking the time to look.
>
> Both servers are on ....
>
>    version
>
----------------------------------------------------------------------------------------------------------------------------------
>  PostgreSQL 11.2 (Debian 11.2-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled 
> by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
>
> I have reduced the query to select only two columns for simplicity with 
> the same results.
>
> The table contain about 50 million rows
>
> On the server hosting the table ..... Hiding private data.
>
> alertsdb_recent_events=#  select id, attributes -> 'account_incident_id' 
> from recent_events where account_id = 1 AND (attributes -> 
> 'account_incident_id')::integer = 2617116 limit 5;
>                   id                  | ?column?
> --------------------------------------+----------
>  ***** | 2617116
>  ***** | 2617116
>  ***** | 2617116
>  ***** | 2617116
>  ***** | 2617116
> (5 rows)
>
> Time: 82.868 ms
>
>
> # explain  select id, attributes -> 'account_incident_id' from 
> recent_events where account_id = 1 AND (attributes -> 
> 'account_incident_id')::integer = 2617116 limit 5;
>      QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.56..3.77 rows=5 width=48)
>    ->  Index Scan using 
> recent_event_account_id_attributes_account_incident_id_idx on 
> recent_events  (cost=0.56..56.31 rows=87 width=48)
>          Index Cond: ((account_id = 1) AND (((attributes -> 
> 'account_incident_id'::text))::integer = 2617116))
> (3 rows)
>
> Time: 71.907 ms
>
> # explain  analyze select id, attributes -> 'account_incident_id' from 
> recent_events where account_id = 1 AND (attributes -> 
> 'account_incident_id')::integer = 2617116 limit 5;
>                           QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.56..3.77 rows=5 width=48) (actual time=0.019..0.025 rows=5 
> loops=1)
>    ->  Index Scan using 
> recent_event_account_id_attributes_account_incident_id_idx on 
> recent_events  (cost=0.56..56.31 rows=87 width=48) (actual 
> time=0.018..0.023 rows=5 loops=1)
>          Index Cond: ((account_id = 1) AND (((attributes -> 
> 'account_incident_id'::text))::integer = 2617116))
>  Planning Time: 0.124 ms
>  Execution Time: 0.038 ms
> (5 rows)
>
> Time: 80.782 ms
>
>
> On the server that communicates with the FDW server .....
>
> # explain  select id, attributes -> 'account_incident_id' from 
> recent_events where account_id = 1 AND (attributes -> 
> 'account_incident_id')::integer = 2617116 limit 5;
>                                      QUERY PLAN
> -------------------------------------------------------------------------------------
>  Limit  (cost=100.00..426.27 rows=5 width=48)
>    ->  Foreign Scan on recent_events (cost=100.00..6663659.61 rows=102117 
> width=48)
>          Filter: (((attributes -> 'account_incident_id'::text))::integer = 
> 2617116)
> (3 rows)
>
> Time: 85.276 ms
>
>
> # explain  analyze select id, attributes -> 'account_incident_id' from 
> recent_events where account_id = 1 AND (attributes -> 
> 'account_incident_id')::integer = 2617116 limit 5;
>     QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=100.00..426.27 rows=5 width=48) (actual 
> time=21242.087..26387.257 rows=5 loops=1)
>    ->  Foreign Scan on recent_events (cost=100.00..6663703.90 rows=102117 
> width=48) (actual time=21242.086..26387.252 rows=5 loops=1)
>          Filter: (((attributes -> 'account_incident_id'::text))::integer = 
> 2617116)
>          Rows Removed by Filter: 724249
>  Planning Time: 1.164 ms
>  Execution Time: 26387.851 ms
> (6 rows)
>
> Time: 26528.113 ms (00:26.528)
>
> The query killer is the ...
>
> AND (attributes -> 'account_incident_id')::integer = 2617116
>
> Run the query this way ...
>
> select id, attributes -> 'account_incident_id' from recent_events where 
> account_id = 1  limit 5;
>
> and the results is only 10ms slower than on the hosting server directly 
> which is what we are expecting.  It's like the casting of the hstore 
> column is just not playing nice.

Casting the left side of a predicate is not recommended.  What if you cast 
2617166 to be the same type as attributes -> 'account_incident_id'?


-- 
Angular momentum makes the world go 'round.



Re: Performance killed with FDW when using CAST.

From
Jorge Torralba
Date:
I tried without casting and still the same problem on the server accessing remote data.

On the server hosting the data, even with casting the query executes in 78ms.

fty, I have run analyze on both the servers as well.

Thanks

JT

On Tue, Apr 16, 2019 at 8:00 PM Ron <ronljohnsonjr@gmail.com> wrote:


On 4/16/19 9:40 PM, Jorge Torralba wrote:
> Thanks for taking the time to look.
>
> Both servers are on ....
>
>    version
> ----------------------------------------------------------------------------------------------------------------------------------
>  PostgreSQL 11.2 (Debian 11.2-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled
> by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
>
> I have reduced the query to select only two columns for simplicity with
> the same results.
>
> The table contain about 50 million rows
>
> On the server hosting the table ..... Hiding private data.
>
> alertsdb_recent_events=#  select id, attributes -> 'account_incident_id'
> from recent_events where account_id = 1 AND (attributes ->
> 'account_incident_id')::integer = 2617116 limit 5;
>                   id                  | ?column?
> --------------------------------------+----------
>  ***** | 2617116
>  ***** | 2617116
>  ***** | 2617116
>  ***** | 2617116
>  ***** | 2617116
> (5 rows)
>
> Time: 82.868 ms
>
>
> # explain  select id, attributes -> 'account_incident_id' from
> recent_events where account_id = 1 AND (attributes ->
> 'account_incident_id')::integer = 2617116 limit 5;
>      QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.56..3.77 rows=5 width=48)
>    ->  Index Scan using
> recent_event_account_id_attributes_account_incident_id_idx on
> recent_events  (cost=0.56..56.31 rows=87 width=48)
>          Index Cond: ((account_id = 1) AND (((attributes ->
> 'account_incident_id'::text))::integer = 2617116))
> (3 rows)
>
> Time: 71.907 ms
>
> # explain  analyze select id, attributes -> 'account_incident_id' from
> recent_events where account_id = 1 AND (attributes ->
> 'account_incident_id')::integer = 2617116 limit 5;
>                           QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.56..3.77 rows=5 width=48) (actual time=0.019..0.025 rows=5
> loops=1)
>    ->  Index Scan using
> recent_event_account_id_attributes_account_incident_id_idx on
> recent_events  (cost=0.56..56.31 rows=87 width=48) (actual
> time=0.018..0.023 rows=5 loops=1)
>          Index Cond: ((account_id = 1) AND (((attributes ->
> 'account_incident_id'::text))::integer = 2617116))
>  Planning Time: 0.124 ms
>  Execution Time: 0.038 ms
> (5 rows)
>
> Time: 80.782 ms
>
>
> On the server that communicates with the FDW server .....
>
> # explain  select id, attributes -> 'account_incident_id' from
> recent_events where account_id = 1 AND (attributes ->
> 'account_incident_id')::integer = 2617116 limit 5;
>                                      QUERY PLAN
> -------------------------------------------------------------------------------------
>  Limit  (cost=100.00..426.27 rows=5 width=48)
>    ->  Foreign Scan on recent_events (cost=100.00..6663659.61 rows=102117
> width=48)
>          Filter: (((attributes -> 'account_incident_id'::text))::integer =
> 2617116)
> (3 rows)
>
> Time: 85.276 ms
>
>
> # explain  analyze select id, attributes -> 'account_incident_id' from
> recent_events where account_id = 1 AND (attributes ->
> 'account_incident_id')::integer = 2617116 limit 5;
>     QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=100.00..426.27 rows=5 width=48) (actual
> time=21242.087..26387.257 rows=5 loops=1)
>    ->  Foreign Scan on recent_events (cost=100.00..6663703.90 rows=102117
> width=48) (actual time=21242.086..26387.252 rows=5 loops=1)
>          Filter: (((attributes -> 'account_incident_id'::text))::integer =
> 2617116)
>          Rows Removed by Filter: 724249
>  Planning Time: 1.164 ms
>  Execution Time: 26387.851 ms
> (6 rows)
>
> Time: 26528.113 ms (00:26.528)
>
> The query killer is the ...
>
> AND (attributes -> 'account_incident_id')::integer = 2617116
>
> Run the query this way ...
>
> select id, attributes -> 'account_incident_id' from recent_events where
> account_id = 1  limit 5;
>
> and the results is only 10ms slower than on the hosting server directly
> which is what we are expecting.  It's like the casting of the hstore
> column is just not playing nice.

Casting the left side of a predicate is not recommended.  What if you cast
2617166 to be the same type as attributes -> 'account_incident_id'?


--
Angular momentum makes the world go 'round.




--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: Performance killed with FDW when using CAST.

From
Jorge Torralba
Date:
I tried without casting and still the same problem on the server accessing remote data.

On the server hosting the data, even with casting the query executes in 78ms.

fty, I have run analyze on both the servers as well.

Thanks

JT

On Tue, Apr 16, 2019 at 8:00 PM Ron <ronljohnsonjr@gmail.com> wrote:


On 4/16/19 9:40 PM, Jorge Torralba wrote:
> Thanks for taking the time to look.
>
> Both servers are on ....
>
>    version
> ----------------------------------------------------------------------------------------------------------------------------------
>  PostgreSQL 11.2 (Debian 11.2-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled
> by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit
>
> I have reduced the query to select only two columns for simplicity with
> the same results.
>
> The table contain about 50 million rows
>
> On the server hosting the table ..... Hiding private data.
>
> alertsdb_recent_events=#  select id, attributes -> 'account_incident_id'
> from recent_events where account_id = 1 AND (attributes ->
> 'account_incident_id')::integer = 2617116 limit 5;
>                   id                  | ?column?
> --------------------------------------+----------
>  ***** | 2617116
>  ***** | 2617116
>  ***** | 2617116
>  ***** | 2617116
>  ***** | 2617116
> (5 rows)
>
> Time: 82.868 ms
>
>
> # explain  select id, attributes -> 'account_incident_id' from
> recent_events where account_id = 1 AND (attributes ->
> 'account_incident_id')::integer = 2617116 limit 5;
>      QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.56..3.77 rows=5 width=48)
>    ->  Index Scan using
> recent_event_account_id_attributes_account_incident_id_idx on
> recent_events  (cost=0.56..56.31 rows=87 width=48)
>          Index Cond: ((account_id = 1) AND (((attributes ->
> 'account_incident_id'::text))::integer = 2617116))
> (3 rows)
>
> Time: 71.907 ms
>
> # explain  analyze select id, attributes -> 'account_incident_id' from
> recent_events where account_id = 1 AND (attributes ->
> 'account_incident_id')::integer = 2617116 limit 5;
>                           QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.56..3.77 rows=5 width=48) (actual time=0.019..0.025 rows=5
> loops=1)
>    ->  Index Scan using
> recent_event_account_id_attributes_account_incident_id_idx on
> recent_events  (cost=0.56..56.31 rows=87 width=48) (actual
> time=0.018..0.023 rows=5 loops=1)
>          Index Cond: ((account_id = 1) AND (((attributes ->
> 'account_incident_id'::text))::integer = 2617116))
>  Planning Time: 0.124 ms
>  Execution Time: 0.038 ms
> (5 rows)
>
> Time: 80.782 ms
>
>
> On the server that communicates with the FDW server .....
>
> # explain  select id, attributes -> 'account_incident_id' from
> recent_events where account_id = 1 AND (attributes ->
> 'account_incident_id')::integer = 2617116 limit 5;
>                                      QUERY PLAN
> -------------------------------------------------------------------------------------
>  Limit  (cost=100.00..426.27 rows=5 width=48)
>    ->  Foreign Scan on recent_events (cost=100.00..6663659.61 rows=102117
> width=48)
>          Filter: (((attributes -> 'account_incident_id'::text))::integer =
> 2617116)
> (3 rows)
>
> Time: 85.276 ms
>
>
> # explain  analyze select id, attributes -> 'account_incident_id' from
> recent_events where account_id = 1 AND (attributes ->
> 'account_incident_id')::integer = 2617116 limit 5;
>     QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=100.00..426.27 rows=5 width=48) (actual
> time=21242.087..26387.257 rows=5 loops=1)
>    ->  Foreign Scan on recent_events (cost=100.00..6663703.90 rows=102117
> width=48) (actual time=21242.086..26387.252 rows=5 loops=1)
>          Filter: (((attributes -> 'account_incident_id'::text))::integer =
> 2617116)
>          Rows Removed by Filter: 724249
>  Planning Time: 1.164 ms
>  Execution Time: 26387.851 ms
> (6 rows)
>
> Time: 26528.113 ms (00:26.528)
>
> The query killer is the ...
>
> AND (attributes -> 'account_incident_id')::integer = 2617116
>
> Run the query this way ...
>
> select id, attributes -> 'account_incident_id' from recent_events where
> account_id = 1  limit 5;
>
> and the results is only 10ms slower than on the hosting server directly
> which is what we are expecting.  It's like the casting of the hstore
> column is just not playing nice.

Casting the left side of a predicate is not recommended.  What if you cast
2617166 to be the same type as attributes -> 'account_incident_id'?


--
Angular momentum makes the world go 'round.




--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: Performance killed with FDW when using CAST.

From
Laurenz Albe
Date:
Jorge Torralba wrote:
> We setup a FDW server for a large table that has an hstore data type as on eof the coums.
> 
> When we run the query directly on the FDW server
> 
> select * from recent_events where account_id = 1 AND (attributes -> 'account_incident_id')::integer = 2617116 limit
10;
> 
> The response is 2ms . which is awesome.
> 
> When we run it from the remote server where we have the server defined, the query never returns.
> 
> Our server definition is as follows.
> 
> (host 'fdwserver', dbname 'mydb', port '5432', fetch_size '1000', use_remote_estimate 'true', extensions 'hstore')

After debugging into this, it seems that the hstore operator -> cannot
be pushed down because of collation problems.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Performance killed with FDW when using CAST.

From
Thomas Kellerer
Date:
Laurenz Albe schrieb am 17.04.2019 um 07:03:
>> We setup a FDW server for a large table that has an hstore data type as on eof the coums.
>>
>> When we run the query directly on the FDW server
>>
>> select * from recent_events where account_id = 1 AND (attributes -> 'account_incident_id')::integer = 2617116 limit
10;
>>
>> The response is 2ms . which is awesome.
>>
>> When we run it from the remote server where we have the server defined, the query never returns.
>>
>> Our server definition is as follows.
>>
>> (host 'fdwserver', dbname 'mydb', port '5432', fetch_size '1000', use_remote_estimate 'true', extensions 'hstore')
> 
> After debugging into this, it seems that the hstore operator -> cannot
> be pushed down because of collation problems.

Do you happen to know if the JSONB operator -> (or ->>) can be pushed down? 

Although I realize it would be a major change in the database design, maybe that would be a solution for Jorge. 

Thomas






Re: Performance killed with FDW when using CAST.

From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> Laurenz Albe schrieb am 17.04.2019 um 07:03:
>> After debugging into this, it seems that the hstore operator -> cannot
>> be pushed down because of collation problems.

> Do you happen to know if the JSONB operator -> (or ->>) can be pushed down? 

A bit of experimentation says that jsonb -> integer can be pushed down,
but not any of the variants involving a text fieldname or result.
Presumably this is because of the heuristic that says not to push down
a collation that didn't arise from the remote column.  jsonb -> text
isn't really collation-sensitive, of course, but postgres_fdw has no
good way to know that, since the core code (outside of that operator
itself) doesn't know it either.  The assumption is that any function
with at least one input of a collatable type is collation-sensitive.
Here you're getting a default collation from the text literal, and
postgres_fdw doesn't want to assume that the remote end would choose
the same collation.

            regards, tom lane



Re: Performance killed with FDW when using CAST.

From
Jorge Torralba
Date:
Thanks for the info. Is there a way to match the collations and make it work?

On Wed, Apr 17, 2019, 07:02 Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Kellerer <spam_eater@gmx.net> writes:
> Laurenz Albe schrieb am 17.04.2019 um 07:03:
>> After debugging into this, it seems that the hstore operator -> cannot
>> be pushed down because of collation problems.

> Do you happen to know if the JSONB operator -> (or ->>) can be pushed down?

A bit of experimentation says that jsonb -> integer can be pushed down,
but not any of the variants involving a text fieldname or result.
Presumably this is because of the heuristic that says not to push down
a collation that didn't arise from the remote column.  jsonb -> text
isn't really collation-sensitive, of course, but postgres_fdw has no
good way to know that, since the core code (outside of that operator
itself) doesn't know it either.  The assumption is that any function
with at least one input of a collatable type is collation-sensitive.
Here you're getting a default collation from the text literal, and
postgres_fdw doesn't want to assume that the remote end would choose
the same collation.

                        regards, tom lane


Re: Performance killed with FDW when using CAST.

From
Jorge Torralba
Date:
I made a copy of the table and altered the column from hstore to jsonb.

Ran the following query with the same performance issues. 

SELECT id, attributes FROM xxx WHERE account_id = 1 AND timestamp >= '2019-01-16 22:34:28.584' AND  CAST(attributes ->> 'account_incident_id' as integer) = 2617116  order by timestamp desc limit 10;

Things to Note.

Remove the CAST on the attributes column and the order by results in quick performance

Add order by performance dies

Add CAST without the order by you can go out for dinner and still be waiting for a result set.





On Wed, Apr 17, 2019 at 7:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Kellerer <spam_eater@gmx.net> writes:
> Laurenz Albe schrieb am 17.04.2019 um 07:03:
>> After debugging into this, it seems that the hstore operator -> cannot
>> be pushed down because of collation problems.

> Do you happen to know if the JSONB operator -> (or ->>) can be pushed down?

A bit of experimentation says that jsonb -> integer can be pushed down,
but not any of the variants involving a text fieldname or result.
Presumably this is because of the heuristic that says not to push down
a collation that didn't arise from the remote column.  jsonb -> text
isn't really collation-sensitive, of course, but postgres_fdw has no
good way to know that, since the core code (outside of that operator
itself) doesn't know it either.  The assumption is that any function
with at least one input of a collatable type is collation-sensitive.
Here you're getting a default collation from the text literal, and
postgres_fdw doesn't want to assume that the remote end would choose
the same collation.

                        regards, tom lane




--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: Performance killed with FDW when using CAST.

From
Jorge Torralba
Date:
I made a copy of the table and altered the column from hstore to jsonb.

Ran the following query with the same performance issues. 

SELECT id, attributes FROM xxx WHERE account_id = 1 AND timestamp >= '2019-01-16 22:34:28.584' AND  CAST(attributes ->> 'account_incident_id' as integer) = 2617116  order by timestamp desc limit 10;

Things to Note.

Remove the CAST on the attributes column and the order by results in quick performance

Add order by performance dies

Add CAST without the order by you can go out for dinner and still be waiting for a result set.





On Wed, Apr 17, 2019 at 7:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Kellerer <spam_eater@gmx.net> writes:
> Laurenz Albe schrieb am 17.04.2019 um 07:03:
>> After debugging into this, it seems that the hstore operator -> cannot
>> be pushed down because of collation problems.

> Do you happen to know if the JSONB operator -> (or ->>) can be pushed down?

A bit of experimentation says that jsonb -> integer can be pushed down,
but not any of the variants involving a text fieldname or result.
Presumably this is because of the heuristic that says not to push down
a collation that didn't arise from the remote column.  jsonb -> text
isn't really collation-sensitive, of course, but postgres_fdw has no
good way to know that, since the core code (outside of that operator
itself) doesn't know it either.  The assumption is that any function
with at least one input of a collatable type is collation-sensitive.
Here you're getting a default collation from the text literal, and
postgres_fdw doesn't want to assume that the remote end would choose
the same collation.

                        regards, tom lane




--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: Performance killed with FDW when using CAST.

From
Greg Spiegelberg
Date:
Hi Jorge,

Can you create a view in the source database such as
CREATE VIEW xxx_id_attributes AS
SELECT id, CAST(attributes->>'account_incident_id' AS integer)
  FROM xxx WHERE attributes->'account_incident_id' ~ '^[0-9]+$';

On the remote server, create a foreign table on the new view and perform your test.

Just curious.

-Greg

On Wed, Apr 17, 2019 at 2:59 PM Jorge Torralba <jorge.torralba@gmail.com> wrote:
I made a copy of the table and altered the column from hstore to jsonb.

Ran the following query with the same performance issues. 

SELECT id, attributes FROM xxx WHERE account_id = 1 AND timestamp >= '2019-01-16 22:34:28.584' AND  CAST(attributes ->> 'account_incident_id' as integer) = 2617116  order by timestamp desc limit 10;

Things to Note.

Remove the CAST on the attributes column and the order by results in quick performance

Add order by performance dies

Add CAST without the order by you can go out for dinner and still be waiting for a result set.





On Wed, Apr 17, 2019 at 7:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Kellerer <spam_eater@gmx.net> writes:
> Laurenz Albe schrieb am 17.04.2019 um 07:03:
>> After debugging into this, it seems that the hstore operator -> cannot
>> be pushed down because of collation problems.

> Do you happen to know if the JSONB operator -> (or ->>) can be pushed down?

A bit of experimentation says that jsonb -> integer can be pushed down,
but not any of the variants involving a text fieldname or result.
Presumably this is because of the heuristic that says not to push down
a collation that didn't arise from the remote column.  jsonb -> text
isn't really collation-sensitive, of course, but postgres_fdw has no
good way to know that, since the core code (outside of that operator
itself) doesn't know it either.  The assumption is that any function
with at least one input of a collatable type is collation-sensitive.
Here you're getting a default collation from the text literal, and
postgres_fdw doesn't want to assume that the remote end would choose
the same collation.

                        regards, tom lane




--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: Performance killed with FDW when using CAST.

From
Greg Spiegelberg
Date:
Hi Jorge,

Can you create a view in the source database such as
CREATE VIEW xxx_id_attributes AS
SELECT id, CAST(attributes->>'account_incident_id' AS integer)
  FROM xxx WHERE attributes->'account_incident_id' ~ '^[0-9]+$';

On the remote server, create a foreign table on the new view and perform your test.

Just curious.

-Greg

On Wed, Apr 17, 2019 at 2:59 PM Jorge Torralba <jorge.torralba@gmail.com> wrote:
I made a copy of the table and altered the column from hstore to jsonb.

Ran the following query with the same performance issues. 

SELECT id, attributes FROM xxx WHERE account_id = 1 AND timestamp >= '2019-01-16 22:34:28.584' AND  CAST(attributes ->> 'account_incident_id' as integer) = 2617116  order by timestamp desc limit 10;

Things to Note.

Remove the CAST on the attributes column and the order by results in quick performance

Add order by performance dies

Add CAST without the order by you can go out for dinner and still be waiting for a result set.





On Wed, Apr 17, 2019 at 7:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Kellerer <spam_eater@gmx.net> writes:
> Laurenz Albe schrieb am 17.04.2019 um 07:03:
>> After debugging into this, it seems that the hstore operator -> cannot
>> be pushed down because of collation problems.

> Do you happen to know if the JSONB operator -> (or ->>) can be pushed down?

A bit of experimentation says that jsonb -> integer can be pushed down,
but not any of the variants involving a text fieldname or result.
Presumably this is because of the heuristic that says not to push down
a collation that didn't arise from the remote column.  jsonb -> text
isn't really collation-sensitive, of course, but postgres_fdw has no
good way to know that, since the core code (outside of that operator
itself) doesn't know it either.  The assumption is that any function
with at least one input of a collatable type is collation-sensitive.
Here you're getting a default collation from the text literal, and
postgres_fdw doesn't want to assume that the remote end would choose
the same collation.

                        regards, tom lane




--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: Performance killed with FDW when using CAST.

From
Jorge Torralba
Date:
Funny that you mentioned that. I created the view a couple of hours ago and that resolve the problem. The view had the order by on the timestamp as well so it does not have to send the data over for sorting. I wish there was a way for postgres to allow control of what happens on which server when using foreign data wrappers. If there is I would like to know about it

On Wed, Apr 17, 2019, 15:40 Greg Spiegelberg <gspiegelberg@gmail.com> wrote:
Hi Jorge,

Can you create a view in the source database such as
CREATE VIEW xxx_id_attributes AS
SELECT id, CAST(attributes->>'account_incident_id' AS integer)
  FROM xxx WHERE attributes->'account_incident_id' ~ '^[0-9]+$';

On the remote server, create a foreign table on the new view and perform your test.

Just curious.

-Greg

On Wed, Apr 17, 2019 at 2:59 PM Jorge Torralba <jorge.torralba@gmail.com> wrote:
I made a copy of the table and altered the column from hstore to jsonb.

Ran the following query with the same performance issues. 

SELECT id, attributes FROM xxx WHERE account_id = 1 AND timestamp >= '2019-01-16 22:34:28.584' AND  CAST(attributes ->> 'account_incident_id' as integer) = 2617116  order by timestamp desc limit 10;

Things to Note.

Remove the CAST on the attributes column and the order by results in quick performance

Add order by performance dies

Add CAST without the order by you can go out for dinner and still be waiting for a result set.





On Wed, Apr 17, 2019 at 7:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Kellerer <spam_eater@gmx.net> writes:
> Laurenz Albe schrieb am 17.04.2019 um 07:03:
>> After debugging into this, it seems that the hstore operator -> cannot
>> be pushed down because of collation problems.

> Do you happen to know if the JSONB operator -> (or ->>) can be pushed down?

A bit of experimentation says that jsonb -> integer can be pushed down,
but not any of the variants involving a text fieldname or result.
Presumably this is because of the heuristic that says not to push down
a collation that didn't arise from the remote column.  jsonb -> text
isn't really collation-sensitive, of course, but postgres_fdw has no
good way to know that, since the core code (outside of that operator
itself) doesn't know it either.  The assumption is that any function
with at least one input of a collatable type is collation-sensitive.
Here you're getting a default collation from the text literal, and
postgres_fdw doesn't want to assume that the remote end would choose
the same collation.

                        regards, tom lane




--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.

Re: Performance killed with FDW when using CAST.

From
Jorge Torralba
Date:
Funny that you mentioned that. I created the view a couple of hours ago and that resolve the problem. The view had the order by on the timestamp as well so it does not have to send the data over for sorting. I wish there was a way for postgres to allow control of what happens on which server when using foreign data wrappers. If there is I would like to know about it

On Wed, Apr 17, 2019, 15:40 Greg Spiegelberg <gspiegelberg@gmail.com> wrote:
Hi Jorge,

Can you create a view in the source database such as
CREATE VIEW xxx_id_attributes AS
SELECT id, CAST(attributes->>'account_incident_id' AS integer)
  FROM xxx WHERE attributes->'account_incident_id' ~ '^[0-9]+$';

On the remote server, create a foreign table on the new view and perform your test.

Just curious.

-Greg

On Wed, Apr 17, 2019 at 2:59 PM Jorge Torralba <jorge.torralba@gmail.com> wrote:
I made a copy of the table and altered the column from hstore to jsonb.

Ran the following query with the same performance issues. 

SELECT id, attributes FROM xxx WHERE account_id = 1 AND timestamp >= '2019-01-16 22:34:28.584' AND  CAST(attributes ->> 'account_incident_id' as integer) = 2617116  order by timestamp desc limit 10;

Things to Note.

Remove the CAST on the attributes column and the order by results in quick performance

Add order by performance dies

Add CAST without the order by you can go out for dinner and still be waiting for a result set.





On Wed, Apr 17, 2019 at 7:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Kellerer <spam_eater@gmx.net> writes:
> Laurenz Albe schrieb am 17.04.2019 um 07:03:
>> After debugging into this, it seems that the hstore operator -> cannot
>> be pushed down because of collation problems.

> Do you happen to know if the JSONB operator -> (or ->>) can be pushed down?

A bit of experimentation says that jsonb -> integer can be pushed down,
but not any of the variants involving a text fieldname or result.
Presumably this is because of the heuristic that says not to push down
a collation that didn't arise from the remote column.  jsonb -> text
isn't really collation-sensitive, of course, but postgres_fdw has no
good way to know that, since the core code (outside of that operator
itself) doesn't know it either.  The assumption is that any function
with at least one input of a collatable type is collation-sensitive.
Here you're getting a default collation from the text literal, and
postgres_fdw doesn't want to assume that the remote end would choose
the same collation.

                        regards, tom lane




--
Thanks,

Jorge Torralba
----------------------------

Note: This communication may contain privileged or other confidential information. If you are not the intended recipient, please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the sender that you have received this email in error and delete the copy you received. Thank You.