Re: Ideas about a better API for postgres_fdw remote estimates - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Ideas about a better API for postgres_fdw remote estimates |
Date | |
Msg-id | CAExHW5uqKTfjC0etXexFZUBdBwRfJuhsUd=Gm58Hx6OOFEDrVA@mail.gmail.com Whole thread Raw |
In response to | Re: Ideas about a better API for postgres_fdw remote estimates ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>) |
Responses |
Re: Ideas about a better API for postgres_fdw remote estimates
Re: Ideas about a better API for postgres_fdw remote estimates |
List | pgsql-hackers |
On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov <a.lepikhov@postgrespro.ru> wrote: > > Thanks for this helpful feedback. > > I found several threads related to the problem [1-3]. > I agreed that this task needs to implement an API for > serialization/deserialization of statistics: > pg_load_relation_statistics(json_string text); > pg_get_relation_statistics(relname text); > We can use a version number for resolving conflicts with different > statistics implementations. > "Load" function will validate the values[] anyarray while deserializing > the input json string to the datatype of the relation column. > This is a valuable feature. Analysing a foreign table by fetching rows from the foreign server isn't very efficient. In fact the current FDW API for doing that forges that in-efficiency by requiring the FDW to return a sample of rows that will be analysed by the core. That's why I see that your patch introduces a new API to get foreign rel stat. I don't think there's any point in maintaining these two APIs just for ANALYSING table. Instead we should have only one FDW API which will do whatever it wants and return statistics that can be understood by the core and let core install it in the catalogs. I believe that's doable. In case of PostgreSQL it could get the stats available as is from the foreign server, convert it into a form that the core understands and returns. The patch introduces a new function postgres_fdw_stat() which will be available only from version 14 onwards. Can we use row_to_json(), which is available in all the supported versions, instead? In case of some other foreign server, an FDW will be responsible to return statistics in a form that the core will understand. It may fetch rows from the foreign server or be a bit smart and fetch the statistics and convert. This also means that FDWs will have to deal with the statistics format that the core understands and thus will need changes in their code with every version in the worst case. But AFAIR, PostgreSQL supports different forms of statistics so the problem may not remain that severe if FDWs and core agree on some bare minimum format that the core supports for long. I think the patch has some other problems like it works only for regular tables on foreign server but a foreign table can be pointing to any relation like a materialized view, partitioned table or a foreign table on the foreign server all of which have statistics associated with them. I didn't look closely but it does not consider that the foreign table may not have all the columns from the relation on the foreign server or may have different names. But I think those problems are kind of secondary. We have to agree on the design first. -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: