Re: Ideas about a better API for postgres_fdw remote estimates - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Ideas about a better API for postgres_fdw remote estimates |
Date | |
Msg-id | 20200904145751.5zca73ihdfgh3eec@development Whole thread Raw |
In response to | Re: Ideas about a better API for postgres_fdw remote estimates ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>) |
Responses |
Re: Ideas about a better API for postgres_fdw remote estimates
|
List | pgsql-hackers |
On Thu, Sep 03, 2020 at 10:14:41AM +0500, Andrey V. Lepikhov wrote: >On 8/31/20 6:19 PM, Ashutosh Bapat wrote: >>On Mon, Aug 31, 2020 at 3:36 PM Andrey V. Lepikhov >><a.lepikhov@postgrespro.ru> wrote: >>> >>>Thanks for this helpful feedback. >>I think the patch has some other problems like it works only for >>regular tables on foreign server but a foreign table can be pointing >>to any relation like a materialized view, partitioned table or a >>foreign table on the foreign server all of which have statistics >>associated with them. I didn't look closely but it does not consider >>that the foreign table may not have all the columns from the relation >>on the foreign server or may have different names. But I think those >>problems are kind of secondary. We have to agree on the design first. >> >In accordance with discussion i made some changes in the patch: >1. The extract statistic routine moved into the core. >2. Serialized stat contains 'version' field to indicate format of >statistic received. >3. ANALYZE and VACUUM ANALYZE uses this approach only in the case of >implicit analysis of the relation. > >I am currently keeping limitation of using the approach for regular >relations only, because i haven't studied the specifics of another >types of relations. >But I don't know any reason to keep this limit in the future. > >The patch in attachment is very raw. I publish for further substantive >discussion. > Thanks for working on this. I briefly looked at the patch today, and I have some comments/feedback: 1) I wonder why deparseGetStatSql looks so different from e.g. deparseAnalyzeSizeSql - no deparseStringLiteral on relname, no cast to pg_catalog.regclass, function name not qualified with pg_catalog, ... 2) I'm a bit annoyed by the amount of code added to analyze.c only to support output/input in JSON format. I'm no expert, but I don't recall explain needing this much new stuff (OTOH it just produces json, it does not need to read it). Maybe we also need to process wider range of data types here. But the code is almost perfectly undocumented :-( 3) Why do we need to change vacuum_rel this way? 4) I wonder if we actually want/need to simply output pg_statistic data verbatim like this. Is postgres_fdw actually going to benefit from it? I kinda doubt that, and my assumption was that we'd return only a small subset of the data, needed by get_remote_estimate. This has a couple of issues. Firstly, it requires the knowledge of what the stakind constants in pg_statistic mean and how to interpret it - but OK, it's true that does not change very often (or at all). Secondly, it entirely ignores extended statistics - OK, we might extract those too, but it's going to be much more complex. And finally it entirely ignores costing on the remote node. Surely we can't just apply local random_page_cost or whatever, because those may be entirely different. And we don't know if the remote is going to use index etc. So is extracting data from pg_statistic the right approach? 5) I doubt it's enough to support relnames - we also need to estimate joins, so this needs to support plain queries I think. At least that's what Tom envisioned in his postgres_fdw_support(query text) proposal. 6) I see you've included a version number in the data - why not to just check regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: