In general implemented functionality looks good. I've checked how it works on the existing tests, almost everywhere required columns were not missing in scanCols (which is probably the most important part). Sometimes exressions were checked multiple times, which could potentially introduce some overhead, but I believe this effect is negligible. Just to mention some counterintuitive examples, for this kind of query
SELECT min(q1) FROM INT8_TBL;
the same column was checked 5 times in my tests, since it's present also in baserestrictinfo, and build_minmax_path does one extra round of planning and invoking make_one_rel.
Thanks so much for the very thorough review, Dmitry.
These extra calls to extract_scan_cols() should be okay in this case. As you mentioned, for min/max queries, planner attempts an optimization with an indexscan and, to do this, it modifies the querytree and then calls query_planner() on it. It tries it with NULLs first and then NULLs last -- each of which invokes query_planner(), so that is two out of three calls. The third is the normal invocation. I'm not sure how you would get five, though.
Another interesting example is Values Scan (e.g. in an insert statements with multiple records), can an abstract table AM user leverage information about columns in it?
One case, where I believe columns were missing, is statements with returning:
INSERT INTO foo (col1) VALUES ('col1'), ('col2'), ('col3') RETURNING *;
Looks like in this situation there is only expression in reltarget is for col1, but returning result contains all columns.
This relates to both of your above points:
For this RETURNING query, it is a ValuesScan, so no columns have to be scanned. We actually do add the reference to col1 to the scanCols bitmap, though. I'm not sure we want to do so, since we don't want to scan col1 in this case. I wonder what cases we would miss if we special cased RTEKind RTE_VALUES when doing extract_scan_cols().