Re: SQL/MED - file_fdw - Mailing list pgsql-hackers

From Itagaki Takahiro
Subject Re: SQL/MED - file_fdw
Date
Msg-id AANLkTinvkGqf9RHjw0AKDZ8_LUe3mKTH_BVkgJJ8QxK3@mail.gmail.com
In response to Re: SQL/MED - file_fdw  (Shigeru HANADA <hanada@metrosystems.co.jp>)
Responses Re: SQL/MED - file_fdw  (Shigeru HANADA <hanada@metrosystems.co.jp>)
List pgsql-hackers
On Mon, Feb 7, 2011 at 16:01, Shigeru HANADA <hanada@metrosystems.co.jp> wrote:
> This patch is based on latest FDW API patches which are posted in
> another thread "SQL/MED FDW API", and copy_export-20110104.patch which
> was posted by Itagaki-san.

I have questions about estimate_costs().

* What value does baserel->tuples have? Foreign tables are never analyzed at the moment, so is that number correct?

* Your previous measurement showed that it has a much higher startup_cost. When you removed ReScan, the query took a long time, but the planner didn't choose materialized plans. That might come from the startup cost being estimated too low.
 

* Why do you use lstat() in it? Even if the file is a symlink, we will read the linked file in the succeeding copy. So, I think it should be stat() rather than lstat().
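
To illustrate the stat()/lstat() difference, here is a minimal standalone sketch. It is not part of the patch, and the helper names size_via_lstat() and size_via_stat() are made up for this example. For a symlink, lstat() reports the length of the link's target path, while stat() follows the link and reports the size of the file that would actually be read.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: size as reported by lstat(), or -1 on failure.
 * For a symlink this is the length of the target path string, not the
 * size of the file the link points to.
 */
long long
size_via_lstat(const char *path)
{
    struct stat st;

    if (lstat(path, &st) == -1)
        return -1;
    return (long long) st.st_size;
}

/*
 * Hypothetical helper: size as reported by stat(), or -1 on failure.
 * stat() follows symlinks, so this is the size of the linked file.
 */
long long
size_via_stat(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return (long long) st.st_size;
}
```

For a regular file both helpers return the same value. They differ only when the path is a symlink, which is exactly the case where a page count based on lstat() would be far too small.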
 

+estimate_costs(const char *filename, RelOptInfo *baserel,
+              double *startup_cost, double *total_cost)
+{
...
+   /* get size of the file */
+   if (lstat(filename, &stat) == -1)
+   {
+       ereport(ERROR,
+               (errcode_for_file_access(),
+                errmsg("could not stat file \"%s\": %m", filename)));
+   }
+
+   /*
+    * The way to estimate costs is almost same as cost_seqscan(), but there
+    * are some differences:
+    * - DISK costs are estimated from file size.
+    * - CPU costs are 10x of seq scan, for overhead of parsing records.
+    */
+   pages = stat.st_size / BLCKSZ + (stat.st_size % BLCKSZ > 0 ? 1 : 0);
+   run_cost += seq_page_cost * pages;
+
+   *startup_cost += baserel->baserestrictcost.startup;
+   cpu_per_tuple = cpu_tuple_cost + baserel->baserestrictcost.per_tuple;
+   run_cost += cpu_per_tuple * 10 * baserel->tuples;
+   *total_cost = *startup_cost + run_cost;
+
+   return stat.st_size;
+}
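
To make the arithmetic in the quoted function concrete, here is a rough standalone sketch of the same formula with the planner structures replaced by plain arguments. The names DEMO_BLCKSZ, DemoCost, demo_pages() and demo_estimate() are invented for this illustration. The block size of 8192 is the PostgreSQL default, and the startup cost is assumed to start from zero because the elided part of the patch is not shown here.

```c
#include <stdio.h>

#define DEMO_BLCKSZ 8192        /* assumption: default PostgreSQL block size */

/* Hypothetical result type for this sketch only. */
typedef struct DemoCost
{
    double      startup;
    double      total;
} DemoCost;

/* Number of pages needed for file_size bytes, rounded up. */
double
demo_pages(long long file_size)
{
    return (double) (file_size / DEMO_BLCKSZ +
                     (file_size % DEMO_BLCKSZ > 0 ? 1 : 0));
}

/*
 * Mirrors the quoted formula: disk cost from the file size, CPU cost at
 * 10x the per-tuple cost of a seq scan. qual_startup and qual_per_tuple
 * stand in for baserel->baserestrictcost; tuples stands in for
 * baserel->tuples.
 */
DemoCost
demo_estimate(long long file_size, double tuples,
              double seq_page_cost, double cpu_tuple_cost,
              double qual_startup, double qual_per_tuple)
{
    DemoCost    cost;
    double      run_cost;

    run_cost = seq_page_cost * demo_pages(file_size);
    run_cost += (cpu_tuple_cost + qual_per_tuple) * 10 * tuples;

    cost.startup = qual_startup;    /* assumed to start from zero */
    cost.total = cost.startup + run_cost;
    return cost;
}
```

With the default seq_page_cost = 1.0 and cpu_tuple_cost = 0.01, a 1,000,000-byte file occupies 123 pages. If tuples is 10000, the CPU term adds about 1000 to the total. If tuples is 0, the CPU term vanishes and the total is just the 123 page reads, which is why the value of baserel->tuples matters for the first question.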

-- 
Itagaki Takahiro

