Thread: FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

From
"Gabe F. Rudy"
Date:
<div class="WordSection1"><p class="MsoNormal">Hey all,<p class="MsoNormal"> <p class="MsoNormal">I’m building a FDW
arounda column-store backend (similar to CStore but for genomic data!).<p class="MsoNormal"> <p class="MsoNormal">I
havetables in the billions of rows, and have a common query pattern of asking for the table size (i.e. SELECT COUNT(*)
FROMbig_fdw_table; ).<p class="MsoNormal"> <p class="MsoNormal">This is a read-optimized system in which I know in
constanttime the exact dimensions of the table.<p class="MsoNormal"> <p class="MsoNormal">Is there any way to convince
PostgresFDW to leverage the analyze row counts or even the “double* totalRowCount” returned from the AcquireSampleRows
callbackfrom my AnalyzeForeignTable function so that it does not do a full-table scan for a COUNT(*) etc?<p
class="MsoNormal"> <pclass="MsoNormal">My current fallback is to export a specialized function that returns the table
rowcount for a given FDW table, but that then leaks into the user-application driving these queries.<p
class="MsoNormal"> <pclass="MsoNormal">Thanks in advance!<p class="MsoNormal">Gabe<p class="MsoNormal"> <p
class="MsoNormal"style="background:white"><b><span style="font-size:10.0pt;color:#262626">Gabe Rudy</span></b><span
style="font-size:10.0pt;color:#595959">| VP Product & Engineering | Golden Helix, Inc.</span><span
style="color:black"></span><pclass="MsoNormal"> </div> 
"Gabe F. Rudy" <rudy@goldenhelix.com> writes:
> Is there any way to convince Postgres FDW to leverage the analyze row counts or even the "double* totalRowCount"
returnedfrom the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table
scanfor a COUNT(*) etc?
 

No.  In PG's view, ANALYZE-based row counts are imprecise by definition.
        regards, tom lane



On 25 February 2016 at 09:48, Gabe F. Rudy <rudy@goldenhelix.com> wrote:

Hey all,

 

I’m building a FDW around a column-store backend (similar to CStore but for genomic data!).

 

I have tables in the billions of rows, and have a common query pattern of asking for the table size (i.e. SELECT COUNT(*) FROM big_fdw_table; ).

 

This is a read-optimized system in which I know in constant time the exact dimensions of the table.

 

Is there any way to convince Postgres FDW to leverage the analyze row counts or even the “double* totalRowCount” returned from the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table scan for a COUNT(*) etc?

 

My current fallback is to export a specialized function that returns the table row count for a given FDW table, but that then leaks into the user-application driving these queries.


Look at TABLESAMPLE, which does mostly what you're asking.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Ok, I get that.

Really what I am *rooting* for is Aggregate (and Sort By) Push-Down to FDW plugins.

I can already internalize conditional filters for most cases, and doing a count on the filtered results would be
considerablyfaster in my FDW back-end before all the records and Datums have to be constructed for postgres to do the
counting.

Similarly, I'm very excited about the potential for FDW to advertise a-priori sort states, so things like external
merge-sortscan pass-through the request for sorted data for fields in which sorting is a no-op in my backend. 

Importantly my IDs are sorted by definition since they are essentially array indexes into the column-store, so joining
onthem with merge-sort should be blazing fast, but currently time is wasted sorting these pre-sorted fields. 

Just my 2c, and I'll be tracking the 9.6 progress that includes some of these proposals.

Gabe

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, February 25, 2016 11:21 PM
To: Gabe F. Rudy <rudy@goldenhelix.com>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

"Gabe F. Rudy" <rudy@goldenhelix.com> writes:
> Is there any way to convince Postgres FDW to leverage the analyze row counts or even the "double* totalRowCount"
returnedfrom the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table
scanfor a COUNT(*) etc? 

No.  In PG's view, ANALYZE-based row counts are imprecise by definition.
        regards, tom lane



Hi Gabe, 
Did you get Aggregate Pushdown FDW plugin?
Would be really helpful if you can share some insight on your investigation.
Regards
Anant



--
View this message in context:
http://postgresql.nabble.com/FDW-handling-count-through-AnalyzeForeignTable-or-other-constant-time-push-down-tp5889291p5912699.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.