Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans - Mailing list pgsql-general

From Sameer Kumar
Subject Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans
Date
Msg-id CADp-Sm7GSO_=7iOwarX9-ScwA9GOoeiCMZxcztyY7_qqvUphWw@mail.gmail.com
Whole thread Raw
In response to Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans  (Victor Blomqvist <vb@viblo.se>)
List pgsql-general


On Fri, Aug 19, 2016 at 2:25 PM Victor Blomqvist <vb@viblo.se> wrote:
On Fri, Aug 19, 2016 at 1:31 PM, Sameer Kumar <sameer.kumar@ashnik.com> wrote:


On Fri, 19 Aug 2016, 1:07 p.m. Victor Blomqvist, <vb@viblo.se> wrote:
Hi,

Is it possible to break/limit a query so that it returns whatever results found after having checked X amount of rows in a index scan?

For example:
create table a(id int primary key);
insert into a select * from generate_series(1,100000);

select * from a
where id%2 = 0
order by id limit 10

In this case the query will "visit" 20 rows and filter out 10 of them. We can see that in the query plan:
"Rows Removed by Filter: 10"
"Heap Fetches: 20"

Is it somehow possible to limit this query so that it only fetches X amount, in my example if we limited it to 10 Heap Fetches the query could return the first 5 rows?


 

My use case is I have a table with 35 million rows with a geo index, and I want to do a KNN search but also limit the query on some other parameters. In some cases the other parameters restrict the query so much that Heap Fetches becomes several 100k or more, and in those cases I would like to have a limit to my query.

Have you checked the TABLESAMPLE clause in v9.5?



Unless I misunderstand what you mean or how it works I cant really see what it would help.


I stand corrected. TABLESAMPLE will not help you.
 
I want my query to still return the "best" results, and I want it to use the index for that. Just randomly selecting out from the whole table will either have to sample a too small subset of the rows, or be too slow.

So, given my query above, in the normal ("slow" case) I would find the 10 first even rows:
2,4,6,8,10,12,14,16,18,20
If I could restrict the heap fetches to 10 I would find
2,4,6,8,10
However, with tablesample I might end up with for example these rows:
15024,71914,51682,7110,61802,63390,98278,8022,34256,49220


How about using the LIMIT ?
SELECT column_1, column_2, ... FROM my_table WHERE <<expression>>
ORDER BY my_column
LIMIT 10 ;

 
In my use case I want the best rows (according to the order by), so just a random sample is not good enough.

/Victor
 

Thanks!
/Victor
--
--
Best Regards
Sameer Kumar | DB Solution Architect 
ASHNIK PTE. LTD.

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 

Skype: sameer.ashnik | www.ashnik.com

--
--
Best Regards
Sameer Kumar | DB Solution Architect 
ASHNIK PTE. LTD.

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 

Skype: sameer.ashnik | www.ashnik.com

pgsql-general by date:

Previous
From: Thomas Güttler
Date:
Subject: Re: PG vs ElasticSearch for Logs
Next
From: Sameer Kumar
Date:
Subject: Re: PG vs ElasticSearch for Logs