Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans - Mailing list pgsql-general

From Victor Blomqvist
Subject Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans
Date
Msg-id CAL870DVEaNJkFv7bk5EnsZ3vDdP2DQe8iLgLfqJpJi_rk-0yHA@mail.gmail.com
In response to Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans  (Francisco Olarte <folarte@peoplecall.com>)
Responses Re: Limit Heap Fetches / Rows Removed by Filter in Index Scans  (Francisco Olarte <folarte@peoplecall.com>)
List pgsql-general

On Sat, Aug 20, 2016 at 1:13 AM, Francisco Olarte <folarte@peoplecall.com> wrote:
Hi Victor:


On Fri, Aug 19, 2016 at 7:02 PM, Victor Blomqvist <vb@viblo.se> wrote:
> What I want to avoid is my query visiting the whole 1m rows to get a result,
> because in my real table that can take 100sec. At the same time I want the
> queries that only need to visit 1k rows finish quickly, and the queries that
> visit 100k rows at least get some result back.

You are going to have problems with that. If you just want to limit it
to max 100k rows and max 10 results, my solution works, probably better
as nested selects than CTEs, but someone more knowledgeable about the
optimizer will need to weigh in ( or tests will be needed ). This is
because "the queries that visit 100k rows at least get some result
back" may be false: you may need to visit the whole 1M rows to get the
first result if you are unlucky. Just set ap=999 where id=1M and ask
for ap>=999 and you've got that degenerate case, which only an index
on ap saves you from ( even with statistics, you would need a full
table scan to find it ).

If you are positive some results are in the first 100k rows, then my
method works fine; how fast it is will need to be tested with the real
data. You can even try using *10, *100, *1k of the real limit until
you have enough results, if you want to time-limit your queries.


Francisco Olarte.


Thanks! A sub select seems to do it.

I didn't think of it before; I guess I got blinded by the CTE, since
usually it's the other way around and the CTE is the answer to the
problem. But it seems the easy solution with a good old subselect fixes
it. Now I feel a bit embarrassed by such an easy answer :)


Checking these two queries, I can see that the first one visits the
max 50 rows it's allowed to and returns 5 rows, while the second one
finishes after fetching 13 rows and returns the full 10 rows.

select *
    from (select * from b order by id limit 50) x
    where age_preference%10 < 1
    order by id limit 10

select *
    from (select * from b order by id limit 50) x
    where age_preference%10 < 5
    order by id limit 10
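
For anyone trying the same thing, the row counts are easy to check with
EXPLAIN (ANALYZE) on the same queries (a sketch, same table as above;
the exact plan shape will of course depend on your data):

explain (analyze, buffers)
select *
    from (select * from b order by id limit 50) x
    where age_preference%10 < 1
    order by id limit 10

The actual row count on the inner Limit node shows how many rows were
really fetched, and "Rows Removed by Filter" shows how many of those
the outer condition threw away.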
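
For the case where the right inner limit isn't known up front, the
progressively wider limit Francisco suggests above could look roughly
like the plpgsql loop below. Just a sketch: it assumes the same table
b(id, age_preference), and the 50 / x10 / 50000 numbers are made-up
placeholders.

do $$
declare
    inner_limit int := 50;      -- start by visiting at most 50 rows
    hard_cap    int := 50000;   -- give up after this many rows visited
    found_rows  int;
begin
    loop
        -- same nested-select pattern as above, but with a variable bound
        select count(*) into found_rows
        from (
            select *
            from (select * from b order by id limit inner_limit) x
            where age_preference%10 < 1
            order by id limit 10
        ) y;

        exit when found_rows >= 10 or inner_limit >= hard_cap;
        inner_limit := inner_limit * 10;   -- widen: 50, 500, 5000, ...
    end loop;

    raise notice 'inner limit %, matches found %', inner_limit, found_rows;
end
$$;

In a real application you would of course fetch the rows themselves on
the last round instead of just counting them; the do block only
illustrates the widening loop.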

/Victor
 
