Thread: Large offset optimization and index only scan

Large offset optimization and index only scan

From
Maxim Boguk
Date:
Hi,

Recently, I found very interesting case where using IOS could provide serious benefits, but database not able to use it.

I think my ideas could be beneficial in two ways:
1)as work around of the current limitations
2)as idea to improve planner/executor in the future


Case:
SELECT * FROM very_large_table ORDER BY [some indexed column(s)]  OFFSET [some large value] LIMIT N;

Idea:
Use IOS (index only scan) to skip the first OFFSET values than switch to common index scan to fetch LIMIT N values


Usage sample (from real life).. fully cached case:

Simple index-scan:
(postgres@[local]:5432) [hh_data]=# explain analyze select * from vacancy_invitation order by vacancy_invitation_id limit 10 offset 1000000;
                                                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=6997.98..6998.05 rows=10 width=953) (actual time=883.971..883.978 rows=10 loops=1)
   ->  Index Scan using vacancy_invitation_pkey_dblarge on vacancy_invitation  (cost=0.00..287399.74 rows=41068952 width=953) (actual time=0.011..845.592 rows=1000010 loops=1)
 Total runtime: 884.003 ms

Index only scan... 3x speedup:
(postgres@[local]:5432) [hh_data]=# explain analyze select vacancy_invitation_id from vacancy_invitation order by vacancy_invitation_id limit 10 offset 1000000;
                                                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5546.48..5546.54 rows=10 width=4) (actual time=283.851..283.853 rows=10 loops=1)
   ->  Index Only Scan using vacancy_invitation_pkey_dblarge on vacancy_invitation  (cost=0.00..227788.13 rows=41068952 width=4) (actual time=0.013..246.477 rows=1000010 loops=1)
         Heap Fetches: 22
 Total runtime: 283.870 ms



Now lets combine fast IOS for skip first 1M records and simple index scan to fetch the next 10 full entries... still 3x speedup over initial version:
(postgres@[local]:5432) [hh_data]=# explain analyze select * from vacancy_invitation where vacancy_invitation_id>(select vacancy_invitation_id from vacancy_invitation order by vacancy_invitation_id limit 1 offset 1000000) order by vacancy_invitation_id limit 10;
                                                                                        QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5546.49..5546.58 rows=10 width=953) (actual time=289.674..289.683 rows=10 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=5546.48..5546.49 rows=1 width=4) (actual time=289.654..289.654 rows=1 loops=1)
           ->  Index Only Scan using vacancy_invitation_pkey_dblarge on vacancy_invitation  (cost=0.00..227788.13 rows=41068952 width=4) (actual time=0.013..252.358 rows=1000001 loops=1)
                 Heap Fetches: 23
   ->  Index Scan using vacancy_invitation_pkey_dblarge on vacancy_invitation  (cost=0.00..123476.85 rows=13689651 width=953) (actual time=289.673..289.681 rows=10 loops=1)
         Index Cond: (vacancy_invitation_id > $0)
 Total runtime: 289.716 ms


As a result there is possible to have a 3x speedup large offset query if corresponding to order by index present.
In not fully cached case difference could be easy 100x (if index stayed in memory and table data not).

May be it could be justified implement this approach (use IOS to skip an offset records) in planner/executor (instead of manual query rewriting).

I hope this post could be useful for someone.

PS: yes I know that the large offset is big evil... but ability to serious speedup in some cases is still useful.

--
Maxim Boguk
Senior Postgresql DBA
http://www.postgresql-consulting.ru/

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

LinkedIn: http://www.linkedin.com/pub/maksym-boguk/80/b99/b1b
Skype: maxim.boguk
Jabber: maxim.boguk@gmail.com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."

Re: Large offset optimization and index only scan

From
Robert Haas
Date:
On Tue, Nov 19, 2013 at 2:03 AM, Maxim Boguk <maxim.boguk@gmail.com> wrote:
> Case:
> SELECT * FROM very_large_table ORDER BY [some indexed column(s)]  OFFSET
> [some large value] LIMIT N;
>
> Idea:
> Use IOS (index only scan) to skip the first OFFSET values than switch to
> common index scan to fetch LIMIT N values

Interesting.  This is really a variant of an optimization that I
believe to have been originally proposed by Heikki.  His idea was to
have an Index-Only Scan node that would only ever look at the index,
and the have a Heap Fetch node which would bubble up the plan tree to
a higher level.  So, for example, if you were doing something like
this:

SELECT * FROM medium_size_table a, huge_table b WHERE a.x = b.x AND
rarely_true(a.y, b.y)

...you could implement this as:

Heap Fetch On b
-> Nested Loop   -> Seq Scan on a   -> Index-Only Scan on b

That way, we'd apply the filter condition at the nested-loop level,
using some hypothetical index on x and y, before even testing the
visibility of the rows from b (and thus incurring random I/O).  The
reason why nobody implemented this is (1) the planner challenges were
formidable and (b) if rarely_true isn't leakproof, the user (or some
other user) might find it unfortunate that it gets passed tuples not
visible to the current scan.  Thus, we stuck with a design where the
index-only scan always performs the heap fetch if that is needed to
determine visibility.

However, in your example, that wouldn't be the right thing anyway,
because you'd need to know whether the tuples were visible before
performing the limit.  So what you'd really want is to have index-only
scan work about the way it does now, except that if it does end up
fetching the heap tuple to check visibility, it somehow returns that.
If not, it returns only the index tuple, and then a Heap Fetch node
higher up the plan tree can fetch those heap tuples not yet fetched
without re-fetching those we've already got.  That seems complicated
to make work even in the executor, though, and the planning problem
makes my head hurt.  If we can sole those problems it might be neat,
though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company