Performance of a nested loop, whose inner loop uses an index scan. - Mailing list pgsql-performance

From negora
Subject Performance of a nested loop, whose inner loop uses an index scan.
Date
Msg-id 5cb92dc3-af70-3cb1-5b2b-87ccc31f62cb@negora.com
Whole thread Raw
Responses Re: Performance of a nested loop, whose inner loop uses an index scan.  (Matheus de Oliveira <matioli.matheus@gmail.com>)
List pgsql-performance
Hello:

I've a question about the performance of a query plan that uses a nested
loop, and whose inner loop uses an index scan. Would you be so kind to
help me, please?

I'm using PostgreSQL 9.5.4 on Ubuntu 14.04 64-bit (kernel 4.8.2). I've 3
tables, which are "answers", "test_completions" and "courses". The first
one contains around ~30 million rows, whereas the others only have a few
thousands each one. The query that I'm performing is very simple,
although retrieves lots of rows:

    ---------------------
    SELECT answers.*
    FROM answers
    JOIN test_completions ON test_completions.test_completion_id =
answers.test_completion_id
    JOIN courses ON courses.course_id = test_completions.course_id
    WHERE courses.group_id = 2;
    ---------------------


This yields the following plan:

    ---------------------
    Nested Loop  (cost=245.92..383723.28 rows=7109606 width=38) (actual
time=1.091..2616.553 rows=8906075 loops=1)
        ->  Hash Join  (cost=245.36..539.81 rows=3081 width=8) (actual
time=1.077..6.087 rows=3123 loops=1)
                    Hash Cond: (test_completions.course_id =
courses.course_id)
                    ->  Seq Scan on test_completions  (cost=0.00..214.65
rows=13065 width=16) (actual time=0.005..1.051 rows=13065 loops=1)
                    ->  Hash  (cost=204.11..204.11 rows=3300 width=8)
(actual time=1.063..1.063 rows=3300 loops=1)
                                Buckets: 4096  Batches: 1  Memory Usage:
161kB
                                ->  Bitmap Heap Scan on courses
(cost=45.86..204.11 rows=3300 width=8) (actual time=0.186..0.777
rows=3300 loops=1)
                                            Recheck Cond: (group_id = 2)
                                            Heap Blocks: exact=117
                                            ->  Bitmap Index Scan on
fki_courses_group_id_fkey  (cost=0.00..45.03 rows=3300 width=0) (actual
time=0.172..0.172 rows=3300 loops=1)
                                                        Index Cond:
(group_id = 2)
       ### HERE ###
        ->  Index Scan using fki_answers_test_completion_id_fkey on
answers  (cost=0.56..96.90 rows=2747 width=38) (actual time=0.007..0.558
rows=2852 loops=3123)
       ### HERE ###
                    Index Cond: (test_completion_id =
test_completions.test_completion_id)
    Planning time: 0.523 ms
    Execution time: 2805.530 ms
    ---------------------

My doubt is about the inner loop of the nested loop, the one that I've
delimited with  ### HERE ### . This loop is the part that, obviously,
more time consumes. Because its run 3,123 times and requires lots of
accesses to multiple database pages. But, Is there anything that I can
do to reduce even more the time spent in this part? Apart of:

    * Clustering the "answers" table.
    * Upgrading PostgreSQL to version 9.6, to take advantage of the
index scans in parallel.
    * Upgrading the hardware.

Thank you!



pgsql-performance by date:

Previous
From: Sam Gendler
Date:
Subject: Re: Should I generate strings in Postgres of Python?
Next
From: Matheus de Oliveira
Date:
Subject: Re: Performance of a nested loop, whose inner loop uses an index scan.