Re: [PERFORM] Very poor read performance, query independent - Mailing list pgsql-performance

From Claudio Freire
Subject Re: [PERFORM] Very poor read performance, query independent
Date
Msg-id CAGTBQpY+0eUrRJjpYB+DN5gvTBK+nARyxMkiTdLi1UpP0kX--Q@mail.gmail.com
In response to Re: [PERFORM] Very poor read performance, query independent  (Charles Nadeau <charles.nadeau@gmail.com>)
Responses Re: [PERFORM] Very poor read performance, query independent
List pgsql-performance
On Fri, Jul 14, 2017 at 12:34 PM, Charles Nadeau
<charles.nadeau@gmail.com> wrote:
>                                  Workers Planned: 12
>                                  Workers Launched: 12
>                                  Buffers: shared hit=728798037 read=82974833
>                                  ->  Hash Semi Join  (cost=30059688.07..47951761.31 rows=220376 width=20) (actual time=1268845.181..2007864.725 rows=7057357 loops=13)
>                                        Hash Cond: (flows.dstaddr = flows_1.dstaddr)
>                                        Buffers: shared hit=728795193 read=82974833
>                                        ->  Nested Loop  (cost=0.03..17891246.86 rows=220376 width=20) (actual time=0.207..723790.283 rows=37910370 loops=13)
>                                              Buffers: shared hit=590692229 read=14991777
>                                              ->  Parallel Seq Scan on flows  (cost=0.00..16018049.14 rows=55094048 width=20) (actual time=0.152..566179.117 rows=45371630 loops=13)
>                                                    Buffers: shared hit=860990 read=14991777
>                                              ->  Index Only Scan using mynetworks_ipaddr_idx on mynetworks  (cost=0.03..0.03 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=589831190)
>                                                    Index Cond: (ipaddr >>= (flows.srcaddr)::ip4r)
>                                                    Heap Fetches: 0
>                                                    Buffers: shared hit=589831203

12 workers on a parallel sequential scan on a RAID-10 volume of
rotating disks may not be a good idea.

Have you measured average request size and average wait times with iostat?

Run "iostat -x -m -d 60" while running the query and copy a few
relevant lines (or attach the whole thing). I suspect the 12 parallel
sequential scans are degrading your array's throughput to random-I/O
levels, which would explain the 10MB/s very well (a rotating disk
will give you about 3-4MB/s at random I/O, and you've got 2 mirrors on
that array).
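
For reference, the per-device columns I'd look at in that output
(names vary a bit between sysstat versions, e.g. avgrq-sz/await on
older ones vs. rareq-sz/r_await on newer ones):

    iostat -x -m -d 60
    # rMB/s            read throughput
    # avgrq-sz         average request size in sectors (small => mostly random I/O)
    # await / r_await  average wait per request, in ms
    # %util            device saturation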

You could try setting max_parallel_workers_per_gather to 2, which
should be the optimum allocation for your I/O layout.
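
For example (a sketch, assuming 9.6/10 where this GUC exists; try it
per-session first, then persist it if it helps):

    -- try it for the current session only
    SET max_parallel_workers_per_gather = 2;

    -- make it permanent and reload the config
    ALTER SYSTEM SET max_parallel_workers_per_gather = 2;
    SELECT pg_reload_conf();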

You might also want to test switching to the deadline scheduler. While
the controller may get more aggregate throughput by rearranging your
I/O requests, high I/O latency will severely reduce postgres' ability
to saturate the I/O system itself, and deadline tends to minimize
latency. I've had good results in the past using deadline, but take
this suggestion with a grain of salt, YMMV.
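
Roughly, something like this to test it on the fly (sdX is a
placeholder for whatever device backs the array; on blk-mq kernels the
scheduler is called mq-deadline instead, and the change doesn't
survive a reboot):

    # list the available schedulers, the active one is in brackets
    cat /sys/block/sdX/queue/scheduler

    # switch to deadline for a test run (as root)
    echo deadline > /sys/block/sdX/queue/scheduler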

