Re: Why could different data in a table be processed with differentperformance? - Mailing list pgsql-performance

From Justin Pryzby
Subject Re: Why could different data in a table be processed with differentperformance?
Date
Msg-id 20180921004232.GC2471@telsasoft.com
Whole thread Raw
In response to Why could different data in a table be processed with different performance?  (Vladimir Ryabtsev <greatvovan@gmail.com>)
Responses Re: Why could different data in a table be processed with different performance?
List pgsql-performance
On Thu, Sep 20, 2018 at 05:07:21PM -0700, Vladimir Ryabtsev wrote:
> I am experiencing a strange performance problem when accessing JSONB
> content by primary key.

> I noticed that with some IDs it works pretty fast while with other it is
> 4-5 times slower. It is suitable to note, there are two main 'categories'
> of IDs in this table: first is range 270000000-500000000, and second is
> range 10000000000-100030000000. For the first range it is 'fast' and for
> the second it is 'slow'.

Was the data populated differently, too ?
Has the table been reindexed (or pg_repack'ed) since loading (or vacuumed for
that matter) ?
Were the tests run when the DB was otherwise idle?

You can see the index scan itself takes an additional 11sec, the "heap" portion
takes the remaining, additional 14sec (33s-12s-7s).

So it seems to me like the index itself is slow to scan.  *And*, the heap
referenced by the index is slow to scan, probably due to being referenced by
the index less consecutively.

> "Small' range: disk read rate is around 10-11 MB/s uniformly across the
> test. Output rate was 1300-1700 rows/s. Read ratio is around 13% (why?
> Shouldn't it be ~ 100% after drop_caches?).

I guess you mean buffers cache hit ratio: read/hit, which I think should
actually be read/(hit+read).

It's because a given buffer can be requested multiple times.  For example, if
an index page is read which references multiple items on the same heap page,
each heap access is counted separately.  If the index is freshly built, that'd
happen nearly every item.

Justin

> Aggregate  (cost=8635.91..8635.92 rows=1 width=16) (actual time=6625.993..6625.995 rows=1 loops=1)
>   Buffers: shared hit=26847 read=3914
>   ->  Index Scan using articles_pkey on articles  (cost=0.57..8573.35 rows=5005 width=107) (actual
time=21.649..1128.004rows=5000 loops=1)
 
>         Index Cond: ((article_id >= 438000000) AND (article_id <= 438005000))
>         Buffers: shared hit=4342 read=671

> Aggregate  (cost=5533.02..5533.03 rows=1 width=16) (actual time=33219.100..33219.102 rows=1 loops=1)
>   Buffers: shared hit=6568 read=7104
>   ->  Index Scan using articles_pkey on articles  (cost=0.57..5492.96 rows=3205 width=107) (actual
time=22.167..12082.624rows=2416 loops=1)
 
>         Index Cond: ((article_id >= '100021000000'::bigint) AND (article_id <= '100021010000'::bigint))
>         Buffers: shared hit=50 read=2378


pgsql-performance by date:

Previous
From: Vladimir Ryabtsev
Date:
Subject: Why could different data in a table be processed with different performance?
Next
From: Vladimir Ryabtsev
Date:
Subject: Re: Why could different data in a table be processed with different performance?