Re: Why could different data in a table be processed with different performance? - Mailing list pgsql-performance

From Fabio Pardi
Subject Re: Why could different data in a table be processed with different performance?
Date
Msg-id e8d93055-9643-0ad2-d8bd-3f4fc344bf51@portavita.eu
In response to Re: Why could different data in a table be processed with different performance?  (Vladimir Ryabtsev <greatvovan@gmail.com>)
Responses Re: Why could different data in a table be processed with different performance?
List pgsql-performance
Hi,

answers (and questions) are inline below

On 22/09/18 11:19, Vladimir Ryabtsev wrote:
> is the length of the text equally distributed over the 2 partitions?
Not 100% equally, but to me it does not seem to be a big deal... Considering the ranges independently:
First range: ~70% < 10 KB, ~25% for 10-20 KB, ~3% for 20-30 KB, everything else is less than 1% (with 10 KB steps).
Second range: ~80% < 10 KB, ~18% for 10-20 KB, ~2% for 20-30 KB, everything else is less than 1% (with 10 KB steps).
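
For reference, a per-10 KB histogram like the above can be produced with something along these lines (the column name "content" and the range bounds are only placeholders):

SELECT width_bucket(octet_length(content), 0, 102400, 10) AS bucket_10kb,
       count(*) AS articles
FROM articles
WHERE article_id BETWEEN 438000000 AND 439000000   -- placeholder bounds for the first range
GROUP BY 1
ORDER BY 1;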

Agreed, it should not play a role here

> From what you posted, the first query retrieves 5005 rows, but the second 2416. It might be helpful if we are able to compare 5000 vs 5000
Yes, it was just an example; here are the plans for approximately the same number of rows:

Aggregate  (cost=9210.12..9210.13 rows=1 width=16) (actual time=4265.478..4265.479 rows=1 loops=1)
  Buffers: shared hit=27027 read=4311
  I/O Timings: read=2738.728
  ->  Index Scan using articles_pkey on articles  (cost=0.57..9143.40 rows=5338 width=107) (actual time=12.254..873.081 rows=5001 loops=1)
        Index Cond: ((article_id >= 438030000) AND (article_id <= 438035000))
        Buffers: shared hit=4282 read=710
        I/O Timings: read=852.547
Planning time: 0.235 ms
Execution time: 4265.554 ms

Aggregate  (cost=11794.59..11794.60 rows=1 width=16) (actual time=62298.559..62298.559 rows=1 loops=1)
  Buffers: shared hit=15071 read=14847
  I/O Timings: read=60703.859
  ->  Index Scan using articles_pkey on articles  (cost=0.57..11709.13 rows=6837 width=107) (actual time=24.686..24582.221 rows=5417 loops=1)
        Index Cond: ((article_id >= '100021040000'::bigint) AND (article_id <= '100021060000'::bigint))
        Buffers: shared hit=195 read=5244
        I/O Timings: read=24507.621
Planning time: 0.494 ms
Execution time: 62298.630 ms

If we subtract I/O time from the total time, we get 1527 ms vs 1596 ms: very close timings for everything other than I/O (considering the slightly higher number of rows in the second case). But the I/O time differs dramatically.

> Also is worth noticing that the 'estimated' differs from 'actual' on the second query. I think that happens because data is differently distributed over the ranges. Probably the analyzer does not have enough samples to understand the real distribution.
I think we should not worry about it unless the planner chose a poor plan, should we? Statistics affect picking a proper plan, but not the execution of the plan, don't they?


Agreed, it was pure speculation


> or to create partial indexes on the 2 ranges.
Sure, I will try it with partial indexes. Should I drop the existing PK index, or is ensuring that the planner picks the range index enough?

You cannot drop it, since it is the index backing the primary key.

You can create 2 partial indexes and the planner will pick them up for you (and the planning time will go up a bit).
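
Something along these lines, as a sketch only (the boundary values below are taken from the posted plans purely as an illustration; adjust them to the actual ranges you query):

-- partial index covering the first range of article_id
CREATE INDEX articles_id_low_idx ON articles (article_id)
  WHERE article_id BETWEEN 438000000 AND 439000000;

-- partial index covering the second range of article_id
CREATE INDEX articles_id_high_idx ON articles (article_id)
  WHERE article_id BETWEEN 100021000000 AND 100022000000;

The planner can use a partial index whenever the range condition in the query falls entirely within the index predicate.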


> - does the raid controller have a cache?
> - how big is the cache? (when you measure disk speed, that will influence the result very much, if  you do not run the test on big-enough data chunk) best if is disabled during your tests 
I am pretty sure there is some, usually it's several tens of megabytes, but I ran the disk read tests several times with chunks that could not fit in the cache and with random offsets, so I am pretty sure that something around 500 MB/s is a reasonably accurate figure (but it is only for sequential reads).


It is not unusual to have 1 GB of cache or more... and do not forget to drop the cache between tests, plus do a sync.


> - is the OS caching disk blocks too? maybe you want to drop everything from there too.
How can I find that out? And how do I drop it? Or do you mean the hypervisor OS?
Anyway, don't you think that caching specifics could not really explain these issues?

Sorry I meant the hypervisor OS.

Given that most of the time is spent on I/O, caching may well be playing a role.

I tried to reproduce your problem but I cannot get anywhere close to your results. Everything runs smoothly with or without shared buffers or the OS cache.

A few questions and considerations came to mind:

- how big is your index?

- how big is the table? (example queries for both sizes are at the end of this list)

- given the size of shared_buffers, almost 2M blocks should fit, but you say 2 consecutive runs are still hitting the disk. That's strange indeed, since you are using way more than 2M blocks.
Did you check whether there are perhaps any other processes or cron jobs (in Postgres or on the system) that might be reading data and flushing out the cache?

You can make use of pg_buffercache in order to see what is actually cached; that might help to get an overview of its contents (an example query is at the end of this list).

- As Laurenz suggested (VACUUM FULL), you might want to move the data around. You can also try a dump + restore to narrow the problem down to the data or the disk.

- You might also want to watch the Windows disk graph while you are running your tests. It can show you whether data is actually being fetched from disk (and, usefully, how much).
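
To illustrate the size and pg_buffercache checks above, something along these lines should do (a sketch only: the relation names come from your plans, pg_buffercache needs CREATE EXTENSION pg_buffercache first, and the arithmetic assumes the default 8 kB block size):

-- size of the table (including TOAST) and of its primary key index
SELECT pg_size_pretty(pg_table_size('articles'))         AS table_size,
       pg_size_pretty(pg_relation_size('articles_pkey')) AS pkey_index_size;

-- how much of the table and index currently sits in shared_buffers
SELECT c.relname,
       count(*)                        AS buffers,
       pg_size_pretty(count(*) * 8192) AS cached
FROM pg_buffercache b
JOIN pg_class c ON pg_relation_filenode(c.oid) = b.relfilenode
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database WHERE datname = current_database()))
  AND c.relname IN ('articles', 'articles_pkey')
GROUP BY c.relname
ORDER BY buffers DESC;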

regards,

fabio pardi
