Re: [PoC] pgstattuple2: block sampling to reduce physical read - Mailing list pgsql-hackers

From Satoshi Nagayasu
Subject Re: [PoC] pgstattuple2: block sampling to reduce physical read
Msg-id 525779C5.2020608@uptime.jp
In response to Re: [PoC] pgstattuple2: block sampling to reduce physical read  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Responses Re: [PoC] pgstattuple2: block sampling to reduce physical read  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
List pgsql-hackers
(2013/10/11 7:32), Mark Kirkwood wrote:
> On 11/10/13 11:09, Mark Kirkwood wrote:
>> On 16/09/13 16:20, Satoshi Nagayasu wrote:
>>> (2013/09/15 11:07), Peter Eisentraut wrote:
>>>> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
>>>>> I'm looking forward to seeing more feedback on this approach,
>>>>> in terms of design and performance improvement.
>>>>> So, I have submitted this for the next CF.
>>>> Your patch fails to build:
>>>>
>>>> pgstattuple.c: In function ‘pgstat_heap_sample’:
>>>> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
>>>> this function)
>>>> pgstattuple.c:737:13: note: each undeclared identifier is reported
>>>> only once for each function it appears in
>>> Thanks for checking. Fixed to eliminate SnapshotNow.
>>>
>> This seems like a cool idea! I took a quick look, and initially
>> replicated the sort of improvement you saw:
>>
>>
>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------
>> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
>> time=786.368..786.369 rows=1 loops=1)
>> Total runtime: 786.384 ms
>> (2 rows)
>>
>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------
>> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
>> time=12.004..12.005 rows=1 loops=1)
>> Total runtime: 12.019 ms
>> (2 rows)
>>
>>
>>
>> I wondered what sort of difference eliminating caching would make:
>>
>> $ sudo sysctl -w vm.drop_caches=3
>>
>> Repeating the above queries:
>>
>>
>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------
>> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
>> time=9503.774..9503.776 rows=1 loops=1)
>> Total runtime: 9504.523 ms
>> (2 rows)
>>
>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------
>> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
>> time=12330.630..12330.631 rows=1 loops=1)
>> Total runtime: 12331.353 ms
>> (2 rows)
>>
>>
>> So the sampling code seems *slower* when the cache is completely cold -
>> is that expected? (I have not looked at how the code works yet - I'll
>> dive in later if I get a chance)!

Thanks for testing that. Results like these are very helpful for
improving the performance.

> Quietly replying to myself - looking at the code the sampler does 3000
> random page reads... I guess this is slower than 163935 (number of pages
> in pgbench_accounts) sequential page reads thanks to os readahead on my
> type of disk (WD Velociraptor). Tweaking the number of random reads (i.e.
> the sample size) down helps - but obviously that can impact estimation
> accuracy.
> 
> Thinking about this a bit more, I guess the elapsed runtime is not the
> *only* thing to consider - the sampling code will cause way less
> disruption to the os page cache (3000 pages vs possibly lots more than
> 3000 for reading an entire relation).
> 
> Thoughts?

I think it could be improved by sorting the sampled block numbers
*before* the physical block reads, so that the blocks are read in
ascending order and random access on the disk is reduced.

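pseudo code:
--------------------------------------
nblocks = RelationGetNumberOfBlocks(rel);

/* pick random block numbers within the relation */
for (i = 0; i < SAMPLE_SIZE; i++)
    sample_block[i] = random() % nblocks;

/* sort so that the blocks are read in ascending order */
/* (compare_blocknum is a hypothetical comparator, not in the patch) */
qsort(sample_block, SAMPLE_SIZE, sizeof(BlockNumber), compare_blocknum);

for (i = 0; i < SAMPLE_SIZE; i++)
{
    buf = ReadBuffer(rel, sample_block[i]);
    do_some_stats_stuff(buf);
    ReleaseBuffer(buf);
}
--------------------------------------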

I guess this would help reduce the random access.
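
To make the sorting step concrete, here is a minimal standalone sketch
in plain C. It is only an illustration, not code from the patch: the
BlockNumber typedef is a stand-in for PostgreSQL's type, and
compare_blocknum() and choose_sorted_sample() are hypothetical names.
It uses rand() for brevity, so it assumes RAND_MAX is larger than the
number of blocks and it ignores modulo bias.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's BlockNumber (an assumption for this sketch). */
typedef uint32_t BlockNumber;

/* qsort comparator: ascending order, written to avoid subtraction overflow. */
static int
compare_blocknum(const void *a, const void *b)
{
    BlockNumber x = *(const BlockNumber *) a;
    BlockNumber y = *(const BlockNumber *) b;

    return (x > y) - (x < y);
}

/*
 * Fill sample[0..nsample-1] with random block numbers in [0, nblocks)
 * and sort them ascending, so that a later read loop moves forward
 * through the relation. Duplicate block numbers are possible.
 */
static void
choose_sorted_sample(BlockNumber *sample, size_t nsample, BlockNumber nblocks)
{
    size_t i;

    assert(nblocks > 0);

    for (i = 0; i < nsample; i++)
        sample[i] = (BlockNumber) rand() % nblocks;

    qsort(sample, nsample, sizeof(BlockNumber), compare_blocknum);
}
```

With the numbers from Mark's test this would be called as
choose_sorted_sample(sample, 3000, 163935), and the read loop would
then walk the sample array from start to end.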

Any comments?
-- 
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp


