Re: [PoC] pgstattuple2: block sampling to reduce physical read - Mailing list pgsql-hackers

From: Mark Kirkwood
Subject: Re: [PoC] pgstattuple2: block sampling to reduce physical read
Msg-id: 525DC309.4070401@catalyst.net.nz
In response to: Re: [PoC] pgstattuple2: block sampling to reduce physical read (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
List: pgsql-hackers
On 11/10/13 17:49, Mark Kirkwood wrote:
> On 11/10/13 17:08, Satoshi Nagayasu wrote:
>> (2013/10/11 7:32), Mark Kirkwood wrote:
>>> On 11/10/13 11:09, Mark Kirkwood wrote:
>>>> On 16/09/13 16:20, Satoshi Nagayasu wrote:
>>>>> (2013/09/15 11:07), Peter Eisentraut wrote:
>>>>>> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
>>>>>>> I'm looking forward to seeing more feedback on this approach,
>>>>>>> in terms of design and performance improvement.
>>>>>>> So, I have submitted this for the next CF.
>>>>>> Your patch fails to build:
>>>>>>
>>>>>> pgstattuple.c: In function ‘pgstat_heap_sample’:
>>>>>> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
>>>>>> this function)
>>>>>> pgstattuple.c:737:13: note: each undeclared identifier is reported
>>>>>> only once for each function it appears in
>>>>> Thanks for checking. Fixed to eliminate SnapshotNow.
>>>>>
>>>> This seems like a cool idea! I took a quick look, and initially
>>>> replicated the sort of improvement you saw:
>>>>
>>>>
>>>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>>>> QUERY PLAN
>>>>
>>>> --------------------------------------------------------------------------------
>>>> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
>>>> time=786.368..786.369 rows=1 loops=1)
>>>> Total runtime: 786.384 ms
>>>> (2 rows)
>>>>
>>>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>>>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
>>>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>>>> QUERY PLAN
>>>>
>>>> --------------------------------------------------------------------------------
>>>> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
>>>> time=12.004..12.005 rows=1 loops=1)
>>>> Total runtime: 12.019 ms
>>>> (2 rows)
>>>>
>>>>
>>>>
>>>> I wondered what sort of difference eliminating caching would make:
>>>>
>>>> $ sudo sysctl -w vm.drop_caches=3
>>>>
>>>> Repeating the above queries:
>>>>
>>>>
>>>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>>>> QUERY PLAN
>>>>
>>>> --------------------------------------------------------------------------------
>>>> Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
>>>> time=9503.774..9503.776 rows=1 loops=1)
>>>> Total runtime: 9504.523 ms
>>>> (2 rows)
>>>>
>>>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>>>> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
>>>> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>>>> QUERY PLAN
>>>>
>>>> --------------------------------------------------------------------------------
>>>> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
>>>> time=12330.630..12330.631 rows=1 loops=1)
>>>> Total runtime: 12331.353 ms
>>>> (2 rows)
>>>>
>>>>
>>>> So the sampling code seems *slower* when the cache is completely cold -
>>>> is that expected? (I have not looked at how the code works yet - I'll
>>>> dive in later if I get a chance!)
>> Thanks for testing that. It would be very helpful to improve the
>> performance.
>>
>>> Quietly replying to myself - looking at the code the sampler does 3000
>>> random page reads... I guess this is slower than 163935 (the number of
>>> pages in pgbench_accounts) sequential page reads thanks to os readahead
>>> on my type of disk (WD Velociraptor). Tweaking the number of random
>>> reads (i.e. the sample size) down helps - but obviously that can impact
>>> estimation accuracy.
>>>
>>> Thinking about this a bit more, I guess the elapsed runtime is not the
>>> *only* thing to consider - the sampling code will cause way less
>>> disruption to the os page cache (3000 pages vs possibly lots more than
>>> 3000 for reading an entire relation).
>>>
>>> Thoughts?
>> I think it could be improved by sorting sample block numbers
>> *before* physical block reads in order to eliminate random access
>> on the disk.
>>
>> pseudo code:
>> --------------------------------------
>> /* pick SAMPLE_SIZE random block numbers within the relation */
>> for (i = 0; i < SAMPLE_SIZE; i++)
>> {
>>     sample_block[i] = random() % nblocks;
>> }
>>
>> /* sort ascending so the reads proceed in file order */
>> qsort(sample_block, SAMPLE_SIZE, sizeof(BlockNumber), compare_blocks);
>>
>> for (i = 0; i < SAMPLE_SIZE; i++)
>> {
>>     buf = ReadBuffer(rel, sample_block[i]);
>>
>>     do_some_stats_stuff(buf);
>> }
>> --------------------------------------
>>
>> I guess it would be helpful for reducing the random access overhead.
>>
>> Any comments?
> Ah yes - that's a good idea (rough patch to your patch attached)!
>
> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
> QUERY PLAN
>
> --------------------------------------------------------------------------------
> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
> time=9968.318..9968.319 rows=1 loops=1)
> Total runtime: 9968.443 ms
> (2 rows)
>



Actually - correcting my compare function to sort the blocks in
*increasing* order (doh) gets a better result:

bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
time=7055.840..7055.841 rows=1 loops=1)
Total runtime: 7055.954 ms
(2 rows)



