Re: [PoC] pgstattuple2: block sampling to reduce physical read - Mailing list pgsql-hackers
| From | Mark Kirkwood |
|---|---|
| Subject | Re: [PoC] pgstattuple2: block sampling to reduce physical read |
| Msg-id | 525DC309.4070401@catalyst.net.nz |
| In response to | Re: [PoC] pgstattuple2: block sampling to reduce physical read (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>) |
| List | pgsql-hackers |
On 11/10/13 17:49, Mark Kirkwood wrote:
> On 11/10/13 17:08, Satoshi Nagayasu wrote:
>> (2013/10/11 7:32), Mark Kirkwood wrote:
>>> On 11/10/13 11:09, Mark Kirkwood wrote:
>>>> On 16/09/13 16:20, Satoshi Nagayasu wrote:
>>>>> (2013/09/15 11:07), Peter Eisentraut wrote:
>>>>>> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
>>>>>>> I'm looking forward to seeing more feedback on this approach,
>>>>>>> in terms of design and performance improvement.
>>>>>>> So, I have submitted this for the next CF.
>>>>>> Your patch fails to build:
>>>>>>
>>>>>> pgstattuple.c: In function ‘pgstat_heap_sample’:
>>>>>> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in this function)
>>>>>> pgstattuple.c:737:13: note: each undeclared identifier is reported only once for each function it appears in
>>>>> Thanks for checking. Fixed to eliminate SnapshotNow.
>>>>>
>>>> This seems like a cool idea! I took a quick look, and initially
>>>> replicated the sort of improvement you saw:
>>>>
>>>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>>>>                                    QUERY PLAN
>>>> --------------------------------------------------------------------------------
>>>>  Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual time=786.368..786.369 rows=1 loops=1)
>>>>  Total runtime: 786.384 ms
>>>> (2 rows)
>>>>
>>>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>>>> NOTICE:  pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>>>>                                    QUERY PLAN
>>>> --------------------------------------------------------------------------------
>>>>  Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual time=12.004..12.005 rows=1 loops=1)
>>>>  Total runtime: 12.019 ms
>>>> (2 rows)
>>>>
>>>> I wondered what sort of difference eliminating caching would make:
>>>>
>>>> $ sudo sysctl -w vm.drop_caches=3
>>>>
>>>> Repeating the above queries:
>>>>
>>>> bench=# explain analyze select * from pgstattuple('pgbench_accounts');
>>>>                                    QUERY PLAN
>>>> --------------------------------------------------------------------------------
>>>>  Function Scan on pgstattuple  (cost=0.00..0.01 rows=1 width=72) (actual time=9503.774..9503.776 rows=1 loops=1)
>>>>  Total runtime: 9504.523 ms
>>>> (2 rows)
>>>>
>>>> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
>>>> NOTICE:  pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>>>>                                    QUERY PLAN
>>>> --------------------------------------------------------------------------------
>>>>  Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual time=12330.630..12330.631 rows=1 loops=1)
>>>>  Total runtime: 12331.353 ms
>>>> (2 rows)
>>>>
>>>> So the sampling code seems *slower* when the cache is completely cold -
>>>> is that expected? (I have not looked at how the code works yet - I'll
>>>> dive in later if I get a chance)!
>> Thanks for testing that. It would be very helpful to improve the
>> performance.
>>
>>> Quietly replying to myself - looking at the code, the sampler does 3000
>>> random page reads... I guess this is slower than 163935 (the number of
>>> pages in pgbench_accounts) sequential page reads, thanks to OS readahead
>>> on my type of disk (WD Velociraptor). Tweaking the number of random
>>> reads (i.e. the sample size) down helps - but obviously that can impact
>>> estimation accuracy.
>>>
>>> Thinking about this a bit more, I guess the elapsed runtime is not the
>>> *only* thing to consider - the sampling code will cause far less
>>> disruption to the OS page cache (3000 pages vs. possibly many more than
>>> 3000 for reading an entire relation).
>>>
>>> Thoughts?
>> I think it could be improved by sorting the sample block numbers
>> *before* the physical block reads, in order to eliminate random access
>> on the disk.
>>
>> pseudo code:
>> --------------------------------------
>> for (i = 0; i < SAMPLE_SIZE; i++)
>> {
>>     sample_block[i] = random();
>> }
>>
>> qsort(sample_block);
>>
>> for (i = 0; i < SAMPLE_SIZE; i++)
>> {
>>     buf = ReadBuffer(rel, sample_block[i]);
>>
>>     do_some_stats_stuff(buf);
>> }
>> --------------------------------------
>>
>> I guess it would be helpful for reducing the random access.
>>
>> Any comments?
> Ah yes - that's a good idea (rough patch to your patch attached)!
>
> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
> NOTICE:  pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual time=9968.318..9968.319 rows=1 loops=1)
>  Total runtime: 9968.443 ms
> (2 rows)

Actually - correcting my compare function to sort the blocks in
*increasing* order (doh) gets a better result:

bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
NOTICE:  pgstattuple2: SE tuple_count 0.00, tuple_len 0.00, dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Function Scan on pgstattuple2  (cost=0.00..0.01 rows=1 width=72) (actual time=7055.840..7055.841 rows=1 loops=1)
 Total runtime: 7055.954 ms
(2 rows)
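[Editor's note: the sorted-sampling idea in the pseudo code above, plus the ascending-comparator fix, can be sketched as standalone C. This is an illustration, not the actual patch: `choose_sample_blocks` and `compare_blocks` are hypothetical names, the `BlockNumber` typedef is assumed to mirror PostgreSQL's, and the real `ReadBuffer` call is stubbed out as a comment since this does not build against the PostgreSQL tree.]

```c
#include <stdlib.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* assumption: mirrors PostgreSQL's typedef */
#define SAMPLE_SIZE 3000        /* sample size used in the thread */

/*
 * qsort comparator: ascending order, so the subsequent block reads
 * proceed in increasing file offset and benefit from OS readahead.
 * (Returning the subtraction the other way round is the "sorted
 * *decreasing*" bug mentioned above.)
 */
static int
compare_blocks(const void *a, const void *b)
{
    BlockNumber ba = *(const BlockNumber *) a;
    BlockNumber bb = *(const BlockNumber *) b;

    if (ba < bb)
        return -1;
    if (ba > bb)
        return 1;
    return 0;
}

/*
 * Fill sample[] with n random block numbers below nblocks and sort
 * them ascending before any I/O is issued.
 */
static void
choose_sample_blocks(BlockNumber *sample, int n, BlockNumber nblocks)
{
    for (int i = 0; i < n; i++)
        sample[i] = (BlockNumber) (random() % nblocks);

    qsort(sample, n, sizeof(BlockNumber), compare_blocks);

    /*
     * In pgstattuple2 each sample[i] would now be passed, in
     * ascending order, to ReadBuffer(rel, sample[i]) and then to
     * the per-page stats accumulation.
     */
}
```

The sort costs O(n log n) comparisons but turns 3000 scattered seeks into a single forward pass over the relation file, which is what recovers some of the readahead benefit measured above.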