Re: ANALYZE sampling is too good - Mailing list pgsql-hackers

From Mark Kirkwood
Subject Re: ANALYZE sampling is too good
Date
Msg-id 52A686CD.6050104@catalyst.net.nz
Whole thread Raw
In response to Re: ANALYZE sampling is too good  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Responses Re: ANALYZE sampling is too good
List pgsql-hackers
On 10/12/13 15:32, Mark Kirkwood wrote:
> On 10/12/13 15:17, Mark Kirkwood wrote:
>> On 10/12/13 15:11, Mark Kirkwood wrote:
>>> On 10/12/13 15:04, Mark Kirkwood wrote:
>>>> On 10/12/13 13:53, Mark Kirkwood wrote:
>>>>> On 10/12/13 13:20, Mark Kirkwood wrote:
>>>>>> On 10/12/13 13:14, Mark Kirkwood wrote:
>>>>>>> On 10/12/13 12:14, Heikki Linnakangas wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I took a stab at using posix_fadvise() in ANALYZE. It turned 
>>>>>>>> out to be very easy, patch attached. Your mileage may vary, but 
>>>>>>>> I'm seeing a nice gain from this on my laptop. Taking a 30000 
>>>>>>>> page sample of a table with 717717 pages (ie. slightly larger 
>>>>>>>> than RAM), ANALYZE takes about 6 seconds without the patch, and 
>>>>>>>> less than a second with the patch, with 
>>>>>>>> effective_io_concurrency=10. If anyone with a good test data 
>>>>>>>> set loaded would like to test this and post some numbers, that 
>>>>>>>> would be great.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I did a test run:
>>>>>>>
>>>>>>> pgbench scale 2000 (pgbench_accounts approx 25GB).
>>>>>>> postgres 9.4
>>>>>>>
>>>>>>> i7 3.5Ghz Cpu
>>>>>>> 16GB Ram
>>>>>>> 500 GB Velociraptor 10K
>>>>>>>
>>>>>>> (cold os and pg cache both runs)
>>>>>>> Without patch:  ANALYZE pgbench_accounts    90s
>>>>>>> With patch: ANALYZE pgbench_accounts  91s
>>>>>>>
>>>>>>> So I'm essentially seeing no difference :-(
>>>>>>
>>>>>>
>>>>>> Arrg - sorry forgot the important bits:
>>>>>>
>>>>>> Ubuntu 13.10 (kernel 3.11.0-14)
>>>>>> filesystem is ext4
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> Doing the same test as above, but on a 80GB Intel 520 (ext4 
>>>>> filesystem mounted with discard):
>>>>>
>>>>> (cold os and pg cache both runs)
>>>>> Without patch:  ANALYZE pgbench_accounts  5s
>>>>> With patch: ANALYZE pgbench_accounts  5s
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> Redoing the filesystem on the 520 as btrfs didn't seem to make any 
>>>> difference either:
>>>>
>>>> (cold os and pg cache both runs)
>>>> Without patch:  ANALYZE pgbench_accounts  6.4s
>>>> With patch: ANALYZE pgbench_accounts  6.4s
>>>>
>>>>
>>>>
>>>
>>> Ah - I have just realized I was not setting effective_io_concurrency 
>>> - so I'll redo the test. - Apologies.
>>>
>>>
>>
>> Redoing the test on the velociraptor gives me exactly the same 
>> numbers as before (effective_io_concurrency = 10 instead of 1).
>>
>>
>
> Good grief - repeating the test gave:
>
> Without patch:  ANALYZE pgbench_accounts 90s
> With patch: ANALYZE pgbench_accounts  42s
>
> pretty consistent *now*. No idea what was going on in the 1st run 
> (maybe I happened to have it running at the same time as a 
> checkpoint)? Anyway will stop now before creating more confusion.
>
>

Just one more...

The Intel 520 with ext4:

Without patch:  ANALYZE pgbench_accounts 5s
With patch: ANALYZE pgbench_accounts  1s

And double checking -
With patch, but effective_io_concurrency = 1: ANALYZE pgbench_accounts  5s

These results look more like Heikki's. Which suggests more benefit on 
SSD than spinning disks. Some more data points (apart from mine) would 
be good to see tho.

Cheers

Mark







pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [patch] Adding EXTRA_REGRESS_OPTS to all pg_regress invocations
Next
From: Amit Kapila
Date:
Subject: Re: [bug fix] pg_ctl always uses the same event source