Re: Trouble with hashagg spill I/O pattern and costing - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: Trouble with hashagg spill I/O pattern and costing
Msg-id: 20200525225909.aifxjm53hw4xyzxd@development
In response to: Re: Trouble with hashagg spill I/O pattern and costing (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Mon, May 25, 2020 at 12:49:45PM -0700, Jeff Davis wrote:
>On Mon, 2020-05-25 at 14:17 +0200, Tomas Vondra wrote:
>> It's still ~2x slower than the sort, so presumably we'll need to
>> tweak the costing somehow.
>
>One thing to think about is that the default random_page_cost is only
>4X seq_page_cost. We know that's complete fiction, but it's meant to
>paper over the OS caching effects. It seems like that shortcut may be
>what's hurting us now.
>
>HashAgg counts 1/2 of the page accesses as random, whereas Sort only
>counts 1/4 as random. If the random_page_cost were closer to reality,
>HashAgg would already be penalized substantially. It might be
>interesting to test with higher values of random_page_cost and see what
>the planner does.
>

Hmmm, good point. I've tried bumping random_page_cost up (to 40) and the
bogus plans disappeared - instead, the parallel plans switch from
hashagg to groupagg at 4 workers. There are a lot of parameters
affecting this, though (e.g. higher work_mem -> more workers use
hashagg).

One thing annoys me, though. The costing simply ignores the costs set on
the (temporary) tablespace, which confused me at first because

    ALTER TABLESPACE temp SET (random_page_cost = 40);

had absolutely no effect. That's not the fault of this patch, and it's
actually understandable (we can have multiple temp tablespaces, and we
don't know which one will end up being used). I wonder if it would be a
good idea to have "temp" versions of those cost variables, applied to
all temporary tablespaces ... Currently I have to do the inverse thing -
tweak the costs for all regular tablespaces.

>If we want to be a bit more conservative, I'm fine with adding a
>general penalty against a HashAgg that we expect to spill (multiply the
>disk costs by some factor). We can consider removing the penalty in
>v14.
>

Not sure, I need to look at the costing a bit. I see all page writes are
counted as random while all page reads are sequential.
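For illustration, here is a minimal sketch (not PostgreSQL's actual costing
code) of how the 1/2 vs. 1/4 random fractions quoted above interact with
random_page_cost; the page count and function name are made up for the
example:

```python
# Hypothetical sketch: disk cost of spilling npages pages when a given
# fraction of the accesses is charged at random_page_cost and the rest
# at seq_page_cost. HashAgg charges 1/2 as random, Sort charges 1/4.

def spill_disk_cost(npages, random_frac, seq_page_cost=1.0, random_page_cost=4.0):
    """Cost of npages page accesses, a fraction of which are random."""
    n_random = npages * random_frac
    n_seq = npages - n_random
    return n_random * random_page_cost + n_seq * seq_page_cost

npages = 100_000
for rpc in (4.0, 40.0):  # default vs. the bumped value tried above
    hashagg = spill_disk_cost(npages, 0.5, random_page_cost=rpc)
    sort = spill_disk_cost(npages, 0.25, random_page_cost=rpc)
    print(f"random_page_cost={rpc:>4}: hashagg={hashagg:,.0f} "
          f"sort={sort:,.0f} ratio={hashagg / sort:.2f}")
```

With the default random_page_cost=4 the HashAgg/Sort disk-cost ratio is
about 1.43; at random_page_cost=40 it grows to about 1.91, which is
consistent with the higher setting pushing the planner toward groupagg.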
Shouldn't this consider the number of tapes and the fact that we
pre-allocate blocks? Essentially, we split the whole file into chunks of
128 blocks that can be read sequentially, and only the switches between
chunks are random. Not sure if that's necessary, though. If increasing
random_page_cost pushes the plans to groupagg, then maybe that's good
enough.

Let's push the two fixes that we already have. These extra questions
clearly need more investigation and testing, and I'm not even sure it's
something we should pursue for v13.

>> I do believe this is still due to differences in I/O
>> patterns, with parallel hashagg probably being a bit more random (I'm
>> deducing that from SSD not being affected by this).
>
>Do you think the difference in IO patterns is due to a difference in
>handling reads vs. writes in the kernel? Or do you think that 128
>blocks is not enough to amortize the cost of a seek for that device?
>

I don't know. I kinda imagined it was due to the workers interfering
with each other, but that should affect the sort the same way, right? I
don't have any data to support this at the moment - I can repeat the
iosnoop tests and analyze the data, of course.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services