Re: Parallel Seq Scan vs kernel read ahead - Mailing list pgsql-hackers
From: David Rowley
Subject: Re: Parallel Seq Scan vs kernel read ahead
Msg-id: CAApHDvrfJfYH51_WY-iQqPw8yGR4fDoTxAQKqn+Sa7NTKEVWtg@mail.gmail.com
In response to: Re: Parallel Seq Scan vs kernel read ahead (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: Parallel Seq Scan vs kernel read ahead
           Re: Parallel Seq Scan vs kernel read ahead
List: pgsql-hackers
On Thu, 21 May 2020 at 14:32, Thomas Munro <thomas.munro@gmail.com> wrote:
> Thanks. So it seems like Linux, Windows and anything using ZFS are
> OK, which probably explains why we hadn't heard complaints about it.

I tried out a different test on a Windows 8.1 machine I have here. I was concerned that the test that was used here ends up with tuples that are too narrow, so the executor would spend quite a bit of time going between nodes and performing the actual aggregation. I thought it might be good to add some padding so that there are far fewer tuples on each page. I ended up with:

create table t (a int, b text);
-- create a table of 100GB in size.
insert into t select x,md5(x::text) from generate_series(1,1000000*1572.7381809)x; -- took 1 hr 18 mins
vacuum freeze t;

query = select count(*) from t;
Disk = Samsung SSD 850 EVO mSATA 1TB.

Master:
workers = 0 : Time: 269104.281 ms (04:29.104) 380MB/s
workers = 1 : Time: 741183.646 ms (12:21.184) 138MB/s
workers = 2 : Time: 656963.754 ms (10:56.964) 155MB/s

Patched:
workers = 0 : Should be the same as before, as the code for this didn't change.
workers = 1 : Time: 300299.364 ms (05:00.299) 340MB/s
workers = 2 : Time: 270213.726 ms (04:30.214) 379MB/s

(A better query would likely have been just: SELECT * FROM t WHERE a = 1; but I'd already run the test by the time I thought of that.)

So this shows that Windows, at least 8.1, does suffer from this too.

As for the patch: I know you just put it together quickly, but I don't think you can do the ramp-up the way you have. It looks like there's a risk of torn reads and torn writes, and I'm unsure how much that could have affected the test results here; a worker could end up with some garbage number of pages to read rather than the number you intended. I also don't quite understand the need for a ramp-up in pages per serving. Shouldn't you start at some size right away and hold it, then perhaps only ramp down at the end so that all the workers finish at close to the same time? However, I did have other ideas, which I'll explain below.

From my previous work on that function to add the atomics, I did think it would be better to dish out more than one page at a time. However, there is the risk that the workload is not evenly distributed between the workers. My thought was that we could divide the total pages by the number of workers, then again by 100, and dish out blocks based on that. That way each worker gets about a 100th of its fair share of pages at once, so, assuming there's an even amount of work to do per serving of pages, the last worker should run on for at most 1% longer. Perhaps that 100 should be 1000, in which case the run-on time for the last worker is just 0.1%. Perhaps the serving size can also be capped at some maximum, like 64. We'll certainly need to ensure it's at least 1! I imagine that would eliminate the need for any ramp-down of pages per serving near the end of the scan.

David
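A minimal C sketch of the serving-size scheme described in the message above, using a single shared counter claimed with an atomic fetch-add. All names here are illustrative, not PostgreSQL's actual parallel-scan structs, and real backend code would use PostgreSQL's pg_atomic_* wrappers rather than C11 atomics:

/*
 * Hypothetical sketch of the chunked block allocator discussed above.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MIN_CHUNK 1    /* must always hand out at least one block */
#define MAX_CHUNK 64   /* cap suggested in the message above */

typedef struct SharedScanState
{
    uint32_t nblocks;             /* total blocks in the relation */
    uint32_t chunk_size;          /* blocks per serving, fixed at scan start */
    _Atomic uint64_t next_block;  /* next block number to hand out */
} SharedScanState;

/*
 * Serving size: roughly a 100th of each worker's fair share, clamped
 * to [MIN_CHUNK, MAX_CHUNK].  Computed once before the scan begins.
 */
static uint32_t
choose_chunk_size(uint32_t nblocks, int nworkers)
{
    uint32_t chunk = nblocks / ((uint32_t) nworkers * 100);

    if (chunk > MAX_CHUNK)
        chunk = MAX_CHUNK;
    if (chunk < MIN_CHUNK)
        chunk = MIN_CHUNK;
    return chunk;
}

/*
 * Claim the next serving.  A single atomic fetch-add gives each worker
 * a well-defined contiguous [*start, *end) range, which avoids the torn
 * read/write problem: no worker can observe a half-updated counter.
 * Returns false once the scan is exhausted.
 */
static bool
claim_chunk(SharedScanState *scan, uint32_t *start, uint32_t *end)
{
    uint64_t first = atomic_fetch_add(&scan->next_block, scan->chunk_size);

    if (first >= scan->nblocks)
        return false;
    *start = (uint32_t) first;
    *end = (uint32_t) (first + scan->chunk_size > scan->nblocks
                       ? scan->nblocks
                       : first + scan->chunk_size);
    return true;
}

/* A worker then loops: while (claim_chunk(&scan, &s, &e)) read blocks s..e-1. */

With the 100GB table above (about 13.1 million 8kB blocks) and 2 workers, nblocks / (2 * 100) is roughly 65,000, so the 64-block cap governs and each serving is 512kB of sequential reads; the at-least-1 floor only matters for very small relations.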