Sequential scans - Mailing list pgsql-hackers
From | Heikki Linnakangas
---|---
Subject | Sequential scans
Date |
Msg-id | 4638918F.90805@enterprisedb.com
Responses | Re: Sequential scans, Re: Sequential scans
List | pgsql-hackers
Hi,

I'm starting to review the "synchronized scans" and "scan-resistant buffer cache" patches. The patches have complex interactions so I'm taking a holistic approach.

There are four outstanding issues with the sync scans in particular:

1. The simplistic hash approach. While it's nice to not have a lock, I'm worried about collisions. If you had a collision every now and then, it wouldn't be that bad, but because the hash value is computed from the oid, a collision would be persistent. If you create a database and happen to have two frequently seqscanned tables that collide, the only way to get rid of the collision is to drop and recreate a table. Granted, that'd probably be very rare in practice, but when it happens it would be next to impossible to figure out what's going on.

Let's use a normal hash table instead, and use a lock to protect it. If we only update it every 10 pages or so, the overhead should be negligible. To further reduce contention, we could modify ReadBuffer to let the caller know if the read resulted in a physical read or not, and only update the entry when a page is physically read in. That way all the synchronized scanners wouldn't be updating the same value, just the one performing the I/O. And while we're at it, let's use the full relfilenode instead of just the table oid in the hash. (A rough sketch of what I mean is below, after point 4.)

2. Under what circumstances does the patch help and when does it hurt? I think the patch is safe in that it should never be any worse than what we have now. But when does it help? That needs to be looked at together with the other patch. I need to dig through the archives for the performance test results you posted earlier and try to understand them.

There are six distinct scenarios I've come up with so far that need to be looked at:

A. A seq scan on a small table
B. A seq scan on a table that's 110% the size of shared_buffers, but smaller than RAM
C. A seq scan on a table that's 110% the size of RAM
D. A seq scan on a huge table
E. Two simultaneous seq scans on a large table starting at the same time
F. Two simultaneous seq scans on a large table, 2nd one starting when the 1st one is halfway through

Also, does it change things if you have a bulk update instead of a read-only query? How about bitmap heap scans and large index scans? And vacuums? And the above scenarios need to be considered both alone, and in the presence of other OLTP-style workload.

I realize that we can't have everything, and as long as we get some significant benefit in some scenarios, and don't hurt others, the patch is worthwhile. But let's try to cover as much as we reasonably can.

One random idea I had to cover B & C without having the offset variable: Start scanning *backwards* from the page that's in the shared hash table, until you hit a page that's not in the buffer cache. Then you continue scanning forwards from the page you started from. This needs more thought, but I think we can come up with a pretty simple solution that covers the most common cases. (The second sketch below shows roughly what I have in mind.)

3. By having different backends doing the reads, are we destroying OS readahead as Tom suggested? I remember you performed some tests on that, and it was a problem on some systems but not on others. This needs some thought; there may be some simple way to address it.

4. It fails regression tests. You get an assertion failure on the portal test. I believe that changing the direction of a scan isn't handled properly; it's probably pretty easy to fix.

Jeff, could you please fix 1 and 4?
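To make point 1 a bit more concrete, here's a minimal sketch of the kind of structure I have in mind. This is not code from either patch; the names (SyncScanEntry, ss_table, ss_lock, ss_report_location, the constants) are made up, the shared-memory allocation and entry eviction are only hinted at in comments, and the ReadBuffer change is assumed to exist in the form of the 'physical_read' flag:

#include "postgres.h"
#include "storage/block.h"
#include "storage/lwlock.h"
#include "storage/relfilenode.h"

typedef struct SyncScanEntry
{
    RelFileNode rnode;       /* full relfilenode, so plain-OID collisions can't bite us */
    BlockNumber last_block;  /* last block reported by a scan of this relation */
} SyncScanEntry;

#define SYNC_SCAN_NENTRIES      100  /* small fixed-size table in shared memory */
#define SYNC_SCAN_REPORT_EVERY   10  /* non-I/O scanners report at most every N pages */

static SyncScanEntry *ss_table;      /* allocated in shared memory at startup */
static LWLockId ss_lock;             /* protects ss_table */

/*
 * Called from the seqscan loop after each page.  'physical_read' would come
 * from a ReadBuffer variant that tells the caller whether the page actually
 * had to be read from disk, so normally only the backend doing the I/O
 * touches the shared entry; everyone else reports only every
 * SYNC_SCAN_REPORT_EVERY pages.
 */
void
ss_report_location(RelFileNode rnode, BlockNumber blkno, bool physical_read)
{
    if (!physical_read && (blkno % SYNC_SCAN_REPORT_EVERY) != 0)
        return;

    LWLockAcquire(ss_lock, LW_EXCLUSIVE);
    /* find or create the entry for rnode, evicting an old one if the table
     * is full, then set entry->last_block = blkno */
    LWLockRelease(ss_lock);
}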
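And here's a rough sketch of the backwards-scan idea under point 2, building on the declarations above. PageIsInBufferCache() and process_block() are made-up placeholders: the first stands for a cheap probe of the buffer mapping table that does no I/O, the second for the normal per-page work of the seqscan, and ss_get_location() is the read-side counterpart of ss_report_location():

static void
scan_from_cached_tail(RelFileNode rnode, BlockNumber nblocks)
{
    BlockNumber hint = ss_get_location(rnode);  /* position from the shared table */
    BlockNumber low = hint;
    BlockNumber blkno;

    if (nblocks == 0)
        return;

    /*
     * Phase 1: walk backwards from the hint as long as the pages are still in
     * the buffer cache.  Those pages are free to read, so pick them up before
     * they get evicted.
     */
    while (low > 0 && PageIsInBufferCache(rnode, low - 1))
    {
        low--;
        process_block(rnode, low);
    }

    /*
     * Phase 2: continue forwards from the page we started at, wrapping around
     * at the end of the relation, and stop when we arrive at the block where
     * phase 1 stopped, so every block is visited exactly once.
     */
    blkno = hint;
    do
    {
        process_block(rnode, blkno);
        blkno = (blkno + 1) % nblocks;
    } while (blkno != low);
}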
I'll give 2 and 3 some more thought, and take a closer look at the scan-resistant buffer cache patch. Any comments and ideas are welcome, of course.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com