Re: Extent Locks - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: Extent Locks |
Date | |
Msg-id | 20130517035537.GX4361@tamriel.snowman.net |
In response to | Extent Locks (Stephen Frost <sfrost@snowman.net>) |
Responses | Re: Extent Locks |
List | pgsql-hackers |
* Robert Haas (robertmhaas@gmail.com) wrote:
> I think it's pretty unrealistic to suppose that this can be made to
> work. The most obvious problem is that a sequential scan is coded to
> assume that every block between 0 and the last block in the relation
> is worth reading,

You don't change that. However, when a seq scan asks the storage layer
for blocks that it knows don't actually exist, it can simply skip over
them or return "empty" records or something equivalent... Yes, that's
hand-wavy, but I also think it's doable.

> I suspect there are
> slightly less obvious problems that would turn out to be highly
> intractable.

Entirely possible. :)

> The assumption that block numbers are dense is probably
> embedded in the system in a lot of subtle ways; if we start trying to
> change I think we're dooming ourselves to an unending series of crocks
> trying to undo the mess we've created.

Perhaps.

> Also, I think that's really a red herring anyway. Relation extension
> per se is not slow - we can grow a file by adding zero bytes at a
> pretty good clip, and don't really gain anything at the database level
> by spreading the growth across multiple files.

That's true when the file is on a single filesystem and a single set of
drives. Split them across multiple filesystems/volumes, where you get
more drives involved...

> The problem is the
> relation extension LOCK, and I think that's where we should be
> focusing our attention. I'm pretty confident we can find a way to
> take the pressure off the lock without actually changing anything all
> at the storage layer.

That would certainly be very neat and, if possible, might render my idea
moot, which I would be more than happy with.

> As a thought experiment, suppose for example
> that we have a background process that knows, by magic, how many new
> blocks will be needed in each relation. And it knows this just enough
> in advance to have time to extend each such relation by the requisite
> number of blocks and add those blocks to the free space map. Since
> only that process ever needs a relation extension lock, there is no
> longer any contention for any such lock. Problem solved!

Sounds cute, but perhaps a bit too cute to be realistic (that's
certainly been my opinion when suggested by others, which it has been,
in the past).

> Actually, I'm not convinced that a background process is the right
> approach at all, and of course there's no actual magic that lets us
> foresee exact extension needs. But I still feel like that thought
> experiment indicates that there must be a solution here just by
> rejiggering the locking, and maybe with a bit of modest pre-extension.
> The mediocre results of my last couple tries must indicate that I
> wasn't entirely successful in getting the backends out of each others'
> way, but I tend to think that's just an indication that I don't
> understand exactly what's happening in the contention scenarios yet,
> rather than a fundamental difficulty with the approach.

Perhaps.

> > How many concurrent writers did you have and what kind of filesystem was
> > backing this? Was it a temp filesystem where writes are essentially to
> > memory, causing this relation extention lock to be much more
> > contentious?
>
> 10. ext4. No.

Ok.

> If I took 30 seconds to pre-extend the relation before writing any
> data into it, then writing the data went pretty much exactly 10 times
> faster with 10 writers than with 1.

That's rather fantastic.

> But small on-the-fly
> pre-extensions during the write didn't work as well. I don't remember
> exactly what formulas I tried, but I do remember that the few I tried
> were not really any better than "always pre-extend by 1 extra block";
> and that alone eliminated about half the contention, but then I
> couldn't do better.

That seems quite odd to me; I would have thought extending by more than
2 blocks would have helped with the contention. Still, it sounds like
extending requires a fair bit of writing, and that sucks in its own
right because we're just going to rewrite that. Is that correct? If so,
I like my proposal even more...

> I wonder if I need to use LWLockAcquireOrWait().

I'm not seeing how/why that might help?

Thanks,

Stephen
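
To make the intuition behind "extending by more than 2 blocks" concrete, here is a minimal sketch of a toy model. It is not PostgreSQL code: `count_extension_locks` and `extend_by` are hypothetical names, and the model assumes that every extension takes the lock exactly once and that pre-extended blocks are handed out without it. It only counts how often the lock is taken; it says nothing about how long a backend waits for it, so it cannot explain why larger pre-extensions stopped helping in the tests described above.

```python
def count_extension_locks(blocks_needed, extend_by):
    """Count lock acquisitions in a toy model of relation extension.

    blocks_needed: total number of new blocks the writers consume.
    extend_by:     blocks added per extension (1 = no pre-extension,
                   2 = "always pre-extend by 1 extra block").
    Both parameters are illustrative assumptions, not PostgreSQL settings.
    """
    if extend_by < 1:
        raise ValueError("extend_by must be at least 1")

    free_blocks = 0   # pre-extended blocks not yet handed out
    lock_count = 0    # times the (modelled) extension lock was taken

    for _ in range(blocks_needed):
        if free_blocks == 0:
            # No spare block available: take the lock and extend.
            lock_count += 1
            free_blocks = extend_by
        # Hand one block to a writer.
        free_blocks -= 1

    return lock_count


if __name__ == "__main__":
    for extend_by in (1, 2, 8, 64):
        print(extend_by, count_extension_locks(1000, extend_by))
```

In this model, extending by 2 blocks halves the number of lock acquisitions relative to extending by 1, which is in line with the "about half the contention" observation, and larger values reduce the count further. That larger pre-extensions did not help in practice suggests the real cost lies somewhere the model does not capture.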