Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables? - Mailing list pgsql-hackers
From | dg@illustra.com (David Gould) |
---|---|
Subject | Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables? |
Date | |
Msg-id | 9803122200.AA28389@hawk.illustra.com |
In response to | Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables? (Bruce Momjian <maillist@candle.pha.pa.us>) |
Responses |
Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables?
Re: [HACKERS] Re: [QUESTIONS] Does Storage Manager support >2GB tables? |
List | pgsql-hackers |
Bruce Momjian writes:
> I am adding this to the TODO list:
>
> * Do async I/O to do better read-ahead of data

Good.

> Because we are not threaded, we really can't do anything else while we
> are waiting for I/O, but we can pre-request data we know we will need.

Threading is a bit like raw devices. It sounds like a really good idea, particularly with M$ banging the "NT, now with threads" drum, but in real life there are some very good reasons not to thread. Particularly with an extensible product like Postgres where J-Random routine gets loaded at runtime. In a threaded system, J-Random routine needs to be pretty well perfect or the whole system comes down. In a process based system, unless it trashes something in the shared memory, only the one connection instance needs to come down. My experience with Illustra says that this is fairly important.

The other big problem with threading is that now the stacks and all dynamically allocated data are in the shared memory and are not easily extendable. So, if some recursive procedure (like in the rewriter) uses a bit of extra stack some other thread gets its stack trashed. This is never pretty.

Or if some user function loads a giant large object (like an mpeg say), that memory has to come out of the shared memory, now if they leak that memory, it is gone for good. In a per process system, it just ends up using a little more swap space.

The other thing threading does is introduce new synchronization requirements into things that never needed it before. For example, Postgres is pretty free with memory allocation and deallocation (think Nodes!). With threading each palloc() and pfree() is going to have to take a round trip through a mutex. This makes an expensive operation even more costly. By and large, the dbms's that are threaded have had pretty static (ie pre-allocate every thing in arrays at boot time) memory models. Postgres does not fit this picture very well.
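
To make the palloc()/pfree() point concrete, here is a minimal sketch in Python. It is not Postgres code: `PrivatePool`, `SharedPool`, `churn` and `run_threaded` are hypothetical names invented for illustration, and the "allocator" only counts live allocations. It assumes the simplest possible threaded design, one allocator shared by all threads, and shows that every allocation and every free then has to pass through a lock that the per-process version never needs.

```python
import threading


class PrivatePool:
    """Per-process style allocator (hypothetical): one owner, no locking."""

    def __init__(self):
        self.live = 0  # number of outstanding allocations

    def alloc(self):
        self.live += 1

    def free(self):
        self.live -= 1


class SharedPool:
    """Threaded style allocator (hypothetical): shared state behind a mutex."""

    def __init__(self):
        self.live = 0
        self.lock = threading.Lock()
        self.lock_trips = 0  # how many times the mutex was taken

    def alloc(self):
        with self.lock:  # round trip through the mutex on every allocation
            self.lock_trips += 1
            self.live += 1

    def free(self):
        with self.lock:  # and again on every free
            self.lock_trips += 1
            self.live -= 1


def churn(pool, n):
    """Allocate and free n short-lived objects, like building and dropping Nodes."""
    for _ in range(n):
        pool.alloc()
        pool.free()


def run_threaded(workers, n):
    """Run `workers` threads that all churn against one shared pool."""
    pool = SharedPool()
    threads = [threading.Thread(target=churn, args=(pool, n)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool
```

With 4 workers each doing 1000 allocate/free pairs, the shared pool takes the lock 8000 times. The private pool does the same bookkeeping with no lock at all, which is the extra cost described above.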

Ultimately, threading may buy some performance, but not very much compared to how hard it is to get right and how brittle it tends to make the system.

Unless I have misunderstood the state of Postgres, there is a vast amount of performance improvement to be had without even thinking about threading. If it were me, I would pick up the easy stuff, then the middle-hard stuff with the really big wins like a proper transaction log, and leave the very hard stuff like threading until last.

-dg

David Gould dg@illustra.com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
"Of course, someone who knows more about this will correct me if I'm wrong,
and someone who knows less will correct me if I'm right."
--David Palmer (palmer@tybalt.caltech.edu)