Thread: Database block lifecycle
Yesterday I had an interesting discussion with my colleague about the shared buffers size for our new server. This machine (dedicated to the database) has 512GB of RAM and the database size is about 80GB, so he assumes the db will never have to read from disk, and that there is therefore no point adjusting the read-ahead setting, because every block is going to be read from RAM. As I've read in Greg Smith's book, once a block is changed it will be written to disk and the buffer page is marked as clean, which would mean that changes occur in the same page as before? What if a dirty page doesn't have enough space for another row and the row has to be written to another page? Does that still occur in RAM? If so, do all updates of the FSM occur in RAM as well?

What about buffers_clean and pg_clog then? Are those maintained completely in RAM as well, without any direct reads from disk at all?

To be precise, does the path to update and then read an updated row look like a) or b)?
a) clean page (shared buffers) -> dirty page (shared buffers) -> to disk -> read from disk -> shared buffers -> query
b) clean page (shared buffers) -> dirty page (shared buffers) -> to disk & dirty page (shared buffers) -> clean page (shared buffers) -> query

btw. is 512MB a reasonable value if we assume up to 600 connections?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Database-block-lifecycle-tp5814627.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
On 8/12/2014 2:41 PM, pinker wrote:
> btw. is 512MB a reasonable value if we assume up to 600 connections?

That's an insanely high connection count if you actually expect those connections to be executing concurrent queries, unless you have something north of 100 CPU cores.

You'd be much better off with a MUCH smaller connection count, using a connection pooler such as pgbouncer in transaction mode: let 600 client threads connect to the pooler, but have the pooler share maybe 4X your CPU core/thread count of actual connections for transactions in progress.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
Yes, I know the count is quite high. It is the max value we've estimated; on an average day it will probably be 100-200, and yes, we use pgpool.

On 13.08.2014 00:09, John R Pierce wrote:
> On 8/12/2014 2:41 PM, pinker wrote:
>> btw. is 512MB a reasonable value if we assume up to 600 connections?
>
> That's an insanely high connection count if you actually expect those
> connections to be executing concurrent queries, unless you have
> something north of 100 CPU cores.
>
> You'd be much better off with a MUCH smaller connection count, using
> a connection pooler such as pgbouncer in transaction mode: let 600
> client threads connect to the pooler, but have the pooler share maybe
> 4X your CPU core/thread count of actual connections for transactions
> in progress.
On 8/12/2014 3:29 PM, pinker wrote:
> Yes, I know the count is quite high. It is the max value we've
> estimated; on an average day it will probably be 100-200, and yes,
> we use pgpool.

If you're using a pooler, then why would you be running 200 concurrent connections, unless you have 50 or 100 CPU cores/threads?

If you have 1000 transactions to execute on a 32-core server and you try to do 200 at once, it will take longer than if you do 64 at a time and let the rest queue up.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
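The pooling arithmetic above can be sketched in a few lines. This is a back-of-the-envelope illustration of John's 4X rule of thumb, not a measured recommendation; the core count and multiplier are assumptions for the example:

```python
# Back-of-the-envelope pool sizing following the heuristic above:
# actual server connections ~= 4x the CPU core/thread count, with
# surplus clients queuing at the pooler. Figures are illustrative.

def suggested_pool_size(cpu_threads: int, multiplier: int = 4) -> int:
    """Active backend connections the pooler should hand out."""
    return cpu_threads * multiplier

client_threads = 600                            # clients at the pooler
server_pool = suggested_pool_size(cpu_threads=32)

print(server_pool)                   # 128 real backend connections
print(client_threads - server_pool)  # 472 clients queued at peak
```

The point of the design is that the 472 queued clients wait milliseconds at the pooler instead of all 600 thrashing the CPU scheduler at once.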
OK, I wasn't precise enough; you are right. It's a brand new server, nothing is configured yet and we don't even have the OS installed. The number was the overall count we expect for the whole cluster.

But the main question is: is it possible to completely avoid disk reads if there is a huge amount of RAM available?

On 13.08.2014 00:39, John R Pierce wrote:
> On 8/12/2014 3:29 PM, pinker wrote:
>> Yes, I know the count is quite high. It is the max value we've
>> estimated; on an average day it will probably be 100-200, and yes,
>> we use pgpool.
>
> If you're using a pooler, then why would you be running 200 concurrent
> connections, unless you have 50 or 100 CPU cores/threads?
>
> If you have 1000 transactions to execute on a 32-core server and you
> try to do 200 at once, it will take longer than if you do 64 at a
> time and let the rest queue up.
On 8/12/2014 3:52 PM, pinker wrote:
> OK, I wasn't precise enough; you are right. It's a brand new server,
> nothing is configured yet and we don't even have the OS installed.
> The number was the overall count we expect for the whole cluster.
>
> But the main question is: is it possible to completely avoid disk
> reads if there is a huge amount of RAM available?

The OS file cache will ensure that.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast
> Yesterday I had an interesting discussion with my colleague about the
> shared buffers size for our new server. This machine (dedicated to the
> database) has 512GB of RAM and the database size is about 80GB, so he
> assumes the db will never have to read from disk, and that there is
> therefore no point adjusting the read-ahead setting, because every block
> is going to be read from RAM. As I've read in Greg Smith's book, once a
> block is changed it will be written to disk and the buffer page is marked
> as clean, which would mean that changes occur in the same page as before?
> What if a dirty page doesn't have enough space for another row and the
> row has to be written to another page? Does that still occur in RAM? If
> so, do all updates of the FSM occur in RAM as well?
>
> What about buffers_clean and pg_clog then? Are those maintained
> completely in RAM as well, without any direct reads from disk at all?

As long as they are in shared buffers, they are read from the buffers, not from disk.

> To be precise, does the path to update and then read an updated row look
> like a) or b)?
> a) clean page (shared buffers) -> dirty page (shared buffers) -> to disk
>    -> read from disk -> shared buffers -> query
> b) clean page (shared buffers) -> dirty page (shared buffers) -> to disk
>    & dirty page (shared buffers) -> clean page (shared buffers) -> query

I'm not sure what exactly you mean by a) or b), but both look incorrect to me. A reader can read a page from shared buffers even if it's dirty. So:

clean page (shared buffers) -> dirty page (shared buffers) -> query

will be closer to reality. Note that the dirty page will be written out by the bgwriter process at a different time. Also note that I've completely ignored locking and the buffer replacement algorithm. Please read src/backend/storage/buffer/README for more precise information.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
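The corrected read path can be sketched as a toy model. All names and structures here are invented for illustration; the real buffer manager (pins, locks, clock-sweep eviction, WAL) is far more involved, as src/backend/storage/buffer/README describes. The point the sketch makes is that a reader is served straight from shared buffers even while the page is dirty, and the bgwriter flushes on its own schedule:

```python
# Toy model of the page lifecycle: a reader never waits for a dirty
# page to be flushed; it reads straight from shared buffers, and the
# bgwriter cleans the page asynchronously at a different time.

class BufferPage:
    def __init__(self, data):
        self.data = data
        self.dirty = False

disk = {"page0": "row v1"}      # stand-in for the data files
shared_buffers = {}
disk_reads = 0

def read_page(page_id):
    """Return page contents, hitting disk only on a buffer miss."""
    global disk_reads
    if page_id not in shared_buffers:        # cold cache: one disk read
        disk_reads += 1
        shared_buffers[page_id] = BufferPage(disk[page_id])
    return shared_buffers[page_id].data      # dirty or clean, no flush

def update_page(page_id, new_data):
    page = shared_buffers[page_id]
    page.data = new_data
    page.dirty = True                        # modified in RAM only

def bgwriter_pass():
    """Flush dirty pages to disk and mark them clean."""
    for page_id, page in shared_buffers.items():
        if page.dirty:
            disk[page_id] = page.data
            page.dirty = False

read_page("page0")                           # first touch: disk read
update_page("page0", "row v2")               # dirty in shared buffers
assert read_page("page0") == "row v2"        # served dirty, from RAM
bgwriter_pass()                              # flushed later, separately
assert disk["page0"] == "row v2" and disk_reads == 1
```

The single disk read here corresponds to the initial warm-up; after that, both the update and the re-read happen entirely in RAM, matching the path Ishii-san gives above.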
On Tuesday, August 12, 2014, pinker <pinker@onet.eu> wrote:
> Yesterday I had an interesting discussion with my colleague about the
> shared buffers size for our new server. This machine (dedicated to the
> database) has 512GB of RAM and the database size is about 80GB, so he
> assumes the db will never have to read from disk,
Do you ever plan on restarting this server? Doing maintenance? Applying security patches?
> and that there is therefore no point adjusting the read-ahead setting,
> because every block is going to be read from RAM. As I've read in Greg
> Smith's book, once a block is changed it will be written to disk and the
> buffer page is marked as clean, which would mean that changes occur in
> the same page as before? What if a dirty page doesn't have enough space
> for another row and the row has to be written to another page? Does that
> still occur in RAM? If so, do all updates of the FSM occur in RAM as well?
None of that should need to read from disk regularly once the database is warmed up.
> What about buffers_clean and pg_clog then? Are those maintained
> completely in RAM as well, without any direct reads from disk at all?
>
> To be precise, does the path to update and then read an updated row look
> like a) or b)?
> a) clean page (shared buffers) -> dirty page (shared buffers) -> to disk
>    -> read from disk -> shared buffers -> query
> b) clean page (shared buffers) -> dirty page (shared buffers) -> to disk
>    & dirty page (shared buffers) -> clean page (shared buffers) -> query
More like b), but you are missing all the states that involve "clean in shared_buffers, dirty in FS cache" and such.
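The extra layer Jeff mentions can be sketched as a sequence of (shared_buffers state, FS cache state) pairs. This is only an illustrative enumeration under the assumption that the bgwriter's write() lands in the OS page cache first and a later fsync (e.g. at checkpoint) makes it durable; it is not an exhaustive state machine:

```python
# Sketch of the extra states between "dirty in shared_buffers" and
# "durable on disk": the write() out of shared buffers first lands
# dirty in the OS page cache; only a later fsync forces it to storage.
# State names are invented for illustration.

states = []

def mark(shared_buffers_state, fs_cache_state):
    states.append((shared_buffers_state, fs_cache_state))

mark("clean", "absent")   # page read into shared_buffers
mark("dirty", "absent")   # UPDATE modifies it in RAM
mark("clean", "dirty")    # bgwriter write(): now dirty in the FS cache
mark("clean", "clean")    # fsync at checkpoint: durable on disk

print(states)
```

Throughout all four states, a query reads the page from shared buffers; the FS-cache column only matters for durability and for when physical I/O actually happens.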
> btw. is 512MB a reasonable value if we assume up to 600 connections?
Reasonable value for what?
Cheers,
Jeff
Jeff Janes wrote:
> On Tuesday, August 12, 2014, pinker <pinker@onet.eu> wrote:
>
> Do you ever plan on restarting this server? Doing maintenance? Applying
> security patches?

Sure, I meant when the db is up and running, of course after the first read from disk, when all the data should be in RAM.

> More like b), but you are missing all the states that involve "clean in
> shared_buffers, dirty in FS cache" and such.

OK, so a modified block is taken from shared_buffers or from RAM when needed, and is always read from shared buffers?

>> btw. is 512MB a reasonable value if we assume up to 600 connections?
>
> Reasonable value for what?

For normal server load.
On Wed, Aug 13, 2014 at 1:07 AM, pinker <pinker@onet.eu> wrote:
>>> btw. is 512MB a reasonable value if we assume up to 600 connections?
>>
>> Reasonable value for what?
>
> For normal server load.
512MB is being questioned as a reasonable value for what? shared_buffers? work_mem? maintenance_work_mem?
Cheers,
Jeff
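Jeff's question is not pedantry: the same 512MB figure is harmless or catastrophic depending on which parameter it names. A rough worst-case sketch, assuming 512MB were work_mem with 600 connections each running a single sort or hash at once (in practice a query can allocate work_mem several times over, so this is a lower bound on the worst case):

```python
# Why "reasonable value for what?" matters: the same 512MB is a single
# allocation as shared_buffers, but multiplies per connection as
# work_mem. Figures illustrative.

MB = 1024 ** 2
GB = 1024 ** 3

ram = 512 * GB
setting = 512 * MB
connections = 600

# As work_mem: every connection may allocate it for each sort/hash.
worst_case_work_mem = setting * connections
print(worst_case_work_mem / GB)   # 300.0 -- most of the machine's RAM

# As shared_buffers: one shared allocation, small next to an ~80GB db.
print(setting / GB)               # 0.5
```

So 512MB of work_mem across 600 connections can theoretically claim 300GB of the 512GB machine, while 512MB of shared_buffers would cache well under 1% of it; the answer to "is it reasonable?" is opposite in the two cases.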
On 13/08/2014 17:23, Jeff Janes wrote:
> On Wed, Aug 13, 2014 at 1:07 AM, pinker <pinker@onet.eu> wrote:
>>>> btw. is 512MB a reasonable value if we assume up to 600 connections?
>>>
>>> Reasonable value for what?
>>
>> For normal server load.
>
> 512MB is being questioned as a reasonable value for what?
> shared_buffers? work_mem? maintenance_work_mem?
>
> Cheers,
>
> Jeff
Generally speaking, folk imagine that DBMS performance is all about disk access; in reality, throwing as much memory as possible at the server(s) is an optimal investment. Analyse your queries and store time-critical stuff in memory.
R+C