Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes - Mailing list pgsql-bugs

From Greg Smith
Subject Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes
Date
Msg-id 4DFC3E7B.9060300@2ndQuadrant.com
In response to Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes  (Антон Степаненко <zlobnynigga@yandex.ru>)
Responses Re: could not read block XXXXX in file "base/YYYYY/ZZZZZZ": read only 160 of 8192 bytes
List pgsql-bugs
On 06/17/2011 04:47 PM, Антон Степаненко wrote:
> Memory for shared buffers can not be oversubscribed - because if kernel
> did not provide enough shared memory postgres will not start.

The block is allocated at once.  But the amount of it that various
client backends end up touching varies as they run, slowly increasing
over time as they access more buffers.  After running for a while, the
individual processes will look like this:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  2645 gsmith    20   0 12.3g 5.1g 5.1g D   45 32.8  16:59.19 postgres: gsmith pgbench [local] SELECT

Their virtual memory size ends up slightly larger than shared_buffers.

I tested this out on a Debian system here, set shared_buffers to 12GB
and beat on the server until every buffer was used by clients
(which is proven by how they've mapped the whole memory set in the
top output above).  It worked fine.
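
As a rough sketch of how the top line above can be read against
shared_buffers, the Python below converts the VIRT, RES and SHR columns to
bytes and compares them with the 12GB setting.  The helper names
(top_size_to_bytes, parse_top_line) are made up for illustration, and the
unit handling assumes top's default scaling, where a bare number is KiB
and the k/m/g suffixes are binary multiples.

```python
# Sketch only: helper names are hypothetical, and the unit handling assumes
# top's default memory scaling (bare numbers are KiB, suffixes are binary).

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def top_size_to_bytes(field):
    """Convert a top memory field such as '12.3g' or '5220' to bytes."""
    suffix = field[-1].lower()
    if suffix in UNITS:
        return int(float(field[:-1]) * UNITS[suffix])
    return int(field) * 1024


def parse_top_line(line):
    """Pull VIRT, RES and SHR (in bytes) out of one top process line."""
    cols = line.split()
    # Column order from the header: PID USER PR NI VIRT RES SHR ...
    return {name: top_size_to_bytes(cols[i])
            for name, i in (("VIRT", 4), ("RES", 5), ("SHR", 6))}


line = ("2645 gsmith 20 0 12.3g 5.1g 5.1g D 45 32.8 16:59.19 "
        "postgres: gsmith pgbench [local] SELECT")
mem = parse_top_line(line)
shared_buffers = 12 * 1024**3  # the 12GB setting from the test above

# VIRT covers the whole mapped shared block plus the backend's own memory.
print("VIRT larger than shared_buffers:", mem["VIRT"] > shared_buffers)
# SHR is the shared memory this one backend has actually touched so far.
print("share of shared_buffers touched by this backend: %.1f%%"
      % (100.0 * mem["SHR"] / shared_buffers))
```

For the line shown, VIRT (12.3g) is above the 12GB setting, while SHR
(5.1g) is only the part this one backend has touched so far.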

I suspect you're running into some sort of OpenVZ shared memory handling
bug.  The way it handles this is one of the more complicated, and
therefore likely to have odd failure cases, parts of the design.  There
are notes at http://wiki.openvz.org/Postgresql_and_shared_memory about
container-specific things to tune here, so maybe there's just a setting
to tweak you've missed so far.  I'm guessing you already went through
that though.

A quick look around shows there are far more regularly reported bugs
like this in OpenVZ than there are in PostgreSQL, and Ubuntu is not
known for bug-free release practices either.  You're probably chasing
after the wrong thing trying to find a database problem here.  You're
likely to end up in the same situation as the last one of these I remember:

http://archives.postgresql.org/pgsql-general/2009-10/msg00125.php
http://lists.debian.org/debian-kernel/2010/03/msg00401.html

...waiting for the OpenVZ problem that's the real cause to get fixed and
make its way to your distribution.

In your situation, I'd just use the smaller setting to avoid the known
problem, and try to focus my energy on finding a platform that isn't as
risky to deploy on instead.  Even if that's not your main deployment
platform, just having something on real hardware to compare against would be
extremely valuable for isolating the problem here.

(And that's without even considering that setting shared_buffers so high
on Linux is more likely to slow the server than speed it up, which you
said you didn't want to discuss.  Just pointing it out so no one else
gets the wrong idea from your configuration.)

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
