Re: Monitoring Object access - Mailing list pgsql-general

From adi hirschtein
Subject Re: Monitoring Object access
Date
Msg-id AANLkTik2-DbRDxmGSMuRv4Axm=wry8hhpJoHJOq50n+A@mail.gmail.com
Whole thread Raw
In response to Re: Monitoring Object access  (Craig Ringer <craig@postnewspapers.com.au>)
Responses Re: Monitoring Object access  (Craig Ringer <craig@postnewspapers.com.au>)
Re: Monitoring Object access  (Satoshi Nagayasu <satoshi.nagayasu@gmail.com>)
List pgsql-general
Hi Craig,

Thanks a lot for the quick response!
I'm coming from the Oracle side of the house and In oracle for instance, you use shared buffer as well, but you are still able to see which session is waiting for which blocks
and if one session is doing the "real" I/O then the other one wait on 'wait for other session" event so you are able to know who did the actual I/O
the reason behind it is that you want to check which objects is being heavily hit by which  business processes or users and then tier your storage accordingly.
I agree with your point about the OS buffer cache, I need to monitor it as well.
is there any place rather than pg_stat_activity that you think I should take a look at?

Best Regard,
Adi

On Sun, Sep 12, 2010 at 4:04 PM, Craig Ringer <craig@postnewspapers.com.au> wrote:
On 09/12/2010 06:52 PM, adi hirschtein wrote:
Hi,
Using the catalog tables, is there any way to correlate session id/user
id to which object (i.e. tables, indexes etc) it access and much how
disk reads or I/O wait has been done against the objects.
in general, I'd like to see which objects are being accessed by which
user and the time/amount of I/O wait/reads.

There isn't really anything like that, no. You have pg_stat_activity, but it's a pretty coarse tool.

The shared buffer cache and the use of things like synchronized sequential scans means that it wouldn't even be possible to truly track who's causing load reliably. As I understand it, if Joe and Fred both to a "SELECT * FROM sometable", it's quite likely that only one of the backends will actually appear to read anything from disk. Which one is pretty much luck.

Adding something like this would require instrumenting not only the backends' disk I/O code, but also their shared buffer access code. It'd potentially add a lot of overhead, and I'm not sure the results would mean very much because the caching effects would make "fred's backend did X I/O operations" less meaningful. Fred's might just have been the first that ran after Joe's giant seqscan cross join of two big tables that forced everything else out of shared_buffers.

Even if you did have that instrumentation, you'd need OS-level instrumentation (dtrace, prof, etc) to track the OS's buffer cache, which PostgreSQL relies on heavily. Without that you can't tell the difference between a query that caused I/O calls from postgresql but could be satisfied by OS buffer cache and one that required expensive physical disk I/O to satisfy.

Really, I don't know if it's realistic to do what you're suggesting unless every user's data set is isolated from every other user's, in which case you can maybe use OS-level tools like prof or DTrace to achieve it if you set postgresql up to log whichstarts which backend pid.

If there's any significant overlap in the data sets used by the users (if they use the same databases or schema) I wouldn't think you'd be able to get any meaningful results.

--
Craig Ringer

pgsql-general by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Incrementally Updated Backups
Next
From: Tom Lane
Date:
Subject: Re: why can't see the updated value after SPI_execute("update ....", false, 1);