Re: Do all Postgres queries touch Shared_Buffers at some point? - Mailing list pgsql-general

From Shiv Sharma
Subject Re: Do all Postgres queries touch Shared_Buffers at some point?
Date
Msg-id CA+LWa-LuSYyp9DmOC3j5sPBABFDnUgBo-GG0=V6+JPOR9MZbZw@mail.gmail.com
Whole thread Raw
In response to Re: Do all Postgres queries touch Shared_Buffers at some point?  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-general
Thanks.  We are on Greenplum GP 4.2 (Postgres 8.2).  As per GP suggestions, we have 6 primary/6 mirror instances on each server. The server has 64 G RAM, and shared_buffers is at ...125 MB :-).  I suppose the idea is for the OS buffer cache to do the legwork.

But still...performance is at least "not bad".  If all HASH JOIN queries touch shared_buffers in some way, I find it non-intuitive that we can have concurrent hash queries involving big tables (100M+ joined with say 100K), all apparently using the 125MB shared_buffers in some way, and yet giving reasonable performance.

Basically what is the anatomy of a hash join involving large tables? Disk->Shared_Buffers->Hash Join areas? something like that?


On Sun, Dec 29, 2013 at 9:18 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Sun, Dec 29, 2013 at 9:05 PM, Shiv Sharma <shiv.sharma.1835@gmail.com> wrote:
> I am puzzled about the extent to which shared_bufferes is used for different
> queries.  Do _all_ queries "touch" shared buffers at some point of their
> execution?
>
> Many of our warehouse queries are  seq_scan followed by HASH.   I know
> work_mem is assigned for HASH joins:  but does this mean that these queries
> never touch shared buffers at *all* during their execution?  Perhaps they
> are read into shared_buffers and then passed into work_mem HASH areas???
>
> What about updates on big tables? What about inserts  on big tables? What
> about append-inserts?
>
> I think I could get these answers from Explain Analyze Buffers but I am on
> 8.2 :-(
>
> Please tell me which queries use/touch shared_buffers in general terms, or
> please point me to documentation.
All your queries that interacts with relations.

shared_buffers is used for data caching across all the backends of the
server, to put it simply pages of the relation involved. Such data can
be relation data, like data of a table you defined yourself, index
data, or some system catalog data, containing definitions of the
database objects. So simply everything that is a relation and contains
physical data might be in shared buffers. Views for example do not
enter in this category.

You could for example use pg_buffercache to have a look at what
contains the shared buffers:
http://www.postgresql.org/docs/devel/static/pgbuffercache.html

Regards,
--
Michael

pgsql-general by date:

Previous
From: Sameer Kumar
Date:
Subject: Re: PG replication across DataCenters
Next
From: Joe Van Dyk
Date:
Subject: Re: Replication failed after stalling