Re: Reducing the size of BufferTag & remodeling forks - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Reducing the size of BufferTag & remodeling forks
Date
Msg-id CANP8+jL4NTXr=b+1iK9P=z_fhBxuW0=YW7jUA2bFJPsq-cuKnQ@mail.gmail.com
In response to Reducing the size of BufferTag & remodeling forks  (Andres Freund <andres@anarazel.de>)
Responses Re: Reducing the size of BufferTag & remodeling forks  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 2 July 2015 at 14:36, Andres Freund <andres@anarazel.de> wrote:
Hi,

I've complained a number of times that our BufferTag is ridiculously
large:
typedef struct buftag
{
    RelFileNode rnode;          /* physical relation identifier */
    ForkNumber  forkNum;
    BlockNumber blockNum;       /* blknum relative to begin of reln */
} BufferTag;

typedef struct RelFileNode
{
    Oid         spcNode;        /* tablespace */
    Oid         dbNode;         /* database */
    Oid         relNode;        /* relation */
} RelFileNode;

That amounts to 20 bytes. That's problematic because we frequently have
to compare or hash the entire buffer tag. Comparing 20 bytes is rather
branch-intensive, and shows up noticeably in profiles.  It's also a
stumbling block on the way to a smarter buffer mapping data structure,
because it makes e.g. trees rather deep.
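
To make the 20-byte figure concrete, here is a minimal sketch that lays out the two structs with stand-in typedefs and checks their sizes at compile time. The typedefs are assumptions for illustration: Oid and BlockNumber are taken to be 32-bit unsigned integers, and ForkNumber (an enum in the real headers) is replaced by a 32-bit integer. buffer_tags_equal is a hypothetical helper, not the comparison PostgreSQL actually uses; it only shows why a full-tag comparison touches five separate words.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in typedefs (assumptions, not the real PostgreSQL headers). */
typedef uint32_t Oid;
typedef uint32_t BlockNumber;
typedef int32_t  ForkNumber;    /* an enum in the real code */

typedef struct RelFileNode
{
    Oid         spcNode;        /* tablespace */
    Oid         dbNode;         /* database */
    Oid         relNode;        /* relation */
} RelFileNode;

typedef struct buftag
{
    RelFileNode rnode;          /* physical relation identifier */
    ForkNumber  forkNum;
    BlockNumber blockNum;       /* blknum relative to begin of reln */
} BufferTag;

/* Five 32-bit fields, no padding: 12 + 4 + 4 = 20 bytes. */
static_assert(sizeof(RelFileNode) == 12, "RelFileNode should be 12 bytes");
static_assert(sizeof(BufferTag) == 20, "BufferTag should be 20 bytes");

/*
 * Hypothetical field-by-field equality check. Each && is a potential
 * branch, so one comparison can cost up to five compare-and-branch steps.
 */
static int
buffer_tags_equal(const BufferTag *a, const BufferTag *b)
{
    return a->rnode.relNode == b->rnode.relNode &&
           a->blockNum == b->blockNum &&
           a->rnode.dbNode == b->rnode.dbNode &&
           a->rnode.spcNode == b->rnode.spcNode &&
           a->forkNum == b->forkNum;
}
```

Two tags that differ only in dbNode compare as unequal here, which is the case a shorter key would have to handle some other way.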

The buffer tag is currently used in two situations:

1) Dealing with the buffer mapping, we need to identify the underlying
   file uniquely and we need the block number (8 bytes).

2) When writing out a block we need, in addition to 1), to have
   information about where to store the file. That requires the
   tablespace and database.

You may know that a filenode (RelFileNode->relNode) is currently *not*
unique across databases and tablespaces.

Why do we have to do buffer lookups using the full buffer tag?

Why not just use (relNode, blockNum) and resolve hash collisions, if any?

Your suggestion to avoid hashing the whole buffer tag was a good one. Having a permanent table to produce a smaller tag is a fairly pessimistic solution; why not just have an optimistic solution in memory instead? 
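
One way to read the optimistic idea is sketched below: hash only (relNode, blockNum) to pick a slot, then compare the full tag stored in the slot to sort out the rare collisions. This is a toy open-addressing table, not PostgreSQL's buffer mapping code. The typedefs, NBUCKETS, short_hash, toy_insert and toy_lookup are all hypothetical names made up for the illustration, and the hash multiplier is an arbitrary choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in typedefs (assumptions, not the real PostgreSQL headers). */
typedef uint32_t Oid;
typedef uint32_t BlockNumber;
typedef int32_t  ForkNumber;

typedef struct RelFileNode { Oid spcNode, dbNode, relNode; } RelFileNode;
typedef struct buftag
{
    RelFileNode rnode;
    ForkNumber  forkNum;
    BlockNumber blockNum;
} BufferTag;

#define NBUCKETS 64             /* toy size, must be a power of two */

typedef struct Entry { BufferTag tag; int buf_id; bool used; } Entry;
static Entry table[NBUCKETS];

/* Hash only the 8 bytes that usually distinguish one block from another. */
static uint32_t
short_hash(Oid relNode, BlockNumber blockNum)
{
    uint64_t k = ((uint64_t) relNode << 32) | blockNum;

    k *= UINT64_C(0x9E3779B97F4A7C15);  /* arbitrary odd multiplier */
    return (uint32_t) (k >> 32);
}

static bool
tag_equal(const BufferTag *a, const BufferTag *b)
{
    return a->rnode.relNode == b->rnode.relNode &&
           a->blockNum == b->blockNum &&
           a->rnode.dbNode == b->rnode.dbNode &&
           a->rnode.spcNode == b->rnode.spcNode &&
           a->forkNum == b->forkNum;
}

/* Insert or update; returns false only if the toy table is full. */
static bool
toy_insert(const BufferTag *tag, int buf_id)
{
    assert(tag != NULL);
    uint32_t i = short_hash(tag->rnode.relNode, tag->blockNum) & (NBUCKETS - 1);

    for (int n = 0; n < NBUCKETS; n++)
    {
        if (!table[i].used || tag_equal(&table[i].tag, tag))
        {
            table[i].tag = *tag;
            table[i].buf_id = buf_id;
            table[i].used = true;
            return true;
        }
        i = (i + 1) & (NBUCKETS - 1);   /* collision: linear probe */
    }
    return false;
}

/* Returns the buffer id, or -1 if the tag is not in the table. */
static int
toy_lookup(const BufferTag *tag)
{
    assert(tag != NULL);
    uint32_t i = short_hash(tag->rnode.relNode, tag->blockNum) & (NBUCKETS - 1);

    for (int n = 0; n < NBUCKETS; n++)
    {
        if (!table[i].used)
            return -1;
        if (tag_equal(&table[i].tag, tag))  /* full tag settles collisions */
            return table[i].buf_id;
        i = (i + 1) & (NBUCKETS - 1);
    }
    return -1;
}
```

Two relations that share a relNode in different databases land on the same starting slot, and the full-tag comparison during probing keeps them apart. The full tag is still stored and compared, so this sketch only shortens the hashing step, not the tag itself.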

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
