Re: alternative back-end block formats - Mailing list pgsql-hackers

From	Christian Convey
Subject	Re: alternative back-end block formats
Date	January 27, 2014 21:42:35
Msg-id	CAPfS4ZzwxnQuYjEBnmd0eiYW3t85o4YOvGXfqK=AcNOgKc77rQ@mail.gmail.com
In response to	Re: alternative back-end block formats (Craig Ringer <craig@2ndquadrant.com>)
Responses	Re: alternative back-end block formats (Cédric Villemain <cedric@2ndquadrant.com>)
List	pgsql-hackers

Hi Craig,

On Sun, Jan 26, 2014 at 5:47 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

> On 01/21/2014 07:43 PM, Christian Convey wrote:
> > Hi all,
> >
> > I'm playing around with Postgres, and I thought it might be fun to
> > experiment with alternative formats for relation blocks, to see if I can
> > get smaller files and/or faster server performance.
>
> It's not clear how you'd do this without massively rewriting the guts of Pg.
>
> Per the docs on internal structure, Pg has a block header, then tuples
> within the blocks, each with a tuple header and list of Datum values for
> the tuple. Each Datum has a generic Datum header (handling varlena vs
> fixed length values etc) then a type-specific on-disk representation
> controlled by the type output function for that type.

I'm still in the process of getting familiar with the pg backend code, so I don't have a concrete plan yet. However, I'm working on the assumption that some set of macros and functions encapsulates the page layout.

If/when I tackle this, I expect to add a layer of indirection somewhere around that boundary, so that some non-catalog tables, whose schemas meet certain simplifying assumptions, are read and modified using specialized code.
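
As a rough illustration of the kind of indirection described above, here is a purely hypothetical sketch in C. None of these names (PageFormatOps, TableInfo, choose_format, and so on) exist in PostgreSQL, and the toy page layout is not Pg's real one. It only shows the shape of the idea: a table of function pointers per page format, with a specialized fixed-width format chosen for non-catalog tables that meet a simplifying assumption (here, all columns are int32).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192u            /* assumed page size for this sketch */

/* Toy page header; NOT PostgreSQL's real page header. */
typedef struct PageHeader {
    uint16_t nrows;                /* rows stored on the page */
    uint16_t used;                 /* bytes used, including this header */
} PageHeader;

/* Hypothetical per-format operations: the "layer of indirection". */
typedef struct PageFormatOps {
    const char *name;
    void (*init)(uint8_t *page);
    int  (*append)(uint8_t *page, const int32_t *row, int ncols);
    int  (*read)(const uint8_t *page, int idx, int32_t *row, int ncols);
} PageFormatOps;

/* Hypothetical table description used to pick a format. */
typedef struct TableInfo { int is_catalog; int all_fixed_width; int ncols; } TableInfo;

static PageHeader get_hdr(const uint8_t *page) { PageHeader h; memcpy(&h, page, sizeof h); return h; }
static void put_hdr(uint8_t *page, PageHeader h) { memcpy(page, &h, sizeof h); }

static void page_init(uint8_t *page) {
    PageHeader h = {0, (uint16_t) sizeof(PageHeader)};
    memset(page, 0, PAGE_SIZE);
    put_hdr(page, h);
}

/* Generic layout: every row carries its own 2-byte length prefix. */
static int generic_append(uint8_t *page, const int32_t *row, int ncols) {
    PageHeader h = get_hdr(page);
    uint16_t len = (uint16_t) (ncols * sizeof(int32_t));
    if (h.used + sizeof len + len > PAGE_SIZE) return -1;   /* page full */
    memcpy(page + h.used, &len, sizeof len);
    memcpy(page + h.used + sizeof len, row, len);
    h.used = (uint16_t) (h.used + sizeof len + len);
    h.nrows++;
    put_hdr(page, h);
    return h.nrows - 1;
}

static int generic_read(const uint8_t *page, int idx, int32_t *row, int ncols) {
    PageHeader h = get_hdr(page);
    size_t off = sizeof(PageHeader);
    uint16_t len;
    if (idx < 0 || idx >= h.nrows) return -1;
    for (int i = 0; i < idx; i++) {          /* walk the length prefixes */
        memcpy(&len, page + off, sizeof len);
        off += sizeof len + len;
    }
    memcpy(&len, page + off, sizeof len);
    if (len != ncols * sizeof(int32_t)) return -1;
    memcpy(row, page + off + sizeof len, len);
    return 0;
}

/* Specialized layout: no per-row header, rows found by arithmetic. */
static int fixed_append(uint8_t *page, const int32_t *row, int ncols) {
    PageHeader h = get_hdr(page);
    size_t len = ncols * sizeof(int32_t);
    if (h.used + len > PAGE_SIZE) return -1;                /* page full */
    memcpy(page + h.used, row, len);
    h.used = (uint16_t) (h.used + len);
    h.nrows++;
    put_hdr(page, h);
    return h.nrows - 1;
}

static int fixed_read(const uint8_t *page, int idx, int32_t *row, int ncols) {
    PageHeader h = get_hdr(page);
    size_t len = ncols * sizeof(int32_t);
    if (idx < 0 || idx >= h.nrows) return -1;
    memcpy(row, page + sizeof(PageHeader) + (size_t) idx * len, len);
    return 0;
}

static const PageFormatOps generic_ops = {"generic", page_init, generic_append, generic_read};
static const PageFormatOps fixed_ops   = {"fixed-width", page_init, fixed_append, fixed_read};

/* Catalog tables always keep the generic layout in this sketch. */
const PageFormatOps *choose_format(const TableInfo *t) {
    assert(t != NULL);
    if (!t->is_catalog && t->all_fixed_width) return &fixed_ops;
    return &generic_ops;
}

uint16_t page_bytes_used(const uint8_t *page) { return get_hdr(page).used; }
```

In this toy version a caller would obtain the operations once per table via choose_format() and then go through init/append/read without knowing which layout is behind them. Whether the real backend has a clean enough boundary to allow anything like this is exactly the open question.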

I'd rather not get into the specific optimizations I'd like to try just yet: I haven't fully studied the code, and I don't want to put my foot in my mouth.

> What concrete problem do you mean to tackle? What idea do you want to
> explore or implement?

My real motivation is that I'd like to get more familiar with the pg backend codebase, and tilting at this windmill seemed like an interesting way to accomplish that.

If I were focused on solving a real-world problem, I'd say that this lays the groundwork for table-schema-specific storage optimizations and optimized record-filtering code. But I'd only make that argument if I planned to (a) perform a careful study with statistically significant benchmarks, and/or (b) produce a merge-worthy patch. At this point I have no intention of doing either. My main goal really is just to have fun with the code.

> > Does anyone know if this has been done before with Postgres? I would
> > have assumed yes, but I'm not finding anything in Google about people
> > having done this.
>
> AFAIK (and I don't know much in this area) the storage manager isn't
> very pluggable compared to the rest of Pg.

Thanks for the warning. Duly noted.

Kind regards,

Christian
