Re: [HACKERS] LONG - Mailing list pgsql-hackers

From wieck@debis.com (Jan Wieck)
Subject Re: [HACKERS] LONG
Date
Msg-id m11wlwh-0003kGC@orion.SAPserv.Hamburg.dsh.de
In response to Re: [HACKERS] LONG  (Bruce Momjian <pgman@candle.pha.pa.us>)
Bruce Momjian wrote:

> Should we use large objects for this, and beef them up.  Seems that
> would be a good way.  I have considered putting them in a hash
> bucket/directory tree for faster access to lots of large objects.
>
> There is a lot to say about storing long tuples outside the tables
> because long tuples fill cache buffers and make short fields longer to
> access.

    I thought to use a regular table. Of course, it will eat
    buffers, but managing external files or even large objects
    for it IMHO isn't that simple if you take transaction
    commit/abort and the MVCC problems into account too. And IMHO
    this is something that must be covered, because I meant to
    create a DATATYPE that can be used as a replacement for TEXT
    if that's too small, so it must behave as a regular datatype,
    without any restrictions WRT being able to rollback etc.

    Using LO or external files would need much more testing than
    creating one other shadow table (plus an index for it) at
    CREATE TABLE. This table would automatically get all the
    concurrency, MVCC and visibility handling for free. And it
    would automatically split into multiple files if growing
    very large, be vacuumed, ...
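    The shadow-table scheme described above could look roughly
    like the following DDL sketch. Note that the table name,
    column names and key layout here are my illustration of the
    idea, not a committed design:

    ```sql
    -- Hypothetical shadow table, created automatically alongside
    -- the user table "mytable" at CREATE TABLE time. Each long
    -- value is split into chunks keyed by a value identifier and
    -- a chunk sequence number.
    CREATE TABLE _long_mytable (
        value_id   oid,     -- identifies the long value a chunk belongs to
        chunk_seq  int4,    -- position of this chunk within the value
        chunk      text     -- the chunk data itself
    );

    -- Index so that fetching one long value in order is cheap.
    CREATE INDEX _long_mytable_idx
        ON _long_mytable (value_id, chunk_seq);
    ```

    Because the chunks live in an ordinary heap table, they get
    MVCC visibility, rollback and VACUUM behavior without any
    extra machinery.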

    Let me do it this way for 7.0, and then let's collect some
    feedback and our own experience with it. For 8.0 we can
    discuss again whether doing it the hard way would be worth
    the effort.

> We use 8K blocks because that is the base size for most file systems.
> When we fsync an 8k buffer, the assumption is that that buffer is
> written in a single write to the disk.  Larger buffers would be spread
> over the disk, making a single fsync() impossible to be atomic, I think.
>
> Also, larger buffers take more cache space per buffer, making the
> buffer cache more coarse, holding fewer buffers.

    Maybe something to play with a little.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
