Re: Big 7.1 open items - Mailing list pgsql-hackers

From JanWieck@t-online.de (Jan Wieck)
Subject Re: Big 7.1 open items
Date
Msg-id 200006180010.CAA07405@hot.jw.home
Whole thread Raw
In response to Re: Big 7.1 open items  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> JanWieck@t-online.de (Jan Wieck) writes:
> > Tom Lane wrote:
> >> It gets a little trickier if you want to be able to split
> >> multi-gig tables across several tablespaces, though, since
> >> you couldn't just append ".N" to the base table path in that
> >> scenario.
> >>
> >> I'd be interested to know what sort of facilities Oracle
> >> provides for managing huge tables...
>
> >     Oracle  tablespaces  are  a  collection of 1...n preallocated
> >     files.   Each  table  then  is  bound  to  a  tablespace  and
> >     allocates extents (chunks) from those files.
>
> OK, to get back to the point here: so in Oracle, tables can't cross
> tablespace boundaries, but a tablespace itself could span multiple
> disks?
   They can. The path in
       ALTER TABLESPACE <tsname> ADD DATAFILE ...
   can point to any location the db system has access to.

>
> Not sure if I like that better or worse than equating a tablespace
> with a directory (so, presumably, all the files within it live on
> one filesystem) and then trying to make tables able to span
> tablespaces.  We will need to do one or the other though, if we want
> to have any significant improvement over the current state of affairs
> for large tables.
>
> One way is to play the flip-the-path-ordering game some more,
> and access multiple-segment tables with pathnames like this:
>
>    .../TABLESPACE/RELATION       -- first or only segment
>    .../TABLESPACE/N/RELATION     -- N'th extension segment
>
> [...]
   In most cases all objects in one database are bound to one or   two tablespaces (data and indices). So you do  an
estimation  of  the  size  required, create the tablespaces (and probably   all their extension files), then create the
schema and  load   it.  The  only reason not to do so is if your DB exceeds some   size where you have to fear not
beeingable to finish  online   backups  before  getting into Online-Relolog stuck. Has to do   the the online backup
procedureof Oracle.
 

> This isn't any harder for md.c to deal with than what we do now,
> but by making the /N subdirectories be symlinks, the dbadmin could
> easily arrange for extension segments to go on different filesystems.
> Also, since /N subdirectory symlinks can be added as needed,
> expanding available space by attaching more disks isn't hard.
> (If the admin hasn't pre-made a /N symlink when it's needed,
> I'd envision the backend just automatically creating a plain
> subdirectory so that it can extend the table.)
   So the admin allways have to leave enough  freespace  in  the   default  location to keep the DB running until he
cantake it   offline, move the autocreated files and create the  symlinks.   What a pain for 24/7 systems.
 

> We'd still want to create some tools to help the dbadmin with slinging
> all these symlinks around, of course.  But I think it's critical to keep
> the low-level file access protocol simple and reliable, which really
> means minimizing the amount of information the backend needs to know to
> figure out which file to write a page in.  With something like the above
> you only need to know the tablespace name (or more likely OID), the
> relation OID (+name or not, depending on outcome of other argument),
> and the offset in the table.  No worse than now from the software's
> point of view.
   Exactly  the  "low-level  file  access"  protocol  is  highly   complicated in Postgres. Because nearly  every
object needs   his  own file, we need to deal with virtual file descriptors.   With an Oracle-like tablespace concept
anda fixed  limit  of   total   tablespace   files  (this  time  OS  or  installation   specific), we could keep them
allopen all the time.  IMHO  a   big win.
 


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




pgsql-hackers by date:

Previous
From: JanWieck@t-online.de (Jan Wieck)
Date:
Subject: Re: Big 7.1 open items
Next
From: JanWieck@t-online.de (Jan Wieck)
Date:
Subject: Re: Big 7.1 open items