Thread: tablespaces for temporary files

tablespaces for temporary files

From
Neil Conway
Date:
I'd like to provide a way for DBAs to specify that the temporary files
needed to for sorting, holdable cursors and similar operations should be
created in a particular tablespace. (Right now these files are created
in the tablespace associated with the current database.)

Two ways to do this come to mind: via a GUC variable, or by setting a
property of CREATE DATABASE (that could be altered via ALTER DATABASE).
I think using a GUC variable is probably the better bet: it is more
flexible, since ALTER DATABASE ... SET and ALTER USER ... SET can be
used to define the GUC variable automatically for particular users and
databases.

So I'd like to add a GUC variable called something like
"scratch_tablespace". If undefined (the default), temporary files for
sorting/etc. will be created in the current database's tablespace. If
set to the name of an existent tablespace, that tablespace will be used
for temporary storage. If set to a nonexistent tablespace, a warning
will be printed and we'll fallback to using the current database's
tablespace.

Regarding naming, I considered calling the GUC variable
"temporary_tablespace" or something similar, but it seems to me that
this might cause confusion with temporary tables -- furthermore, it
might be an interesting feature to define a "temporary table tablespace"
in the future, leading to yet more confusion. I think "scratch
tablespace" is a pretty decent name for this concept, but I'm open to
suggestions.

Comments?

-Neil




Re: tablespaces for temporary files

From
Christopher Kings-Lynne
Date:
> So I'd like to add a GUC variable called something like
> "scratch_tablespace". If undefined (the default), temporary files for

Should be called 'work_tablesapce' to match 'work_mem' :)


Re: tablespaces for temporary files

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> So I'd like to add a GUC variable called something like
> "scratch_tablespace". If undefined (the default), temporary files for
> sorting/etc. will be created in the current database's tablespace.

(1) What are the protection requirements for this variable?

(2) I don't think that "undefined" is a particularly good concept for
GUC variables.  Particularly not ones that you are envisioning setting
from multiple places.

(3) I don't like the idea that a catalog lookup will be necessary before
we can create or access temp files.  It would be quite unacceptable from
a modularity standpoint to have the low-level routines that currently
determine temp file paths do catalog accesses.

On the whole I'm unconvinced that this is worth the trouble.  One of the
reasons for allowing people to move databases around is to determine
where their temp files go.  Also, it's always been possible for people
to change the pgsql_tmp subdirectory into a symlink.  While I know that
that isn't particularly DBA-friendly, it seems sufficient to me for what
I suspect is a third-order requirement.

Let's at least wait till we get some demand from the field before we
start inventing frammishes for tablespaces.
        regards, tom lane


Re: tablespaces for temporary files

From
Neil Conway
Date:
On Sat, 2004-10-30 at 00:50, Tom Lane wrote:
> (1) What are the protection requirements for this variable?

I think it can be USERSET -- most commands let the user specify a
tablespace explicitly, and this is basically just another way of doing
that. The user executing the query will need CREATE privileges on the
tablespace they end up writing to.

> (2) I don't think that "undefined" is a particularly good concept for
> GUC variables.  Particularly not ones that you are envisioning setting
> from multiple places.

Hmm, ok. How about a token like "$database" that expands to the
tablespace of the current database?

> (3) I don't like the idea that a catalog lookup will be necessary before
> we can create or access temp files.  It would be quite unacceptable from
> a modularity standpoint to have the low-level routines that currently
> determine temp file paths do catalog accesses.

I don't agree it is "unacceptable", but it isn't ideal, granted.

> On the whole I'm unconvinced that this is worth the trouble.  One of the
> reasons for allowing people to move databases around is to determine
> where their temp files go.

I think this needlessly limits the flexibility of the system. Once
you've created a database and added a bunch of tables to it (in the DB's
tablespace), is there an easy way to change the tablespace used for
temporary files? What if the DBA has placed the database in a relatively
slow tablespace because that is suitable most of the time, but needs to
quickly execute a large OLAP query that consumes a lot of temporary
space? What if it makes sense at a particular installation for different
users to use different tablespaces for their temporary files?

I just think that always using the database's tablespace for temporary
files needlessly conflates two distinct concepts.

> Also, it's always been possible for people
> to change the pgsql_tmp subdirectory into a symlink.

This is a pain for the DBA, as you mention; it requires shutting down
the database; and it is fragile to begin with because the "pgsql_tmp"
directory is created on demand.

-Neil




Re: tablespaces for temporary files

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> On the whole I'm unconvinced that this is worth the trouble.  One of the
> reasons for allowing people to move databases around is to determine
> where their temp files go.  

The one scenario I would expect to see is having the temp files on filesystem
all to themselves separate from the database. So using the database's location
seems like it would never really satisfy that need.

Actually the sort algorithm postgres uses would be much more efficient if it
could get access to two or three locations guaranteed to be on different
spindles. Last I read the comments it talked about a three tape polyphase sort
emulated on a single tape. It's a _lot_ less efficient emulated on a single
tape than it would be on three separate tapes. And for large sorts drive
really do behave like tapes.

Personally I am inclined to think that sorting and hash table spills really
belong in a location specified completely separate from tablespaces.

Others may be thinking of this more in terms of enforcing resource quotas in
which case the current regime makes more sense. But from a performance point
of view the current system is pointless.

-- 
greg



Re: tablespaces for temporary files

From
Bruce Momjian
Date:
Greg Stark wrote:
> 
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > On the whole I'm unconvinced that this is worth the trouble.  One of the
> > reasons for allowing people to move databases around is to determine
> > where their temp files go.  
> 
> The one scenario I would expect to see is having the temp files on filesystem
> all to themselves separate from the database. So using the database's location
> seems like it would never really satisfy that need.
> 
> Actually the sort algorithm postgres uses would be much more efficient if it
> could get access to two or three locations guaranteed to be on different
> spindles. Last I read the comments it talked about a three tape polyphase sort
> emulated on a single tape. It's a _lot_ less efficient emulated on a single
> tape than it would be on three separate tapes. And for large sorts drive
> really do behave like tapes.
> 
> Personally I am inclined to think that sorting and hash table spills really
> belong in a location specified completely separate from tablespaces.
> 
> Others may be thinking of this more in terms of enforcing resource quotas in
> which case the current regime makes more sense. But from a performance point
> of view the current system is pointless.

Agreed, and I was going to mention the idea of a round-robin allocation
setup where the system cycles through a list of possible locations for
both sort files and temporary tables.  One trick is that the cycle
pointer has to be global controlled so once one session uses an area the
next session uses the next location.

Updated TODO item:

* Add a GUC variable to control the tablespace for temporary objects and sort files
 This perhaps should use a round-robin allocation system where several tablespaces are used in a cycle.  The cycle
pointershould be global.
 


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: tablespaces for temporary files

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Greg Stark wrote:
>> Actually the sort algorithm postgres uses would be much more efficient if it
>> could get access to two or three locations guaranteed to be on different
>> spindles.

> Agreed, and I was going to mention the idea of a round-robin allocation
> setup where the system cycles through a list of possible locations for
> both sort files and temporary tables.

Greg's point was that sort would want to *know* that it created three
temp files on three different devices.  Throwing random effects of other
sessions into the mix wouldn't make it better.
        regards, tom lane


Re: tablespaces for temporary files

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Greg Stark wrote:
> >> Actually the sort algorithm postgres uses would be much more efficient if it
> >> could get access to two or three locations guaranteed to be on different
> >> spindles.
> 
> > Agreed, and I was going to mention the idea of a round-robin allocation
> > setup where the system cycles through a list of possible locations for
> > both sort files and temporary tables.
> 
> Greg's point was that sort would want to *know* that it created three
> temp files on three different devices.  Throwing random effects of other
> sessions into the mix wouldn't make it better.

OK, let's say the sort starts on a random tablespace and then goes
sequentially through the list:* Add a GUC variable to control the tablespace for temporary objects and  sort files  It
couldstart with a random tablespace from a supplied list and cycle  through the list.
 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073