Thread: How to REINDEX in high volume environments?

How to REINDEX in high volume environments?

From
Justin Clift
Date:
Hi all,

Am experimenting to find out what kind of performance gain are achieved
from moving indexes to a different scsi drives than the WAL files, than
the data itself, etc.

Have come across an interesting problem.

Have moved the indexes to another drive, then created symlinks to them.

Ran a benchmark against the database, REINDEX'd the tables, VACUUM FULL
ANALYZE'd, prepared to re-run the benchmark again and guess what?

The indexes were back on the original drive.

The process of REINDEX-ing obviously creates another file then drops the
original.

Is there a way to allow REINDEX to work without having this side affect?

Pre-creating a bunch of dangling symlinks doesn't work (tried that, it
gives a "ERROR:  cannot create accounts_pkey: File exists" on FreeBSD
4.6.2 when using the REINDEX).

Any suggestions?

:-)

Regards and best wishes,

Justin Clift

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi


Re: How to REINDEX in high volume environments?

From
"Shridhar Daithankar"
Date:
On 28 Sep 2002 at 17:08, Justin Clift wrote:

> Have moved the indexes to another drive, then created symlinks to them.
> Ran a benchmark against the database, REINDEX'd the tables, VACUUM FULL
> ANALYZE'd, prepared to re-run the benchmark again and guess what?
> 
> The indexes were back on the original drive.
> Is there a way to allow REINDEX to work without having this side affect?
> 
> Pre-creating a bunch of dangling symlinks doesn't work (tried that, it
> gives a "ERROR:  cannot create accounts_pkey: File exists" on FreeBSD
> 4.6.2 when using the REINDEX).

Looks like we should have a subdirectory in database directory which stores 
index.

May be transaction logs, indexes goes in separte directory which can be 
symlinked. Linking a directory is much simpler solution than linking a file.

I suggest we have per database transaction log and indexes created in separate 
subdirectories for each database. Furhter given that large tables are segmented 
after one GB size, a table should have it's own subdirectory optionally..

At the cost of few inodes, postgresql would gain much more flexibility and 
hence tunability..

May be TODO for 7.4? Anyone?

ByeShridhar

--
Software, n.:    Formal evening attire for female computer analysts.



Re: How to REINDEX in high volume environments?

From
Justin Clift
Date:
Shridhar Daithankar wrote:
<snip>
> Looks like we should have a subdirectory in database directory which stores
> index.

That was my first thought also, but an alternative/additional approach
would be this (not sure if it's workable):
- As each index already has a bunch of information stored stored for
it, would it be possible to have an additional column added called
'idxpath' or something?
- This would mean that the index location would be stable per index,
and would allow for *really* high volume environments to keep different
indexes on different drives.

Not sure what the default value would be, maybe the PGDATA directory,
maybe something as a GUC variable, etc, but that's the concept.

:-)

Regards and best wishes,

Justin Clift

> May be transaction logs, indexes goes in separte directory which can be
> symlinked. Linking a directory is much simpler solution than linking a file.
> 
> I suggest we have per database transaction log and indexes created in separate
> subdirectories for each database. Furhter given that large tables are segmented
> after one GB size, a table should have it's own subdirectory optionally..
> 
> At the cost of few inodes, postgresql would gain much more flexibility and
> hence tunability..
> 
> May be TODO for 7.4? Anyone?
> 
> Bye
>  Shridhar
> 
> --
> Software, n.:   Formal evening attire for female computer analysts.
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi


Re: How to REINDEX in high volume environments?

From
"Shridhar Daithankar"
Date:
On 28 Sep 2002 at 17:51, Justin Clift wrote:

> Shridhar Daithankar wrote:
> <snip>
> > Looks like we should have a subdirectory in database directory which stores
> > index.
> 
> That was my first thought also, but an alternative/additional approach
> would be this (not sure if it's workable):
> 
>  - As each index already has a bunch of information stored stored for
> it, would it be possible to have an additional column added called
> 'idxpath' or something?
> 
>  - This would mean that the index location would be stable per index,
> and would allow for *really* high volume environments to keep different
> indexes on different drives.

I have to disagree.. Completely.. This is like turning PG-Metadata into 
registry...

And what happens when index starts splitting when it grows beyond 1GB in size?

Putting indexes into a separate subdirectoy and mount/link that directory on a 
device that is on a separate SCSI channel is what I can think of as last drop 
of performance out of it..

Just a thought, as usual..

I don't know how much efforts it would take but if we have pg_xlog in separte 
configurable dir. now, putting indexes as well and having per database pg_xlog 
should be on the same line. The later aspect is also important IMO..

ByeShridhar

--
VMS, n.:    The world's foremost multi-user adventure game.



Re: How to REINDEX in high volume environments?

From
Justin Clift
Date:
Shridhar Daithankar wrote:
<snip>
> And what happens when index starts splitting when it grows beyond 1GB in size?

Having an index directory:

i.e. $PGDATA/data/<oid>/indexes/

(that's the kind of thing you mean isn't it?)

Sounds workable, and sounds better than the present approach.

The reason that I was thinking of having a different path per index
would be for high volume situations like this:

/dev/dsk1 : /pgdata <- data here
/dev/dsk2 : /pgindexes1 <- some indexes here
/dev/dsk3 : /pgindexes2 <- some ultra-high volume activity here

Let's say that there's a bunch of data on /dev/dsk1, and for performance
reasons it's been decided to move the indexes to another drive
/dev/dsk2.

Now, if just one of those indexes is getting *a lot* of the drive
activity, it would make sense to move it to it's own dedicated drive. 
Having an um... PGINDEX (that's just an identifier for this example, not
an environment variable suggestion) directory location defined would
mean that each time a REINDEX operation occurs, then all new indexes
would be created in the same spot.  That sounds better than the present
approach thus far, but wouldn't work for situations where indexes are
spread across multiple disk drives.

The suggestion of having some kind of path info for each index is merely
a thought of how to meet that potential future need, not necessarily the
best method anyone has ever thought of.  Like someone might pipe up and
say "Nah, it could be done better XYZ way", etc.

:-)

Regards and best wishes,

Justin Clift

> Putting indexes into a separate subdirectoy and mount/link that directory on a
> device that is on a separate SCSI channel is what I can think of as last drop
> of performance out of it..
> 
> Just a thought, as usual..
> 
> I don't know how much efforts it would take but if we have pg_xlog in separte
> configurable dir. now, putting indexes as well and having per database pg_xlog
> should be on the same line. The later aspect is also important IMO..
> 
> Bye
>  Shridhar
> 
> --
> VMS, n.:        The world's foremost multi-user adventure game.
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."  - Indira Gandhi


Re: How to REINDEX in high volume environments?

From
Alvaro Herrera
Date:
Justin Clift dijo: 

Hi,

> Ran a benchmark against the database, REINDEX'd the tables, VACUUM FULL
> ANALYZE'd, prepared to re-run the benchmark again and guess what?
> 
> The indexes were back on the original drive.

Yes, this is expected.  Same for CLUSTER.  They create a different
filenode and point the relation (table or index) at it.

I think the separate space for indexes is a good idea.  However, and
this is orthogonal, I feel the way REINDEX works now is not the best,
because it precludes you from using the index while you are doing it.

I'm trying to implement a way to concurrently compact the indexes.
I hope to have it for 7.4.

-- 
Alvaro Herrera (<alvherre[a]atentus.com>)
Y una voz del caos me hablo y me dijo
"Sonrie y se feliz, podria ser peor".
Y sonrei. Y fui feliz.
Y fue peor.



Re: How to REINDEX in high volume environments?

From
Greg Copeland
Date:
On Sat, 2002-09-28 at 02:16, Shridhar Daithankar wrote:
> On 28 Sep 2002 at 17:08, Justin Clift wrote:
>
> > Have moved the indexes to another drive, then created symlinks to them.
> > Ran a benchmark against the database, REINDEX'd the tables, VACUUM FULL
> > ANALYZE'd, prepared to re-run the benchmark again and guess what?
> >
> > The indexes were back on the original drive.
> > Is there a way to allow REINDEX to work without having this side affect?
> >
> > Pre-creating a bunch of dangling symlinks doesn't work (tried that, it
> > gives a "ERROR:  cannot create accounts_pkey: File exists" on FreeBSD
> > 4.6.2 when using the REINDEX).
>
> Looks like we should have a subdirectory in database directory which stores
> index.
>
> May be transaction logs, indexes goes in separte directory which can be
> symlinked. Linking a directory is much simpler solution than linking a file.
>
> I suggest we have per database transaction log and indexes created in separate
> subdirectories for each database. Furhter given that large tables are segmented
> after one GB size, a table should have it's own subdirectory optionally..
>
> At the cost of few inodes, postgresql would gain much more flexibility and
> hence tunability..
>
> May be TODO for 7.4? Anyone?


Very neat idea!  Sounds like an excellent way of gaining lots of
granularity!

I can't even think of a reason not to use the directory per table scheme
all the time.  Perhaps simply allowing for a script/tool that will
automatically perform such a physical table migration to a distinct
directory would be in order too.

Either way, sounds like a good idea.

Greg



Re: How to REINDEX in high volume environments?

From
Tom Lane
Date:
Justin Clift <justin@postgresql.org> writes:
> Shridhar Daithankar wrote:
>> Looks like we should have a subdirectory in database directory which stores
>> index.

> That was my first thought also, but an alternative/additional approach
> would be this (not sure if it's workable):

See the tablespaces TODO item.  I'm not excited about building
half-baked versions of tablespaces before we get around to doing the
real thing ...
        regards, tom lane


Re: How to REINDEX in high volume environments?

From
"Shridhar Daithankar"
Date:
On 28 Sep 2002 at 12:18, Tom Lane wrote:

> Justin Clift <justin@postgresql.org> writes:
> > Shridhar Daithankar wrote:
> >> Looks like we should have a subdirectory in database directory which stores
> >> index.
> 
> > That was my first thought also, but an alternative/additional approach
> > would be this (not sure if it's workable):
> 
> See the tablespaces TODO item.  I'm not excited about building
> half-baked versions of tablespaces before we get around to doing the
> real thing ...

I wen thr. the messages posted regarding tablespaces. It looks like

Tablesspaces should provide

1. Managability 
2. Performance tuning
3. Better Administration..

Creating a directory for each object or object type would allow to do same 
thing.

Why directory?

1. You can mount it someplace else.
2. You can symlink it without worrying about postgresql creating new files 
instead of symlink while drop/recreate.

Whether to choose directory or tablespaces? I say directory. Why?

1. PostgreSQL philosphy has always been using facilities provided by OS and not 
to duplicate that work. Tablespaces directly violates that. Directory mounting 
does not.

2. Tablespaces combines objects on them, adding a layer of abstraction. and 
then come ideas like vacuuming a tablespace. Frankly given what vacuum does, I 
can't imagine what vacuuming tablespace would exactly do.

3. Tablespace would be a single file or structure of directories? How do we 
configure it? What tuning option do we provide?

Basically table spaces I feel is a layer of abstraction that can be avoided if 
we layout the DB in a directory tree with sufficient levels. That would be easy 
to deal with as configuration and maitainance delegated to OS and it would be 
flexible enough to.

Anyway if we have a directory per object/object type, how much different it's 
going to be from a table space? 

Frankly I am wary of table spaces because I have seen them in oracle and not 
eaxctly convinced that's the best way of doing things. 

If we introdude word tablespace, users will be expecting all those idiocies 
like taking a table space offline/online, adding data files aka pre-claiming 
space etc. All these are responsibilities of OS. Let OS handle it. PostgreSQL 
should just create a file structure which would grow as and when required.

The issue looks similimar to having raw disk I/O. Oracle might have good reason 
to do it but are we sure postgresql needs this? Just another policy decision 
waiting..

Here are some links I found in archive. Would like to know more about this 
issue..

http://candle.pha.pa.us/mhonarc/todo.detail/tablespaces/msg00006.html
http://candle.pha.pa.us/mhonarc/todo.detail/tablespaces/msg00007.html

Just a thought..

ByeShridhar

--
The sooner our happiness together begins, the longer it will last.        -- 
Miramanee, "The Paradise Syndrome", stardate 4842.6



Re: How to REINDEX in high volume environments?

From
"Shridhar Daithankar"
Date:
On 29 Sep 2002 at 0:43, Justin Clift wrote:

> Shridhar Daithankar wrote:
> The reason that I was thinking of having a different path per index
> would be for high volume situations like this:
> 
> /dev/dsk1 : /pgdata <- data here
> /dev/dsk2 : /pgindexes1 <- some indexes here
> /dev/dsk3 : /pgindexes2 <- some ultra-high volume activity here

I would say this would look better..

/pgdata
-indexes
--index1
---indexfiles
--index2
---indexfiles

Where index1 and index2 are two different indexes. Just like each table gets 
it's own directory, each index gets it's own directory as well. 

So the admin would/can tune on per object basis rather than worrying about 
creating right group of objects and then tuning about that group.

If required throwing per database transaction log there as well might prove a 
good idea. It would insulate one db from load of other, as far as I/O is 
concerned..

This possiblity is not lost with this scheme but it just gets something simpler 
IMO..

Just illustration of my another post on hackers on this topic.. 


ByeShridhar

--
You're too beautiful to ignore.  Too much woman.        -- Kirk to Yeoman Rand, "The 
Enemy Within", stardate unknown



Re: How to REINDEX in high volume environments?

From
Mario Weilguni
Date:
Am Samstag, 28. September 2002 10:17 schrieb Shridhar Daithankar:
(snip)
> I have to disagree.. Completely.. This is like turning PG-Metadata into
> registry...
>
> And what happens when index starts splitting when it grows beyond 1GB in
> size?
>
> Putting indexes into a separate subdirectoy and mount/link that directory
> on a device that is on a separate SCSI channel is what I can think of as
> last drop of performance out of it..
(snip)

I think a good approach would be the introduction of tablespaces like oracle has, and assigning locations to that
tablespace.

Best regards,Mario Weilguni


Re: How to REINDEX in high volume environments?

From
"Jim Buttafuoco"
Date:
Just wanted to pipe in here.  I am still very interested in tablespaces ( I have many database systems that are over
500GB and growing) and am willing to port my tablespace patch to 7.4.  I have everything (but only tested here)
working
in 7.2 but the patch was not accepted.  I didn't see a great speed improvement but the patch helps with storage
management.

Recap.  the patch would enable the following
   a database to have a default data tablespace and index tablespace   a user to have a default data and index
tablespace  a table to have a specific tablespace   an index to have a specfic tablespace
 
   I would like to also add  namespace (schema) to have a default data and index tablespaces

Jim




> Justin Clift <justin@postgresql.org> writes:
> > Shridhar Daithankar wrote:
> >> Looks like we should have a subdirectory in database directory which stores
> >> index.
> 
> > That was my first thought also, but an alternative/additional approach
> > would be this (not sure if it's workable):
> 
> See the tablespaces TODO item.  I'm not excited about building
> half-baked versions of tablespaces before we get around to doing the
> real thing ...
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html






Re: How to REINDEX in high volume environments?

From
Bruce Momjian
Date:
Jim, glad you are still around.  Yes, we would love to get tablespaces
in 7.4.  I think we need to think bigger and get something where we can
name tablespaces and place tables/indexes into these named spaces.  I
can reread the TODO.detail stuff and give you an outline.  How does that
sound?  Thomas Lockhart is also interested in this feature.

---------------------------------------------------------------------------

Jim Buttafuoco wrote:
> Just wanted to pipe in here.  I am still very interested in tablespaces ( I have many database systems that are over
> 500GB and growing) and am willing to port my tablespace patch to 7.4.  I have everything (but only tested here)
working
> in 7.2 but the patch was not accepted.  I didn't see a great speed improvement but the patch helps with storage
management.
> 
> Recap.  the patch would enable the following
> 
>     a database to have a default data tablespace and index tablespace
>     a user to have a default data and index tablespace
>     a table to have a specific tablespace
>     an index to have a specfic tablespace
> 
>     I would like to also add  namespace (schema) to have a default data and index tablespaces
> 
> Jim
> 
> 
> 
> 
> > Justin Clift <justin@postgresql.org> writes:
> > > Shridhar Daithankar wrote:
> > >> Looks like we should have a subdirectory in database directory which stores
> > >> index.
> > 
> > > That was my first thought also, but an alternative/additional approach
> > > would be this (not sure if it's workable):
> > 
> > See the tablespaces TODO item.  I'm not excited about building
> > half-baked versions of tablespaces before we get around to doing the
> > real thing ...
> > 
> >             regards, tom lane
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: Have you checked our extensive FAQ?
> > 
> > http://www.postgresql.org/users-lounge/docs/faq.html
> 
> 
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073