Thread: Index location patch for review
Hi all, Attached is a patch that adds support for specifying a location for indexes via the "create database" command. I believe this patch is complete, but it is my first . Thanks Jim
Attachment
> Hi all, > > Attached is a patch that adds support for specifying a location for > indexes via the "create database" command. > > I believe this patch is complete, but it is my first . This patch allows index locations to be specified as different from data locations. Is this a feature direction we want to go in? Comments? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> > Attached is a patch that adds support for specifying a location for > > indexes via the "create database" command. > This patch allows index locations to be specified as different from data > locations. Is this a feature direction we want to go in? Comments? I have not looked at the patch, either implementation or proposed syntax, but in general we certainly want to head in a direction where we have full control over placement of storage resources. - Thomas
I am very new to this mailinglist so I apologize if I start talking early but I've been working as a sysadmin and that kind of problems for a long while now and my suggestion is that it is a start but I think that we should aim a little higher than this and use something more like the Oracle approach instead. Where they introduce an abstraction layer in the form of a tablespace. And this tablespace is then referenced from the create table or create index instead. eg: table -> tablespace -> path to physical storage index -> tablespace -> path to physical storage Advantages: Changes can be done to storage whithout need to change create scripts for db, tables and so on. Designers can specify in which tablespace tables/indexes should reside based on usage. Sysadmins can work with tablespaces and change paths without changing anything in the database/table/index definitions. The alternative is symlinks to distribute the load and that is not a pretty sight dba-wise. Hope you can bare with me on this, since I think it is an very important issue. I'm unfortunately not a fast coder yet (but I'm getting faster :-) ). But I could start writing a spec if someone is interrested. Bruce Momjian wrote: > > Hi all, > > > > Attached is a patch that adds support for specifying a location for > > indexes via the "create database" command. > > > > I believe this patch is complete, but it is my first . > > This patch allows index locations to be specified as different from data > locations. Is this a feature direction we want to go in? Comments? > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 853-3000 > + If your life is a hard drive, | 830 Blythe Avenue > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://www.postgresql.org/search.mpl
> > Attached is a patch that adds support for specifying a > > location for indexes via the "create database" command. > > > > I believe this patch is complete, but it is my first . > > This patch allows index locations to be specified as > different from data locations. Is this a feature direction > we want to go in? Comments? The more general and "standard" way to go are TABLESPACEs. But probably proposed feature will be compatible with tablespaces, when we'll got them: we could use new "create database" syntax to specify default tablespace for indices. Unfortunately I removed message with patch, can you send it to me, Bruce? Vadim
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: > The more general and "standard" way to go are TABLESPACEs. > But probably proposed feature will be compatible with > tablespaces, when we'll got them: Will it be? I'm afraid of creating a backwards-compatibility problem for ourselves when it comes time to implement tablespaces. At the very least I'd like to see some information demonstrating how much benefit there is to this proposed patch, before we consider whether to adopt it. If there's a significant performance benefit to splitting a PG database along the table-vs-index divide, then it's interesting as a short-term improvement ... but Jim didn't even make that assertion, let alone provide evidence to back it up. regards, tom lane
> "Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: > > The more general and "standard" way to go are TABLESPACEs. > > But probably proposed feature will be compatible with > > tablespaces, when we'll got them: > > Will it be? I'm afraid of creating a backwards-compatibility > problem for ourselves when it comes time to implement tablespaces. > > At the very least I'd like to see some information demonstrating > how much benefit there is to this proposed patch, before we > consider whether to adopt it. If there's a significant performance > benefit to splitting a PG database along the table-vs-index divide, > then it's interesting as a short-term improvement ... but Jim didn't > even make that assertion, let alone provide evidence to back it up. If that is your only concern, I can tell you for sure that if the locations are on different drives, there will be a performance benefit. It is standard database practice to put indexes on different drives than data. In fact, sometimes you want to put two tables that are frequently joined on separate drives. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
... > At the very least I'd like to see some information demonstrating > how much benefit there is to this proposed patch, before we > consider whether to adopt it. If there's a significant performance > benefit to splitting a PG database along the table-vs-index divide, > then it's interesting as a short-term improvement ... but Jim didn't > even make that assertion, let alone provide evidence to back it up. Clearly there can be a *storage management* benefit to having control over what gets put where, so this does not need to be justified strictly on a performance basis. For features like this, we will feel free to evolve them or revolutionize them with further development, so I'm not worried about the backward compatibility issue for cases like this. Comments? - Thomas
> ... > > At the very least I'd like to see some information demonstrating > > how much benefit there is to this proposed patch, before we > > consider whether to adopt it. If there's a significant performance > > benefit to splitting a PG database along the table-vs-index divide, > > then it's interesting as a short-term improvement ... but Jim didn't > > even make that assertion, let alone provide evidence to back it up. > > Clearly there can be a *storage management* benefit to having control > over what gets put where, so this does not need to be justified strictly > on a performance basis. > > For features like this, we will feel free to evolve them or > revolutionize them with further development, so I'm not worried about > the backward compatibility issue for cases like this. > > Comments? Agreed. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> > The more general and "standard" way to go are TABLESPACEs. > > But probably proposed feature will be compatible with > > tablespaces, when we'll got them: > > Will it be? I'm afraid of creating a backwards-compatibility > problem for ourselves when it comes time to implement tablespaces. As I said, INDEX_LOCATION in CREATE DATABASE could mean location of default tablespace for indices in future and one will be able to override tablespace for particular index with TABLESPACE clause in CREATE INDEX command. > At the very least I'd like to see some information demonstrating > how much benefit there is to this proposed patch, before we > consider whether to adopt it. If there's a significant performance > benefit to splitting a PG database along the table-vs-index divide, > then it's interesting as a short-term improvement ... but Jim didn't > even make that assertion, let alone provide evidence to back it up. Agreed. He mentioned significant performance difference but it would be great to see results of pgbench tests with scaling factor of >= 10. Jim? Also, after reviewing patch I have to say that it will NOT work with WAL. Jim, please do not name index' dir as "<TBL_NODE>_index". Instead, just use different TBL_NODE for indices (different number). It's not good to put if(reln->rd_rel->relkind == RELKIND_INDEX) stuff into storage manager - only two numbers (tblnode & relnode) must be used to identify file, no any other logical information totally unrelated to storage issues. Vadim
just change the work tablespace below to location and that is exactly what this patch is trying to do. You can think of the LOCATION and INDEX_LOCATION provided to the create database command as the default storage locations for these objects. In the future, I want to enable the DBA to specify LOCATIONS any object just like Oracle. I am also planning on a pg_locations table and "create location" command which will do what the current initlocation script does and more. Jim > > I am very new to this mailinglist so I apologize if I start talking early but > I've been working as a sysadmin and that kind of problems for a long while > now and my suggestion is that it is a start but I think that we should aim a > little higher than this and use something more like the Oracle approach > instead. Where they introduce an abstraction layer in the form of a > tablespace. And this tablespace is then referenced from the create table or > create index instead. > eg: > table -> tablespace -> path to physical storage > index -> tablespace -> path to physical storage > > Advantages: > Changes can be done to storage whithout need to change create scripts for db, > tables and so on. > Designers can specify in which tablespace tables/indexes should reside based > on usage. > Sysadmins can work with tablespaces and change paths without changing > anything in the database/table/index definitions. > > The alternative is symlinks to distribute the load and that is not a pretty > sight dba-wise. > > Hope you can bare with me on this, since I think it is an very important > issue. > I'm unfortunately not a fast coder yet (but I'm getting faster :-) ). But I > could start writing a spec if someone is interrested. > > Bruce Momjian wrote: > > > > Hi all, > > > > > > Attached is a patch that adds support for specifying a location for > > > indexes via the "create database" command. > > > > > > I believe this patch is complete, but it is my first . > > > > This patch allows index locations to be specified as different from data > > locations. Is this a feature direction we want to go in? Comments? > > > > -- > > Bruce Momjian | http://candle.pha.pa.us > > pgman@candle.pha.pa.us | (610) 853-3000 > > + If your life is a hard drive, | 830 Blythe Avenue > > + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 6: Have you searched our list archives? > > > > http://www.postgresql.org/search.mpl > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > >
I agree that groups of objects in separate data storage areas are needed and that is what I am trying to get to. Don't you think that Postgresql with locations/files is the same as Oracle tablespaces. I don't think we want to invent our own filesystem (which is what a tablespace really is...). Jim > > > Attached is a patch that adds support for specifying a > > > location for indexes via the "create database" command. > > > > > > I believe this patch is complete, but it is my first . > > > > This patch allows index locations to be specified as > > different from data locations. Is this a feature direction > > we want to go in? Comments? > > The more general and "standard" way to go are TABLESPACEs. > But probably proposed feature will be compatible with > tablespaces, when we'll got them: we could use new "create > database" syntax to specify default tablespace for indices. > Unfortunately I removed message with patch, can you send it > to me, Bruce? > > Vadim > >
Vadim, I don't understand the WAL issue below, can you explain. The dir name is the same name as the database with _index added to it. This is how the current datpath stuff works. I really just copied the datpath code to get this patch to work... Also I have been running this patch (both 7.1.3 and 7.2devel) against some of my companies applications. I have loaded a small database 10G data and 15G indexes both with and without the patch. There seems to be between 5% and 10% performance gain doing most common db commands (selects, selects with joins and inserts). The system is a DUAL P3 733 with 3 IDE disks. One for PGDATA, second for APPDATA and third for APPIDX. As you can see I have seperated WAL files, GLOBAL, Application data and application indexes over 3 disks. Our production systems have around 50k queries/day ( not including data loads), so I believe that when this patch get put into production, with 20 disks and 10 database the performance increase should go up. I should also add, that I have been working on the second part of this patch, which will allow tables and indexes to be put into LOCATIONS also. I am going planning on having a PG_LOCATIONS table and CREATE|DROP|ALTER location SQL command instead of the initlocation shell script we currently have. The only thing stopping me now is 7.2 testing I am planning on doing once the beta begins and problems adding a location column to the pg_class table with the necessary support code in heap.c... Thanks for all the comments (keep them comming) Jim > > > The more general and "standard" way to go are TABLESPACEs. > > > But probably proposed feature will be compatible with > > > tablespaces, when we'll got them: > > > > Will it be? I'm afraid of creating a backwards-compatibility > > problem for ourselves when it comes time to implement tablespaces. > > As I said, INDEX_LOCATION in CREATE DATABASE could mean location > of default tablespace for indices in future and one will be able > to override tablespace for particular index with TABLESPACE > clause in CREATE INDEX command. > > > At the very least I'd like to see some information demonstrating > > how much benefit there is to this proposed patch, before we > > consider whether to adopt it. If there's a significant performance > > benefit to splitting a PG database along the table-vs-index divide, > > then it's interesting as a short-term improvement ... but Jim didn't > > even make that assertion, let alone provide evidence to back it up. > > Agreed. He mentioned significant performance difference but it would > be great to see results of pgbench tests with scaling factor of >= 10. > Jim? > > Also, after reviewing patch I have to say that it will NOT work > with WAL. Jim, please do not name index' dir as "<TBL_NODE>_index". > Instead, just use different TBL_NODE for indices (different number). > It's not good to put if(reln->rd_rel->relkind == RELKIND_INDEX) > stuff into storage manager - only two numbers (tblnode & relnode) > must be used to identify file, no any other logical information > totally unrelated to storage issues. > > Vadim > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html > >
> I don't understand the WAL issue below, can you explain. The dir name > is the same name as the database with _index added to it. This is how > the current datpath stuff works. I really just copied the datpath > code to get this patch to work... At the time of after crash recovery WAL is not able to read relation description from catalog and so only relfilenode is provided for storage manager in relation structure (look backend/access/transam/ xlogutils.c:XLogOpenRelation). Well, we could add Index/Table file type identifier to RmgrData (rmgr.c in the same dir) to set relkind in relation structure, but I don't see any reason to do so when we can just use different tblnode number for indices and name index dirs just like other dirs under 'base' named - ie only tblnode number is used for dir names, without any additions unrelated to storage issues. Vadim
> Also I have been running this patch (both 7.1.3 and 7.2devel) against > some of my companies applications. I have loaded a small database 10G We are not familiar with your applications. It would be better to see results of test suit available to the community. pgbench is first to come in mind. Such tests would be more valuable. Vadim
I could also symlink all index files back to the tblnode directory? > > I don't understand the WAL issue below, can you explain. The dir name > > is the same name as the database with _index added to it. This is how > > the current datpath stuff works. I really just copied the datpath > > code to get this patch to work... > > At the time of after crash recovery WAL is not able to read relation > description from catalog and so only relfilenode is provided for > storage manager in relation structure (look backend/access/transam/ > xlogutils.c:XLogOpenRelation). Well, we could add Index/Table > file type identifier to RmgrData (rmgr.c in the same dir) to set > relkind in relation structure, but I don't see any reason to > do so when we can just use different tblnode number for indices and > name index dirs just like other dirs under 'base' named - ie > only tblnode number is used for dir names, without any additions > unrelated to storage issues. > > Vadim > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > >
Here is my pgbench results. As you can see the I am getting 2X tps with the 2 directories. I believe this is a BIG win for Postgresql if we can figure out the WAL recovery issues. Can someone other than me apply the patch and verify the pgbench results. My hardward setup is a dual processor P3/733 running Redhat 7.1 with 512 megs of memory. The postgresql.conf file is the installed version with NO changes. Jim template1=# create database one_dir with location='PGDATA1'; template1=# create database two_dir with location='PGDATA1' index_location='PGIDX1'; for X in 1 2 3 4 5 6 7 8 9 10 dopgbench -i -s 10 one_dir >>one_dir.logpgbench -i -s 10 two_dir >>two_dir.log done bash-2.04$ grep 'excluding' one_dir.log tps = 44.319306(excluding connections establishing) tps = 34.641020(excluding connections establishing) tps = 50.516889(excluding connections establishing) tps = 52.747039(excluding connections establishing) tps = 16.203821(excluding connections establishing) tps = 36.902861(excluding connections establishing) tps = 52.511769(excluding connections establishing) tps = 53.479882(excluding connections establishing) tps = 54.599429(excluding connections establishing) tps = 36.780419(excluding connections establishing) tps = 48.048279(excluding connections establishing) bash-2.04$ grep 'excluding' two_dir.log tps = 58.739049(excluding connections establishing) tps = 100.259270(excluding connections establishing) tps = 103.156166(excluding connections establishing) tps = 110.829358(excluding connections establishing) tps = 111.929690(excluding connections establishing) tps = 106.840118(excluding connections establishing) tps = 101.563159(excluding connections establishing) tps = 102.877060(excluding connections establishing) tps = 103.784717(excluding connections establishing) tps = 53.056309(excluding connections establishing) tps = 73.842428(excluding connections establishing) > > Also I have been running this patch (both 7.1.3 and 7.2devel) against > > some of my companies applications. I have loaded a small database 10G > > We are not familiar with your applications. It would be better to see > results of test suit available to the community. pgbench is first to > come in mind. Such tests would be more valuable. > > Vadim > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > >
> > Attached is a patch that adds support for specifying a location for > > indexes via the "create database" command. > > > > I believe this patch is complete, but it is my first . > > This patch allows index locations to be specified as > different from data locations. Is this a feature direction > we want to go in? Comments? Having the table and index on separate drives can do wonders for i/o performance. :) darrenk