Thread: Re: Utilizing multiple disks
Steve wrote: > Hi, > > I've been running postgres on my server for over a year now and the > tables have become huge. I have 3 tables that have data over 10GB each > and these tables are read very very frequently. In fact, heavy searches > on these tables are expected every 2 to 3 minutes. This unfortunately > gives a very poor response time to the end user and so I'm looking at > other alternatives now. This depend on the query that you are running on it: Are you performing queries using the like operator? If yes did you define an index on it using the right operator class ? Are you performing queries on a calculated field ? If yes then you need to construct a sort of materialized view. If you are on Linux did you mounted your data partition with the option noatime ? Please provide use more information on your queries and on your datas, your configurations... Usualy split your tables on multiple disk is the last optimization step, are you sure did you already reach the bootleneck of your sistem ? Regards Gaetano Mendola
Steve wrote: > Gaetano, > > Thanks for your reply. > > Gaetano Mendola wrote: > >> Are you performing queries using the like operator? If yes did you define >> an index on it using the right operator class ? > > > Yes I do use the like operator to perform queries on a few 'varchar' and > 'text' fields. I have indexed almost every column in the table (because > the client application allows complex searches on multiple columns and > each column has an equal chance of being used in the 'WHERE' condition) And did you use the right operator class ? See http://www.postgresql.org/docs/7.4/interactive/indexes-opclass.html for details >> Are you performing queries on a calculated field ? If yes then you need >> to construct a sort of materialized view. > > > Hmm not sure what you mean by a calculated field. Since I'm not sure > what it means, I'm probably not using it. I'm definitely not running any > functions on columns to get an accumulated result back etc. if you have for example a view like: CREATE VIEW v_test AS SELECT foo(a) AS field_1, bar(a+b) AS field_2 FROM test; and you do: SELECT * from v_test WHERE field_2 = 123; then you are in trouble! >> If you are on Linux did you mounted your data partition with the option >> noatime ? > > > Yes I'm on linux and am using the ext3 filesystem. I am just about to > mount the data partition with the noatime option. Docos do say that the > speed increases considerably. I'll give it a shot but I'm still very > interested in utilizing the extra three SCSI disks I have at my disposal. mount that disk in RAID configuration then, you can at least try to move the WAL on a separate disk. > Database integrity is of utmost importance and so is speed. I know there > are tradeoffs but I really do think that moving the data to a separate > disk will help. Can you please tell me how to set this up? Thanks a lot! > > Steve The version 7.5 will give you the tablespaces feature, in the mean time you can move your table and or index across your disk using symlinks, moving the file associated with your table or index in another disk and replace it with a link. Regards Gaetano Mendola
Gaetano, Thanks for your reply. Gaetano Mendola wrote: > Are you performing queries using the like operator? If yes did you define > an index on it using the right operator class ? Yes I do use the like operator to perform queries on a few 'varchar' and 'text' fields. I have indexed almost every column in the table (because the client application allows complex searches on multiple columns and each column has an equal chance of being used in the 'WHERE' condition) > Are you performing queries on a calculated field ? If yes then you need > to construct a sort of materialized view. Hmm not sure what you mean by a calculated field. Since I'm not sure what it means, I'm probably not using it. I'm definitely not running any functions on columns to get an accumulated result back etc. > If you are on Linux did you mounted your data partition with the option > noatime ? Yes I'm on linux and am using the ext3 filesystem. I am just about to mount the data partition with the noatime option. Docos do say that the speed increases considerably. I'll give it a shot but I'm still very interested in utilizing the extra three SCSI disks I have at my disposal. > Usualy split your tables on multiple disk is the last optimization step, > are you sure did you already reach the bootleneck of your sistem ? Actually IO is currently the only bottleneck for my system. I was using a single drive and had optimized it as much as I could (used indexes, partitioned tables horizontally on old dates etc) but I thought it would help me on the long run if I could store the actualy physical 'table files' on separate disks (as the rule says: more spindles == parallel IO). My current database basically has quite a few tables, out of which only four are the 'largest'. Each table is completely independant (I don't have a relational database setup because I didn't need it in this case) and is accessed simultaneously by around 10 users at one time. Each table also has almost all of its columns indexed. some of these columns are of type 'varchar', 'text' or 'double'. I am aware that the 'text' fields are stored internally in a separate table. Does this complicate things? Is it as easy as moving the files (I don't know which ones though - please help me out here) to an independant drive and then creating symbolic links to it to the '/opt/postgresql/data/...' directory? How do I achieve this without losing any of the data. Database integrity is of utmost importance and so is speed. I know there are tradeoffs but I really do think that moving the data to a separate disk will help. Can you please tell me how to set this up? Thanks a lot! Steve