Thread: Sizing of LARGE databases.
Hello there,

I have a DB that I am attempting to buy "the right hardware" for. This DB today is about 25GB total, with typical tables in the area of 3GB in size, and up to about 20 million records per table for the big tables.

Is there any set of recommendations for sizing a DB using PostgreSQL? (I have all the docs but have not found anything of any use.)

The DB itself will be mostly reads, for a decision system. However, a lot more data will be added in the future, and the DB could easily grow to about 150GB in size, with up to about 60 million records in the large tables.

My current setup is Linux RH7 w/PostgreSQL 7.0.2. My intention is to build a large array of Linux clusters, but I also cannot find any documentation on how, if at all, PostgreSQL handles a clustered environment. (I did see a HOWTO on www.kernel.org/LDP that said you could have multiple "databases" and use a CREATE VIEW to stitch together tables from different DBs, but I have noticed the doc is no longer there, and people who have posted about this method on these forums have been told it does not work.)

Any thoughts would be greatly appreciated.

Michael Miyabara-McCaskey
Email: mykarz@miyabara.com
Web: http://www.miyabara.com/mykarz/
Mobile: +1 408 504 9014
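A quick back-of-envelope check on the figures above (a sketch only: the 3GB/20-million-row numbers are from the post, and everything derived from them is an estimate that ignores indexes and per-row overhead differences):

```python
# Estimate bytes per row from the stated figures: a typical big table
# is ~3 GB holding ~20 million records.
bytes_per_row = (3 * 1024**3) / 20_000_000   # ~161 bytes/row incl. overhead

# Project the same row width onto the anticipated 60 million rows:
projected_table_gb = 60_000_000 * bytes_per_row / 1024**3

print(round(bytes_per_row), "bytes/row ->",
      round(projected_table_gb, 1), "GB per large table")
```

So the large tables alone would roughly triple to ~9GB each; the 150GB total figure implies the rest of the growth comes from additional tables and indexes.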
Hi Michael,

I have recently been working on a database which is presently 14 GB in size and will grow to nearly 1/2 terabyte in the next 12 months. This database is mostly large static two-column tables needing fast searches.

So far what I've done is split the table into various subtables - for example the entries starting with 'AAA' go in one table, the entries starting with 'AAB' go in another table, etc. After this is done I then index the individual tables.

As each table and each index is its own file, I then stop the PostgreSQL daemon and move the individual files & indexes onto separate RAID drives, then create soft links from the <postgres>/data/base/<database> directory to the files, i.e.:

/opt/postgres/data/base/<database>/AAA -> /drives/AAA/AAA
/opt/postgres/data/base/<database>/AAA_idx -> /drives/AAA/AAA_idx
/opt/postgres/data/base/<database>/AAB -> /drives/AAB/AAB
/opt/postgres/data/base/<database>/AAB_idx -> /drives/AAB/AAB_idx

and so on.

The biggest limitation I have found with this is the /opt/postgres/data/pg_log file seeming to need to log (write) bunches of data, even when just doing searches (reads) on indexes on other tables. No matter how fast all the other disk subsystems are, the speed of the disk system the pg_log file is on creates an 'artificial' upper throughput limit.

My recommendation would be (for mostly static data just doing look-ups) to split things into many logical tables and move these tables onto separate RAID subsystems. Then put the pg_log file onto the fastest disk subsystem you can buy. I haven't yet moved the pg_log file to a different disk than the main PostgreSQL installation and created a soft link to it (this is the next step), but hopefully it won't be a problem.

The system in question is PostgreSQL 7.0.3 running on Solaris 7.

Regards and best wishes,

Justin Clift
Database Administrator

Michael Miyabara-McCaskey wrote:
>
> Hello there,
>
> I have a DB that I am attempting to buy "the right hardware" for.
> [...]
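Justin's scheme can be sketched end to end. This is an illustration of the idea only, not his actual layout: the three-letter prefix rule, the paths, and the file names are assumptions (a scratch directory stands in for the real data directory and RAID mounts), and on a real system the postmaster must be stopped before any files are moved:

```python
import os
import tempfile

def subtable_for(entry: str) -> str:
    """Route an entry to a subtable named after its first three characters."""
    return entry[:3].upper()

# Scratch directories standing in for /opt/postgres/data/base/<database>
# and the per-prefix RAID mount points under /drives/.
base = tempfile.mkdtemp()    # the database directory
drives = tempfile.mkdtemp()  # the RAID mount points

table = subtable_for("AABxyz")        # -> "AAB"
src = os.path.join(base, table)
with open(src, "w") as f:             # stand-in for the subtable's data file
    f.write("table data")

# Move the file onto its own "drive", then soft-link it back, as in the post:
dest_dir = os.path.join(drives, table)
os.makedirs(dest_dir)
dest = os.path.join(dest_dir, table)
os.rename(src, dest)
os.symlink(dest, src)

# The database still finds the file at the old path, via the link.
with open(src) as f:
    print(f.read())
```

The same move-and-symlink step would be repeated for each `_idx` file, so that each subtable and its index can sit on whichever disk subsystem suits it.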
Justin,

Thank you for the response.

Out of curiosity, what type of memory usage are you experiencing with the current setup (14GB)?

And have you figured out what your upgrade path may be? For instance, I have not been able to find any evidence on how to make PostgreSQL scale (clustered nodes, parallel fail-over, high availability etc).

How has your site decided to handle this?

-Michael

> -----Original Message-----
> From: website@neptune.he.net [mailto:website@neptune.he.net]On Behalf Of Justin Clift
> Sent: Tuesday, January 30, 2001 9:31 PM
> To: mykarz@miyabara.com
> Cc: pgsql-novice@postgresql.org; pgsql-general@postgresql.org
> Subject: Re: Sizing of LARGE databases.
>
> Hi Michael,
>
> I have recently been working on a database which is presently 14 GB in
> size and will grow to nearly 1/2 terabyte in the next 12 months.
> [...]
On Wed, 31 Jan 2001, Justin Clift wrote:

> The biggest limitation I have found with this is the
> /opt/postgres/data/pg_log file seeming to need to log (write) bunches of
> data, even when just doing searches (reads) on indexes on other tables.

Is there a way to turn off this logging? How about setting up a virtual disk big enough to hold 5 to 10 minutes' worth of logs, and then moving/deleting the log before that time (if it is possible to delete/move the log)?
On Thu, 1 Feb 2001, Michael Miyabara-McCaskey wrote:

> And have you figured out what your upgrade path may be? For instance, I
> have not been able to find any evidence on how to make PostgreSQL scale
> (clustered nodes, parallel fail-over, high availability etc)

I believe 7.1 will have some level of replication, although it seems 7.2 will be the "replication" release.

I don't know how failover would work, but one possibility may be to use a program like Understudy, which monitors an IP address and, if a machine dies, sends requests to the secondary machine. That, combined with replication from the database, may help prevent down time, and may also be used to increase performance by doing round-robin with Understudy.
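The round-robin-plus-failover idea can be sketched as client-side logic. This is a sketch only: Understudy itself does this at the IP level in front of the servers, and the host names and health check here are made up for illustration:

```python
class RoundRobinPool:
    """Rotate through replicated servers, skipping any that fail a
    health check (sketch of round-robin with failover)."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.i = 0

    def pick(self, is_alive):
        # Try each host at most once per call, continuing the rotation
        # from where the previous call left off.
        for _ in range(len(self.hosts)):
            host = self.hosts[self.i % len(self.hosts)]
            self.i += 1
            if is_alive(host):
                return host
        raise RuntimeError("no replica available")

pool = RoundRobinPool(["db1", "db2"])
db1_has_died = lambda h: h != "db1"   # pretend db1 is down
print(pool.pick(db1_has_died))        # requests fail over to db2
```

With all replicas healthy, successive calls alternate between them, which is where the performance benefit of round-robin comes from.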
> On Wed, 31 Jan 2001, Justin Clift wrote:
> > The biggest limitation I have found with this is the
> > /opt/postgres/data/pg_log file seeming to need to log (write) bunches of
> > data, even when just doing searches (reads) on indexes on other tables.

This is not true.

1) We do not write "bunches of data" into pg_log. We write just 2 bits per transaction.

2) A simple read-only SELECT does not write pg_log at all.

--
Tatsuo Ishii
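Tatsuo's figure puts a very small upper bound on how fast pg_log can grow (the arithmetic below follows from the 2-bits-per-transaction statement; the 100-million-transaction workload is an arbitrary example):

```python
BITS_PER_XACT = 2                      # commit-status bits per transaction
xacts_per_byte = 8 // BITS_PER_XACT    # 4 transactions fit in one byte

transactions = 100_000_000             # hypothetical write workload
pg_log_bytes = transactions // xacts_per_byte

print(pg_log_bytes / 1024**2, "MB")    # ~24 MB for 100 million transactions
```

In other words, even an entirely write-heavy workload adds only a quarter of a byte to pg_log per transaction, so the throughput ceiling Justin observed must come from something other than pg_log data volume.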