Hi!
I apologize if this is documented elsewhere, but I have looked and failed
to find it.
Setup:
I have a database containing around 50 tables, ranging in size from several
hundred megabytes to several gigabytes, and from several tens of thousands
to several tens of millions of records.
These tables grow by between 100 and 500 MB per day as new data comes in.
What I am considering:
I am considering splitting the database so that its tables reside on
separate machines, and connecting them through one master node.
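To make the idea concrete, below is a rough sketch of the kind of setup I am
imagining, assuming the foreign data wrapper mechanism (postgres_fdw) is the
right tool for this; the server, table, and credential names are invented
purely for illustration:

  -- On the master node: register one of the remote machines and expose
  -- one of its tables so that it can be queried as if it were local.
  CREATE EXTENSION postgres_fdw;

  CREATE SERVER node1
      FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'node1.internal', port '5432', dbname 'mydb');

  CREATE USER MAPPING FOR CURRENT_USER
      SERVER node1
      OPTIONS (user 'app', password 'secret');

  -- The table physically lives on node1 but appears local on the master.
  CREATE FOREIGN TABLE daily_intake (
      id         bigint,
      created_at timestamptz,
      payload    text
  )
  SERVER node1
  OPTIONS (schema_name 'public', table_name 'daily_intake');

If dblink or some other mechanism is the more standard approach, I am happy
to use that instead.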
The questions I have are:
1) How can I do this using PostgreSQL? Is there a standard way of attaching
tables from external databases that reside on different machines, along the
lines of the sketch above?
2) Does the query optimizer understand this sort of setup and make
appropriate adjustments to its operation? E.g. select the relevant subset
from each remote table separately, so that the selects run on the remote
machines, and then do the multi-table join on those subsets (there is a
sketch of this kind of query after the list). This is probably a rather poor
example, as there are much more efficient ways to minimize the amount of
work for this sort of thing, but I am only trying to illustrate the question.
3) What sort of performance increase can I expect from distributing the
database in this manner? What operations would benefit most?
4) Can I set up a "master node" database which connects all the external
tables and presents them to clients as if they were local? I would like to
keep using the "cluster" in the same way I use the current system: one
machine handles all external requests and takes care of what goes where and
on which "slave node".
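To illustrate question 2, this is the shape of query I have in mind (the
local table "customers" and the column "intake_id" are invented; "daily_intake"
is the foreign table from the sketch above):

  -- Ideally the WHERE restriction would run on the remote machine, so
  -- that only the matching subset of daily_intake is shipped back to
  -- the master node for the join against the local table.
  EXPLAIN VERBOSE
  SELECT c.name, d.payload
  FROM   customers c                 -- local table on the master node
  JOIN   daily_intake d ON d.id = c.intake_id
  WHERE  d.created_at >= now() - interval '1 day';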
If these questions are answered elsewhere, please point me in the right
direction.
Regards.
Gordan