Thread: Postgres on shared network drive

From: J Ottery
Using Windows XP and a TCP/IP network.

I installed PostgreSQL on a client PC and put the data files on a
networked drive (instead of the local drive), running as the postgres
user on the default local port. This works well.

Now I install PostgreSQL on another client machine and point it to
the same data directory on the network drive.

Will this setup work OK for multiple / concurrent users and accessing
the same tables on either of the client machines or is there something
else I need to be aware of?

Thanks for any advice.

Re: Postgres on shared network drive

From: Tomasz Ostrowski
On 2008-04-11 08:53, J Ottery wrote:

> I install PostgreSQL on a client PC and put the data files on a
> networked drive (instead of the local drive).  Postgres as user and
> localport. This works well.

This is not the way it is meant to work, and it can eat your data.

> Now I install postgresSQL on another client machine and point it to
> the same data directory on the network drive.

Wrong. You have to install PostgreSQL on one computer, with the data
directory on a local hard drive, and allow many client computers to
connect to it.

> Will this setup work OK for multiple / concurrent users and accessing
> the same tables on either of the client machines or is there something
> else I need to be aware of?

It won't work. Don't even try.

Regards
Tometzky
--
...although Eating Honey was a very good thing to do, there was a
moment just before you began to eat it which was better than when you
were...
                                                       Winnie the Pooh

Re: Postgres on shared network drive

From: Craig Ringer
J Ottery wrote:
> Using windows XP and TCP/IP network.
>
> I install PostgreSQL on a client PC and put the data files on a
> networked drive (instead of the local drive).  Postgres as user and
> localport. This works well.

I wouldn't personally trust this setup.

> Now I install postgresSQL on another client machine and point it to
> the same data directory on the network drive.

That's an *amazingly* bad idea, and shows that you've misunderstood how
PostgreSQL works.

PostgreSQL expects exclusive access to its data directory. No other
programs, be they other copies of postgresql or anything else, should
ever be allowed any kind of access to the postgresql data directory
except for the very few special cases mentioned in the documentation
(such as WAL archiving via archive_command).

If you want multiple users, you need to have them connect over the
network to the same postgresql server. Say you have three computers and
you want them to have access to a database. Here's how it should work:

computer 1
(has postgresql server installed and enabled for tcp/ip connections)
(clients connect to "computer1")

computer 2
(clients connect to "computer1")

computer 3
(clients connect to "computer1")

... and so on. There is *no* way for computer 2 or computer 3 to access
the postgresql data files directly, only via the postgresql server
process. In fact, other programs on computer 1 should also be denied
access to the postgresql data directory.
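
Concretely, the setup sketched above needs the server on computer 1 to
accept TCP/IP connections from the other machines. A minimal sketch --
the subnet, address and auth method here are illustrative, adjust them
for your own network:

```
# postgresql.conf on computer 1 -- listen on the network, not just localhost
listen_addresses = '*'      # or the machine's specific IP address
port = 5432

# pg_hba.conf on computer 1 -- let the client subnet connect with passwords
# TYPE  DATABASE  USER  ADDRESS          METHOD
host    all       all   192.168.1.0/24   md5
```

After changing these, restart the server so the new listen settings take
effect.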

PostgreSQL is a relational database management system. It's intended as
the sole way to access its data. If you need a simpler shared database
system where all programs open the database files directly over a
network share, there are options out there. SQLite might work, for example.

Even then, it's not a great idea to put the database file on a network
share. You should explicitly verify that the database system you use is
safe to use on the particular type of network share you have, because
network shares often cause problems.

Ideally the postgresql database should be on a server machine that's on
all the time. The postgresql data directory *must* not be shared. All
the client computers should connect to the postgresql server over the
network using a postgresql client like pgODBC, pgJDBC, the psql command
line, etc when they need database access. If you don't have one, maybe
it's time to get a small workgroup server to handle all your file
sharing, database requirements, etc. Even a spare PC will do for a
server in a pinch. Remember to keep good backups!
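
With psqlODBC, for example, pointing a client at the server is just a
DSN definition. A sketch -- the DSN name, host and database values are
made up, the key names are psqlODBC's:

```ini
; odbc.ini entry for the psqlODBC driver (illustrative values)
[MyAppDB]
Driver     = PostgreSQL Unicode
Servername = computer1
Port       = 5432
Database   = appdb
Username   = appuser
```

The application then opens the "MyAppDB" data source and never needs to
know which machine the server actually runs on.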

> Will this setup work OK for multiple / concurrent users and accessing
> the same tables on either of the client machines or is there something
> else I need to be aware of?

No, as far as I know it'll break horribly and eat all your data.

--
Craig Ringer

Re: Postgres on shared network drive

From: Craig Ringer
J Ottery wrote:
> Thanks so much Craig. I have decided to migrate to Postgres and most
> of my applications are single computer based but I need to plan for
> future needs. Some research is in order for me.

All you should need to do is allow the user / administrator to configure
the connection settings for the application. An initial single user
deployment can be done by installing the PostgreSQL server on the same
computer as the program(s) that use it and having the application
default to connecting to a database on "localhost". If you need to go
multi-user, you can either move the postgresql server to a separate
server computer and update the application settings to have them all
connect to the server computer, or you can just have the additional
users connect to the postgresql server on the first person's computer.
In either case no changes to the application code should be required;
you should only need to change the server hostname setting.

Your application doesn't necessarily have to care where the database
server is. If you use something like ODBC then your data source
definition specifies where the DB server is, and the app just asks to
open the data source. The app won't even be able to tell if the DB
server is on the same computer or not.

If you explicitly connect to the DB in your program code then all you
need to do is provide a way to change the database connection settings
in the application GUI or in a configuration file, so the user can enter
the database server hostname, server port, database username, and
database password.
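
As a sketch of that idea (the file layout and key names here are
hypothetical), an application can keep its connection settings in a
small INI file and assemble a libpq-style keyword/value connection
string from it, so relocating the server means editing one line of
configuration rather than any code:

```python
# Sketch: build a libpq-style connection string from user-editable
# settings, so moving the server only means editing the config file.
from configparser import ConfigParser

# Stand-in for the contents of a settings file shipped with the app.
SAMPLE = """
[database]
host = computer1
port = 5432
dbname = appdb
user = appuser
"""

def load_dsn(config_text: str) -> str:
    cp = ConfigParser()
    cp.read_string(config_text)
    db = cp["database"]
    # keyword=value pairs as accepted by libpq-based client drivers
    return " ".join(f"{key}={db[key]}" for key in ("host", "port", "dbname", "user"))

if __name__ == "__main__":
    print(load_dsn(SAMPLE))  # host=computer1 port=5432 dbname=appdb user=appuser
```

Going multi-user is then just changing `host = computer1` to the new
server's name.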

--
Craig Ringer



Re: Postgres on shared network drive

From: Magnus Hagander
Tomasz Ostrowski wrote:
> On 2008-04-11 08:53, J Ottery wrote:
>
> > I install PostgreSQL on a client PC and put the data files on a
> > networked drive (instead of the local drive).  Postgres as user and
> > localport. This works well.
>
> This is not the way it is meant to work, and it can eat your data.

Change that to it *will* eat your data.

This is absolutely not supported. If it works, it's pure luck and very
temporary...


> > Now I install postgresSQL on another client machine and point it to
> > the same data directory on the network drive.
>
> Wrong. You have to install PostgreSQL on one computer, with data
> directory on local hard drive, and allow many client computers to
> connect to it.

Correct, that's how you do it.

If you for some reason need to run the server locally on each machine,
you still need to have the data directory locally, and set up
replication (with Slony, for example) between the nodes. But I don't
think that's what you want.

//Magnus

Re: Postgres on shared network drive

From: "A. Kretschmer"
On Thu, 10 Apr 2008 at 23:53:18 -0700, J Ottery wrote:
> Using windows XP and TCP/IP network.
>
> I install PostgreSQL on a client PC and put the data files on a
> networked drive (instead of the local drive).  Postgres as user and
> localport. This works well.
>
> Now I install postgresSQL on another client machine and point it to
> the same data directory on the network drive.

Sure, you can do that. But there are faster ways to destroy your data.


>
> Will this setup work OK for multiple / concurrent users and accessing
> the same tables on either of the client machines or is there something
> else I need to be aware of?

Your idea is completely ill-conceived. PostgreSQL is a client-server
database, with one server and multiple clients. You can't access the
same database files with multiple database servers.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: Postgres on shared network drive

From: "Pavan Deolasee"
On Fri, Apr 11, 2008 at 1:04 PM, A. Kretschmer
<andreas.kretschmer@schollglas.com> wrote:
>
>
>  Your idea is completely ill-conceived. PostgreSQL is a client-server
>  database, with one server and multiple clients. You can't access the
>  same database files with multiple database servers.
>

I wonder if it would make sense to add support to mount a database in
*read-only* mode from multiple servers, though. I am thinking about
data warehouse kinds of operations where multiple servers can be
used to answer read-only queries. Is there a use case for such
applications in the real world?


Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: Postgres on shared network drive

From: J Ottery
On Apr 11, 5:34 pm, andreas.kretsch...@schollglas.com ("A.
Kretschmer") wrote:
> On Thu, 10 Apr 2008 at 23:53:18 -0700, J Ottery wrote:
>
> > Using windows XP and TCP/IP network.
>
> > I install PostgreSQL on a client PC and put the data files on a
> > networked drive (instead of the local drive).  Postgres as user and
> > localport. This works well.
>
> > Now I install postgresSQL on another client machine and point it to
> > the same data directory on the network drive.
>
> Sure, you can do that. But there are faster ways to destroy your data.
>
>
>
> > Will this setup work OK for multiple / concurrent users and accessing
> > the same tables on either of the client machines or is there something
> > else I need to be aware of?
>
> Your idea is completely ill-conceived. PostgreSQL is a client-server
> database, with one server and multiple clients. You can't access the
> same database files with multiple database servers.
>
> Andreas

I understand now Andreas. Thanks for the advice.

Re: Postgres on shared network drive

From: Craig Ringer
Pavan Deolasee wrote:
>
> I wonder if it would make sense to add support to mount a database in
> *read-only* mode from multiple servers, though. I am thinking about
> data warehouse kinds of operations where multiple servers can be
> used to answer read-only queries. Is there a use case for such
> applications in the real world?
>
I'm not sure that makes sense myself. The reason you'd want multiple
read-only instances is for performance and/or redundancy. Neither of
those goals is well served by having a shared data store.

A shared data store won't help performance much because both instances
will compete for I/O bandwidth. It might be faster if most of the
regularly used data and indexes fit in memory on the host, but even then
I'd personally be surprised if the cost of the network/shared storage
didn't counteract that at least in part.

For redundancy, you ideally want to avoid shared infrastructure that can
fail - like shared storage. It's likely to be better to keep separate
copies of the data store on each host.

There are systems - like Slony-I and log shipping replication - that can
keep servers in sync without shared storage, and are almost certainly
more useful than shared-storage DB servers.

What I do think would be very interesting would be the ability to have a
DB serving read-only queries while still reading in shipped WAL archives
as they arrive. That'd be a potential big win on performance because
each DB server could have its own I/O system, CPUs and RAM. With some
trickiness you could even forward queries that did require writes to the
master server transparently, while servicing read-only queries locally.
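
The log-shipping half of this already exists in the 8.x warm-standby
machinery. Roughly (the archive path is illustrative):

```
# postgresql.conf on the master -- ship completed WAL segments
archive_mode    = on
archive_command = 'cp %p /mnt/archive/%f'    # example archive location

# recovery.conf on the standby -- replay segments as they arrive
restore_command = 'cp /mnt/archive/%f %p'
```

What's missing today is the second half: letting the standby answer
read-only queries while it keeps replaying.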

--
Craig Ringer

Re: Postgres on shared network drive

From: "Dawid Kuroczko"
On Fri, Apr 11, 2008 at 2:54 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> Pavan Deolasee wrote:
> > I wonder if it would make sense to add support to mount database in
> > *read-only* mode from multiple servers though. I am thinking about
> > data warehouse kind of operations where multiple servers can be
> > used answer read-only queries. Is there a use case for such applications
> > in real world ?

Not quite workable.  Remember that table data is not always available on
the block device -- there are pages modified in the buffer cache (shared
memory), and other machines have no access to each other's shared memory
(and it would be a lot of work to do that efficiently).  Remember also
MVCC -- if your "read-only copy machine" starts a complicated query on
some big_table, and in the meanwhile the "read-write machine" decides the
big_table's pages can be reused... well, your "read-only" machine doesn't
even have a way of knowing it's returning garbage data. ;-)

Now, if you really really want a read-only copy of the read-write data
available over the network, many NAS/SAN devices will allow you to
make a snapshot of the database -- and you can use that snapshot as
a read-only copy of the database.  But then again, if you want a read-only
copy of a days/weeks-old database, there are cheaper and better ways of
doing it.

>  I'm not sure that makes sense myself. The reason you'd want multiple read
> only instances is for performance and/or redundancy. Neither of those goals
> are well served by having a shared data store.
>
>  A shared data store won't help performance much because both instances will
> compete for I/O bandwidth. It might be faster if most of the regularly used
> data and indexes fit in memory on the host, but even then I'd personally be
> surprised if the cost of the network/shared storage didn't counteract that
> at least in part.

That is assuming your bottleneck is the I/O subsystem.  If your data fits
nicely in RAM, but you are CPU bound, it is sometimes more sensible to
have two machines than one twice-as-powerful machine.  It's also easier
to add a third machine later than to buy an even more powerful one.
But this is a field where YMMV.

A known implementation of such a set up would be Oracle RAC, where
you have a shared storage and N machines using it.

>  For redundancy, you ideally want to avoid shared infrastructure that can
> fail - like shared storage. It's likely to be better to keep separate copies
> of the data store on each host.
>
>  There are systems - like Slony-I and log shipping replication - that can
> keep servers in sync without shared storage, and are almost certainly more
> useful than shared-storage DB servers.
>
>  What I do think would be very interesting would be the ability to have a DB
> serving read-only queries while still reading in shipped WAL archives as
> they arrive. That'd be a potential big win on performance because each DB
> server could have its own I/O system, CPUs and RAM . With some trickyness
> you could even forward queries that did require writes to the master server
> transparently, while servicing read only queries locally.

Something like pgpool (which can forward read-write queries to a master, and
handle selects on a pool of read-only machines).
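
A pgpool-II configuration for that pattern looks roughly like this
(the parameter names are pgpool-II's; the hostnames are made up):

```
# pgpool.conf sketch: SELECTs load-balanced across backends,
# writes sent to the first node
load_balance_mode = on
backend_hostname0 = 'master.example'
backend_port0     = 5432
backend_weight0   = 1
backend_hostname1 = 'replica.example'
backend_port1     = 5432
backend_weight1   = 1
```

Clients connect to pgpool instead of a backend directly, and the pool
decides where each query goes.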

While I think pgpool, pgbouncer and Slony-I are great pieces of software,
I would like to wake up one day and know that PostgreSQL can do it all
internally, under the hood, just like it does WAL-logging and startup
recovery automatically. ;-)

   Regards,
       Dawid

Re: Postgres on shared network drive

From: "Pavan Deolasee"
On Sat, Apr 12, 2008 at 11:00 PM, Dawid Kuroczko <qnex42@gmail.com> wrote:

>
>  Not quite workable.  Remember that table data is not always available on
>  the block device -- there are pages modified in the buffer cache (shared
>  memory), and other machines have no access to the other's shared memory
>  (and it would be a lot of work to do it efficiently).  Remember also about the
>  MVCC -- if your "read only copy machine" starts a complicated query on
>  some big_table, and in the meanwhile "read-write machine" decides the
>  big_table's pages can be reused... well your "read-only" machine doesn't
>  even have a way of knowing it's returning garbage data. ;-)
>

I am not suggesting a one-read-write-and-many-read-only architecture. I am
rather suggesting all read-only systems. I would be interested in this
setup if I run large read-only queries on historical data and need easy
scalability. With a read-only setup, you can easily add another machine to
increase computing power. Also, we may come up with cache-sharing
systems so that if a buffer is cached on some other node, it can
be transferred over a high-speed interconnect, rather than read from a
relatively slower disk.

>  Now, if you really really want a read-only copy of the read write data
>  available over the network, many NAS/SAN devices will allow you to
>  make a snapshot of the database -- and you can use that snapshot as
>  a read-only copy of the database.  But then again, if you want a read-only
>  copy of a days/weeks old database, there are cheaper and better ways of
>  doing it.
>
>

Yes. I was mostly assuming read-only scalability. What are the other
better ways to do so?

>
>  A known implementation of such a set up would be Oracle RAC, where
>  you have a shared storage and N machines using it.
>

Oracle RAC is a multi-master kind of architecture where each node has
access to the shared storage and can directly read/write data.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Re: Postgres on shared network drive

From: "Dawid Kuroczko"
On Sat, Apr 12, 2008 at 8:11 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> On Sat, Apr 12, 2008 at 11:00 PM, Dawid Kuroczko <qnex42@gmail.com> wrote:
>  >  Not quite workable.  Remember that table data is not always available on
>  >  the block device -- there are pages modified in the buffer cache (shared
>  >  memory), and other machines have no access to the other's shared memory
>  >  (and it would be a lot of work to do it efficiently).  Remember also about the
>  >  MVCC -- if your "read only copy machine" starts a complicated query on
>  >  some big_table, and in the meanwhile "read-write machine" decides the
>  >  big_table's pages can be reused... well your "read-only" machine doesn't
>  >  even have a way of knowing its returning garbage data. ;-)
>  I am not suggesting one read-write and many read-only architecture. I am
>  rather suggesting all read-only systems. I would be interested in this
>  setup if I run large read-only queries on historical data and need easy
>  scalability. With read-only setup, you can easily add another machine to
>  increase computing power. Also, we may come up with cache-sharing
>  systems so that if a buffer is cached on some other node, that can
>  be transferred on a high speed interconnect, rather than reading from a
>  relatively slower disk.

For example, it could be done by an ability to start a cleanly shut-down
database in read-only mode.  I would see it as a very helpful companion
for a PITR recovery.

So in recovery.conf you could say you want recovery done until '10:00',
then do some read-only queries, decide you want to recover until '10:15',
and so on until you find the place where someone made a big mistake.
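
In 8.x recovery.conf terms, that first step is just a recovery target
(the timestamp and archive path are illustrative):

```
# recovery.conf sketch -- stop WAL replay at a chosen moment
restore_command      = 'cp /mnt/archive/%f %p'
recovery_target_time = '2008-04-11 10:00:00'
```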

Hmm, it would be helpful to ask the recovery process to create "write-before
logs", i.e. logs which would allow you to 'roll back' the whole recovery of
the database to '10:05', should you decide '10:15' is too late.

Possible TODO entry? ;-)

[...]
>  Yes. I was mostly assuming read-only scalability. What are the other
>  better ways to do so ?

I was thinking you were saying...  Let's drop the issue, I misunderstood. :-)

Regards,
    Dawid

Re: Postgres on shared network drive

From: Peter Wilson
Pavan Deolasee wrote:
[...]
>>
>
> I am not suggesting one read-write and many read-only architecture. I am
> rather suggesting all read-only systems. I would be interested in this
> setup if I run large read-only queries on historical data and need easy
> scalability. With read-only setup, you can easily add another machine to
> increase computing power. Also, we may come up with cache-sharing
> systems so that if a buffer is cached on some other node, that can
> be transfered on a high speed interconnect, rather than reading from a
> relatively slower disk.
>
You can have infinite scalability of a read-only database simply by copying the
database to each system. If it's historical data it's not "up to the minute". If
you want to periodically update the read-only databases then that's pretty
straightforward - with various options trading speed against ease - depending on
your system requirements.

>>
>>
>
> Yes. I was mostly assuming read-only scalability. What are the other
> better ways to do so ?
>
>>  A known implementation of such a set up would be Oracle RAC, where
>>  you have a shared storage and N machines using it.
>>
>
> Oracle RAC is a multi-master kind of architecture where each node has
> access to the shared storage and can directly read/write data.
>
> Thanks,
> Pavan
>

Re: Postgres on shared network drive

From: Gregory Stark
"Pavan Deolasee" <pavan.deolasee@gmail.com> writes:

> I am not suggesting one read-write and many read-only architecture. I am
> rather suggesting all read-only systems. I would be interested in this
> setup if I run large read-only queries on historical data and need easy
> scalability. With read-only setup, you can easily add another machine to
> increase computing power.

That's especially attractive if it could be done on a table-by-table basis.
So, for example, if you froze all the tuples in a table and marked it
read-only, then another database could mount that table and be able to
perform queries on it.

There are problems around making sure that all the databases mounting the
table are the same architecture and agree about the meanings of all the type
oids.

And I'm not sure how you would ever undo the situation. You would need some
way to indicate the read-only table is "locked" by the other databases so the
master database would know it couldn't be made read-write until they "unlock"
it.

> On Sat, Apr 12, 2008 at 11:00 PM, Dawid Kuroczko <qnex42@gmail.com> wrote:

>>  Now, if you really really want a read-only copy of the read write data
>>  available over the network, many NAS/SAN devices will allow you to
>>  make a snapshot of the database -- and you can use that snapshot as
>>  a read-only copy of the database.  But then again, if you want a read-only
>>  copy of a days/weeks old database, there are cheaper and better ways of
>>  doing it.

Actually that's a great way of doing it. I think we're going to run into this
"problem" more and more often, that there are good lower-level solutions to
the same problems we're tackling in the application.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!