Thread: Access restriction
I am trying to set up a database system where each user has his/her own
database without access outside of it of any kind. This seemed simple, but I
cannot figure out how to create a user who cannot log into every other
database; if I use GRANT, it only gives them access to one class (table) at
a time.

Thanks,
	Jeff Davis
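For illustration, the GRANT approach Jeff mentions looks roughly like the
sketch below (a minimal sketch; the user, password and table names are
hypothetical). Privileges are handed out one class (table) at a time, and
nothing in it stops the user from connecting to other databases, which is
the problem being described.

    -- run inside the user's own database, as its owner or a superuser
    CREATE USER jeff WITH PASSWORD 'secret';
    -- each class (table) has to be granted separately
    GRANT SELECT, INSERT, UPDATE, DELETE ON orders TO jeff;
    GRANT SELECT, INSERT, UPDATE, DELETE ON customers TO jeff;
    -- note: none of this prevents "jeff" from connecting to other databases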
Jeff Davis <jdavis@wasabimg.com> writes:
> I am trying to set up a database system where each user has his/her own
> database without access outside of it of any kind.

I think you want "sameuser" in pg_hba.conf.

			regards, tom lane
On Wed, Nov 29, 2000 at 12:12:00AM -0500, some SMTP stream spewed forth:
> Jeff Davis <jdavis@wasabimg.com> writes:
> > I am trying to set up a database system where each user has his/her own
> > database without access outside of it of any kind.
>
> I think you want "sameuser" in pg_hba.conf.

Would you mind elaborating on that a bit for me? i.e. Where should it go in
pg_hba.conf? I don't think I saw this in the docs...

Thanks.

gh

>
> 			regards, tom lane
Thread: Database cluster?

Hi!

I apologize if this is documented elsewhere, but I have looked and failed to
find it.

Setup: I have a database that contains around 50 tables. They range in size
from several hundred megabytes to several gigabytes, and from several tens
of thousands of records to several tens of millions of records. The daily
data intake grows these tables by between 100 and 500 MB per day.

What I am considering: I am considering splitting the database into tables
residing on separate machines, and connecting them on one master node.

The questions I have are:

1) How can I do this using PostgreSQL? Is there a standard way of attaching
tables from external databases which reside on different machines?

2) Does the query optimizer understand this sort of setup and make
appropriate adjustments to its operation? E.g. select subsets separately
from each remote table, so that the selects run on the remote machines, and
then do a multi-table join based on that. This is probably a rather poor
example, as there are much more efficient ways to minimize the amount of
work to be done for this sort of thing, but I am only trying to illustrate
the question.

3) What sort of performance increase can I expect from distributing the
database in this manner? What operations would benefit most?

4) Can I set up a "master node" database which connects all the external
tables and presents them to clients as if they were local? I would like to
keep using the "cluster" in the same way I use the current system, with one
machine handling all external requests and taking care of what goes where
and on which "slave node".

If these questions are answered elsewhere, please point me in the right
direction.

Regards.

Gordan
GH <grasshacker@over-yonder.net> writes:
>> I think you want "sameuser" in pg_hba.conf.

> Would you mind elaborating on that a bit for me?
> i.e. Where should it go in pg_hba.conf?
> I don't think I saw this in the docs...

It's right in pg_hba.conf:

# Format:
#
# host DBNAME IP_ADDRESS ADDRESS_MASK AUTHTYPE [AUTH_ARGUMENT]
#
# DBNAME is the name of a PostgreSQL database, or "all" to indicate all
# databases, or "sameuser" to restrict a user's access to a database with
# the same name as the user.

One thing this doesn't make too clear is that the "restriction" really means
that this record *matches* if the username and the database name are the
same. Else the postmaster moves on, looking for another matching record.

A possibly useful example:

    localhost    sameuser    ident
    localhost    all         passwd    passfile

This would mean that a user would get let into his own database on the basis
of IDENT checking, and would get let into other databases on the basis of
password checking using passwords in $PGDATA/passfile. Since you'd control
the contents of this last file, this would allow you to grant access to all
databases to only selected users.

You want some sort of escape hatch like that, at least for the DBA --- else
he couldn't get into any database but his own, either. Superusers aren't
super as far as the postmaster is concerned... they have to pass the
authentication tests the same as mere mortals.

			regards, tom lane
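For the "sameuser" record above to match, each user needs a database with
the same name as the user. A minimal sketch of that setup, run as a
PostgreSQL superuser (the user name "jeff" and the password are
hypothetical):

    -- create the user and a database of the same name;
    -- the "sameuser" pg_hba.conf record then matches only when
    -- user "jeff" connects to database "jeff"
    CREATE USER jeff WITH PASSWORD 'secret';
    CREATE DATABASE jeff;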
Gordan Bobic writes:

> I am considering splitting the database into tables residing on separate
> machines, and connect them on one master node.
>
> The question I have is:
>
> 1) How can I do this using PostgreSQL?

You can't.

-- 
Peter Eisentraut      peter_e@gmx.net      http://yi.org/peter-e/
> > I am considering splitting the database into tables residing on separate
> > machines, and connect them on one master node.
> >
> > The question I have is:
> >
> > 1) How can I do this using PostgreSQL?
>
> You can't.

I'll jump in with a bit more info. Splitting tables across multiple machines
would do nothing more than make the entire system run at a snail's pace.
Yes, it would slow it down immensely, because you just couldn't move data
between machines quickly enough.

Why? Well, whenever you join two tables that are on different machines, the
tables have to go across whatever sort of connection you have between the
machines. Even if you use gigabit ethernet, you are still running at a mere
fraction of the bandwidth of the computer's internal bus - and at orders of
magnitude greater latency. You'd have lots of CPUs sitting around, doing
absolutely nothing, waiting for data to come across the wire.

There are alternatives, such as IP-over-SCSI. That reduces the latency of
ethernet quite a bit, and gives you much more bandwidth (say, up to 160
megabytes/second). However, that's still a pittance compared to the main
system bus inside your computer.

That's one of the greatest hurdles to distributed computing. That's why the
applications that are best adapted to distributed computing are those that
don't require much data over the wire - which certainly doesn't apply to
databases. : )

steve
> > > I am considering splitting the database into tables residing on
> > > separate machines, and connect them on one master node.
> > >
> > > The question I have is:
> > >
> > > 1) How can I do this using PostgreSQL?
> >
> > You can't.
>
> I'll jump in with a bit more info. Splitting tables across multiple
> machines would do nothing more than make the entire system run at a
> snail's pace. Yes, it would slow it down immensely, because you just
> couldn't move data between machines quickly enough.

I don't believe that is the case. In my case, queries typically return
comparatively small amounts of data - around 100 records at most. The amount
of data that needs to be transferred is comparatively small, and even over
10 Mb ethernet, it would take at most about a second to transfer. This is a
much smaller delay than the query time itself, which can take 10 seconds or
more. Remember that I said there are tables with over 30M records? Doing
multi-table joins on things like this takes a long time...

So, splitting the data in such a way that one table is queried first, and
the tables joined from it are then queried in parallel, would cause a
significant speed-up.

For example, say we have tables t1, t2 and t3. t1 has fields f1, f2 and f3;
t2 and t3 each have a field f1 (and probably lots of other fields). Say I
want to do:

    SELECT * FROM t1, t2, t3
    WHERE t1.f1 = t2.f1
      AND t1.f2 = t3.f1
      AND t1.f3 = 'somedata';

Then t1.f3 could be searched for 'somedata'. When the records are found,
they could be cross-matched remotely, in parallel, for t1.f1 = t2.f1 and
t1.f2 = t3.f1 on different machines. This means that, depending on the type,
configuration and usage of the database, a potentially massive improvement
in performance could be achieved, especially on multi-table joins which span
lots of BIG tables.

Somebody mentioned the fact that postgres uses IPC for communicating between
processes. I think there are tools for clustering (I am not sure if Mosix
supports transparently allowing IPC across nodes) which can work around
that.

> Why? Well, whenever you join two tables that are on different machines,
> the tables have to go across whatever sort of connection you have between
> the machines. Even if you use gigabit ethernet, you are still running at a
> mere fraction of the bandwidth of the computer's internal bus - and at
> orders of magnitude greater latency. You'd have lots of CPUs sitting
> around, doing absolutely nothing, waiting for data to come across the
> wire.

Gigabit ethernet has around the same bandwidth as the PCI bus. I suppose it
all depends on what machine you have running this. This would be true in the
case that the database server is a nice big Alpha with several CPUs.

> There are alternatives, such as IP-over-SCSI. That reduces the latency
> of ethernet quite a bit, and gives you much more bandwidth (say, up to 160
> megabytes/second). However, that's still a pittance compared to the main
> system bus inside your computer.

But SCSI is still 160 MB/s burst (not sustained, unless you're using very
expensive arrays). And gigabit ethernet is 133 MB/s, albeit with greater
latency.

> That's one of the greatest hurdles to distributed computing. That's why
> the applications that are best adapted to distributed computing are those
> that don't require much data over the wire - which certainly doesn't apply
> to databases. : )

I think it depends on whether the amount of data is the problem, or fitting
it together.

Somebody please explain to me further why I am wrong in all this?

Regards.

Gordan
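Spelled out, the decomposition Gordan is describing looks something like the
sketch below (table and column names follow his example; this is only a
manual illustration of the idea - stock PostgreSQL will not farm these steps
out to other machines by itself, which is what the earlier replies are
pointing out).

    -- Step 1, on the node holding t1: apply the filter and collect the keys
    SELECT f1, f2 FROM t1 WHERE f3 = 'somedata';

    -- Step 2, in parallel on the nodes holding t2 and t3, restricted to the
    -- keys produced by step 1 (written as subqueries here so each statement
    -- is self-contained and runnable on a single node)
    SELECT * FROM t2 WHERE f1 IN (SELECT f1 FROM t1 WHERE f3 = 'somedata');
    SELECT * FROM t3 WHERE f1 IN (SELECT f2 FROM t1 WHERE f3 = 'somedata');

    -- Step 3, on the master node: join the three result sets back together.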
> Somebody mentioned the fact that postgres uses IPC for communicating
> between processes. I think there are tools for clustering (I am not sure
> if Mosix supports transparently allowing IPC across nodes) which can work
> around that.

One of those tools is Distributed IPC <http://wallybox.cei.net/dipc/>, but
it only works with Linux, AFAIK; the software there is just a patch to the
Linux kernel and a daemon.

Alain
http://www.openlinksw.com/virtuoso/ would let you do this sort of thing.

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Gordan Bobic
Sent: Thursday, November 30, 2000 9:02 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Database cluster?