SQL queries accessing tables in more than one db - Mailing list pgsql-hackers

From De Clarke
Subject SQL queries accessing tables in more than one db
Date
Msg-id 199806222302.QAA14459@celeste.ucolick.org
Whole thread Raw
Responses Re: [HACKERS] SQL queries accessing tables in more than one db  (ocie@paracel.com)
List pgsql-hackers
RE:  ability to access tables from different DB in same SQL query

I have a good example right now of why this is a good feature,
and why I suspect it is impractical or impossible to implement
one of my bread-n-butter sybase-based apps using PG.  This is
rather a fraught question, because NOT migrating to PG may cost
us thousands of dollars.  More like tens of thousands, since we
would need to bring our old licenses up to date (I just found a
horrendous bug in Sybase SQL Server 4.9 date arithmetic) and
buy a couple of new licenses for our project.  So a lot is
riding on our ability to port existing apps to PG.

I don't think this feature is in SQL92, so I'm making a plea for
an "extra", another way in which PG can be "better than" the bare
standard.

-------

There is a telemetry gathering daemon (several actually) dumping
telemetry into tables.  The tables are large (within the limits
of Sybase, which are 250 fields per record and 3700-some bytes).
Say about 220 fields per record, in the high hundreds of bytes,
all ints and floats, and the tables are ever growing in length.
Say between 10K and 250K records depending on how long the logger
has been running and at what frequency telemetry has been gathered.

The engineering staff person wants to do some analysis on these
data.  The front end app provides an easy, friendly way to select
a date range (or other RSE) from the huge mass of telemetry,
and to refine the FSE.

The selected data are then copied into a temp table using a
SELECT INTO.

Here's the point, then.  The user's temp table wants to live in
a DB with generous permissions:  ordinary users can create and
delete tables!  But the original telemetry data want to live in
a very protected DB where users absolutely cannot mess with the
original tables OR go creating tables of their own that compete
with the originals for storage space.

This is where two important features of Sybase come in handy, and
I don't think Oracle does this (correct me, O Oracle users, if I'm
slandering the product):  Different DB can be located on different
partitions, or different inviolable physical chunks of one
partition.  One server can "see" multiple DB.  And Sybase SQL queries
can span databases, that is, the database name is part of the FQON
(fully qual object name).  So it's as easy as

    select * into sandbox.guest.DMyn_de897082595_D1 from
        telem.dbo.hires_Log_1 where logstamp between
        'Jun 19 1998 03:00' and 'Jun 19 1988 08:00'

The user has no privs other than 'select' anywhere in the telem DB.
He/she now works freely with the smaller, lighter table in the
sandbox DB (*not* repeating expensive queries against the potentially
very, very large table in 'telem').  The user could, of course,
specify a very large range of data and create quite a large temp table
(there are some safety limits in the app, but they are generous).
The worst result of this would be that the user would get bored waiting
for the query to return :-), the dataset would be too big to plot easily,
and other users of the sandbox DB would be annoyed when all the disk
space was used up.

Now, as I understand PG, the user would have to create the temp table
in the incoming telemetry db, because the SELECT query could not
reference tables in 2 different DB.   So he/she would have to be granted
table create/delete in a place where creation of large tables could
easily interfere with the essential job of logging the incoming telemetry.
Not acceptable.  The user playpen has to be separate from the incoming
production data.

If PG offers some other way of

    guaranteeing inviolable space for tables that grow with a
        realtime feed,
    yet making those tables accessible -- via SQL query --
        to nonpriv users who want to grab chunks of the data into
        temp tables created on the fly,

I would very much like to know how to do that.  Yeah, I could do it
by buffering the data row by row in the app and then inserting it from
the app into the new table, but what the heck is SQL for if not to do
that job more efficiently and concisely?  SELECT INTO is the right
syntax.  But the limitations of PG prevent my using it.

Yeah, I could mirror the whole telemetry db to a "public" server
periodically, and let the users query that.  But the engineers want
up-to-the-last-sample data to analyze;  they can change the sampling
speed in real time, if something interesting is happening;  they don't
want to wait for the 24-hourly or 12-hourly flush to the warehouse.

Solutions using psql special commands are not acceptable;  it has to
be SQL.

I hope this makes the case for cross-DB queries, which I have agitated
feebly for in the past but never really justified.  If not (if there is
a good way to achieve the same result in PG without x-db queries) then
please do tell!

de


pgsql-hackers by date:

Previous
From: Keith Parks
Date:
Subject: Re: [HACKERS] Divide by zero error on SPARC/Linux.
Next
From: ocie@paracel.com
Date:
Subject: Re: [HACKERS] SQL queries accessing tables in more than one db