Thread: Re: [GENERAL] CURRENT_TIMESTAMP

Re: [GENERAL] CURRENT_TIMESTAMP

From
Bruce Momjian
Date:
Martijn van Oosterhout wrote:
> On Sat, Sep 28, 2002 at 11:28:03PM -0400, Bruce Momjian wrote:
> >
> > OK, we have two db's returning statement start time, and Oracle 8 not
> > having CURRENT_TIMESTAMP.
> >
> > Have we agreed to make CURRENT_TIMESTAMP statement start, and now()
> > transaction start?  Is this an open item or TODO item?
>
> Well, I'd rather it didn't change at all. IMHO it's a feature, not a bug. In
> any case, if it does get changed we'll have to go through the documentation
> and work out whether we mean current_timestamp or now(). I think most people
> actually want now().

Well, I think we have to offer statement start time somewhere, and it
seems the standard probably requires that.  Two other databases do it
that way.  Oracle doesn't have CURRENT_TIMESTAMP in 8.X.  Can anyone
test on 9.X?

> Fortunatly where I work we only use now() so it won't really matter too
> much. Is there a compelling reason to change?

Yes, it will split now() and CURRENT_TIMESTAMP.  I personally would be
happy with STATEMENT_TIMESTAMP, but because the standard requires it we
may just have to fix CURRENT_TIMESTAMP.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [GENERAL] CURRENT_TIMESTAMP

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Sat, Sep 28, 2002 at 11:51:32PM -0400, Bruce Momjian wrote:
>> Yes, it will split now() and CURRENT_TIMESTAMP.  I personally would be
>> happy with STATEMENT_TIMESTAMP, but because the standard requires it we
>> may just have to fix CURRENT_TIMESTAMP.

> Well, my vote would be for STATEMENT_TIMESTAMP.

One problem with inventing STATEMENT_TIMESTAMP is that (if spelled that
way, without parens) it would have to become a fully-reserved keyword,
thus possibly breaking some applications that use that name now.

But the real point, I think, is that the folks pushing for this think
that the standard requires CURRENT_TIMESTAMP to be statement timestamp.
Inventing some other keyword isn't going to satisfy them.

I don't personally find the "it's required by the spec" argument
compelling, because the spec specifically says that the exact behavior
is implementation-dependent --- so anyone who assumes CURRENT_TIMESTAMP
will behave as start-of-statement timestamp is going to have portability
problems anyway.  Oracle didn't seem to find the argument compelling
either; at last report they have no statement-timestamp function.

I'd be happier with the whole thing if anyone had exhibited a convincing
use-case for statement timestamp.  So far I've not seen any actual
examples of situations that are not better served by either transaction
timestamp or true current time.  And the spec is perfectly clear that
CURRENT_TIMESTAMP does not mean true current time...

            regards, tom lane

Re: [GENERAL] CURRENT_TIMESTAMP

From
Josh Berkus
Date:
Tom,

> I'd be happier with the whole thing if anyone had exhibited a convincing
> use-case for statement timestamp.  So far I've not seen any actual
> examples of situations that are not better served by either transaction
> timestamp or true current time.  And the spec is perfectly clear that
> CURRENT_TIMESTAMP does not mean true current time...

Are we still planning on putting the three different versions of now() on the
TODO?  I.e.,
now('transaction'),
now('statement'), and
now('immediate')
With now() = now('transaction')?

I still think it's a good idea, provided that we have some easy means to
determine now('statement').

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: [GENERAL] CURRENT_TIMESTAMP

From
Bruce Momjian
Date:
Josh Berkus wrote:
>
> Tom,
>
> > I'd be happier with the whole thing if anyone had exhibited a convincing
> > use-case for statement timestamp.  So far I've not seen any actual
> > examples of situations that are not better served by either transaction
> > timestamp or true current time.  And the spec is perfectly clear that
> > CURRENT_TIMESTAMP does not mean true current time...
>
> Are we still planning on putting the three different versions of now() on the
> TODO?  I.e.,
> now('transaction'),
> now('statement'), and
> now('immediate')
> With now() = now('transaction')?
>
> I still think it's a good idea, provided that we have some easy means to
> determine now('statement').

I did a  little more research on CURRENT_TIMESTAMP.  I read the Oracle
docs, and while they mention it, they don't say if the date is xact,
statement, or timeofday.  They do mention it was only added in their
newest product, 9.X, so it isn't surpising no one is using it.

I also researched the SQL99 standards and found a much more specific
definition:

         3) Let S be an <SQL procedure statement> that is not generally
            contained in a <triggered action>. All <datetime value
            function>s that are generally contained, without an intervening
            <routine invocation> whose subject routines do not include an
            SQL function, in <value expression>s that are contained either
            in S without an intervening <SQL procedure statement> or in an
            <SQL procedure statement> contained in the <triggered action>
            of a trigger activated as a consequence of executing S, are
            effectively evaluated simultaneously. The time of evaluation of
            a <datetime value function> during the execution of S and its
            activated triggers is implementation-dependent.

They basically seem to be saying that CURRENT_TIMESTAMP has to be the
same for all triggers as it is for the submitted SQL statement.  When
they say "the time of evaluation ... is implementation-dependent" they
mean that is can be the beginning of the statement, or the end of the
statement.  In fact, you can make a strong argument that it should be
the statement end time that is the proper time, but for implementation
reasons, it is certainly easier to make it start.

Now, they are _not_ saying the statement can't have the same time as
other statements in the transaction, but I don't see why they would
explicitly have to state that.  They say statement, so I think we need
to follow that if we want to be standard-compliant.  We already have two
other databases who are doing this timing at statement level.

If we change CURRENT_TIMESTAMP to statement time, I don't think we need
now(""), but if we don't change it, I think we do --- somehow we should
allow users to access statement time.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [GENERAL] CURRENT_TIMESTAMP

From
Josh Berkus
Date:
Bruce,

> If we change CURRENT_TIMESTAMP to statement time, I don't think we need
> now(""), but if we don't change it, I think we do --- somehow we should
> allow users to access statement time.

I'd argue that we need the 3 kinds of now() regardless, just to limit user
confusion.   If we set things up as:

now() = transaction time
current_timestamp = statement time
timeofday() = exact time

That does give users access to all 3 timestamps, but using a competely
non-intuitive nomenclature.  It's likely that the three types of now() would
just be pointers to other time functions, but would provide nomenative
clarity.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: [GENERAL] CURRENT_TIMESTAMP

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Bruce,
>
> > If we change CURRENT_TIMESTAMP to statement time, I don't think we need
> > now(""), but if we don't change it, I think we do --- somehow we should
> > allow users to access statement time.
>
> I'd argue that we need the 3 kinds of now() regardless, just to limit user
> confusion.   If we set things up as:
>
> now() = transaction time
> current_timestamp = statement time
> timeofday() = exact time
>
> That does give users access to all 3 timestamps, but using a competely
> non-intuitive nomenclature.  It's likely that the three types of now() would
> just be pointers to other time functions, but would provide nomenative
> clarity.

I agree, having now() as a central place for time information is a good
idea.  Maybe we need to vote on these issues.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [GENERAL] arrays

From
Mike Sosteric
Date:
On Sun, 29 Sep 2002, Bruce Momjian wrote:

Apologies in advance if there is a more appropriate list.

We are currently developing a database to host some complicated, XMl
layered data. We have chosen postgres because of its ability to store
multidimensional arrays. We feel that using these will allow us to
simplify the database structure considerably by storing some data in
multidimensional arrays.

However, we currently have some dissenters who believe that using the
multidimensional arrays will make queries slower and unneccesarily
complicated. Its hard for us to evaluate in advance because none of us
have much experience with postgres (we are web based and have relied on
MySQL for most projects up to this point).

I have several questions related to the scenario above.

1) are SQL queries slower when extracting data from multidimensional
arrays
2) are table joins more difficult or unneccesarily complicated
3) can you do selects on only a portion of a multidimensional array. That
is, if you were storing multilanguage titles in a two dimensional array,

[en], "english title"
[fr], "french title"

could you select where title[0] = 'en'

I know these may sound like terribily stupid questions. but we need some
quick guidance before proceeding with a schema that relies on these
advanced data features of postgres

tia

mike


___
    This communication is intended for the use of the recipient to whom it
    is addressed, and may contain confidential, personal, and or privileged
    information. Please contact us immediately if you are not the intended
    recipient of this communication, and do not copy, distribute, or take
    action relying on it. Any communications received in error, or
    subsequent reply, should be deleted or destroyed.
---

Re: [GENERAL] CURRENT_TIMESTAMP

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Are we still planning on putting the three different versions of now() on the
> TODO?  I.e.,
> now('transaction'),
> now('statement'), and
> now('immediate')
> With now() = now('transaction')?

I have no objection to doing that.  What seems to be contentious is
whether we should change the current behavior of CURRENT_TIMESTAMP.

            regards, tom lane

Re: [GENERAL] CURRENT_TIMESTAMP

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Now, they are _not_ saying the statement can't have the same time as
> other statements in the transaction, but I don't see why they would
> explicitly have to state that.

Allow me to turn that around: given that they clearly do NOT state that,
how can you argue that "the spec requires it"?  AFAICS the spec does not
require it.  In most places they are considerably more explicit than
this about stating what is required.

> We already have two other databases who are doing this timing at
> statement level.

The behavior of CURRENT_TIMESTAMP is clearly stated by the spec to be
implementation-dependent.  We are under no compulsion to follow any
specific other implementation.  If we were going to follow some other
lead, I'd look to Oracle first...

> If we change CURRENT_TIMESTAMP to statement time, I don't think we need
> now(""), but if we don't change it, I think we do --- somehow we should
> allow users to access statement time.

I have no problem with providing a function to access statement time,
and now('something') seems a reasonable spelling of that function.
But I think the argument that we should change our historical behavior
of CURRENT_TIMESTAMP is very weak.

One reason why I have a problem with the notion that the spec requires
CURRENT_TIMESTAMP to mean "time of arrival of the current interactive
command" (which is the only specific definition I've seen mentioned
here) is that the spec does not truly have a notion of interactive
command to begin with.  AFAICT the spec's model of command execution
is ecpg-like: you have commands embedded in a calling language with
all sorts of opportunities for pre-planning, pre-execution, etc.
The notion of command arrival time is extremely fuzzy in this model.
It could very well be the time you compiled the ecpg application, or
the time you started the application running.

            regards, tom lane

Re: [GENERAL] arrays

From
Bruno Wolff III
Date:
On Sun, Sep 29, 2002 at 18:12:55 -0600,
  Mike Sosteric <mikes@athabascau.ca> wrote:
> On Sun, 29 Sep 2002, Bruce Momjian wrote:
>
> 3) can you do selects on only a portion of a multidimensional array. That
> is, if you were storing multilanguage titles in a two dimensional array,
>
> [en], "english title"
> [fr], "french title"
>
> could you select where title[0] = 'en'

It is unusual to want to store arrays in a database. Normally you want to
use additional tables instead. For example multilanguage titles is something
I would expect to be in a table that had a column referencing back to
another table defining the object a title was for, a column with the
title and a column with the language.

Re: [GENERAL] arrays

From
Mike Sosteric
Date:
On Mon, 30 Sep 2002, Bruno Wolff III wrote:

> > 3) can you do selects on only a portion of a multidimensional array. That
> > is, if you were storing multilanguage titles in a two dimensional array,
> >
> > [en], "english title"
> > [fr], "french title"
> >
> > could you select where title[0] = 'en'
>
> It is unusual to want to store arrays in a database. Normally you want to
> use additional tables instead. For example multilanguage titles is something
> I would expect to be in a table that had a column referencing back to
> another table defining the object a title was for, a column with the
> title and a column with the language.
>

The chances are very very good that in 99% of the cases we'd only ever
have a single title. multiple titles would be rare. and, to make it worse,
there are several instances of this where you need a table but its seems
overkill for the odd 1% time when you actually need teh extra row.

of course, the there'd be a language lookup table.

what about the speed and query issue?
m


___
    This communication is intended for the use of the recipient to whom it
    is addressed, and may contain confidential, personal, and or privileged
    information. Please contact us immediately if you are not the intended
    recipient of this communication, and do not copy, distribute, or take
    action relying on it. Any communications received in error, or
    subsequent reply, should be deleted or destroyed.
---

Re: [GENERAL] arrays

From
Achilleus Mantzios
Date:
On Mon, 30 Sep 2002, Bruno Wolff III wrote:

>
> It is unusual to want to store arrays in a database. Normally you want to
> use additional tables instead. For example multilanguage titles is something
> I would expect to be in a table that had a column referencing back to
> another table defining the object a title was for, a column with the
> title and a column with the language.

I think arrays are one of the cool features of postgres
(along with gist indexes).

Here are some common uses:

- Tree representation (the genealogical from child to ancestors approach)
- Storing of polynomial formulae of arbitary degree

checkout the intarray package in contrib for further info.

I think pgsql arrays provide a natural solution to certain problems
where it fits.



==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr


Re: [GENERAL] arrays

From
Bruno Wolff III
Date:
On Mon, Sep 30, 2002 at 06:38:56 -0600,
  Mike Sosteric <mikes@athabascau.ca> wrote:
> On Mon, 30 Sep 2002, Bruno Wolff III wrote:
>
> The chances are very very good that in 99% of the cases we'd only ever
> have a single title. multiple titles would be rare. and, to make it worse,
> there are several instances of this where you need a table but its seems
> overkill for the odd 1% time when you actually need teh extra row.
>
> of course, the there'd be a language lookup table.
>
> what about the speed and query issue?

The book or movie or whatever table should have an index on something
(say bookid). Then make an index on the title table on bookid. This
makes getting the titles for a specific book fairly efficient.

I think using a simpler design (i.e. tables in preference to arrays)
will make doing the project easier. This may override any speed up
you get using arrays.

Re: [GENERAL] arrays

From
Tom Lane
Date:
Mike Sosteric <mikes@athabascau.ca> writes:
> could you select where title[0] = 'en'

You certainly could ... but bear in mind that there's no convenient way
to make such a query be indexed, at present.  So any values that you
actually want to use as search keys had better be in their own fields.

Now, if you are just using this as an extra search condition that picks
one row out of a small number that are identified by another WHERE
clause, then it's good enough to index for the other clause, and so the
lack of an index for title[0] isn't an issue.  In this case, with only
a small number of possible values for title[0], it seems that an index
wouldn't be helpful anyway.

            regards, tom lane

Re: [GENERAL] arrays

From
"Josh Berkus"
Date:
Mike,

> We are currently developing a database to host some complicated, XMl
> layered data. We have chosen postgres because of its ability to store
> multidimensional arrays. We feel that using these will allow us to
> simplify the database structure considerably by storing some data in
> multidimensional arrays. 

Hmmm ... I'm curious; what kind of data do you feel could be
*simplified* by multi-dimensional arrays?   

> However, we currently have some dissenters who believe that using the
> multidimensional arrays will make queries slower and unneccesarily
> complicated. 

They're correct, especially about the latter.

> 1) are SQL queries slower when extracting data from multidimensional
> arrays

Yes, but this is fixable; see the Intarray package in /contrib.

> 2) are table joins more difficult or unneccesarily complicated

Yes.

> 3) can you do selects on only a portion of a multidimensional array.

Yes.

> That
> is, if you were storing multilanguage titles in a two dimensional
> array, 
> 
> [en], "english title"
> [fr], "french title"
> 
> could you select where title[0] = 'en'

Yes.

> I know these may sound like terribily stupid questions. but we need
> some
> quick guidance before proceeding with a schema that relies on these
> advanced data features of postgres

The problem you will be facing is that Arrays are one of the
fundamentally *Non-Relational* features that Postgresql supports for a
limited set of specialized purposes (mostly buffer tables, procedures,
and porting from MySQL).   As such, incorporating arrays into any kind
of complex schema will drive you to drink ... and is 95% likely more
easily done through tables and sub-tables, in any case.  

Let's take your example of "title", and say we wanted to use it in a
join:

SELECT movie.name, movie.show_date, movie.title_lang, title.translation
FROM movies JOIN title_langs ON (movie.title_lang[1] = title_langs.lang OR movie.title_lang[2] =
title_langs.lang OR movie.title_lang[3] = title_langs.lang ... )

... as you can see, the join is extremely painful.   Let alone
constructing a query like "Select all movies with titles only in
English and French and one other language."  (try it, really)

Then there's the not insignificant annoyance of getting data into and
out of multi-dimensional arrays,  which must constantly be parsed into
text strings.  And the fact that you will have to keep track, in your
middleware code, of what the ordinal numbers of arrays mean, since
array elements are fundamentally ordered.   (BTW, Postgres arrays begin
at 1, not 0)

Now, I know at least one person who is using arrays to store scientific
data.  However, that data arrives in his lab in the form of matrices,
and is not used for joins or query criteria beyond a simple "where"
clause.

As such, I'd reccommend one of two approaches for you:

1) Post some of your schema ideas here, and let us show you how they
are better done relationally.   The relational data model has 30 years
of thought behind it -- it can solve a lot of problems.

2) Shift over to an XML database or a full-blown OODB (like Cache').

Good luck.

-Josh Berkus





Re: [GENERAL] arrays

From
"Dan Langille"
Date:
On 30 Sep 2002 at 8:54, Josh Berkus wrote:

> As such, I'd reccommend one of two approaches for you:
> 
> 1) Post some of your schema ideas here, and let us show you how they
> are better done relationally.   The relational data model has 30 years
> of thought behind it -- it can solve a lot of problems.

Mike,

Just in case you or others think Josh is some crazed lunatic[1] who 
doesn't know what he's talking about, I support his views on this 
topic.  Avoid arrays.  Normalize your data.

[1] - Actually, I don't think I know anything about Josh, except that 
he's right about normalizing your data.
-- 
Dan Langille
I'm looking for a computer job:
http://www.freebsddiary.org/dan_langille.php



Re: [GENERAL] arrays

From
Bruce Momjian
Date:
Dan Langille wrote:
> On 30 Sep 2002 at 8:54, Josh Berkus wrote:
> 
> > As such, I'd reccommend one of two approaches for you:
> > 
> > 1) Post some of your schema ideas here, and let us show you how they
> > are better done relationally.   The relational data model has 30 years
> > of thought behind it -- it can solve a lot of problems.
> 
> Mike,
> 
> Just in case you or others think Josh is some crazed lunatic[1] who 
> doesn't know what he's talking about, I support his views on this 
> topic.  Avoid arrays.  Normalize your data.
> 
> [1] - Actually, I don't think I know anything about Josh, except that 
> he's right about normalizing your data.

Yes, arrays have a very small window of usefulness, but the window does
exist, so we haven't removed them.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: [GENERAL] arrays

From
"Dan Langille"
Date:
On 30 Sep 2002 at 12:09, Bruce Momjian wrote:

> Dan Langille wrote:
> > On 30 Sep 2002 at 8:54, Josh Berkus wrote:
> > 
> > > As such, I'd reccommend one of two approaches for you:
> > > 
> > > 1) Post some of your schema ideas here, and let us show you how they
> > > are better done relationally.   The relational data model has 30 years
> > > of thought behind it -- it can solve a lot of problems.
> > 
> > Mike,
> > 
> > Just in case you or others think Josh is some crazed lunatic[1] who 
> > doesn't know what he's talking about, I support his views on this 
> > topic.  Avoid arrays.  Normalize your data.
> > 
> > [1] - Actually, I don't think I know anything about Josh, except that 
> > he's right about normalizing your data.
> 
> Yes, arrays have a very small window of usefulness, but the window does
> exist, so we haven't removed them.

I do not advocate removing them.  I do advocate data normalization. 
Let's say it's a matter of Do The Right Thing(tm) unless you know 
what you're doing.
-- 
Dan Langille
I'm looking for a computer job:
http://www.freebsddiary.org/dan_langille.php



Re: [GENERAL] CURRENT_TIMESTAMP

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Now, they are _not_ saying the statement can't have the same time as
> > other statements in the transaction, but I don't see why they would
> > explicitly have to state that.
>
> Allow me to turn that around: given that they clearly do NOT state that,
> how can you argue that "the spec requires it"?  AFAICS the spec does not
> require it.  In most places they are considerably more explicit than
> this about stating what is required.

I just looked at the SQL99 spec again:

         3) Let S be an <SQL procedure statement> that is not generally
            contained in a <triggered action>. All <datetime value
            function>s that are generally contained, without an intervening
            <routine invocation> whose subject routines do not include an
            SQL function, in <value expression>s that are contained either
            in S without an intervening <SQL procedure statement> or in an
            <SQL procedure statement> contained in the <triggered action>
            of a trigger activated as a consequence of executing S, are
            effectively evaluated simultaneously. The time of evaluation of
            a <datetime value function> during the execution of S and its
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^
            activated triggers is implementation-dependent.

Notice the part I highlighted.  The time returned is
implementation-dependent "during the execution of S".  Now, if we do:

    BEGIN;
    SELECT CURRENT_TIMESTAMP;
    SELECT CURRENT_TIMESTAMP;

the time currently returned for the second query is _not_ during the
duration of S (S being an SQL procedure statement) so I don't see how we
can be viewed as spec-compliant.

> > We already have two other databases who are doing this timing at
> > statement level.
>
> The behavior of CURRENT_TIMESTAMP is clearly stated by the spec to be
> implementation-dependent.  We are under no compulsion to follow any
> specific other implementation.  If we were going to follow some other
> lead, I'd look to Oracle first...

Only "implementation-dependent" during the execution of the statement.
We can't just return the session start time or 1970-01-01 for every
invocation of CURRENT_TIMESTAMP.

> > If we change CURRENT_TIMESTAMP to statement time, I don't think we need
> > now(""), but if we don't change it, I think we do --- somehow we should
> > allow users to access statement time.
>
> I have no problem with providing a function to access statement time,
> and now('something') seems a reasonable spelling of that function.
> But I think the argument that we should change our historical behavior
> of CURRENT_TIMESTAMP is very weak.

Hard to see how it is "very weak".   What do you base that on?
Everything I have seen looks pretty strong that we are wrong in our
current implementation.

> One reason why I have a problem with the notion that the spec requires
> CURRENT_TIMESTAMP to mean "time of arrival of the current interactive
> command" (which is the only specific definition I've seen mentioned
> here) is that the spec does not truly have a notion of interactive
> command to begin with.  AFAICT the spec's model of command execution
> is ecpg-like: you have commands embedded in a calling language with
> all sorts of opportunities for pre-planning, pre-execution, etc.
> The notion of command arrival time is extremely fuzzy in this model.
> It could very well be the time you compiled the ecpg application, or
> the time you started the application running.

The spec says "during the execution of S" so that is what I think we
have to follow.

Hopefully we will get an Oracle 9 tester soon.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [GENERAL] arrays

From
"Josh Berkus"
Date:
Dan,

> Just in case you or others think Josh is some crazed lunatic[1] who 
> doesn't know what he's talking about, I support his views on this 
> topic.  Avoid arrays.  Normalize your data.

And just because I'm a crazed lunatic, that doesn't mean that I don't
know what I'm talking about.

Um.  I mean, "Even if I were a crazed lunatic, that wouldn't mean that
I don't know what I'm  talking about."

<grin>

-Josh "Relational Mania" Berkus


Re: [GENERAL] arrays

From
Mike Sosteric
Date:
On Mon, 30 Sep 2002, Josh Berkus wrote:


I have a very good sense of the strengths of relational databases. But
they are also limited when it comes to object orientaed data (like XML
records). I though arrays would be a way to simply the complexity you get
when you try and map objects to relations. 

so a couple more questions then

Is Cache open source?
are the XML databases that are evolved and sophisticated enough to use in
production environments. 

m

> of thought behind it -- it can solve a lot of problems.
> 
> 2) Shift over to an XML database or a full-blown OODB (like Cache').
> 
> Good luck.
> 
> -Josh Berkus
> 
> 
> 
> 

Mike Sosteric <mikes@athabascau.ca>       Managing Editor, EJS <http://www.sociology.org/>
Department of Global and Social Analysis  Executive Director, ICAAP <http://www.icaap.org/>
Athabasca University                      Cell: 1 780 909 1418
Simon Fraser University                   Adjunct Professor                   Masters of Publishing Program 
--
This troubled planet is a place of the most violent contrasts. 
Those that receive the rewards are totally separated from those who
shoulder the burdens.  It is not a wise leadership - Spock, "The Cloud Minders."


___   This communication is intended for the use of the recipient to whom it   is addressed, and may contain
confidential,personal, and or privileged   information. Please contact us immediately if you are not the intended
recipientof this communication, and do not copy, distribute, or take   action relying on it. Any communications
receivedin error, or   subsequent reply, should be deleted or destroyed.
 
---


Re: [GENERAL] arrays

From
Josh Berkus
Date:
Mike,

> I have a very good sense of the strengths of relational databases. But
> they are also limited when it comes to object orientaed data (like XML
> records). I though arrays would be a way to simply the complexity you get
> when you try and map objects to relations.

In my experience, most XML records are, in fact, simple tree structures that
are actually easy to represent in SQL.   But I don't know about yours.

Certainly the translation of XML --> SQL Tree Structure is no more complex
than XML --> Array, that I can see.

> Is Cache open source?

No.   It's a proprietary, and probably very expensive, database.  There are no
open source OODBs that I know of, partly because of the current lack of
international standards for OODBs.

> are the XML databases that are evolved and sophisticated enough to use in
> production environments.

I don't know.   The last time I evaluated XML databases was a year ago, when
there was nothing production-quality in existence.   But I don't know what
the situation is now.

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: [GENERAL] arrays

From
Mike Sosteric
Date:
On Mon, 30 Sep 2002, Josh Berkus wrote:

thanks for this. we will stick with the relational model. 

m

> 
> Mike,
> 
> > I have a very good sense of the strengths of relational databases. But
> > they are also limited when it comes to object orientaed data (like XML
> > records). I though arrays would be a way to simply the complexity you get
> > when you try and map objects to relations. 
> 
> In my experience, most XML records are, in fact, simple tree structures that 
> are actually easy to represent in SQL.   But I don't know about yours.
> 
> Certainly the translation of XML --> SQL Tree Structure is no more complex 
> than XML --> Array, that I can see.
> 
> > Is Cache open source?
> 
> No.   It's a proprietary, and probably very expensive, database.  There are no 
> open source OODBs that I know of, partly because of the current lack of 
> international standards for OODBs.  
> 
> > are the XML databases that are evolved and sophisticated enough to use in
> > production environments. 
> 
> I don't know.   The last time I evaluated XML databases was a year ago, when 
> there was nothing production-quality in existence.   But I don't know what 
> the situation is now.
> 
> -- 
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

Mike Sosteric <mikes@athabascau.ca>       Managing Editor, EJS <http://www.sociology.org/>
Department of Global and Social Analysis  Executive Director, ICAAP <http://www.icaap.org/>
Athabasca University                      Cell: 1 780 909 1418
Simon Fraser University                   Adjunct Professor                   Masters of Publishing Program 
--
This troubled planet is a place of the most violent contrasts. 
Those that receive the rewards are totally separated from those who
shoulder the burdens.  It is not a wise leadership - Spock, "The Cloud Minders."


___   This communication is intended for the use of the recipient to whom it   is addressed, and may contain
confidential,personal, and or privileged   information. Please contact us immediately if you are not the intended
recipientof this communication, and do not copy, distribute, or take   action relying on it. Any communications
receivedin error, or   subsequent reply, should be deleted or destroyed.
 
---


Re: [GENERAL] arrays

From
Josh Berkus
Date:
Mike,

> thanks for this. we will stick with the relational model.

Hey, don't make your decision entirely based on my advice.    Do some
research!  I'm just responding "off the cuff" to your questions.

If you do take the relational approach, post some sample problems here and
people can help you with how to represent XML data relationally.

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: [GENERAL] CURRENT_TIMESTAMP

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Notice the part I highlighted.  The time returned is
> implementation-dependent "during the execution of S".  Now, if we do:

>     BEGIN;
>     SELECT CURRENT_TIMESTAMP;
>     SELECT CURRENT_TIMESTAMP;

> the time currently returned for the second query is _not_ during the
> duration of S (S being an SQL procedure statement)

Not so fast.  What is an "SQL procedure statement"?

Our interactive commands do not map real well to the spec's definitions.
Consider for example SQL92 section 4.17:

         4.17  Procedures

         A <procedure> consists of a <procedure name>, a sequence of <pa-
         rameter declaration>s, and a single <SQL procedure statement>.
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
         A <procedure> in a <module> is invoked by a compilation unit as-
         sociated with the <module> by means of a host language "call"
         statement that specifies the <procedure name> of the <procedure>
         and supplies a sequence of parameter values corresponding in number
         and in <data type> to the <parameter declaration>s of the <proce-
         dure>. A call of a <procedure> causes the <SQL procedure statement>
         that it contains to be executed.

The only thing you can easily map this onto in Postgres is stored
functions; your reading would then say that each Postgres function call
requires its own evaluation of current_timestamp, which I think we are
all agreed would be a disastrous interpretation.

It would be pretty easy to make the case that an ECPG module represents
a "procedure" in the spec's meaning, in which case it is *necessary* for
spec compliance that the ECPG module be able to execute all its commands
with the same value of current_timestamp.  This would look like a series
of interactive commands to the backend.

So I do not think that the spec provides clear support for your position.
The only thing that is really clear is that there is a minimum unit
of execution in which current_timestamp is not supposed to change.
It does not clearly define any maximum unit; and it is even less clear
that our interactive commands should be equated to "SQL procedure
statement".

            regards, tom lane

Re: [GENERAL] arrays

From
Mike Sosteric
Date:
On Mon, 30 Sep 2002, Josh Berkus wrote:

Don't worry. 

Our biggest problem is that each XML data entry, say 

<title=en>This is the title</title>

has an language attribute. if there are, say 67 seperate items, each with
multiple languages, then the comlexity of the table structure skyrockets
because you have to allow for multiple titles, multiple names, multiple
everything. 

the resulting relational model is icky to say the least. The question, is
how to simplify that. I had thought arrays would help because you can
store the multiple language strings in a single table along with other
records..

any ideas?

m

> 
> Mike,
> 
> > thanks for this. we will stick with the relational model. 
> 
> Hey, don't make your decision entirely based on my advice.    Do some 
> research!  I'm just responding "off the cuff" to your questions.
> 
> If you do take the relational approach, post some sample problems here and 
> people can help you with how to represent XML data relationally.
> 
> -- 
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
> 
> 

Mike Sosteric <mikes@athabascau.ca>       Managing Editor, EJS <http://www.sociology.org/>
Department of Global and Social Analysis  Executive Director, ICAAP <http://www.icaap.org/>
Athabasca University                      Cell: 1 780 909 1418
Simon Fraser University                   Adjunct Professor                   Masters of Publishing Program 
--
This troubled planet is a place of the most violent contrasts. 
Those that receive the rewards are totally separated from those who
shoulder the burdens.  It is not a wise leadership - Spock, "The Cloud Minders."


___   This communication is intended for the use of the recipient to whom it   is addressed, and may contain
confidential,personal, and or privileged   information. Please contact us immediately if you are not the intended
recipientof this communication, and do not copy, distribute, or take   action relying on it. Any communications
receivedin error, or   subsequent reply, should be deleted or destroyed.
 
---


Re: [GENERAL] arrays

From
Josh Berkus
Date:
Mike,

> has an language attribute. if there are, say 67 seperate items, each with
> multiple languages, then the comlexity of the table structure skyrockets
> because you have to allow for multiple titles, multiple names, multiple
> everything.

This looks soluable several ways.

Question #1:  If each record has 67 fields, and each field may appear in
several languages, is it possible for some fields to be in more languages
than others?  I.e. if "title-en" and "title-de" exist, does it follow that
"content-en" and "content-de" exist as well?   Or not?

Question #2: Does your XML schema allow locall defined attributes?  That is,
do some records have entire attributes ("fields" ) that other records do not?

Suggestion #1:  Joe Celko's "SQL for Smarties, 2nd Ed." is an excellent book
for giving you ideas on how to adapt SQL structures to odd purposes.

--
-Josh BerkusAglio Database SolutionsSan Francisco



Re: [GENERAL] arrays

From
Mike Sosteric
Date:
On Mon, 30 Sep 2002, Josh Berkus wrote:

> 
> Question #1:  If each record has 67 fields, and each field may appear in 
> several languages, is it possible for some fields to be in more languages 
> than others?  I.e. if "title-en" and "title-de" exist, does it follow that 
> "content-en" and "content-de" exist as well?   Or not?

yes. 

> 
> Question #2: Does your XML schema allow locall defined attributes?  That is, 
> do some records have entire attributes ("fields" ) that other records do not?

yes. 

> 
> Suggestion #1:  Joe Celko's "SQL for Smarties, 2nd Ed." is an excellent book 
> for giving you ideas on how to adapt SQL structures to odd purposes.

I have ordered the book from amazon.ca

m


> 
> -- 
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 

Mike Sosteric <mikes@athabascau.ca>       Managing Editor, EJS <http://www.sociology.org/>
Department of Global and Social Analysis  Executive Director, ICAAP <http://www.icaap.org/>
Athabasca University                      Cell: 1 780 909 1418
Simon Fraser University                   Adjunct Professor                   Masters of Publishing Program 
--
This troubled planet is a place of the most violent contrasts. 
Those that receive the rewards are totally separated from those who
shoulder the burdens.  It is not a wise leadership - Spock, "The Cloud Minders."


___   This communication is intended for the use of the recipient to whom it   is addressed, and may contain
confidential,personal, and or privileged   information. Please contact us immediately if you are not the intended
recipientof this communication, and do not copy, distribute, or take   action relying on it. Any communications
receivedin error, or   subsequent reply, should be deleted or destroyed.
 
---


Re: [GENERAL] CURRENT_TIMESTAMP

From
Jean-Luc Lachance
Date:
How can you make a difference between now('statement'), and
now('immediate').
To me they are the same thing. Why not simply now() for transaction, and
now('CLOCK') or better yet system_clock() or clock() for curent time.

JLL

Josh Berkus wrote:
>
> Tom,
>
> > I'd be happier with the whole thing if anyone had exhibited a convincing
> > use-case for statement timestamp.  So far I've not seen any actual
> > examples of situations that are not better served by either transaction
> > timestamp or true current time.  And the spec is perfectly clear that
> > CURRENT_TIMESTAMP does not mean true current time...
>
> Are we still planning on putting the three different versions of now() on the
> TODO?  I.e.,
> now('transaction'),
> now('statement'), and
> now('immediate')
> With now() = now('transaction')?
>
> I still think it's a good idea, provided that we have some easy means to
> determine now('statement').
>
> --
> -Josh Berkus
>  Aglio Database Solutions
>  San Francisco
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

Re: [GENERAL] CURRENT_TIMESTAMP

From
Jean-Luc Lachance
Date:
OK, forget system_clock() or clock() timeofday() will do.


Jean-Luc Lachance wrote:
>
> How can you make a difference between now('statement'), and
> now('immediate').
> To me they are the same thing. Why not simply now() for transaction, and
> now('CLOCK') or better yet system_clock() or clock() for curent time.
>
> JLL

Re: [GENERAL] CURRENT_TIMESTAMP

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Notice the part I highlighted.  The time returned is
> > implementation-dependent "during the execution of S".  Now, if we do:
>
> >     BEGIN;
> >     SELECT CURRENT_TIMESTAMP;
> >     SELECT CURRENT_TIMESTAMP;
>
> > the time currently returned for the second query is _not_ during the
> > duration of S (S being an SQL procedure statement)
>
> Not so fast.  What is an "SQL procedure statement"?
>
> Our interactive commands do not map real well to the spec's definitions.
> Consider for example SQL92 section 4.17:
>
>          4.17  Procedures
>
>          A <procedure> consists of a <procedure name>, a sequence of <pa-
>          rameter declaration>s, and a single <SQL procedure statement>.
>                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>          A <procedure> in a <module> is invoked by a compilation unit as-
>          sociated with the <module> by means of a host language "call"
>          statement that specifies the <procedure name> of the <procedure>
>          and supplies a sequence of parameter values corresponding in number
>          and in <data type> to the <parameter declaration>s of the <proce-
>          dure>. A call of a <procedure> causes the <SQL procedure statement>
>          that it contains to be executed.
>
> The only thing you can easily map this onto in Postgres is stored
> functions; your reading would then say that each Postgres function call
> requires its own evaluation of current_timestamp, which I think we are
> all agreed would be a disastrous interpretation.
>
> It would be pretty easy to make the case that an ECPG module represents
> a "procedure" in the spec's meaning, in which case it is *necessary* for
> spec compliance that the ECPG module be able to execute all its commands
> with the same value of current_timestamp.  This would look like a series
> of interactive commands to the backend.
>
> So I do not think that the spec provides clear support for your position.
> The only thing that is really clear is that there is a minimum unit
> of execution in which current_timestamp is not supposed to change.
> It does not clearly define any maximum unit; and it is even less clear
> that our interactive commands should be equated to "SQL procedure
> statement".


OK, you don't like "SQL procedure statement".  Let's look at SQL92:

         3) If an SQL-statement generally contains more than one reference
            to one or more <datetime value function>s, then all such ref-
            erences are effectively evaluated simultaneously. The time of
            evaluation of the <datetime value function> during the execution
                                                        ^^^^^^^^^^^^^^^^^^^^
            of the SQL-statement is implementation-dependent.
            ^^^^^^^^^^^^^^^^^^^^

so, again, we have wording that is has to be "during" the SQL statement.

Also, we have MSSQL, Interbase, and now Oracle modifying
CURRENT_TIMESTAMP during the transaction.  (The Oracle report just came
in a few hours ago.)

Perhaps we need a vote on this.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: [GENERAL] CURRENT_TIMESTAMP

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Also, we have MSSQL, Interbase, and now Oracle modifying
> CURRENT_TIMESTAMP during the transaction.  (The Oracle report just came
> in a few hours ago.)

Weren't you dissatisfied with the specificity of that Oracle report?

> Perhaps we need a vote on this.

Perhaps, but let's wait till the facts are in.

            regards, tom lane

Re: [GENERAL] arrays

From
Achilleus Mantzios
Date:
I was wondering why is such a rage against arrays.

I posted 2 very common problems where arrays provide
the only natural (and efficient) fit. (and got no responses)
So it seems to me that:

- Arrays implementation (along with the intarray package) in postgresql
  is well performing and stable.
- Some problems shout out for array usage.
- The Array interface is defined in java.sql package.
  (I dont know if sql arrays is in some standard but it seems that
  Java sees it that way, at least).
- The Array interface is implemented in the official postgresql java
package.
- In some problems replacing arrays according the tradition relational
paradigm would end up in a such a performance degradation, that
some applications would be unusable.
- Oleg and Teodor did a great job in intarray, making array usage
 easy and efficient.

Thanx!


==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel:    +30-10-8981112
fax:    +30-10-8981877
email:  achill@matrix.gatewaynet.com
        mantzios@softlab.ece.ntua.gr


Re: [GENERAL] arrays

From
Roland Roberts
Date:
>>>>> "Josh" == Josh Berkus <josh@agliodbs.com> writes:
   Josh> Now, I know at least one person who is using arrays to store   Josh> scientific data.  However, that data
arrivesin his lab in   Josh> the form of matrices, and is not used for joins or query   Josh> criteria beyond a simple
"where"clause.
 

Indeed, my first attempt to use arrays was to maintain some basic
statistics about a set of data.  The array elements where to be
distribution moments and would only be used in "where" clauses.  The
problem was that I wanted to be about to update the statistics using
triggers whenever the main data was updated.  The inability to access
a specific array element in PL/pgSQL code made this so painful I ended
up just extending a table with more columns.

roland
--            PGP Key ID: 66 BC 3B CD
Roland B. Roberts, PhD                             RL Enterprises
roland@rlenter.com                     76-15 113th Street, Apt 3B
roland@astrofoto.org                       Forest Hills, NY 11375


Re: [GENERAL] arrays

From
"Josh Berkus"
Date:
Achilleus,

> I was wondering why is such a rage against arrays.
>
> I posted 2 very common problems where arrays provide
> the only natural (and efficient) fit. (and got no responses)
> So it seems to me that:

All of your points are correct.

Us "old database hands" have a knee-jerk reaction against arrays for
long-term data storage because, much of the time, developers use arrays
because they are lazy or don't understand the relational model instead
of because they are the best thing to use.   This is particularly true
of people who come to database development from, say, web design.

In this thread particularly, Mike was suggesting using arrays for a
field used in JOINs, which would be a royal mess.   Which was why you
heard so many arguments against using arrays.

Or, to put it another way:

1. Array data types are perfect for storing data that arrives in the
form of arrays or matricies, such as scientific data , or interface
programs that store arrays of object properties.

2. For other purposes, arrays are a very poor substitute for proper
sub-table storage of related data according to the relational model.

3. The distinguishing factor is "atomicity": ask yourself: "is this
array a discrete and undivisible unit, or is is a collection of related
but mutable elements?" If the former, use and array.  If the latter,
use a sub-table.

Clearer now?

-Josh Berkus



Re: [GENERAL] CURRENT_TIMESTAMP

From
Martijn van Oosterhout
Date:
On Sat, Sep 28, 2002 at 11:51:32PM -0400, Bruce Momjian wrote:
> Martijn van Oosterhout wrote:
> > Well, I'd rather it didn't change at all. IMHO it's a feature, not a bug. In
> > any case, if it does get changed we'll have to go through the documentation
> > and work out whether we mean current_timestamp or now(). I think most people
> > actually want now().
>
> Well, I think we have to offer statement start time somewhere, and it
> seems the standard probably requires that.  Two other databases do it
> that way.  Oracle doesn't have CURRENT_TIMESTAMP in 8.X.  Can anyone
> test on 9.X?

Hmm, well having a statement start time could be conceivably useful.

> > Fortunatly where I work we only use now() so it won't really matter too
> > much. Is there a compelling reason to change?
>
> Yes, it will split now() and CURRENT_TIMESTAMP.  I personally would be
> happy with STATEMENT_TIMESTAMP, but because the standard requires it we
> may just have to fix CURRENT_TIMESTAMP.

Well, my vote would be for STATEMENT_TIMESTAMP. Is there really no other
database that does it the way we do? Perhaps it could be matched with a
TRANSACTION_TIMESTAMP and we can sort out CURRENT_TIMESTAMP some other way.

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.

Re: [GENERAL] arrays

From
"scott.marlowe"
Date:
On Sun, 29 Sep 2002, Mike Sosteric wrote:

> On Sun, 29 Sep 2002, Bruce Momjian wrote:
>
> Apologies in advance if there is a more appropriate list.
>
> We are currently developing a database to host some complicated, XMl
> layered data. We have chosen postgres because of its ability to store
> multidimensional arrays. We feel that using these will allow us to
> simplify the database structure considerably by storing some data in
> multidimensional arrays.

the long and the short of it is that arrays are useful to store data, but
should not be used where you need to look up the data in them in a where
clause.