Thread: fixes for date_part micro/millisecond precision

fixes for date_part micro/millisecond precision

From
Brent Verner
Date:
Hi all,

  This patch fixes a problem where extract/date_part returns bad values
for microsecond and millisecond parts.  Patch is attached.  I'm not
sure if the solution is correct, but it cures this symptom.

before:
========
brent=# select date_part('milliseconds','2001-11-24 13:49:12.826833-05'::timestamp);
    date_part
------------------
 826.833000000001
(1 row)

brent=# select date_part('microseconds','2001-11-24 13:49:12.826833-05'::timestamp);
    date_part
------------------
 826833.000000001
(1 row)


after:
========
brent=# select date_part('milliseconds','2001-11-24 13:49:12.826833-05'::timestamp);
 date_part
-----------
   826.833
(1 row)

brent=# select date_part('microseconds','2001-11-24 13:49:12.826833-05'::timestamp);
 date_part
-----------
    826833
(1 row)

cheers.
  brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman

Attachment

Re: fixes for date_part micro/millisecond precision

From
Tom Lane
Date:
Brent Verner <brent@rcfile.org> writes:
>   This patch fixes a problem where extract/date_part returns bad values
> for microsecond and millisecond parts.

I'm not convinced the existing behavior is wrong.  Timestamps are floats
in PG, and therefore not infinitely precise.  Your patch seems to be
trying to cover this up, but I fear it will only succeed in introducing
other problems.

            regards, tom lane

Re: fixes for date_part micro/millisecond precision

From
Brent Verner
Date:
On 24 Nov 2001 at 14:24 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| >   This patch fixes a problem where extract/date_part returns bad values
| > for microsecond and millisecond parts.
|
| I'm not convinced the existing behavior is wrong.  Timestamps are floats
| in PG, and therefore not infinitely precise.  Your patch seems to be
| trying to cover this up, but I fear it will only succeed in introducing
| other problems.

<pedantically>
The current bahavior seems to be precise, just not accurate.
</pedantically>

  I agree much that my fix is covering over the problem, but something
needs to change (IMO), because if I insert a timestamp of
'2001-1-1 11:11:11.12341234-05' I /really/ want to get back '12341234'
when  I ask for the microseconds that I stored, so I can't see how the
current behavior isn't asking for problems in applications that use
these features.  After many hours of looking at how to solve this, the
only way I could find to ensure "what I put in is what I get out," was
the change in timestamptz_part().

  Anyone have any idea of how (better) to fix this, short of
changing the internal representation of Timestamp to a struct?
Is there a better place to 'force' the accuracy of the fractional
second part that would be less prone to introduce other problems?

thanks.
  brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman

Re: fixes for date_part micro/millisecond precision

From
Tom Lane
Date:
Brent Verner <brent@rcfile.org> writes:
> ... if I insert a timestamp of
> '2001-1-1 11:11:11.12341234-05' I /really/ want to get back '12341234'
> when  I ask for the microseconds that I stored,

You cannot expect to get that back exactly, because *the precision is
not there* (and yes, the term here is precision not accuracy).  Right
now (late 2001) we have about seven digits of precision to the right
of the decimal point in a timestamp.  As we get further away from the
2000-01-01 origin, precision will drop; it'll be six digits or less
by 2010.  The further you go from 2000 in either direction, the worse
the precision.

This is an inherent property of float arithmetic and can't be papered
over with display hacks, at least not without losing more than you gain.

> Is there a better place to 'force' the accuracy of the fractional
> second part that would be less prone to introduce other problems?

You cannot force accuracy that isn't there.

            regards, tom lane

Re: fixes for date_part micro/millisecond precision

From
Brent Verner
Date:
On 24 Nov 2001 at 15:32 (-0500), Tom Lane wrote:
| Brent Verner <brent@rcfile.org> writes:
| > ... if I insert a timestamp of
| > '2001-1-1 11:11:11.12341234-05' I /really/ want to get back '12341234'
| > when  I ask for the microseconds that I stored,
|
| You cannot expect to get that back exactly, because *the precision is
| not there* (and yes, the term here is precision not accuracy).  Right
| now (late 2001) we have about seven digits of precision to the right
| of the decimal point in a timestamp.  As we get further away from the
| 2000-01-01 origin, precision will drop; it'll be six digits or less
| by 2010.  The further you go from 2000 in either direction, the worse
| the precision.

ah, I see. I wasn't thinking about the limited range that could be
accurately stored in the double... :-(

| This is an inherent property of float arithmetic and can't be papered
| over with display hacks, at least not without losing more than you gain.
|
| > Is there a better place to 'force' the accuracy of the fractional
| > second part that would be less prone to introduce other problems?
|
| You cannot force accuracy that isn't there.

thanks, and sorry for the hassle.

wandering-off-to-try-making-Timestamp-a-struct-ly yours,
  brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman

Re: fixes for date_part micro/millisecond precision

From
Karel Zak
Date:
On Sat, Nov 24, 2001 at 02:24:09PM -0500, Tom Lane wrote:
> Brent Verner <brent@rcfile.org> writes:
> >   This patch fixes a problem where extract/date_part returns bad values
> > for microsecond and millisecond parts.
>
> I'm not convinced the existing behavior is wrong.  Timestamps are floats
> in PG, and therefore not infinitely precise.  Your patch seems to be
> trying to cover this up, but I fear it will only succeed in introducing
> other problems.

 Tom is right. If you need non-float microsecond/millisecond you can
 use to_char() -- it's news in 7.2:

test=# select to_char('2001-11-24 13:49:12.826833-05'::timestamp, 'MS');
 to_char
---------
   827

test=# select to_char('2001-11-24 13:49:12.826833-05'::timestamp, 'US');
 to_char
---------
 826833

        Karel

--
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

Re: fixes for date_part micro/millisecond precision

From
Brent Verner
Date:
On 26 Nov 2001 at 13:38 (+0100), Karel Zak wrote:
| On Sat, Nov 24, 2001 at 02:24:09PM -0500, Tom Lane wrote:
| > Brent Verner <brent@rcfile.org> writes:
| > >   This patch fixes a problem where extract/date_part returns bad values
| > > for microsecond and millisecond parts.
| >
| > I'm not convinced the existing behavior is wrong.  Timestamps are floats
| > in PG, and therefore not infinitely precise.  Your patch seems to be
| > trying to cover this up, but I fear it will only succeed in introducing
| > other problems.
|
|  Tom is right. If you need non-float microsecond/millisecond you can
|  use to_char() -- it's news in 7.2:
|
| test=# select to_char('2001-11-24 13:49:12.826833-05'::timestamp, 'MS');
|  to_char
| ---------
|    827

Also, casting to float4 gives the 'expected' results.  The behavior
still seems wrong, since it gives me back something that I did not
put in.  Additionally, I can get back a value that I /can't/ put back
in the database.  I do agree with Tom that my solution was simply
a cover-up.  The SQL-99 spec says nothing of MILLI or MICROSECONDS,
so I'm curious if there is interest in really fixing the date_part
behavior WRT fractional seconds.  Comments Thomas?

The only way I can see to fix it is to make Timestamp a struct
something like...

typedef struct _pgTimestamp {
  unsigned long     ts_epoch;     /* seconds */
  unsigned int      ts_frac;      /* microseconds */
  signed   char     ts_tzo;       /* GMT offset in hours */
  unsigned char     ts_precision; /* (???) precision of fractional part */
} Timestamp;

  I'll probably try to implement this new Timestamp as a learning
exercise.  If my hacking goes well and there is a desire to fix
the current Timestamp limitations, I'll submit a patch.

cheers.
  brent

--
"Develop your talent, man, and leave the world something. Records are
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman

Re: fixes for date_part micro/millisecond precision

From
Thomas Lockhart
Date:
> ... The behavior
> still seems wrong, since it gives me back something that I did not
> put in.  Additionally, I can get back a value that I /can't/ put back
> in the database.  I do agree with Tom that my solution was simply
> a cover-up.  The SQL-99 spec says nothing of MILLI or MICROSECONDS,
> so I'm curious if there is interest in really fixing the date_part
> behavior WRT fractional seconds.  Comments Thomas?

You are exploring the downside of using float8 for timestamp storage.
This has not been a major issue until now (and may not be for most
users) but since we now allow more than two decimal places in the output
format it becomes more obvious.

> The only way I can see to fix it is to make Timestamp a struct
> something like...

Yuck. That *may* be how other databases do it, but having to carry along
multiple fields complicates the math and comparison functions, and slows
down the code.

I have been thinking of implementing timestamp (and other related types)
as 8 byte integers, which are not slow on some platforms. doubles are
fast on most platforms nowadays. The tradeoff will be repeatability vs
range, since we will not get the quasi-infinite range we have now when
using a fixed decimal type. We could have a third option, which uses 4
byte integers in structures, which you proposed. But that seems the
least desirable for the long run.

If I do this it will be as a configurable option, at least at first.

                       - Thomas

Re: fixes for date_part micro/millisecond precision

From
Tom Lane
Date:
Thomas Lockhart <lockhart@fourpalms.org> writes:
> I have been thinking of implementing timestamp (and other related types)
> as 8 byte integers, which are not slow on some platforms. doubles are
> fast on most platforms nowadays.

OTOH, 8 byte integers fail to exist at all on some other platforms.

I'd be in favor of reimplementing timestamps as int8 if it weren't for
the portability issue.  But I'm not sure I want to blow off int8-less
platforms quite yet.

> The tradeoff will be repeatability vs
> range, since we will not get the quasi-infinite range we have now when
> using a fixed decimal type.

If we didn't mind increasing the storage requirement, we could do
"float8 for the integral seconds, plus an int4 for the nanoseconds",
which would work perfectly out to about 2^52 seconds either way from the
epoch, and continue to work with reduced accuracy until the heat death
of the universe.  But this would be 12 bytes, plus 4 bytes alignment
padding on some platforms, which might be excessive just to guarantee
nanosecond precision out to the next geological era.

            regards, tom lane

Re: fixes for date_part micro/millisecond precision

From
Tom Lane
Date:
Thomas Lockhart <lockhart@fourpalms.org> writes:
> Yes. Mixing the float with the int seems to invite math troubles though

True, any multi-member struct will be a huge PITA for arithmetic
operations.

Using int8 if available or float8 if not might work pretty nicely
now that I think about it, at least from a coding point of view.
You could hide a lot of the differences in the PG_GET and PG_RETURN
macros.

But do we want to have such a radical difference in accuracy and
range across platforms?

            regards, tom lane

Re: fixes for date_part micro/millisecond precision

From
Thomas Lockhart
Date:
> > I have been thinking of implementing timestamp (and other related types)
> > as 8 byte integers, which are not slow on some platforms. doubles are
> > fast on most platforms nowadays.
> OTOH, 8 byte integers fail to exist at all on some other platforms.

Sure. It would be selectable at configure/build time. Failing to support
int64 would mean you don't get the integer alternative for timestamp
either.

> > The tradeoff will be repeatability vs
> > range, since we will not get the quasi-infinite range we have now when
> > using a fixed decimal type.
> If we didn't mind increasing the storage requirement, we could do
> "float8 for the integral seconds, plus an int4 for the nanoseconds",
> which would work perfectly out to about 2^52 seconds either way from the
> epoch, and continue to work with reduced accuracy until the heat death
> of the universe.  But this would be 12 bytes, plus 4 bytes alignment
> padding on some platforms, which might be excessive just to guarantee
> nanosecond precision out to the next geological era.

Yes. Mixing the float with the int seems to invite math troubles though
(as does the struct solution of using two ints, since roundoff and
carryover might be troublesome).

Just a thought for now, but the int8 alternative may be pretty easy to
do.

                        - Thomas