Thread: BUG #2477: Aggregate Integer divisors incorrectly yield integer-type quotient

BUG #2477: Aggregate Integer divisors incorrectly yield integer-type quotient

From
"John Rylander"
Date:
The following bug has been logged online:

Bug reference:      2477
Logged by:          John Rylander
Email address:      rylander@prolexia.com
PostgreSQL version: 8.1.4
Operating system:   Windows XP :(
Description:        Aggregate Integer divisors incorrectly yield
integer-type quotient
Details:

When for some reason I did a sum/count instead of an avg, the result was
always an integer (either 0 or 1):

Sum(
    Case when Sign(Delta)=Sign(AvgDelta) then
       1
    else
       0
   ) / Count(*) as HitRate;

If I use "/ Count(*)::float4", or use Avg instead of Sum/Count, it works
fine.

Yet another maddening PostgreSQL bug/idiosyncrasy.  Time for this boy to go
to Oracle (not because of this, but because of the function->transaction
problem, such that a function that has nested loops that create myriad
temporary analytical tables necessarily runs out of disk space; if it
weren't for the mandatory and completely counterproductive implicit
transaction, it'd take very little space and a lot less time).  :(
"John Rylander" <rylander@prolexia.com> writes:
> When for some reason I did a sum/count instead of an avg, the result was
> always an integer (either 0 or 1):

What's your point?  Postgres has always defined division of integers as
yielding an integer result.  AFAICS this choice is fully sanctioned by
the SQL standard:

         1) If the data type of both operands of a dyadic arithmetic opera-
            tor is exact numeric, then the data type of the result is exact
            numeric, with precision and scale determined as follows:
            ...
            d) The precision and scale of the result of division is
              implementation-defined.

(SQL92 6.12 <numeric value expression> syntax rule 1)

> Yet another maddening PostgreSQL bug/idiosyncrasy.  Time for this boy to go
> to Oracle

Sayonara.  Last I checked, they had a few idiosyncrasies too.

            regards, tom lane
On Sun, Jun 11, 2006 at 09:08:00PM +0000, John Rylander wrote:
>
> The following bug has been logged online:
>
> Bug reference:      2477
> Logged by:          John Rylander
> Email address:      rylander@prolexia.com
> PostgreSQL version: 8.1.4
> Operating system:   Windows XP :(
> Description:        Aggregate Integer divisors incorrectly yield
> integer-type quotient
> Details:
>
> When for some reason I did a sum/count instead of an avg, the result was
> always an integer (either 0 or 1):
>
> Sum(
>     Case when Sign(Delta)=Sign(AvgDelta) then
>        1
>     else
>        0
>    ) / Count(*) as HitRate;
>
> If I use "/ Count(*)::float4", or use Avg instead of Sum/Count, it works
> fine.

Do you have a test case for this?

> Yet another maddening PostgreSQL bug/idiosyncrasy.  Time for this boy to go
> to Oracle (not because of this, but because of the function->transaction
> problem, such that a function that has nested loops that create myriad
> temporary analytical tables necessarily runs out of disk space; if it
> weren't for the mandatory and completely counterproductive implicit
> transaction, it'd take very little space and a lot less time).  :(

Huh? What implicit transaction? Do you have an example that illustrates
this problem?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: BUG #2477: Aggregate Integer divisors incorrectly yield integer-type quotient

From
"John E. Rylander"
Date:
Jim,

First, I got a minor rebuke ;) from Tom Lane on this matter, he rightly
pointing out to me that this integer division "bug" is in fact a part of the
SQL standard.  For the simplest illustration of it, try this:

Select 2/3;

It comes back with zero!  To me, this is unbelievable; but it's not a bug,
and it's not PostgreSQL's fault, as it were.  As Tom put it:

>What's your point?  Postgres has always defined division of integers as
>yielding an integer result.  AFAICS this choice is fully sanctioned by
>the SQL standard:
>
>         1) If the data type of both operands of a dyadic arithmetic opera-
>            tor is exact numeric, then the data type of the result is exact
>            numeric, with precision and scale determined as follows:
>            ...
>            d) The precision and scale of the result of division is
>              implementation-defined.

I suppose point d) leaves PostgreSQL developers some freedom here, but it
also, particularly when combined with 1), gives an apparent okay to what is
done now.  (After all, the result is "exact", even if it's exactly wrong! :)
)

It's interesting that this isn't wholly consistent: the AVG aggregate, e.g.,
doesn't follow this rule in the same way, even though it's strictly composed
of such dyadic operations-and it's a good thing, too.

I reported this as a bug because I never considered the possibility that
this could be by design.  Truncating or rounding to an integer manually, or
when assigning a floating point value to an integer variable or field, sure;
but doing it that way even when assigning the result to a floating point
field?  I have no idea why this is considered desirable, and have to wonder
if that's the best way of interpreting the standard, but I'm an SQL and
(hence) PostgreSQL newbie, so....


Second, and this is the real showstopper for me, it's my understanding that
PostgreSQL (unlike, I've read, most other SQL RDBMS) implicitly and
necessarily wraps every user-defined function call in a transactional
wrapper.  Typically, this seems like a sound approach-no worries then when a
function fails.  But when it's mandatory, one runs into problems with highly
query-intensive and analytical functions.  I have a PL/PGSQL analytical
function that iteratively drops a table, queries to create a new table of
the same name based on iterated parameters, and then runs numerous queries
on the resulting table, storing a few summary records to a log table.  The
nested loops in the function intelligently iterate tens or hundreds of
thousands of times on tables containing tens or low-hundreds of thousands of
records.
    The problem is, I think, since this function is involuntarily
wrapped in a transaction, PostgreSQL wastes a TON of time and space keeping
track of every single interim table, and eventually (pretty soon, in fact)
it runs out of disk space.  (I have only 20 GB free, but the same thing
might happen if I had 200 GB free.)
    This involuntary transaction wrapping has another serious downside:
it makes debugging functions harder, since one can't do any table-based
"bomb damage assessment" when a function blows up-the target site is all
cleaned up (i.e., rolled back) before one can take a look.
    (I've read that Oracle does NOT do this; OTOH, since Oracle does (I
believe) wrap each SQL statement in a transactional wrapper, then it seems a
bit ambiguous as to how that applies to an SQL statement that calls a stored
procedure.  I sure HOPE that it doesn't wrap the call in a transaction,
however appealing that might be at first glance!)

If I'm misunderstanding things, I'd greatly appreciate correction on these
matters!  As it is now, I may need painfully to switch to Oracle :( just to
get my analytical functions to work properly on anything but very small
scale tables-otherwise I just run of out disk space a small part of the way
through.

Thanks for any guidance you can offer on this-I'd love to stick with
PostgreSQL!

Best regards,

John Rylander


-----Original Message-----
From: Jim C. Nasby [mailto:jnasby@pervasive.com]
Sent: Monday, June 12, 2006 10:48 AM
To: John Rylander
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #2477: Aggregate Integer divisors incorrectly yield
integer-type quotient

On Sun, Jun 11, 2006 at 09:08:00PM +0000, John Rylander wrote:
>
> The following bug has been logged online:
>
> Bug reference:      2477
> Logged by:          John Rylander
> Email address:      rylander@prolexia.com
> PostgreSQL version: 8.1.4
> Operating system:   Windows XP :(
> Description:        Aggregate Integer divisors incorrectly yield
> integer-type quotient
> Details:
>
> When for some reason I did a sum/count instead of an avg, the result was
> always an integer (either 0 or 1):
>
> Sum(
>     Case when Sign(Delta)=Sign(AvgDelta) then
>        1
>     else
>        0
>    ) / Count(*) as HitRate;
>
> If I use "/ Count(*)::float4", or use Avg instead of Sum/Count, it works
> fine.

Do you have a test case for this?

> Yet another maddening PostgreSQL bug/idiosyncrasy.  Time for this boy to
go
> to Oracle (not because of this, but because of the function->transaction
> problem, such that a function that has nested loops that create myriad
> temporary analytical tables necessarily runs out of disk space; if it
> weren't for the mandatory and completely counterproductive implicit
> transaction, it'd take very little space and a lot less time).  :(

Huh? What implicit transaction? Do you have an example that illustrates
this problem?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
"John E. Rylander" <rylander@prolexia.com> writes:
> ... I have a PL/PGSQL analytical
> function that iteratively drops a table, queries to create a new table of
> the same name based on iterated parameters, and then runs numerous queries
> on the resulting table, storing a few summary records to a log table.  The
> nested loops in the function intelligently iterate tens or hundreds of
> thousands of times on tables containing tens or low-hundreds of thousands of
> records.
>     The problem is, I think, since this function is involuntarily
> wrapped in a transaction, PostgreSQL wastes a TON of time and space keeping
> track of every single interim table, and eventually (pretty soon, in fact)
> it runs out of disk space.

It might be possible to teach the system that it could immediately
unlink the table file when dropping a table that was created in the
current transaction.  See the NOTE in smgrscheduleunlink() for a hint
about one way to do it.  I think you'd have to worry about matching
subtransaction nest levels, but if they're the same then allowing
inverse unlink requests to "blow each other up" would work.

            regards, tom lane