Thread: Inconsistent results postgresql

Inconsistent results postgresql

From
Emir Ibrahimbegovic
Date:

Hello all,

I've got two queries which should produce the same results but they don't for some reason, please consider these :

SELECT date_trunc('day', payments.created_at) "day", SUM("payments"."amount") AS sum_id 
FROM "payments"  INNER JOIN "users" ON "users"."id" = "payments"."user_id" 
WHERE "payments"."currency" = 'gbp' AND (payments.refunded_date is null)  AND 
(users.deleted_at is null)  AND 
(users.subscribed_at between '2014-07-07 00:00:00.000000' and '2014-07-07 23:59:59.999999')  AND 
("payments"."created_at" BETWEEN '2014-07-07 00:00:00.000000' AND '2014-07-07 23:59:59.999999')
GROUP BY 1
ORDER by 1 asc

It produces this :

day                   | sum_id
------------------------------
"2014-07-07 00:00:00" | 1863.85

But when I try to group by day at looking at 30 days period with this query :

SELECT date_trunc('day', payments.created_at) "day", SUM("payments"."amount") AS sum_id 
FROM "payments"  INNER JOIN "users" ON "users"."id" = "payments"."user_id" 
WHERE "payments"."currency" = 'gbp' AND (payments.refunded_date is null)  AND 
(users.deleted_at is null)  AND 
(users.subscribed_at between '2014-06-30 00:00:00.000000' and '2014-07-30 23:59:59.999999')  AND 
("payments"."created_at" BETWEEN '2014-06-30 00:00:00.000000' AND '2014-07-30 23:59:59.999999')
GROUP BY 1
ORDER by 1 asc

It produces this (truncated to include the important data only) :

day                   | sum_id
------------------------------
"2014-07-07 00:00:00" | 1898.84

So looking for same date using different date range I get different results, how is this even possible? Can I look at something else? I'm really stuck here

Thanks

Re: Inconsistent results postgresql

From
David G Johnston
Date:
Emir Ibrahimbegovic wrote
> Hello all,
>
> I've got two queries which should produce the same results but they don't
> for some reason, please consider these :
>
> SELECT date_trunc('day', payments.created_at) "day",
> SUM("payments"."amount") AS sum_id FROM "payments"
>   INNER JOIN "users" ON "users"."id" = "payments"."user_id" WHERE
> "payments"."currency" = 'gbp' AND (payments.refunded_date is null)
>   AND (users.deleted_at is null)
>   AND (users.subscribed_at between '2014-07-07 00:00:00.000000' and
> '2014-07-07 23:59:59.999999')
>   AND ("payments"."created_at" BETWEEN '2014-07-07 00:00:00.000000'
> AND '2014-07-07 23:59:59.999999')GROUP BY 1ORDER by 1 asc
>
> It produces this :
>
> day                   |
> sum_id------------------------------"2014-07-07 00:00:00" | 1863.85
>
> But when I try to group by day at looking at 30 days period with this
> query
> :
>
> SELECT date_trunc('day', payments.created_at) "day",
> SUM("payments"."amount") AS sum_id FROM "payments"
>   INNER JOIN "users" ON "users"."id" = "payments"."user_id" WHERE
> "payments"."currency" = 'gbp' AND (payments.refunded_date is null)
>   AND (users.deleted_at is null)
>   AND (users.subscribed_at between '2014-06-30 00:00:00.000000' and
> '2014-07-30 23:59:59.999999')
>   AND ("payments"."created_at" BETWEEN '2014-06-30 00:00:00.000000'
> AND '2014-07-30 23:59:59.999999')GROUP BY 1ORDER by 1 asc
>
> It produces this (truncated to include the important data only) :
>
> day                   |
> sum_id------------------------------"2014-07-07 00:00:00" | 1898.84
>
> So looking for same date using different date range I get different
> results, how is this even possible? Can I look at something else? I'm
> really stuck here
>
> Thanks

There may be a timezone issue here - hard to tell without seeing the data
types for subscribed_at and created_at.


The first thing I would do is move the WHERE clause portion of the query
into a CTE:

WITH detail AS ( SELECT )

Then I would run the GROUP BY portion over the CTE

SELECT * FROM (
SELECT date_trunc(...), sum(...) FROM detail WHERE date_trunc(...) GROUP BY
1
) src WHERE date_trunc = '2014-07-07'::date

Assuming you are still getting different sums you can now readily list the
corresponding detail (i.e., remove the group by) and see which records show
in the one set but not the other.  You can add additional CTE queries to
facilitate this if desired - i.e. put each where clause into its own detail
query and, if you have some kind of ID to compare against, perform a FULL
OUTER JOIN between the two CTE tables and exclude any items that appear in
both.

Ignoring indexes for the moment it would be much safer and clearer to write
the WHERE clause using "BETWEEN date AND date" instead of using timestamps
with subsecond precision.

If you do insist on using timestamps I would then suggest at least
explicitly casting them.

'2014-07-07 00:00:00.000000'::timestamp

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Inconsistent-results-postgresql-tp5813375p5813377.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Inconsistent results postgresql

From
Chris Curvey
Date:



On Wed, Jul 30, 2014 at 8:41 PM, Emir Ibrahimbegovic <emir.ibrahimbegovic@gmail.com> wrote:

Hello all,

I've got two queries which should produce the same results but they don't for some reason, please consider these :

SELECT date_trunc('day', payments.created_at) "day", SUM("payments"."amount") AS sum_id 
FROM "payments"  INNER JOIN "users" ON "users"."id" = "payments"."user_id" 
WHERE "payments"."currency" = 'gbp' AND (payments.refunded_date is null)  AND 
(users.deleted_at is null)  AND 
(users.subscribed_at between '2014-07-07 00:00:00.000000' and '2014-07-07 23:59:59.999999')  AND 
("payments"."created_at" BETWEEN '2014-07-07 00:00:00.000000' AND '2014-07-07 23:59:59.999999')
GROUP BY 1
ORDER by 1 asc

It produces this :

day                   | sum_id
------------------------------
"2014-07-07 00:00:00" | 1863.85

But when I try to group by day at looking at 30 days period with this query :

SELECT date_trunc('day', payments.created_at) "day", SUM("payments"."amount") AS sum_id 
FROM "payments"  INNER JOIN "users" ON "users"."id" = "payments"."user_id" 
WHERE "payments"."currency" = 'gbp' AND (payments.refunded_date is null)  AND 
(users.deleted_at is null)  AND 
(users.subscribed_at between '2014-06-30 00:00:00.000000' and '2014-07-30 23:59:59.999999')  AND 
("payments"."created_at" BETWEEN '2014-06-30 00:00:00.000000' AND '2014-07-30 23:59:59.999999')
GROUP BY 1
ORDER by 1 asc

It produces this (truncated to include the important data only) :

day                   | sum_id
------------------------------
"2014-07-07 00:00:00" | 1898.84

So looking for same date using different date range I get different results, how is this even possible? Can I look at something else? I'm really stuck here

Thanks

Remove the sum (just select "payments.amount") and the GROUP BY and run your queries.  You'll see that you're getting different rows included than you think you are.


--
I asked the Internet how to train my cat, and the Internet told me to get a dog.

Re: Inconsistent results postgresql

From
Emir Ibrahimbegovic
Date:
Thank you for your response guys.


On Wed, Jul 30, 2014 at 9:25 PM, Chris Curvey <chris@chriscurvey.com> wrote:



On Wed, Jul 30, 2014 at 8:41 PM, Emir Ibrahimbegovic <emir.ibrahimbegovic@gmail.com> wrote:

Hello all,

I've got two queries which should produce the same results but they don't for some reason, please consider these :

SELECT date_trunc('day', payments.created_at) "day", SUM("payments"."amount") AS sum_id 
FROM "payments"  INNER JOIN "users" ON "users"."id" = "payments"."user_id" 
WHERE "payments"."currency" = 'gbp' AND (payments.refunded_date is null)  AND 
(users.deleted_at is null)  AND 
(users.subscribed_at between '2014-07-07 00:00:00.000000' and '2014-07-07 23:59:59.999999')  AND 
("payments"."created_at" BETWEEN '2014-07-07 00:00:00.000000' AND '2014-07-07 23:59:59.999999')
GROUP BY 1
ORDER by 1 asc

It produces this :

day                   | sum_id
------------------------------
"2014-07-07 00:00:00" | 1863.85

But when I try to group by day at looking at 30 days period with this query :

SELECT date_trunc('day', payments.created_at) "day", SUM("payments"."amount") AS sum_id 
FROM "payments"  INNER JOIN "users" ON "users"."id" = "payments"."user_id" 
WHERE "payments"."currency" = 'gbp' AND (payments.refunded_date is null)  AND 
(users.deleted_at is null)  AND 
(users.subscribed_at between '2014-06-30 00:00:00.000000' and '2014-07-30 23:59:59.999999')  AND 
("payments"."created_at" BETWEEN '2014-06-30 00:00:00.000000' AND '2014-07-30 23:59:59.999999')
GROUP BY 1
ORDER by 1 asc

It produces this (truncated to include the important data only) :

day                   | sum_id
------------------------------
"2014-07-07 00:00:00" | 1898.84

So looking for same date using different date range I get different results, how is this even possible? Can I look at something else? I'm really stuck here

Thanks

Remove the sum (just select "payments.amount") and the GROUP BY and run your queries.  You'll see that you're getting different rows included than you think you are.


--
I asked the Internet how to train my cat, and the Internet told me to get a dog.

Re: Inconsistent results postgresql

From
Adrian Klaver
Date:
On 07/31/2014 05:44 AM, Emir Ibrahimbegovic wrote:
> Thank you for your response guys.

So, did you find the cause?

--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Inconsistent results postgresql

From
Emir Ibrahimbegovic
Date:
Yes there was some users subscribed at different date than payment was made. So I used this to get the results on the daily basis :

SELECT date_trunc('day', payments.created_at) "day", SUM("payments"."amount") AS sum_id 
FROM "payments" 
  INNER JOIN "users" ON "users"."id" = "payments"."user_id" 
WHERE "payments"."currency" = 'gbp' AND (payments.refunded_date is null) 
  AND 
(users.deleted_at is null) 
  AND 
(users.subscribed_at between '2014-06-30 00:00:00.000000' and '2014-07-30 23:59:59.999999') 
  AND 
("payments"."created_at" BETWEEN '2014-07-07 00:00:00.000000' AND '2014-07-07 23:59:59.999999')
GROUP BY 1
ORDER by 1 asc

So use the payment.created_at to for the current date and the range for the users.subscribed_at

On Thu, Jul 31, 2014 at 9:00 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 07/31/2014 05:44 AM, Emir Ibrahimbegovic wrote:
Thank you for your response guys.

So, did you find the cause?

--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Inconsistent results postgresql

From
Kevin Grittner
Date:
Chris Curvey <chris@chriscurvey.com> wrote:
> Emir Ibrahimbegovic <emir.ibrahimbegovic@gmail.com> wrote:

>> So looking for same date using different date range I get
>> different results, how is this even possible? Can I look at
>> something else? I'm really stuck here

> Remove the sum (just select "payments.amount") and the GROUP BY
> and run your queries.  You'll see that you're getting different
> rows included than you think you are.

Well, either that or he's getting them in a different order and the
data type being summed is a floating point approximate type.  With
the approximate types the order that addition happens in can affect
the sum.  For the most accurate result you want to add from the
numbers closest to zero to the ones farthest away from zero.

If exact results are desired, the best solution is to not use an
approximate data type.  Try converting the column to numeric, which
is not as vulnerable to rounding errors.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company