Thread: [GENERAL] Problems with Greatest

[GENERAL] Problems with Greatest

From
Teddy Schmitz
Date:

Hello,


I have a query using Greatest that hangs and never returns when called with two tables.


Postgres Version: 9.6


Tables

t1{ id bigint }


t2 { id bigint }


they are sharing a sequence


the query

select greatest(max(t1.id), max(t2.id)) from t1, t2;

The purpose was to call setval on the sequence after doing a bulk data load into the database. But this query never returns. I have tried it with various combinations,

select
greatest(max(t1.id), 6) from t1; -> This returns

select
greatest(max(t1.id), 6) from t1, t2; -> This never returns.

The query does work if there is only a few hundred items between the tables but I'm importing about ~300,000 rows between the two tables. I looked at pg_stat_activity and it says the query is active

I worked around this problem using a union all query but I'm wondering if this is a bug or I am just using greatest wrong.

Re: [GENERAL] Problems with Greatest

From
Teddy Schmitz
Date:

As a quick follow up I just did an explain on the query,


Aggregate  (cost=258007258.87..258007258.88 rows=1 width=8)
->  Nested Loop  (cost=0.00..184292254.83 rows=14743000807 width=16)
->  Seq Scan on t1  (cost=0.00..3796.41 rows=263141 width=8)
->  Materialize  (cost=0.00..1088.40 rows=56027 width=8)
->  Seq Scan on t2  (cost=0.00..808.27 rows=56027 width=8)


It seems it has to do a loop on 14 billion rows?  Can someone explain why this would happen?


Thanks,


Teddy


From: Teddy Schmitz
Sent: Thursday, February 16, 2017 12:32:41 PM
To: pgsql-general@postgresql.org
Subject: Problems with Greatest
 

Hello,


I have a query using Greatest that hangs and never returns when called with two tables.


Postgres Version: 9.6


Tables

t1{ id bigint }


t2 { id bigint }


they are sharing a sequence


the query

select greatest(max(t1.id), max(t2.id)) from t1, t2;

The purpose was to call setval on the sequence after doing a bulk data load into the database. But this query never returns. I have tried it with various combinations,

select
greatest(max(t1.id), 6) from t1; -> This returns

select
greatest(max(t1.id), 6) from t1, t2; -> This never returns.

The query does work if there is only a few hundred items between the tables but I'm importing about ~300,000 rows between the two tables. I looked at pg_stat_activity and it says the query is active

I worked around this problem using a union all query but I'm wondering if this is a bug or I am just using greatest wrong.

Re: [GENERAL] Problems with Greatest

From
Pavel Stehule
Date:


2017-02-16 5:38 GMT+01:00 Teddy Schmitz <teddy.schmitz@actuallywedo.com>:

As a quick follow up I just did an explain on the query,


Aggregate  (cost=258007258.87..258007258.88 rows=1 width=8)
->  Nested Loop  (cost=0.00..184292254.83 rows=14743000807 width=16)
->  Seq Scan on t1  (cost=0.00..3796.41 rows=263141 width=8)
->  Materialize  (cost=0.00..1088.40 rows=56027 width=8)
->  Seq Scan on t2  (cost=0.00..808.27 rows=56027 width=8)


It seems it has to do a loop on 14 billion rows?  Can someone explain why this would happen?



for bigger tables it should be pretty slow and it is expected behave

Regards

Pavel 


Thanks,


Teddy


From: Teddy Schmitz
Sent: Thursday, February 16, 2017 12:32:41 PM
To: pgsql-general@postgresql.org
Subject: Problems with Greatest
 

Hello,


I have a query using Greatest that hangs and never returns when called with two tables.


Postgres Version: 9.6


Tables

t1{ id bigint }


t2 { id bigint }


they are sharing a sequence


the query

select greatest(max(t1.id), max(t2.id)) from t1, t2;

The purpose was to call setval on the sequence after doing a bulk data load into the database. But this query never returns. I have tried it with various combinations,

select
greatest(max(t1.id), 6) from t1; -> This returns

select
greatest(max(t1.id), 6) from t1, t2; -> This never returns.

The query does work if there is only a few hundred items between the tables but I'm importing about ~300,000 rows between the two tables. I looked at pg_stat_activity and it says the query is active

I worked around this problem using a union all query but I'm wondering if this is a bug or I am just using greatest wrong.


[GENERAL] Problems with Greatest

From
"David G. Johnston"
Date:
On Wednesday, February 15, 2017, Teddy Schmitz <teddy.schmitz@actuallywedo.com> wrote:

select greatest(max(t1.id), max(t2.id)) from t1, t2;

I seriously doubt that the greatest function is a larger contributor to run time than the cross join between t1 and t2.

David J. 

Re: [GENERAL] Problems with Greatest

From
Teddy Schmitz
Date:

Thanks for the link Pavel,  that makes perfect sense now.


Teddy


From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Thursday, February 16, 2017 12:42:36 PM
To: Teddy Schmitz
Cc: pgsql-general@postgresql.org
Subject: [GENERAL] Problems with Greatest
 
On Wednesday, February 15, 2017, Teddy Schmitz <teddy.schmitz@actuallywedo.com> wrote:

select greatest(max(t1.id), max(t2.id)) from t1, t2;

I seriously doubt that the greatest function is a larger contributor to run time than the cross join between t1 and t2.

David J. 

Re: [GENERAL] Problems with Greatest

From
Thomas Kellerer
Date:
Teddy Schmitz schrieb am 16.02.2017 um 05:38:
> As a quick follow up I just did an explain on the query,
>
>
> Aggregate  (cost=258007258.87..258007258.88 rows=1 width=8)
> ->  Nested Loop  (cost=0.00..184292254.83 rows=14743000807 width=16)
> ->  Seq Scan on t1  (cost=0.00..3796.41 rows=263141 width=8)
> ->  Materialize  (cost=0.00..1088.40 rows=56027 width=8)
> ->  Seq Scan on t2  (cost=0.00..808.27 rows=56027 width=8)
>
>
> It seems it has to do a loop on 14 billion rows?  Can someone explain why this would happen?

You are not joining those two tables.

"from t1, t2" is a cross join that will calculate the cartesian product of both tables