Thread: scenario with a slow query

From
Volodymyr Kostyrko
Date:
Hi all.

Maybe I'm missing something, but I have found a case where the planner is
suboptimal.

# Creating table

create table test_stat(id smallint, count smallint, date date);

# Filling table, sorry for php
<?php
$db = new PDO('');

$insert = $db->prepare('insert into test_stat (id, count, date) values
(?, 1, to_timestamp(?)::date)');

$today = mktime(0, 0, 0);

$db->beginTransaction();
for($i = 0; $i < 1500000; $i++) {
   $insert->execute([rand(0, 1000), $today]);
}
$db->commit();
?>

And now the query:

select * from (
   select id, sum(count) as today
   from test_stat
   where date = now()::date
   group by id
)a natural full join (
   select id, sum(count) as lastday
   from test_stat
   where date = (now() - interval '1 day')::date
   group by id
)b natural full join (
   select id, sum(count) as week
   from test_stat
   where date_trunc('week', now()) = date_trunc('week', date)
     and date <> now()::date
   group by id
)c natural full join (
   select id, sum(count) as whole
   from test_stat
   where date <> now()::date
     or date is null
   group by id
)d
where id = ?;

Which yields this explain:


                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
  Hash Full Join  (cost=94830.30..126880.73 rows=5 width=48)
    Hash Cond: (COALESCE(COALESCE(public.test_stat.id, public.test_stat.id), public.test_stat.id) = public.test_stat.id)
    Filter: (COALESCE(COALESCE(COALESCE(public.test_stat.id, public.test_stat.id), public.test_stat.id), public.test_stat.id) = 1)
    ->  Hash Full Join  (cost=91193.49..123240.10 rows=1001 width=36)
          Hash Cond: (COALESCE(public.test_stat.id, public.test_stat.id) = public.test_stat.id)
          ->  Hash Full Join  (cost=40259.93..72302.74 rows=1001 width=24)
                Hash Cond: (public.test_stat.id = public.test_stat.id)
                ->  GroupAggregate  (cost=0.01..32042.63 rows=1 width=8)
                      ->  Index Scan using test_stat__id_date on test_stat  (cost=0.01..32042.61 rows=1 width=8)
                            Index Cond: (date = ((now() - '1 day'::interval))::date)
                ->  Hash  (cost=40247.41..40247.41 rows=1001 width=12)
                      ->  HashAggregate  (cost=40227.39..40237.40 rows=1001 width=8)
                            ->  Seq Scan on test_stat  (cost=0.00..33089.97 rows=1427484 width=8)
                                  Filter: (date = (now())::date)
          ->  Hash  (cost=50933.55..50933.55 rows=1 width=12)
                ->  HashAggregate  (cost=50933.53..50933.54 rows=1 width=8)
                      ->  Seq Scan on test_stat  (cost=0.00..50933.52 rows=1 width=8)
                            Filter: ((date <> (now())::date) AND (date_trunc('week'::text, now()) = date_trunc('week'::text, (date)::timestamp with time zone)))
    ->  Hash  (cost=3636.80..3636.80 rows=1 width=12)
          ->  GroupAggregate  (cost=34.80..3636.79 rows=1 width=8)
                ->  Bitmap Heap Scan on test_stat  (cost=34.80..3636.78 rows=1 width=8)
                      Recheck Cond: (id = 1)
                      Filter: ((date <> (now())::date) OR (date IS NULL))
                      ->  Bitmap Index Scan on test_stat__id_date  (cost=0.00..34.80 rows=1378 width=0)
                            Index Cond: (id = 1)
(25 rows)

The part that yields a Seq Scan is:
   select id, sum(count) as today
   from test_stat
   where date = now()::date
   group by id

And it uses the index when executed like this:
select * from (
   select id, sum(count) as today
   from test_stat
   where date = now()::date
   group by id
)a where id = 1

Where am I wrong here? What have I done so that this subquery can't
inherit the constraint from the outer query?

--
Sphinx of black quartz judge my vow.

Re: scenario with a slow query

From
Tom Lane
Date:
Volodymyr Kostyrko <c.kworr@gmail.com> writes:
> Maybe I'm missing something but I have found a case when planner is
> unoptimal.

The planner knows next to nothing about optimizing FULL JOIN, and
I would not recommend holding your breath waiting for it to get better
about that, because there's basically no demand for the work that'd
be involved.  I'd suggest refactoring this query instead.  A nest of
full joins seems like a rather unintuitive way to get the result
anyway ...

            regards, tom lane

Transaction ID wraparound, Oracle style

From
Igor Polishchuk
Date:
Here is an article on a recently discovered Oracle flaw, which allows SCN to
reach its limit.
http://www.computerworld.com/s/article/9223506/Fundamental_Oracle_flaw_revealed?taxonomyId=18&pageNumber=1

Please don't beat me for posting a link to an Oracle-related article.
If you despise the very notion of mentioning Oracle, please just don't read
the post.
This article may be interesting to any RDBMS professional, no matter what db
flavor he/she is working with.
Also, this story may be a lesson for the PostgreSQL community on how not to
do things. I'm not a developer, but it seems that having synchronized
transaction ids between, let's say, streaming-replicated databases would give
some advantages if done properly.
Regards
Igor Polishchuk



Re: Transaction ID wraparound, Oracle style

From
Scott Marlowe
Date:
On Wed, Jan 18, 2012 at 11:21 AM, Igor Polishchuk <igor@powerreviews.com> wrote:
> Here is an article on a recently discovered Oracle flaw, which allows SCN to
> reach its limit.
> http://www.computerworld.com/s/article/9223506/Fundamental_Oracle_flaw_revealed?taxonomyId=18&pageNumber=1
>
> Please don't beat me for posting a link to an Oracle-related article.
> If you despise the very notion of mentioning Oracle, please just don't read
> the post.
> This article may be interesting to any RDBMS professional, no matter what db
> flavor he/she is working with.
> Also, this story may be a lesson for the PostgreSQL community on how not to
> do things. I'm not a developer, but it seems that having synchronized
> transaction ids between, let's say, streaming-replicated databases would give
> some advantages if done properly.

Wow, interesting difference between PostgreSQL, which occasionally
resets its smaller transaction id to prevent wraparound, whereas Oracle
just uses a bigger number.  If my calcs are right, Oracle has about 500
years before hitting the wraparound limit at 16k tps.
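For what it's worth, the "about 500 years" figure checks out as a
back-of-the-envelope calculation, assuming (as discussed around the article)
that the SCN is a 48-bit counter and a sustained rate of 16,000 tps — both of
those numbers are my assumptions here, not something from this thread:

```python
# Sanity check of the "~500 years to SCN wraparound" estimate.
# Assumptions: 48-bit SCN counter, sustained 16,000 transactions/second.
SCN_LIMIT = 2 ** 48            # upper bound of a 48-bit counter
RATE = 16_000                  # transactions per second ("16k tps")
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

years = SCN_LIMIT / (RATE * SECONDS_PER_YEAR)
print(round(years))            # prints 558 -- in the "about 500 years" ballpark
```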

Thanks for the link, it was a fascinating read.


--
To understand recursion, one must first understand recursion.

Re: Transaction ID wraparound, Oracle style

From
"A.M."
Date:
On Jan 18, 2012, at 2:15 PM, Scott Marlowe wrote:

> On Wed, Jan 18, 2012 at 11:21 AM, Igor Polishchuk <igor@powerreviews.com> wrote:
>> Here is an article on a recently discovered Oracle flaw, which allows SCN to
>> reach its limit.
>> http://www.computerworld.com/s/article/9223506/Fundamental_Oracle_flaw_revealed?taxonomyId=18&pageNumber=1
>>
>> Please don't beat me for posting a link to an Oracle-related article.
>> If you despise the very notion of mentioning Oracle, please just don't read
>> the post.
>> This article may be interesting to any RDBMS professional, no matter what db
>> flavor he/she is working with.
>> Also, this story may be a lesson for the PostgreSQL community on how not to
>> do things. I'm not a developer, but it seems that having synchronized
>> transaction ids between, let's say, streaming-replicated databases would give
>> some advantages if done properly.
>
> Wow, interesting difference between PostgreSQL, which occasionally
> resets its smaller transaction id to prevent wraparound, whereas Oracle
> just uses a bigger number.  If my calcs are right, Oracle has about 500
> years before hitting the wraparound limit at 16k tps.
>
> Thanks for the link, it was a fascinating read.

By the way, this is called a Lamport clock.

http://en.wikipedia.org/wiki/Lamport_timestamps?banner=none

"On receiving a message, the receiver process sets its counter to be greater
than the maximum of its own value and the received value before it considers
the message received."
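As a purely illustrative sketch (nothing from the article — the class and
method names are mine), the quoted rule is only a few lines of code:

```python
# Minimal Lamport clock sketch: a local event increments the counter,
# and a receive sets the counter past both the local and received values.
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the counter by one."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is a local event; stamp the message with the new time."""
        return self.tick()

    def receive(self, message_time):
        """Per the quoted rule: become greater than max(own, received)."""
        self.time = max(self.time, message_time) + 1
        return self.time

a, b = LamportClock(), LamportClock()
stamp = a.send()      # a.time is now 1
b.receive(stamp)      # b.time becomes max(0, 1) + 1 == 2
```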

Cheers,
M

Re: scenario with a slow query

From
Volodymyr Kostyrko
Date:
Tom Lane wrote:
> Volodymyr Kostyrko<c.kworr@gmail.com>  writes:
>> Maybe I'm missing something but I have found a case when planner is
>> unoptimal.
>
> The planner knows next to nothing about optimizing FULL JOIN, and
> I would not recommend holding your breath waiting for it to get better
> about that, because there's basically no demand for the work that'd
> be involved.  I'd suggest refactoring this query instead.  A nest of
> full joins seems like a rather unintuitive way to get the result
> anyway ...

That's not only about FULL JOIN; it seems to be about all JOINs:

select * from (
   select 1 as id
)x natural left join (
   select id, sum(count) as today
   from test_stat
   where date = now()::date group by id
)a natural left join (
   select id, sum(count) as lastday
   from test_stat
   where date = (now() - interval '1 day')::date group by id
)b natural left join (
   select id, sum(count) as week
   from test_stat
   where date between (now() - interval '7 day')
     and (now() - interval '1 day')
   group by id
)c natural left join (
   select id, sum(count) as whole
   from test_stat
   where date <> now()::date
   group by id
)d;

This query exhibits the same seq scan.

By refactoring did you mean something like this:

select
   (select sum(count) from test_stat
     where date = now()::date and id = 1
     group by id) as today,
   ( select sum (count) from test_stat
     where date = (now() - interval '1 day')::date and id = 1
     group by id) as lastday,
   ( select sum(count) from test_stat
     where date between (now() - interval '7 day')
       and (now() - interval '1 day') and id = 1
     group by id) as week,
   (select sum(count) from test_stat
     where date <> now()::date and id = 1
     group by id) as whole;

This one works much better, requiring almost no planner involvement...
yielding the same result, though.

--
Sphinx of black quartz judge my vow.