Thread: Repetitive code

Repetitive code

From
Joe
Date:
Hi,

This is prompted by the previous thread on "SQL Technique Question".  I 
have the following query, extracted from a PHP script, where $dt is a 
date provided to the script.
    SELECT created, topic_id, 0, 0, 0, 0 FROM topic    WHERE created >= $dt AND page_type IN (1, 2)  UNION    SELECT
updated,topic_id, 1, 0, 0, 0 FROM topic    WHERE date_trunc('day', updated) != created      AND updated >= $dt AND
page_typeIN (1, 2)  UNION    SELECT e.created, subject_id, 0, 1, entry_id, subject_type    FROM entry e, topic    WHERE
subject_id= topic_id AND e.created >= $dt      AND page_type IN (1, 2)  UNION    SELECT e.created, actor_id, 0, 1,
entry_id,actor_type    FROM entry e, topic    WHERE actor_id = topic_id AND e.created >= $dt      AND page_type IN (1,
2) UNION    SELECT e.updated, subject_id, 1, 1, entry_id, subject_type    FROM entry e, topic    WHERE
date_trunc('day',e.updated) != e.created      AND subject_id = topic_id AND e.updated >= $dt      AND page_type IN (1,
2) UNION    SELECT e.updated, actor_id, 1, 1, entry_id, actor_type    FROM entry e, topic    WHERE date_trunc('day',
e.updated)!= e.created      AND actor_id = topic_id AND e.updated >= $dt      AND page_type IN (1, 2)  UNION    SELECT
e.created,e.topic_id, 0, 1, entry_id, rel_type    FROM topic_entry e, topic t    WHERE e.topic_id = t.topic_id AND
e.created>= $dt      AND page_type IN (1, 2)  UNION    SELECT e.updated, e.topic_id, 1, 1, entry_id, rel_type    FROM
topic_entrye, topic t    WHERE e.topic_id = t.topic_id      AND date_trunc('day', e.updated) != e.created      AND
e.updated>= $dt AND page_type IN (1, 2);
 

As you can see, there's quite a bit of repetitive code, so the previous 
thread got me to thinking about simplifying it, perhaps through a view, 
perhaps through the use of CASE statements, particularly since I'm about 
to add at least one other table to the mix.

As background, each table has a 'created' date column and an 'updated' 
timestamp column and the purpose of the various selects is to find the 
rows that were created or updated since the given $dt date.  The third 
expression in each select list is an indicator of NEW (0) or CHANGED 
(1).  The fourth item is a code for row type (topic=0, entry=1, but a 
new code is coming).

I've been trying to figure out if simplifying into a view (one or more) 
is indeed possible.  One factoring out that I can see is the "topics of 
interest" restriction (i.e., the join of each secondary table back to 
topic to get only topics whose page_types are 1 or 2).  Another 
redundancy is the "date_trunc('day', updated) != created" which is there 
to avoid selecting "changed" records when they're actually new. 
However, although creating these views may simplify the subqueries it 
doesn't seem there is a way to avoid the eight-way UNION, or is there?

TIA

Joe


Re: Repetitive code

From
"Aaron Bono"
Date:
Each of your queries has the filter xxx >= $dt where the xxx is the first column in each select.  You could simplify the query by turning the unioned selects into a sub-query and then putting the $dt filter in the outer query.

I don't know if this will cause performance problems though.  If PostgreSQL completes the inner query before filtering by your $dt you may be better off leaving the $dt filters where they are.

I know Oracle has materialized views.  Does PostgreSQL also have materialized views?  If so, you could get great performance from your views AND simplify your SQL.

-Aaron Bono

On 6/15/06, Joe <dev@freedomcircle.net> wrote:
Hi,

This is prompted by the previous thread on "SQL Technique Question".  I
have the following query, extracted from a PHP script, where $dt is a
date provided to the script.

     SELECT created, topic_id, 0, 0, 0, 0 FROM topic
     WHERE created >= $dt AND page_type IN (1, 2)
   UNION
     SELECT updated, topic_id, 1, 0, 0, 0 FROM topic
     WHERE date_trunc('day', updated) != created
       AND updated >= $dt AND page_type IN (1, 2)
   UNION
     SELECT e.created, subject_id, 0, 1, entry_id, subject_type
     FROM entry e, topic
     WHERE subject_id = topic_id AND e.created >= $dt
       AND page_type IN (1, 2)
   UNION
     SELECT e.created, actor_id, 0, 1, entry_id, actor_type
     FROM entry e, topic
     WHERE actor_id = topic_id AND e.created >= $dt
       AND page_type IN (1, 2)
   UNION
     SELECT e.updated, subject_id, 1, 1, entry_id, subject_type
     FROM entry e, topic
     WHERE date_trunc('day', e.updated) != e.created
       AND subject_id = topic_id AND e.updated >= $dt
       AND page_type IN (1, 2)
   UNION
     SELECT e.updated, actor_id, 1, 1, entry_id, actor_type
     FROM entry e, topic
     WHERE date_trunc('day', e.updated) != e.created
       AND actor_id = topic_id AND e.updated >= $dt
       AND page_type IN (1, 2)
   UNION
     SELECT e.created, e.topic_id , 0, 1, entry_id, rel_type
     FROM topic_entry e, topic t
     WHERE e.topic_id = t.topic_id AND e.created >= $dt
       AND page_type IN (1, 2)
   UNION
     SELECT e.updated, e.topic_id, 1, 1, entry_id, rel_type
     FROM topic_entry e, topic t
     WHERE e.topic_id = t.topic_id
       AND date_trunc('day', e.updated) != e.created
       AND e.updated >= $dt AND page_type IN (1, 2);

As you can see, there's quite a bit of repetitive code, so the previous
thread got me to thinking about simplifying it, perhaps through a view,
perhaps through the use of CASE statements, particularly since I'm about
to add at least one other table to the mix.

As background, each table has a 'created' date column and an 'updated'
timestamp column and the purpose of the various selects is to find the
rows that were created or updated since the given $dt date.  The third
expression in each select list is an indicator of NEW (0) or CHANGED
(1).  The fourth item is a code for row type (topic=0, entry=1, but a
new code is coming).

I've been trying to figure out if simplifying into a view (one or more)
is indeed possible.  One factoring out that I can see is the "topics of
interest" restriction (i.e., the join of each secondary table back to
topic to get only topics whose page_types are 1 or 2).  Another
redundancy is the "date_trunc('day', updated) != created" which is there
to avoid selecting "changed" records when they're actually new.
However, although creating these views may simplify the subqueries it
doesn't seem there is a way to avoid the eight-way UNION, or is there?

TIA

Joe

Re: Repetitive code

From
Joe
Date:
Aaron Bono wrote:
> Each of your queries has the filter xxx >= $dt where the xxx is the 
> first column in each select.  You could simplify the query by turning 
> the unioned selects into a sub-query and then putting the $dt filter in 
> the outer query.

It would probably have to be two subqueries unless I can find a way to 
merge the differences between new and changed rows.

> I don't know if this will cause performance problems though.  If 
> PostgreSQL completes the inner query before filtering by your $dt you 
> may be better off leaving the $dt filters where they are.

The query is only run a few times a week so performance is largely not a 
concern.  I'm trying to simplify it to make adding tables less 
cumbersome (as a separate effort, the schema may be modified to 
normalize it, e.g., topic joins to entry via subject_id and actor_id and 
a subject and actor can also appear in topic_entry's topic_id).

> I know Oracle has materialized views.  Does PostgreSQL also have 
> materialized views?  If so, you could get great performance from your 
> views AND simplify your SQL.

AFAIK PostgreSQL does not support materialized views but it's 
interesting that you mention that because in essence the query is used 
to materialize a view, i.e., it's part of an INSERT / SELECT into a 
table which is then joined back to the other tables to construct a web 
page as well as an RSS feed.

Joe


Re: Repetitive code

From
Richard Broersma Jr
Date:
> AFAIK PostgreSQL does not support materialized views but it's 
> interesting that you mention that because in essence the query is used 
> to materialize a view, i.e., it's part of an INSERT / SELECT into a 
> table which is then joined back to the other tables to construct a web 
> page as well as an RSS feed.

This thread:
http://archives.postgresql.org/pgsql-performance/2006-06/msg00324.php

mentions the use of materialized views in postgresql. See:
http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

However, I am not sure if this link shows how postgresql supports materialized views or if it just
shows how to simulate a materialized view with procedural code.

Either way, I thought it might be of interest to you.

Regards,

Richard Broersma Jr.


Re: Repetitive code

From
Tom Lane
Date:
Richard Broersma Jr <rabroersma@yahoo.com> writes:
> http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

> However, I am not sure if this link shows how postgresql supports
> materialized views or if it just shows how to simulate a materialized
> view with procedural code.

The latter.  PG doesn't (currently) have direct support for materialized
views, but it has some tools you can get the effect with.
        regards, tom lane


Re: Repetitive code

From
"Aaron Bono"
Date:
I haven't stared at your query as long as you have so I may have missed something but it looks like in all the selects you are combining the first column in the select is the column you filter on.  So the the outer query doesn't have to know wiether it is a new or changed row:

SELECT * FROM (
    SELECT created as my_date, topic_id, 0, 0, 0, 0 FROM topic
    WHERE page_type IN (1, 2)
  UNION
    SELECT updated as my_date, topic_id, 1, 0, 0, 0 FROM topic
    WHERE date_trunc('day', updated) != created
      AND page_type IN (1, 2)
  UNION
    SELECT e.created as my_date, subject_id, 0, 1, entry_id, subject_type
    FROM entry e, topic
    WHERE subject_id = topic_id AND page_type IN (1, 2)
  UNION
    SELECT e.created as my_date, actor_id, 0, 1, entry_id, actor_type
    FROM entry e, topic
    WHERE actor_id = topic_id  AND page_type IN (1, 2)
  UNION
    SELECT e.updated as my_date, subject_id, 1, 1, entry_id, subject_type
    FROM entry e, topic
    WHERE date_trunc('day', e.updated) != e.created
      AND subject_id = topic_id AND page_type IN (1, 2)
  UNION
    SELECT e.updated as my_date, actor_id, 1, 1, entry_id, actor_type
    FROM entry e, topic
    WHERE date_trunc('day', e.updated) != e.created
      AND actor_id = topic_id AND page_type IN (1, 2)
  UNION
    SELECT e.created as my_date, e.topic_id, 0, 1, entry_id, rel_type
    FROM topic_entry e, topic t
    WHERE e.topic_id = t.topic_id AND page_type IN (1, 2)
  UNION
    SELECT e.updated as my_date, e.topic_id, 1, 1, entry_id, rel_type
    FROM topic_entry e, topic t
    WHERE e.topic_id = t.topic_id
      AND date_trunc('day', e.updated) != e.created
      AND page_type IN (1, 2)
) my_union
where my_union.my_date >= $dt

I would almost be tempted to create a view for each small query and name them something meaningful and then another view that does the union.  It would make the queries easier to understand at least (self documented).

-Aaron

On 6/16/06, Joe <dev@freedomcircle.net> wrote:
Aaron Bono wrote:
> Each of your queries has the filter xxx >= $dt where the xxx is the
> first column in each select.  You could simplify the query by turning
> the unioned selects into a sub-query and then putting the $dt filter in
> the outer query.

It would probably have to be two subqueries unless I can find a way to
merge the differences between new and changed rows.

Re: Repetitive code

From
Joe
Date:
Aaron Bono wrote:
> I haven't stared at your query as long as you have so I may have missed 
> something but it looks like in all the selects you are combining the 
> first column in the select is the column you filter on.  So the the 
> outer query doesn't have to know wiether it is a new or changed row:
> 
> SELECT * FROM (
>     SELECT created as my_date, topic_id, 0, 0, 0, 0 FROM topic
>     WHERE page_type IN (1, 2)
>   UNION
> [snip]
>     SELECT e.updated as my_date, e.topic_id, 1, 1, entry_id, rel_type
>     FROM topic_entry e, topic t
>     WHERE e.topic_id = t.topic_id
>       AND date_trunc('day', e.updated) != e.created
>       AND page_type IN (1, 2)
> ) my_union
> where my_union.my_date >= $dt

Thanks Aaron.  That does look like a great solution, overlooked since 
I'm not that familiar with SELECTs in the FROM clause.  It may even make 
it possible to discard the interim table and do the web page/RSS feed 
directly from the view.

> I would almost be tempted to create a view for each small query and name 
> them something meaningful and then another view that does the union.  It 
> would make the queries easier to understand at least (self documented).

That sounds like a good idea too because schema changes would be 
somewhat insulated by the layered views.

Best regards,

Joe


Re: Repetitive code

From
Greg Stark
Date:
"Aaron Bono" <postgresql@aranya.com> writes:

> I haven't stared at your query as long as you have so I may have missed
> something

Likewise I'm perhaps speaking too quickly, but at the risk of making a fool of
myself: you should perhaps realize that UNION has to do a fair amount of work
to eliminate duplicates. If you know the rows are going to be distinct or if
you don't care about duplicates you can save a lot of time by using UNION ALL
instead.

-- 
greg