Thread: Repetitive code
Hi, This is prompted by the previous thread on "SQL Technique Question". I have the following query, extracted from a PHP script, where $dt is a date provided to the script. SELECT created, topic_id, 0, 0, 0, 0 FROM topic WHERE created >= $dt AND page_type IN (1, 2) UNION SELECT updated,topic_id, 1, 0, 0, 0 FROM topic WHERE date_trunc('day', updated) != created AND updated >= $dt AND page_typeIN (1, 2) UNION SELECT e.created, subject_id, 0, 1, entry_id, subject_type FROM entry e, topic WHERE subject_id= topic_id AND e.created >= $dt AND page_type IN (1, 2) UNION SELECT e.created, actor_id, 0, 1, entry_id,actor_type FROM entry e, topic WHERE actor_id = topic_id AND e.created >= $dt AND page_type IN (1, 2) UNION SELECT e.updated, subject_id, 1, 1, entry_id, subject_type FROM entry e, topic WHERE date_trunc('day',e.updated) != e.created AND subject_id = topic_id AND e.updated >= $dt AND page_type IN (1, 2) UNION SELECT e.updated, actor_id, 1, 1, entry_id, actor_type FROM entry e, topic WHERE date_trunc('day', e.updated)!= e.created AND actor_id = topic_id AND e.updated >= $dt AND page_type IN (1, 2) UNION SELECT e.created,e.topic_id, 0, 1, entry_id, rel_type FROM topic_entry e, topic t WHERE e.topic_id = t.topic_id AND e.created>= $dt AND page_type IN (1, 2) UNION SELECT e.updated, e.topic_id, 1, 1, entry_id, rel_type FROM topic_entrye, topic t WHERE e.topic_id = t.topic_id AND date_trunc('day', e.updated) != e.created AND e.updated>= $dt AND page_type IN (1, 2); As you can see, there's quite a bit of repetitive code, so the previous thread got me to thinking about simplifying it, perhaps through a view, perhaps through the use of CASE statements, particularly since I'm about to add at least one other table to the mix. As background, each table has a 'created' date column and an 'updated' timestamp column and the purpose of the various selects is to find the rows that were created or updated since the given $dt date. The third expression in each select list is an indicator of NEW (0) or CHANGED (1). The fourth item is a code for row type (topic=0, entry=1, but a new code is coming). I've been trying to figure out if simplifying into a view (one or more) is indeed possible. One factoring out that I can see is the "topics of interest" restriction (i.e., the join of each secondary table back to topic to get only topics whose page_types are 1 or 2). Another redundancy is the "date_trunc('day', updated) != created" which is there to avoid selecting "changed" records when they're actually new. However, although creating these views may simplify the subqueries it doesn't seem there is a way to avoid the eight-way UNION, or is there? TIA Joe
Each of your queries has the filter xxx >= $dt where the xxx is the first column in each select. You could simplify the query by turning the unioned selects into a sub-query and then putting the $dt filter in the outer query.
I don't know if this will cause performance problems though. If PostgreSQL completes the inner query before filtering by your $dt you may be better off leaving the $dt filters where they are.
I know Oracle has materialized views. Does PostgreSQL also have materialized views? If so, you could get great performance from your views AND simplify your SQL.
-Aaron Bono
I don't know if this will cause performance problems though. If PostgreSQL completes the inner query before filtering by your $dt you may be better off leaving the $dt filters where they are.
I know Oracle has materialized views. Does PostgreSQL also have materialized views? If so, you could get great performance from your views AND simplify your SQL.
-Aaron Bono
On 6/15/06, Joe <dev@freedomcircle.net> wrote:
Hi,
This is prompted by the previous thread on "SQL Technique Question". I
have the following query, extracted from a PHP script, where $dt is a
date provided to the script.
SELECT created, topic_id, 0, 0, 0, 0 FROM topic
WHERE created >= $dt AND page_type IN (1, 2)
UNION
SELECT updated, topic_id, 1, 0, 0, 0 FROM topic
WHERE date_trunc('day', updated) != created
AND updated >= $dt AND page_type IN (1, 2)
UNION
SELECT e.created, subject_id, 0, 1, entry_id, subject_type
FROM entry e, topic
WHERE subject_id = topic_id AND e.created >= $dt
AND page_type IN (1, 2)
UNION
SELECT e.created, actor_id, 0, 1, entry_id, actor_type
FROM entry e, topic
WHERE actor_id = topic_id AND e.created >= $dt
AND page_type IN (1, 2)
UNION
SELECT e.updated, subject_id, 1, 1, entry_id, subject_type
FROM entry e, topic
WHERE date_trunc('day', e.updated) != e.created
AND subject_id = topic_id AND e.updated >= $dt
AND page_type IN (1, 2)
UNION
SELECT e.updated, actor_id, 1, 1, entry_id, actor_type
FROM entry e, topic
WHERE date_trunc('day', e.updated) != e.created
AND actor_id = topic_id AND e.updated >= $dt
AND page_type IN (1, 2)
UNION
SELECT e.created, e.topic_id , 0, 1, entry_id, rel_type
FROM topic_entry e, topic t
WHERE e.topic_id = t.topic_id AND e.created >= $dt
AND page_type IN (1, 2)
UNION
SELECT e.updated, e.topic_id, 1, 1, entry_id, rel_type
FROM topic_entry e, topic t
WHERE e.topic_id = t.topic_id
AND date_trunc('day', e.updated) != e.created
AND e.updated >= $dt AND page_type IN (1, 2);
As you can see, there's quite a bit of repetitive code, so the previous
thread got me to thinking about simplifying it, perhaps through a view,
perhaps through the use of CASE statements, particularly since I'm about
to add at least one other table to the mix.
As background, each table has a 'created' date column and an 'updated'
timestamp column and the purpose of the various selects is to find the
rows that were created or updated since the given $dt date. The third
expression in each select list is an indicator of NEW (0) or CHANGED
(1). The fourth item is a code for row type (topic=0, entry=1, but a
new code is coming).
I've been trying to figure out if simplifying into a view (one or more)
is indeed possible. One factoring out that I can see is the "topics of
interest" restriction (i.e., the join of each secondary table back to
topic to get only topics whose page_types are 1 or 2). Another
redundancy is the "date_trunc('day', updated) != created" which is there
to avoid selecting "changed" records when they're actually new.
However, although creating these views may simplify the subqueries it
doesn't seem there is a way to avoid the eight-way UNION, or is there?
TIA
Joe
Aaron Bono wrote: > Each of your queries has the filter xxx >= $dt where the xxx is the > first column in each select. You could simplify the query by turning > the unioned selects into a sub-query and then putting the $dt filter in > the outer query. It would probably have to be two subqueries unless I can find a way to merge the differences between new and changed rows. > I don't know if this will cause performance problems though. If > PostgreSQL completes the inner query before filtering by your $dt you > may be better off leaving the $dt filters where they are. The query is only run a few times a week so performance is largely not a concern. I'm trying to simplify it to make adding tables less cumbersome (as a separate effort, the schema may be modified to normalize it, e.g., topic joins to entry via subject_id and actor_id and a subject and actor can also appear in topic_entry's topic_id). > I know Oracle has materialized views. Does PostgreSQL also have > materialized views? If so, you could get great performance from your > views AND simplify your SQL. AFAIK PostgreSQL does not support materialized views but it's interesting that you mention that because in essence the query is used to materialize a view, i.e., it's part of an INSERT / SELECT into a table which is then joined back to the other tables to construct a web page as well as an RSS feed. Joe
> AFAIK PostgreSQL does not support materialized views but it's > interesting that you mention that because in essence the query is used > to materialize a view, i.e., it's part of an INSERT / SELECT into a > table which is then joined back to the other tables to construct a web > page as well as an RSS feed. This thread: http://archives.postgresql.org/pgsql-performance/2006-06/msg00324.php mentions the use of materialized views in postgresql. See: http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html However, I am not sure if this link shows how postgresql supports materialized views or if it just shows how to simulate a materialized view with procedural code. Either way, I thought it might be of interest to you. Regards, Richard Broersma Jr.
Richard Broersma Jr <rabroersma@yahoo.com> writes: > http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html > However, I am not sure if this link shows how postgresql supports > materialized views or if it just shows how to simulate a materialized > view with procedural code. The latter. PG doesn't (currently) have direct support for materialized views, but it has some tools you can get the effect with. regards, tom lane
I haven't stared at your query as long as you have so I may have missed something but it looks like in all the selects you are combining the first column in the select is the column you filter on. So the the outer query doesn't have to know wiether it is a new or changed row:
SELECT * FROM (
SELECT created as my_date, topic_id, 0, 0, 0, 0 FROM topic
WHERE page_type IN (1, 2)
UNION
SELECT updated as my_date, topic_id, 1, 0, 0, 0 FROM topic
WHERE date_trunc('day', updated) != created
AND page_type IN (1, 2)
UNION
SELECT e.created as my_date, subject_id, 0, 1, entry_id, subject_type
FROM entry e, topic
WHERE subject_id = topic_id AND page_type IN (1, 2)
UNION
SELECT e.created as my_date, actor_id, 0, 1, entry_id, actor_type
FROM entry e, topic
WHERE actor_id = topic_id AND page_type IN (1, 2)
UNION
SELECT e.updated as my_date, subject_id, 1, 1, entry_id, subject_type
FROM entry e, topic
WHERE date_trunc('day', e.updated) != e.created
AND subject_id = topic_id AND page_type IN (1, 2)
UNION
SELECT e.updated as my_date, actor_id, 1, 1, entry_id, actor_type
FROM entry e, topic
WHERE date_trunc('day', e.updated) != e.created
AND actor_id = topic_id AND page_type IN (1, 2)
UNION
SELECT e.created as my_date, e.topic_id, 0, 1, entry_id, rel_type
FROM topic_entry e, topic t
WHERE e.topic_id = t.topic_id AND page_type IN (1, 2)
UNION
SELECT e.updated as my_date, e.topic_id, 1, 1, entry_id, rel_type
FROM topic_entry e, topic t
WHERE e.topic_id = t.topic_id
AND date_trunc('day', e.updated) != e.created
AND page_type IN (1, 2)
) my_union
where my_union.my_date >= $dt
I would almost be tempted to create a view for each small query and name them something meaningful and then another view that does the union. It would make the queries easier to understand at least (self documented).
-Aaron
SELECT * FROM (
SELECT created as my_date, topic_id, 0, 0, 0, 0 FROM topic
WHERE page_type IN (1, 2)
UNION
SELECT updated as my_date, topic_id, 1, 0, 0, 0 FROM topic
WHERE date_trunc('day', updated) != created
AND page_type IN (1, 2)
UNION
SELECT e.created as my_date, subject_id, 0, 1, entry_id, subject_type
FROM entry e, topic
WHERE subject_id = topic_id AND page_type IN (1, 2)
UNION
SELECT e.created as my_date, actor_id, 0, 1, entry_id, actor_type
FROM entry e, topic
WHERE actor_id = topic_id AND page_type IN (1, 2)
UNION
SELECT e.updated as my_date, subject_id, 1, 1, entry_id, subject_type
FROM entry e, topic
WHERE date_trunc('day', e.updated) != e.created
AND subject_id = topic_id AND page_type IN (1, 2)
UNION
SELECT e.updated as my_date, actor_id, 1, 1, entry_id, actor_type
FROM entry e, topic
WHERE date_trunc('day', e.updated) != e.created
AND actor_id = topic_id AND page_type IN (1, 2)
UNION
SELECT e.created as my_date, e.topic_id, 0, 1, entry_id, rel_type
FROM topic_entry e, topic t
WHERE e.topic_id = t.topic_id AND page_type IN (1, 2)
UNION
SELECT e.updated as my_date, e.topic_id, 1, 1, entry_id, rel_type
FROM topic_entry e, topic t
WHERE e.topic_id = t.topic_id
AND date_trunc('day', e.updated) != e.created
AND page_type IN (1, 2)
) my_union
where my_union.my_date >= $dt
I would almost be tempted to create a view for each small query and name them something meaningful and then another view that does the union. It would make the queries easier to understand at least (self documented).
-Aaron
On 6/16/06, Joe <dev@freedomcircle.net> wrote:
Aaron Bono wrote:
> Each of your queries has the filter xxx >= $dt where the xxx is the
> first column in each select. You could simplify the query by turning
> the unioned selects into a sub-query and then putting the $dt filter in
> the outer query.
It would probably have to be two subqueries unless I can find a way to
merge the differences between new and changed rows.
Aaron Bono wrote: > I haven't stared at your query as long as you have so I may have missed > something but it looks like in all the selects you are combining the > first column in the select is the column you filter on. So the the > outer query doesn't have to know wiether it is a new or changed row: > > SELECT * FROM ( > SELECT created as my_date, topic_id, 0, 0, 0, 0 FROM topic > WHERE page_type IN (1, 2) > UNION > [snip] > SELECT e.updated as my_date, e.topic_id, 1, 1, entry_id, rel_type > FROM topic_entry e, topic t > WHERE e.topic_id = t.topic_id > AND date_trunc('day', e.updated) != e.created > AND page_type IN (1, 2) > ) my_union > where my_union.my_date >= $dt Thanks Aaron. That does look like a great solution, overlooked since I'm not that familiar with SELECTs in the FROM clause. It may even make it possible to discard the interim table and do the web page/RSS feed directly from the view. > I would almost be tempted to create a view for each small query and name > them something meaningful and then another view that does the union. It > would make the queries easier to understand at least (self documented). That sounds like a good idea too because schema changes would be somewhat insulated by the layered views. Best regards, Joe
"Aaron Bono" <postgresql@aranya.com> writes: > I haven't stared at your query as long as you have so I may have missed > something Likewise I'm perhaps speaking too quickly, but at the risk of making a fool of myself: you should perhaps realize that UNION has to do a fair amount of work to eliminate duplicates. If you know the rows are going to be distinct or if you don't care about duplicates you can save a lot of time by using UNION ALL instead. -- greg