Thread: generic sql question

generic sql question

From

Dennis Gearon

Date:

21 October 2004, 05:58:23

My question is it possible to speed up a query doing preselects? What I'm working on could end up being a very large
dataset.I hope to have 100-1000 queries per second (0r more?), and if very large tables are joined with very large
tables,I imagine that the memory would be get very full, overfull? 

So in the schema below the following queries, usrs own articles, articles are of article types, issues have dates and
names,and issues_published has an issue id and sets of articles, and article can be in many issues. 

So if I wanted to find articles of a certain article type within a certain date range for the article and had actually
beenpublished, I believe that this query could find it, joining three tables and then doing the qualifications for date
andtype: (assume values in {} are escaped and proper syntax) 

-----------------------------------------------------------------
SELECT article_id
FROM issues_published, issues, articles

WHERE issues_published.article_id = articles.article_id
   AND
      issues_published.issue_id = issues.issue_id
   AND
      articles.article_type = {article_type_id desired}
   AND
      article.article_date < {highest date}
   AND
      issues.article_date > {lowest date};

But would the following reduce the size of the join in memory?

SELECT article_id
FROM (select *
      from articles
      where article_date < {highest date}
   AND
      article_date > {lowest date} ) as articles_in_range, issues, issues_published

WHERE issues_published.article_id = articles_in_range.article_id
   AND
      issues_published.issue_id = issues.issue_id
   AND
      articles_in_range.article_type = {article type desired}

-------------------------------------------------------------------------
CREATE TABLE usr (
    usr_id SERIAL NOT NULL,
    PRIMARY KEY (usr_id)
);

CREATE TABLE article_types (
    ariticle_type_id SERIAL NOT NULL,
    article_type VARCHAR(40) NOT NULL,
    PRIMARY KEY (ariticle_type_id)
);

CREATE TABLE articles (
    article_id SERIAL NOT NULL,
    ariticle_type_id INT4 NOT NULL,
    author INT4 NOT NULL,
    body TEXT NOT NULL,
    date_written DATE NOT NULL,
    PRIMARY KEY (article_id, ariticle_type_id, author)
);

CREATE TABLE issues (
    issue_id SERIAL NOT NULL,
    issue_title VARCHAR(40) NOT NULL,
    issue_date DATE NOT NULL,
    PRIMARY KEY (issue_id)
);

CREATE TABLE issues_published (
    issue_id INT4 NOT NULL,
    article_id INT4 NOT NULL,
    PRIMARY KEY (issue_id, author, ariticle_type_id, article_id)
);

/*==========================================================================*/
/*  Foreign Keys                                                            */
/*==========================================================================*/

ALTER TABLE articles
    ADD FOREIGN KEY (author) REFERENCES usr (usr_id);

ALTER TABLE articles
    ADD FOREIGN KEY (ariticle_type_id) REFERENCES article_types (ariticle_type_id);

ALTER TABLE issue_articles
    ADD FOREIGN KEY (issue_id) REFERENCES issues (issue_id);

ALTER TABLE issue_articles
    ADD FOREIGN KEY (author,ariticle_type_id,article_id) REFERENCES articles (author, ariticle_type_id, article_id);

Re: generic sql question

From

"Hicham G. Elmongui"

Date:

21 October 2004, 09:28:23

i do no think writing the query in the second form differs from the first
one. In both cases, only the relevent articles (in range and of desired
type) will come out of the scan operator that scans the articles.
--h



"Dennis Gearon" <gearond@fireserve.net> wrote in message
news:41771795.6040605@fireserve.net...
> My question is it possible to speed up a query doing preselects? What I'm
working on could end up being a very large dataset. I hope to have 100-1000
queries per second (0r more?), and if very large tables are joined with very
large tables, I imagine that the memory would be get very full, overfull?
>
> So in the schema below the following queries, usrs own articles, articles
are of article types, issues have dates and names, and issues_published has
an issue id and sets of articles, and article can be in many issues.
>
> So if I wanted to find articles of a certain article type within a certain
date range for the article and had actually been published, I believe that
this query could find it, joining three tables and then doing the
qualifications for date and type: (assume values in {} are escaped and
proper syntax)
>
> -----------------------------------------------------------------
> SELECT article_id
> FROM issues_published, issues, articles
>
> WHERE issues_published.article_id = articles.article_id
>    AND
>       issues_published.issue_id = issues.issue_id
>    AND
>       articles.article_type = {article_type_id desired}
>    AND
>       article.article_date < {highest date}
>    AND
>       issues.article_date > {lowest date};
>
> But would the following reduce the size of the join in memory?
>
> SELECT article_id
> FROM (select *
>       from articles
>       where article_date < {highest date}
>    AND
>       article_date > {lowest date} ) as articles_in_range, issues,
issues_published
>
> WHERE issues_published.article_id = articles_in_range.article_id
>    AND
>       issues_published.issue_id = issues.issue_id
>    AND
>       articles_in_range.article_type = {article type desired}
>
> -------------------------------------------------------------------------
> CREATE TABLE usr (
>     usr_id SERIAL NOT NULL,
>     PRIMARY KEY (usr_id)
> );
>
> CREATE TABLE article_types (
>     ariticle_type_id SERIAL NOT NULL,
>     article_type VARCHAR(40) NOT NULL,
>     PRIMARY KEY (ariticle_type_id)
> );
>
> CREATE TABLE articles (
>     article_id SERIAL NOT NULL,
>     ariticle_type_id INT4 NOT NULL,
>     author INT4 NOT NULL,
>     body TEXT NOT NULL,
>     date_written DATE NOT NULL,
>     PRIMARY KEY (article_id, ariticle_type_id, author)
> );
>
> CREATE TABLE issues (
>     issue_id SERIAL NOT NULL,
>     issue_title VARCHAR(40) NOT NULL,
>     issue_date DATE NOT NULL,
>     PRIMARY KEY (issue_id)
> );
>
> CREATE TABLE issues_published (
>     issue_id INT4 NOT NULL,
>     article_id INT4 NOT NULL,
>     PRIMARY KEY (issue_id, author, ariticle_type_id, article_id)
> );
>
>
/*==========================================================================
*/
> /*  Foreign Keys
*/
>
/*==========================================================================
*/
>
> ALTER TABLE articles
>     ADD FOREIGN KEY (author) REFERENCES usr (usr_id);
>
> ALTER TABLE articles
>     ADD FOREIGN KEY (ariticle_type_id) REFERENCES article_types
(ariticle_type_id);
>
> ALTER TABLE issue_articles
>     ADD FOREIGN KEY (issue_id) REFERENCES issues (issue_id);
>
> ALTER TABLE issue_articles
>     ADD FOREIGN KEY (author,ariticle_type_id,article_id) REFERENCES
articles (author, ariticle_type_id, article_id);
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>