Thread: query for a time interval

query for a time interval

From
Mark
Date:
Hello everybody,

I'm looking for an elegant SQL statement that will work in
Postgresql, MySQL and ORACLE.
The query will be executed by Java client.

To have this query for Postgresql is priority number one.


In this query I try to get a list of message Ids that expired.

time_to_live is in seconds.

SELECT id
  FROM mq
 WHERE now - start_date > time_to_live;

I have a following table:

CREATE TABLE mq
{
msg_id INTEGER,
retry_date  TIMESTAMP NOT NULL DEFAULT ('now'::text)::timestamp(1),
start_date  TIMESTAMP NOT NULL DEFAULT ('now'::text)::timestamp(1),
time_to_live INTEGER
}

Thanks!
Mark.

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

view or index to optimize performance

From
Klein Balázs
Date:
Hello everybody,

I have a table that stores responses to questions in different
questionnaires. This table will grow to millions of rows.

My problem is that while most of the data in the table are rarely used in
queries one type of response will be used quite often: biodata - name,
gender, e-mail and this sort of. This data is also collected as responses to
questionnaires.

My question: what is the best strategy if I wanted to quickly retrieve
biodata from this table:

CREATE TABLE "public"."itemresponse" (
  "testoccasionid" INTEGER NOT NULL,
  "itemorder" SMALLINT NOT NULL,
  "response" TEXT NOT NULL,
  "bio" INTEGER DEFAULT 0 NOT NULL,
  "datatype" SMALLINT NOT NULL,
  CONSTRAINT "ItemResponseText_pk" PRIMARY KEY("testoccasionid",
"itemorder"),
  CONSTRAINT "ItemResponseText_TestOccasionID_fkey" FOREIGN KEY
("testoccasionid")
    REFERENCES "public"."testoccasion"("testoccasionid")
    ON DELETE NO ACTION
    ON UPDATE NO ACTION
    NOT DEFERRABLE
) WITH OIDS;

I can store the fact that it is biodata in the bio field - it is biodata if
the value of that field is not 0 and I can index that field and simply use
that as one of the conditions in queries.

Or should I instead create a view that contains only the biodata and select
from that? But will postgres use the indexes than? Would that be a better
approach?

Thanks for the help.
SWK





Re: view or index to optimize performance

From
Peter Eisentraut
Date:
Am Mittwoch, 21. Dezember 2005 21:27 schrieb Klein Balázs:
> Or should I instead create a view that contains only the biodata and select
> from that? But will postgres use the indexes than? Would that be a better
> approach?

Whether the query is executed by a view or typed in in its full form by hand
is completely irrelevant to the question whether indexes are used or should
be created.  Views do not optimize anything.

Re: view or index to optimize performance

From
Klein Balázs
Date:
I thought that if I used a view to retrieve data its content might be cached
so it would make the query faster.

Am Mittwoch, 21. Dezember 2005 21:27 schrieb Klein Balázs:
> Or should I instead create a view that contains only the biodata and
select
> from that? But will postgres use the indexes than? Would that be a better
> approach?

Whether the query is executed by a view or typed in in its full form by hand

is completely irrelevant to the question whether indexes are used or should
be created.  Views do not optimize anything.

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that your
       message can get through to the mailing list cleanly


Re: view or index to optimize performance

From
Richard Huxton
Date:
Klein Balázs wrote:
> Hello everybody,
>
> I have a table that stores responses to questions in different
> questionnaires. This table will grow to millions of rows.
[snip]
> I can store the fact that it is biodata in the bio field - it is biodata if
> the value of that field is not 0 and I can index that field and simply use
> that as one of the conditions in queries.
>
> Or should I instead create a view that contains only the biodata and select
> from that? But will postgres use the indexes than? Would that be a better
> approach?

Create the index on the table and a view should use it.

You might like to read up on partial indexes where you can do something
like:

CREATE INDEX my_index ON itemresponse (testoccasionid,itemorder)
WHERE bio > 0;

So long as your query/view definition has WHERE bio > 0 in it then this
index can be used.

HTH
--
   Richard Huxton
   Archonet Ltd


Re: view or index to optimize performance

From
"Jim C. Nasby"
Date:
On Wed, Dec 21, 2005 at 10:49:29PM +0100, Klein Bal?zs wrote:
> I thought that if I used a view to retrieve data its content might be cached
> so it would make the query faster.

No. A view is essentially exactly the same as inserting the view
definition into the query that's using it. IE:

CREATE VIEW v AS SELECT * FROM t;

SELECT * FROM v becomes:

SELECT * FROM (SELECT * FROM t) v;

What you could do is partition the table so that critical information is
stored in a smaller table while everything else goes to a larger table.
You can then do a UNION ALL view on top of that to 'glue' the two tables
together. You can even define rules so that you can do updates on the
view. http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ has an
example that's similar to this. Note that you'll need an appropriate
index on the large table so that PostgreSQL can quickly tell it doesn't
contain values that are in the small table. Or, in 8.1 you could use a
constraint. You could also do this with inherited tables instead of
views.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: query for a time interval

From
"Jim C. Nasby"
Date:
On Wed, Dec 21, 2005 at 11:52:56AM -0800, Mark wrote:
> SELECT id
>   FROM mq
>  WHERE now - start_date > time_to_live;

The problem is you can't use an index on this, because you'd need to
index on (now() - start_date), which obviously wouldn't work. Instead,
re-write the WHERE as:

WHERE start_date < now() - time_to_live
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: view or index to optimize performance

From
Klein Balázs
Date:
thanks for the help

>What you could do is partition the table so that critical information is
>stored in a smaller table while everything else goes to a larger table.

I was thinking the other way round - maybe I can split the large table by
creating a materialized view. But than I read that it is maybe unnecessary
to create a materialized view because pg

"Materialized views sound a useful workaround, if your database doesn't have
a query cache. If you do have a query cache, then you already effectively
have eager or lazy materialized views (depending on your isolation level):
Just use your normal view (or query) and let the database figure it out."
Quote from Farce Pest in
http://spyced.blogspot.com/2005/05/materialized-views-in-postgresql.html

But later in the same blog it seems to indicate that there is a choice to
either use or not use the query cache of pg.

So I don't know now how this cache works and whether it could help me in
this.

SWK


Re: view or index to optimize performance

From
Tom Lane
Date:
=?iso-8859-1?Q?Klein_Bal=E1zs?= <Balazs.Klein@axelero.hu> writes:
> But later in the same blog it seems to indicate that there is a choice to
> either use or not use the query cache of pg.

Hm?  There is no query cache in PG.

            regards, tom lane

Re: query for a time interval

From
Michael Fuhr
Date:
On Thu, Dec 22, 2005 at 09:47:11AM -0600, Jim C. Nasby wrote:
> On Wed, Dec 21, 2005 at 11:52:56AM -0800, Mark wrote:
> > SELECT id
> >   FROM mq
> >  WHERE now - start_date > time_to_live;
>
> The problem is you can't use an index on this, because you'd need to
> index on (now() - start_date), which obviously wouldn't work. Instead,
> re-write the WHERE as:
>
> WHERE start_date < now() - time_to_live

Unless I'm missing something that wouldn't use an index either,
because the planner wouldn't know what value to compare start_date
against without hitting each row to find that row's time_to_live.
But something like this should be able to use an expression index
on (start_date + time_to_live):

WHERE start_date + time_to_live < now()

--
Michael Fuhr

Re: query for a time interval

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> Unless I'm missing something that wouldn't use an index either,
> because the planner wouldn't know what value to compare start_date
> against without hitting each row to find that row's time_to_live.
> But something like this should be able to use an expression index
> on (start_date + time_to_live):

> WHERE start_date + time_to_live < now()

Or perhaps simpler, get rid of time_to_live in favor of an end_date
column, which could be indexed directly.

            regards, tom lane

Re: query for a time interval

From
Bruno Wolff III
Date:
On Wed, Dec 21, 2005 at 11:52:56 -0800,
  Mark <sendmailtomark@yahoo.com> wrote:
> Hello everybody,
>
> I'm looking for an elegant SQL statement that will work in
> Postgresql, MySQL and ORACLE.
> The query will be executed by Java client.
>
> To have this query for Postgresql is priority number one.
>
>
> In this query I try to get a list of message Ids that expired.
>
> time_to_live is in seconds.
>
> SELECT id
>   FROM mq
>  WHERE now - start_date > time_to_live;

An interval is not going to be comparable to an integer without casting.
If you use explicit casting the query probably won't be portable. If there
is some implicit cast that makes the above SQL valid, it is probably a cast
to text which won't do what you want.

>
> I have a following table:
>
> CREATE TABLE mq
> {
> msg_id INTEGER,
> retry_date  TIMESTAMP NOT NULL DEFAULT ('now'::text)::timestamp(1),
> start_date  TIMESTAMP NOT NULL DEFAULT ('now'::text)::timestamp(1),
> time_to_live INTEGER
> }

Can you make time_to_live an interval?

>
> Thanks!
> Mark.
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match