Thread:

From

"Scott Morrison"

Date:

06 February 2003, 17:03:12

I would like to know if there is a more efficient way to perform a query.

I have a table (see below) with an ID, a date, and a value.  The date
specifies when the entry was added.  I want to query the table to determine
what the value was on a specific day.

I have the following query, which works quite well but it is very slow (the
real table I'm using has 200000 records)... I'm sure the slowness is due to
the subselect... does anybody know any way to make this query faster?

Thanks in advance,
Scott

My current query:
select a.* from sample a where (id,date) in (select a.id,max(date) from
sample where date<='<<the date>>' and id=a.id) order by id;

So with the data below, filling in '2003-02-01' for <<the date>> gives:
id |    date    | value
----+------------+-------
1 | 2003-02-01 |    12
2 | 2003-02-01 |     9
3 | 2003-02-01 |     2
4 | 2003-02-01 |    11

And filling in '2003-02-04' for <<the date>> gives:
id |    date    | value
----+------------+-------
1 | 2003-02-04 |    21
2 | 2003-02-01 |     9
3 | 2003-02-01 |     2
4 | 2003-02-03 |    12

Here is the table layout and the sample data I'm using:

       Table "sample"
Column |  Type   | Modifiers
--------+---------+-----------
id     | integer | not null
date   | date    | not null
value  | integer | not null
Primary key: sample_pkey

id |    date    | value
----+------------+-------
1 | 2003-02-01 |    12
1 | 2003-02-02 |    16
1 | 2003-02-04 |    21
2 | 2003-02-01 |     9
3 | 2003-02-01 |     2
4 | 2003-02-01 |    11
4 | 2003-02-03 |    12
(7 rows)

Re:

From

Doug Silver

Date:

06 February 2003, 18:03:36

On Tuesday 04 February 2003 03:35 pm, Scott Morrison wrote:
> I would like to know if there is a more efficient way to perform a query.
>
> I have a table (see below) with an ID, a date, and a value.  The date
> specifies when the entry was added.  I want to query the table to determine
> what the value was on a specific day.
>
> I have the following query, which works quite well but it is very slow (the
> real table I'm using has 200000 records)... I'm sure the slowness is due to
> the subselect... does anybody know any way to make this query faster?
>
> Thanks in advance,
> Scott
>
> My current query:
> select a.* from sample a where (id,date) in (select a.id,max(date) from
> sample where date<='<<the date>>' and id=a.id) order by id;
>
> So with the data below, filling in '2003-02-01' for <<the date>> gives:
> id |    date    | value
> ----+------------+-------
> 1 | 2003-02-01 |    12
> 2 | 2003-02-01 |     9
> 3 | 2003-02-01 |     2
> 4 | 2003-02-01 |    11
>
> And filling in '2003-02-04' for <<the date>> gives:
> id |    date    | value
> ----+------------+-------
> 1 | 2003-02-04 |    21
> 2 | 2003-02-01 |     9
> 3 | 2003-02-01 |     2
> 4 | 2003-02-03 |    12
>
> Here is the table layout and the sample data I'm using:
>
>        Table "sample"
> Column |  Type   | Modifiers
> --------+---------+-----------
> id     | integer | not null
> date   | date    | not null
> value  | integer | not null
> Primary key: sample_pkey
>
> id |    date    | value
> ----+------------+-------
> 1 | 2003-02-01 |    12
> 1 | 2003-02-02 |    16
> 1 | 2003-02-04 |    21
> 2 | 2003-02-01 |     9
> 3 | 2003-02-01 |     2
> 4 | 2003-02-01 |    11
> 4 | 2003-02-03 |    12
> (7 rows)
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

First, you didn't say if you've indexed the date field, which you should do if
you're searching on that field more than once.  That will certainly help out
with the speed.  Also, that query seems a bit overdone, how about this:

select * from sample where date<='<<the date>>' order by id,date;

You're only getting information from a single table, so I don't think the
subselect was necessary, though I might be missing something since the coffee
has worn off ;)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Doug Silver
Urchin Software Corp.    http://www.urchin.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re:

From

Tom Lane

Date:

06 February 2003, 18:44:14

"Scott Morrison" <smorrison@navtechinc.com> writes:
> I would like to know if there is a more efficient way to perform a query.
> I have a table (see below) with an ID, a date, and a value.  The date
> specifies when the entry was added.  I want to query the table to determine
> what the value was on a specific day.

Try something like

    SELECT ... FROM table
    WHERE date <= 'target date'
    ORDER BY date DESC
    LIMIT 1;

Given an index on the date column, this should be very fast.

If you need additional constraints (like the 'id' in your example),
you can try

    SELECT ... FROM table
    WHERE id = 'target id' AND date <= 'target date'
    ORDER BY id DESC, date DESC
    LIMIT 1;

where now you need an index declared on (id, date)  (in that order).
The basic idea is to make sure that the row you want is the first one
reached when scanning from a given boundary in the index.

            regards, tom lane

Re:

From

"Scott Morrison"

Date:

06 February 2003, 20:38:40

Thanks to everybody for your replies.  For some reason that has went through
as two different threads.

Anyways, I do have indices on both the date and id fields.

The queries suggested by Tom Lane and Doug Silver do not do exactly what I
want to do... but I believe that's because I did not form what it was
correctly.

I want a query which lists all of the ids from the table with the date which
is closest to (but not past) a given date.

For example, the statement
select * from sample where date<='2003-02-04' order by id,date;
will return:
 id |    date    | value
----+------------+-------
  1 | 2003-02-01 |    12
  1 | 2003-02-02 |    16
  1 | 2003-02-04 |    21 **
  2 | 2003-02-01 |     9 **
  3 | 2003-02-01 |     2 **
  4 | 2003-02-01 |    11
  4 | 2003-02-03 |    12 **

whereas I only want the fields marked with ** (the most recent date on or
before the one specified in the query)

Greg Sabino Mullane (in the other thread) suggested the following alternate,
but it ran slightly slower on my machine than the original.

SELECT id, date, value FROM sample a WHERE date <='2003-01-01' AND oid =
(SELECT oid FROM sample WHERE id = a.id AND DATE <='2003-01-01' ORDER BY
date DESC LIMIT 1)
ORDER BY id;

Thanks,
Scott




-----Original Message-----
From: pgsql-novice-owner@postgresql.org
[mailto:pgsql-novice-owner@postgresql.org]On Behalf Of Doug Silver
Sent: Thursday, February 06, 2003 3:03 PM
To: Scott Morrison; pgsql-novice@postgresql.org
Subject: Re: [NOVICE]


On Tuesday 04 February 2003 03:35 pm, Scott Morrison wrote:
> I would like to know if there is a more efficient way to perform a query.
>
> I have a table (see below) with an ID, a date, and a value.  The date
> specifies when the entry was added.  I want to query the table to
determine
> what the value was on a specific day.
>
> I have the following query, which works quite well but it is very slow
(the
> real table I'm using has 200000 records)... I'm sure the slowness is due
to
> the subselect... does anybody know any way to make this query faster?
>
> Thanks in advance,
> Scott
>
> My current query:
> select a.* from sample a where (id,date) in (select a.id,max(date) from
> sample where date<='<<the date>>' and id=a.id) order by id;
>
> So with the data below, filling in '2003-02-01' for <<the date>> gives:
> id |    date    | value
> ----+------------+-------
> 1 | 2003-02-01 |    12
> 2 | 2003-02-01 |     9
> 3 | 2003-02-01 |     2
> 4 | 2003-02-01 |    11
>
> And filling in '2003-02-04' for <<the date>> gives:
> id |    date    | value
> ----+------------+-------
> 1 | 2003-02-04 |    21
> 2 | 2003-02-01 |     9
> 3 | 2003-02-01 |     2
> 4 | 2003-02-03 |    12
>
> Here is the table layout and the sample data I'm using:
>
>        Table "sample"
> Column |  Type   | Modifiers
> --------+---------+-----------
> id     | integer | not null
> date   | date    | not null
> value  | integer | not null
> Primary key: sample_pkey
>
> id |    date    | value
> ----+------------+-------
> 1 | 2003-02-01 |    12
> 1 | 2003-02-02 |    16
> 1 | 2003-02-04 |    21
> 2 | 2003-02-01 |     9
> 3 | 2003-02-01 |     2
> 4 | 2003-02-01 |    11
> 4 | 2003-02-03 |    12
> (7 rows)
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

First, you didn't say if you've indexed the date field, which you should do
if
you're searching on that field more than once.  That will certainly help out
with the speed.  Also, that query seems a bit overdone, how about this:

select * from sample where date<='<<the date>>' order by id,date;

You're only getting information from a single table, so I don't think the
subselect was necessary, though I might be missing something since the
coffee
has worn off ;)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Doug Silver
Urchin Software Corp.    http://www.urchin.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Re:

From

greg@turnstep.com

Date:

07 February 2003, 13:53:19

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Greg Sabino Mullane (in the other thread) suggested the following alternate,
> but it ran slightly slower on my machine than the original.
>
> SELECT id, date, value FROM sample a WHERE date <='2003-01-01' AND oid =
> (SELECT oid FROM sample WHERE id = a.id AND DATE <='2003-01-01' ORDER BY
> date DESC LIMIT 1)
> ORDER BY id;

You should check this again. There is no way that my query is slower, as
long as you have an index on the "date" column and have run an ANALYZE on
the "sample" table. Try running these:

ANALYZE sample;
CREATE INDEX sample_date on sample(date);
EXPLAIN <the query above>;

...and make sure that it is hitting the sample_date index. I also just
realized that if you have pre-existing indexes, they may be slowing my
query down (e.g. one on "date" *and* "id"). Make sure that the explain
above only uses the sample_date index.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302071047

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+Q9VdvJuQZxSWSsgRAjHVAJwP6Qw4L+gOpfCC6yYkgIgJjzkcZQCggZ1y
2ckvZhgeUpQYOByPxHNKOmk=
=ix4n
-----END PGP SIGNATURE-----

Re:

From

"Chad Thompson"

Date:

07 February 2003, 17:14:27


> in one place you are counting(*), in another you are counting something
else -count(distinct(l.full_phone).
>
> Not surprising they are different?
>
>
> Ken
>
>


Thanks Ken.
I realized that as well just after I hit send. ;-)

Thanks
Chad

Re:

From

Andrew McMillan

Date:

08 February 2003, 06:35:43

On Sat, 2003-02-08 at 04:48, greg@turnstep.com wrote:
> > Greg Sabino Mullane (in the other thread) suggested the following alternate,
> > but it ran slightly slower on my machine than the original.
> >
> > SELECT id, date, value FROM sample a WHERE date <='2003-01-01' AND oid =
> > (SELECT oid FROM sample WHERE id = a.id AND DATE <='2003-01-01' ORDER BY
> > date DESC LIMIT 1)
> > ORDER BY id;
>
> You should check this again. There is no way that my query is slower, as
> long as you have an index on the "date" column and have run an ANALYZE on
> the "sample" table.

Wouldn't there need to be an index on OID for your query to be fast?

Cheers,
                    Andrew.
--
---------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/         PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201     MOB: +64(21)635-694    OFFICE: +64(4)499-2267
           Survey for nothing with http://survey.net.nz/
---------------------------------------------------------------------

lock timeout in pgsql 7.3.1

From

Muhammad Shariq Muzaffar

Date:

08 February 2003, 08:20:16

I am having a little problem with STATEMENT_TIMEOUT, i
dont know why this is happening....

there are 2 users, first user have selected a record
using FOR UPDATE.

second user have selected 5 records using FOR UPDATE.
but when the second user tries to select the record
which is locked by the first user. Pgsql gives the
error "ERROR: query was cancelled." and rollbacks the
transaction which unlocks all the records locked by
the second user.

Is there any way that pgsql only gives the error and
keep the status of the transction as it is?

thanx in advance.


__________________________________________________
Do You Yahoo!?
Everything you'll ever need on one web page
from News and Sport to Email and Music Charts
http://uk.my.yahoo.com

Re:

From

greg@turnstep.com

Date:

10 February 2003, 17:10:02

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Wouldn't there need to be an index on OID for your query to be fast?

No, the single index on the "date" field is enough. See the EXPLAIN
output from my first post in the original thread:

http://archives.postgresql.org/pgsql-novice/2003-02/msg00075.php

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302101404
-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+R/jEvJuQZxSWSsgRAsaYAKDOzKuujdaqkvImutA0nl1/YqVNvgCg+Qk2
lNTYcT/cGyoFKZzd3bVqef4=
=n1Iq
-----END PGP SIGNATURE-----

Re:

From

Andrew McMillan

Date:

10 February 2003, 17:28:47

On Tue, 2003-02-11 at 08:10, greg@turnstep.com wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> > Wouldn't there need to be an index on OID for your query to be fast?
>
> No, the single index on the "date" field is enough. See the EXPLAIN
> output from my first post in the original thread:

However the original poster suggested there would ultimately be 200,000
or more rows in the table.

Cheers,
                    Andrew.
>
> http://archives.postgresql.org/pgsql-novice/2003-02/msg00075.php
>
> - --
> Greg Sabino Mullane greg@turnstep.com
> PGP Key: 0x14964AC8 200302101404
> -----BEGIN PGP SIGNATURE-----
> Comment: http://www.turnstep.com/pgp.html
>
> iD8DBQE+R/jEvJuQZxSWSsgRAsaYAKDOzKuujdaqkvImutA0nl1/YqVNvgCg+Qk2
> lNTYcT/cGyoFKZzd3bVqef4=
> =n1Iq
> -----END PGP SIGNATURE-----
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
--
---------------------------------------------------------------------
Andrew @ Catalyst .Net.NZ Ltd, PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/         PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201     MOB: +64(21)635-694    OFFICE: +64(4)499-2267
           Survey for nothing with http://survey.net.nz/
---------------------------------------------------------------------

Re:

From

greg@turnstep.com

Date:

10 February 2003, 17:49:52

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>>> Wouldn't there need to be an index on OID for your query to be fast?
>>>
>> No, the single index on the "date" field is enough. See the EXPLAIN
>> output from my first post in the original thread:
>>
> However the original poster suggested there would ultimately be 200,000
> or more rows in the table.

Yes, and my EXPLAIN is on a table with 200,000 rows in it. Hence the fact
that it took over 5 seconds - which is still very fast compared to the
original query.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302101447

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+SAHxvJuQZxSWSsgRAkxTAJ9TnEAjypMGbv8ZRn55yqe/8AZyQgCcCUPm
/dU6kHroxm1XpC0lUrvdY+Y=
=GwmT
-----END PGP SIGNATURE-----

Re:

From

"Scott Morrison"

Date:

10 February 2003, 20:50:13

Well I ran the queries through some more tests and the original query is
still faster than any of the suggested queries.

Query 1 (Original):
ANALYZE sample;
CREATE INDEX sample_id ON sample(id);
CREATE INDEX sample_date ON sample(date);
ANALYZE sample;
EXPLAIN ANALYZE
SELECT * FROM sample a WHERE (id,date) IN
    (SELECT id,max(date) FROM sample
     WHERE id=a.id AND date<='2003-02-07'
     GROUP BY id);
DROP INDEX sample_id;
DROP INDEX sample_date;

Query 2 (Greg):
ANALYZE sample;
CREATE INDEX sample_id ON sample(id);
CREATE INDEX sample_date ON sample(date);
ANALYZE sample;
EXPLAIN ANALYZE
SELECT * FROM sample a WHERE date<='2003-02-07' AND oid=
    (SELECT oid FROM sample WHERE id=a.id AND
    date<='2003-02-07' ORDER BY date DESC LIMIT 1);
DROP INDEX sample_id;
DROP INDEX sample_date;

As for Greg's query, I tried it without an id index and it took a long long
time (I broke it after about 5 minutes, so that's why the id index is added
in there).  I also took out the ORDER BY id because that slowed it down by
another 3 seconds or so, and I don't need it for what I'm querying out.
Below are the analysis reports for both queries.

Query 1 Analysis (Average time 21779.98ms):
Seq Scan on sample a  (cost=0.00..13702.91 rows=3126 width=12) (actual
time=58.67..21770.83 rows=99999 loops=1)
  SubPlan
    ->  Aggregate  (cost=0.00..3.74 rows=1 width=8) (actual time=0.11..0.11
rows=1 loops=180566)
          ->  Group  (cost=0.00..3.74 rows=1 width=8) (actual
time=0.06..0.10 rows=2 loops=180566)
                ->  Index Scan using sample_id on sample  (cost=0.00..3.73
rows=1 width=8) (actual time=0.05..0.08 rows=2 loops=180566)

Query 2 Analysis (Average time 28358.92ms):
Index Scan using sample_date on sample a  (cost=0.00..24294.52 rows=1
width=12) (actual time=0.42..27831.20 rows=99999 loops=1)
  SubPlan
    ->  Limit  (cost=3.84..3.84 rows=1 width=8) (actual time=0.15..0.16
rows=1 loops=160446)
          ->  Sort  (cost=3.84..3.84 rows=1 width=8) (actual time=0.15..0.15
rows=2 loops=160446)
                ->  Index Scan using sample_id on sample  (cost=0.00..3.83
rows=1 width=8) (actual time=0.05..0.07 rows=2 loops=160446)

So in conclusion, it doesn't look like I'll be able to get the query any
faster than my original query.  I was really hoping for something that would
run in 5-10% of the time, but I guess that's not possible.

Thanks for the suggestions.
-- Scott

-----Original Message-----
From: pgsql-novice-owner@postgresql.org
[mailto:pgsql-novice-owner@postgresql.org]On Behalf Of greg@turnstep.com
Sent: Monday, February 10, 2003 2:50 PM
To: pgsql-novice@postgresql.org
Subject: Re: [NOVICE]



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


>>> Wouldn't there need to be an index on OID for your query to be fast?
>>>
>> No, the single index on the "date" field is enough. See the EXPLAIN
>> output from my first post in the original thread:
>>
> However the original poster suggested there would ultimately be 200,000
> or more rows in the table.

Yes, and my EXPLAIN is on a table with 200,000 rows in it. Hence the fact
that it took over 5 seconds - which is still very fast compared to the
original query.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302101447

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+SAHxvJuQZxSWSsgRAkxTAJ9TnEAjypMGbv8ZRn55yqe/8AZyQgCcCUPm
/dU6kHroxm1XpC0lUrvdY+Y=
=GwmT
-----END PGP SIGNATURE-----



---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Re:

From

Tom Lane

Date:

10 February 2003, 22:41:52

"Scott Morrison" <smorrison@navtechinc.com> writes:
> Query 1 (Original):
> ANALYZE sample;
> CREATE INDEX sample_id ON sample(id);
> CREATE INDEX sample_date ON sample(date);
> ANALYZE sample;
> EXPLAIN ANALYZE
> SELECT * FROM sample a WHERE (id,date) IN
>     (SELECT id,max(date) FROM sample
>      WHERE id=a.id AND date<='2003-02-07'
>      GROUP BY id);

It's got to be possible to improve on that.  The big problem is the IN,
which is a very inefficient construct (and the upcoming 7.4 improvements
won't help any, because they only address cases where the subselect is
uncorrelated, ie, doesn't use variables from the outer query).  But it's
easy to get rid of the IN.  First off, because the sub-select contains
WHERE id = a.id, there is no need to return id from the sub-SELECT; it
must be equal to the outer a.id, no?

SELECT * FROM sample a WHERE date IN
    (SELECT max(date) FROM sample
     WHERE id=a.id AND date<='2003-02-07'
     GROUP BY id);

For the same reason, we do not need GROUP BY in the sub-select, because
only one group can be formed.  This gives us

SELECT * FROM sample a WHERE date IN
    (SELECT max(date) FROM sample
     WHERE id=a.id AND date<='2003-02-07');

and now we can see that the sub-select returns exactly one row, which
means we can reduce IN to "="

SELECT * FROM sample a WHERE date =
    (SELECT max(date) FROM sample
     WHERE id=a.id AND date<='2003-02-07');

and now the sub-select is amenable to the LIMIT-and-index trick for
implementing MAX() quickly:

CREATE INDEX fooi on sample (id, date);

SELECT * FROM sample a WHERE date =
    (SELECT date FROM sample
     WHERE id=a.id AND date<='2003-02-07'
     ORDER BY id DESC, date DESC LIMIT 1);

Now this is a pretty decent subplan --- if you try it you should see
a plan like

 Seq Scan on sample a  (cost=0.00..4330.81 rows=5 width=8)
   Filter: (date = (subplan))
   SubPlan
     ->  Limit  (cost=0.00..4.31 rows=1 width=8)
           ->  Index Scan Backward using fooi on sample  (cost=0.00..7.18 rows=2 width=8)
                 Index Cond: ((id = $0) AND (date <= '2003-02-07'::date))

and there isn't any way to make it quicker: the index is taking us
directly to the single value that we want.  But the overall performance
will still suck, because we are invoking the subplan once for every row
of the table.  And here's where you need to consider rearranging your
schema.  The only reason we need to scan all of the 'sample' table is that
we are using the table rows as the source of potential values of 'id' to
plug into the subquery.  Are you willing to make a side table showing
the allowed values of 'id'?  (You could enforce that it's correct with a
foreign key constraint on 'sample'.)  If you are, then this would work:

SELECT a.* FROM sample a, possible_ids p WHERE p.id = a.id
AND date =
    (SELECT date FROM sample
         WHERE id = p.id AND date <= '2003-02-07'
         ORDER BY id DESC, date DESC LIMIT 1);

Experimenting, I get a plan like

 Nested Loop  (cost=0.00..13477.14 rows=25 width=12)
   ->  Seq Scan on possible_ids p  (cost=0.00..20.00 rows=1000 width=4)
   ->  Index Scan using fooi on sample a  (cost=0.00..9.13 rows=1 width=8)
         Index Cond: (("outer".id = a.id) AND (a.date = (subplan)))
         SubPlan
           ->  Limit  (cost=0.00..4.31 rows=1 width=8)
                 ->  Index Scan Backward using fooi on sample  (cost=0.00..7.18 rows=2 width=8)
                       Index Cond: ((id = $0) AND (date <= '2003-02-07'::date))
           ->  Limit  (cost=0.00..4.31 rows=1 width=8)
                 ->  Index Scan Backward using fooi on sample  (cost=0.00..7.18 rows=2 width=8)
                       Index Cond: ((id = $0) AND (date <= '2003-02-07'::date))

which looks pretty promising.  This should require just one subplan
execution (one index probe) followed by an index probe into the 'sample'
table, for each possible 'id' value.  Assuming you have many fewer
distinct 'id' values than you do sample rows, this should win big.

(Note: the reason we see the subplan listed twice is an arcane
implementation detail; in practice only one of the copies will get
executed in this query.)

            regards, tom lane

Re:

From

greg@turnstep.com

Date:

11 February 2003, 15:50:02

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> Well I ran the queries through some more tests and the original query is
> still faster than any of the suggested queries.

I finally have it figured out - it all depends on the distribution of the
data. I was assuming that there were relatively few unique ids in each
of the 200,000 rows. When this is true, my query runs fast. When there
are few duplicate ids, and the number approaches the total number of
rows, the original query (and Tom's rewrite) works much better. I was
getting the opposite results as you: my query was fast, and yours
would basically time out. When I rearranged the data to provide for
more unique ids, this was reversed. Ohwell :)


One minor tweak's to Tom's final query:

EXPLAIN ANALYZE
SELECT * FROM sample a WHERE date =
    (SELECT date FROM sample
     WHERE id=a.id AND date<='2003-02-07'
     ORDER BY id DESC, date DESC LIMIT 1);

If you know that you are grabbing a certain percentage of the
data, you can add another WHERE clause to speed things up:

EXPLAIN ANALYZE
SELECT * FROM sample a WHERE
date <='2003-02-07' AND date =
 (SELECT date FROM sample
  WHERE id=a.id AND date<='2003-02-07'
  ORDER BY id DESC, date DESC LIMIT 1);

This hits the sample_date index instead of doing a sequential
scan of sample: and is probably faster for more cases. In my tests
the two even out only when we start grabbing most of the rows: all
dates earlier than 02-25, with a even sitribution of dates throughout 2/2003.

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302111237

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+STTuvJuQZxSWSsgRAgcyAJ4zuW6G/j2cvvjaPynKRyV7rsih6ACeNpkl
UXerayY3r02qttNs6tTUMiw=
=dlwt
-----END PGP SIGNATURE-----