Thread: Full table scan: 300 million rows

Full table scan: 300 million rows

From

David Jarvis

Date:

14 May 2010, 22:48:02

Hi,

I have the following query:

Select avg(d.amount) AS amount, y.year
From year_ref y
    Join month_ref m
        On m.year_ref_id = y.id
    Join daily d
        On d.month_ref_id = m.id
Where y.year Between 1980 And 2000
    And m.month = 12
    And m.category_id = '001'
    And d.daily_flag_id <> 'M'
    And exists   (
                Select 1
                From station s1
                    inner Join city c1
                        On c1.id = 10663
                Where 6371.009
                      * SQRT(
                            POW(RADIANS(c1.latitude_decimal - s1.latitude_decimal), 2)
                            + (
                                COS(RADIANS(c1.latitude_decimal + s1.latitude_decimal) / 2)
                                * POW(RADIANS(c1.longitude_decimal - s1.longitude_decimal), 2)
                                )
                            ) <= 50
                    And S1.station_district_id = y.station_district_id
                )
Group By y.year

I posted the explain plan as an image (can't figure out how to export it from the pgadmin3):

http://i.imgur.com/m6YIV.png

I have yet to let this query finish.

Any ideas how I can speed it up?

Thank you!

Re: Full table scan: 300 million rows

From

Andreas Kretschmer

Date:

15 May 2010, 04:22:41

David Jarvis <thangalin@gmail.com> wrote:

> Hi,
>
> I have the following query:
>
> Select  avg(d.amount) AS amount,  y.year
> From year_ref y
>     Join month_ref m
>         On m.year_ref_id = y.id
>     Join daily d
>         On d.month_ref_id = m.id
> Where y.year Between 1980 And 2000
>     And m.month = 12
>     And m.category_id = '001'
>     And d.daily_flag_id <> 'M'
>     And exists   (

I think, you have a bad table design: you have splitted a date into
year, month and day, and stored that in different tables.

If i were you, i would use a regular date-field and indexes like:

test=# create table d (d date);
CREATE TABLE
test=*# create index idx_d_year on d(extract (year from d));
CREATE INDEX
test=*# create index idx_d_month on d(extract (month from d));
CREATE INDEX

Your query with this structure:

select ... from table where
  extract(year from d) between 1980 And 2000
  and extract(month from d) = 12
  and daily_flag_id ...

This can use the indexes and avoid the seq-scan. You can also use
table-partitioning and constraint exclusion (for instance, one table per
month)


> http://i.imgur.com/m6YIV.png
>
> I have yet to let this query finish.
>
> Any ideas how I can speed it up?

Do you have an index on daily.daily_flag_id ?


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°