Re: Optimize date query for large child tables: GiST or GIN? - Mailing list pgsql-performance

From Stephen Frost
Subject Re: Optimize date query for large child tables: GiST or GIN?
Date
Msg-id 20100520211919.GP21875@tamriel.snowman.net
Whole thread Raw
In response to Optimize date query for large child tables: GiST or GIN?  (David Jarvis <thangalin@gmail.com>)
Responses Re: Optimize date query for large child tables: GiST or GIN?  (David Jarvis <thangalin@gmail.com>)
List pgsql-performance
* David Jarvis (thangalin@gmail.com) wrote:
> There are 72 child tables, each having a year index and a station index,
> which are defined as follows:

Soooo, my thoughts:

Partition by something that makes sense...  Typically, I'd say that you
would do it by the category id and when the measurement was taken.  Then
set up the appropriate check constraints on that so that PG can use
constraint_exclusion to identify what table it needs to actually go look
in.  How much data are we talking about, by the way? (# of rows)  If
you're not in the milions, partitioning at all is probably overkill and
might be part of the problem here..

create table climate.measurement_12_013 (
    id bigint not null DEFAULT nextval('climate.measurement_id_seq'::regclass),
    station_id integer not null,
    taken date not null,
    amount numeric(8,2) not null,
    category_id integer not null,
    flag varchar(1) not null default ' ',
    check (category_id = 7),
    check (taken >= '1913-12-01' and taken <= '1913-12-31')
    )
    inherits (climate.measurement);

    CREATE INDEX measurement_12_013_s_idx
      ON climate.measurement_12_013
      USING btree
      (station_id);

    CREATE INDEX measurement_12_013_d_idx
      ON climate.measurement_12_013
      USING btree
      (taken);

    SELECT
      count(1) AS measurements,
      avg(m.amount) AS amount
    FROM
      climate.measurement m
    WHERE
      m.station_id IN (
        SELECT
          s.id
        FROM
          climate.station s,
          climate.city c
        WHERE
            /* For one city... */
            c.id = 5182 AND

            /* Where stations are within an elevation range... */
            s.elevation BETWEEN 0 AND 3000 AND

            /* and within a specific radius... */
            -- Seriously, you should be using PostGIS here, that can
            -- then use a GIST index to do this alot faster with a
            -- bounding box...
            6371.009 * SQRT(
              POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
                (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
                  POW(RADIANS(c.longitude_decimal - s.longitude_decimal),
2))
            ) <= 50
        ) AND

      /* Data before 1900 is shaky; insufficient after 2009. */
      -- I have no idea why this is here..  Aren't you forcing
      -- this already in your application code that's checking
      -- user input values?  Also, do you actually *have* any
      -- data outside this range?  If so, just pull out the
      -- tables with that data from the inheiritance
      -- m.taken >= '1900-01-01' AND m.taken <= '2009-12-31'
      -- extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND

      /* Whittled down by category... */
      m.category_id = 1 AND

      /* Between the selected days and years... */
       CASE
         WHEN (user_start_year || user_start_day <= user_stop_year || user_stop) THEN
         m.taken BETWEEN user_start_year || user_start_day  AND user_stop_year || user_stop
         WHEN (user_start_year || user_start_day > user_stop_year || user_stop) THEN
         m.taken BETWEEN (user_start_year || user_start_day)::date  AND
         ((user_stop_year || user_stop)::date + '1
         year'::interval)::date
    -- I don't think you need/want this..?
    -- GROUP BY
    --  extract( YEAR FROM m.taken )

        Enjoy,

            Stephen

Attachment

pgsql-performance by date:

Previous
From: Yeb Havinga
Date:
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Next
From: Stephen Frost
Date:
Subject: Re: Optimize date query for large child tables: GiST or GIN?