Home > mailing lists

Re: Optimize date query for large child tables: GiST or GIN? - Mailing list pgsql-performance

From	Stephen Frost
Subject	Re: Optimize date query for large child tables: GiST or GIN?
Date	May 20, 2010 21:19:28
Msg-id	20100520211919.GP21875@tamriel.snowman.net Whole thread Raw
In response to	Optimize date query for large child tables: GiST or GIN? (David Jarvis <thangalin@gmail.com>)
Responses	Re: Optimize date query for large child tables: GiST or GIN? (David Jarvis <thangalin@gmail.com>)
List	pgsql-performance

Tree view

* David Jarvis (thangalin@gmail.com) wrote:
> There are 72 child tables, each having a year index and a station index,
> which are defined as follows:

Soooo, my thoughts:

Partition by something that makes sense...  Typically, I'd say that you
would do it by the category id and when the measurement was taken.  Then
set up the appropriate check constraints on that so that PG can use
constraint_exclusion to identify what table it needs to actually go look
in.  How much data are we talking about, by the way? (# of rows)  If
you're not in the milions, partitioning at all is probably overkill and
might be part of the problem here..

create table climate.measurement_12_013 (
    id bigint not null DEFAULT nextval('climate.measurement_id_seq'::regclass),
    station_id integer not null,
    taken date not null,
    amount numeric(8,2) not null,
    category_id integer not null,
    flag varchar(1) not null default ' ',
    check (category_id = 7),
    check (taken >= '1913-12-01' and taken <= '1913-12-31')
    )
    inherits (climate.measurement);

    CREATE INDEX measurement_12_013_s_idx
      ON climate.measurement_12_013
      USING btree
      (station_id);

    CREATE INDEX measurement_12_013_d_idx
      ON climate.measurement_12_013
      USING btree
      (taken);

    SELECT
      count(1) AS measurements,
      avg(m.amount) AS amount
    FROM
      climate.measurement m
    WHERE
      m.station_id IN (
        SELECT
          s.id
        FROM
          climate.station s,
          climate.city c
        WHERE
            /* For one city... */
            c.id = 5182 AND

            /* Where stations are within an elevation range... */
            s.elevation BETWEEN 0 AND 3000 AND

            /* and within a specific radius... */
            -- Seriously, you should be using PostGIS here, that can
            -- then use a GIST index to do this alot faster with a
            -- bounding box...
            6371.009 * SQRT(
              POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
                (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
                  POW(RADIANS(c.longitude_decimal - s.longitude_decimal),
2))
            ) <= 50
        ) AND

      /* Data before 1900 is shaky; insufficient after 2009. */
      -- I have no idea why this is here..  Aren't you forcing
      -- this already in your application code that's checking
      -- user input values?  Also, do you actually *have* any
      -- data outside this range?  If so, just pull out the
      -- tables with that data from the inheiritance
      -- m.taken >= '1900-01-01' AND m.taken <= '2009-12-31'
      -- extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND

      /* Whittled down by category... */
      m.category_id = 1 AND

      /* Between the selected days and years... */
       CASE
         WHEN (user_start_year || user_start_day <= user_stop_year || user_stop) THEN
         m.taken BETWEEN user_start_year || user_start_day  AND user_stop_year || user_stop
         WHEN (user_start_year || user_start_day > user_stop_year || user_stop) THEN
         m.taken BETWEEN (user_start_year || user_start_day)::date  AND
         ((user_stop_year || user_stop)::date + '1
         year'::interval)::date
    -- I don't think you need/want this..?
    -- GROUP BY
    --  extract( YEAR FROM m.taken )

        Enjoy,

            Stephen

Attachment

signature.asc

pgsql-performance by date:

From: Yeb Havinga
Date: 20 May 2010, 21:08:28
Subject: Re: Optimize date query for large child tables: GiST or GIN?

From: Stephen Frost
Date: 20 May 2010, 21:30:37
Subject: Re: Optimize date query for large child tables: GiST or GIN?

Re: Optimize date query for large child tables: GiST or GIN? - Mailing list pgsql-performance

Attachment

Previous

Next