Re: very large tables - Mailing list pgsql-general

From Matthew Brand
Subject Re: very large tables
Date
Msg-id 4180c8810905250217o472c55cdxdb2511cc88316dcd@mail.gmail.com
In response to very large tables  (Ramiro Diaz Trepat <ramiro@diaztrepat.name>)
Responses Re: very large tables  (Laurent Wandrebeck <l.wandrebeck@gmail.com>)
List pgsql-general
You might want to try the CLUSTER command. I had a 40M row table that was
taking ages to access; I tried partitioning it into 12000 sub-tables and
obtained only a modest speed-up. Running CLUSTER on the un-partitioned
table resulted in an enormous speed-up, though.

You will need to choose the column you want to cluster the table on,
probably "moment". CLUSTER means that all of the rows for a given
"moment" are stored in adjacent disk locations and can be read quickly;
see the rough sketch below.
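Something along these lines (untested, just a sketch using the
atm_moment_idx index from your schema below):

    -- physically reorder the table so rows with the same "moment" sit together on disk
    CLUSTER atmospherepoint USING atm_moment_idx;
    -- (on 8.3 and earlier the syntax is: CLUSTER atm_moment_idx ON atmospherepoint;)
    -- refresh planner statistics after the rewrite
    ANALYZE atmospherepoint;

Note that CLUSTER takes an exclusive lock and rewrites the whole table,
but since your data never changes after loading, you only need to run it
once.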

On Mon, May 25, 2009 at 9:58 AM, Ramiro Diaz Trepat
<ramiro@diaztrepat.name> wrote:
> Hello list,
> I will try to make this as brief as possible.
> I have a brother who is a scientist studying atmospheric problems.   He was
> trying to handle all of his data with flat files and MatLab, when I stepped
> in and said, wait, you need an RDBMS to handle all this data.
> So, he basically has 2 very simple tables, one describing an atmosphere
> pixel (latitude, longitude and height) and a couple of other simple values.
>  The other table has values of different variables measured at each pixel.
> The table with the atmosphere pixels currently has about 140MM records, and
> the one with the values about 1000MM records.   They should both grow to
> about twice this size.
> I have not started to deal with the largest table yet; I am only doing tests
> with the smaller one (140MM rows), and they are much slower than what we were
> expecting.
> A simple query with no joins like this:
> select trajectory from atmospherepoint where moment='1979-05-02 11:45:00'
> and height >= 12000 and height <= 14000 and longitude >= 180 and longitude
> <= 190 and latitude >= 0 and latitude <= 10;
> is taking almost 4 minutes on a decent multi-core server.  Also, the moment
> equality test should yield no more than 2MM rows, so I thought that should
> make things a lot faster.
> The explain returns the suspicious "Seq Scan" up front:
> Seq Scan on atmospherepoint  (cost=0.00..5461788.08 rows=134 width=8)
>    Filter: ((moment = '1979-05-02 11:45:00'::timestamp without time zone)
> AND (height >= 12000::double precision) AND (height <= 14000::double
> precision) AND (longitude >= 180::double precision) AND (longitude <=
> 190::double precision) AND (latitude >= 0::double precision) AND (latitude
> <= 10::double precision))
> but I have created indexes on every column in the table and I have also
> run ANALYZE and VACUUM:
>            Table "public.atmospherepoint2"
>    Column   |            Type             | Modifiers
> ------------+-----------------------------+-----------
>  id         | integer                     | not null
>  trajectory | integer                     | not null
>  moment     | timestamp without time zone | not null
>  longitude  | real                        | not null
>  latitude   | real                        | not null
>  height     | real                        | not null
> Indexes:
>     "atmospherepoint2_pkey" PRIMARY KEY, btree (id)
>     "atm_height_idx" btree (height)
>     "atm_latitude_idx" btree (latitude)
>     "atm_longitude_idx" btree (longitude)
>     "atm_moment_idx" btree (moment)
>     "atm_trajectory_idx" btree (trajectory)
> Foreign-key constraints:
>     "atmospherepoint2_trajectory_fkey" FOREIGN KEY (trajectory) REFERENCES
> trajectory2(id)
>
> Any advice or pointers on how to tune these tables to make them faster will
> be most welcome.
> The tables don't change at all once the data has been loaded, they are only
> used for queries.
> Thank you very much.
>
> r.
>
>
