Re: sum of all values - Mailing list pgsql-performance

From Richard Huxton
Subject Re: sum of all values
Date
Msg-id 41E7934E.2080509@archonet.com
Whole thread Raw
In response to sum of all values  (Madison Kelly <linux@alteeve.com>)
Responses Re: sum of all values  (Madison Kelly <linux@alteeve.com>)
List pgsql-performance
Madison Kelly wrote:
> Hi all,
>
>   Is there a fast(er) way to get the sum of all integer values for a
> certain condition over many thousands of rows? What I am currently doing
> is this (which takes ~5-10sec.):

OK, I'm assuming you've configured PG to your satisfaction and this is
the only query giving you problems.

> SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE
> a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND
> a.file_type=b.fs_type AND b.fs_backup='t';

You'll want to run EXPLAIN ANALYSE SELECT SUM... and post the output of
that, although the query looks straightforward enough.

>   Here are the schemas, in case they help:
>
> tle-bu=> \d file_info_1                Table "public.file_info_1"
>      Column      |  Type   |         Modifiers
> -----------------+---------+----------------------------
>  file_acc_time   | bigint  | not null
>  file_group_name | text    | not null
>  file_group_uid  | integer | not null
>  file_mod_time   | bigint  | not null
>  file_name       | text    | not null
>  file_parent_dir | text    | not null
>  file_perm       | text    | not null
>  file_size       | bigint  | not null
>  file_type       | text    | not null default 'f'::text
>  file_user_name  | text    | not null
>  file_user_uid   | integer | not null
> Indexes:
>     "file_info_1_display_idx" btree (file_parent_dir, file_name, file_type)
>     "file_info_1_search_idx" btree (file_parent_dir, file_name, file_type)
>
> tle-bu=> \d file_set_1               Table "public.file_set_1"
>     Column     |  Type   |         Modifiers
> ---------------+---------+----------------------------
>  fs_backup     | boolean | not null default true
>  fs_display    | boolean | not null default false
>  fs_name       | text    | not null
>  fs_parent_dir | text    | not null
>  fs_restore    | boolean | not null default false
>  fs_type       | text    | not null default 'f'::text
> Indexes:
>     "file_set_1_sync_idx" btree (fs_parent_dir, fs_name, fs_type)

1. WHERE ARE YOUR PRIMARY KEYS???
2. Why do you have two identical indexes on file_info_1
3. WHERE ARE YOUR PRIMARY KEYS???
4. Am I right in thinking that always, file_name==fs_name (i.e. they
represent the same piece of information) and if so, why are you storing
it twice? Same for _parent_dir too
5. file_type/fs_type are being held as unbounded text? Not an index into
some lookup table or a varchar(N)?

Can you explain what you're trying to do here - it might be you want to
alter your database design.
--
   Richard Huxton
   Archonet Ltd

pgsql-performance by date:

Previous
From: "Benjamin Wragg"
Date:
Subject: Re: which dual-CPU hardware/OS is fastest for PostgreSQL?
Next
From: sarlav kumar
Date:
Subject: query optimization help