Thread: max time in a table query takes ages
hey folks
I have a simple query over a fairly simple table here, that scans for the max date in a table that's fairly huge (300M rows). There's an index on that field that's being used, but for whatever reason it takes ages. Ideas?
select date_trunc('day', max(data)) into dt from staticstats where processed = false
explain analyze:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=3.89..3.90 rows=1 width=0) (actual time=2558459.883..2558459.884 rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..3.89 rows=1 width=8) (actual time=2558362.751..2558362.753 rows=1 loops=1)
-> Index Scan Backward using sstats_date_idx on staticstats (cost=0.00..1566198296.88 rows=402561795 width=8) (actual time=2558362.747..2558362.747 rows=1 loops=1)
Filter: ((data IS NOT NULL) AND (NOT processed))
Total runtime: 2558540.800 ms
(6 rows)
Time: 2558545.012 ms
one thing I am amazed by is the "data IS NOT NULL" filter - take a look at the schema here:
staty=> \d+ staticstats
Table "public.staticstats"
Column | Type | Modifiers | Description
-----------+--------------------------------+------------------------------------------------------+-------------
data | timestamp(0) without time zone | not null |
size | integer | not null default 0 |
proto | integer | not null |
macfrom | integer | not null |
macto | integer | not null |
processed | boolean | not null default false |
id | bigint | not null default nextval('sstatic_id_seq'::regclass) |
Indexes:
"blah123s" PRIMARY KEY, btree (macto, data, proto, macfrom)
"sstats_id_idx" UNIQUE, btree (id)
"sstats_date_idx" btree (data)
"staticstat_processed_idxs" btree (processed)
Foreign-key constraints:
"staty_fk1s" FOREIGN KEY (macfrom) REFERENCES macs(id)
"staty_fks" FOREIGN KEY (macto) REFERENCES macs(id)
Has OIDs: no
it takes ms if something has been recently added to that table. The table itself is vacuumed/analyzed quite often, and more or less clustered by sstats_date_idx - although in this instance I wasn't able to recluster it, because there's not enough disc space (only 45GB free, and for whatever reason - even though the table is only about 25GB in size - postgresql requires more than 40GB of space to recluster it).
any hints please ?
--
GJ
Hello

try VACUUM and REINDEX

regards
Pavel Stehule

2008/10/23 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
> [...]
if reindex helps (which I will run overnight, and will let you know tomorrow) - I find it quite worrying, because this is 8.3, and I was hoping - with HOT in place, and all these features - that reindexing of that table wouldn't be needed. It is 'only' 375M rows now, but I can definitely feel the pain people with much bigger dbs/tables have, if they have to reindex once in a while.
I also have to think about maybe partitioning that table by quarters, or even months, but that's a subject for a different set of questions - if I run into any problems there.
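A month-based partitioning sketch on 8.3 would use table inheritance plus CHECK constraints (all names below are illustrative, not from the thread):

```sql
-- Hypothetical child partition holding one month of data.
CREATE TABLE staticstats_2008_10 (
    CHECK (data >= '2008-10-01' AND data < '2008-11-01')
) INHERITS (staticstats);

-- Each partition gets its own, much smaller, date index.
CREATE INDEX staticstats_2008_10_date_idx
    ON staticstats_2008_10 (data);

-- With constraint_exclusion enabled, the planner can skip partitions
-- whose CHECK constraint rules out the queried date range.
SET constraint_exclusion = on;
```

Note that on 8.3 rows are not routed to child tables automatically; inserts either target the child directly or go through a trigger/rule on the parent.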
"=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?=" <gryzman@gmail.com> writes: > I have a simple query over a fairly simple query here, that scans for max > date in a table that's fairly hudge (300M rows). there's index on that field > that's being used, but for whatever reason, it takes ages. Ideas ? > select date_trunc('day', max(data)) into dt from staticstats where > processed = false I suppose the problem is that rows with processed = false are very few in the upper range of data. If so, and if you really need this to go fast, a partial index might be worth its overhead: create index foo on staticstats(data) where processed = false; regards, tom lane
> I suppose the problem is that rows with processed = false are very few
> in the upper range of data. If so, and if you really need this to go
> fast, a partial index might be worth its overhead:
>
> create index foo on staticstats(data) where processed = false;
--
GJ
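For reference, the partial index Tom suggests would be built and checked roughly like this (the index name is illustrative):

```sql
-- Partial index covering only the unprocessed rows; it stays small
-- because processed rows are excluded entirely, so a backward scan
-- reaches the max almost immediately.
CREATE INDEX sstats_unprocessed_date_idx
    ON staticstats (data)
    WHERE processed = false;

-- The original query's WHERE clause matches the index predicate,
-- so the planner can satisfy max(data) from this index:
EXPLAIN ANALYZE
SELECT date_trunc('day', max(data))
FROM staticstats
WHERE processed = false;
```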
> Can postgres use combined indices for queries that would only require part of
> it ?

Even if not, if there is at least one index that reduces the potential
matches to a small set, then scanning those rows against the other criteria
won't take so long. (Assuming good stats and PG choosing a good plan.)

-- 
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice
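As a sketch of the combined-index idea for this particular query (an assumption, not something posted in the thread): leading the index with processed groups the few unprocessed rows together, so a descent over data within that group finds the max cheaply.

```sql
-- Illustrative multicolumn index; the equality condition on the
-- leading column plus max() on the second column can be answered by
-- descending the (processed = false, data) portion of the btree.
CREATE INDEX sstats_processed_date_idx
    ON staticstats (processed, data);
```

The leading-column rule is the key: a btree on (processed, data) is usable for conditions on processed alone, but a query touching only data would have to scan it far less efficiently.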
On Friday 24 October 2008, "Grzegorz Jaśkiewicz" <gryzman@gmail.com> wrote:
> with two sata discs in software raid 1 on linux. And it seems to spend
> loads of time (40-60% sometimes) on waits. I guess this is due to lack of
> aio support in postgresql,

No, it's due to the fact that hard disks are slow; much, much slower than
CPU or memory.

-- 
Alan