Thread: max time in a table query takes ages
hey folks
I have a simple query over a fairly simple table here, that scans for the max date in a table that's fairly huge (300M rows). There's an index on that field that's being used, but for whatever reason it takes ages. Ideas?
select date_trunc('day', max(data)) into dt from staticstats where processed = false
explain analyze:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=3.89..3.90 rows=1 width=0) (actual time=2558459.883..2558459.884 rows=1 loops=1)
InitPlan
-> Limit (cost=0.00..3.89 rows=1 width=8) (actual time=2558362.751..2558362.753 rows=1 loops=1)
-> Index Scan Backward using sstats_date_idx on staticstats (cost=0.00..1566198296.88 rows=402561795 width=8) (actual time=2558362.747..2558362.747 rows=1 loops=1)
Filter: ((data IS NOT NULL) AND (NOT processed))
Total runtime: 2558540.800 ms
(6 rows)
Time: 2558545.012 ms
one thing I am amazed by is the "data IS NOT NULL" filter - take a look at the schema here:
staty=> \d+ staticstats
Table "public.staticstats"
Column | Type | Modifiers | Description
-----------+--------------------------------+------------------------------------------------------+-------------
data | timestamp(0) without time zone | not null |
size | integer | not null default 0 |
proto | integer | not null |
macfrom | integer | not null |
macto | integer | not null |
processed | boolean | not null default false |
id | bigint | not null default nextval('sstatic_id_seq'::regclass) |
Indexes:
"blah123s" PRIMARY KEY, btree (macto, data, proto, macfrom)
"sstats_id_idx" UNIQUE, btree (id)
"sstats_date_idx" btree (data)
"staticstat_processed_idxs" btree (processed)
Foreign-key constraints:
"staty_fk1s" FOREIGN KEY (macfrom) REFERENCES macs(id)
"staty_fks" FOREIGN KEY (macto) REFERENCES macs(id)
Has OIDs: no
it takes ms if something has been recently added to that table. The table itself is vacuumed/analyzed quite often, and more or less clustered by sstats_date_idx - although in this instance I wasn't able to recluster it, because there's not enough disc space (only 45GB free, and for whatever reason - even though the table is only about 25GB in size - postgresql requires more than 40GB of space to recluster it).
any hints please ?
--
GJ
Hello

try VACUUM and REINDEX

regards
Pavel Stehule

2008/10/23 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
> [...]
if reindex helps (which I will run overnight, and will let you know tomorrow) - I find it quite worrying, because this is 8.3, and I was hoping - with HOT in place, and all these features - that reindexing of that table wouldn't be needed. It is 'only' 375M rows now, but I can definitely feel the pain people with much bigger dbs/tables have, if they have to reindex once in a while.
I also have to think about maybe partitioning that table by quarters, or even months, but that's a subject for a different set of questions - if I run into any problems there.
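A month-based partitioning sketch on 8.3 would use table inheritance plus CHECK constraints (all names below are illustrative, not from the thread):

```sql
-- Hypothetical child partition holding one month of data.
CREATE TABLE staticstats_2008_10 (
    CHECK (data >= '2008-10-01' AND data < '2008-11-01')
) INHERITS (staticstats);

-- Each partition gets its own, much smaller, date index.
CREATE INDEX staticstats_2008_10_date_idx
    ON staticstats_2008_10 (data);

-- With constraint_exclusion enabled, the planner can skip partitions
-- whose CHECK constraint rules out the queried date range.
SET constraint_exclusion = on;
```

Note that on 8.3 rows are not routed to child tables automatically; inserts either target the child directly or go through a trigger/rule on the parent.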
"=?UTF-8?Q?Grzegorz_Ja=C5=9Bkiewicz?=" <gryzman@gmail.com> writes: > I have a simple query over a fairly simple query here, that scans for max > date in a table that's fairly hudge (300M rows). there's index on that field > that's being used, but for whatever reason, it takes ages. Ideas ? > select date_trunc('day', max(data)) into dt from staticstats where > processed = false I suppose the problem is that rows with processed = false are very few in the upper range of data. If so, and if you really need this to go fast, a partial index might be worth its overhead: create index foo on staticstats(data) where processed = false; regards, tom lane
> I suppose the problem is that rows with processed = false are very few
> in the upper range of data. If so, and if you really need this to go
> fast, a partial index might be worth its overhead:
>
> create index foo on staticstats(data) where processed = false;
--
GJ
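For reference, the partial index Tom suggests would be built and checked roughly like this (the index name is illustrative):

```sql
-- Partial index covering only the unprocessed rows; it stays small
-- because processed rows are excluded entirely, so a backward scan
-- reaches the max almost immediately.
CREATE INDEX sstats_unprocessed_date_idx
    ON staticstats (data)
    WHERE processed = false;

-- The original query's WHERE clause matches the index predicate,
-- so the planner can satisfy max(data) from this index:
EXPLAIN ANALYZE
SELECT date_trunc('day', max(data))
FROM staticstats
WHERE processed = false;
```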
> Can postgres use combined indices for queries that would only require part of
> it ?

Even if not, if there is at least one index that reduces the potential
matches to a small set, then scanning those rows against the other criteria
won't take so long. (Assuming good stats and PG choosing a good plan.)

-- 
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice
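As a sketch of the combined-index idea for this particular query (an assumption, not something posted in the thread): leading the index with processed groups the few unprocessed rows together, so a descent over data within that group finds the max cheaply.

```sql
-- Illustrative multicolumn index; the equality condition on the
-- leading column plus max() on the second column can be answered by
-- descending the (processed = false, data) portion of the btree.
CREATE INDEX sstats_processed_date_idx
    ON staticstats (processed, data);
```

The leading-column rule is the key: a btree on (processed, data) is usable for conditions on processed alone, but a query touching only data would have to scan it far less efficiently.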
On Friday 24 October 2008, "Grzegorz Jaśkiewicz" <gryzman@gmail.com> wrote:
> with two sata discs in software raid 1 on linux. And it seems to spend
> loads of time (40-60% sometimes) on waits. I guess this is due to lack of
> aio support in postgresql,

No, it's due to the fact that hard disks are slow; much, much slower than
CPU or memory.

-- 
Alan