Re: AW: 7.0.2 issues / Geocrawler - Mailing list pgsql-hackers

From Tom Lane
Subject Re: AW: 7.0.2 issues / Geocrawler
Date
Msg-id 24447.963428496@sss.pgh.pa.us
Whole thread Raw
In response to Re: AW: 7.0.2 issues / Geocrawler  (Tim Perdue <tperdue@valinux.com>)
List pgsql-hackers
Tim Perdue <tperdue@valinux.com> writes:
> Zeugswetter Andreas SB wrote:
>> 1. index on (mail_list, mail_date)
>> 2. SELECT mailid, mail_date, mail_is_followup, mail_from, mail_subject
>> FROM mail_archive WHERE mail_list=35
>> AND mail_date between '20000100' and '20000199'
>> ORDER BY mail_list DESC, mail_date DESC LIMIT 26 OFFSET 0;
>> 
>> Note the appended 00 and 99 which is generic for all months.

> shouldn't it be between '20000100000000' and '20000199000000'?

Shouldn't matter, given that this is a char() field and not a numeric...

> I've never indexed that date column, because it is likely that there are
> 3 million+ different dates in there - remember 4 million emails sent
> over the course of 15 years are likely to have a lot of different dates,
> when the hour/minute/second is attached.

What of it?  There will be one index entry per table row in any case.
Actually, btree indexes work a heck of a lot better when there are a lot
of distinct values than when there are many duplicates, so I think you'd
find a index on mail_date to work better than an index on mail_year and
mail_month.

I think Andreas' advice is sound.  I'd still like to understand why 7.0
is slower than 6.5 given the query as posed --- that may reveal
something that needs fixing.  But if you just want to get some work done
I'd suggest trying the arrangement he recommends.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Serious Performance Loss in 7.0.2??
Next
From: Tim Perdue
Date:
Subject: Re: Serious Performance Loss in 7.0.2??