Thread: AW: 7.0.2 issues / Geocrawler

AW: 7.0.2 issues / Geocrawler

From
Zeugswetter Andreas SB
Date:
> Your ideas for selecting based on the date are intriguing, however the
> schema of the db was not done with that in mind. Everyone thinks I'm a
> nut when I say this, but the date is stored in a char(14) field in
> gregorian format: 19990101125959

Perfect, that makes it a lot easier:

1. index on (mail_list, mail_date)
2. SELECT mailid, mail_date, mail_is_followup, mail_from, mail_subject    FROM mail_archive WHERE mail_list=35    AND
mail_datebetween '20000100' and '20000199'   ORDER BY mail_list DESC, mail_date DESC LIMIT 26 OFFSET 0;
 

Note the appended 00 and 99 which is generic for all months.

Andreas


Re: AW: 7.0.2 issues / Geocrawler

From
Tim Perdue
Date:
Zeugswetter Andreas SB wrote:
> 
> > Your ideas for selecting based on the date are intriguing, however the
> > schema of the db was not done with that in mind. Everyone thinks I'm a
> > nut when I say this, but the date is stored in a char(14) field in
> > gregorian format: 19990101125959
> 
> Perfect, that makes it a lot easier:
> 
> 1. index on (mail_list, mail_date)
> 2. SELECT mailid, mail_date, mail_is_followup, mail_from, mail_subject
>     FROM mail_archive WHERE mail_list=35
>     AND mail_date between '20000100' and '20000199'
>     ORDER BY mail_list DESC, mail_date DESC LIMIT 26 OFFSET 0;
> 
> Note the appended 00 and 99 which is generic for all months.

shouldn't it be between '20000100000000' and '20000199000000'?

I've never indexed that date column, because it is likely that there are
3 million+ different dates in there - remember 4 million emails sent
over the course of 15 years are likely to have a lot of different dates,
when the hour/minute/second is attached.

You still think that will work?

Tim

-- 
Founder - PHPBuilder.com / Geocrawler.com
Lead Developer - SourceForge
VA Linux Systems
408-542-5723


Re: AW: 7.0.2 issues / Geocrawler

From
Tom Lane
Date:
Tim Perdue <tperdue@valinux.com> writes:
> Zeugswetter Andreas SB wrote:
>> 1. index on (mail_list, mail_date)
>> 2. SELECT mailid, mail_date, mail_is_followup, mail_from, mail_subject
>> FROM mail_archive WHERE mail_list=35
>> AND mail_date between '20000100' and '20000199'
>> ORDER BY mail_list DESC, mail_date DESC LIMIT 26 OFFSET 0;
>> 
>> Note the appended 00 and 99 which is generic for all months.

> shouldn't it be between '20000100000000' and '20000199000000'?

Shouldn't matter, given that this is a char() field and not a numeric...

> I've never indexed that date column, because it is likely that there are
> 3 million+ different dates in there - remember 4 million emails sent
> over the course of 15 years are likely to have a lot of different dates,
> when the hour/minute/second is attached.

What of it?  There will be one index entry per table row in any case.
Actually, btree indexes work a heck of a lot better when there are a lot
of distinct values than when there are many duplicates, so I think you'd
find a index on mail_date to work better than an index on mail_year and
mail_month.

I think Andreas' advice is sound.  I'd still like to understand why 7.0
is slower than 6.5 given the query as posed --- that may reveal
something that needs fixing.  But if you just want to get some work done
I'd suggest trying the arrangement he recommends.
        regards, tom lane