Re: 7.0.2 issues / Geocrawler - Mailing list pgsql-hackers
From | Ross J. Reedstrom |
---|---|
Subject | Re: 7.0.2 issues / Geocrawler |
Date | |
Msg-id | 20000712113644.B1487@rice.edu Whole thread Raw |
In response to | Re: 7.0.2 issues / Geocrawler (Tim Perdue <tperdue@valinux.com>) |
Responses |
Re: 7.0.2 issues / Geocrawler
|
List | pgsql-hackers |
On Wed, Jul 12, 2000 at 08:14:29AM -0700, Tim Perdue wrote: > "Ross J. Reedstrom" wrote: > > Tim, why are you building a multikey index, especially one containing a > > large text field? It's almost never a win to index a text field, unless > > This is not a key on a text field. > Ah, I see, I had merged the two problems you reported together. I see now that the 'can't create index' problem was on a different index. Mike Mascari gave you a detailed answer to that, which you seemd to just blow off, based on you guesstimate that it would run too long: > This is a *big* help. > > Yes, the table is approx 10-12GB in size and running your length() and > update queries is going to take a lifetime, since it will require a > calculation on 4 million rows. Mike mentioned that he's run similar index creations on 2 million rows, and it took 5-10 minutes. I reiterate: you've got a long subject that tripped a bug in index creation in postgresql versions < 7.0. Give his solution a try. It's a 'clean it up once' sort of thing: I don't think anyone's going to complain about the subject getting trimmed at ~ 2k. > The keys are: > > mail_list (example, the PHP mailing list=1) > mail_year (1999) > mail_month (July=7) > > Yes it is a multi-key index, and the matches are exact. > Right, as your explain output showed: the planner is picking this index and using it. I'd guess that your time is getting lost in the sort step. I seem to recall that Tom reworked the sort code as well, to reduce the size of temporary sort files: perhaps you've found a corner case that is much slower. Do you still have the 6.X install available? EXPLAIN output from that would be useful. > Someone else asked why I have separated these fields out from the > mail_date. > > If I didn't, and I wanted to see the messages for this month, I'd have > to regex and that would overwhelm the database. That's what the date_part function is for: reedstrm=# select now(); now ------------------------2000-07-12 11:03:11-05 (1 row) reedstrm=# select date_part('month', now());date_part ----------- 7 (1 row) reedstrm=# select date_part('year', now());date_part ----------- 2000 (1 row) So your query would look like: SELECT mailid, mail_date, mail_is_followup, mail_from, mail_subject FROM mail_archive WHERE mail_list=35 AND date_part('year',mail_date)=2000 AND date_part('month',mail_date)=1 ORDER BY mail_date DESC LIMIT 26 OFFSET 0; You can even build functional indices. However, since you're selecting and sorting based on the same attribute, the time of the message, it should be possible to build an index on mail_date, and construct a SELECT that uses it for ordering as well as limiting the tuples returned. You're generating the queries programmatically, from a scripting language, right? So, the best thing would be if you could create a query that looks something like: SELECT mailid, mail_date, mail_is_followup, mail_from, mail_subject FROM mail_archive WHERE mail_list=35 AND mail_date >='January 1, 2000' AND mail_date < 'February 1, 2000' ORDER BY mail_date DESC LIMIT 26 OFFSET 0; With an index on mail_date, that should do a single index scan, returning the first 26, and stop. I'd bet a lot that it's the sort that's killing you, since the backend has to retrieve the entire result set and sort it to be sure it returns the first 26. You might be able to use a two key index, on mail_date, mailid. I think you have to be careful to put key you want sorted output on first, to ensure that the index order is presorted, and the planner know it. Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
pgsql-hackers by date: