Re: Strange (?) Index behavior? - Mailing list pgsql-performance

From Joshua D. Drake
Subject Re: Strange (?) Index behavior?
Date
Msg-id 4193DE2B.5060407@commandprompt.com
Whole thread Raw
In response to Re: Strange (?) Index behavior?  (Allen Landsidel <alandsidel@gmail.com>)
List pgsql-performance
>>>-----------------------------------------------------------------------------------------------------------------------------------------------
>>> Index Scan using sname_unique on "testtable"  (cost=0.00..34453.74
>>>rows=8620 width=20) (actual time=77.004..537065.079 rows=74612
>>>loops=1)
>>>   Index Cond: ((sname >= 'AA'::text) AND (sname < 'AB'::text))
>>>   Filter: (sname ~~ 'AA%'::text)
>>> Total runtime: 537477.737 ms
>>>(4 rows)
>>>
>>>Time: 537480.571 ms
>>
>>Nothing you're going to do to the query is going to come up with a more
>>effective plan than this. It's using the index after all. It's never going to
>>be lightning fast because it has to process 75k rows.
>>
>>However 75k rows shouldn't be taking nearly 10 minutes. It should be taking
>>about 10 seconds.

I am confused about this statement. I have a table with 1.77 million
rows that I use gist indexes on (TSearch) and I can pull out of it in
less than 2 seconds.

Are you saying it should be taking 10 seconds because of the type of
plan? 10 seconds seems like an awfullong time for this.

Sincerely,

Joshua D. Drake



>
>
> That's my feeling as well, I thought the index was to blame because it
> will be quite large, possibly large enough to not fit in memory nor be
> quickly bursted up.
>
>
>>The 77ms before finding the first record is a bit suspicious. Have you
>>vacuumed this table regularly? Try a VACUUM FULL VERBOSE, and send the
>>results. You might try to REINDEX it as well, though I doubt that would help.
>
>
> This table is *brand spanking new* for lack of a better term.  I have
> the data for it in a CSV.  I load the CSV up which takes a bit, then
> create the indexes, do a vacuum analyze verbose, and then posted the
> results above.  I don't think running vacuum a more times is going to
> change things, at least not without tweaking config settings that
> affect vacuum. Not a single row has been inserted or altered since the
> initial load.. it's just a test.
>
> I can't give vacuum stats right now because the thing is reloading
> (again) with different newfs settings -- something I figure I have the
> time to fiddle with now, and seldom do at other times.  These numbers
> though don't change much between 8K on up to 64K 'cluster' sizes.  I'm
> trying it now with 8K page sizes, with 8K "minimum fragment" sizes.
> Should speed things up a tiny bit but not enough to really affect this
> query.
>
> Do you still see a need to have the output from the vacuum?
>
>
>>Actually you might consider clustering the table on sname_unique. That would
>>accomplish the same thing as the VACUUM FULL command and also speed up the
>>index scan. And the optimizer knows (if you analyze afterwards) it so it
>>should be more likely to pick the index scan. But currently you have to rerun
>>cluster periodically.
>
>
> Clustering is really unworkable in this situation.  It would work now,
> in this limited test case, but using it if this were to go into
> production is unrealistic.  It would have to happen fairly often since
> this table is updated frequently, which will break the clustering
> quickly with MVCC.
>
> Running it often.. well.. it has 70M+ rows, and the entire table is
> copied, reordered, and rewritten.. so that's a lot of 'scratch space'
> needed.  Finally, clustering locks the table..
>
> Something I'd already considered but quickly ruled out because of
> these reasons..
>
> More ideas are welcome though. ;)
>
> -Allen
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


--
Command Prompt, Inc., home of PostgreSQL Replication, and plPHP.
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Strange (?) Index behavior?
Next
From: Josh Berkus
Date:
Subject: Clarification on two bits on VACUUM FULL VERBOSE output