Re: strategies for optimizing read on rather large tables - Mailing list pgsql-performance
From | PFC |
---|---|
Subject | Re: strategies for optimizing read on rather large tables |
Date | |
Msg-id | op.srue0eroth1vuj@localhost Whole thread Raw |
In response to | strategies for optimizing read on rather large tables (hubert lubaczewski <hubert.lubaczewski@eo.pl>) |
Responses |
Re: strategies for optimizing read on rather large tables
Re: strategies for optimizing read on rather large tables |
List | pgsql-performance |
> select advert_id from acr_cache where category_id = ? and region_id = ? > order by XXX {asc|desc} limit 20; > > where XXX is one of 5 possible fields, > timestamp, > timestamp, > text, > text, > numeric Create 5 indexes on ( category_id, region_id, a field ) where "a field" is one of your 5 fields. Then write your query as : select advert_id from acr_cache where category_id = ? and region_id = ? order by category_id, region_id, XXX limit 20; select advert_id from acr_cache where category_id = ? and region_id = ? order by category_id desc, region_id desc, XXX desc limit 20; This should put your query down to a millisecond. It will use the index for the lookup, the sort and the limit, and hence only retrieve 20 rows for the table. Downside is you have 5 indexes, but that's not so bad. If your categories and regions form a tree, you should definitely use a ltree datatype, which enables indexed operators like "is contained in" which would probably allow you to reduce the size of your cache table a lot. > > we created index on acr_cache (category_id, region_id) > and it works rather well. > usually. > if a given "crossing" (category + region) has small amount of ads (less > then 10000) - the query is good enough (up to 300 miliseconds). > but when we enter the crossings which result in 50000 ads - the query > takes up to 10 seconds. > which is almost "forever". > > we thought about creating indices like this: > index on acr_cache (effective_date); > where effective_dateis on of the timestamp fields. > it worked well for the crossings with lots of ads, but when we asked for > small crossing (like 1000 ads) it took > 120 seconds! > it appears that postgresql was favorizing this new advert instead of > using much better index on category_id and region_id. > > actually - i'm not sure what to do next. > i am even thinkinh about createing special indices (partial) for big > crossings, but that's just weird. plus the fact that already the > acr_cache vacuum time exceeds 3 hours!. > > > any suggestions? > hardware is dual xeon 3 ghz, 4G ram, hardware scsi raid put into raid 1. > settings in postgresql.conf: > listen_addresses = '*' > port = 5800 > max_connections = 300 > superuser_reserved_connections = 50 > shared_buffers = 131072 > work_mem = 4096 > maintenance_work_mem = 65536 > fsync = false > commit_delay = 100 > commit_siblings = 5 > checkpoint_segments = 10 > effective_cache_size = 10000 > random_page_cost = 1.1 > log_destination = 'stderr' > redirect_stderr = true > log_directory = '/home/pgdba/logs' > log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' > log_truncate_on_rotation = false > log_rotation_age = 1440 > log_rotation_size = 502400 > log_min_duration_statement = -1 > log_connections = true > log_duration = true > log_line_prefix = '[%t] [%p] <%u@%d> ' > log_statement = 'all' > stats_start_collector = true > stats_command_string = true > stats_block_level = true > stats_row_level = true > stats_reset_on_server_start = true > lc_messages = 'en_US.UTF-8' > lc_monetary = 'en_US.UTF-8' > lc_numeric = 'en_US.UTF-8' > lc_time = 'en_US.UTF-8' > > actual max numer of connection is 120 plus some administrative > connections (psql sessions). > postgresql version 8.0.2 on linux debian sarge. > > best regards, > > depesz >
pgsql-performance by date: