Re: strategies for optimizing read on rather large tables - Mailing list pgsql-performance
From | Christopher Kings-Lynne |
---|---|
Subject | Re: strategies for optimizing read on rather large tables |
Date | |
Msg-id | 42A18DBD.5090904@familyhealth.com.au Whole thread Raw |
In response to | strategies for optimizing read on rather large tables (hubert lubaczewski <hubert.lubaczewski@eo.pl>) |
Responses |
Re: strategies for optimizing read on rather large tables
|
List | pgsql-performance |
Without reading too hard, I suggest having a quick look at contrib/ltree module in the PostgreSQL distribution. It may or may not help you. Chris hubert lubaczewski wrote: > hi > first let me draw the outline. > we have a database which stores "adverts". > each advert is in one category, and one or more "region". > regions and categories form (each) tree structure. > assume category tree: > > a > / \ > b c > / \ > d e > > if any given advert is in category "e". it means it is also in "b" and > "a". > same goes for regions. > > as for now we have approx. 400 categories, 1300 regions, and 1000000 > adverts. > > since checking always over the tress of categories and regions we > created acr_cache table (advert/category/region) > which stores information on all adverts and all categories and regions > this particular region is in. > plus some more information for sorting purposes. > > this table is ~ 11 milion records. > > now. > we query this in more or less this manner: > > select advert_id from acr_cache where category_id = ? and region_id = ? > order by XXX {asc|desc} limit 20; > > where XXX is one of 5 possible fields, > timestamp, > timestamp, > text, > text, > numeric > > we created index on acr_cache (category_id, region_id) > and it works rather well. > usually. > if a given "crossing" (category + region) has small amount of ads (less > then 10000) - the query is good enough (up to 300 miliseconds). > but when we enter the crossings which result in 50000 ads - the query > takes up to 10 seconds. > which is almost "forever". > > we thought about creating indices like this: > index on acr_cache (effective_date); > where effective_dateis on of the timestamp fields. > it worked well for the crossings with lots of ads, but when we asked for > small crossing (like 1000 ads) it took > 120 seconds! > it appears that postgresql was favorizing this new advert instead of > using much better index on category_id and region_id. > > actually - i'm not sure what to do next. > i am even thinkinh about createing special indices (partial) for big > crossings, but that's just weird. plus the fact that already the > acr_cache vacuum time exceeds 3 hours!. > > > any suggestions? > hardware is dual xeon 3 ghz, 4G ram, hardware scsi raid put into raid 1. > settings in postgresql.conf: > listen_addresses = '*' > port = 5800 > max_connections = 300 > superuser_reserved_connections = 50 > shared_buffers = 131072 > work_mem = 4096 > maintenance_work_mem = 65536 > fsync = false > commit_delay = 100 > commit_siblings = 5 > checkpoint_segments = 10 > effective_cache_size = 10000 > random_page_cost = 1.1 > log_destination = 'stderr' > redirect_stderr = true > log_directory = '/home/pgdba/logs' > log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' > log_truncate_on_rotation = false > log_rotation_age = 1440 > log_rotation_size = 502400 > log_min_duration_statement = -1 > log_connections = true > log_duration = true > log_line_prefix = '[%t] [%p] <%u@%d> ' > log_statement = 'all' > stats_start_collector = true > stats_command_string = true > stats_block_level = true > stats_row_level = true > stats_reset_on_server_start = true > lc_messages = 'en_US.UTF-8' > lc_monetary = 'en_US.UTF-8' > lc_numeric = 'en_US.UTF-8' > lc_time = 'en_US.UTF-8' > > actual max numer of connection is 120 plus some administrative connections (psql sessions). > postgresql version 8.0.2 on linux debian sarge. > > best regards, > > depesz >
pgsql-performance by date: