Re: strategies for optimizing read on rather large tables - Mailing list pgsql-performance

From Christopher Kings-Lynne
Subject Re: strategies for optimizing read on rather large tables
Msg-id 42A18DBD.5090904@familyhealth.com.au
In response to strategies for optimizing read on rather large tables  (hubert lubaczewski <hubert.lubaczewski@eo.pl>)
Responses Re: strategies for optimizing read on rather large tables  (hubert lubaczewski <hubert.lubaczewski@eo.pl>)
List pgsql-performance
Without reading too hard, I suggest having a quick look at the contrib/ltree
module in the PostgreSQL distribution.  It may or may not help you.
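
For instance, a rough sketch (table and column names are just an example,
assuming the ltree module from contrib is loaded):

    create table categories (
        id   integer primary key,
        path ltree not null    -- e.g. 'a.b.e' for category e under b under a
    );
    create index categories_path_gist on categories using gist (path);

    -- all adverts whose category sits anywhere under category b:
    select a.advert_id
      from adverts a
      join categories c on c.id = a.category_id
     where c.path <@ 'a.b';

The <@ operator matches a node and all of its descendants, which is the
"e implies b and a" behaviour you described.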

Chris

hubert lubaczewski wrote:
> hi
> first let me draw the outline.
> we have a database which stores "adverts".
> each advert is in one category, and in one or more "regions".
> regions and categories each form a tree structure.
> assume this category tree:
>
>      a
>     / \
>    b   c
>   / \
>  d   e
>
> if any given advert is in category "e", it means it is also in "b" and
> "a".
> the same goes for regions.
>
> as of now we have approx. 400 categories, 1300 regions, and 1000000
> adverts.
>
> since every query has to check over the trees of categories and regions,
> we created an acr_cache table (advert/category/region)
> which stores, for each advert, all the categories and regions that
> advert is in,
> plus some more information for sorting purposes.
>
> this table has ~ 11 million records.
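>
> roughly, the structure looks like this (a simplified sketch; apart from
> the ids and effective_date, the column names are made up for illustration):
>
>     create table acr_cache (
>         advert_id      integer not null,
>         category_id    integer not null,
>         region_id      integer not null,
>         effective_date timestamp,   -- one of the sorting fields
>         modified       timestamp,   -- sorting field (name assumed)
>         title          text,        -- sorting field (name assumed)
>         region_name    text,        -- sorting field (name assumed)
>         price          numeric      -- sorting field (name assumed)
>     );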
>
> now.
> we query this in more or less this manner:
>
> select advert_id from acr_cache where category_id = ? and region_id = ?
> order by XXX {asc|desc} limit 20;
>
> where XXX is one of 5 possible fields, of the following types:
> timestamp,
> timestamp,
> text,
> text,
> numeric
>
> we created an index on acr_cache (category_id, region_id)
> and it works rather well.
> usually.
> if a given "crossing" (category + region) has a small number of ads (less
> than 10000), the query is good enough (up to 300 milliseconds).
> but when we hit crossings which result in 50000 ads, the query
> takes up to 10 seconds.
> which is almost "forever".
>
> we thought about creating indices like this:
> index on acr_cache (effective_date);
> where effective_date is one of the timestamp fields.
> it worked well for the crossings with lots of ads, but when we asked for
> a small crossing (like 1000 ads) it took > 120 seconds!
> it appears that postgresql was favoring this new index instead of
> using the much better index on category_id and region_id.
>
> actually - i'm not sure what to do next.
> i am even thinking about creating special (partial) indices for big
> crossings, but that's just weird. plus the fact that vacuuming acr_cache
> already takes over 3 hours!
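>
> by "special (partial) indices" i mean something like this, one per big
> crossing (the index name and the ids are just an example):
>
>     create index acr_cache_cat123_reg456_idx
>         on acr_cache (effective_date)
>         where category_id = 123 and region_id = 456;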
>
>
> any suggestions?
> hardware is dual xeon 3 ghz, 4G ram, hardware scsi raid set up as raid 1.
> settings in postgresql.conf:
> listen_addresses = '*'
> port = 5800
> max_connections = 300
> superuser_reserved_connections = 50
> shared_buffers = 131072
> work_mem = 4096
> maintenance_work_mem = 65536
> fsync = false
> commit_delay = 100
> commit_siblings = 5
> checkpoint_segments = 10
> effective_cache_size = 10000
> random_page_cost = 1.1
> log_destination = 'stderr'
> redirect_stderr = true
> log_directory = '/home/pgdba/logs'
> log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
> log_truncate_on_rotation = false
> log_rotation_age = 1440
> log_rotation_size = 502400
> log_min_duration_statement = -1
> log_connections = true
> log_duration = true
> log_line_prefix = '[%t] [%p] <%u@%d> '
> log_statement = 'all'
> stats_start_collector = true
> stats_command_string = true
> stats_block_level = true
> stats_row_level = true
> stats_reset_on_server_start = true
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'en_US.UTF-8'
> lc_numeric = 'en_US.UTF-8'
> lc_time = 'en_US.UTF-8'
>
> actual max number of connections is 120, plus some administrative connections (psql sessions).
> postgresql version 8.0.2 on linux debian sarge.
>
> best regards,
>
> depesz
>
