Re: strategies for optimizing read on rather large tables - Mailing list pgsql-performance

From PFC
Subject Re: strategies for optimizing read on rather large tables
Date
Msg-id op.srue0eroth1vuj@localhost
In response to strategies for optimizing read on rather large tables  (hubert lubaczewski <hubert.lubaczewski@eo.pl>)
Responses Re: strategies for optimizing read on rather large tables
Re: strategies for optimizing read on rather large tables
List pgsql-performance

> select advert_id from acr_cache where category_id = ? and region_id = ?
> order by XXX {asc|desc} limit 20;
>
> where XXX is one of 5 possible fields,
> timestamp,
> timestamp,
> text,
> text,
> numeric

    Create 5 indexes on ( category_id, region_id, a field )
    where "a field" is one of your 5 fields.

Then write your query as:

select advert_id from acr_cache where category_id = ? and region_id = ?
order by category_id, region_id, XXX limit 20;

select advert_id from acr_cache where category_id = ? and region_id = ?
order by category_id desc, region_id desc, XXX desc limit 20;

This should get each query down to a millisecond or so. The index will
satisfy the lookup, the sort, and the limit, so only 20 rows are read
from the table. The downside is that you have 5 indexes, but that's not
so bad.

If your categories and regions form a tree, you should definitely look
at the ltree datatype (in contrib), which provides indexed operators
like "is contained in"; that would probably let you shrink your cache
table a lot.
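
For example, a minimal sketch (the category_path column and the label
values are illustrative; ltree lives in contrib, loaded on 8.0 by
running its ltree.sql script):

-- a GiST index supports the ltree containment operators
ALTER TABLE acr_cache ADD COLUMN category_path ltree;
CREATE INDEX acr_cache_category_path_gist
    ON acr_cache USING gist (category_path);

-- "is contained in": every advert under the vehicles.cars subtree
SELECT advert_id FROM acr_cache
WHERE category_path <@ 'vehicles.cars';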



>
> we created index on acr_cache (category_id, region_id)
> and it works rather well.
> usually.
> if a given "crossing" (category + region) has a small number of ads (less
> than 10000) - the query is good enough (up to 300 milliseconds).
> but when we enter crossings which result in 50000 ads - the query
> takes up to 10 seconds.
> which is almost "forever".
>
> we thought about creating indices like this:
> index on acr_cache (effective_date);
> where effective_date is one of the timestamp fields.
> it worked well for the crossings with lots of ads, but when we asked for
> a small crossing (like 1000 ads) it took > 120 seconds!
> it appears that postgresql was favoring this new index instead of
> using the much better index on category_id and region_id.
>
> actually - i'm not sure what to do next.
> i am even thinking about creating special (partial) indices for big
> crossings, but that's just weird. plus the fact that the acr_cache
> vacuum time already exceeds 3 hours!
>
>
> any suggestions?
> hardware is dual xeon 3 ghz, 4G ram, hardware scsi raid put into raid 1.
> settings in postgresql.conf:
> listen_addresses = '*'
> port = 5800
> max_connections = 300
> superuser_reserved_connections = 50
> shared_buffers = 131072
> work_mem = 4096
> maintenance_work_mem = 65536
> fsync = false
> commit_delay = 100
> commit_siblings = 5
> checkpoint_segments = 10
> effective_cache_size = 10000
> random_page_cost = 1.1
> log_destination = 'stderr'
> redirect_stderr = true
> log_directory = '/home/pgdba/logs'
> log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
> log_truncate_on_rotation = false
> log_rotation_age = 1440
> log_rotation_size = 502400
> log_min_duration_statement = -1
> log_connections = true
> log_duration = true
> log_line_prefix = '[%t] [%p] <%u@%d> '
> log_statement = 'all'
> stats_start_collector = true
> stats_command_string = true
> stats_block_level = true
> stats_row_level = true
> stats_reset_on_server_start = true
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'en_US.UTF-8'
> lc_numeric = 'en_US.UTF-8'
> lc_time = 'en_US.UTF-8'
>
> the actual max number of connections is 120, plus some administrative
> connections (psql sessions).
> postgresql version 8.0.2 on linux debian sarge.
>
> best regards,
>
> depesz
>


