Thread: PG8.2.1 choosing slow seqscan over idx scan
Running PostgreSQL 8.2.1 on Win32. The query planner is choosing a seq
scan over an index scan even though the index scan is faster (as shown
by disabling seqscan). The table is recently analyzed and row count
estimates seem to be in the ballpark.

Another tidbit - I haven't done a "vacuum full" ever, although I
"vacuum analyze" regularly (and autovacuum). I recently noticed that
the PG data drive is 40% fragmented (NTFS). Could that be making the
seqscan slower than it should be? Regardless of fragmentation's effect
on performance, is the query planner making a good decision here?

SOME CONFIGURATION PARAMS
effective_cache_size=1000MB
random_page_cost=3
default_statistics_target=50
shared_buffers=400MB
temp_buffers=10MB
work_mem=10MB
checkpoint_segments=12

QUERY
select merchant_dim_id, dcms_dim_id,
       sum(success) as num_success, sum(failed) as num_failed,
       count(*) as total_transactions,
       (sum(success) * 1.0 / count(*)) as success_rate
from transaction_facts
where transaction_date >= '2007-1-16'
  and transaction_date < '2007-1-16 15:20'
group by merchant_dim_id, dcms_dim_id;

EXPLAIN ANALYZE (enable_seqscan=true)
HashAggregate  (cost=339573.01..340089.89 rows=15904 width=16) (actual time=140606.593..140650.573 rows=10549 loops=1)
  ->  Seq Scan on transaction_facts  (cost=0.00..333928.25 rows=322558 width=16) (actual time=19917.957..140036.910 rows=347434 loops=1)
        Filter: ((transaction_date >= '2007-01-16 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-16 15:20:00'::timestamp without time zone))
Total runtime: 140654.813 ms

EXPLAIN ANALYZE (enable_seqscan=false)
HashAggregate  (cost=379141.53..379658.41 rows=15904 width=16) (actual time=3720.838..3803.748 rows=10549 loops=1)
  ->  Bitmap Heap Scan on transaction_facts  (cost=84481.80..373496.76 rows=322558 width=16) (actual time=244.568..3133.741 rows=347434 loops=1)
        Recheck Cond: ((transaction_date >= '2007-01-16 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-16 15:20:00'::timestamp without time zone))
        ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..84401.16 rows=322558 width=0) (actual time=241.994..241.994 rows=347434 loops=1)
              Index Cond: ((transaction_date >= '2007-01-16 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-16 15:20:00'::timestamp without time zone))
Total runtime: 3810.795 ms
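(For completeness, the two plans above were captured by flipping the planner
setting in the same psql session - something along these lines, using the
query as posted:)

-- force the planner away from the sequential scan for this session only
set enable_seqscan = false;

explain analyze
select merchant_dim_id, dcms_dim_id,
       sum(success) as num_success, sum(failed) as num_failed,
       count(*) as total_transactions,
       (sum(success) * 1.0 / count(*)) as success_rate
from transaction_facts
where transaction_date >= '2007-1-16'
  and transaction_date < '2007-1-16 15:20'
group by merchant_dim_id, dcms_dim_id;

-- put the planner back to its default behaviour
reset enable_seqscan;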
"Jeremy Haile" <jhaile@fastmail.fm> writes: > Running PostgreSQL 8.2.1 on Win32. The query planner is choosing a seq > scan over index scan even though index scan is faster (as shown by > disabling seqscan). Table is recently analyzed and row count estimates > seem to be in the ballpark. Try reducing random_page_cost a bit. Keep in mind that you are probably measuring a fully-cached situation here, if you repeated the test case. If your database fits into memory reasonably well then that's fine and you want to optimize for that case ... but otherwise you may find yourself pessimizing the actual behavior. regards, tom lane
Thanks Tom!

Reducing random_page_cost to 2 did the trick for this query. It now
favors the index scan. Even if this is a cached situation, I wouldn't
expect a difference of 3 min vs 3 seconds.

Even if unrelated, do you think disk fragmentation would have negative
effects? Is it worth trying to defragment the drive on a regular basis
in Windows?

Jeremy Haile

On Tue, 16 Jan 2007 16:39:07 -0500, "Tom Lane" <tgl@sss.pgh.pa.us> said:
> "Jeremy Haile" <jhaile@fastmail.fm> writes:
> > Running PostgreSQL 8.2.1 on Win32. The query planner is choosing a seq
> > scan over index scan even though index scan is faster (as shown by
> > disabling seqscan). Table is recently analyzed and row count estimates
> > seem to be in the ballpark.
>
> Try reducing random_page_cost a bit. Keep in mind that you are probably
> measuring a fully-cached situation here, if you repeated the test case.
> If your database fits into memory reasonably well then that's fine and
> you want to optimize for that case ... but otherwise you may find
> yourself pessimizing the actual behavior.
>
> regards, tom lane
On 1/16/07, Jeremy Haile <jhaile@fastmail.fm> wrote:
> Even if unrelated, do you think disk fragmentation would have negative
> effects? Is it worth trying to defragment the drive on a regular basis
> in Windows?
Out of curiosity, is this table heavily updated or deleted from? Perhaps there is an unfavorable "correlation" between the btree and data? Can you dump the results of
select attname, null_frac, avg_width, n_distinct, correlation from pg_stats where tablename = 'transaction_facts'
--
Chad
http://www.postgresqlforums.com/
Hey Chad,

The table is heavily inserted and deleted from. Recently I had done a
very large delete.

Here is the results of the query you sent me: (sorry it's hard to read)

"dcms_dim_id";0;4;755;-0.00676181
"transaction_fact_id";0;4;-1;-0.194694
"failed";0;4;2;0.964946
"van16";0;23;145866;0.00978649
"vendor_response";0.9942;43;9;0.166527
"transaction_id";0;4;-1;-0.199583
"transaction_date";0;8;172593;-0.194848
"serial_number";0.0434667;16;53311;0.0713039
"merchant_dim_id";0;4;105;0.299335
"comment";0.0052;29;7885;0.0219167
"archived";0;1;2;0.84623
"response_code";0.9942;4;3;0.905409
"transaction_source";0;4;2;0.983851
"location_dim_id";0;4;86;0.985384
"success";0;4;2;0.981072

Just curious - what does that tell us?

Jeremy Haile

On Tue, 16 Jan 2007 17:44:53 -0500, "Chad Wagner" <chad.wagner@gmail.com> said:
> On 1/16/07, Jeremy Haile <jhaile@fastmail.fm> wrote:
> >
> > Even if unrelated, do you think disk fragmentation would have negative
> > effects? Is it worth trying to defragment the drive on a regular basis
> > in Windows?
>
> Out of curiosity, is this table heavily updated or deleted from? Perhaps
> there is an unfavorable "correlation" between the btree and data? Can you
> dump the results of
>
> select attname, null_frac, avg_width, n_distinct, correlation from pg_stats
> where tablename = 'transaction_facts'
>
> --
> Chad
> http://www.postgresqlforums.com/
On 1/16/07, Jeremy Haile <jhaile@fastmail.fm> wrote:
> The table is heavily inserted and deleted from. Recently I had done a
> very large delete.

That's what I suspected.

> Here is the results of the query you sent me: (sorry it's hard to read)
> "transaction_date";0;8;172593;-0.194848
> Just curious - what does that tell us?
Based on the explain plan you posted earlier we learned the optimizer believed the query would return 322558 rows (and it was reasonably accurate, too) for a relatively small time frame (15 hours and 20 minutes).
-> Seq Scan on transaction_facts (cost=0.00..333928.25 rows=322558
width=16) (actual time=19917.957..140036.910 rows=347434 loops=1)
Based on the information you just posted, the average row length is 156 bytes.
347434 rows * 156 bytes = 52MB (reasonable it could be held in your shared buffers, which makes Tom's suggestion very plausible, the index scan may not be cheaper -- because it is all cached)
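(That 156-byte figure can be pulled straight from the same pg_stats view the
numbers above came from - a rough sketch, counting data bytes only and
ignoring per-row and page overhead:)

select sum(avg_width) as avg_row_data_bytes
from pg_stats
where tablename = 'transaction_facts';
-- sums to 156 with the stats posted above; times the ~347k rows returned
-- that is roughly 52MB of row data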
The estimation of cost when you are requesting a "range" of data will involve the "correlation" factor, the correlation is defined as:
"Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk. (This column is NULL if the column data type does not have a < operator.)"
Which means that as the correlation approaches zero (yours is -0.19, which I would call zero), the physical ordering of the data (in the data files) is such that a range scan of the btree index would result in many scatter reads (which are slow). So the optimizer considers whether a "scattered" read will be cheaper than a sequential scan based on a few factors: a) correlation [for ranges] b) size of table c) estimated cardinality [what does it think you are going to get back]. Based on those factors it may choose either access path (sequential or index).
One of the reasons why the sequential scan is slower is because the optimizer doesn't know the data you are requesting is sitting in the cache (and it is VERY unlikely you have the entire table in cache, unless it is a heavily used table or very small table, which it's probably not). So once it decides a sequential scan is the best plan, it starts chugging away at the disk, pulling in most of the pages in the table and throwing away the pages that do not meet your criteria.
The index scan is quicker (may be bogus, as Tom suggested) because it starts chugging away at the index and finds that many of the pages you are interested in are cached (but it didn't know, you just got lucky!).
In practice, once you start pulling in 15% or more of the table it is often cheaper just to read the entire table in, rather than scatter reads + double I/O. Remember that an index access means I have to read the index PLUS the table from disk, and a sequential scan just throws away the index and reads the table from disk.
I would suggest running explain analyze after restarting the database (that may not even be good enough, because a large portion of the data file may be in the OS's buffer cache, hrm -- reboot? unmount?) and seeing how cheap that index access path is.
One thing to be careful of here is that you really need to consider what is the primary use of the table, and what are the major queries you will be launching against it. But you could improve the correlation by rebuilding the table ordered by the transaction_date column, but it may screw up other range scans. Another option is partitioning. I wouldn't do any of this stuff, until you find out the last tweak you made still holds true, give it a few days, perhaps test it after a clean reboot of the server.
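(A sketch of what that rebuild could look like on 8.2, using the
transaction_date index that already appears in the plans above - note that
CLUSTER takes an exclusive lock and rewrites the whole table, so it is a
maintenance-window operation:)

-- rewrite the heap in transaction_date order (8.2 syntax: index first, then table)
cluster transaction_facts_transaction_date_idx on transaction_facts;
-- refresh statistics, including the correlation column discussed above
analyze transaction_facts;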
Let me know if any of this is inaccurate folks, as I certainly don't profess to be an expert on the internals of PostgreSQL, but I believe it is accurate based on my prior experiences with other database products. :)
--
Chad
http://www.postgresqlforums.com/
On Tue, 2007-01-16 at 21:58 -0500, Jeremy Haile wrote:
> Hey Chad,
>
> The table is heavily inserted and deleted from. Recently I had done a
> very large delete.

I still keep wondering if this table is bloated with dead tuples. Even
if you vacuum often, if there's a connection with an idle transaction
the tuples can't be reclaimed and the table would continue to grow.

Anyway, what does vacuum analyze tablename say (where tablename is, of
course, the name of the table we're looking at)? Pay particular
attention to DETAIL statements.

Assuming the table's NOT bloated, you may do well to increase the
effective_cache_size, which doesn't allocate anything, but just tells
the query planner about how big your operating system's file cache is
as regards postgresql. It's a bit of a coarse setting, i.e. you can
make rather large changes to it without blowing things up. If you've
got a couple gigs on your machine, try setting it to something like
512MB or so.

If your table is bloating, and you don't have idle transactions hanging
off the database, it could be that your fsm settings are too low.
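(A quick way to check for the idle-transaction problem mentioned above - a
sketch against the 8.2 statistics views, assuming stats_command_string is
enabled, which it is by default:)

-- long-lived rows here prevent VACUUM from reclaiming dead tuples
select procpid, usename, query_start
from pg_stat_activity
where current_query = '<IDLE> in transaction';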
A good idea here will be to first do a VACUUM FULL and then keep the Autovacuum settings you want.
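(A sketch of that maintenance pass for the table in this thread - VACUUM
FULL takes an exclusive lock on the table, so it belongs in a maintenance
window:)

vacuum full verbose transaction_facts;
-- refresh statistics once the table has been compacted
analyze transaction_facts;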
-----------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)
On 1/17/07, Jeremy Haile <jhaile@fastmail.fm> wrote:
> I still keep wondering if this table is bloated with dead tuples. Even
> if you vacuum often if there's a connection with an idle transaction,
> the tuples can't be reclaimed and the table would continue to grow.
I used to vacuum once an hour, although I've switched it to autovacuum
now. It definitely could be bloated with dead tuples. I'll paste the
"vacuum analyze verbose" output at the bottom of this e-mail. Would a
vacuum full be a good idea?
> Assuming the table's NOT bloated, you may do well to increase the
> effective_cache_size, which doesn't allocate anything,
<snip>
> try setting it to something like 512MB or so.
It's currently set to 1000MB.
> If your table is bloating, and you don't have idle transactions hanging
> off the database, it could be that your fsm settings are too low.
fsm is currently set to 2000000. Is there any harm in setting it too
high? =)
Here's the vacuum analyze verbose output:
INFO: vacuuming "public.transaction_facts"
INFO: scanned index "transaction_facts_pkey" to remove 759969 row
versions
DETAIL: CPU 7.20s/2.31u sec elapsed 315.31 sec.
INFO: scanned index "transaction_facts_dcms_dim_id_idx" to remove
759969 row versions
DETAIL: CPU 1.29s/2.15u sec elapsed 146.98 sec.
INFO: scanned index "transaction_facts_merchant_dim_id_idx" to remove
759969 row versions
DETAIL: CPU 1.10s/2.10u sec elapsed 126.09 sec.
INFO: scanned index "transaction_facts_transaction_date_idx" to remove
759969 row versions
DETAIL: CPU 1.65s/2.40u sec elapsed 259.25 sec.
INFO: scanned index "transaction_facts_transaction_id_idx" to remove
759969 row versions
DETAIL: CPU 7.48s/2.85u sec elapsed 371.98 sec.
INFO: scanned index "transaction_facts_product_date_idx" to remove
759969 row versions
DETAIL: CPU 2.32s/2.10u sec elapsed 303.83 sec.
INFO: scanned index "transaction_facts_merchant_product_date_idx" to
remove 759969 row versions
DETAIL: CPU 2.48s/2.31u sec elapsed 295.19 sec.
INFO: scanned index "transaction_facts_merchant_date_idx" to remove
759969 row versions
DETAIL: CPU 8.10s/3.35u sec elapsed 398.73 sec.
INFO: scanned index "transaction_facts_success_idx" to remove 759969
row versions
DETAIL: CPU 5.01s/2.84u sec elapsed 192.73 sec.
INFO: scanned index "transaction_facts_failed_idx" to remove 759969 row
versions
DETAIL: CPU 1.03s/1.90u sec elapsed 123.00 sec.
INFO: scanned index "transaction_facts_archived_idx" to remove 759969
row versions
DETAIL: CPU 1.03s/1.39u sec elapsed 104.42 sec.
INFO: scanned index "transaction_facts_response_code_idx" to remove
759969 row versions
DETAIL: CPU 0.75s/2.17u sec elapsed 36.71 sec.
INFO: scanned index "transaction_facts_transaction_source_idx" to
remove 759969 row versions
DETAIL: CPU 0.60s/1.75u sec elapsed 42.29 sec.
INFO: scanned index "transaction_facts_transaction_id_source_idx" to
remove 759969 row versions
DETAIL: CPU 1.14s/1.84u sec elapsed 44.75 sec.
INFO: "transaction_facts": removed 759969 row versions in 14360 pages
DETAIL: CPU 0.57s/0.23u sec elapsed 45.28 sec.
INFO: index "transaction_facts_pkey" now contains 2274280 row versions
in 152872 pages
DETAIL: 759969 index row versions were removed.
134813 index pages have been deleted, 134813 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.01 sec.
INFO: index "transaction_facts_dcms_dim_id_idx" now contains 2274280
row versions in 85725 pages
DETAIL: 759323 index row versions were removed.
75705 index pages have been deleted, 73721 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_merchant_dim_id_idx" now contains
2274280 row versions in 80023 pages
DETAIL: 759969 index row versions were removed.
71588 index pages have been deleted, 69210 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_transaction_date_idx" now contains
2274280 row versions in 144196 pages
DETAIL: 759969 index row versions were removed.
126451 index pages have been deleted, 126451 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_transaction_id_idx" now contains 2274280
row versions in 150529 pages
DETAIL: 759969 index row versions were removed.
130649 index pages have been deleted, 130649 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_product_date_idx" now contains 2274280
row versions in 202248 pages
DETAIL: 759969 index row versions were removed.
174652 index pages have been deleted, 174652 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_merchant_product_date_idx" now contains
2274280 row versions in 202997 pages
DETAIL: 759969 index row versions were removed.
175398 index pages have been deleted, 175398 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_merchant_date_idx" now contains 2274280
row versions in 203561 pages
DETAIL: 759969 index row versions were removed.
175960 index pages have been deleted, 175960 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_success_idx" now contains 2274280 row
versions in 78237 pages
DETAIL: 759969 index row versions were removed.
70239 index pages have been deleted, 67652 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_failed_idx" now contains 2274280 row
versions in 78230 pages
DETAIL: 759969 index row versions were removed.
70231 index pages have been deleted, 67665 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_archived_idx" now contains 2274280 row
versions in 72943 pages
DETAIL: 759969 index row versions were removed.
64962 index pages have been deleted, 62363 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_response_code_idx" now contains 2274280
row versions in 16918 pages
DETAIL: 759969 index row versions were removed.
8898 index pages have been deleted, 6314 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_transaction_source_idx" now contains
2274280 row versions in 14235 pages
DETAIL: 759969 index row versions were removed.
6234 index pages have been deleted, 3663 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: index "transaction_facts_transaction_id_source_idx" now contains
2274280 row versions in 18053 pages
DETAIL: 759969 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "transaction_facts": found 759969 removable, 2274280 nonremovable
row versions in 308142 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 15710471 unused item pointers.
266986 pages contain useful free space.
0 pages are entirely empty.
CPU 58.00s/35.59u sec elapsed 3240.94 sec.
INFO: analyzing "public.transaction_facts"
INFO: "transaction_facts": scanned 15000 of 308142 pages, containing
113476 live rows and 0 dead rows; 15000 rows in sample, 2331115
estimated total rows
Thanks for the great info Chad. I'm learning a lot from this thread!

> 347434 rows * 156 bytes = 52MB (reasonable it could be held in your
> shared buffers, which makes Tom's suggestion very plausible, the
> index scan may not be cheaper -- because it is all cached)

Maybe - I tried running the same query for an older time range that is
less likely to be cached. The index scan took longer than my previous
example, but still only took 16 seconds, compared to the 87 seconds
required to seqscan the table. When I can, I'll restart the machine and
run a comparison again to get a "pure" test.

> One of the reasons why the sequential scan is slower is because the
> optimizer doesn't know the data you are requesting is sitting in the
> cache (and it is VERY unlikely you have the entire table in cache,
> unless it is a heavily used table or very small table, which it's
> probably not).

This is a large table (3 million rows). Rows are typically inserted in
date order, although large numbers of rows are deleted every night.
Basically, this table contains a list of transactions in a rolling time
window. So inserts happen throughout the day, and then a day's worth of
old rows are deleted every night. The most accessed rows are going to
be today's rows, which is a small subset of the overall table. (maybe
14%)

> One thing to be careful of here is that you really need to consider
> what is the primary use of the table, and what are the major queries
> you will be launching against it. But you could improve the
> correlation by rebuilding the table ordered by the transaction_date
> column, but it may screw up other range scans.

Date is almost always a criteria in scans of this table. As mentioned
earlier, the table is naturally built in date order - so would
rebuilding the table help? Is it possible that even though I'm
inserting in date order, since I delete rows so often the physical
correlation would get disrupted as disk pages are reused? Perhaps
clustering on the transaction_date index and periodically running
"cluster" would help? Does vacuum full help with this at all?

> Another option is partitioning. I wouldn't do any of this
> stuff, until you find out the last tweak you made still holds true,
> give it a few days, perhaps test it after a clean reboot of the server.

Yeah - partitioning makes a lot of sense and I've thought about doing
this in the past. Although I do run queries that cross multiple days,
most of my queries only look at today's data, so the physical disk
organization would likely be much better with a partitioned table
setup. Also, since I usually delete old data one day at a time, I could
simply drop the old day's partition. This would make vacuuming much
less of an issue.

But I won't be making any changes immediately, so I'll continue to run
tests given your advice.

Thanks again,
Jeremy Haile
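(For anyone following along, a rough sketch of the daily-partition idea on
8.2 using constraint exclusion - the child table and index names here are
invented for illustration, and the real fact table obviously has more
columns and indexes:)

-- one child table per day, hanging off the existing parent
create table transaction_facts_20070116 (
    check (transaction_date >= '2007-01-16'::timestamp
       and transaction_date <  '2007-01-17'::timestamp)
) inherits (transaction_facts);

create index transaction_facts_20070116_date_idx
    on transaction_facts_20070116 (transaction_date);

-- with this on, queries filtered on transaction_date skip non-matching children
set constraint_exclusion = on;

-- ageing out a day becomes a cheap catalog operation instead of a big DELETE
drop table transaction_facts_20070116;

-- note: on 8.2 inserts still have to be directed at the right child table,
-- either by the application or via rules/triggers on the parent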
On 17-Jan-07, at 9:37 AM, Jeremy Haile wrote:
>> I still keep wondering if this table is bloated with dead tuples. Even
>> if you vacuum often if there's a connection with an idle transaction,
>> the tuples can't be reclaimed and the table would continue to grow.
>
> I used to vacuum once an hour, although I've switched it to autovacuum
> now. It definitely could be bloated with dead tuples. I'll paste the
> "vacuum analyze verbose" output at the bottom of this e-mail. Would a
> vacuum full be a good idea?
>
>> Assuming the table's NOT bloated, you may do well to increase the
>> effective_cache_size, which doesn't allocate anything,
> <snip>
>> try setting it to something like 512MB or so.
>
> It's currently set to 1000MB.

How much memory does the box have?

>> If your table is bloating, and you don't have idle transactions hanging
>> off the database, it could be that your fsm settings are too low.
>
> fsm is currently set to 2000000. Is there any harm in setting it too
> high? =)

Yes, it takes up space.
> INFO: index "transaction_facts_dcms_dim_id_idx" now contains 2274280 > row versions in 85725 pages > DETAIL: 759323 index row versions were removed. > 75705 index pages have been deleted, 73721 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_merchant_dim_id_idx" now contains > 2274280 row versions in 80023 pages > DETAIL: 759969 index row versions were removed. > 71588 index pages have been deleted, 69210 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_transaction_date_idx" now contains > 2274280 row versions in 144196 pages > DETAIL: 759969 index row versions were removed. > 126451 index pages have been deleted, 126451 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_transaction_id_idx" now contains > 2274280 > row versions in 150529 pages > DETAIL: 759969 index row versions were removed. > 130649 index pages have been deleted, 130649 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_product_date_idx" now contains 2274280 > row versions in 202248 pages > DETAIL: 759969 index row versions were removed. > 174652 index pages have been deleted, 174652 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_merchant_product_date_idx" now > contains > 2274280 row versions in 202997 pages > DETAIL: 759969 index row versions were removed. > 175398 index pages have been deleted, 175398 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_merchant_date_idx" now contains > 2274280 > row versions in 203561 pages > DETAIL: 759969 index row versions were removed. > 175960 index pages have been deleted, 175960 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_success_idx" now contains 2274280 row > versions in 78237 pages > DETAIL: 759969 index row versions were removed. > 70239 index pages have been deleted, 67652 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_failed_idx" now contains 2274280 row > versions in 78230 pages > DETAIL: 759969 index row versions were removed. > 70231 index pages have been deleted, 67665 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_archived_idx" now contains 2274280 row > versions in 72943 pages > DETAIL: 759969 index row versions were removed. > 64962 index pages have been deleted, 62363 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_response_code_idx" now contains > 2274280 > row versions in 16918 pages > DETAIL: 759969 index row versions were removed. > 8898 index pages have been deleted, 6314 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_transaction_source_idx" now contains > 2274280 row versions in 14235 pages > DETAIL: 759969 index row versions were removed. > 6234 index pages have been deleted, 3663 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. > INFO: index "transaction_facts_transaction_id_source_idx" now > contains > 2274280 row versions in 18053 pages > DETAIL: 759969 index row versions were removed. > 0 index pages have been deleted, 0 are currently reusable. > CPU 0.00s/0.00u sec elapsed 0.00 sec. 
> INFO: "transaction_facts": found 759969 removable, 2274280 > nonremovable > row versions in 308142 pages > DETAIL: 0 dead row versions cannot be removed yet. > There were 15710471 unused item pointers. > 266986 pages contain useful free space. > 0 pages are entirely empty. > CPU 58.00s/35.59u sec elapsed 3240.94 sec. > INFO: analyzing "public.transaction_facts" > INFO: "transaction_facts": scanned 15000 of 308142 pages, containing > 113476 live rows and 0 dead rows; 15000 rows in sample, 2331115 > estimated total rows > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
> How much memory does the box have

2GB

> Yes, it takes up space

Well, I upped max_fsm_pages to 2000000 because vacuums were failing
with it set to 1500000. However, I'm now autovacuuming, which might be
keeping my fsm lower. I didn't realize that setting it too high had
negative effects, so I'll try to get a better handle on how large this
needs to be.
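(One way to get that handle rather than guessing: a database-wide VACUUM
VERBOSE prints a summary of free space map usage at the very end, which can
be compared against max_fsm_pages - a sketch, with the output wording
approximate:)

-- run against the whole database; the final INFO/DETAIL lines report how
-- many page slots are in use and how many would be needed to track all
-- free space, versus the current max_fsm_pages / max_fsm_relations limits
vacuum verbose;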
On Wed, 2007-01-17 at 08:37, Jeremy Haile wrote:
> > I still keep wondering if this table is bloated with dead tuples. Even
> > if you vacuum often if there's a connection with an idle transaction,
> > the tuples can't be reclaimed and the table would continue to grow.
>
> I used to vacuum once an hour, although I've switched it to autovacuum
> now. It definitely could be bloated with dead tuples. I'll paste the
> "vacuum analyze verbose" output at the bottom of this e-mail. Would a
> vacuum full be a good idea?
>
> > Assuming the table's NOT bloated, you may do well to increase the
> > effective_cache_size, which doesn't allocate anything,
> <snip>
> > try setting it to something like 512MB or so.
>
> It's currently set to 1000MB.

Sounds about right for a machine with 2G memory (as you mentioned
elsewhere)

> > If your table is bloating, and you don't have idle transactions hanging
> > off the database, it could be that your fsm settings are too low.
>
> fsm is currently set to 2000000. Is there any harm in setting it too
> high? =)

As long as you don't run out of memory, it can be pretty high. note
that it uses shared memory, so setting it too high can cause the db to
fail to start.

> INFO: scanned index "transaction_facts_pkey" to remove 759969 row
> versions
> DETAIL: CPU 7.20s/2.31u sec elapsed 315.31 sec.
> INFO: scanned index "transaction_facts_dcms_dim_id_idx" to remove
> 759969 row versions

SNIP

> DETAIL: 759969 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO: "transaction_facts": found 759969 removable, 2274280 nonremovable
> row versions in 308142 pages
> DETAIL: 0 dead row versions cannot be removed yet.
> There were 15710471 unused item pointers.
> 266986 pages contain useful free space.
> 0 pages are entirely empty.
> CPU 58.00s/35.59u sec elapsed 3240.94 sec.

That's about 32% dead rows. Might be worth scheduling a vacuum full,
but it's not like I was afraid it might be. It looks to me like you
could probably use a faster I/O subsystem in that machine though.

If the random page cost being lower fixes your issues, then I'd just
run with it lower for now. note that while lowering it may fix one
query, it may break another. Tuning pgsql, like any database, is as
much art as science...
> That's about 32% dead rows. Might be worth scheduling a vacuum full,
> but it's not like I was afraid it might be. It looks to me like you
> could probably use a faster I/O subsystem in that machine though.

I'll try to schedule a full vacuum tonight. As far as I/O - it's using
SAN over fiber. Not as fast as internal SCSI though...
On 1/17/07, Jeremy Haile <jhaile@fastmail.fm> wrote:
> Maybe - I tried running the same query for an older time range that is
> less likely to be cached. The index scan took longer than my previous
> example, but still only took 16 seconds, compared to the 87 seconds
> required to seqscan the table. When I can, I'll restart the machine and
> run a comparison again to get a "pure" test.
Heh, querying a different range of data was a better idea compared to rebooting.. doh.. So I think you reasonably established that an index scan for unbuffered data would still be faster than a sequential scan.
To me this is one of those cases where the optimizer doesn't understand the clustering of the data, and it is being misled by the statistics and fixed parameters it has. If you have fast disks (I think a fiber SAN probably counts here) then adjusting random_page_cost lower is reasonable, the lowest I have heard recommended is 2.0. It would be nice if the database could learn to estimate these values, as newer versions of Oracle does.
> Date is almost always a criteria in scans of this table. As mentioned
> earlier, the table is naturally built in date order - so would
> rebuilding the table help? Is it possible that even though I'm
> inserting in date order, since I delete rows so often the physical
> correlation would get disrupted as disk pages are reused? Perhaps
> clustering on the transaction_date index and periodically running
> "cluster" would help? Does vacuum full help with this at all?
Yes, cluster would rebuild the table for you. I wouldn't do anything too intrusive, run with the random_page_cost lowered, perhaps vacuum full, reindex, and see what happens. If it degrades over time, then I would start looking at partitioning or some other solution.
> Yeah - partitioning makes a lot of sense and I've thought about doing
> this in the past. Although I do run queries that cross multiple days,
> most of my queries only look at today's data, so the physical disk
> organization would likely be much better with a partitioned table setup.
> Also, since I usually delete old data one day at a time, I could simply
> drop the old day's partition. This would make vacuuming much less of an
> issue.
Yep, my thoughts exactly. Partitioning support in PostgreSQL is there, but it needs a bit more of a tighter integration into the core (I shouldn't have to create a view, n tables, n rules, etc). Additionally, I have read that at some point when you have "y" partitions the performance degrades, haven't really looked into it myself.
--
Chad
http://www.postgresqlforums.com/
On Wed, 2007-01-17 at 10:28, Jeremy Haile wrote:
> > That's about 32% dead rows. Might be worth scheduling a vacuum full,
> > but it's not like I was afraid it might be. It looks to me like you
> > could probably use a faster I/O subsystem in that machine though.
>
> I'll try to schedule a full vacuum tonight. As far as I/O - it's using
> SAN over fiber. Not as fast as internal SCSI though...

Also, look at the thread going by about index bloat by 4x. You'll
likely want to reindex after a vacuum full since vacuum full doesn't
reclaim space in indexes and in fact often bloats indexes.
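(Concretely, the post-VACUUM FULL step could be as simple as this - REINDEX
also takes exclusive locks, so it belongs in the same maintenance window:)

-- rebuild every index on the table from scratch
reindex table transaction_facts;
-- or rebuild just the hottest one
reindex index transaction_facts_transaction_date_idx;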
> It would be nice if the database could
> learn to estimate these values, as newer versions of Oracle does.

That would be really nice since it would take some of the guess work
out of it.

> Yes, cluster would rebuild the table for you. I wouldn't do anything too
> intrusive, run with the random_page_cost lowered, perhaps vacuum full,
> reindex, and see what happens.

I'll try doing the vacuum full and reindex tonight since they require
exclusive locks.

> Yep, my thoughts exactly. Partitioning support in PostgreSQL is there,
> but it needs a bit more of a tighter integration into the core (I shouldn't
> have to create a view, n tables, n rules, etc). Additionally, I have read
> that at some point when you have "y" partitions the performance degrades,
> haven't really looked into it myself.

Yeah - I haven't set up partitioning in PostgreSQL before, although
I've read quite a bit about it. I've talked about getting improved
syntax for partitioning in PostgreSQL. MySQL's syntax is much simpler
and more intuitive compared to setting them up with Postgres - it would
be nice if PostgreSQL adopted a similar syntax where partitions were
first-class citizens.
> Also, look at the thread going by about index bloat by 4x. You'll
> likely want to reindex after a vacuum full since vacuum full doesn't
> reclaim space in indexes and in fact often bloats indexes.

Thanks for the pointer. That thread might indeed apply to my
situation. I'm going to reindex the table tonight.

Jeremy Haile
>> Assuming the table's NOT bloated, you may do well to increase the
>> effective_cache_size, which doesn't allocate anything,
> <snip>
>> try setting it to something like 512MB or so.
>
> It's currently set to 1000MB.
>
>> If your table is bloating, and you don't have idle transactions hanging
>> off the database, it could be that your fsm settings are too low.
>
> fsm is currently set to 2000000. Is there any harm in setting it too
> high? =)

I generally recommend this - it's a nice list of the most important
settings in postgresql.conf (with respect to performance), along with a
short explanation and suggested values:

   http://www.powerpostgresql.com/PerfList

I'm using it as a general guide when setting up and tuning our servers.
Anyway, as someone already pointed out, it's an art to choose the
proper values - there's nothing like 'the only best values'.

tomas
> That's about 32% dead rows. Might be worth scheduling a vacuum full,
> but it's not like I was afraid it might be. It looks to me like you
> could probably use a faster I/O subsystem in that machine though.
>
> If the random page cost being lower fixes your issues, then I'd just run
> with it lower for now. note that while lowering it may fix one query,
> it may break another. Tuning pgsql, like any database, is as much art
> as science...

A nice feature of postgresql is the ability to log the 'slow queries'
(exceeding some time limit) - you can use it to compare the performance
of various settings. We're using it to 'detect' stupid SQL etc.

Just set it reasonably (the value depends on you), for example we used
about 500ms originally and after about two months of improvements we
lowered it to about 100ms.

You can analyze the log by hand, but about a year ago I've written a
tool to parse it and build a set of HTML reports (with an overview and
details about each query) along with graphs and examples of queries.

You can get it here: http://opensource.pearshealthcyber.cz/

Just beware, it's written in PHP and it definitely is not perfect:

(1) memory requirements (about 4x the size of the log)
(2) not too fast (about 20mins of P4@3GHz for a 200MB log)
(3) it requires a certain log format (see the page)

I did some improvements to the script recently, but forgot to upload
it. I'll do that tomorrow.

Tomas
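(The setting being described is log_min_duration_statement; a sketch of the
postgresql.conf line, using the thresholds mentioned above:)

# log any statement that runs longer than 500 ms
# (value is in milliseconds; 0 logs every statement, -1 disables the feature)
log_min_duration_statement = 500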
Interesting - I haven't seen that tool before. I'll have to check it
out when I get a chance. Thanks!

On Wed, 17 Jan 2007 20:32:37 +0100, "Tomas Vondra" <tv@fuzzy.cz> said:
> > That's about 32% dead rows. Might be worth scheduling a vacuum full,
> > but it's not like I was afraid it might be. It looks to me like you
> > could probably use a faster I/O subsystem in that machine though.
> >
> > If the random page cost being lower fixes your issues, then I'd just run
> > with it lower for now. note that while lowering it may fix one query,
> > it may break another. Tuning pgsql, like any database, is as much art
> > as science...
>
> A nice feature of postgresql is the ability to log the 'slow queries'
> (exceeding some time limit) - you can use it to compare the performance
> of various settings. We're using it to 'detect' stupid SQL etc.
>
> Just set it reasonably (the value depends on you), for example we used
> about 500ms originally and after about two months of improvements we
> lowered it to about 100ms.
>
> You can analyze the log by hand, but about a year ago I've written a
> tool to parse it and build a set of HTML reports (with an overview and
> details about each query) along with graphs and examples of queries.
>
> You can get it here: http://opensource.pearshealthcyber.cz/
>
> Just beware, it's written in PHP and it definitely is not perfect:
>
> (1) memory requirements (about 4x the size of the log)
> (2) not too fast (about 20mins of P4@3GHz for a 200MB log)
> (3) it requires a certain log format (see the page)
>
> I did some improvements to the script recently, but forgot to upload it.
> I'll do that tomorrow.
>
> Tomas
Hey there;

I've been lurking on this list awhile, and I've been working with
postgres for a number of years so I'm not exactly new to this. But I'm
still having trouble getting a good balance of settings and I'd like to
see what other people think. We may also be willing to hire a
contractor to help tackle this problem if anyone is interested.

I've got an application here that runs large (in terms of length -- the
queries have a lot of conditions in them) queries that can potentially
return millions of rows but on average probably return tens of
thousands of rows. It's read only for most of the day, and pretty much
all the queries except one are really fast.

However, each night we load data from a legacy cobol system into the
SQL system and then we summarize that data to make the reports faster.
This load process is intensely insert/update driven but also has a
hefty amount of selects as well. This load process is taking ever
longer to complete.

SO ... our goal here is to make this load process take less time. It
seems the big part is building the big summary table; this big summary
table is currently 9 million rows big. Every night, we drop the table,
re-create it, build the 9 million rows of data (we use COPY to put the
data in when it's prepared, not INSERT), and then build the indexes on
it -- of which there are many. Unfortunately this table gets queried
in a lot of different ways and needs these indexes; also unfortunately,
we have operator class indexes to support both ASC and DESC sorting on
columns so these are for all intents and purposes duplicate but
required under Postgres 8.1 (we've recently upgraded to Postgres 8.2,
is this still a requirement?)

Building these indexes takes forever! It's a long grind through
inserts and then building the indexes takes a hefty amount of time too.
(about 9 hours). Now, the application is likely part at fault, and
we're working to make it more efficient, but it has nothing to do with
the index building time. I'm wondering what we can do to make this
better if anything; would it be better to leave the indexes on? It
doesn't seem to be. Would it be better to use INSERTs instead of
copies? Doesn't seem to be.

Anyway -- ANYTHING we can do to make this go faster is appreciated :)
Here's some vital statistics:

- Machine is a 16 GB, 4 actual CPU dual-core opteron system using SCSI
discs. The disc configuration seems to be a good one, it's the best of
all the ones we've tested so far.

- The load process itself takes about 6 gigs of memory, the rest is
free for postgres because this is basically all the machine does.

- If this was your machine and situation, how would you lay out the
memory settings? What would you set the FSM to? Would you leave the
bgwriter on or off? We've already got FSYNC off because "data
integrity" doesn't matter -- this stuff is religiously backed up and
we've got no problem reinstalling it. Besides, in order for this
machine to go down, data integrity of the DB is the least of the
worries :)

Do wal_buffers/full_page_writes matter if FSYNC is off? If so, what
settings? What about checkpoints?

And finally, any ideas on planner constants? Here's what I'm using:

seq_page_cost = 0.5                     # measured on an arbitrary scale
random_page_cost = 1.0                  # same scale as above
cpu_tuple_cost = 0.001                  # same scale as above
cpu_index_tuple_cost = 0.0001           # same scale as above
cpu_operator_cost = 0.00025             # same scale as above
effective_cache_size = 679006

I really don't remember how I came up with that effective_cache_size
number.... Anyway...
any advice would be appreciated :) Steve
> And finally, any ideas on planner constants? Here's what I'm using:
>
> seq_page_cost = 0.5                     # measured on an arbitrary scale
> random_page_cost = 1.0                  # same scale as above
> cpu_tuple_cost = 0.001                  # same scale as above
> cpu_index_tuple_cost = 0.0001           # same scale as above
> cpu_operator_cost = 0.00025             # same scale as above
> effective_cache_size = 679006
>
> I really don't remember how I came up with that effective_cache_size
> number....

I don't have much experience with the way your application works, but:

1) What is the size of the whole database? Does that fit in your
memory? That's the first thing I'd like to know and I can't find it in
your post.

I'm missing several other important values too - namely

  shared_buffers
  max_fsm_pages
  work_mem
  maintenance_work_mem

BTW, is the autovacuum daemon running? If yes, try to stop it during
the import (and run ANALYZE after the import of data).

2) What is the size of a disc page? Without that we can only guess what
the effective_cache_size number means - in the usual case it's 8kB,
thus giving about 5.2 GiB of memory.

As suggested in http://www.powerpostgresql.com/PerfList I'd increase
that to about 1.400.000, which is about 10.5 GiB (about 2/3 of RAM).
Anyway - don't be afraid this breaks something. This is just an
information for PostgreSQL how much memory the OS is probably using as
a filesystem cache. PostgreSQL uses this to evaluate the probability
that the page is in a cache.

3) What is the value of maintenance_work_mem? This is a very important
value for CREATE INDEX (and some others). The lower this value is, the
slower the CREATE INDEX is. So try to increase the value as much as you
can - this could / should improve the import performance considerably.

But be careful - this does influence the amount of memory allocated by
PostgreSQL. Being in your position I wouldn't do this in the
postgresql.conf - I'd do that in the connection used by the import
using the SET command, i.e. something like

  SET maintenance_work_mem = 524288;
  CREATE INDEX ...
  CREATE INDEX ...
  CREATE INDEX ...
  CREATE INDEX ...

for 512 MiB of maintenance_work_mem. Maybe even a higher value could be
used (1 GiB?). Just try to fiddle with this a little.

4) Try to set up some performance monitoring - for example 'dstat' is a
nice way to do that. This way you can find out where the bottleneck is
(memory, I/O etc.)

That's basically all I can think of right now.

Tomas
On 1/17/07, Steve <cheetah@tanabi.org> wrote:
> However, each night we load data from a legacy COBOL system into the SQL system and then we summarize that data to make the reports faster. This load process is intensely insert/update driven but also has a hefty amount of selects as well. This load process is taking ever longer to complete.
How many rows do you typically load each night? If it is say less than 10% of the total rows, then perhaps the suggestion in the next paragraph is reasonable.
> SO ... our goal here is to make this load process take less time. It seems the big part is building the big summary table; this big summary table is currently 9 million rows big. Every night, we drop the table, re-create it, build the 9 million rows of data (we use COPY to put the data in when it's prepared, not INSERT), and then build the indexes on it
Perhaps placing a trigger on the source table and building a "change log" would be useful. For example, you could scan the change log (looking for inserts, updates, and deletes) and integrate those changes into your summary table. Obviously if you are using complex aggregates it may not be possible to adjust the summary table, but if you are performing simple SUMs, COUNTs, etc. then this is a workable solution.
--
Chad
http://www.postgresqlforums.com/
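A minimal sketch of the change-log idea above (the table source_table, its integer key column id, and every other name here are invented for illustration; this is not code from the thread):

CREATE TABLE source_change_log (
    change_id   bigserial PRIMARY KEY,
    op          char(1)   NOT NULL,              -- 'I', 'U' or 'D'
    source_id   integer   NOT NULL,              -- key of the changed source row
    changed_at  timestamp NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_source_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO source_change_log (op, source_id) VALUES ('D', OLD.id);
        RETURN OLD;
    END IF;
    INSERT INTO source_change_log (op, source_id)
    VALUES (CASE WHEN TG_OP = 'UPDATE' THEN 'U' ELSE 'I' END, NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER source_change_log_trg
    AFTER INSERT OR UPDATE OR DELETE ON source_table
    FOR EACH ROW EXECUTE PROCEDURE log_source_change();

The nightly job would then read source_change_log, fold the logged rows into the summary table, and truncate the log, rather than rebuilding all 9 million rows.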
Steve wrote:

> SO ... our goal here is to make this load process take less time. It seems the big part is building the big summary table; this big summary table is currently 9 million rows big. Every night, we drop the table, re-create it, build the 9 million rows of data (we use COPY to put the data in when it's prepared, not INSERT), and then build the indexes on it -- of which there are many.

Would it be possible to just update the summary table, instead of recreating it from scratch every night?

> Unfortunately this table gets queried in a lot of different ways and needs these indexes; also unfortunately, we have operator-class indexes to support both ASC and DESC sorting on columns, so these are for all intents and purposes duplicates but required under Postgres 8.1 (we've recently upgraded to Postgres 8.2 -- is this still a requirement?)

I don't think this has changed in 8.2.

> Building these indexes takes forever! It's a long grind through inserts and then building the indexes takes a hefty amount of time too (about 9 hours). Now, the application is likely partly at fault, and we're working to make it more efficient, but that has nothing to do with the index building time. I'm wondering what we can do to make this better if anything; would it be better to leave the indexes on? It doesn't seem to be. Would it be better to use INSERTs instead of COPY? Doesn't seem to be.

Would it help if you created multiple indexes simultaneously? You have enough CPU to do it. Is the index creation CPU or I/O bound? 9 million rows should fit in 16 GB of memory, right?

> - The load process itself takes about 6 gigs of memory; the rest is free for postgres because this is basically all the machine does.

Can you describe the load process in more detail? What's it doing with the 6 gigs?

> - If this was your machine and situation, how would you lay out the memory settings? What would you set the FSM to?

FSM seems irrelevant here..

> Do wal_buffers/full_page_writes matter if fsync is off?

Better turn off full_page_writes, since you can kiss goodbye to data integrity anyway with fsync=off.

> Anyway... any advice would be appreciated :)

What's your maintenance_work_mem setting? It can make a big difference in sorting the data for indexes.

If you could post the schema including the indexes, people might have more ideas...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
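A sketch of the "multiple indexes simultaneously" idea (index and column names invented): a single CREATE INDEX in 8.2 uses only one CPU, so the parallelism has to come from several connections, each building a different index. Plain CREATE INDEX takes a SHARE lock, which does not conflict with another CREATE INDEX on the same table.

-- connection 1:
SET maintenance_work_mem = 1048576;   -- 1 GB, in kB
CREATE INDEX big_summary_col_a_idx ON big_summary (col_a);

-- connection 2, running at the same time:
SET maintenance_work_mem = 1048576;
CREATE INDEX big_summary_col_b_idx ON big_summary (col_b);

-- ... one connection per index, up to roughly the number of cores, as long as
-- the combined maintenance_work_mem of all sessions still fits in RAM.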
> 1) What is the size of the whole database? Does that fit in your memory? That's the first thing I'd like to know and I can't find it in your post.

Current on-disk size is about 51 gig. I'm not sure if there's a different size I should be looking at instead, but that's what du tells me the directory for the database in the "base" directory is sized at. So, no, it doesn't fit into memory all the way.

> I'm missing several other important values too - namely
>
> shared_buffers
> max_fsm_pages
> work_mem
> maintenance_work_mem

I didn't share these because they've been in flux :) I've been experimenting with different values, but currently we're using:

shared_buffers = 8GB
max_fsm_pages = 100000
work_mem = 256MB
maintenance_work_mem = 6GB

> BTW, is the autovacuum daemon running? If yes, try to stop it during the import (and run ANALYZE after the import of data).

No, all vacuums are done explicitly since the database doesn't change during the day. The 'order of operations' is:

- Load COBOL data into database (inserts/updates)
- VACUUM COBOL data
- Summarize COBOL data (inserts/updates with the big table using COPY)
- VACUUM summary tables

So everything gets vacuumed as soon as it's updated.

> 2) What is the size of a disc page? Without that we can only guess what the effective_cache_size number means - in the usual case it's 8 kB, thus giving about 5.2 GiB of memory.

I believe it's 8 kB. I definitely haven't changed it :)

> As suggested in http://www.powerpostgresql.com/PerfList I'd increase that to about 1,400,000, which is about 10.5 GiB (about 2/3 of RAM).
>
> Anyway - don't be afraid this breaks something. This is just information for PostgreSQL about how much memory the OS is probably using as a filesystem cache. PostgreSQL uses it to evaluate the probability that a page is in cache.

Okay, I'll try the value you recommend. :)

> 3) What is the value of maintenance_work_mem? This is a very important value for CREATE INDEX (and some other operations). The lower this value is, the slower CREATE INDEX is. So try to increase the value as much as you can - this could / should improve the import performance considerably.
>
> But be careful - this does influence the amount of memory allocated by PostgreSQL. Being in your position I wouldn't do this in postgresql.conf - I'd do it in the connection used by the import using the SET command, ie. something like
>
> SET maintenance_work_mem = 524288;
> CREATE INDEX ...
> CREATE INDEX ...
> CREATE INDEX ...
> CREATE INDEX ...
>
> for 512 MiB of maintenance_work_mem. Maybe an even higher value could be used (1 GiB?). Just try to fiddle with this a little.

It's currently at 6GB in postgresql.conf, though you have a good point in that maybe it should be raised only right before the indexes are made, to save room. Things are certainly kinda tight in the config as is.

> 4) Try to set up some performance monitoring - for example 'dstat' is a nice way to do that. This way you can find out where the bottleneck is (memory, I/O etc.)
>
> That's basically all I can think of right now.

Thanks for the tips :)

Steve
On Wed, 17 Jan 2007, Benjamin Minshall wrote:

>> Building these indexes takes forever!
>>
>> Anyway -- ANYTHING we can do to make this go faster is appreciated :) Here's some vital statistics:
>>
>> - Machine is a 16 GB, 4 actual CPU dual-core Opteron system using SCSI discs. The disc configuration seems to be a good one; it's the best of all the ones we've tested so far.
>
> What are your shared_buffers, work_mem, and maintenance_work_mem settings?
>
> maintenance_work_mem is used for CREATE INDEX, and with 16GB of memory in the machine, maintenance_work_mem should be set to at least 1GB in my opinion.

shared_buffers = 8GB
work_mem = 256MB
maintenance_work_mem = 6GB

So that should be covered, unless I'm using too much memory and swapping. It does look like it's swapping a little, but not too badly as far as I can tell. I'm thinking of dialing back everything a bit, but I'm not really sure what the heck to do :) It's all guessing for me right now.

Steve
> Building these indexes takes forever!
>
> Anyway -- ANYTHING we can do to make this go faster is appreciated :) Here's some vital statistics:
>
> - Machine is a 16 GB, 4 actual CPU dual-core Opteron system using SCSI discs. The disc configuration seems to be a good one; it's the best of all the ones we've tested so far.

What are your shared_buffers, work_mem, and maintenance_work_mem settings?

maintenance_work_mem is used for CREATE INDEX, and with 16GB of memory in the machine, maintenance_work_mem should be set to at least 1GB in my opinion.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz
On Wed, 2007-01-17 at 15:58, Steve wrote:

> On Wed, 17 Jan 2007, Benjamin Minshall wrote:
>
> >> Building these indexes takes forever!
> >>
> >> Anyway -- ANYTHING we can do to make this go faster is appreciated :) Here's some vital statistics:
> >>
> >> - Machine is a 16 GB, 4 actual CPU dual-core Opteron system using SCSI discs. The disc configuration seems to be a good one; it's the best of all the ones we've tested so far.
> >
> > What are your shared_buffers, work_mem, and maintenance_work_mem settings?
> >
> > maintenance_work_mem is used for CREATE INDEX, and with 16GB of memory in the machine, maintenance_work_mem should be set to at least 1GB in my opinion.
>
> shared_buffers = 8GB
> work_mem = 256MB
> maintenance_work_mem = 6GB
>
> So that should be covered, unless I'm using too much memory and swapping. It does look like it's swapping a little, but not too badly as far as I can tell. I'm thinking of dialing back everything a bit, but I'm not really sure what the heck to do :) It's all guessing for me right now.

Generally speaking, once you've gotten to the point of swapping, even a little, you've gone too far. A better approach is to pick some conservative number, like 10-25% of your RAM for shared_buffers, and a gig or so for maintenance_work_mem, and then increase them while exercising the system, and measure the difference increasing them makes.

If going from 1G shared_buffers to 2G gets you a 10% increase, then good. If going from 2G to 4G gets you a 1.2% increase, it's questionable. You should reach a point where throwing more shared_buffers stops helping before you start swapping. But you might not.

Same goes for maintenance_work_mem. Incremental changes, accompanied by reproducible benchmarks / behaviour measurements, are the way to determine the settings.

Note that you can also vary those during different times of the day. You can have maintenance_work_mem set to 1 gig during the day and crank it up to 8 gig or something while loading data. shared_buffers can't be changed without restarting the db though.
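A sketch of the "vary it at load time" point (the value here is illustrative): keep a conservative maintenance_work_mem in postgresql.conf and raise it only in the connection that runs the nightly build:

-- inside the nightly load connection only
SET maintenance_work_mem = 2097152;   -- 2 GB, in kB, for the CREATE INDEX statements
-- ... COPY the summary data, then build the indexes ...
RESET maintenance_work_mem;           -- back to the postgresql.conf default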
> How many rows do you typically load each night? If it is say less than 10% of the total rows, then perhaps the suggestion in the next paragraph is reasonable.

Hrm. It's very, very variable. I'd say it's more than 10% on average, and it can actually be pretty close to 50-100% on certain days. Our data is based upon customer submissions, and normally it's a daily-basis kind of deal, but sometimes they'll resubmit their entire year on certain deadlines to make sure it's all in. Now, we don't have to optimize for those deadlines, just the 'average daily load'. It's okay if on those deadlines it takes forever, because that's understandable.

However, I will look into this and see if I can figure out this average value. This may be a valid idea, and I'll look some more at it.

Thanks!

Steve

>> SO ... our goal here is to make this load process take less time. It seems the big part is building the big summary table; this big summary table is currently 9 million rows big. Every night, we drop the table, re-create it, build the 9 million rows of data (we use COPY to put the data in when it's prepared, not INSERT), and then build the indexes on it
>
> Perhaps placing a trigger on the source table and building a "change log" would be useful. For example, you could scan the change log (looking for inserts, updates, and deletes) and integrate those changes into your summary table. Obviously if you are using complex aggregates it may not be possible to adjust the summary table, but if you are performing simple SUMs, COUNTs, etc. then this is a workable solution.
>
> --
> Chad
> http://www.postgresqlforums.com/
On 17-Jan-07, at 3:41 PM, Steve wrote:

> Hey there;
>
> I've been lurking on this list awhile, and I've been working with postgres for a number of years so I'm not exactly new to this. But I'm still having trouble getting a good balance of settings and I'd like to see what other people think. We may also be willing to hire a contractor to help tackle this problem if anyone is interested.
>
> I've got an application here that runs large (in terms of length -- the queries have a lot of conditions in them) queries that can potentially return millions of rows but on average probably return tens of thousands of rows. It's read only for most of the day, and pretty much all the queries except one are really fast.
>
> However, each night we load data from a legacy COBOL system into the SQL system and then we summarize that data to make the reports faster. This load process is intensely insert/update driven but also has a hefty amount of selects as well. This load process is taking ever longer to complete.
>
> SO ... our goal here is to make this load process take less time. It seems the big part is building the big summary table; this big summary table is currently 9 million rows big. Every night, we drop the table, re-create it, build the 9 million rows of data (we use COPY to put the data in when it's prepared, not INSERT), and then build the indexes on it -- of which there are many. Unfortunately this table gets queried in a lot of different ways and needs these indexes; also unfortunately, we have operator-class indexes to support both ASC and DESC sorting on columns, so these are for all intents and purposes duplicates but required under Postgres 8.1 (we've recently upgraded to Postgres 8.2 -- is this still a requirement?)
>
> Building these indexes takes forever! It's a long grind through inserts and then building the indexes takes a hefty amount of time too (about 9 hours). Now, the application is likely partly at fault, and we're working to make it more efficient, but that has nothing to do with the index building time. I'm wondering what we can do to make this better if anything; would it be better to leave the indexes on? It doesn't seem to be. Would it be better to use INSERTs instead of COPY? Doesn't seem to be.
>
> Anyway -- ANYTHING we can do to make this go faster is appreciated :) Here's some vital statistics:
>
> - Machine is a 16 GB, 4 actual CPU dual-core Opteron system using SCSI discs. The disc configuration seems to be a good one; it's the best of all the ones we've tested so far.

The basic problem here is simply writing all the data to disk. You are building 9M rows of data plus numerous indexes. How much data are you actually writing to the disk? Try looking at iostat while this is going on. My guess is you are maxing out the disk write speed.

> - The load process itself takes about 6 gigs of memory; the rest is free for postgres because this is basically all the machine does.
>
> - If this was your machine and situation, how would you lay out the memory settings? What would you set the FSM to? Would you leave the bgwriter on or off? We've already got fsync off because "data integrity" doesn't matter -- this stuff is religiously backed up and we've got no problem reinstalling it. Besides, in order for this machine to go down, data integrity of the DB is the least of the worries :)
>
> Do wal_buffers/full_page_writes matter if fsync is off? If so, what settings? What about checkpoints?

Not really; I'd have WAL buffers write to a RAM disk.

> And finally, any ideas on planner constants? Here's what I'm using:
>
> seq_page_cost = 0.5 # measured on an arbitrary scale
> random_page_cost = 1.0 # same scale as above
> cpu_tuple_cost = 0.001 # same scale as above
> cpu_index_tuple_cost = 0.0001 # same scale as above
> cpu_operator_cost = 0.00025 # same scale as above
> effective_cache_size = 679006

As a general rule, make shared_buffers about 25% of free memory and effective_cache_size about 75%, but with a write-intensive load like you have I think the first thing to look at is write speed.

> I really don't remember how I came up with that effective_cache_size number....
>
> Anyway... any advice would be appreciated :)
>
> Steve
> Would it be possible to just update the summary table, instead of recreating it from scratch every night?

Hrm, I believe it's probably less work for the computer to do if it's rebuilt. Any number of rows may be changed during an update, not including additions, so I'd have to pull out what's changed and sync it with what's in the summary table already. It'll be a lot more selects and program-side computation to save the big copy; it might work out, but I'd say this would be my last-ditch thing. :)

>> Building these indexes takes forever! It's a long grind through inserts and then building the indexes takes a hefty amount of time too (about 9 hours). Now, the application is likely partly at fault, and we're working to make it more efficient, but that has nothing to do with the index building time. I'm wondering what we can do to make this better if anything; would it be better to leave the indexes on? It doesn't seem to be. Would it be better to use INSERTs instead of COPY? Doesn't seem to be.
>
> Would it help if you created multiple indexes simultaneously? You have enough CPU to do it. Is the index creation CPU or I/O bound? 9 million rows should fit in 16 GB of memory, right?

This is a very, very interesting idea. It looks like we're probably not fully utilizing the machine for the index build, and this could be the ticket for us. I'm going to go ahead and set up a test for this and we'll see how it goes.

> Can you describe the load process in more detail? What's it doing with the 6 gigs?

There's two halves to the load process: the loader and the summarizer. The loader is the part that takes 6 gigs; the summarizer only takes a few hundred megs.

Basically we have these COBOL files that vary in size but are usually in the hundreds-of-megs realm. These files contain new data OR updates to existing data. We load this data from the COBOL files in chunks, so that's not a place where we're burning a lot of memory.

The first thing we do is cache the list of COBOL ID codes that are already in the DB; the COBOL ID codes are really long numeric strings, so we use a sequenced integer primary key. The cache translates COBOL IDs to primary keys, and this takes most of our memory nowadays. Our cache is fast, but it's kind of a memory hog. We're working on trimming that down, but it's definitely faster than making a query for each COBOL ID.

The load is relatively fast and is considered "acceptable", and has been relatively constant in speed. It's the summarizer that's brutal.

The summarizer produces 3 main summary tables and a few secondaries that don't take much time to make. Two of them are smallish and not that big a deal, and the last one is the biggie that's 9 mil rows and growing. To produce the 9 mil row table, we query out the data in groups, do our processing, and save that data to a series of text files that are in blocks of 10,000 rows as I recall. We then copy each file into the DB (there were some issues with copying in an entire 9 mil row file in the past, which is why we don't use just one file -- those issues have been fixed, but we didn't undo the change).

> What's your maintenance_work_mem setting? It can make a big difference in sorting the data for indexes.

6 gigs currently. :)

> If you could post the schema including the indexes, people might have more ideas...

I'll have to ask first, but I'll see if I can :)

Talk to you later, and thanks for the info!

Steve
On 1/17/07, Steve <cheetah@tanabi.org> wrote:
> However, I will look into this and see if I can figure out this average value. This may be a valid idea, and I'll look some more at it.
It must be, Oracle sells it pretty heavily as a data warehousing feature ;). Oracle calls it a materialized view, and the basic premise is you have a "change" log (called a materialized log by Oracle) and you have a job that runs through the change log and applies the changes to the materialized view.
If you are using aggregates, be careful and make sure you use simple forms of those aggregates. For example, if you are using an "average" function then you should have two columns, sum and count, instead. Some aggregates are too complex and cannot be represented by this solution, and you will find that you can't update the summary tables, so definitely try to stay away from complex aggregates if you do not need them.
Here is a link to a PL/pgSQL effort that tries to simulate materialized views:
http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html
I don't know how complete it is, and it looks like there was a project started but has been abandoned for the last 3 years.
--
Chad
http://www.postgresqlforums.com/
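A small sketch of the "simple forms of aggregates" point (table and column names invented for illustration): store the additive pieces and derive the average at query time, so a batch of changes can be folded in without recomputing the whole table.

CREATE TABLE daily_summary (
    merchant_id  integer NOT NULL,
    txn_day      date    NOT NULL,
    amount_sum   numeric NOT NULL DEFAULT 0,
    txn_count    bigint  NOT NULL DEFAULT 0,
    PRIMARY KEY (merchant_id, txn_day)
);

-- applying a batch of changes only touches the additive components
UPDATE daily_summary
   SET amount_sum = daily_summary.amount_sum + d.delta_sum,
       txn_count  = daily_summary.txn_count  + d.delta_count
  FROM change_log_deltas d              -- hypothetical per-day totals built from the change log
 WHERE daily_summary.merchant_id = d.merchant_id
   AND daily_summary.txn_day     = d.txn_day;

-- the "average" is never stored; it is derived when queried
SELECT merchant_id, txn_day,
       amount_sum / NULLIF(txn_count, 0) AS avg_amount
  FROM daily_summary;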
cheetah@tanabi.org (Steve) writes:

> I'm wondering what we can do to make this better if anything; would it be better to leave the indexes on? It doesn't seem to be.

Definitely NOT. Generating an index via a bulk sort is a LOT faster than loading data into an index one tuple at a time.

We saw a BIG increase in performance of Slony-I when, in version 1.1.5, we added a modification that shuts off indexes during COPY and then does a reindex. Conceivably, you might look at how Slony-I does that, and try doing the same thing; it might well be faster than doing a bunch of reindexes serially. (Or not...)

> Would it be better to use INSERTs instead of copies? Doesn't seem to be.

I'd be mighty surprised.

> - The load process itself takes about 6 gigs of memory; the rest is free for postgres because this is basically all the machine does.

The settings you have do not seem conspicuously wrong in any way.

The one thought which comes to mind is that if you could turn this into a *mostly* incremental change, that might help. The thought:

- Load the big chunk of data into a new table

- Generate some minimal set of indices on the new table

- Generate four queries that compare old to new:
  q1 - See which tuples are unchanged from yesterday to today
  q2 - See which tuples have been deleted from yesterday to today
  q3 - See which tuples have been added
  q4 - See which tuples have been modified

If the "unchanged" set is extremely large, then you might see benefit to doing updates based on deleting the rows indicated by q2, inserting rows based on q3, and updating based on q4.

In principle, computing and applying those 4 queries might be quicker than rebuilding from scratch.

In principle, applying q2, then q4, then vacuuming, then q3, ought to be "optimal."

--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/linux.html
"A 'Cape Cod Salsa' just isn't right." -- Unknown
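A sketch of that q2/q4/q3 sequence (it assumes the old and new tables share a key column id and two payload columns; all names are invented for illustration):

-- q2: delete rows that no longer exist in today's data
DELETE FROM summary
 WHERE NOT EXISTS (SELECT 1 FROM summary_new n WHERE n.id = summary.id);

-- q4: update rows whose contents changed
UPDATE summary
   SET col_a = n.col_a,
       col_b = n.col_b
  FROM summary_new n
 WHERE n.id = summary.id
   AND (n.col_a IS DISTINCT FROM summary.col_a
     OR n.col_b IS DISTINCT FROM summary.col_b);

VACUUM summary;

-- q3: insert rows that are new today
INSERT INTO summary
SELECT n.*
  FROM summary_new n
 WHERE NOT EXISTS (SELECT 1 FROM summary s WHERE s.id = n.id);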
> Generally speaking, once you've gotten to the point of swapping, even a little, you've gone too far. A better approach is to pick some conservative number, like 10-25% of your RAM for shared_buffers, and a gig or so for maintenance_work_mem, and then increase them while exercising the system, and measure the difference increasing them makes.
>
> If going from 1G shared_buffers to 2G gets you a 10% increase, then good. If going from 2G to 4G gets you a 1.2% increase, it's questionable. You should reach a point where throwing more shared_buffers stops helping before you start swapping. But you might not.
>
> Same goes for maintenance_work_mem. Incremental changes, accompanied by reproducible benchmarks / behaviour measurements, are the way to determine the settings.
>
> Note that you can also vary those during different times of the day. You can have maintenance_work_mem set to 1 gig during the day and crank it up to 8 gig or something while loading data. shared_buffers can't be changed without restarting the db though.

I'm currently benchmarking various configuration adjustments. Problem is, these tests take a really long time because I have to run the load process... which is like a 9 hour deal. That's why I'm asking for advice here, because there's a lot of variables here and it's really time costly to test :)

I'm still working on the benchmarking and by Friday I should have some interesting statistics to work with and maybe help figure out what's going on.

Thanks!

Steve
> The thought:
>
> - Load the big chunk of data into a new table
>
> - Generate some minimal set of indices on the new table
>
> - Generate four queries that compare old to new:
>   q1 - See which tuples are unchanged from yesterday to today
>   q2 - See which tuples have been deleted from yesterday to today
>   q3 - See which tuples have been added
>   q4 - See which tuples have been modified
>
> If the "unchanged" set is extremely large, then you might see benefit to doing updates based on deleting the rows indicated by q2, inserting rows based on q3, and updating based on q4.
>
> In principle, computing and applying those 4 queries might be quicker than rebuilding from scratch.
>
> In principle, applying q2, then q4, then vacuuming, then q3, ought to be "optimal."

This looks like an interesting idea, and I'm going to take a look at how feasible it'll be to implement. I may be able to combine this with Mr. Wagner's idea to make a much more efficient system overall. It's going to be a pretty big programming task, but I've a feeling this summarizer thing may just need to be re-written with a smarter system like this to get something faster.

Thanks!

Steve
On Wed, 2007-01-17 at 18:27, Steve wrote:

> > Generally speaking, once you've gotten to the point of swapping, even a little, you've gone too far. A better approach is to pick some conservative number, like 10-25% of your RAM for shared_buffers, and a gig or so for maintenance_work_mem, and then increase them while exercising the system, and measure the difference increasing them makes.
> >
> > If going from 1G shared_buffers to 2G gets you a 10% increase, then good. If going from 2G to 4G gets you a 1.2% increase, it's questionable. You should reach a point where throwing more shared_buffers stops helping before you start swapping. But you might not.
> >
> > Same goes for maintenance_work_mem. Incremental changes, accompanied by reproducible benchmarks / behaviour measurements, are the way to determine the settings.
> >
> > Note that you can also vary those during different times of the day. You can have maintenance_work_mem set to 1 gig during the day and crank it up to 8 gig or something while loading data. shared_buffers can't be changed without restarting the db though.
>
> I'm currently benchmarking various configuration adjustments. Problem is, these tests take a really long time because I have to run the load process... which is like a 9 hour deal. That's why I'm asking for advice here, because there's a lot of variables here and it's really time costly to test :)
>
> I'm still working on the benchmarking and by Friday I should have some interesting statistics to work with and maybe help figure out what's going on.

You can probably take a portion of what you're loading and make a benchmark of the load process that is repeatable (same data, size, etc...) each time, but only takes 30 minutes to an hour to run each time. That shortens your test iteration AND makes it reliably repeatable.
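One way to carve out such a repeatable benchmark slice (table and column names invented for illustration): freeze one representative day of source data once, and point the benchmark runs at that snapshot so every iteration processes exactly the same rows.

-- done once
CREATE TABLE cobol_staging_sample AS
SELECT *
  FROM cobol_staging                   -- hypothetical staging table the loader fills
 WHERE load_date = DATE '2007-01-10';  -- a fixed, representative day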
On 1/17/07, Steve <cheetah@tanabi.org> wrote:

> Hey there; I've been lurking on this list awhile, and I've been working with postgres for a number of years so I'm not exactly new to this. But I'm still having trouble getting a good balance of settings and I'd like to see what other people think. We may also be willing to hire a contractor to help tackle this problem if anyone is interested.

I happen to be something of a cobol->sql expert; if you are interested in some advice you can contact me off-list. I converted an enterprise COBOL (in AcuCOBOL) app to PostgreSQL by plugging pg into the COBOL system via a custom C++ ISAM driver.

> I've got an application here that runs large (in terms of length -- the queries have a lot of conditions in them) queries that can potentially return millions of rows but on average probably return tens of thousands of rows. It's read only for most of the day, and pretty much all the queries except one are really fast.

If it's just one query, I think I'd focus on optimizing that query, not .conf settings. In my opinion .conf tuning (a few gotchas aside) doesn't really get you all that much.

> However, each night we load data from a legacy COBOL system into the SQL system and then we summarize that data to make the reports faster. This load process is intensely insert/update driven but also has a hefty amount of selects as well. This load process is taking ever longer to complete.
>
> SO ... our goal here is to make this load process take less time. It seems the big part is building the big summary table; this big summary table is currently 9 million rows big. Every night, we drop the table, re-create it, build the 9 million rows of data (we use COPY to put the data in when it's prepared, not INSERT), and then build the indexes on it -- of which there are many. Unfortunately this table gets queried in a lot of different ways and needs these indexes; also unfortunately, we have operator-class indexes to support both ASC and DESC sorting on columns

I have some very specific advice here. Check out the row-wise comparison feature introduced in 8.2.

> so these are for all intents and purposes duplicates but required under Postgres 8.1 (we've recently upgraded to Postgres 8.2 -- is this still a requirement?)
>
> Building these indexes takes forever! It's a long grind through inserts and then building the indexes takes a hefty amount of time too (about 9 hours). Now, the application is likely partly at fault, and we're working to make it more efficient, but that has nothing to do with the index building time. I'm wondering what we can do to make this better if anything; would it be better to leave the indexes on? It doesn't seem to be. Would it be better to use INSERTs instead of copies? Doesn't seem to

No. Probably any optimization strategies would focus on reducing the amount of data you had to load.

merlin
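For reference, a sketch of the 8.2 row-wise comparison Merlin points to (table and column names invented; whether it removes the need for any particular index depends on the actual queries): an ascending index on (col_a, col_b) can satisfy a "resume after the last row seen" scan directly.

-- uses the (col_a, col_b) index to start just past the last row already returned
SELECT *
  FROM big_summary
 WHERE (col_a, col_b) > (42, DATE '2007-01-17')
 ORDER BY col_a, col_b
 LIMIT 100;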