Thread: PG8.2.1 choosing slow seqscan over idx scan

PG8.2.1 choosing slow seqscan over idx scan

From
"Jeremy Haile"
Date:
Running PostgreSQL 8.2.1 on Win32.   The query planner is choosing a seq
scan over index scan even though index scan is faster (as shown by
disabling seqscan).  Table is recently analyzed and row count estimates
seem to be in the ballpark.

Another tidbit - I haven't done a "vacuum full" ever, although I "vacuum
analyze" regularly (and autovacuum).  I recently noticed that the PG
data drive is 40% fragmented (NTFS).  Could that be making the seqscan
slower than it should be?  Regardless of the fragmentation's effect on
performance, is the query planner making a good decision here?


SOME CONFIGURATION PARAMS
effective_cache_size=1000MB
random_page_cost=3
default_statistics_target=50
shared_buffers=400MB
temp_buffers=10MB
work_mem=10MB
checkpoint_segments=12


QUERY
select merchant_dim_id,
       dcms_dim_id,
       sum(success) as num_success,
       sum(failed) as num_failed,
       count(*) as total_transactions,
       (sum(success) * 1.0 / count(*)) as success_rate
from transaction_facts
where transaction_date >= '2007-1-16'
and transaction_date < '2007-1-16 15:20'
group by merchant_dim_id, dcms_dim_id;


EXPLAIN ANALYZE (enable_seqscan=true)
HashAggregate  (cost=339573.01..340089.89 rows=15904 width=16) (actual
time=140606.593..140650.573 rows=10549 loops=1)
  ->  Seq Scan on transaction_facts  (cost=0.00..333928.25 rows=322558
  width=16) (actual time=19917.957..140036.910 rows=347434 loops=1)
        Filter: ((transaction_date >= '2007-01-16 00:00:00'::timestamp
        without time zone) AND (transaction_date < '2007-01-16
        15:20:00'::timestamp without time zone))
Total runtime: 140654.813 ms


EXPLAIN ANALYZE (enable_seqscan=false)
HashAggregate  (cost=379141.53..379658.41 rows=15904 width=16) (actual
time=3720.838..3803.748 rows=10549 loops=1)
  ->  Bitmap Heap Scan on transaction_facts  (cost=84481.80..373496.76
  rows=322558 width=16) (actual time=244.568..3133.741 rows=347434
  loops=1)
        Recheck Cond: ((transaction_date >= '2007-01-16
        00:00:00'::timestamp without time zone) AND (transaction_date <
        '2007-01-16 15:20:00'::timestamp without time zone))
        ->  Bitmap Index Scan on transaction_facts_transaction_date_idx
        (cost=0.00..84401.16 rows=322558 width=0) (actual
        time=241.994..241.994 rows=347434 loops=1)
              Index Cond: ((transaction_date >= '2007-01-16
              00:00:00'::timestamp without time zone) AND
              (transaction_date < '2007-01-16 15:20:00'::timestamp
              without time zone))
Total runtime: 3810.795 ms
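
The two plans above come from toggling enable_seqscan for the session and
re-running the query, roughly along these lines (a sketch, not the exact
session):

SET enable_seqscan = off;            -- forces the bitmap/index plan above
EXPLAIN ANALYZE <the query above>;
SET enable_seqscan = on;             -- back to the default seq scan plan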

Re: PG8.2.1 choosing slow seqscan over idx scan

From
Tom Lane
Date:
"Jeremy Haile" <jhaile@fastmail.fm> writes:
> Running PostgreSQL 8.2.1 on Win32.   The query planner is choosing a seq
> scan over index scan even though index scan is faster (as shown by
> disabling seqscan).  Table is recently analyzed and row count estimates
> seem to be in the ballpark.

Try reducing random_page_cost a bit.  Keep in mind that you are probably
measuring a fully-cached situation here, if you repeated the test case.
If your database fits into memory reasonably well then that's fine and
you want to optimize for that case ... but otherwise you may find
yourself pessimizing the actual behavior.

            regards, tom lane

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Jeremy Haile"
Date:
Thanks Tom!  Reducing random_page_cost to 2 did the trick for this
query.  It now favors the index scan.
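
In case it's useful to anyone else, this kind of change can be tried per
session before committing it to postgresql.conf -- something like:

SET random_page_cost = 2;            -- session-level, just for testing
EXPLAIN ANALYZE <the query above>;

and then random_page_cost = 2 in postgresql.conf once you're happy with it.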

Even if this is a cached situation, I wouldn't expect a difference of 3
min vs 3 seconds.

Even if unrelated, do you think disk fragmentation would have negative
effects?  Is it worth trying to defragment the drive on a regular basis
in Windows?

Jeremy Haile


On Tue, 16 Jan 2007 16:39:07 -0500, "Tom Lane" <tgl@sss.pgh.pa.us> said:
> "Jeremy Haile" <jhaile@fastmail.fm> writes:
> > Running PostgreSQL 8.2.1 on Win32.   The query planner is choosing a seq
> > scan over index scan even though index scan is faster (as shown by
> > disabling seqscan).  Table is recently analyzed and row count estimates
> > seem to be in the ballpark.
>
> Try reducing random_page_cost a bit.  Keep in mind that you are probably
> measuring a fully-cached situation here, if you repeated the test case.
> If your database fits into memory reasonably well then that's fine and
> you want to optimize for that case ... but otherwise you may find
> yourself pessimizing the actual behavior.
>
>             regards, tom lane

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Chad Wagner"
Date:
On 1/16/07, Jeremy Haile <jhaile@fastmail.fm> wrote:
Even if unrelated, do you think disk fragmentation would have negative
effects?  Is it worth trying to defragment the drive on a regular basis
in Windows?

Out of curiosity, is this table heavily updated or deleted from?  Perhaps there is an unfavorable "correlation" between the btree and data?  Can you dump the results of

select attname, null_frac, avg_width, n_distinct, correlation from pg_stats where tablename = 'transaction_facts'




--
Chad
http://www.postgresqlforums.com/

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Jeremy Haile"
Date:
Hey Chad,

The table is heavily inserted and deleted from.  Recently I had done a
very large delete.

Here are the results of the query you sent me:

 attname             | null_frac | avg_width | n_distinct | correlation
---------------------+-----------+-----------+------------+-------------
 dcms_dim_id         |         0 |         4 |        755 | -0.00676181
 transaction_fact_id |         0 |         4 |         -1 | -0.194694
 failed              |         0 |         4 |          2 | 0.964946
 van16               |         0 |        23 |     145866 | 0.00978649
 vendor_response     |    0.9942 |        43 |          9 | 0.166527
 transaction_id      |         0 |         4 |         -1 | -0.199583
 transaction_date    |         0 |         8 |     172593 | -0.194848
 serial_number       | 0.0434667 |        16 |      53311 | 0.0713039
 merchant_dim_id     |         0 |         4 |        105 | 0.299335
 comment             |    0.0052 |        29 |       7885 | 0.0219167
 archived            |         0 |         1 |          2 | 0.84623
 response_code       |    0.9942 |         4 |          3 | 0.905409
 transaction_source  |         0 |         4 |          2 | 0.983851
 location_dim_id     |         0 |         4 |         86 | 0.985384
 success             |         0 |         4 |          2 | 0.981072

Just curious - what does that tell us?

Jeremy Haile

On Tue, 16 Jan 2007 17:44:53 -0500, "Chad Wagner"
<chad.wagner@gmail.com> said:
> On 1/16/07, Jeremy Haile <jhaile@fastmail.fm> wrote:
> >
> > Even if unrelated, do you think disk fragmentation would have negative
> > effects?  Is it worth trying to defragment the drive on a regular basis
> > in Windows?
> >
>
> Out of curiosity, is this table heavily updated or deleted from?  Perhaps
> there is an unfavorable "correlation" between the btree and data?  Can
> you
> dump the results of
>
> select attname, null_frac, avg_width, n_distinct, correlation from
> pg_stats
> where tablename = 'transaction_facts'
>
>
>
>
> --
> Chad
> http://www.postgresqlforums.com/

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Chad Wagner"
Date:
On 1/16/07, Jeremy Haile <jhaile@fastmail.fm> wrote:
The table is heavily inserted and deleted from.  Recently I had done a
very large delete.

That's what I suspected. 

Here is the results of the query you sent me: (sorry it's hard to read)

"transaction_date";0;8;172593;-0.194848
Just curious - what does that tell us?

Based on the explain plan you posted earlier, we learned the optimizer believed the query would return 322558 rows (and it was reasonably accurate, too) for a relatively small time frame (15 hours and 20 minutes). 

->  Seq Scan on transaction_facts  (cost=0.00..333928.25 rows=322558
 width=16) (actual time=19917.957..140036.910 rows=347434 loops=1)

Based on the information you just posted, the average row length is 156 bytes.

347434 rows * 156 bytes = 52MB (small enough that it could reasonably be held in your shared buffers, which makes Tom's suggestion very plausible -- the index scan may not really be cheaper, because everything is already cached)


The estimation of cost when you are requesting a "range" of data will involve the "correlation" factor, the correlation is defined as:

"Statistical correlation between physical row ordering and logical ordering of the column values. This ranges from -1 to +1. When the value is near -1 or +1, an index scan on the column will be estimated to be cheaper than when it is near zero, due to reduction of random access to the disk. (This column is NULL if the column data type does not have a < operator.)"

Which means that as the correlation approaches zero (and at -0.19 it is effectively zero), the physical ordering of the data in the data files is such that a range scan of the btree index would result in many scattered reads (which are slow).  So the optimizer considers whether a "scattered" read will be cheaper than a sequential scan based on a few factors: a) the correlation [for ranges], b) the size of the table, and c) the estimated cardinality [how many rows it thinks you are going to get back].  Based on those factors it may choose either access path (sequential or index).
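
For example, to look at just the column driving the range predicate (a
narrower version of the pg_stats query I sent earlier):

select attname, n_distinct, correlation
from pg_stats
where tablename = 'transaction_facts'
and attname = 'transaction_date';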

One of the reasons why the sequential scan is slower is because the optimizer doesn't know the data you are requesting is sitting in the cache (and it is VERY unlikely you have the entire table in cache, unless it is a heavily used table or very small table, which it's probably not).  So once it decides a sequential scan is the best plan, it starts chugging away at the disk and pulling in most of the pages in the table and throws away the pages that do not meet your criteria.

The index scan is quicker (which may be bogus, as Tom suggested) because it starts chugging away at the index and finds that many of the pages you are interested in are cached (but it didn't know that; you just got lucky!).

In practice, once you start pulling in 15% or more of the table it is often cheaper just to read the entire table in, rather than scatter reads + double I/O.  Remember that an index access means I have to read the index PLUS the table from disk, and a sequential scan just throws away the index and reads the table from disk.


I would suggest running explain analyze after restarting the database (that may not even be good enough, because a large portion of the data file may still be in the OS's buffer cache, hrm -- reboot? unmount?) and seeing how cheap that index access path really is.

One thing to be careful of here is that you really need to consider what the primary use of the table is, and what the major queries you will be launching against it are.  You could improve the correlation by rebuilding the table ordered by the transaction_date column, but that may screw up other range scans.  Another option is partitioning.  I wouldn't do any of this stuff until you find out whether the last tweak you made still holds true; give it a few days, and perhaps test again after a clean reboot of the server.

Let me know if any of this is inaccurate folks, as I certainly don't profess to be an expert on the internals of PostgreSQL, but I believe it is accurate based on my prior experiences with other database products.  :)


--
Chad
http://www.postgresqlforums.com/

Re: PG8.2.1 choosing slow seqscan over idx scan

From
Scott Marlowe
Date:
On Tue, 2007-01-16 at 21:58 -0500, Jeremy Haile wrote:
> Hey Chad,
>
> The table is heavily inserted and deleted from.  Recently I had done a
> very large delete.

I still keep wondering if this table is bloated with dead tuples.  Even
if you vacuum often, if there's a connection with an idle transaction,
the tuples can't be reclaimed and the table would continue to grow.

Anyway, what does

vacuum analyze tablename

say (where tablename is, of course, the name of the table we're looking
at)?  Pay particular attention to DETAIL statements.
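
You'll want VERBOSE in there to actually see the DETAIL lines, i.e.
something like:

VACUUM VERBOSE ANALYZE transaction_facts;

and then look at how many removable vs. nonremovable row versions it
reports, and how many pages the table takes up.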

Assuming the table's NOT bloated, you may do well to increase the
effective_cache_size, which doesn't allocate anything, but just tells
the query planner about how big your operating system's file cache is as
regards postgresql.  It's a bit of a coarse setting, i.e. you can make
rather large changes to it without blowing things up.  If you've got a
couple gigs on your machine, try setting it to something like 512MB or
so.

If your table is bloating, and you don't have idle transactions hanging
off the database, it could be that your fsm settings are too low.

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Jeremy Haile"
Date:
> I still keep wondering if this table is bloated with dead tuples.  Even
> if you vacuum often if there's a connection with an idle transaction,
> the tuples can't be reclaimed and the table would continue to grow.

I used to vacuum once an hour, although I've switched it to autovacuum
now.  It definitely could be bloated with dead tuples.  I'll paste the
"vacuum analyze verbose" output at the bottom of this e-mail.  Would a
vacuum full be a good idea?


> Assuming the table's NOT bloated, you may do well to increase the
> effective_cache_size, which doesn't allocate anything,
<snip>
> try setting it to something like 512MB or so.

It's currently set to 1000MB.


> If your table is bloating, and you don't have idle transactions hanging
> of the database, it could be that your fsm settings are too low.

fsm is currently set to 2000000.  Is there any harm in setting it too
high? =)

Here's the vacuum analyze verbose output:

INFO:  vacuuming "public.transaction_facts"
INFO:  scanned index "transaction_facts_pkey" to remove 759969 row
versions
DETAIL:  CPU 7.20s/2.31u sec elapsed 315.31 sec.
INFO:  scanned index "transaction_facts_dcms_dim_id_idx" to remove
759969 row versions
DETAIL:  CPU 1.29s/2.15u sec elapsed 146.98 sec.
INFO:  scanned index "transaction_facts_merchant_dim_id_idx" to remove
759969 row versions
DETAIL:  CPU 1.10s/2.10u sec elapsed 126.09 sec.
INFO:  scanned index "transaction_facts_transaction_date_idx" to remove
759969 row versions
DETAIL:  CPU 1.65s/2.40u sec elapsed 259.25 sec.
INFO:  scanned index "transaction_facts_transaction_id_idx" to remove
759969 row versions
DETAIL:  CPU 7.48s/2.85u sec elapsed 371.98 sec.
INFO:  scanned index "transaction_facts_product_date_idx" to remove
759969 row versions
DETAIL:  CPU 2.32s/2.10u sec elapsed 303.83 sec.
INFO:  scanned index "transaction_facts_merchant_product_date_idx" to
remove 759969 row versions
DETAIL:  CPU 2.48s/2.31u sec elapsed 295.19 sec.
INFO:  scanned index "transaction_facts_merchant_date_idx" to remove
759969 row versions
DETAIL:  CPU 8.10s/3.35u sec elapsed 398.73 sec.
INFO:  scanned index "transaction_facts_success_idx" to remove 759969
row versions
DETAIL:  CPU 5.01s/2.84u sec elapsed 192.73 sec.
INFO:  scanned index "transaction_facts_failed_idx" to remove 759969 row
versions
DETAIL:  CPU 1.03s/1.90u sec elapsed 123.00 sec.
INFO:  scanned index "transaction_facts_archived_idx" to remove 759969
row versions
DETAIL:  CPU 1.03s/1.39u sec elapsed 104.42 sec.
INFO:  scanned index "transaction_facts_response_code_idx" to remove
759969 row versions
DETAIL:  CPU 0.75s/2.17u sec elapsed 36.71 sec.
INFO:  scanned index "transaction_facts_transaction_source_idx" to
remove 759969 row versions
DETAIL:  CPU 0.60s/1.75u sec elapsed 42.29 sec.
INFO:  scanned index "transaction_facts_transaction_id_source_idx" to
remove 759969 row versions
DETAIL:  CPU 1.14s/1.84u sec elapsed 44.75 sec.
INFO:  "transaction_facts": removed 759969 row versions in 14360 pages
DETAIL:  CPU 0.57s/0.23u sec elapsed 45.28 sec.
INFO:  index "transaction_facts_pkey" now contains 2274280 row versions
in 152872 pages
DETAIL:  759969 index row versions were removed.
134813 index pages have been deleted, 134813 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.01 sec.
INFO:  index "transaction_facts_dcms_dim_id_idx" now contains 2274280
row versions in 85725 pages
DETAIL:  759323 index row versions were removed.
75705 index pages have been deleted, 73721 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_merchant_dim_id_idx" now contains
2274280 row versions in 80023 pages
DETAIL:  759969 index row versions were removed.
71588 index pages have been deleted, 69210 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_transaction_date_idx" now contains
2274280 row versions in 144196 pages
DETAIL:  759969 index row versions were removed.
126451 index pages have been deleted, 126451 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_transaction_id_idx" now contains 2274280
row versions in 150529 pages
DETAIL:  759969 index row versions were removed.
130649 index pages have been deleted, 130649 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_product_date_idx" now contains 2274280
row versions in 202248 pages
DETAIL:  759969 index row versions were removed.
174652 index pages have been deleted, 174652 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_merchant_product_date_idx" now contains
2274280 row versions in 202997 pages
DETAIL:  759969 index row versions were removed.
175398 index pages have been deleted, 175398 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_merchant_date_idx" now contains 2274280
row versions in 203561 pages
DETAIL:  759969 index row versions were removed.
175960 index pages have been deleted, 175960 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_success_idx" now contains 2274280 row
versions in 78237 pages
DETAIL:  759969 index row versions were removed.
70239 index pages have been deleted, 67652 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_failed_idx" now contains 2274280 row
versions in 78230 pages
DETAIL:  759969 index row versions were removed.
70231 index pages have been deleted, 67665 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_archived_idx" now contains 2274280 row
versions in 72943 pages
DETAIL:  759969 index row versions were removed.
64962 index pages have been deleted, 62363 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_response_code_idx" now contains 2274280
row versions in 16918 pages
DETAIL:  759969 index row versions were removed.
8898 index pages have been deleted, 6314 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_transaction_source_idx" now contains
2274280 row versions in 14235 pages
DETAIL:  759969 index row versions were removed.
6234 index pages have been deleted, 3663 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "transaction_facts_transaction_id_source_idx" now contains
2274280 row versions in 18053 pages
DETAIL:  759969 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "transaction_facts": found 759969 removable, 2274280 nonremovable
row versions in 308142 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 15710471 unused item pointers.
266986 pages contain useful free space.
0 pages are entirely empty.
CPU 58.00s/35.59u sec elapsed 3240.94 sec.
INFO:  analyzing "public.transaction_facts"
INFO:  "transaction_facts": scanned 15000 of 308142 pages, containing
113476 live rows and 0 dead rows; 15000 rows in sample, 2331115
estimated total rows

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Shoaib Mir"
Date:
A good idea here would be to do a VACUUM FULL first and then keep whatever autovacuum settings you want.

-----------------
Shoaib Mir
EnterpriseDB (www.enterprisedb.com)

On 1/17/07, Jeremy Haile <jhaile@fastmail.fm> wrote:
> I still keep wondering if this table is bloated with dead tuples.  Even
> if you vacuum often if there's a connection with an idle transaction,
> the tuples can't be reclaimed and the table would continue to grow.

I used to vacuum once an hour, although I've switched it to autovacuum
now.  It definitely could be bloated with dead tuples.  I'll paste the
"vacuum analyze verbose" output at the bottom of this e-mail.  Would a
vacuum full be a good idea?


<snip>

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Jeremy Haile"
Date:
Thanks for the great info Chad.  I'm learning a lot from this thread!

> 347434 rows * 156 bytes = 52MB (reasonable it could be held in your
> shared buffers, which makes Tom's suggestion very plausible, the
> index scan may not be cheaper -- because it is all cached)

Maybe - I tried running the same query for an older time range that is
less likely to be cached.  The index scan took longer than my previous
example, but still only took 16 seconds, compared to the 87 seconds
required to seqscan the table.  When I can, I'll restart the machine and
run a comparison again to get a "pure" test.


> One of the reasons why the sequential scan is slower is because the
> optimizer doesn't know the data you are requesting is sitting in the
> cache (and it is VERY unlikely you have the entire table in cache,
> unless it is a heavily used table or very small table, which it's probably
> not).

This is a large table (3 million rows).  Rows are typically inserted in
date order, although large numbers of rows are deleted every night.
Basically, this table contains a list of transactions in a rolling time
window.  So inserts happen throughout the day, and then a day's worth of
old rows are deleted every night.  The most accessed rows are going to
be today's rows, which is a small subset of the overall table. (maybe
14%)

> One thing to be careful of here is that you really need to consider what
> is  the primary use of the table, and what are the major queries you will be
> launching against it.  But you could improve the correlation by
> rebuilding the table ordered by the transaction_date column, but it may screw up
> other range scans.

Date is almost always a criterion in scans of this table.  As mentioned
earlier, the table is naturally built in date order - so would
rebuilding the table help?  Is it possible that even though I'm
inserting in date order, since I delete rows so often the physical
correlation would get disrupted as disk pages are reused?  Perhaps
clustering on the transaction_date index and periodically running
"cluster" would help?  Does vacuum full help with this at all?

> Another option is partitioning.  I wouldn't do any of this
> stuff, until you find out the last tweak you made still holds true, give
> it a few days, perhaps test it after a clean reboot of the server.

Yeah - partitioning makes a lot of sense and I've thought about doing
this in the past.  Although I do run queries that cross multiple days,
most of my queries only look at today's data, so the physical disk
organization would likely be much better with a partitioned table setup.
 Also, since I usually delete old data one day at a time, I could simply
drop the old day's partition.  This would make vacuuming much less of an
issue.
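
Just sketching it out for myself, on 8.2 (with constraint_exclusion on) a
day's partition would look roughly like this -- the child table names are
made up, and inserts would still have to be routed to the right child via a
rule, trigger, or the application:

CREATE TABLE transaction_facts_2007_01_17 (
    CHECK (transaction_date >= DATE '2007-01-17'
       AND transaction_date <  DATE '2007-01-18')
) INHERITS (transaction_facts);

-- dropping a day's worth of old data becomes a cheap metadata operation:
DROP TABLE transaction_facts_2007_01_10;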

But I won't be making any changes immediately, so I'll continue to run
tests given your advice.

Thanks again,
Jeremy Haile



Re: PG8.2.1 choosing slow seqscan over idx scan

From
Dave Cramer
Date:
On 17-Jan-07, at 9:37 AM, Jeremy Haile wrote:

>> I still keep wondering if this table is bloated with dead tuples.
>> Even
>> if you vacuum often if there's a connection with an idle transaction,
>> the tuples can't be reclaimed and the table would continue to grow.
>
> I used to vacuum once an hour, although I've switched it to autovacuum
> now.  It definitely could be bloated with dead tuples.  I'll paste the
> "vacuum analyze verbose" output at the bottom of this e-mail.  Would a
> vacuum full be a good idea?
>
>
>> Assuming the table's NOT bloated, you may do well to increase the
>> effective_cache_size, which doesn't allocate anything,
> <snip>
>> try setting it to something like 512MB or so.
>
> It's currently set to 1000MB.

How much memory does the box have?

>
>
>> If your table is bloating, and you don't have idle transactions
>> hanging
>> of the database, it could be that your fsm settings are too low.
>
> fsm is currently set to 2000000.  Is there any harm in setting it too
> high? =)

Yes, it takes up space
<snip>


Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Jeremy Haile"
Date:
> How much memory does the box have?
2GB

> Yes, it takes up space
Well, I upped max_fsm_pages to 2000000 because vacuums were failing
with it set to 1500000.  However, I'm now autovacuuming, which might be
keeping my fsm lower.  I didn't realize that setting it too high had
negative effects, so I'll try to get a better handle on how large this
needs to be.

Re: PG8.2.1 choosing slow seqscan over idx scan

From
Scott Marlowe
Date:
On Wed, 2007-01-17 at 08:37, Jeremy Haile wrote:
> > I still keep wondering if this table is bloated with dead tuples.  Even
> > if you vacuum often if there's a connection with an idle transaction,
> > the tuples can't be reclaimed and the table would continue to grow.
>
> I used to vacuum once an hour, although I've switched it to autovacuum
> now.  It definitely could be bloated with dead tuples.  I'll paste the
> "vacuum analyze verbose" output at the bottom of this e-mail.  Would a
> vacuum full be a good idea?
>
>
> > Assuming the table's NOT bloated, you may do well to increase the
> > effective_cache_size, which doesn't allocate anything,
> <snip>
> > try setting it to something like 512MB or so.
>
> It's currently set to 1000MB.

Sounds about right for a machine with 2G memory (as you mentioned
elsewhere)

> > If your table is bloating, and you don't have idle transactions hanging
> > of the database, it could be that your fsm settings are too low.
>
> fsm is currently set to 2000000.  Is there any harm in setting it too
> high? =)

As long as you don't run out of memory, it can be pretty high.  Note
that it uses shared memory, so setting it too high can cause the db to
fail to start.
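
If I remember right, each max_fsm_pages slot only costs about six bytes of
shared memory, so a postgresql.conf entry like

max_fsm_pages = 2000000        # roughly 12MB of shared memory

is cheap by itself; the risk is just the total shared memory request
growing past what the OS will hand out at startup.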

> INFO:  scanned index "transaction_facts_pkey" to remove 759969 row
> versions
> DETAIL:  CPU 7.20s/2.31u sec elapsed 315.31 sec.
> INFO:  scanned index "transaction_facts_dcms_dim_id_idx" to remove
> 759969 row versions

SNIP

> DETAIL:  759969 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  "transaction_facts": found 759969 removable, 2274280 nonremovable
> row versions in 308142 pages
> DETAIL:  0 dead row versions cannot be removed yet.
> There were 15710471 unused item pointers.
> 266986 pages contain useful free space.
> 0 pages are entirely empty.
> CPU 58.00s/35.59u sec elapsed 3240.94 sec.

That's about 32% dead rows.  It might be worth scheduling a vacuum full,
but it's not as bad as I was afraid it might be.  It looks to me like you
could probably use a faster I/O subsystem in that machine though.

If lowering the random page cost fixes your issues, then I'd just run
with it lower for now.  Note that while lowering it may fix one query,
it may break another.  Tuning pgsql, like any database, is as much art
as science...

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Jeremy Haile"
Date:
> That's about 32% dead rows.  Might be worth scheduling a vacuum full,
> but it's not like I was afraid it might be.  It looks to me like you
> could probably use a faster I/O subsystem in that machine though.

I'll try to schedule a full vacuum tonight.  As far as I/O - it's using
SAN over fiber.  Not as fast as internal SCSI though...

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Chad Wagner"
Date:
On 1/17/07, Jeremy Haile <jhaile@fastmail.fm> wrote:
Maybe - I tried running the same query for an older time range that is
less likely to be cached.  The index scan took longer than my previous
example, but still only took 16 seconds, compared to the 87 seconds
required to seqscan the table.  When I can, I'll restart the machine and
run a comparison again to get a "pure" test.

Heh, querying a different range of data was a better idea compared to rebooting.. doh..  So I think you reasonably established that an index scan for unbuffered data would still be faster than a sequential scan.

To me this is one of those cases where the optimizer doesn't understand the clustering of the data, and it is being misled by the statistics and fixed parameters it has.  If you have fast disks (I think a fiber SAN probably counts here) then adjusting random_page_cost lower is reasonable; the lowest I have heard recommended is 2.0.  It would be nice if the database could learn to estimate these values, as newer versions of Oracle do.

Date is almost always a criteria in scans of this table.  As mentioned
earlier, the table is naturally built in date order - so would
rebuilding the table help?  Is it possible that even though I'm
inserting in date order, since I delete rows so often the physical
correlation would get disrupted as disk pages are reused?  Perhaps
clustering on the transaction_date index and periodically running
"cluster" would help?  Does vacuum full help with this at all?

Yes, cluster would rebuild the table for you.  I wouldn't do anything too intrusive, run with the random_page_cost lowered, perhaps vacuum full, reindex, and see what happens.  If it degrades over time, then I would start looking at partitioning or some other solution.
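
On 8.2 that would be something along the lines of:

CLUSTER transaction_facts_transaction_date_idx ON transaction_facts;
ANALYZE transaction_facts;

Keep in mind CLUSTER takes an exclusive lock and rewrites the whole table in
index order, so it belongs in the same off-hours window as a vacuum full.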


Yeah - partitioning makes a lot of sense and I've thought about doing
this in the past.  Although I do run queries that cross multiple days,
most of my queries only look at today's data, so the physical disk
organization would likely be much better with a partitioned table setup.
Also, since I usually delete old data one day at a time, I could simply
drop the old day's partition.  This would make vacuuming much less of an
issue.

Yep, my thoughts exactly.  Partitioning support in PostgreSQL is there, but it needs a bit tighter integration into the core (I shouldn't have to create a view, n tables, n rules, etc).  Additionally, I have read that at some point when you have "y" partitions the performance degrades, but I haven't really looked into it myself.


--
Chad
http://www.postgresqlforums.com/

Re: PG8.2.1 choosing slow seqscan over idx scan

From
Scott Marlowe
Date:
On Wed, 2007-01-17 at 10:28, Jeremy Haile wrote:
> > That's about 32% dead rows.  Might be worth scheduling a vacuum full,
> > but it's not like I was afraid it might be.  It looks to me like you
> > could probably use a faster I/O subsystem in that machine though.
>
> I'll try to schedule a full vacuum tonight.  As far as I/O - it's using
> SAN over fiber.  Not as fast as internal SCSI though...

Also, look at the thread going by about index bloat by 4x.  You'll
likely want to reindex after a vacuum full since vacuum full doesn't
reclaim space in indexes and in fact often bloats indexes.
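
In other words, something like this (both take exclusive locks, so schedule
them off-hours):

VACUUM FULL transaction_facts;
REINDEX TABLE transaction_facts;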

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Jeremy Haile"
Date:
> It would be nice if the database could
> learn to estimate these values, as newer versions of Oracle does.

That would be really nice since it would take some of the guess work out
of it.

> Yes, cluster would rebuild the table for you.  I wouldn't do anything too
> intrusive, run with the random_page_cost lowered, perhaps vacuum full,
> reindex, and see what happens.

I'll try doing the vacuum full and reindex tonight since they require
exclusive locks.

> Yep, my thoughts exactly.  Partitioning support is PostgreSQL is there,
> but  it needs a bit more of a tighter integration into the core (I shouldn't
> have to create a view, n tables, n rules, etc).  Additionally, I have read
> that at some point when you have "y" partitions the performance degrades,
> haven't really looked into it myself.

Yeah - I haven't setup partitioning in PostgreSQL before, although I've
read quite a bit about it.  I've talked about getting improved syntax
for partitioning in PostgreSQL.  MySQL's syntax is much simpler and more
intuitive compared to setting them up with Postgres - it would be nice
if PostgreSQL adopted a similar syntax where partitions were first-class
citizens.

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Jeremy Haile"
Date:
> Also, look at the thread going by about index bloat by 4x.  You'll
> likely want to reindex after a vacuum full since vacuum full doesn't
> reclaim space in indexes and in fact often bloats indexes.

Thanks for the pointer.  That thread might indeed apply to my situation.
 I'm going to reindex the table tonight.

Jeremy Haile

Re: PG8.2.1 choosing slow seqscan over idx scan

From
Tomas Vondra
Date:
>> Assuming the table's NOT bloated, you may do well to increase the
>> effective_cache_size, which doesn't allocate anything,
> <snip>
>> try setting it to something like 512MB or so.
>
> It's currently set to 1000MB.
>
>
>> If your table is bloating, and you don't have idle transactions hanging
>> of the database, it could be that your fsm settings are too low.
>
> fsm is currently set to 2000000.  Is there any harm in setting it too
> high? =)

I generally recommend this - it's a nice list of the most
important settings in postgresql.conf (with respect to performance),
along with a short explanation, and suggested values:

   http://www.powerpostgresql.com/PerfList

I'm using it as a general guide when setting and tuning our servers.

Anyway, as someone already pointed out, choosing the proper values is an
art - there is no single 'best' set of values.

tomas

Re: PG8.2.1 choosing slow seqscan over idx scan

From
Tomas Vondra
Date:
> That's about 32% dead rows.  Might be worth scheduling a vacuum full,
> but it's not like I was afraid it might be.  It looks to me like you
> could probably use a faster I/O subsystem in that machine though.
>
> If the random page cost being lower fixes your issues, then I'd just run
> with it lower for now.  note that while lowering it may fix one query,
> it may break another.  Tuning pgsql, like any database, is as much art
> as science...

A nice feature of postgresql is the ability to log the 'slow queries'
(exceeding some time limit) - you can use it to compare the performance
of various settings. We're using it to 'detect' stupid SQL etc.

Just set it reasonably (the value depends on you), for example we used
about 500ms originally and after about two months of improvements we
lowered it to about 100ms.
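
The setting I'm referring to is log_min_duration_statement, e.g. in
postgresql.conf:

log_min_duration_statement = 500     # milliseconds; -1 disables it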

You can analyze the log by hand, but about a year ago I wrote a
tool to parse it and build a set of HTML reports (with an overview and
details about each query) along with graphs and examples of queries.

You can get it here: http://opensource.pearshealthcyber.cz/

Just beware, it's written in PHP and it definitely is not perfect:

   (1) memory requirements (about 4x the size of the log)
   (2) not too fast (about 20 minutes on a P4@3GHz for a 200MB log)
   (3) it requires a certain log format (see the page)

I did some improvements to the script recently, but forgot to upload it.
I'll do that tomorrow.

Tomas

Re: PG8.2.1 choosing slow seqscan over idx scan

From
"Jeremy Haile"
Date:
Interesting - I haven't seen that tool before.  I'll have to check it
out when I get a chance.  Thanks!


On Wed, 17 Jan 2007 20:32:37 +0100, "Tomas Vondra" <tv@fuzzy.cz> said:
> > That's about 32% dead rows.  Might be worth scheduling a vacuum full,
> > but it's not like I was afraid it might be.  It looks to me like you
> > could probably use a faster I/O subsystem in that machine though.
> >
> > If the random page cost being lower fixes your issues, then I'd just run
> > with it lower for now.  note that while lowering it may fix one query,
> > it may break another.  Tuning pgsql, like any database, is as much art
> > as science...
>
> A nice feature of postgresql is the ability to log the 'slow queries'
> (exceeding some time limit) - you can use it to compare the performance
> of various settings. We're using it to 'detect' stupid SQL etc.
>
> Just set it reasonably (the value depends on you), for example we used
> about 500ms originally and after about two months of improvements we
> lowered it to about 100ms.
>
> You can analyze the log by hand, but about a year ago I've written a
> tool to parse it and build a set of HTML reports with an overview and
> details about each query) along with graphs and examples of queries.
>
> You can get it here: http://opensource.pearshealthcyber.cz/
>
> Just beware, it's written in PHP and it definitely is not perfect:
>
>    (1) memory requirements (about 4x the size of the log)
>    (2) not to fast (about 20mins of P4@3GHz for a 200MB log)
>    (3) it requires a certain log format (see the page)
>
> I did some improvements to the script recently, but forgot to upload it.
> I'll do that tomorrow.
>
> Tomas

Configuration Advice

From
Steve
Date:
Hey there;

I've been lurking on this list awhile, and I've been working with postgres
for a number of years so I'm not exactly new to this.  But I'm still
having trouble getting a good balance of settings and I'd like to see what
other people think.  We may also be willing to hire a contractor to help
tackle this problem if anyone is interested.

I've got an application here that runs large (in terms of length -- the
queries have a lot of conditions in them) queries that can potentially
return millions of rows but on average probably return tens of thousands
of rows.  It's read only for most of the day, and pretty much all the
queries except one are really fast.

However, each night we load data from a legacy cobol system into the SQL
system and then we summarize that data to make the reports faster.  This
load process is intensely insert/update driven but also has a hefty
amount of selects as well.  This load process is taking ever longer to
complete.


SO ... our goal here is to make this load process take less time.  It
seems the big part is building the big summary table; this big summary
table is currently 9 million rows big.  Every night, we drop the table,
re-create it, build the 9 million rows of data (we use COPY to put the
data in when it's prepared, not INSERT), and then build the indexes on it
-- of which there are many.  Unfortunately this table gets queried
in a lot of different ways and needs these indexes; also unfortunately, we
have operator class indexes to support both ASC and DESC sorting on
columns so these are for all intents and purposes duplicate but required
under Postgres 8.1 (we've recently upgraded to Postgres 8.2, is this still
a requirement?)

Building these indexes takes forever!  It's a long grind through inserts
and then building the indexes takes a hefty amount of time too.  (about 9
hours).  Now, the application is likely partly at fault, and we're working
to make it more efficient, but it has nothing to do with the index
building time.  I'm wondering what we can do to make this better if
anything; would it be better to leave the indexes on?  It doesn't seem to
be.  Would it be better to use INSERTs instead of copies?  Doesn't seem to
be.


Anyway -- ANYTHING we can do to make this go faster is appreciated :)
Here's some vital statistics:

- Machine is a 16 GB, 4 actual CPU dual-core opteron system using SCSI
discs.  The disc configuration seems to be a good one, it's the best of
all the ones we've tested so far.

- The load process itself takes about 6 gigs of memory, the rest is free
for postgres because this is basically all the machine does.

- If this was your machine and situation, how would you lay out the memory
settings?  What would you set the FSM to?  Would you leave the bgwriter on
or off?  We've already got FSYNC off because "data integrity" doesn't
matter -- this stuff is religiously backed up and we've got no problem
reinstalling it.  Besides, if something takes this machine down, data
integrity of the DB will be the least of our worries :)

Do wal_buffers/full_page_writes matter if FSYNC is off?  If so, what
settings?  What about checkpoints?

Any finally, any ideas on planner constants?  Here's what I'm using:

seq_page_cost = 0.5                     # measured on an arbitrary scale
random_page_cost = 1.0                  # same scale as above
cpu_tuple_cost = 0.001                  # same scale as above
cpu_index_tuple_cost = 0.0001           # same scale as above
cpu_operator_cost = 0.00025             # same scale as above
effective_cache_size = 679006

I really don't remember how I came up with that effective_cache_size
number....


Anyway... any advice would be appreciated :)


Steve

Re: Configuration Advice

From
Tomas Vondra
Date:
> Any finally, any ideas on planner constants?  Here's what I'm using:
>
> seq_page_cost = 0.5                     # measured on an arbitrary scale
> random_page_cost = 1.0                  # same scale as above
> cpu_tuple_cost = 0.001                  # same scale as above
> cpu_index_tuple_cost = 0.0001           # same scale as above
> cpu_operator_cost = 0.00025             # same scale as above
> effective_cache_size = 679006
>
> I really don't remember how I came up with that effective_cache_size
> number....

I don't have much experience with the way your application works, but:

1) What is the size of the whole database? Does that fit in your memory?
   That's the first thing I'd like to know and I can't find it in your
   post.

   I'm missing several other important values too - namely

     shared_buffers
     max_fsm_pages
     work_mem
     maintenance_work_mem

   BTW, is the autovacuum daemon running? If yes, try to stop it during
   the import (and run ANALYZE after the import of data).

2) What is the size of a disc page? Without that we can only guess what
   the effective_cache_size number means - in the usual case it's
   8kB, thus giving about 5.2 GiB of memory.

   As suggested in http://www.powerpostgresql.com/PerfList I'd increase
   that to about 1,400,000, which is about 10.5 GiB (about 2/3 of RAM).

   Anyway - don't be afraid of breaking something. This is just
   information telling PostgreSQL how much memory the OS is probably using
   as a filesystem cache. PostgreSQL uses this to evaluate the
   probability that a page is in the cache.

3) What is the value of maintenance_work_mem? This is a very important
   value for CREATE INDEX (and some other operations). The lower this value is,
   the slower the CREATE INDEX is. So try to increase the value as much
   as you can - this could / should improve the import performance
   considerably.

   But be careful - this does influence the amount of memory allocated
   by PostgreSQL. Being in your position I wouldn't do this in the
   postgresql.conf - I'd do that in the connection used by the import
   using the SET command, i.e. something like

   SET maintenance_work_mem = 524288;
   CREATE INDEX ...
   CREATE INDEX ...
   CREATE INDEX ...
   CREATE INDEX ...

   for a 512 MiB of maintenance_work_mem. Maybe even a higher value
   could be used (1 GiB?). Just try to fiddle with this a little.

4) Try to set up some performance monitoring - for example 'dstat' is
   a nice way to do that. This way you can find out where the
   bottleneck is (memory, I/O etc.)

That's basically all I can think of right now.

Tomas

Re: Configuration Advice

From
"Chad Wagner"
Date:
On 1/17/07, Steve <cheetah@tanabi.org> wrote:
However, each night we load data from a legacy cobol system into the SQL
system and then we summarize that data to make the reports faster.  This
load process is intensely insert/update driven but also has a hefty
amount of selects as well.  This load process is taking ever longer to
complete.

How many rows do you typically load each night?  If it is say less than 10% of the total rows, then perhaps the suggestion in the next paragraph is reasonable.

SO ... our goal here is to make this load process take less time.  It
seems the big part is building the big summary table; this big summary
table is currently 9 million rows big.  Every night, we drop the table,
re-create it, build the 9 million rows of data (we use COPY to put the
data in when it's prepared, not INSERT), and then build the indexes on it

Perhaps, placing a trigger on the source table and building a "change log" would be useful.  For example, you could scan the change log (looking for insert, update, and deletes) and integrate those changes into your summary table.  Obviously if you are using complex aggregates it may not be possible to adjust the summary table, but if you are performing simple SUM's, COUNT's, etc.  then this is a workable solution.
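
As a rough sketch (assuming plpgsql is installed, and with invented table and column names rather than your actual schema), the change log could be fed by something like:

CREATE TABLE source_change_log (
    change_time  timestamp NOT NULL DEFAULT now(),
    operation    char(1)   NOT NULL,   -- 'I', 'U' or 'D'
    source_id    integer   NOT NULL,
    amount       numeric
);

CREATE OR REPLACE FUNCTION log_source_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO source_change_log (operation, source_id, amount)
        VALUES ('D', OLD.id, OLD.amount);
        RETURN OLD;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO source_change_log (operation, source_id, amount)
        VALUES ('U', NEW.id, NEW.amount);
        RETURN NEW;
    ELSE
        INSERT INTO source_change_log (operation, source_id, amount)
        VALUES ('I', NEW.id, NEW.amount);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER source_change_trg
    AFTER INSERT OR UPDATE OR DELETE ON source_table
    FOR EACH ROW EXECUTE PROCEDURE log_source_change();

The nightly job would then scan source_change_log instead of re-reading the whole source table, and truncate the log once the summary is up to date.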


--
Chad
http://www.postgresqlforums.com/

Re: Configuration Advice

From
Heikki Linnakangas
Date:
Steve wrote:
> SO ... our goal here is to make this load process take less time.  It
> seems the big part is building the big summary table; this big summary
> table is currently 9 million rows big.  Every night, we drop the table,
> re-create it, build the 9 million rows of data (we use COPY to put the
> data in when it's prepared, not INSERT), and then build the indexes on
> it -- of which there are many.

Would it be possible to just update the summary table, instead of
recreating it from scratch every night?

> Unfortunately this table gets queried in
> a lot of different ways and needs these indexes; also unfortunately, we
> have operator class indexes to support both ASC and DESC sorting on
> columns so these are for all intents and purposes duplicate but required
> under Postgres 8.1 (we've recently upgraded to Postgres 8.2, is this
> still a requirement?)

I don't think this has changed in 8.2.

> Building these indexes takes forever!  It's a long grind through inserts
> and then building the indexes takes a hefty amount of time too.  (about
> 9 hours).  Now, the application is likely part at fault, and we're
> working to make it more efficient, but it has nothing to do with the
> index building time.  I'm wondering what we can do to make this better
> if anything; would it be better to leave the indexes on?  It doesn't
> seem to be.  Would it be better to use INSERTs instead of copies?
> Doesn't seem to be.

Would it help if you created multiple indexes simultaneously? You have
enough CPU to do it. Is the index creation CPU or I/O bound? 9 million
rows should fit in 16 GB of memory, right?
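
For example (just a sketch - the table and index names below are
placeholders, not your real schema), the build could be split across a
couple of psql sessions started right after the COPY finishes:

-- session 1:
CREATE INDEX big_summary_col_a_idx ON big_summary (col_a);
CREATE INDEX big_summary_col_b_idx ON big_summary (col_b);

-- session 2, run at the same time (plain CREATE INDEX only takes a
-- SHARE lock, so several can build against the same table concurrently):
CREATE INDEX big_summary_col_c_idx ON big_summary (col_c);
CREATE INDEX big_summary_col_d_idx ON big_summary (col_d);

Each session gets its own maintenance_work_mem, so the per-session value
would need to be dialed down accordingly.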

> - The load process itself takes about 6 gigs of memory, the rest is free
> for postgres because this is basically all the machine does.

Can you describe the load process in more detail? What's it doing with
the 6 gigs?

> - If this was your machine and situation, how would you lay out the
> memory settings?  What would you set the FSM to?

FSM seems irrelevant here..

> Do wal_buffers/full_page_writes matter if FSYNC is off?

Better turn off full_page_writes, since you can kiss goodbye to data
integrity anyway with fsync=off.

> Anyway... any advice would be appreciated :)

What's your maintenance_work_mem setting? It can make a big difference
in sorting the data for indexes.

If you could post the schema including the indexes, people might have
more ideas...

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Configuration Advice

From
Steve
Date:
>
> 1) What is the size of the whole database? Does that fit in your memory?
>   That's the first thing I'd like to know and I can't find it in your
>   post.

Current on-disk size is about 51 gig.  I'm not sure if there's a different
size I should be looking at instead, but that's the size du reports for
this database's directory under "base".  So, no, it doesn't fit into
memory all the way.

>   I'm missing several other important values too - namely
>
>     shared_buffers
>     max_fsm_pages
>     work_mem
>     maintenance_work_mem
>

I didn't share these because they've been in flux :)  I've been
experimenting with different values, but currently we're using:

8GB     shared_buffers
100000  max_fsm_pages
256MB   work_mem
6GB     maintenance_work_mem

>   BTW, is the autovacuum daemon running? If yes, try to stop it during
>   the import (and run ANALYZE after the import of data).

No.  All vacuums are done explicitly since the database doesn't change
during the day.  The 'order of operations' is:

- Load COBOL data into database (inserts/updates)
- VACUUM COBOL data
- Summarize COBOL data (inserts/updates with the big table using COPY)
- VACUUM summary tables

So everything gets vacuumed as soon as it's updated.

> 2) What is the size of a disc page? Without that we can only guess what
>   the effective_cache_size number means - in the usual case it's
>   8kB thus giving about 5.2 GiB of memory.
>

     I believe it's 8kB.  I definitely haven't changed it :)

>   As suggested in http://www.powerpostgresql.com/PerfList I'd increase
>   that to about 1.400.000 which about 10.5 GiB (about 2/3 of RAM).
>
>   Anyway - don't be afraid this breaks something. This is just an
>   information for PostgreSQL how much memory the OS is probably using
>   as a filesystem cache. PostgreSQL uses this to evaluate the
>   probability that the page is in a cache.


     Okay, I'll try the value you recommend. :)


> 3) What is the value of maintenance_work_mem? This is a very important
>   value for CREATE INDEX (and some other). The lower this value is,
>   the slower the CREATE INDEX is. So try to increase the value as much
>   as you can - this could / should improve the import performance
>   considerably.
>
>   But be careful - this does influence the amount of memory allocated
>   by PostgreSQL. Being in your position I wouldn't do this in the
>   postgresql.conf - I'd do that in the connection used by the import
>   using SET command, ie. something like
>
>   SET maintenance_work_mem = 524288;
>   CREATE INDEX ...
>   CREATE INDEX ...
>   CREATE INDEX ...
>   CREATE INDEX ...
>
>   for a 512 MiB of maintenance_work_mem. Maybe even a higher value
>   could be used (1 GiB?). Just try to fiddle with this a little.

     It's currently at 6GB in postgresql.conf, though you have a good
point in that maybe it should only be set right before the indexes are
made, to save room.  Things are certainly kinda tight in the config as is.

> 4) Try to set up some performance monitoring - for example a 'dstat' is
>   a nice way to do that. This way you can find out where the
>   bottleneck is (memory, I/O etc.)
>
> That's basically all I can think of right now.
>

Thanks for the tips :)


Steve

Re: Configuration Advice

From
Steve
Date:

On Wed, 17 Jan 2007, Benjamin Minshall wrote:

>
>> Building these indexes takes forever!
>
>> Anyway -- ANYTHING we can do to make this go faster is appreciated :)
>> Here's some vital statistics:
>
>> - Machine is a 16 GB, 4 actual CPU dual-core opteron system using SCSI
>> discs.  The disc configuration seems to be a good one, it's the best of all
>> the ones we've tested so far.
>
> What are your shared_buffers, work_mem, and maintenance_work_mem settings?
>
> maintenance_work_mem is used for CREATE INDEX, and with 16GB of memory in the
> machine, maintenance_work_mem should be set to at least 1GB in my opinion.
>

shared_buffers = 8GB
work_mem = 256MB
maintenance_work_mem = 6GB

So that should be covered, unless I'm using too much memory and swapping.
It does look like it's swapping a little, but not too badly as far as I
can tell.  I'm thinking of dialing back everything a bit, but I'm not
really sure what the heck to do :)  It's all guessing for me right now.


Steve

Re: Configuration Advice

From
Benjamin Minshall
Date:
> Building these indexes takes forever!

> Anyway -- ANYTHING we can do to make this go faster is appreciated :)
> Here's some vital statistics:

> - Machine is a 16 GB, 4 actual CPU dual-core opteron system using SCSI
> discs.  The disc configuration seems to be a good one, it's the best of
> all the ones we've tested so far.

What are your shared_buffers, work_mem, and maintenance_work_mem settings?

maintenance_work_mem is used for CREATE INDEX, and with 16GB of memory
in the machine, maintenance_work_mem should be set to at least 1GB in my
opinion.

--
Benjamin Minshall <minshall@intellicon.biz>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz


Re: Configuration Advice

From
Scott Marlowe
Date:
On Wed, 2007-01-17 at 15:58, Steve wrote:
> On Wed, 17 Jan 2007, Benjamin Minshall wrote:
>
> >
> >> Building these indexes takes forever!
> >
> >> Anyway -- ANYTHING we can do to make this go faster is appreciated :)
> >> Here's some vital statistics:
> >
> >> - Machine is a 16 GB, 4 actual CPU dual-core opteron system using SCSI
> >> discs.  The disc configuration seems to be a good one, it's the best of all
> >> the ones we've tested so far.
> >
> > What are your shared_buffers, work_mem, and maintenance_work_mem settings?
> >
> > maintenance_work_mem is used for CREATE INDEX, and with 16GB of memory in the
> > machine, maintenance_work_mem should be set to at least 1GB in my opinion.
> >
>
> shared_buffers = 8GB
> work_mem = 256MB
> maintenance_work_mem = 6GB
>
> So that should be covered, unless I'm using too much memory and swapping.
> It does look like it's swapping a little, but not too badly as far as I
> can tell.  I'm thinking of dialing back everything a bit, but I'm not
> really sure what the heck to do :)  It's all guessing for me right now.

Generally speaking, once you've gotten to the point of swapping, even a
little, you've gone too far.  A better approach is to pick some
conservative number, like 10-25% of your ram for shared_buffers, and 1
gig or so for maintenance work_mem, and then increase them while
exercising the system, and measure the difference increasing them makes.

If going from 1G shared buffers to 2G shared buffers gets you a 10%
increase, then good.  If going from 2G to 4G gets you a 1.2% increase,
it's questionable.  You should reach a point where throwing more
shared_buffers stops helping before you start swapping.  But you might
not.

Same goes for maintenance work mem.  Incremental changes, accompanied by
reproducible benchmarks / behaviour measurements, are the way to
determine the settings.

Note that you can also vary those during different times of the day.
You can have maint_mem set to 1Gig during the day and crank it up to 8
gig or something while loading data.  Shared_buffers can't be changed
without restarting the db though.

Re: Configuration Advice

From
Steve
Date:
> How many rows do you typically load each night?  If it is say less than 10%
> of the total rows, then perhaps the suggestion in the next paragraph is
> reasonable.

     Hrm.  It's very, very variable.  I'd say it's more than 10% on
average, and it can actually be pretty close to 50-100% on certain days.
Our data is based upon customer submissions, and normally it's a daily
basis kind of deal, but sometimes they'll resubmit their entire year on
certain deadlines to make sure it's all in.  Now, we don't have to
optimize for those deadlines, just the 'average daily load'.  It's okay if
on those deadlines it takes forever, because that's understandable.

     However, I will look into this and see if I can figure out this
average value.  This may be a valid idea, and I'll look some more at it.


Thanks!

Steve

>> SO ... our goal here is to make this load process take less time.  It
>> seems the big part is building the big summary table; this big summary
>> table is currently 9 million rows big.  Every night, we drop the table,
>> re-create it, build the 9 million rows of data (we use COPY to put the
>> data in when it's prepared, not INSERT), and then build the indexes on it
>
>
> Perhaps, placing a trigger on the source table and building a "change log"
> would be useful.  For example, you could scan the change log (looking for
> insert, update, and deletes) and integrate those changes into your summary
> table.  Obviously if you are using complex aggregates it may not be possible
> to adjust the summary table, but if you are performing simple SUM's,
> COUNT's, etc.  then this is a workable solution.
>
>
> --
> Chad
> http://www.postgresqlforums.com/
>

Re: Configuration Advice

From
Dave Cramer
Date:
On 17-Jan-07, at 3:41 PM, Steve wrote:

> Hey there;
>
> I've been lurking on this list awhile, and I've been working with
> postgres for a number of years so I'm not exactly new to this.  But
> I'm still having trouble getting a good balance of settings and I'd
> like to see what other people think.  We may also be willing to
> hire a contractor to help tackle this problem if anyone is interested.
>
> I've got an application here that runs large (in terms of length --
> the queries have a lot of conditions in them) queries that can
> potentially return millions of rows but on average probably return
> tens of thousands of rows.  It's read only for most of the day, and
> pretty much all the queries except one are really fast.
>
> However, each night we load data from a legacy cobol system into
> the SQL system and then we summarize that data to make the reports
> faster.  This load process is intensely insert/update driven but
> also has a hefty amount of selects as well.  This load process is
> taking ever longer to complete.
>
>
> SO ... our goal here is to make this load process take less time.
> It seems the big part is building the big summary table; this big
> summary table is currently 9 million rows big.  Every night, we
> drop the table, re-create it, build the 9 million rows of data (we
> use COPY to put the data in when it's prepared, not INSERT), and
> then build the indexes on it -- of which there are many.
> Unfortunately this table gets queried in a lot of different ways
> and needs these indexes; also unfortunately, we have operator class
> indexes to support both ASC and DESC sorting on columns so these
> are for all intents and purposes duplicate but required under
> Postgres 8.1 (we've recently upgraded to Postgres 8.2, is this
> still a requirement?)
>
> Building these indexes takes forever!  It's a long grind through
> inserts and then building the indexes takes a hefty amount of time
> too.  (about 9 hours).  Now, the application is likely part at
> fault, and we're working to make it more efficient, but it has
> nothing to do with the index building time.  I'm wondering what we
> can do to make this better if anything; would it be better to leave
> the indexes on?  It doesn't seem to be.  Would it be better to use
> INSERTs instead of copies?  Doesn't seem to be.
>
>
> Anyway -- ANYTHING we can do to make this go faster is
> appreciated :) Here's some vital statistics:
>
> - Machine is a 16 GB, 4 actual CPU dual-core opteron system using
> SCSI discs.  The disc configuration seems to be a good one, it's
> the best of all the ones we've tested so far.
>
The basic problem here is simply writing all the data to disk.  You
are building 9M rows of data plus numerous indexes.  How much data are
you actually writing to the disk?  Try looking at iostat while this is
going on.

My guess is you are maxing out the disk write speed.
> - The load process itself takes about 6 gigs of memory, the rest is
> free for postgres because this is basically all the machine does.
>
> - If this was your machine and situation, how would you lay out the
> memory settings?  What would you set the FSM to?  Would you leave
> the bgwriter on or off?  We've already got FSYNC off because "data
> integrity" doesn't matter -- this stuff is religiously backed up
> and we've got no problem reinstalling it.  Besides, in order for
> this machine to go down, data integrity of the DB is the least of
> the worries :)
>
> Do wal_buffers/full_page_writes matter if FSYNC is off?  If so,
> what settings?  What about checkpoints?
>
Not really; I'd have WAL buffers write to a RAM disk.
> Any finally, any ideas on planner constants?  Here's what I'm using:
>
> seq_page_cost = 0.5                     # measured on an arbitrary
> scale
> random_page_cost = 1.0                  # same scale as above
> cpu_tuple_cost = 0.001                  # same scale as above
> cpu_index_tuple_cost = 0.0001           # same scale as above
> cpu_operator_cost = 0.00025             # same scale as above
> effective_cache_size = 679006
>

As a general rule, make shared_buffers about 25% of free memory and
effective_cache_size about 75%, but with a write-intensive load like
yours I think the first thing to look at is write speed.
> I really don't remember how I came up with that
> effective_cache_size number....
>
>
> Anyway... any advice would be appreciated :)
>
>
> Steve
>


Re: Configuration Advice

From
Steve
Date:
> Would it be possible to just update the summary table, instead of recreating
> it from scratch every night?

     Hrm, I believe it's probably less work for the computer to do if
it's rebuilt.  Any number of rows may be changed during an update, not
including additions, so I'd have to pull out what's changed and sync it
with what's in the summary table already.  It'll be a lot more selects and
program-side computation to save the big copy; it might work out, but I'd
say this would be my last ditch thing. :)

>> Building these indexes takes forever!  It's a long grind through inserts
>> and then building the indexes takes a hefty amount of time too.  (about 9
>> hours).  Now, the application is likely part at fault, and we're working to
>> make it more efficient, but it has nothing to do with the index building
>> time.  I'm wondering what we can do to make this better if anything; would
>> it be better to leave the indexes on?  It doesn't seem to be.  Would it be
>> better to use INSERTs instead of copies?  Doesn't seem to be.
>
> Would it help if you created multiple indexes simultaneously? You have enough
> CPU to do it. Is the index creation CPU or I/O bound? 9 million rows should
> fit in 16 GB of memory, right?

     This is a very very interesting idea.  It looks like we're
probably not fully utilizing the machine for the index build, and this
could be the ticket for us.  I'm going to go ahead and set up a test for
this and we'll see how it goes.

> Can you describe the load process in more detail? What's it doing with the 6
> gigs?

     There are two halves to the load process: the loader and the
summarizer.  The loader is the part that takes 6 gigs; the summarizer only
takes a few hundred megs.

     Basically we have these COBOL files that vary in size but
are usually in the hundreds-of-megs realm.  These files contain new data
OR updates to existing data.  We load this data from the COBOL files in
chunks, so that's not a place where we're burning a lot of memory.

     The first thing we do is cache the list of COBOL ID codes that are
already in the DB; the COBOL ID codes are really long numeric strings, so
we use a sequenced integer primary key.  The cache translates COBOL IDs to
primary keys, and this takes most of our memory nowadays.  Our cache is
fast, but it's kind of a memory hog.  We're working on trimming that down,
but it's definitely faster than making a query for each COBOL ID.

     The load is relatively fast and is considered "acceptable", and
has been relatively constant in speed.  It's the summarizer that's brutal.

     The summarizer produces 3 main summary tables and a few
secondaries that don't take much time to make.  Two of them are smallish
and not that big a deal, and the last one is the biggie that's 9 mil rows
and growing.  To produce the 9 mil row table, we query out the data in
groups, do our processing, and save that data to a series of text files
that are in blocks of 10,000 rows as I recall.  We then copy each file
into the DB (there were some issues with copying in an entire 9 mil row
file in the past, which is why we don't use just one file -- those issues
have been fixed, but we didn't undo the change).

> What's your maintenance_work_mem setting? It can make a big difference in
> sorting the data for indexes.

     6 gigs currently. :)

> If you could post the schema including the indexes, people might have more
> ideas...

     I'll have to ask first, but I'll see if I can :)

Talk to you later, and thanks for the info!


Steve

Re: Configuration Advice

From
"Chad Wagner"
Date:
On 1/17/07, Steve <cheetah@tanabi.org> wrote:
        However, I will look into this and see if I can figure out this
average value.  This may be a valid idea, and I'll look some more at it.

It must be; Oracle sells it pretty heavily as a data warehousing feature ;).  Oracle calls it a materialized view, and the basic premise is you have a "change" log (called a materialized log by Oracle) and you have a job that runs through the change log and applies the changes to the materialized view.

If you are using aggregates, be careful and make sure you use simple forms of those aggregates.  For example, if you are using an "average" function then you should have two columns sum and count instead.  Some aggregates are too complex and cannot be represented by this solution and you will find that you can't update the summary tables, so definitely try to stay away from complex aggregates if you do not need them.
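
A rough sketch of what that looks like (the table and column names here are invented for illustration, and it assumes a change-log table along the lines sketched earlier in the thread):

-- the summary keeps sum and count rather than avg(), so it stays updatable:
CREATE TABLE daily_summary (
    day        date    NOT NULL,
    account_id integer NOT NULL,
    amount_sum numeric NOT NULL DEFAULT 0,
    txn_count  bigint  NOT NULL DEFAULT 0,
    PRIMARY KEY (day, account_id)
);

-- apply the inserts recorded in the change log; updates and deletes would
-- be handled similarly with their own deltas, and (day, account_id) pairs
-- not yet present would need an INSERT instead (left out for brevity):
UPDATE daily_summary s
   SET amount_sum = s.amount_sum + d.amount_sum,
       txn_count  = s.txn_count  + d.txn_count
  FROM (SELECT change_time::date AS day,
               source_id         AS account_id,
               sum(amount)       AS amount_sum,
               count(*)          AS txn_count
          FROM source_change_log
         WHERE operation = 'I'
         GROUP BY 1, 2) d
 WHERE s.day = d.day AND s.account_id = d.account_id;

-- the average is then derived at query time:
SELECT day, account_id, amount_sum / txn_count AS avg_amount
  FROM daily_summary
 WHERE txn_count > 0;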

Here is a link to a PL/pgSQL effort that tries to simulate materialized views:

http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

I don't know how complete it is, and it looks like the project was started but has been abandoned for the last 3 years.

--
Chad
http://www.postgresqlforums.com/

Re: Configuration Advice

From
Chris Browne
Date:
cheetah@tanabi.org (Steve) writes:
> I'm wondering what we can do to make
> this better if anything; would it be better to leave the indexes on?
> It doesn't seem to be.

Definitely NOT.  Generating an index via a bulk sort is a LOT faster
than loading data into an index one tuple at a time.

We saw a BIG increase in performance of Slony-I when, in version
1.1.5, we added a modification that shuts off indexes during COPY and
then does a reindex.  Conceivably, you might look at how Slony-I does
that, and try doing the same thing; it might well be faster than doing
a bunch of reindexes serially.  (Or not...)

> Would it be better to use INSERTs instead of copies?  Doesn't seem
> to be.

I'd be mighty surprised.

> - The load process itself takes about 6 gigs of memory, the rest is
> free for postgres because this is basically all the machine does.

The settings you have do not seem conspicuously wrong in any way.

The one thought which comes to mind is that if you could turn this
into a *mostly* incremental change, that might help.

The thought:

 - Load the big chunk of data into a new table

 - Generate some minimal set of indices on the new table

 - Generate four queries that compare old to new:
  q1 - See which tuples are unchanged from yesterday to today
  q2 - See which tuples have been deleted from yesterday to today
  q3 - See which tuples have been added
  q4 - See which tuples have been modified

 If the "unchanged" set is extremely large, then you might see benefit
 to doing updates based on deleting the rows indicated by q2,
 inserting rows based on q3, and updating based on q4.

In principle, computing and applying those 4 queries might be quicker
than rebuilding from scratch.

In principle, applying q2, then q4, then vacuuming, then q3, ought to
be "optimal."
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/linux.html
"A 'Cape Cod Salsa' just isn't right." -- Unknown

Re: Configuration Advice

From
Steve
Date:
> Generally speaking, once you've gotten to the point of swapping, even a
> little, you've gone too far.  A better approach is to pick some
> conservative number, like 10-25% of your ram for shared_buffers, and 1
> gig or so for maintenance work_mem, and then increase them while
> exercising the system, and measure the difference increasing them makes.
>
> If going from 1G shared buffers to 2G shared buffers gets you a 10%
> increase, then good.  If going from 2G to 4G gets you a 1.2% increase,
> it's questionable.  You should reach a point where throwing more
> shared_buffers stops helping before you start swapping.  But you might
> not.
>
> Same goes for maintenance work mem.  Incremental changes, accompanied by
> reproduceable benchmarks / behaviour measurements are the way to
> determine the settings.
>
> Note that you can also vary those during different times of the day.
> you can have maint_mem set to 1Gig during the day and crank it up to 8
> gig or something while loading data.  Shared_buffers can't be changed
> without restarting the db though.
>

I'm currently benchmarking various configuration adjustments.  Problem is
these tests take a really long time because I have to run the load
process... which is like a 9 hour deal.  That's why I'm asking for advice
here, because there are a lot of variables and it's really time-costly
to test :)

I'm still working on the benchmarking, and by Friday I should have some
interesting statistics to work with that may help figure out what's going
on.


Thanks!

Steve

Re: Configuration Advice

From
Steve
Date:
> The thought:
>
> - Load the big chunk of data into a new table
>
> - Generate some minimal set of indices on the new table
>
> - Generate four queries that compare old to new:
>  q1 - See which tuples are unchanged from yesterday to today
>  q2  - See which tuples have been deleted from yesterday to today
>  q3  - See which tuples have been added
>  q4  - See which tuples have been modified
>
> If the "unchanged" set is extremely large, then you might see benefit
> to doing updates based on deleting the rows indicated by q2,
> inserting rows based on q3, and updating based on q4.
>
> In principle, computing and applying those 4 queries might be quicker
> than rebuilding from scratch.
>
> In principle, applying q2, then q4, then vacuuming, then q3, ought to
> be "optimal."


     This looks like an interesting idea, and I'm going to take a look
at how feasible it'll be to implement.  I may be able to combine this
with Mr. Wagner's idea to make a much more efficient system overall.  It's
going to be a pretty big programming task, but I've a feeling this
summarizer thing may just need to be re-written with a smarter system
like this to get something faster.


Thanks!

Steve

Re: Configuration Advice

From
Scott Marlowe
Date:
On Wed, 2007-01-17 at 18:27, Steve wrote:
> > Generally speaking, once you've gotten to the point of swapping, even a
> > little, you've gone too far.  A better approach is to pick some
> > conservative number, like 10-25% of your ram for shared_buffers, and 1
> > gig or so for maintenance work_mem, and then increase them while
> > exercising the system, and measure the difference increasing them makes.
> >
> > If going from 1G shared buffers to 2G shared buffers gets you a 10%
> > increase, then good.  If going from 2G to 4G gets you a 1.2% increase,
> > it's questionable.  You should reach a point where throwing more
> > shared_buffers stops helping before you start swapping.  But you might
> > not.
> >
> > Same goes for maintenance work mem.  Incremental changes, accompanied by
> > reproduceable benchmarks / behaviour measurements are the way to
> > determine the settings.
> >
> > Note that you can also vary those during different times of the day.
> > you can have maint_mem set to 1Gig during the day and crank it up to 8
> > gig or something while loading data.  Shared_buffers can't be changed
> > without restarting the db though.
> >
>
> I'm currently benchmarking various configuration adjustments.  Problem is
> these tests take a really long time because I have to run the load
> process... which is like a 9 hour deal.  That's why I'm asking for advice
> here, because there's a lot of variables here and it's really time costly
> to test :)
>
> I'm still working on the benchmarkings and by Friday I should have some
> interesting statistics to work with and maybe help figure out what's going
> on.

You can probably take a portion of what you're loading and make a
benchmark of the load process that is repeatable (same data, size,
etc.) each time, but only takes 30 minutes to an hour to run each
time.  That shortens your test iteration AND makes it reliably repeatable.

Re: Configuration Advice

From
"Merlin Moncure"
Date:
On 1/17/07, Steve <cheetah@tanabi.org> wrote:
> Hey there;
> I've been lurking on this list awhile, and I've been working with postgres
> for a number of years so I'm not exactly new to this.  But I'm still
> having trouble getting a good balance of settings and I'd like to see what
> other people think.  We may also be willing to hire a contractor to help
> tackle this problem if anyone is interested.

I happen to be something of a COBOL->SQL expert; if you are interested
in some advice you can contact me off-list.  I converted an enterprise
COBOL (AcuCOBOL) app to PostgreSQL by plugging pg into the COBOL
system via a custom C++ ISAM driver.

> I've got an application here that runs large (in terms of length -- the
> queries have a lot of conditions in them) queries that can potentially
> return millions of rows but on average probably return tens of thousands
> of rows.  It's read only for most of the day, and pretty much all the
> queries except one are really fast.

If it's just one query I think I'd focus on optimizing that query, not
.conf settings.  In my opinion .conf tuning (a few gotchas aside)
doesn't really get you all that much.

> However, each night we load data from a legacy cobol system into the SQL
> system and then we summarize that data to make the reports faster.  This
> load process is intensely insert/update driven but also has a hefty
> amount of selects as well.  This load process is taking ever longer to
> complete.
>
>
> SO ... our goal here is to make this load process take less time.  It
> seems the big part is building the big summary table; this big summary
> table is currently 9 million rows big.  Every night, we drop the table,
> re-create it, build the 9 million rows of data (we use COPY to put the
> data in when it's prepared, not INSERT), and then build the indexes on it
> -- of which there are many.  Unfortunately this table gets queried
> in a lot of different ways and needs these indexes; also unfortunately, we
> have operator class indexes to support both ASC and DESC sorting on

I have some very specific advice here.  Check out the row-wise
comparison feature introduced in 8.2.
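
A sketch of the sort of thing I mean (column names are made up; on 8.2
the row comparison below is a true lexicographic comparison and can use
a single multicolumn index):

CREATE INDEX big_summary_keyset_idx ON big_summary (col_a, col_b, col_c);

-- "give me the next 100 rows after the last one I saw", in one index scan:
SELECT *
  FROM big_summary
 WHERE (col_a, col_b, col_c) > (42, 'foo', '2007-01-17')
 ORDER BY col_a, col_b, col_c
 LIMIT 100;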

> columns so these are for all intents and purposes duplicate but required
> under Postgres 8.1 (we've recently upgraded to Postgres 8.2, is this still
> a requirement?)

> Building these indexes takes forever!  It's a long grind through inserts
> and then building the indexes takes a hefty amount of time too.  (about 9
> hours).  Now, the application is likely part at fault, and we're working
> to make it more efficient, but it has nothing to do with the index
> building time.  I'm wondering what we can do to make this better if
> anything; would it be better to leave the indexes on?  It doesn't seem to
> be.  Would it be better to use INSERTs instead of copies?  Doesn't seem to

No.

Probably any optimization strategy would focus on reducing the
amount of data you have to load.

merlin