Re: testing/predicting optimization using indexes - Mailing list pgsql-sql

From TJ O'Donnell
Subject Re: testing/predicting optimization using indexes
Date
Msg-id 41F819C3.4080507@acm.org
Whole thread Raw
In response to Re: testing/predicting optimization using indexes  (PFC <lists@boutiquenumerique.com>)
Responses Re: testing/predicting optimization using indexes  (PFC <lists@boutiquenumerique.com>)
List pgsql-sql
I was puzzled as to why my search slowed down when I added columns.
The VACUUM did not restore the former speed,
which I had obtained before adding the columns.
So, I rebuilt the table with only the smiles column and my original
speed was again obtained (not surprising).
After I added the extra columns, it slowed down again.
Finally, I built the table with all the additional columns created
during the initial creation of the table.  The original speed was obtained!
I conclude that the addition of columns after building all the rows of
a table somehow makes the table access less efficient.  Is this generally
true?  Is there a more efficient way to add columns to a table after its
initial construction?

The secondary issue was one of using an index on the additional columns.
This greatly speeds up the overall search, by limiting the number of
rows needing to use oe_matches.  I am currently working on optimizing the
number and nature of these extra columns.  However, my initial question
still remains.  Once I find a good set of columns to use as an index,
will I then get even greater speed by defining a new data type and an
index method equivalent to my multi-column index?

Here are the data you requested.  I think this is less important now that
I know I should create all my columns from the beginning.
Thanks for the tip on how to compute average time spent in my
oe_matches functions.  This will be very useful for future optimization.

SELECT count(*) FROM structure
237597

SELECT avg(length(smiles)) FROM structure
37.6528912402092619

VACUUM FULL ANALYZE structure
(no output)

EXPLAIN ANALYZE SELECT oe_matches(smiles,'c1ccccc1CC(=O)NC') FROM  structure
Seq Scan on structure  (cost=0.00..7573.96 rows=237597 width=41) (actual time=17.443..15025.974 rows=237597 loops=1)
Total runtime: 16786.542 ms

EXPLAIN ANALYZE SELECT smiles FROM structure
Seq Scan on structure  (cost=0.00..6979.97 rows=237597 width=41) (actual time=0.067..735.884 rows=237597 loops=1)
Total runtime: 1200.661 ms


TJ


PFC wrote:
> 
>> I'm quite happy with the speedup in 3, but puzzled over the slowdown 
>> in  2.
> 
>     Could you provide :
> 
>     - SELECT count(*) FROM structure;
>     => NRows
>     - SELECT avg(length(smiles)) FROM structure;
> 
>     Then VACUUM FULL ANALYZE structure
>     Redo your timings and this time post EXPLAIN ANALYZE
> 
>     Also your query returns 1313 rows, so wan you post :
> 
> EXPLAIN ANALYZE SELECT oe_matches(smiles,'c1ccccc1CC(=O)NC') FROM  
> structure;
> => time T1
> EXPLAIN ANALYZE SELECT smiles FROM structure;
> => time T2
> 
> (T1-T2)/(NRows) will give you an estimate of the time spent in each  
> oe_matches call.
> 
>     Also note that for postgres (a,b) > (c,d) means ((a>c) and (b>d)), 
> which  can be misleading, but I think that's what you wanted.


pgsql-sql by date:

Previous
From: Andrew Hammond
Date:
Subject: Re: same question little different test MSSQL vrs Postgres
Next
From: PFC
Date:
Subject: Re: testing/predicting optimization using indexes