Re: testing/predicting optimization using indexes - Mailing list pgsql-sql
From | TJ O'Donnell |
---|---|
Subject | Re: testing/predicting optimization using indexes |
Date | |
Msg-id | 41F819C3.4080507@acm.org Whole thread Raw |
In response to | Re: testing/predicting optimization using indexes (PFC <lists@boutiquenumerique.com>) |
Responses |
Re: testing/predicting optimization using indexes
|
List | pgsql-sql |
I was puzzled as to why my search slowed down when I added columns. The VACUUM did not restore the former speed, which I had obtained before adding the columns. So, I rebuilt the table with only the smiles column and my original speed was again obtained (not surprising). After I added the extra columns, it slowed down again. Finally, I built the table with all the additional columns created during the initial creation of the table. The original speed was obtained! I conclude that the addition of columns after building all the rows of a table somehow makes the table access less efficient. Is this generally true? Is there a more efficient way to add columns to a table after its initial construction? The secondary issue was one of using an index on the additional columns. This greatly speeds up the overall search, by limiting the number of rows needing to use oe_matches. I am currently working on optimizing the number and nature of these extra columns. However, my initial question still remains. Once I find a good set of columns to use as an index, will I then get even greater speed by defining a new data type and an index method equivalent to my multi-column index? Here are the data you requested. I think this is less important now that I know I should create all my columns from the beginning. Thanks for the tip on how to compute average time spent in my oe_matches functions. This will be very useful for future optimization. SELECT count(*) FROM structure 237597 SELECT avg(length(smiles)) FROM structure 37.6528912402092619 VACUUM FULL ANALYZE structure (no output) EXPLAIN ANALYZE SELECT oe_matches(smiles,'c1ccccc1CC(=O)NC') FROM structure Seq Scan on structure (cost=0.00..7573.96 rows=237597 width=41) (actual time=17.443..15025.974 rows=237597 loops=1) Total runtime: 16786.542 ms EXPLAIN ANALYZE SELECT smiles FROM structure Seq Scan on structure (cost=0.00..6979.97 rows=237597 width=41) (actual time=0.067..735.884 rows=237597 loops=1) Total runtime: 1200.661 ms TJ PFC wrote: > >> I'm quite happy with the speedup in 3, but puzzled over the slowdown >> in 2. > > Could you provide : > > - SELECT count(*) FROM structure; > => NRows > - SELECT avg(length(smiles)) FROM structure; > > Then VACUUM FULL ANALYZE structure > Redo your timings and this time post EXPLAIN ANALYZE > > Also your query returns 1313 rows, so wan you post : > > EXPLAIN ANALYZE SELECT oe_matches(smiles,'c1ccccc1CC(=O)NC') FROM > structure; > => time T1 > EXPLAIN ANALYZE SELECT smiles FROM structure; > => time T2 > > (T1-T2)/(NRows) will give you an estimate of the time spent in each > oe_matches call. > > Also note that for postgres (a,b) > (c,d) means ((a>c) and (b>d)), > which can be misleading, but I think that's what you wanted.