Thread: Table not using tsvector gin index and performance much worse than when it uses it.

Hi,
I have a table with 22k rows - not large at all. I have a couple of indices
on it, as well as a GIN index on a tsvector column. If I reindex the table
and run a query, it takes ~20ms to execute using the tsvector GIN index. By
the end of the day, the planner decides not to use the GIN index and uses
the other indices on the table instead, and the query takes ~80ms. If I
reindex, the pattern repeats: it uses the GIN index for a while, with much
better performance, and then drops back to the alternate ones.
The ibloat on the index shows as 0.4 and wastedibytes is 0. Fewer than 2k
of the 22k rows have been updated since the last reindex, but by mid-day
performance has dropped because the GIN index is no longer being used.
Any thoughts on why it chooses the alternate indices with hardly any
updates? And is there a way to force it to use the GIN index without having
to reindex it twice a day?
Thanks!





rverghese <riyav@hotmail.com> writes:
> Any thoughts on why it chooses the alternate indices with hardly any
> updates? And is there a way to force it to use the GIN index without having
> to reindex it twice a day?

You haven't mentioned what PG version this is, nor specified how many
updates is "hardly any", so you shouldn't expect any very precise answers.
But I'm suspicious that the problem is bloat of the index's pending list;
the planner's cost estimate is (correctly) pretty sensitive to the length
of that list.  If so, you need to arrange for the pending list to get
flushed into the main index structure more often.  Depending on your PG
version, that can be done by
* vacuum
* auto-analyze (but I bet your version doesn't, or you would not be
  complaining)
* gin_clean_pending_list() (but you probably ain't got that either)
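
A quick way to test that theory, sketched with a made-up table and index name
(it needs the pgstattuple extension, which provides pgstatginindex()):

    -- Install the extension if it isn't there already
    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- How big has the pending list grown?
    SELECT pending_pages, pending_tuples
    FROM pgstatginindex('my_tsvector_gin_idx');

    -- A plain VACUUM moves the pending entries into the main GIN structure
    VACUUM my_table;

If pending_pages climbs during the day and drops back after a VACUUM (or a
reindex), that points squarely at the pending list.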

Or you could reduce gin_pending_list_limit to cause insert-time flushes to
happen more often, or in the extremum even disable fastupdate for that
index.  Those options would slow down updates to make search performance
more stable, so they're not panaceas.
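
For reference, a sketch of those two knobs (the index name is made up, and the
gin_pending_list_limit setting only exists in 9.5 and later):

    -- Shrink the pending-list threshold globally (postgresql.conf or per session)
    SET gin_pending_list_limit = '512kB';

    -- Or bypass the pending list for this one index: slower updates,
    -- but searches never have to scan a pending list
    ALTER INDEX my_tsvector_gin_idx SET (fastupdate = off);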

See
https://www.postgresql.org/docs/current/static/gin-implementation.html#GIN-FAST-UPDATE
for your version, also the "GIN Tips" on the next page.

Personally I'd try tweaking gin_pending_list_limit first, if you have
a version that has that ... but YMMV.

            regards, tom lane


Thanks for the response!

* We are on version 9.5.6

* Less than 10% of the table was updated today (between the time of the last
reindex to when performance deteriorated)

* autovacuum is on. I don't see an autoanalyze property in the config, but
these are the settings for analyze:

autovacuum_analyze_threshold = 3000     # min number of row updates before analyze
#autovacuum_vacuum_scale_factor = 0.2   # fraction of table size before vacuum
#autovacuum_analyze_scale_factor = 0.1  # fraction of table size before analyze
#autovacuum_freeze_max_age = 200000000  # maximum XID age before forced vacuum
                                        # (change requires restart)

* And this: #gin_pending_list_limit = 4MB

* gin_clean_pending_list() is not available.

Will play with gin_pending_list_limit and see what that does.
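
A before/after comparison along these lines (table, column, and query are
placeholders) should show whether flushing the pending list is what brings
the good plan back:

    -- Plan after the pending list has grown during the day
    EXPLAIN ANALYZE
    SELECT id FROM my_table
    WHERE tsv_col @@ to_tsquery('english', 'example');

    -- A manual VACUUM flushes the GIN pending list
    VACUUM my_table;

    -- Same query again; if the pending list was the culprit,
    -- the planner should pick the GIN index once more
    EXPLAIN ANALYZE
    SELECT id FROM my_table
    WHERE tsv_col @@ to_tsquery('english', 'example');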

Thanks!
RV





From my experience, you want to really tighten the autovacuum_analyze
parameters.

I recommend our users use:

autovacuum_analyze_threshold = 1
autovacuum_analyze_scale_factor = 0.0

Analyze is quite cheap, and the speed difference between an optimal and a
suboptimal plan is usually pretty big.
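
If changing them globally feels too aggressive, the same thresholds can be
set on just the one table as storage parameters - a minimal sketch with a
made-up table name:

    -- Per-table override of the global autovacuum analyze thresholds
    ALTER TABLE my_table SET (
        autovacuum_analyze_threshold = 1,
        autovacuum_analyze_scale_factor = 0.0
    );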

 

My 2c,

  Igor

 


Ok, appreciate the feedback.
Will play around with those settings as well. Maybe start with the default,
which is 50, I believe.
Thanks!
RV





rverghese <riyav@hotmail.com> writes:
> Will play around with those settings as well. Maybe start with default which
> is 50 I believe.

If you're on 9.5, auto-analyze does not result in a pending list flush,
so it's irrelevant to fixing your problem.  (Assuming I've identified
the problem correctly.)  But you do have gin_pending_list_limit, so see
what that does for you.  Note you can set it either globally or per-index.
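
For the per-index form, something like this (index name is made up; the
storage-parameter value is in kilobytes):

    -- Per-index override of gin_pending_list_limit
    ALTER INDEX my_tsvector_gin_idx SET (gin_pending_list_limit = 512);

The global form is the GUC of the same name, set in postgresql.conf or per
session.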

            regards, tom lane


Yup, I just found the per-index option. Pretty cool. Will see what value is
optimal...

Thanks
RV


