Home > mailing lists

Re: [GENERAL] FTS query, statistics and planner estimations… - Mailing list pgsql-general

From	Pavel Stehule
Subject	Re: [GENERAL] FTS query, statistics and planner estimations…
Date	November 9, 2016 10:42:34
Msg-id	CAFj8pRB+dws_D4ocfrTU==GWCxd8TGr_SEhym+4uK3GoZhuNAw@mail.gmail.com Whole thread Raw
In response to	Re: FTS query, statistics and planner estimations… (Pierre Ducroquet <pierre.ducroquet@people-doc.com>)
List	pgsql-general

Tree view

2016-11-09 11:19 GMT+01:00 Pierre Ducroquet <pierre.ducroquet@people-doc.com>:

On Wednesday, November 9, 2016 10:40:10 AM CET Francisco Olarte wrote:
> Pierre:
>
> On Wed, Nov 9, 2016 at 10:22 AM, Pierre Ducroquet
>
> <pierre.ducroquet@people-doc.com> wrote:
> > The query does a few joins «after» running a FTS query on a main table.
> > The FTS query returns a few thousand rows, but the estimations are wrong,
> > leading the optimizer to terrible plans compared to what should happen,
> > and
> > thus creates a far higher execution time.
>
> ....
>
> > but the issue remain the same. The table contains about 295,000 documents,
> > and
> ....
>
> > Request | Estimated rows | Real rows
> >
> > ----------------------------------+----------------+-----------
> > 'word1' | 38050 | 37500
> > 'word1 word2' | 4680 | 32000
> > 'word1 word2 word3' | 270 | 12300
> > 'word1 word2 word3 word4' | 10 | 9930
> > 'word1 word2 word3 word4 word5' | 1 | 9930
> >
> > You can see that with more words in query, the estimation falls far behind
> > reality.
>
> I'm not really familiar with FTS but, doing a few division of
> estimations and rows it seems it estimates as uncorrelated words, and
> you real rows clearly indicate some of them are clearly correlated (
> like w1/w2 and w4/s5, and partially w3/w45 ) and very common.
>
> > Is that a known limitation of the FTS indexing ? Am I missing something
> > obvious, or a poor configuration ?
>
> Someone more familiar with it needed for that, but what I've found
> several times is FTS does not mix too well with relational queries at
> the optimizer level ( as FTS terms can have very diverse degrees of
> correlation, which is very difficult to store in the statistics a
> relational optimizer normally uses ).

Indeed the words in the query are correlated, but I do hope that the FTS
indexing is able to cope with that. Otherwise it makes it far less usable than
what one would expect since real world queries will often contain sentences or
related words. Also, PostgreSQL 9.6 introduced phrase search in FTS, and I
don't see how that would work without a working multi-words query.

The PostgreSQL statistics are not multidimensional - so bad estimation is expected :(

Regards

Pavel

pgsql-general by date:

From: Pierre Ducroquet
Date: 09 November 2016, 10:19:54
Subject: Re: FTS query, statistics and planner estimations…

From: Francisco Olarte
Date: 09 November 2016, 10:45:39
Subject: Re: [GENERAL] FTS query, statistics and planner estimations…

Re: [GENERAL] FTS query, statistics and planner estimations… - Mailing list pgsql-general

Previous

Next