Re: tsvector pg_stats seems quite a bit off. - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: tsvector pg_stats seems quite a bit off.
Date
Msg-id 1274812862-sup-7954@alvh.no-ip.org
In response to tsvector pg_stats seems quite a bit off.  (Jesper Krogh <jesper@krogh.cc>)
List pgsql-hackers
Excerpts from Jesper Krogh's message of Wed May 19 15:01:18 -0400 2010:

> But the distribution is very "flat" at the end, the last 128 values are
> exactly 1.00189e-05
> which means that any term sitting outside the array would get an estimate of
> 1.00189e-05 * 350174 / 2 = 1.75 ~ 2 rows

I don't know if this is related, but tsvector stats are computed and
stored per term, not per datum.  This is different from all other
datatypes.  Maybe there's code somewhere that's assuming per-datum and
coming up with the wrong estimates?  Or maybe the tsvector-specific code
contains a bug somewhere; maybe a rounding error?
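
For reference, the arithmetic in the quoted message can be reproduced with a
short sketch. It assumes the estimate for a term outside the array is simply
the smallest stored frequency, halved, times the row count, which is what the
quoted figures suggest; the helper name below is made up for illustration and
is not a PostgreSQL function.

```python
# Figures taken from the quoted message.
min_freq = 1.00189e-05   # frequency shared by the last 128 array entries
total_rows = 350174      # row count used in the quoted calculation


def estimate_rows_for_unlisted_term(min_freq, total_rows):
    """Hypothetical helper: halve the smallest stored frequency,
    then scale by the number of rows (assumed formula, per the quote)."""
    return min_freq * total_rows / 2


est = estimate_rows_for_unlisted_term(min_freq, total_rows)
print(round(est, 2))  # 1.75, matching the quoted figure
```

If the stored frequencies were per term while the code consuming them expected
per-datum values (or the other way round), it is this multiplication that
would come out wrong.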

-- 
Álvaro Herrera <alvherre@alvh.no-ip.org>

