Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> For a distribution of data like the one below, why does the planner
> choose to do an index scan by default for source = 'REGIS' when more
> than 50% of the rows have source = 'REGIS'?
Are there a huge number of dead rows in the table? ("vacuum verbose"
would give some info.)
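For instance, something along these lines would show how much dead
space there is. This is only a sketch: "your_table" is a placeholder,
since the actual table name isn't given here, and the exact wording of
the output varies between releases.

```sql
-- "your_table" is a hypothetical name; substitute the real table.
-- VERBOSE reports per-table details, including how many dead row
-- versions were found and how many pages the table occupies.
VACUUM VERBOSE your_table;
```

If the page count is far larger than the number of live rows would
justify, the table is probably carrying a lot of dead space.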
The given result seems suspect; an indexscan couldn't possibly read >50%
of the rows in less than a quarter of the time a seqscan takes, unless
(a) the table contains vast amounts of empty space that the seqscan has to
slog through, or (b) your second measurement is bogus due to caching
performed by the first measurement.
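One way to rule out (b) is to run each plan more than once and compare
only the later runs. A rough sketch, assuming a placeholder table name
and a count(*) query standing in for whatever was actually measured:

```sql
-- Hypothetical table and query; substitute your own.
-- Run this whole block twice and disregard the first set of timings.
SET enable_indexscan = off;   -- discourage (not forbid) the indexscan
EXPLAIN ANALYZE SELECT count(*) FROM your_table WHERE source = 'REGIS';

SET enable_indexscan = on;
SET enable_seqscan = off;     -- discourage (not forbid) the seqscan
EXPLAIN ANALYZE SELECT count(*) FROM your_table WHERE source = 'REGIS';

-- Put the planner settings back to their defaults.
RESET enable_indexscan;
RESET enable_seqscan;
```

Check the plan that EXPLAIN prints each time to confirm which scan type
was really used before comparing the timings.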
Also, might the table be physically ordered by the "source" column? A
sufficiently high correlation might have persuaded the planner to try an
indexscan even if point (a) isn't true.
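The planner's correlation estimate can be inspected in the pg_stats
view once the table has been analyzed. A sketch, again with a
placeholder table name:

```sql
-- "your_table" is hypothetical. A correlation near +1 or -1 means the
-- physical row order closely tracks the order of "source" values;
-- a value near 0 means the rows are scattered.
ANALYZE your_table;
SELECT attname, n_distinct, correlation
  FROM pg_stats
 WHERE tablename = 'your_table' AND attname = 'source';
```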
regards, tom lane