Re: Planner: rows=1 after "similar to" where condition. - Mailing list pgsql-general

From Tom Lane
Subject Re: Planner: rows=1 after "similar to" where condition.
Date
Msg-id 28326.1204673321@sss.pgh.pa.us
Whole thread Raw
In response to Re: Planner: rows=1 after "similar to" where condition.  ("Joris Dobbelsteen" <Joris@familiedobbelsteen.nl>)
List pgsql-general
"Joris Dobbelsteen" <Joris@familiedobbelsteen.nl> writes:
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>> If you increase the stats target for the column to 100 or
>> more then it
>> will try actually applying the regexp to all the histogram entries.
>> That might or might not give you a better estimate.

> A new test case (I did a fresh VACUUM ANALYZE with your statistics for
> text set to 100):
> Arround 5288 rows out of 4.3 Million match.

Ah, you had not given us that number before.  That's one in 800 rows,
more or less, which means that there's no chance of getting a
well-founded statistical estimate with less than 800 items in the stats
collection.  Does it do any better with stats target set to 1000?

I think though that the real problem may be that the index condition

> "  Recheck Cond: (((program)::text = 'amavis'::text) AND
> ((facility)::text = 'mail'::text))"

selects rows that match the regex with much higher probability than the
general row population does.  Since we don't yet have any cross-column
statistics the planner has no chance of realizing that.

> My construction with the regex as "substring()" construction and a
> "WHERE substring() IS NOT NULL" seems to give a better estimate in these
> cases. The result seems equivalent.

Actually, it's got exactly 0 knowledge about substring() and is giving
you a completely generic guess for this clause :-(

            regards, tom lane

pgsql-general by date:

Previous
From: aklaver@comcast.net (Adrian Klaver)
Date:
Subject: Re: [SQL] Documenting a DB schema
Next
From: Alan Hodgson
Date:
Subject: Re: GRANT ALL ON recursive for all tables in my scheme?