Re: Planner: rows=1 after "similar to" where condition. - Mailing list pgsql-general

From Joris Dobbelsteen
Subject Re: Planner: rows=1 after "similar to" where condition.
Msg-id E4953B65D9E5054AA6C227B410C56AA9C3B0@exchange1.joris2k.local
In response to Planner: rows=1 after "similar to" where condition.  ("Joris Dobbelsteen" <Joris@familiedobbelsteen.nl>)
Responses Re: Planner: rows=1 after "similar to" where condition.
List pgsql-general
>-----Original Message-----
>From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
>Sent: Monday, 25 February 2008 7:14
>To: Joris Dobbelsteen
>Cc: pgsql-general@postgresql.org
>Subject: Re: [GENERAL] Planner: rows=1 after "similar to"
>where condition.
>
>On Sun, Feb 24, 2008 at 4:35 PM, Joris Dobbelsteen
><Joris@familiedobbelsteen.nl> wrote:
>> I seem to have some planner oddity, where it seems to completely
>> mispredict the output after a regex compare. I've seen it on other
>> occasions, where it completely screws up the join. You can note the
>> "rows=1" after the filter.
>> A similar situation has occurred when doing a regex filter in a
>> subquery, which was subsequently predicted as 1 row and triggered
>> (oddly enough) a sequential scan. Doing the same using "equality" on
>> the result of substring(<text> from <regex>) seemed to work and
>> produced a useful plan, since it did a hash-join (as it should have).
>> Is this a known problem? Otherwise I think I should build a smaller
>> test case...
>>
>> Using PostgreSQL 8.2.6 from Debian Etch-backports.

Correction, that should be:
PostgreSQL 8.2.5 on x86_64-pc-linux-gnu (GCC 4.1.2.20061115) (Debian
4.1.1-21).
I should have paid closer attention.

>>
>>  "Bitmap Heap Scan on log_syslog syslog  (cost=13124.26..51855.25 rows=1 width=270)"
>>  "  Recheck Cond: (((program)::text = 'amavis'::text) AND ((facility)::text = 'mail'::text))"
>>  "  Filter: ***SOME VERY LONG SIMILAR TO REGEX****"
>>  "  ->  BitmapAnd  (cost=13124.26..13124.26 rows=18957 width=0)"
>>  "        ->  Bitmap Index Scan on "IX_log_syslog_program"  (cost=0.00..2223.95 rows=92323 width=0)"
>>  "              Index Cond: ((program)::text = 'amavis'::text)"
>>  "        ->  Bitmap Index Scan on "IX_log_syslog_facility"  (cost=0.00..10899.81 rows=463621 width=0)"
>>  "              Index Cond: ((facility)::text = 'mail'::text)"
>
>It's not saying it will only get one row back for sure, it's
>saying it thinks it will return one row, and depending on
>your query, it might.
> What's the query, and what's the explain analyze of that query?
>

See the attached file for the query and the explain (hopefully this
gives a consistent view and maintains the layout for easier reading).

The point is that it will NOT, not even close. The planner guesses 1
row, but the output was around 13000 rows (of the 2.2M rows in the
table). Oddly enough, the estimate of 18k rows for the BitmapAnd seems
very good. In fact, if I omit the "SIMILAR TO", it estimates ~12000
rows, which is spot on. So it seems the SIMILAR TO really confuses the
planner.

The query actually returned 12981 rows in the first case. However, since
I removed this data from the original table (it's now somewhere else), I
cannot present the original EXPLAIN ANALYZE any more. The new dataset
only contains ~137 (but I still have the old statistics, I think, or at
least they provide the same predictions).
I also included a run after EXPLAIN ANALYZE on the current dataset.
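
As a rough illustration of how large the mismatch is, the arithmetic can
be sketched in Python. The figures are the ones quoted in this message
(an estimate of ~12000 rows without the SIMILAR TO, an estimate of 1 row
with it, and 12981 rows actually returned). The variable names are only
illustrative, and treating the 1-row estimate as a lower clamp is an
assumption about the planner, not something shown in the plan itself.

```python
# Figures quoted in the message; nothing here is measured by this script.
estimate_without_filter = 12000  # approximate planner estimate when SIMILAR TO is omitted
estimate_with_filter = 1         # planner estimate with the SIMILAR TO filter
actual_rows = 12981              # rows the query really returned

# Selectivity the planner must have assigned to the SIMILAR TO filter.
# Assumption: row estimates are never reported below 1, so the real
# internal value may be even smaller than this upper bound.
implied_selectivity = estimate_with_filter / estimate_without_filter

# How far the final row estimate is from reality.
misestimate_factor = actual_rows / estimate_with_filter

print(f"implied SIMILAR TO selectivity <= {implied_selectivity:.6f}")
print(f"row estimate is off by a factor of {misestimate_factor:.0f}")
```

In other words, the filter appears to be treated as matching almost
nothing, while in practice it removed few if any of the rows selected by
the two index conditions.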

Hopefully this helps.

Thanks,

- Joris

Attachment
