Thread: varchar does not work too well with IS NOT NULL partial indexes.

varchar does not work too well with IS NOT NULL partial indexes.

From
"Dawid Kuroczko"
Date:
Hello, I guess I've stuck upon one problem with (I guess) implicit
casting varchar to text...

Basically it looks like planner makes better use of
WHERE ... IS NOT NULL indexes if either you explicitly
put "text" as a column type or that you cast the column
to ::text when making index.

Here's a self-contained example (this is 8.2.4 server)

CREATE TABLE foo (i int);

INSERT INTO foo
  SELECT CASE WHEN i%10=0 THEN NULL ELSE i END
     FROM generate_series(1,100000) AS n(i);

CREATE INDEX foo_i_index ON foo (i) WHERE i IS NOT NULL;
ANALYZE foo;

EXPLAIN SELECT * FROM foo WHERE i=17;
                              QUERY PLAN
-----------------------------------------------------------------------
 Index Scan using foo_i_index on foo  (cost=0.00..8.28 rows=1 width=4)
   Index Cond: (i = 17)

ALTER TABLE foo ALTER COLUMN i TYPE text;
EXPLAIN SELECT * FROM foo WHERE i=17;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=12.14..554.42 rows=500 width=32)
   Recheck Cond: (i = '17'::text)
   ->  Bitmap Index Scan on foo_i_index  (cost=0.00..12.01 rows=498 width=0)
         Index Cond: (i = '17'::text)

 EXPLAIN SELECT * FROM foo WHERE i=17;
                       QUERY PLAN
---------------------------------------------------------
 Seq Scan on foo  (cost=0.00..1772.00 rows=500 width=34)
   Filter: ((i)::text = '17'::text)

CREATE INDEX foo_i2_index ON foo ((i::text));
EXPLAIN SELECT * FROM foo WHERE i='17'::text;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=12.14..554.82 rows=500 width=34)
   Recheck Cond: ((i)::text = '17'::text)
   ->  Bitmap Index Scan on foo_i2_index  (cost=0.00..12.01 rows=500 width=0)
         Index Cond: ((i)::text = '17'::text)

Regards,
    Dawid

Re: varchar does not work too well with IS NOT NULL partial indexes.

From
Gregory Stark
Date:
"Dawid Kuroczko" <qnex42@gmail.com> writes:

> ALTER TABLE foo ALTER COLUMN i TYPE text;
> EXPLAIN SELECT * FROM foo WHERE i=17;
>                                 QUERY PLAN
> -----------------------------------------------------------------------------
> Bitmap Heap Scan on foo  (cost=12.14..554.42 rows=500 width=32)
>   Recheck Cond: (i = '17'::text)
>   ->  Bitmap Index Scan on foo_i_index  (cost=0.00..12.01 rows=498 width=0)
>         Index Cond: (i = '17'::text)

I think you've lost some single-quotes around 17 in this query. With the
single-quotes it works like this which seems like the correct result. You
don't need the casts in the index definition if you write the query with
single-quotes.

> EXPLAIN SELECT * FROM foo WHERE i=17;
>                       QUERY PLAN
> ---------------------------------------------------------
> Seq Scan on foo  (cost=0.00..1772.00 rows=500 width=34)
>   Filter: ((i)::text = '17'::text)

This is now an error:

LINE 1: EXPLAIN SELECT * FROM foo WHERE i=17;
                                         ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.


In fact it's not clear what you would want to happen here. Should it cast the
text to an integer and use integer comparison or cast the integer to text and
use text comparison? They don't necessarily generate the same results. (In
fact I suspect they would for equals but consider the same situation for < or
>)

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: varchar does not work too well with IS NOT NULL partial indexes.

From
"Dawid Kuroczko"
Date:
On 7/24/07, Gregory Stark <stark@enterprisedb.com> wrote:
> "Dawid Kuroczko" <qnex42@gmail.com> writes:
>
> > ALTER TABLE foo ALTER COLUMN i TYPE text;
> > EXPLAIN SELECT * FROM foo WHERE i=17;
> >                                 QUERY PLAN
> > -----------------------------------------------------------------------------
> > Bitmap Heap Scan on foo  (cost=12.14..554.42 rows=500 width=32)
> >   Recheck Cond: (i = '17'::text)
> >   ->  Bitmap Index Scan on foo_i_index  (cost=0.00..12.01 rows=498 width=0)
> >         Index Cond: (i = '17'::text)
>
> I think you've lost some single-quotes around 17 in this query. With the
> single-quotes it works like this which seems like the correct result. You
> don't need the casts in the index definition if you write the query with
> single-quotes.

Well, maybe I used wrong example...

CREATE TABLE foo (t varchar(100));
INSERT INTO foo
  SELECT CASE WHEN i % 10 = 0 THEN NULL ELSE 'X' || i END
     FROM generate_series(1,1000000) AS n(i);

What we have here is a table with every 10th row NULL.

CREATE INDEX foo_t_index ON foo (t) WHERE t IS NOT NULL;

...and an index which will contain only NOT NULL values.

Now, if we:

# EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..18025.78 rows=1 width=8) (actual
time=0.079..565.661 rows=1 loops=1)
   Filter: ((t)::text = 'X17'::text)
 Total runtime: 565.689 ms


# EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on foo  (cost=0.00..178.00 rows=50 width=68)
   Filter: ((t)::text = 'X17'::text)
(2 rows)

But if we:

# ALTER TABLE foo ALTER COLUMN t TYPE text;

# EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using foo_t_index on foo  (cost=0.00..8.39 rows=1
width=10) (actual time=0.051..0.052 rows=1 loops=1)
   Index Cond: (t = 'X17'::text)
 Total runtime: 0.077 ms

...so it does nothing to do with single quotes.  Actually it works
fine, so long as you use text instead of varchar2:

# EXPLAIN ANALYZE SELECT t FROM foo WHERE t=17;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using foo_t_index on foo  (cost=0.00..8.39 rows=1
width=10) (actual time=0.014..0.014 rows=0 loops=1)
   Index Cond: (t = '17'::text)
 Total runtime: 0.034 ms


I hope I have stated the problem clearly now. :-)

   Regards,
      Dawid

Re: varchar does not work too well with IS NOT NULL partial indexes.

From
Gregory Stark
Date:
"Dawid Kuroczko" <qnex42@gmail.com> writes:

> Now, if we:
>
> # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
>                                            QUERY PLAN
> ---------------------------------------------------------------------------------------------------
> Seq Scan on foo  (cost=0.00..18025.78 rows=1 width=8) (actual
> time=0.079..565.661 rows=1 loops=1)
>   Filter: ((t)::text = 'X17'::text)
> Total runtime: 565.689 ms
>
>
> # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
>                      QUERY PLAN
> -------------------------------------------------------
> Seq Scan on foo  (cost=0.00..178.00 rows=50 width=68)
>   Filter: ((t)::text = 'X17'::text)
> (2 rows)

I still think you're playing games with the output. a) This is not an EXPLAIN
ANALYZE at all, there are no "actual" values. And b) there's no explanation
for why the estimates should be different for this query than the previous,
identical, query.

Send along the actual psql session, not an edited version.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: varchar does not work too well with IS NOT NULL partial indexes.

From
"Dawid Kuroczko"
Date:
On 7/24/07, Gregory Stark <stark@enterprisedb.com> wrote:
> "Dawid Kuroczko" <qnex42@gmail.com> writes:
>
> > Now, if we:
> >
> > # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
> >                                            QUERY PLAN
> > ---------------------------------------------------------------------------------------------------
> > Seq Scan on foo  (cost=0.00..18025.78 rows=1 width=8) (actual
> > time=0.079..565.661 rows=1 loops=1)
> >   Filter: ((t)::text = 'X17'::text)
> > Total runtime: 565.689 ms
> >
> >
> > # EXPLAIN ANALYZE SELECT t FROM foo WHERE t='X17';
> >                      QUERY PLAN
> > -------------------------------------------------------
> > Seq Scan on foo  (cost=0.00..178.00 rows=50 width=68)
> >   Filter: ((t)::text = 'X17'::text)
> > (2 rows)
>
> I still think you're playing games with the output. a) This is not an EXPLAIN
> ANALYZE at all, there are no "actual" values. And b) there's no explanation
> for why the estimates should be different for this query than the previous,
> identical, query.

My mistake, copy&paste error.

> Send along the actual psql session, not an edited version.

Actual session is attached.

If I may suggest it -- try to run the queries yourself.  You will find
the problem
lies not in the statistics.

   Regards,
      Dawid

Attachment

Re: varchar does not work too well with IS NOT NULL partial indexes.

From
Gregory Stark
Date:
"Dawid Kuroczko" <qnex42@gmail.com> writes:

> If I may suggest it -- try to run the queries yourself. You will find the
> problem lies not in the statistics.

I was more concerned that there might be other discrepancies between the
commands in the email and the actual commands you're running.

Running it myself I do see the same behaviour in 8.3. I'm not sure whether
this is something we expect to work or not though. Binary-compatible types are
a bit of weirdness I still haven't quite absorbed.

postgres=# alter table foo alter column i type text;
ALTER TABLE

postgres=# analyze foo;
ANALYZE

postgres=# explain analyze select * from foo where i='17';
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Scan using foo_i_index on foo  (cost=0.00..8.28 rows=1 width=5) (actual time=0.132..0.138 rows=1 loops=1)
   Index Cond: (i = '17'::text)
 Total runtime: 0.235 ms
(3 rows)

postgres=# alter table foo alter column i type varchar(100);
ALTER TABLE

postgres=# analyze foo;
ANALYZE

postgres=# explain analyze select * from foo where i='17';
                                            QUERY PLAN
--------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..1681.00 rows=1 width=5) (actual time=0.147..281.349 rows=1 loops=1)
   Filter: ((i)::text = '17'::text)
 Total runtime: 281.448 ms
(3 rows)

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: varchar does not work too well with IS NOT NULL partial indexes.

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Running it myself I do see the same behaviour in 8.3. I'm not sure whether
> this is something we expect to work or not though.

It looks like it might be relatively easy to fix, but I haven't dug down
to find exactly where things go wrong.  Presumably there's something in
predtest.c that's not quite smart enough.

            regards, tom lane

Re: varchar does not work too well with IS NOT NULL partial indexes.

From
Tom Lane
Date:
"Dawid Kuroczko" <qnex42@gmail.com> writes:
> Basically it looks like planner makes better use of
> WHERE ... IS NOT NULL indexes if either you explicitly
> put "text" as a column type or that you cast the column
> to ::text when making index.

I've applied a patch for this.

            regards, tom lane