Re: BUG #15475: Views over CITEXT columns return no data - Mailing list pgsql-bugs

From Paul Schaap
Subject Re: BUG #15475: Views over CITEXT columns return no data
Date
Msg-id FF0E68A0-3266-48CF-BE8E-B2E03AB3EE9F@ipggroup.com
Whole thread Raw
In response to Re: BUG #15475: Views over CITEXT columns return no data  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses Re: BUG #15475: Views over CITEXT columns return no data
List pgsql-bugs
Hi Andrew,

I agree it seems implausible to me too however it is 100% reproducible, see below track of it happening, I have changed the names of things to protect the innocent ;-) :

my_server.rds.amazonaws.com = Postgres 10.5 RDS
Local Postgres 11.0 on OSX, also tried on Ubuntu with the same data and it exhibited the same issue

$ psql -c "DROP TABLE my_lookup CASCADE;" my_databasename
NOTICE:  drop cascades to view my_events
DROP TABLE

$ pg_dump -h my_server.rds.amazonaws.com -U my_username -d my_databasename -t my_lookup -Fc | pg_restore -d my_databasename
Password: 
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 225; 1259 27069 TABLE my_lookup my_username
pg_restore: [archiver (db)] could not execute query: ERROR:  role "my_username" does not exist
    Command was: ALTER TABLE public.my_lookup OWNER TO my_username;


WARNING: errors ignored on restore: 1


$ psql -c "CREATE VIEW my_events AS SELECT * FROM my_events_raw LEFT OUTER JOIN my_lookup ON my_citext::INT = my_int;" my_databasename
CREATE VIEW


HERE IS THE STRANGE ANSWER:

$ psql -c "SELECT COUNT(*)                                                                                                          
FROM my_events
WHERE my_date >= '2018-08-01'
AND   my_date <  '2018-09-01'
AND   type_id NOT IN ('A','LIST','OF','NOTS');" my_databasename
 count 
-------
     0
(1 row)

BUT HANG ON WE DO GET DATA IF I LIMIT:

$ psql -c "SELECT *                                                                                                                 
FROM my_events
WHERE my_date >= '2018-08-01'
AND   my_date <  '2018-09-01'
AND   type_id NOT IN ('A','LIST','OF','NOTS') LIMIT 10;" my_databasename

<snip>

(10 rows)

OK LETS RELOAD USING A SQL METHOD:

$ psql -c "DROP TABLE my_lookup CASCADE;" my_databasename
NOTICE:  drop cascades to view my_events
DROP TABLE

$ pg_dump -h my_server.rds.amazonaws.com -U my_username -d my_databasename -t my_lookup | psql -d my_databasename   
Password: 
SET
SET
SET
SET
SET
 set_config 
------------
 
(1 row)

SET
SET
SET
SET
SET
CREATE TABLE
ERROR:  role "my_username" does not exist
COPY 355109
ALTER TABLE
CREATE INDEX

JUST SO YOU KNOW HOW MANY my_events THERE ARE AS THIS RELIES ON SOME SCALE:

$ psql -c "SELECT COUNT(*) FROM my_events_raw;" my_databasename
   count   
-----------
 162430247
(1 row)

$ psql -c "CREATE VIEW my_events AS SELECT * FROM my_events_raw LEFT OUTER JOIN my_lookup ON my_citext::INT = my_int;" my_databasename
CREATE VIEW

BINGO, THE RIGHT ANSWER:

$ psql -c "SELECT COUNT(*)                           
FROM my_events
WHERE my_date >= '2018-08-01'
AND   my_date <  '2018-09-01'
AND   type_id NOT IN ('A','LIST','OF','NOTS');" my_databasename
 count  
--------
 168878
(1 row)

The only other clues I have for you are :
  • my_events_raw is partitioned by month, e.g. FOR VALUES FROM ('2018-02-01 00:00:00+00' ) TO ('2018-03-01 00:00:00+00');
  • that if I switch the my_citext from CITEXT to TEXT column magically works again, and switching it back again to CITEXT breaks it again
  • That if I reload the table using any non -Fc method it all works, also no matter if I drop or truncate and load the table
  • There are no NULL my_int's, there are NULL my_citext's
  • That if I reverse my_citext::INT = my_int to my_citext = my_int::CITEXT it also works correctly
Is there perhaps any other logging or log level I could set that may help?

I know this is baffling and is a weird edge case, and there are plenty of work arounds that I found, but maybe it hints at another issue so seems worth investigating especially for a .0 release.

CHEERS
Paul

On 2 Nov 2018, at 09:26, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:

"Paul" == Paul Schaap <ps@ipggroup.com> writes:

Paul> Hi Andrew,

Paul> Yes there are indexes on both tables, and the issue only shows up
Paul> with high volumes.

Does it show up with low volumes too if you do  set enable_seqscan=false;

Paul> Note I have subsequently isolated the issue down to a pg_dump 10
Paul> to pg_restore 11 using -Fc.

No, I don't think you have. In particular, it's not plausible that using
-Fc makes any substantive difference, since the commands run for the
restore will not be any different to restoring a plain-text dump.

--
Andrew.


PRIVACY & CONFIDENTIALITY NOTICE

The information contained in this email is intended for the named recipients  only.  It may contain privileged and confidential information and if you are  not the intended recipient, you should not copy it or disclose its contents  to any other person, or take any action in reliance on it.  If you have  received this email in error, please notify us immediately by return email.

pgsql-bugs by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: BUG #15475: Views over CITEXT columns return no data
Next
From: Andrew Gierth
Date:
Subject: Re: BUG #15475: Views over CITEXT columns return no data