Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Custom compression methods
Date
Msg-id CAFiTN-shBZoHjb=C5L6j9ddKcn533VUuLrFsX6Z2TCiAiSx+6w@mail.gmail.com
In response to Re: [HACKERS] Custom compression methods  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: [HACKERS] Custom compression methods  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Mon, Jan 11, 2021 at 11:00 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Sun, Jan 10, 2021 at 10:59 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Mon, Jan 04, 2021 at 04:57:16PM +0530, Dilip Kumar wrote:
> > > On Mon, Jan 4, 2021 at 6:52 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > And fails pg_upgrade check, apparently losing track of the compression (?)
> > > >
> > > >  CREATE TABLE public.cmdata2 (
> > > > -    f1 text COMPRESSION lz4
> > > > +    f1 text
> > > >  );
> > >
> > > I did not get this?  pg_upgrade check is passing for me.
> >
> > I realized that this was failing in your v16 patch sent Dec 25.
> > It's passing on current patches because they do "DROP TABLE cmdata2", but
> > that's only masking the error.

I tested pg_upgrade specifically, with all of the DROP TABLE and
DROP MATERIALIZED VIEW statements removed, and it passes.  I don't
see why it should fail; I mean, after the upgrade, why would
COMPRESSION lz4 be missing?
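
Roughly, the check looks like this (a sketch only; the paths are the
ones from your mail, and the grep at the end is just a quick way to
confirm that the schema dumped from the upgraded cluster still carries
the COMPRESSION clause):

# from the source tree, run the pg_upgrade test with the DROP TABLE /
# DROP MATERIALIZED VIEW statements removed from the regression tests
make -C src/bin/pg_upgrade check

# then start the upgraded cluster and inspect the dumped schema
tmp_install/usr/local/pgsql/bin/postgres -D src/bin/pg_upgrade/tmp_check/data &
tmp_install/usr/local/pgsql/bin/pg_dump -d regression --schema-only | grep COMPRESSION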


> > I think this patch needs to be specifically concerned with pg_upgrade, so I
> > suggest to not drop your tables and MVs, to allow the pg_upgrade test to check
> > them.  That exposes this issue:
>
> Thanks for the suggestion, I will try this.
>
> > pg_dump: error: Error message from server: ERROR:  cache lookup failed for access method 36447
> > pg_dump: error: The command was: COPY public.cmdata (f1) TO stdout;
> > pg_dumpall: error: pg_dump failed on database "regression", exiting
> > waiting for server to shut down.... done
> > server stopped
> > pg_dumpall of post-upgrade database cluster failed
> >
> > I found that's the AM's OID in the old cluster:
> > regression=# SELECT * FROM pg_am WHERE oid=36447;
> >   oid  | amname |  amhandler  | amtype
> > -------+--------+-------------+--------
> >  36447 | pglz2  | pglzhandler | c
> >
> > But in the new cluster, the OID has changed.  Since that's written into table
> > data, I think you have to ensure that the compression OIDs are preserved on
> > upgrade:
> >
> >  16755 | pglz2  | pglzhandler          | c
>
> Yeah, basically we are storing the AM OID in the compressed data, so
> the OID must be preserved.  I will look into this and fix it.

On further analysis: if we are dumping and restoring, then we will
compress the data again while inserting it, so why would we need the
old OID?  I mean, in the new cluster we are inserting the data again,
so it will be recompressed and will store the new OID.  Am I missing
something here?
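
In other words, for the dump-and-restore path the flow I have in mind
is something like this (just a sketch; the file name and target
database are illustrative):

./pg_dump -d postgres > 1.sql      # data is written out decompressed, as plain text
./psql -d postgres -f 1.sql        # on the new cluster the COPY/INSERT recompresses it,
                                   # so the new AM OID is what gets stored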

> > In my brief attempt to inspect it, I got this crash:
> >
> > $ tmp_install/usr/local/pgsql/bin/postgres -D src/bin/pg_upgrade/tmp_check/data &
> > regression=# SELECT pg_column_compression(f1) FROM cmdata a;
> > server closed the connection unexpectedly

I tried to test this after the upgrade, and I get the proper value.

Laptop309pnin:bin dilipkumar$ ./pg_ctl -D
/Users/dilipkumar/Documents/PG/custom_compression/src/bin/pg_upgrade/tmp_check/data.old/
start
waiting for server to start....2021-01-11 11:53:28.153 IST [43412]
LOG:  starting PostgreSQL 14devel on x86_64-apple-darwin19.6.0,
compiled by Apple clang version 11.0.3 (clang-1103.0.32.62), 64-bit
2021-01-11 11:53:28.170 IST [43412] LOG:  database system is ready to
accept connections
 done
server started

Laptop309pnin:bin dilipkumar$ ./psql -d regression
regression[43421]=# SELECT pg_column_compression(f1) FROM cmdata a;
 pg_column_compression
-----------------------
 lz4
 lz4
 pglz2
(3 rows)
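
To cross-check the access methods themselves, the pg_am entries in the
old and the new cluster can be compared with a small variation of the
query you used (compression handlers have amtype = 'c'):

SELECT oid, amname, amhandler FROM pg_am WHERE amtype = 'c';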

Manual test: (dump and load on the new cluster)
---------------
postgres[43903]=# CREATE ACCESS METHOD pglz2 TYPE COMPRESSION HANDLER
pglzhandler;
CREATE ACCESS METHOD

postgres[43903]=# select oid from pg_am where amname='pglz2';
  oid
-------
 16384
(1 row)

postgres[43903]=# CREATE TABLE cmdata_test(f1 text COMPRESSION pglz2);
CREATE TABLE
postgres[43903]=# INSERT INTO cmdata_test
VALUES(repeat('1234567890',1000));
INSERT 0 1
postgres[43903]=# SELECT pg_column_compression(f1) FROM cmdata_test;
 pg_column_compression
-----------------------
 pglz2
(1 row)

Laptop309pnin:bin dilipkumar$ ./pg_dump -d postgres > 1.sql

-- restore on new cluster --
postgres[44030]=# select oid from pg_am where amname='pglz2';
  oid
-------
 16385
(1 row)

postgres[44030]=# SELECT pg_column_compression(f1) FROM cmdata_test;
 pg_column_compression
-----------------------
 pglz2
(1 row)

You can see that on the new cluster the OID of pglz2 has changed, but
there is no issue.  Is it possible for you to give me a self-contained
test case to reproduce the issue, or a theory as to why it should
fail?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


