Re: WIP - xmlvalidate implementation from TODO list - Mailing list pgsql-hackers

From Marcos Magueta
Subject Re: WIP - xmlvalidate implementation from TODO list
Date
Msg-id CAN3aFCcUFLbdVBoL6c2bMh4r5P9EnXM9eBsX8+ZyER7YBSDUtA@mail.gmail.com
Whole thread Raw
In response to Re: WIP - xmlvalidate implementation from TODO list  (Jim Jones <jim.jones@uni-muenster.de>)
Responses Re: WIP - xmlvalidate implementation from TODO list
List pgsql-hackers
Hey Jim,

> Any particular reason for that? If not, take a look at other options, e.g. a_expr
No particular reason apart from it being simpler since I didn't need to invoke an execution at the cmd. Changed it now.

> Why did you choose text over xml for schemadata?
My original thought was that XML schemas require additional validation in contrast to normal XML, but it being additive, we would have redundant checks. But in reconsideration, perhaps keeping the field with an XML type is more intuitive for anyone introspecting over the catalog. Also applied the change on the latest version of the patch.

Just a question about something I am very curious about the previous patch I sent:
I noticed DefineXmlSchema() calls IsThereXmlSchemaInNamespace() right after XmlSchemaCreate() returns a valid OID. Since XmlSchemaCreate() already inserted the tuple into the catalog (via CatalogTupleInsert at pg_xmlschema.c:166), wouldn't SearchSysCacheExists2() find it and always throw "already exists"? We all tested the original code and it worked fine, so I'm missing something about syscache visibility or timing; that was an early function I did to check for duplicates that ended up in the wrong place. I removed the call (and function) as I judged it to be redundant (the duplicate check already happens inside XmlSchemaCreate()), but is there something subtle about intra-command visibility I'm not understanding? If anyone knows, please let me know.

I tried to split the patch into multiple, as recommended, but there might still exist some overlaps when it comes to the division of the implementation of XMLVALIDATE and XMLSCHEMA. I put the docs and tests on the last patch, as I had issues amending.

Also, I added tab completion on psql and fixed pg_dump.

Regards, Marcos Magueta


Em qua., 14 de jan. de 2026 às 07:10, Jim Jones <jim.jones@uni-muenster.de> escreveu:


On 14/01/2026 02:23, Marcos Magueta wrote:
> Please follow the updated version attached.

A few comments:

== grammar ==

In gram.y you restricted CREATE XMLSCHEMA to Sconst:

DO $$
DECLARE xsd text := '<foo></foo>';
BEGIN
  CREATE XMLSCHEMA person_schema AS xsd;
END $$;
ERROR:  syntax error at or near "xsd"
LINE 4:   CREATE XMLSCHEMA person_schema AS xsd;

Any particular reason for that? If not, take a look at other options,
e.g. a_expr

== pg_xmlschema ==

Why did you choose text over xml for schemadata?

postgres=# \d pg_xmlschema
               Table "pg_catalog.pg_xmlschema"
     Column      |   Type    | Collation | Nullable | Default
-----------------+-----------+-----------+----------+---------
 oid             | oid       |           | not null |
 schemaname      | name      |           | not null |
 schemanamespace | oid       |           | not null |
 schemaowner     | oid       |           | not null |
 schemadata      | text      | C         | not null |
 schemaacl       | aclitem[] |           |          |
Indexes:
    "pg_xmlschema_oid_index" PRIMARY KEY, btree (oid)
    "pg_xmlschema_name_nsp_index" UNIQUE CONSTRAINT, btree (schemaname,
schemanamespace)


== psql command to display and list xml schemas ==

Not a requirement for this patch (specially not at the current stage),
but you should add it to your TODO list.

\dz
\dz foo
\dz+ foo

* z here is just an example

== tab completion ==

CREATE <TAB> should suggest XMLSCHEMA and CREATE XML<TAB> should
autocomplete CREATE XMLSCHEMA. The same applies for DROP XMLSCHEMA [IF
EXISTS], where it should additionally list the available schemas after
DROP XMLSCHEMA <TAB>.

== white-space warnings ==

The patch does not apply cleanly:

/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:594:
indent with spaces.
                Oid schemanamespace,
/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:595:
indent with spaces.
                Oid schemaowner,
/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:596:
indent with spaces.
                const char *schemadata,
/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:597:
indent with spaces.
                bool if_not_exists,
/home/jim/patches/xmlvalidate/0002-xmlschema-catalog-and-xmlvalidate.patch:598:
indent with spaces.
                bool quiet)
warning: squelched 139 whitespace errors
warning: 144 lines add whitespace errors.

== file naming ==

Your patch suggests that it is part of a patch set, from which 0001 is
missing. In case you meant a version 2 of the previous patch, a better
format would be

v2-0001-xmlschema-catalog-and-xmlvalidate.patch

which can be generated with

$ git format-patch -1 -v2

== xml_1.out not updated ==

After every change in xml.sql you must create an equivalent file for a
postgres compiled without --with-libxml, and put the changes in
xml_1.out.[1]

== corrupt pg_dump ==

I understand we agreed to work on XMLVALIDATE only after CREATE
XMLSCHEMA is settled, but since the code is partially already there, you
might wanna take a look at pg_dump. It is not serialising the CREATE
XMLSCHEMA statements:

$ /usr/local/postgres-dev/bin/psql postgres
psql (19devel)
Type "help" for help.

postgres=# CREATE XMLSCHEMA person_schema AS '<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:integer"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';
CREATE XMLSCHEMA
postgres=# CREATE VIEW v AS
 SELECT XMLVALIDATE(DOCUMENT '<bar></bar>'::xml
 ACCORDING TO XMLSCHEMA person_schema);
CREATE VIEW
postgres=# \q

$ /usr/local/postgres-dev/bin/pg_dump postgres
--
-- PostgreSQL database dump
--

\restrict WLaIQWmNJVW2yc4Jv5W81qh2TZGHnupJlpl4Urm4Pp6Ku3VPyH5dO3ReFc4LMmd

-- Dumped from database version 19devel
-- Dumped by pg_dump version 19devel

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET transaction_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

--
-- Name: test_xmlschema_ns; Type: SCHEMA; Schema: -; Owner: jim
--

CREATE SCHEMA test_xmlschema_ns;


ALTER SCHEMA test_xmlschema_ns OWNER TO jim;

--
-- Name: v; Type: VIEW; Schema: public; Owner: jim
--

CREATE VIEW public.v AS
 SELECT XMLVALIDATE(DOCUMENT '<bar></bar>'::xml ACCORDING TO XMLSCHEMA
person_schema) AS "xmlvalidate";


ALTER VIEW public.v OWNER TO jim;

--
-- PostgreSQL database dump complete
--

\unrestrict WLaIQWmNJVW2yc4Jv5W81qh2TZGHnupJlpl4Urm4Pp6Ku3VPyH5dO3ReFc4LMmd

Take a look at pg_dump.c. You might need a new function, e.g.
dumpXmlSchemas(Archive *fout, const SchemaInfo *schemaInfo)

== patch structure ==

To make the review a bit easier, I suggest to split this patch into a
patch set with **at least 4** smaller patches - the more seasoned
hackers here might correct me if I am wrong. For instance:

0001 - CREATE XMLSCHEMA (code + tests + documentation)
0002 - pg_dump changes to output CREATE XMLSCHEMA
0003 - psql tab completion + new command to display and list xml schemas
0004 - XMLVALIDATE (code + tests + documentation)


Thanks for working on this!

Best, Jim

[1] https://cirrus-ci.com/task/4872456290172928?logs=check_world#L126
Attachment

pgsql-hackers by date:

Previous
From: Ilia Evdokimov
Date:
Subject: Re: [PATCH] ANALYZE: hash-accelerate MCV tracking for equality-only types
Next
From: Neil Conway
Date:
Subject: Re: Speed up COPY FROM text/CSV parsing using SIMD