Re: WIP - xmlvalidate implementation from TODO list - Mailing list pgsql-hackers

From Marcos Magueta
Subject Re: WIP - xmlvalidate implementation from TODO list
Date
Msg-id CAN3aFCe_cBshj0rb7J8yoT+fRHOBOZmk-m8V7DMLDe0ZjSgjcA@mail.gmail.com
Whole thread Raw
In response to Re: WIP - xmlvalidate implementation from TODO list  (Marcos Magueta <maguetamarcos@gmail.com>)
List pgsql-hackers
Hello there, Postgres hackers!

Since we last talked I tried to actually implement a catalog for XML schemas. It was a bit challenging personally.

I particularly had an issue with SYSCACHE so I can search the xmlschema entries. I mostly tried to follow examples around the codebase (specially when it comes to security, which I mostly mirrored from sequences), so if I am misusing something, that's likely the problem, so please point that out. Some highlights of the changes are:

What's implemented so far:
  - New pg_xmlschema catalog (OIDs 6434 to 6438) for storing XML schema definitions
  - The validation function itself: XMLVALIDATE(DOCUMENT xml_expr ACCORDING TO XMLSCHEMA xml_schema_name)
  - DDL support: CREATE XMLSCHEMA, DROP XMLSCHEMA, ALTER XMLSCHEMA (with IF NOT EXISTS, RENAME, SET SCHEMA, OWNER TO)
  - Two syscaches (XMLSCHEMAOID and XMLSCHEMANAMENSP) for lookups during parsing and execution
  - Dependency tracking so schemas can't be dropped while views/rules reference them (I think I even added a test for it, but it's worth checking more closely)
  - Integration with PostgreSQL's object system (event triggers, security labels, permissions) -- I was mostly following what was in place already and adding what seemed necessary
  - Grants and role permissions

The flow is the following: schemas are registered via CREATE XMLSCHEMA, then at parse time the schema name gets resolved to an OID (with dependencies recorded), and finally at execution time the schema is fetched by OID and used for validation. Please note that I only implemented DOCUMENT mode for now, as it seemed like the most common use case and kept things simpler. CONTENT mode by the standard seems to require another keyword to make things useful (ELEMENT), which is a bit problematic since it's a fairly used word to conflict on any database (it broke on the current tests when I ran already, so...). With the ELEMENT we could do things like: XMLVALIDATE(CONTENT '<author>C. J. Date</author>' ACCORDING TO XMLSCHEMA book ELEMENT author), which is fairly powerful if you want to validate just parts of the xml document, but again, it's scrapped from this version because I wanted to pull my hair out dealing with it.

Following the SQL/XML standard, XMLVALIDATE returns the validated XML, not a boolean anymore, and raises ERROR on validation failure. So now we can do things like INSERT INTO r VALUES (XMLVALIDATE(...)) and other cool compositions.

The documentation is still tracking the old xmlvalidate behavior and nothing about the catalog changes.I can fix all that before submission if the overall approach looks reasonable.

Permissions are a bit trickier, I thought of just relying on USAGE, since the question for an XML validation is "Can I use this XML schema in a call?". I think FDW works like that currently. I tried adding a bunch of examples on xml.sql.

One change I specifically wanted feedback on are the changes on parse_expr.c. Since I need to resolve the name of the schema, I thought doing so at parsing time was the correct approach. In there I extract schema name before the ResTarget loop runs, resolve it to OID using get_xmlschema_oid(), save the string form in newx->name for deparsing (so views can be displayed correctly) and finally clear x->named_args = NIL so the ResTarget loop below has nothing to process. This approach reuses the existing named_args field as temporary storage during parsing, then converts it properly during transformation. It felt strange, but it avoids adding another field to the XmlExpr structure just for XMLVALIDATE.

Also, I made the variables that get modified inside of the PG_TRY block volatile as it seems to be a pattern likely to force the compiler to do memory access instead of relying on registers during jumps.

I'm attaching the patch for another round of review (please excuse the lack of the linter, I tried running but had some issues). Any feedback would be really appreciated, this was quite an adventure.

PS: I noticed now that I haven't copied the changes over to xml1.out nor xml2.out. But This is just for the feature validation round, so I can fix that later.

Thanks for taking a look!
Attachment

pgsql-hackers by date:

Previous
From: "zengman"
Date:
Subject: [PATCH] Fix minor issues in astreamer_zstd.c
Next
From: Paul A Jungwirth
Date:
Subject: Re: SQL:2011 Application Time Update & Delete