Appendix D. Special Data Types in pgpro_axe

pgpro_axe introduces a number of special Postgres Pro data types. You must not create these data types explicitly, but they can be displayed in error messages from Postgres Pro.

duckdb.row #

The duckdb.row data type is returned by certain functions, such as read_parquet, read_csv, and scan_iceberg. Depending on the arguments, these functions can return rows with different columns and data types. Currently, Postgres Pro does not fully support such functions, so a custom type is returned. To get columns out of these rows, use the square bracket indexing syntax, similarly to how you get the field:

Example D.1. 

SELECT r['id'], r['name'] FROM read_parquet('file.parquet') r WHERE r['age'] > 21;

Using SELECT * results in the columns of this row being expanded, so your query result never has a column that has duckdb.row as its data type:

Example D.2. 

SELECT * FROM read_parquet('file.parquet');

There are certain limitations associated with using a function that returns duckdb.row in a CTE or subquery. The main problem is that pgpro_axe cannot automatically assign aliases to the selected columns from the row.

While this query without a CTE or subquery returns the r[company] column as company:

Example D.3. 

SELECT r['company']
FROM duckdb.query($$ SELECT 'DuckDB Labs' company $$) r;
--    company
-- ─────────────
--  DuckDB Labs

The same query in a CTE or subquery returns the column as r:

Example D.4. 

WITH mycte AS (
SELECT r['company']
FROM duckdb.query($$ SELECT 'DuckDB Labs' company $$) r
)
SELECT * FROM mycte;
--       r
-- ─────────────
--  DuckDB Labs

This is easy to work around by adding an explicit alias to the column in the CTE or subquery:

Example D.5. 

WITH mycte AS (
SELECT r['company'] AS company
FROM duckdb.query($$ SELECT 'DuckDB Labs' company $$) r
)
SELECT * FROM mycte;
--    company
-- ─────────────
--  DuckDB Labs

Another limitation is that if when using SELECT * inside the CTE or subquery, you want to reference a specific column outside the CTE or subquery, you still need to use the r['column_name'] syntax instead of column_name.

While this query works as expected:

Example D.6. 

WITH mycte AS (
SELECT *
FROM duckdb.query($$ SELECT 'DuckDB Labs' company $$) r
)
SELECT * FROM mycte;
--    company
-- ─────────────
--  DuckDB Labs

The following query outputs an error:

Example D.7. 

WITH mycte AS (
SELECT *
FROM duckdb.query($$ SELECT 'DuckDB Labs' company $$) r
)
SELECT * FROM mycte WHERE company = 'DuckDB Labs';
-- ERROR:  42703: column "company" does not exist
-- LINE 5: SELECT * FROM mycte WHERE company = 'DuckDB Labs';
--                                   ^
-- HINT:  If you use DuckDB functions like read_parquet, you need to use the r['colname'] syntax to use columns. If you are already doing that, maybe you forgot to to give the function the r alias.

This is easy to work around by using the following r['column_name'] syntax:

Example D.8. 

> WITH mycte AS (
SELECT *
FROM duckdb.query($$ SELECT 'DuckDB Labs' company $$) r
)
SELECT * FROM mycte WHERE r['company'] = 'DuckDB Labs';
--    company
-- ─────────────
--  DuckDB Labs

duckdb.unresolved_type #

The duckdb.unresolved_type data type is used to make Postgres Pro process an expression for which the data type is not known at query parse time. This is the data type of any of the columns extracted from a duckdb.row using the r['column_name'] syntax. Many operators and aggregates return duckdb.unresolved_type when one of the sides of the operator has the duckdb.unresolved_type data type, for example, r['age'] + 10.

The actual data type is specified once DuckDB executes a query. The query result never contains a column that has duckdb.unresolved_type as its data type.

You can receive errors indicating that functions or operators do not exist for the duckdb.unresolved_type data type:

Example D.9. 

ERROR:  function some_func(duckdb.unresolved_type) does not exist
LINE 6:  some_func(r['somecol']) as somecol

In such cases, a simple workaround is often to add an explicit cast to the data type that the function accepts, such as some_func(r['somecol']::text) as somecol.

duckdb.json #

The duckdb.json data type is used for arguments to DuckDB JSON functions. This type exists so that these functions can take json, jsonb, and duckdb.unresolved_type values.