Re: Does people favor to have matrix data type? - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: Does people favor to have matrix data type?
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F8011F6486@BPXM15GP.gisp.nec.co.jp
Whole thread Raw
In response to Re: Does people favor to have matrix data type?  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: Does people favor to have matrix data type?  (Joe Conway <mail@joeconway.com>)
List pgsql-hackers
> On 5/25/16 7:46 AM, Kouhei Kaigai wrote:
> > My only concern is that domain type is not allowed to define type cast.
> > If we could add type cast on domain, we can define type transformation from
> > other array type to matrix.
> 
> I've actually wished for that in the past, as well as casting to
> compound types. Having those would make it easier to mock up a real data
> type for experimentation.
> 
> I strongly encourage you to talk to the MADlib community about
> first-class matrix support. They currently emulate matrices via arrays.
>
A MADLib folks contacted me in the past; when I initially made PG-Strom
several years before, however, no communication with them after that.

https://madlib.incubator.apache.org/docs/v1.8/group__grp__arraysmatrix.html
According to the documentation, their approach is different from what
I'd like to implement. They use a table as if a matrix. Because of its
data format, we have to adjust position of the data element to fit
requirement by high performance matrix libraries (like cuBLAS).

> I don't know offhand if they support NULLs in their regular matrices.
> They also support a sparsematrix "type" that is actually implemented as
> a table that contains coordinates and a value for each value in the
> matrix. Having that as a type might also be interesting (if you're
> sparse enough, that will be cheaper than the current NULL bitmap
> implementation).
>
Sparse matrix! It is a disadvantaged area for the current array format.

I have two ideas. HPC folks often split a large matrix into multiple
grid. A grid is typically up to 1024x1024 matrix, for example.
If a grid is consists of all zero elements, it is obvious we don't need
to have individual elements on the grid.
One other idea is compression. If most of matrix is zero, it is an ideal
data for compression, and it is easy to reconstruct only when calculation.

> Related to this, Tom has mentioned in the past that perhaps we should
> support abstract use of the [] construct. Currently point finds a way to
> make use of [], but I think that's actually coded into the grammar.
>
Yep, if we consider 2D-array is matrix, no special enhancement is needed
to use []. However, I'm inclined to have own data structure for matrix
to present the sparse matrix.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

pgsql-hackers by date:

Previous
From: Tatsuo Ishii
Date:
Subject: Statement timeout
Next
From: Tom Lane
Date:
Subject: Re: Parallel pg_dump's error reporting doesn't work worth squat