Re: Does people favor to have matrix data type? - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: Does people favor to have matrix data type?
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F8011F7E14@BPXM15GP.gisp.nec.co.jp
Whole thread Raw
In response to Re: Does people favor to have matrix data type?  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
Responses Re: Does people favor to have matrix data type?  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
> -----Original Message-----
> From: Gavin Flower [mailto:GavinFlower@archidevsys.co.nz]
> Sent: Tuesday, May 31, 2016 9:47 AM
> To: Kaigai Kouhei(海外 浩平); Joe Conway; Jim Nasby; Ants Aasma; Simon Riggs
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Does people favor to have matrix data type?
> 
> On 31/05/16 12:01, Kouhei Kaigai wrote:
> >> On 05/29/2016 04:55 PM, Kouhei Kaigai wrote:
> >>> For the closer integration, it may be valuable if PL/R and PL/CUDA can exchange
> >>> the data structure with no serialization/de-serialization when PL/R code tries
> >>> to call SQL functions.
> >> I had been thinking about something similar. Maybe PL/R can create an
> >> extension within the R environment that wraps PL/CUDA directly or at the
> >> least provides a way to use a fast-path call. We should probably try to
> >> start out with one common use case to see how it might work and how much
> >> benefit there might be.
> >>
> > My thought is the second option above. If SPI interface supports fast-path
> > like 'F' protocol, it may become a natural way for other PLs also to
> > integrate SQL functions in other languages.
> >
> >>> IIUC, pg.spi.exec("SELECT my_function(...)") is the only way to call SQL functions
> >> inside PL/R scripts.
> >>
> >> Correct (currently).
> >>
> >> BTW, this is starting to drift off topic I think -- perhaps we should
> >> continue off list?
> >>
> > Some elements are common for PostgreSQL (matrix data type and fastpath SPI
> > interface). I like to keep the discussion on the list.
> > Regarding to the PoC on a particular use case, it might be an off-list
> > discussion.
> >
> > Thanks,
> > --
> > NEC Business Creation Division / PG-Strom Project
> > KaiGai Kohei <kaigai@ak.jp.nec.com>
> >
> Possibly there should be two matrix types in Postgres: the first would
> be the default and optimized for small dense matrices, the second would
> store large sparse matrices efficiently in memory at the expensive of
> speed (possibly with one or more parameters relating to how sparse it is
> likely to be?) - for appropriate definitions 'small' & 'large', though
> memory savings for the latter type might not kick in unless the matrices
> are big enough (so a small sparse matrix might consume more memory than
> a nominally larger dense matrix type & a sparse matrix might have to be
> sufficiently sparse to make real memory savings at any size).
>
One idea in my mind is that a sparse matrix is represented as a grid
of a smaller matrixes, and omit all-zero area. It looks like indirect
pointer reference. The header of matrix type has offset values to
each grid. If offset == 0, it means this grid contains all-zero.

Due to performance reason, location of each element must be deterministic
without walking on the data structure. This approach guarantees we can
reach individual element with 2 steps.

A flat matrix can be represented as a special case of the sparse matrix.
If entire matrix is consists of 1x1 grid, it is a flat matrix.
We may not need to define two individual data types.

> Probably good to think of 2 types at the start, even if the only the
> first is implemented initially.
>
I agree.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: IPv6 link-local addresses and init data type
Next
From: Amit Langote
Date:
Subject: Re: foreign table batch inserts