Re: Custom Scan APIs (Re: Custom Plan node) - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: Custom Scan APIs (Re: Custom Plan node)
Msg-id 9A28C8860F777E439AA12E8AEA7694F8F7FBEC@BPXM15GP.gisp.nec.co.jp
In response to Re: Custom Scan APIs (Re: Custom Plan node)  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Custom Scan APIs (Re: Custom Plan node)  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
> > > If you're looking to just use GPU acceleration for improving
> > > individual queries, I would think that Robert's work around backend
> > > workers would be a more appropriate way to go, with the ability to
> > > move a working set of data from shared buffers and on-disk
> > > representation of a relation over to the GPU's memory, perform the
> operation, and then copy the results back.
> > >
> > The approach is similar to the Robert's work except for GPU adoption,
> > instead of multicore CPUs. So, I tried to review his work to apply the
> > facilities on my extension also.
>
> Good, I'd be very curious to hear how that might solve the issue for you,
> instead of using the CustomScan approach..
>
I plan to use custom-scan, of course. Once a relation is referenced
and the optimizer decides GPU acceleration is cheaper, the associated
custom-scan node reads the data from the underlying relation (or from
an in-memory cache, if one exists), then moves it into a shared memory
buffer for delivery to the GPU-management background worker, which
launches asynchronous DMA transfers one by one. After that, the
custom-scan node receives the filtered records via the shared-memory
buffer, so it can construct the tuples to be returned to the upper
node.

> > > "regular" PG tables, just to point out one issue, can be locked on a
> > > row-by-row basis, and we know exactly where in shared buffers to go
> > > hunt down the rows.  How is that going to work here, if this is both
> a "regular"
> > > table and stored off in a GPU's memory across subsequent queries or
> > > even transactions?
> > >
> > It shall be handled "case-by-case" basis, I think. If row-level lock
> > is required over the table scan, custom-scan node shall return a tuple
> > being located on the shared buffer, instead of the cached tuples. Of
> > course, it is an option for custom-scan node to calculate qualifiers
> > by GPU with cached data and returns tuples identified by ctid of the cached
> tuples.
> > Anyway, it is not a significant problem.
>
> I think you're being a bit too hand-wavey here, but if we're talking about
> pre-scanning the data using PG before sending it to the GPU and then only
> performing a single statement on the GPU, we should be able to deal with
> it.
That is exactly what I want to implement.

> I'm worried about your ideas to try and cache things on the GPU though,
> if you're not prepared to deal with locks happening in shared memory on
> the rows you've got cached out on the GPU, or hint bits, or the visibility
> map being updated, etc...
>
No state or information is retained on the GPU side. Everything
related to PG internals remains the CPU's job.

> > OK, I'll move the portion that will be needed commonly for other FDWs
> > into the backend code.
>
> Alright- but realize that there may be objections there on the basis that
> the code/structures which you're exposing aren't, and will not be, stable.
> I'll have to go back and look at them myself, certainly, and their history.
>
I see, but that is part of the process of getting the code merged.

> > Yes. According to the previous discussion around postgres_fdw getting
> > merged, all we can trust on the remote side are built-in data types,
> > functions, operators or other stuffs only.
>
> Well, we're going to need to expand that a bit for aggregates, I'm afraid,
> but we should be able to define the API for those aggregates very tightly
> based on what PG does today and require that any FDW purporting to provides
> those aggregates do it the way PG does.  Note that this doesn't solve all
> the problems- we've got other issues with regard to pushing aggregates down
> into FDWs that need to be solved.
>
I see. It probably needs more detailed investigation.

> > The custom-scan node is intended to perform on regular relations, not
> > only foreign tables. It means a special feature (like GPU
> > acceleration) can perform transparently for most of existing
> > applications. Usually, it defines regular tables for their work on
> > installation, not foreign tables. It is the biggest concern for me.
>
> The line between a foreign table and a local one is becoming blurred already,
> but still, if this is the goal then I really think the background worker
> is where you should be focused, not on this Custom Scan API.  Consider that,
> once we've got proper background workers, we're going to need new nodes
> which operate in parallel (or some other rejiggering of the nodes- I don't
> pretend to know exactly what Robert is thinking here, and I've apparently
> forgotten it if he's posted it
> somewhere) and those interfaces may drive changes which would impact the
> Custom Scan API- or worse, make us deprecate or regret having added it
> because now we'll need to break backwards compatibility to add in the
> parallel node capability to satisfy the more general non-GPU case.
>
The custom-scan API is a thin abstraction over the plan node interface,
not tightly coupled to any particular use case such as GPU acceleration
or remote join. So, I'm quite optimistic about its future
maintainability.
Also, please recall the discussion at the last developer meeting.
The purpose of custom-scan (we hadn't named it at that time) is to
avoid unnecessary project forks by people who want to implement their
own special features but have no supported facilities for extending
the optimizer/executor.
Even once we have an in-core parallel execution feature on the CPU, it
still makes sense to allow unique implementations that may be better
suited to a specific domain.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>


