Thread: meson vs. llvm bitcode files

meson vs. llvm bitcode files

From
Peter Eisentraut
Date:
The meson build currently does not produce llvm bitcode (.bc) files. 
AFAIK, this is the last major regression for using meson for production 
builds.

Is anyone working on that?  I vaguely recall that some in-progress code 
was shared a couple of years ago, but I haven't seen anything since.  It 
would be great if we could collect any existing code and notes to maybe 
get this moving again.



Re: meson vs. llvm bitcode files

From
Nazir Bilal Yavuz
Date:
Hi,

On Thu, 5 Sept 2024 at 11:56, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> The meson build currently does not produce llvm bitcode (.bc) files.
> AFAIK, this is the last major regression for using meson for production
> builds.
>
> Is anyone working on that?  I vaguely recall that some in-progress code
> was shared a couple of years ago, but I haven't seen anything since.  It
> would be great if we could collect any existing code and notes to maybe
> get this moving again.

I found that Andres shared a patch
(v17-0021-meson-Add-LLVM-bitcode-emission.patch) a while ago [1].

[1] https://www.postgresql.org/message-id/20220927011951.j3h4o7n6bhf7dwau%40awork3.anarazel.de

-- 
Regards,
Nazir Bilal Yavuz
Microsoft



Re: meson vs. llvm bitcode files

From
Nazir Bilal Yavuz
Date:
Hi,

On Thu, 5 Sept 2024 at 12:24, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
>
> I found that Andres shared a patch
> (v17-0021-meson-Add-LLVM-bitcode-emission.patch) a while ago [1].

Andres and I continued to work on that. I think the patches are in a
shareable state now, and I wanted to hear opinions before proceeding
further. After applying the patches, bitcode files should be installed
into the $pkglibdir/bitcode/ directory if LLVM is found.

There are 6 patches attached:

v1-0001-meson-Add-generated-header-stamps:

This patch is trivial. Instead of having targets depend directly on
the generated headers, have them depend on a stamp file. The benefit
of using a stamp file is that it makes build.ninja smaller and meson
setup faster.
----------

v1-0002-meson-Add-postgresql-extension.pc-for-building-extension-libraries:

This patch generates the postgresql-extension.pc file, which can be
used for building extension libraries.

Normally, there is no need to use this .pc file for generating bitcode
files. However, since there is no clear way to get all include paths
for building bitcode files, this .pc file is later used for this
purpose (by running pkg-config --cflags-only-I
postgresql-extension-uninstalled.pc) [1].
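
As a rough illustration of how the output of such a pkg-config call can be turned into a list of include directories, here is a minimal Python sketch. The function name and the sample paths are hypothetical and not taken from the patches; the only assumption is that the output consists of shell-quoted -I flags.

```python
import shlex


def include_dirs_from_cflags(cflags_output: str) -> list[str]:
    """Extract include directories from `pkg-config --cflags-only-I` output.

    Handles both the joined form (-I/path) and the split form (-I /path).
    """
    dirs = []
    tokens = shlex.split(cflags_output)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-I" and i + 1 < len(tokens):
            # Split form: the directory is the next token.
            dirs.append(tokens[i + 1])
            i += 2
            continue
        if tok.startswith("-I") and len(tok) > 2:
            # Joined form: strip the -I prefix.
            dirs.append(tok[2:])
        i += 1
    return dirs


# Hypothetical sample output; real paths depend on the build tree.
sample = "-I/build/src/include -I/src/postgres/src/include -I'/opt/with space/include'"
print(include_dirs_from_cflags(sample))
```

Using shlex here keeps quoted paths with spaces intact, which a plain str.split() would break apart.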
----------

v1-0003-meson-Test-building-extensions-by-using-postgresql-extension.pc:
[Not needed for generating bitcode files]

This is a patch for testing if extensions can be built by using
postgresql-extension.pc. I added that commit as an example of using
postgresql-extension.pc to build extensions.
----------

v1-0004-meson-WIP-Add-docs-for-postgresql-extension.pc: [Not needed
for generating bitcode files]

I added this patch in case we want to recommend that people use
postgresql-extension.pc to build extension libraries. I am not sure if
we want to do that because there are still TODOs for
postgresql-extension.pc, like running test suites. I just wanted to
show my plan: dividing 'Extension Building Infrastructure' into two
parts, 'PGXS' and 'postgresql-extension.pc'.
----------

v1-0005-meson-Add-LLVM-bitcode-emission:

This patch adds the infrastructure required to generate bitcode files
and uses postgresql-extension-uninstalled.pc to get the include paths
for generating them [1].
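
For readers unfamiliar with bitcode emission, the Python sketch below shows roughly what a per-file compile command looks like. It is not the command used by the patch: the helper name, the -O2 flag, and the paths are assumptions for illustration. Only clang's -emit-llvm and -c options are standard.

```python
from pathlib import PurePosixPath


def bitcode_command(src: str, out_dir: str, include_dirs: list[str],
                    clang: str = "clang") -> list[str]:
    """Build an argv list that compiles one C file to LLVM bitcode (.bc)."""
    # Place the output next to other bitcode files, with a .bc suffix.
    out = PurePosixPath(out_dir) / PurePosixPath(src).with_suffix(".bc").name
    cmd = [clang, "-O2", "-emit-llvm", "-c", src, "-o", str(out)]
    # Include paths, e.g. those reported by pkg-config --cflags-only-I.
    cmd += [f"-I{d}" for d in include_dirs]
    return cmd


# Hypothetical paths, for illustration only.
print(bitcode_command("contrib/intarray/_int_op.c", "bitcode/intarray",
                      ["/build/src/include", "/src/include"]))
```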
----------

v1-0006-meson-Generate-bitcode-files-of-contrib-extension.patch:

This patch generates bitcode files for manually selected contrib
libraries. The selection depends on:
- Whether they have SQL-callable functions
- Whether the library functions are short enough (for long-running
functions the performance gain from bitcode files is minimal compared
to the function's run time, so such libraries are omitted).

Any kind of feedback would be appreciated.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment

Re: meson vs. llvm bitcode files

From
Diego Fronza
Date:
Hello,

I did a full review of the provided patches plus some tests. I was able to validate that loading of bitcode modules works and that JIT works for both backend and contrib modules.

To test JIT on contrib modules, I lowered the costs for all JIT settings and used the intarray extension with data/test__int.data:
CREATE EXTENSION intarray;
CREATE TABLE test__int( a int[] );
\copy test__int from 'data/test__int.data'

Any of the queries from line 98 onward in contrib/intarray/sql/_int.sql will work.
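
A sketch of what lowering the JIT costs can look like, written as a small Python helper so the statements can be fed to psql. The setting names (jit, jit_above_cost, jit_inline_above_cost, jit_optimize_above_cost) are standard PostgreSQL settings; the value 0 and the helper names are assumptions, not necessarily what was used in the test described here. The second function reads the "Options:" line that EXPLAIN (ANALYZE) prints in its JIT section.

```python
import re

# Session settings that make JIT, inlining, and optimization kick in for any query.
JIT_SETTINGS = {
    "jit": "on",
    "jit_above_cost": "0",
    "jit_inline_above_cost": "0",
    "jit_optimize_above_cost": "0",
}


def jit_set_statements(settings: dict[str, str] = JIT_SETTINGS) -> list[str]:
    """Render SET statements for the given JIT-related settings."""
    return [f"SET {name} = {value};" for name, value in settings.items()]


def jit_options(explain_text: str) -> dict[str, bool]:
    """Parse the 'Options:' line of the JIT section in EXPLAIN (ANALYZE) output."""
    match = re.search(r"^\s*Options:\s*(.+)$", explain_text, re.MULTILINE)
    if match is None:
        return {}
    options = {}
    for item in match.group(1).split(","):
        name, value = item.split()
        options[name] = value == "true"
    return options


print("\n".join(jit_set_statements()))
```

If the parsed options report Inlining as true for a query that calls intarray functions, the inliner was at least enabled for that query; whether a particular function was actually inlined still requires debug output.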

Then I added extra debug messages to llvmjit_inline.cpp in the add_module_to_inline_search_path() function and in llvm_build_inline_plan(). I was able to see many functions in this module being successfully inlined.

I'm attaching a new patch based on your original work which adds further support for generating bitcode from:
 - Generated backend sources: processed by flex, bison, etc.
 - Generated contrib module sources, 

In this patch, I included only fmgrtab.c and src/backend/parser for the backend generated code.
For contrib generated sources, I added contrib/cube as an example.

All relevant details about the changes are included in the patch itself.

As you may know already, I also created a PR focused on LLVM bitcode emission on meson. It generates bitcode for all backend and contrib modules and is currently under review by some colleagues at Percona: https://github.com/percona/postgres/pull/103
I'm curious if we should get all or some of the generated backend sources compiled to bitcode, similar to contrib modules.
Please let me know your thoughts and how we can proceed to get this feature included. Thank you.

Regards,
Diego Fronza
Percona

On Fri, Mar 7, 2025 at 7:52 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
[...]
Attachment

Re: meson vs. llvm bitcode files

From
Nazir Bilal Yavuz
Date:
Hi,

On Tue, 11 Mar 2025 at 01:04, Diego Fronza <diego.fronza@percona.com> wrote:
> I did a full review of the provided patches plus some tests. I was able to validate that loading of bitcode modules works and that JIT works for both backend and contrib modules.

Thank you!

> To test JIT on contrib modules, I lowered the costs for all JIT settings and used the intarray extension with data/test__int.data:
> CREATE EXTENSION intarray;
> CREATE TABLE test__int( a int[] );
> \copy test__int from 'data/test__int.data'
>
> Any of the queries from line 98 onward in contrib/intarray/sql/_int.sql will work.
>
> Then I added extra debug messages to llvmjit_inline.cpp in the add_module_to_inline_search_path() function and in llvm_build_inline_plan(). I was able to see many functions in this module being successfully inlined.
>
> I'm attaching a new patch based on your original work which adds further support for generating bitcode from:

Thanks for doing that!

>  - Generated backend sources: processed by flex, bison, etc.
>  - Generated contrib module sources,

I think we do not need to separate these two.

   foreach srcfile : bitcode_module['srcfiles']
-    if meson.version().version_compare('>=0.59')
+    srcfilename = '@0@'.format(srcfile)
+    if srcfilename.startswith('<CustomTarget')
+      srcfilename = srcfile.full_path().split(meson.build_root() + '/')[1]
+    elif meson.version().version_compare('>=0.59')

Also, checking if the string starts with '<CustomTarget' is a bit
hacky, and 'srcfilename = '@0@'.format(srcfile)' causes a deprecation
warning. So, instead of this, we can process all generated sources the
same way generated backend sources are processed. I updated the patch
with that.

> In this patch, I included only fmgrtab.c and src/backend/parser for the backend generated code.
> For contrib generated sources, I added contrib/cube as an example.

I applied your contrib/cube example and did the same thing for contrib/seg.

> All relevant details about the changes are included in the patch itself.
>
> As you may know already, I also created a PR focused on LLVM bitcode emission on meson. It generates bitcode for all backend and contrib modules and is currently under review by some colleagues at Percona: https://github.com/percona/postgres/pull/103
> I'm curious if we should get all or some of the generated backend sources compiled to bitcode, similar to contrib modules.

I think we can do this. I added other backend sources like you did in
the PR but attached it as another patch (0007) because I wanted to
hear other people's opinions on that first.

v3 is attached.

--
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment

Re: meson vs. llvm bitcode files

From
Diego Fronza
Date:
Hi,

The v7 patch looks good to me: it handles the bitcode modules in a uniform way and avoids the hacky code and warnings. Much better now.

A small note about the bitcode emission for generated sources in contrib. Using cube as an example, it currently creates two dict entries in a list:
bc_seg_gen_sources = [{'srcfiles': [seg_scan]}]
bc_seg_gen_sources += {'srcfiles': [seg_parse[0]]}

Then it is passed to bitcode_modules:
bitcode_modules += {
  ...
  'gen_srcfiles': bc_seg_gen_sources,
}

It could be passed as a list with a single dict, since both generated sources share the same compilation flags:
bitcode_modules += {
  ...
  'gen_srcfiles': [
    {  'srcfiles': [cube_scan, cube_parse[0]] },
  ]
}

Both approaches work; the first one has the advantage of allowing separate additional_flags per generated source.
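
The difference between the two layouts can be modeled in a few lines of Python, with Python dicts standing in for Meson dicts. The flatten helper, the file names, and the sample flag are hypothetical; only the 'srcfiles' and 'additional_flags' keys come from the discussion here.

```python
def flatten_gen_sources(gen_srcfiles: list[dict]) -> list[tuple[str, list[str]]]:
    """Expand a list of {'srcfiles': [...], 'additional_flags': [...]} dicts
    into (source, flags) pairs, one per generated source."""
    pairs = []
    for entry in gen_srcfiles:
        flags = entry.get("additional_flags", [])
        for src in entry["srcfiles"]:
            pairs.append((src, list(flags)))
    return pairs


# Layout 1: one dict per generated source, so flags can differ per source.
per_source = [
    {"srcfiles": ["cubescan.c"], "additional_flags": ["-Wno-unused"]},  # arbitrary example flag
    {"srcfiles": ["cubeparse.c"]},
]
# Layout 2: a single dict, so all sources share the same flags.
shared = [{"srcfiles": ["cubescan.c", "cubeparse.c"]}]

print(flatten_gen_sources(per_source))
print(flatten_gen_sources(shared))
```

Both layouts yield the same set of sources; only the first can attach a flag to one source without affecting the other.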

Thanks for your reply, Nazir. I am also waiting for more opinions on this.

Regards,
Diego

On Wed, Mar 12, 2025 at 7:27 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
[...]

Re: meson vs. llvm bitcode files

From
Nazir Bilal Yavuz
Date:
Hi,

On Wed, 12 Mar 2025 at 16:39, Diego Fronza <diego.fronza@percona.com> wrote:
>
> Hi,
>
> The v7 patch looks good to me: it handles the bitcode modules in a uniform way and avoids the hacky code and warnings. Much better now.
>
> A small note about the bitcode emission for generated sources in contrib. Using cube as an example, it currently creates two dict entries in a list:
> bc_seg_gen_sources = [{'srcfiles': [seg_scan]}]
> bc_seg_gen_sources += {'srcfiles': [seg_parse[0]]}
>
> Then it is passed to bitcode_modules:
> bitcode_modules += {
>   ...
>   'gen_srcfiles': bc_seg_gen_sources,
> }
>
> It could be passed as a list with a single dict, since both generated sources share the same compilation flags:
> bitcode_modules += {
>   ...
>   'gen_srcfiles': [
>     {  'srcfiles': [cube_scan, cube_parse[0]] },
>   ]
> }
>
> Both approaches work; the first one has the advantage of allowing separate additional_flags per generated source.

I liked the current approach as it makes bitcode_modules easier to
understand but both approaches work for me as well.

One thing I noticed is that gen_srcfiles['srcfiles'] seems wrong.
gen_sources is a better name compared to gen_srcfiles. So, I changed
it to gen_sources in v4.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment

Re: meson vs. llvm bitcode files

From
Nazir Bilal Yavuz
Date:
Hi,

On Thu, 13 Mar 2025 at 13:11, Nazir Bilal Yavuz <byavuz81@gmail.com> wrote:
> One thing I noticed is that gen_srcfiles['srcfiles'] seems wrong.
> gen_sources is a better name compared to gen_srcfiles. So, I changed
> it to gen_sources in v4.

A rebase was needed due to b1720fe63f; v5 is attached.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft

Attachment