automatic differentiation & gpu codegen for pure numpy

From:
James Bergstra
Date:
2012-06-05 @ 15:46
Hello list,

I recently sent the following as an RFC PR to numba, and Travis
suggested that this was the right place to discuss it

https://github.com/ContinuumIO/numba/pull/6

After a few more days of thinking about things, I'm more convinced that
this technique is a great way to do what Theano was trying to do, and
so I'm wondering who else is pushing this project forward and where
it's going?

- James

The RFC text was:

This RFC pull request contains a small working example of how to apply
Theano to numpy code (see tests/test_ad.py); it has no interaction
with the LLVM code (Theano builds a higher-level AST for now). The
goals of a Theano backend and an LLVM backend are related: both aim
to speed up numpy code, and both rely on emulating the VM to do JIT
specialization.

I was never happy with Theano's interface. I really wanted to
non-invasively spy on what numpy was doing (what it was allocating,
what expressions produced what results, etc.), but without actually
hijacking the interpreter itself I couldn't get that information.
Consequently Theano has the feel of an additional "framework" that a
programmer has to think about and buy into. That's a shame, because
Numpy + Python is a perfectly expressive combination -- we just wanted
to use Theano's optimizations and code generation to speed it up (and
use the GPU). Numba's bytecode-parsing trick blew my mind when I looked
at it: how easy it was to implement the CPython VM in Python. With that
technique I think it's finally possible to give Theano the right API,
so that its services (autodiff, GPU, codegen, etc.) can be expressed
in terms of raw numpy code.
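
As a rough illustration of the trick (a sketch using the standard dis
module, not numba's implementation): walking a function's bytecode is
enough to see which globals it touches and which calls it would make,
without ever running it.

    import dis
    import numpy as np

    def spy(fn):
        # Walk fn's bytecode without executing it. Opcode names vary a
        # little across CPython versions, so classify only loosely.
        for ins in dis.get_instructions(fn):
            if ins.opname in ("LOAD_GLOBAL", "LOAD_ATTR", "LOAD_METHOD"):
                print("touches name:", ins.argval)
            elif ins.opname.startswith("CALL"):
                print("would make a call at offset", ins.offset)

    def f(x):
        return np.tanh(np.dot(x, x) + 1.0)

    spy(f)   # reports np, dot, tanh, and the call sites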

Do you think there's room in one project for all of these features?
The trouble with VM-hacking is that it would be really nice if there
were just one level of VM emulation going on at a time, with the
various libraries that use the technique registering themselves as
listeners on a single bytecode parser.
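
One hypothetical shape for that coordination (all names invented here):
parse the bytecode once and fan each instruction out to registered
listeners, so an autodiff library and a code generator could share the
same pass.

    import dis

    class BytecodeHub:
        """One parser, many consumers: a sketch, not an existing API."""
        def __init__(self):
            self.listeners = []

        def register(self, listener):
            self.listeners.append(listener)

        def run(self, fn):
            for ins in dis.get_instructions(fn):
                for listener in self.listeners:
                    listener(ins)

    hub = BytecodeHub()
    hub.register(lambda ins: None)   # e.g. an autodiff tape builder
    hub.register(lambda ins: None)   # e.g. a GPU codegen pass
    hub.run(lambda x: x + 1)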

Another thing to coordinate on is how numba plans to deal with array
expressions. Why does numba focus on scalar code? It seems like if
numba were to encounter vectorized code, it would start doing pretty
much what Theano currently does (at least in a high-level sense), and
possibly have a huge overlap with what @markflorisson88 is working on
this summer for Cython (and possibly with what a previous GSoC student
implemented for SymPy -- the Fortran code-generation stuff). I think
it would be great if Theano acted on LLVM-IR instead of its own
internal AST data structure, and now that libndarray has a C
interface, I've been hoping that somehow it would be possible to make
LLVM-IR look not too different from a Theano AST. If that were the
case then numba, Theano, and Cython would all be tackling nearly the
same problem with nearly the same tools, for very similar
applications. These issues have come up in conversation with
@markflorisson88, @npinto, @dwf, @dagss, so I'm bringing them in here
too.

Re: [numba] automatic differentiation & gpu codegen for pure numpy

From:
federico vaggi
Date:
2012-06-07 @ 08:01
Hello,

sorry to jump into the conversation, but when I saw the discussion of using
numba to do automatic differentiation, I immediately thought of the Python
uncertainties package:

http://packages.python.org/uncertainties/

It handles error propagation using automatic differentiation.
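
To make the connection concrete, here is a tiny forward-mode AD sketch
(toy code, not the uncertainties package's API) showing how first-order
error propagation falls out of a derivative: sigma_f ~= |f'(x)| * sigma_x.

    import math

    class Dual:
        """A value plus its derivative; arithmetic propagates both."""
        def __init__(self, val, eps=0.0):
            self.val, self.eps = val, eps

        def __add__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val + other.val, self.eps + other.eps)
        __radd__ = __add__

        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val * other.val,
                        self.val * other.eps + self.eps * other.val)
        __rmul__ = __mul__

    def sin(d):
        return Dual(math.sin(d.val), math.cos(d.val) * d.eps)

    x = Dual(2.0, 1.0)                   # x = 2.0, seed dx/dx = 1
    f = x * x + sin(x)                   # f.eps holds f'(x) = 2x + cos(x)
    sigma_x = 0.1
    print(f.val, abs(f.eps) * sigma_x)   # value and approximate sigma_f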

I am forwarding this conversation to the author of the module, who has
been exceedingly helpful in the past and has expressed some interest in
extending the uncertainties package later this year, when he has some
time.

This is potentially a really cool application, and if there is some
overlap between the two projects, it seems redundant to duplicate work.

Federico

On Tue, Jun 5, 2012 at 5:46 PM, James Bergstra <james.bergstra@gmail.com> wrote:

> Hello list,
>
> [James's original message quoted in full; trimmed]

Re: [numba] automatic differentiation & gpu codegen for pure numpy

From:
Jon Riehl
Date:
2012-06-05 @ 23:09
Hi James,

On Tue, Jun 5, 2012 at 10:46 AM, James Bergstra
<james.bergstra@gmail.com> wrote:
> After a few more days of thinking about things, I'm more convinced that
> this technique is a great way to do what Theano was trying to do, and
> so I'm wondering who else is pushing this project forward and where
> it's going?

Travis brought me on board for my previous experience with compiler
engineering, and more specifically with building Python
compilers/translators.  Many, many moons ago I wrote a Python-to-C
translator called PyFront that was never released to the public.  I'm
interested in reviving those efforts, and in many respects Numba adds
type information that I really wanted to have back then.

We're currently targeting a subset of Python that is type-aware at the
level of scalars and ndarrays, and specializes code accordingly.  In
the future, I'd like to generalize the ndarray code generation to
handle extension types given some declaration object/language (maybe
similar or identical to Pyrex/Cython's cdef classes).  We're also
interested in emitting loops that use OpenMP or similar to achieve
parallelization.  If we get a fast path to GPU code generation via
Theano, that'd rock as well.

> This RFC pull request contains a working small example of how to apply
> Theano to numpy code (see tests/test_ad.py) and it has no interaction
> with the LLVM code (Theano builds a higher-level AST for now). The
> goals of using a Theano backend and an LLVM backend are related in
> wanting to speed up numpy code, and in the technique of emulating the
> VM to do JIT specialization.

I wish I knew more about AD.  I sat in on a Scheme AD talk (Siskind
was discussing his Stalin compiler, IIRC) a while back, but it'd take
me a bit to page that back in.
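
(For what it's worth, the core idea of reverse-mode AD, the flavor used
for computing gradients, fits in a few lines. A toy sketch, not Theano's
or anyone's production machinery:)

    class Node:
        """A value plus local derivatives with respect to its parents."""
        def __init__(self, value, parents=()):
            self.value, self.parents, self.grad = value, parents, 0.0

        def __add__(self, other):
            return Node(self.value + other.value,
                        [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Node(self.value * other.value,
                        [(self, other.value), (other, self.value)])

    def backward(out):
        # Push adjoints back through the graph. Fine for this small
        # example; a real implementation visits nodes in reverse
        # topological order so each grad is final before it propagates.
        out.grad = 1.0
        stack = [out]
        while stack:
            node = stack.pop()
            for parent, local in node.parents:
                parent.grad += node.grad * local
                stack.append(parent)

    x = Node(3.0)
    y = x * x + x        # dy/dx = 2x + 1
    backward(y)
    print(x.grad)        # -> 7.0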

> I was never happy with Theano's interface. I really wanted to
> non-invasively spy on what numpy was doing (what it was allocating,
> what expressions produced what results, etc.) but without actually
> hijacking the interpreter itself I couldn't get that information.
> Consequently Theano has the feel of an additional "framework" that a
> programmer has to think about and buy into. That's a shame because
> Numpy + Python is a perfectly expressive combination -- we just wanted
> to use Theano's optimizations and code generation to speed it up (and
> use GPU). Numba's bytecode parsing trick blew my mind when I looked at
> it, how easy it was to implement the CPython VM in Python. With that
> technique I think it's finally possible to give Theano the right API
> so that it's services (autodiff, GPU, codegen, etc.) can be expressed
> in terms of raw numpy code.

I'd like to quickly mention that the first time I encountered symbolic
execution of Python VM bytecode was in the PyPy project.  PyFront did
symbolic execution as well, but had already eliminated control-flow
bytecodes by that point.

> Do you think there's room in one project for all of these features?
> The trouble with VM-hacking is that it would be really nice if there
> were just one level of VM emulation going on at a time, and the
> various libraries that were using the technique could at least
> register themselves as listeners for a single bytecode parser.

Again, this sounds like part of the PyPy project's goals, since it has
modular listeners for doing symbolic execution of bytecode.  We can
certainly build a competing API for doing this as well if there is a
demand for that capability.  I know the sheer size and scope of PyPy
makes it seem less accessible.

> Another thing to coordinate on is how numba was planning on dealing
> with array expressions. Why does numba focus on scalar code? It seems
> like if numba were to encounter vectorized code, then it would start
> doing pretty much what Theano currently does (at least in a high level
> sense), and possibly have a huge overlap with what @markflorisson88 is
> working on this summer for Cython (and possibly with what a previous
> GSoC student implemented for SymPy -- the fortran-code generation
> stuff). I think it would be great if Theano acted on LLVM-IR instead
> of its own internal AST data structure, and now that libndarray has a
> C interface, I've been hoping that somehow it would be possible to
> make LLVM-IR look not-too-different from a theano AST. If that were
> the case then numba, Theano, and Cython would all be tackling nearly
> the same problem with nearly the same tools, for very similar
> applications. These issues have come up in conversation with
> @markflorisson88, @npinto, @dwf, @dagss, so I'm bringing them in here
> too.

We've focused on scalar code up to now since it is lower-hanging
fruit (though the test in .../tests/test_vectorize.py shows how we can
use this code to make ufuncs that easily vectorize).  As I mentioned
above, we're adding support for Numpy arrays as the next step.  The
LLVM IR is growing on me as a reasonable representation, though I'm
not sure how one could observe Numpy-specific API calls in the
presence of inlining (much less parse "{ i32, i32*, i8*, i32, i32*,
i32*, i8*, i8*, i32, i8*, i8*, i8*, i32* }*" as "PyArrayObject *").
Certainly something for further discussion.
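
To illustrate the shape of that ufunc trick at the numpy level (with
np.vectorize's slow Python loop standing in for the compiled ufuncs
that test_vectorize.py builds):

    import numpy as np

    def scalar_kernel(a, b):
        # ordinary scalar code; numba makes a real compiled ufunc of
        # this, while np.vectorize wraps it in a Python-level loop
        return a * a + b

    as_ufunc = np.vectorize(scalar_kernel)
    print(as_ufunc(np.arange(4), 10))   # -> [10 11 14 19]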

Thanks,
-Jon

Re: [numba] automatic differentiation & gpu codegen for pure numpy

From:
James Bergstra
Date:
2012-06-07 @ 14:16
On Tue, Jun 5, 2012 at 7:09 PM, Jon Riehl <jon.riehl@gmail.com> wrote:
> Hi James,
>
> [earlier quoted discussion trimmed]
>
> Again, this sounds like part of the PyPy project's goals, since it has
> modular listeners for doing symbolic execution of bytecode.  We can
> certainly build a competing API for doing this as well if there is a
> demand for that capability.  I know the sheer size and scope of PyPy
> makes it seem less accessible.
>
> [...]
>
> We've focused on scalar code up to now since it is lower-hanging
> fruit (though the test in .../tests/test_vectorize.py shows how we can
> use this code to make ufuncs that easily vectorize).  As I mentioned
> above, we're adding support for Numpy arrays as the next step.  The
> LLVM IR is growing on me as a reasonable representation, though I'm
> not sure how one could observe Numpy-specific API calls in the
> presence of inlining (much less parse "{ i32, i32*, i8*, i32, i32*,
> i32*, i8*, i8*, i32, i8*, i8*, i8*, i32* }*" as "PyArrayObject *").
> Certainly something for further discussion.
>
> Thanks,
> -Jon

Hi Jon, nice to meet you!

The connection with PyPy is interesting. I'm afraid I know next to
nothing about PyPy, but I'm wondering how much overlap there is
between what I was trying to do with the bytecode VM emulator and
what PyPy is accomplishing in a much deeper and more developed way.
You're right that the PyPy project seems opaque and daunting at first
glance... at least to me. If you think we should use or re-implement
things from there, that sounds great.

As you move toward your next step of supporting numpy arrays, would
you be open to some pull requests over the next little while that
build up an interface wish list? I don't know what your application
targets for numba are, but in the world of machine learning I think
numba could make a big difference by supporting algorithms that are
otherwise awkward to use. Speaking from my own experience, the list
includes:

* automatic differentiation
* code (code-path) specialization for GPU or remote evaluation by e.g. IPython
* numeric stability and uncertainty tracking (nice application Federico!)
* numeric / combinatorial optimization
* inference in probabilistic programming

If numba exposes a symbolic (I admit it: Theano-like :/)
representation of the functional relationships between inputs,
intermediate values, and results, then it becomes syntactically easy
to apply algorithms for these tasks to functions described in terms of
normal numpy computations, which are easy to write and debug, pretty
fast, widely known by programmers, and which cover many application
domains. This representation is, I think, the same one that would be
needed for lazy evaluation, so hopefully this is in line with what
Mark has in mind.
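
A toy sketch of the kind of representation meant here (invented
classes, not Theano's or numba's actual IR): operator overloading on
placeholder variables records a graph from ordinary-looking code, which
autodiff, codegen, or optimization passes could then walk.

    class Var:
        """A symbolic placeholder; arithmetic builds a graph, not values."""
        def __init__(self, name, op=None, parents=()):
            self.name, self.op, self.parents = name, op, parents

        def __add__(self, other):
            return Var(f"({self.name} + {other.name})", "add", (self, other))

        def __mul__(self, other):
            return Var(f"({self.name} * {other.name})", "mul", (self, other))

    x, y = Var("x"), Var("y")
    z = x * y + x       # records add(mul(x, y), x); nothing is computed
    print(z.name)       # -> ((x * y) + x)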

- James

Re: [numba] automatic differentiation & gpu codegen for pure numpy

From:
mark florisson
Date:
2012-06-06 @ 09:00
On 6 June 2012 00:09, Jon Riehl <jon.riehl@gmail.com> wrote:
> Hi James,
>
> [earlier quoted discussion trimmed]
>
> We've focused on scalar code up to now since it is lower-hanging
> fruit (though the test in .../tests/test_vectorize.py shows how we can
> use this code to make ufuncs that easily vectorize).  As I mentioned
> above, we're adding support for Numpy arrays as the next step.  The
> LLVM IR is growing on me as a reasonable representation, though I'm
> not sure how one could observe Numpy-specific API calls in the
> presence of inlining (much less parse "{ i32, i32*, i8*, i32, i32*,
> i32*, i8*, i8*, i32, i8*, i8*, i8*, i32* }*" as "PyArrayObject *").
> Certainly something for further discussion.

That would be rather tough, but it might be handled more easily if the
JIT specialized code blocks instead of entire functions (which also
handles branching). That would be somewhat more like psyco, though... I
really believe this is all handled much better through lazy evaluation,
where the numpy runtime records the operations lazily and you don't
have to figure out dtypes and numpy calls by looking at bytecode. This
would work great with numba functions, since a numba lazy hook for
numpy would recognize such ufuncs in vectorized form and inline them
(as well as providing tiling code if needed, and possibly additional
SIMD instructions after inlining).
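
A rough sketch of that recording idea (an invented wrapper, not numpy's
or numba's API): operations on the wrapper build an expression graph
instead of executing, and forcing it replays the graph, which is where
a JIT could fuse the whole expression into one loop or kernel.

    import numpy as np

    class Lazy:
        """Records elementwise ops instead of running them immediately."""
        def __init__(self, value=None, op=None, args=()):
            self.value, self.op, self.args = value, op, args

        def __add__(self, other):
            return Lazy(op=np.add, args=(self, other))

        def __mul__(self, other):
            return Lazy(op=np.multiply, args=(self, other))

        def force(self):
            if self.op is None:
                return self.value
            # a real runtime would fuse the recorded graph into a single
            # loop or GPU kernel instead of replaying op by op
            return self.op(*(a.force() for a in self.args))

    a = Lazy(np.arange(3.0))
    b = Lazy(np.ones(3))
    expr = a * b + a        # nothing computed yet
    print(expr.force())     # -> [0. 2. 4.]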
