librelist archives

Common numba/Theano AST?
From:
Frédéric Bastien
Date:
2012-07-18 @ 21:03
Hi,

What about the idea of having a common AST between numba and Theano?
This would allow numba to reuse many of Theano's optimizations.

Any interest? Problems? Theano currently hardcodes the dtype, the number
of dimensions, and, for each dimension, whether it is broadcastable or not.

thanks

Fred

p.s. I posted to both the new and the old mailing list to be sure to
reach everybody. Could you reply only on the new mailing list? To register:
https://groups.google.com/a/continuum.io/d/forum/numba-users

Re: [numba] Common numba/Theano AST?

From:
Travis Oliphant
Date:
2012-07-18 @ 21:15
On Jul 18, 2012, at 4:03 PM, Frédéric Bastien wrote:

> Hi,
> 
> What about the idea of having a common AST between numba and Theano?
> This would allow numba to reuse many of Theano's optimizations.
>
> Any interest? Problems? Theano currently hardcodes the dtype, the number
> of dimensions, and, for each dimension, whether it is broadcastable or not.
> 

Curious: what makes a dimension not broadcastable (or is that a
user-defined tag)?

-Travis

Re: [numba] Common numba/Theano AST?

From:
Frédéric Bastien
Date:
2012-07-18 @ 22:27
Hi,

The user defines symbolic variables, then does some computation on
them. Here is a small example:

a = theano.tensor.matrix()
b = theano.tensor.matrix()
c = a + b

In this case, a and b are matrices and we expect them to have the
same shape, with no broadcasting.

a = theano.tensor.row()
b = theano.tensor.matrix()
c = a + b

Here, a is a 2-d ndarray with shape (1, N).

So currently in Theano, we force the user to hard-code where
broadcasting will happen and where it won't.

I'm not sure what the consequences would be if we relaxed this
constraint to allow matrices to be broadcast dynamically, i.e. to allow
"matrix() + matrix()" to behave as in NumPy (broadcast based on the
shapes at execution time). In the simple case of element-wise functions,
I suppose there won't be big consequences, but I'm not sure how far we
can push this change.
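
For contrast, here is a minimal NumPy sketch of the dynamic behaviour
described above (runnable as-is; the shapes are just illustrative):

import numpy as np

a = np.ones((1, 4))      # a "row": length 1 along the first axis
b = np.ones((3, 4))
print((a + b).shape)     # (3, 4) -- NumPy broadcasts based on the shapes
                         # seen at execution time, whereas Theano's
                         # matrix() + matrix() would require the (1, N)
                         # input to be declared broadcastable up front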

Fred

On Wed, Jul 18, 2012 at 5:15 PM, Travis Oliphant <travis@continuum.io> wrote:
>
> Curious: what makes a dimension not broadcastable (or is that a
> user-defined tag)?

Re: [numba] Common numba/Theano AST?

From:
James Bergstra
Date:
2012-07-19 @ 01:29
Will anyone be around at the sprints to talk about this stuff? I'll be
coming only for Friday and Saturday, but I'd love to get a sense of where
numba is, and where it's going.

Re: [numba] Common numba/Theano AST?

From:
Travis Oliphant
Date:
2012-07-19 @ 03:15
On Jul 18, 2012, at 8:29 PM, James Bergstra wrote:

> Will anyone be around at the sprints to talk about this stuff? I'll be
> coming only for Friday and Saturday, but I'd love to get a sense of
> where numba is, and where it's going.
> 
> 

I will be there on Friday for some of the day, but Mark F. would be a
better person to talk with you about this. I'm happy to discuss things,
though.

-Travis

Re: [numba] Common numba/Theano AST?

From:
David Warde-Farley
Date:
2012-07-19 @ 03:58
On Wed, Jul 18, 2012 at 4:15 PM, Travis Oliphant <travis@continuum.io> wrote:

> Curious: what makes a dimension not broadcastable (or is that a
> user-defined tag)?

Basically, a broadcastable dimension is one that is guaranteed to have
length 1. Everything else is "non-broadcastable". This means that,
e.g., an (M, N) input added to another input that *just happens* to be
(1, N) or (M, 1) at runtime (but without the appropriate broadcastable
flags specified at compile time) will result in a shape error.

This is, by default, the only shape information that Theano knows
about. Note that subtensor operations and reshapes and whatnot can set
the broadcastable flag on a given dimension implicitly.

I'm not sure why Theano was designed this way exactly, but I assume it
is so that the code path necessary for implementing the broadcast is
only generated when the broadcasting is known to be needed a priori.
One could just always generate this code path, or (in a JIT context)
generate it if it's hit at least once, or something like that.
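
To make that concrete, here is a minimal sketch of the behaviour
described above (the shapes are hypothetical, and it assumes Theano's
default floatX configuration):

import numpy as np
import theano
import theano.tensor as T

x = T.matrix()   # broadcastable flags: (False, False)
y = T.matrix()
f = theano.function([x, y], x + y)

f(np.ones((3, 4)), np.ones((3, 4)))   # fine: the shapes match exactly
f(np.ones((1, 4)), np.ones((3, 4)))   # shape error: the (1, 4) input was
                                      # not declared broadcastable at
                                      # compile time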

Re: [numba] Common numba/Theano AST?

From:
Frédéric Bastien
Date:
2012-07-19 @ 04:09
On Wed, Jul 18, 2012 at 11:58 PM, David Warde-Farley
<d.warde.farley@gmail.com> wrote:
> I'm not sure why Theano was designed this way exactly, but I assume it
> is so that the code path necessary for implementing the broadcast is
> only generated when the broadcasting is known to be needed a priori.
> One could just always generate this code path, or (in a JIT context)
> generate it if it's hit at least once, or something like that.

For the elemwise code, if you create a new pointer variable for each
loop, you can just set the stride to 0 for broadcastable dimensions.
This is what the GPU code does to limit the amount of code written. I
don't remember how our C code does it, but it is not a hard change to
make.
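
As a NumPy sketch of that stride-0 trick (as_strided is used here only
for illustration; it is not Theano's code):

import numpy as np
from numpy.lib.stride_tricks import as_strided

# A broadcastable (length-1) dimension can be "stretched" without
# copying by giving it a stride of 0, so a single element-wise loop
# covers both the broadcast and the non-broadcast case.
a = np.arange(4.0).reshape(1, 4)
stretched = as_strided(a, shape=(3, 4), strides=(0, a.strides[1]))
print(stretched + np.ones((3, 4)))   # same result as broadcasting (1, 4)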

But the next change in our elemwise code will probably be to use
minivect from Mark F. It is used by Cython, works in parallel, and does
more stride-related optimization than we do. He made it in such a way
that it can be reused outside of Cython. In that case, the only change
needed is in how we check for errors in the shapes of the inputs. In
fact, doing this will lower the number of C files we compile, and it
probably won't be slower, or not by much in the worst case, I think.

Fred

Re: [numba] Common numba/Theano AST?

From:
mark florisson
Date:
2012-07-19 @ 18:25
On 19 July 2012 05:09, Frédéric Bastien <nouiz@nouiz.org> wrote:
> But the next change in our elemwise code will probably be to use
> minivect from Mark F. It is used by Cython, works in parallel, and does
> more stride-related optimization than we do. He made it in such a way
> that it can be reused outside of Cython. In that case, the only change
> needed is in how we check for errors in the shapes of the inputs. In
> fact, doing this will lower the number of C files we compile, and it
> probably won't be slower, or not by much in the worst case, I think.

I plan to support an LLVM backend as well, which can hopefully reduce
this runtime compilation overhead.


Re: [numba] Common numba/Theano AST?

From:
Frédéric Bastien
Date:
2012-07-20 @ 04:21
On Thu, Jul 19, 2012 at 2:25 PM, mark florisson
<markflorisson88@gmail.com> wrote:
> I plan to support an LLVM backend as well, which can hopefully reduce
> this runtime compilation overhead.

I was thinking about the number of compilation units. This is a current
problem with the Theano buildbot, which has ~8500 different modules.
This takes time to index what is in the cache (this could be
optimized), but it also makes the virtual memory use higher. I also
think this causes the high memory use of Theano in our tests. This is a
problem on Windows, as most of the time the tests run on a 32-bit
version of Python, which limits the memory usable by the process.

Do you know if LLVM-compiled units take less memory/virtual memory
space than normal C extensions?

Fred

Re: [numba] Common numba/Theano AST?

From:
mark florisson
Date:
2012-07-19 @ 18:21
On 19 July 2012 04:58, David Warde-Farley <d.warde.farley@gmail.com> wrote:
> I'm not sure why Theano was designed this way exactly, but I assume it
> is so that the code path necessary for implementing the broadcast is
> only generated when the broadcasting is known to be needed a priori.
> One could just always generate this code path, or (in a JIT context)
> generate it if it's hit at least once, or something like that.

Broadcasting knowledge is very valuable for avoiding possibly expensive
recomputation of the broadcasting operands. Knowing exactly which
dimensions will broadcast allows you to generate optimal code; see,
e.g., http://tinyurl.com/cph64bm page 8 (which optimizes spread()
operations in Fortran 90).
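
As a hand-written sketch of the kind of specialization this enables
(the shapes and the function are hypothetical): when we know statically
that one operand broadcasts along axis 0, the row lookup can be hoisted
out of the inner loop.

def add_row_broadcast(a, b, out):
    # Specialized for a: (1, N) and b, out: (M, N), known at compile
    # time, so no per-element broadcast bookkeeping is needed.
    row = a[0]                      # hoisted: the single row is read once
    for i in range(len(b)):
        bi, oi = b[i], out[i]
        for j in range(len(row)):
            oi[j] = row[j] + bi[j]

a = [[10.0, 20.0]]
b = [[1.0, 2.0], [3.0, 4.0]]
out = [[0.0, 0.0], [0.0, 0.0]]
add_row_broadcast(a, b, out)        # out == [[11.0, 22.0], [13.0, 24.0]]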

Re: [numba] Common numba/Theano AST?

From:
Frédéric Bastien
Date:
2012-07-20 @ 04:39
On Thu, Jul 19, 2012 at 2:21 PM, mark florisson
<markflorisson88@gmail.com> wrote:
> Broadcasting knowledge is very valuable for avoiding possibly expensive
> recomputation of the broadcasting operands. Knowing exactly which
> dimensions will broadcast allows you to generate optimal code; see,
> e.g., http://tinyurl.com/cph64bm page 8 (which optimizes spread()
> operations in Fortran 90).

I haven't read it yet, but I'm sure you are right. I'm not sure what is
best: for the AST to contain the broadcast information or not. If we
don't include it, we can probably still generate all the optimized
cases for the broadcasted dimensions when the number of inputs and the
number of dimensions are low. Otherwise it will require generating and
compiling cases that won't be used.

There are inconveniences to forcing the broadcast information into the
graph. What about making it optional? I'm also talking with Matthew
Rocklin from SymPy, and this is something they have. We have something
for hints in a sandbox in Theano and will maybe use it more in the
future. So what about moving the broadcast information there?

Also, what about exact shape information? Do you use that in minivect
(I don't remember you mentioning it)? We can see the shape information
as a special type of hint. It is handled completely differently in
Theano for now. Do you have ideas about this?

Fred

Re: [numba] Common numba/Theano AST?

From:
mark florisson
Date:
2012-07-20 @ 08:54
On 20 July 2012 05:39, Frédéric Bastien <nouiz@nouiz.org> wrote:
> I haven't read it yet, but I'm sure you are right. I'm not sure what is
> best: for the AST to contain the broadcast information or not. If we
> don't include it, we can probably still generate all the optimized
> cases for the broadcasted dimensions when the number of inputs and the
> number of dimensions are low. Otherwise it will require generating and
> compiling cases that won't be used.
>
> There are inconveniences to forcing the broadcast information into the
> graph. What about making it optional? I'm also talking with Matthew
> Rocklin from SymPy, and this is something they have. We have something
> for hints in a sandbox in Theano and will maybe use it more in the
> future. So what about moving the broadcast information there?

For most cases Theano already has the right tensor types you need,
though, right? So how often do you find yourself spelling broadcasting
rules out manually?

> Also, what about exact shape information? Do you use that in minivect
> (I don't remember you mentioning it)? We can see the shape information
> as a special type of hint. It is handled completely differently in
> Theano for now. Do you have ideas about this?

Yes, James mentioned it on the list. I don't think this is
significantly useful information; it can only help with perfect
unrolling. I do plan to incorporate broadcasting information in the
minivect AST. Currently it doesn't have it, since Cython doesn't know
it, unless some of the operands have fewer dimensions (e.g. stretching
a vector along the rows of a matrix), which means you may need a
temporary (and hence a different evaluation function).

I do plan on moving temporary allocation, and deciding whether a
temporary is justified, into minivect itself, but it's more work than
generating a blob of C code directly :) Especially if minivect is used
at runtime, it will have all the broadcasting information
automatically.


Re: [numba] Common numba/Theano AST?

From:
Frédéric Bastien
Date:
2012-08-07 @ 14:25
Hi,

thanks for the pointers. I saw a follow-up (1998) that also merges
reductions with elemwise, reshape and transpose:

http://dl.acm.org/citation.cfm?id=302872.302883

This 2001 paper is another follow-up that optimizes this on
distributed-memory systems. I don't think it will be useful for ndarray
unless we get a distributed ndarray at some point, but it is still
interesting to know about:

http://dl.acm.org/citation.cfm?id=375323.375325

Fred

On Thu, Jul 19, 2012 at 2:21 PM, mark florisson
<markflorisson88@gmail.com> wrote:
> Broadcasting knowledge is very valuable for avoiding possibly expensive
> recomputation of the broadcasting operands. Knowing exactly which
> dimensions will broadcast allows you to generate optimal code; see,
> e.g., http://tinyurl.com/cph64bm page 8 (which optimizes spread()
> operations in Fortran 90).

Re: [numba] Common numba/Theano AST?

From:
mark florisson
Date:
2012-07-19 @ 18:25
On 18 July 2012 22:03, Frédéric Bastien <nouiz@nouiz.org> wrote:
> Hi,
>
> What about the idea of having a common AST between numba and Theano?
> This would allow numba to reuse many of Theano's optimizations.
>
> Any interest? Problems? Theano currently hardcodes the dtype, the number
> of dimensions, and, for each dimension, whether it is broadcastable or not.

I'm not sure; would you map loops to scan? I think there are many
constructs that Numba will aim to support that Theano doesn't. That
said, I was thinking Numba would use minivect for array expressions,
which will hopefully have Theano integration in the future (mapping
from/to a Theano AST). It will be valuable to everyone to share all the
optimizations that both projects perform, since reimplementing them
seems mad.


Re: [numba] Common numba/Theano AST?

From:
Frédéric Bastien
Date:
2012-07-20 @ 04:28
On Thu, Jul 19, 2012 at 2:25 PM, mark florisson
<markflorisson88@gmail.com> wrote:
> I'm not sure; would you map loops to scan?

Do you mean Theano's scan op? Currently in Theano you have two ways:
our scan op, or unrolling the loop (which works only for a fixed number
of iterations). If you mean Theano's scan: when the inner loop is very
fast to execute, the scan implementation has too high an overhead.

> I think there are many constructs that Numba will aim to support that
> Theano doesn't.

I think you are right, but I can't come up with an example off the top
of my head. Do you have one? Having one always helps.

> That said, I was thinking Numba would use minivect for array
> expressions, which will hopefully have Theano integration in the
> future (mapping from/to a Theano AST). It will be valuable to everyone
> to share all the optimizations that both projects perform, since
> reimplementing them seems mad.


I had the impression that Theano would call minivect to generate the
code. You seem to be saying that minivect will call Theano? Or that
Numba will call Theano?

I agree that sharing graph optimizations and code generation is an
important goal that we should have.

Fred

Re: [numba] Common numba/Theano AST?

From:
mark florisson
Date:
2012-07-20 @ 09:00
On 20 July 2012 05:28, Frédéric Bastien <nouiz@nouiz.org> wrote:
> Do you mean Theano's scan op? Currently in Theano you have two ways:
> our scan op, or unrolling the loop (which works only for a fixed number
> of iterations). If you mean Theano's scan: when the inner loop is very
> fast to execute, the scan implementation has too high an overhead.

Yeah, I mean the scan op.

> I think you are right, but I can't come up with an example off the top
> of my head. Do you have one? Having one always helps.

Calling a non-Theano function, printing some stuff to stdout, etc.

> I had the impression that Theano would call minivect to generate the
> code. You seem to be saying that minivect will call Theano? Or that
> Numba will call Theano?
>
> I agree that sharing graph optimizations and code generation is an
> important goal that we should have.

Either way, really. Theano can use minivect once it's mature
(hopefully), and minivect can use Theano for optimizations
(theoretically). Or maybe these optimizations can be ported directly,
which avoids some runtime overhead. Basically, Theano would run its
optimizations first, and then decide whether it wants minivect to
generate code, do the AST mapping, and have minivect return the code (a
C function or an LLVM function pointer). Minivect, when called from
other contexts, could optionally map to Theano, run the optimizations,
and map back. The types and any other needed internal state also need
to be mapped, so it's a bit involved.

It's all a bit theoretical unless it's actually tried; it may or may
not be practical. Minivect is not in a mature enough state yet for me
to be looking at doing this kind of thing.


Re: [numba] Common numba/Theano AST?

From:
Frédéric Bastien
Date:
2012-07-26 @ 15:39
On Fri, Jul 20, 2012 at 5:00 AM, mark florisson
<markflorisson88@gmail.com> wrote:
> Either way, really. Theano can use minivect once it's mature
> (hopefully), and minivect can use Theano for optimizations
> (theoretically). Or maybe these optimizations can be ported directly,
> which avoids some runtime overhead. Basically, Theano would run its
> optimizations first, and then decide whether it wants minivect to
> generate code, do the AST mapping, and have minivect return the code (a
> C function or an LLVM function pointer). Minivect, when called from
> other contexts, could optionally map to Theano, run the optimizations,
> and map back. The types and any other needed internal state also need
> to be mapped, so it's a bit involved.


Here you talk a lot about the minivect AST; what about the Numba AST?
Do you think they will share the same one?

Fred

Re: [numba] Common numba/Theano AST?

From:
mark florisson
Date:
2012-07-26 @ 16:14
On 26 July 2012 16:39, Frédéric Bastien <nouiz@nouiz.org> wrote:
> Here you talk a lot about the minivect AST; what about the Numba AST?
> Do you think they will share the same one?

That's because what minivect does is closer to what Theano does than to
what Numba does. Numba will use minivect, but it will not share the
entire AST, although I'm thinking at least the common functionality can
be in minivect. At least they share a type system, but Numba's AST and
type system are really a superset of minivect's.

Re: [numba] Common numba/Theano AST?

From:
Frédéric Bastien
Date:
2012-07-29 @ 23:56
Hi,

On Thu, Jul 26, 2012 at 12:14 PM, mark florisson
<markflorisson88@gmail.com> wrote:
> That's because what minivect does is closer to what Theano does than to
> what Numba does. Numba will use minivect, but it will not share the
> entire AST, although I'm thinking at least the common functionality can
> be in minivect. At least they share a type system, but Numba's AST and
> type system are really a superset of minivect's.

Thanks for the clarification.

What about using the same graph in Theano and in minivect? If we can
make them exactly equal, there won't be any need to convert them. Is
there a place that describes the minivect AST? I would like to compare
it with the Theano AST, and I'll need it to use minivect to generate C
code for our elemwise operator.

Also, I just saw the SymPy AST during the SciPy conference, and there
is a significant difference from the Theano AST: the Theano AST is a
bipartite graph. I think that is a must, as it allows one operation to
return many results, and since our goal is fast code execution, this is
needed.
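
To illustrate the bipartite idea, here is a toy sketch (loosely modeled
on Theano's Variable/Apply nodes; the names are illustrative): variable
nodes and operation-application nodes alternate, and one application can
own several output variables.

class Variable:
    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner   # the Apply node that produced it, or None

class Apply:
    def __init__(self, op, inputs, n_outputs):
        self.op = op
        self.inputs = inputs             # Variable nodes only
        self.outputs = [Variable("%s.out%d" % (op, i), owner=self)
                        for i in range(n_outputs)]

# One operation application, two results -- e.g. a fused divmod-style op:
node = Apply("divmod", [Variable("x"), Variable("y")], n_outputs=2)
quotient, remainder = node.outputs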

If we can't make them exactly the same, could we write the optimization
rules in a declarative form that allows reusing them on both graphs?
That would remove the conversion step. Theano has such a mechanism, but
it is not used frequently right now, so the majority of optimizations
would need to be rewritten.

What do you think?

Fred

Re: [numba] Common numba/Theano AST?

From:
mark florisson
Date:
2012-08-01 @ 22:35
On 30 July 2012 00:56, Frédéric Bastien <nouiz@nouiz.org> wrote:
> Hi,
>
> Thanks for the clarification.
>
> What about using the same graph in Theano and in minivect? If we can
> make them exactly equal, there won't be any need to convert them. Is
> there a place that describes the minivect AST? I would like to compare
> it with the Theano AST, and I'll need it to use minivect to generate C
> code for our elemwise operator.

I think minivect's AST functionality would pretty much always be a
subset of any other project's AST functionality. So you could use it at
the core, but mostly it doesn't really matter much. Minivect should be
used pretty late in the compilation process, i.e. after all types are
known at each point in the AST, and it's a bit of a low-level
component. You could reuse it, but the nodes themselves don't really do
anything; it's the successive transformations that rewrite and expand
them, and in essence this is the code generation part. The actual code
generation is then just a mapping from fundamental AST nodes to the
final code.

I documented the entire project a while back, but it doesn't have a
gentle introduction yet. The documentation is in the doc/ sub-directory,
and you need Sphinx to build it. I will write more of a tutorial
introduction, but right now I have other priorities. In the meantime,
though, converting an AST is pretty straightforward if you're attempting
to go for a full mapping (partial mappings are more complicated, and not
the best idea). It's especially simple for a runtime component, since
you have all the information and don't need to bother generating runtime
code to select the right specialization.

> Also, I just saw the SymPy AST during the SciPy conference, and there
> is a significant difference from the Theano AST: the Theano AST is a
> bipartite graph. I think that is a must, as it allows one operation to
> return many results, and since our goal is fast code execution, this is
> needed.

The AST is not a bipartite graph, it's just a tree. Since minivect
really attempts to do the bare low-level essentials of going from an
expression representation to efficient code, I'm not sure it really
matters, since you won't be doing your analysis on the minivect AST.
Could you elaborate on the advantages of the bipartite property? I
think you'd only want to use minivect when you know that

a) minivect can handle the expression, and
b) the user wants a C or LLVM backend (to be implemented; I'm planning
on this).

> If we can't make them exactly the same, could we write the optimization
> rules in a declarative form that allows reusing them on both graphs?
> That would remove the conversion step. Theano has such a mechanism, but
> it is not used frequently right now, so the majority of optimizations
> would need to be rewritten.

That would be awesome. Do you have a link showing what this declarative
form looks like?
