librelist archives

Type inference

From:
mark florisson
Date:
2012-06-26 @ 13:16
Hey,

I was thinking: if we are going to add type inference, it would be
useful to base it on the control flow graph. However, the CFG is
currently built while creating LLVM basic blocks, which need an LLVM
function, which in turn needs a return type (the very thing you want
to infer first). Running the control flow analysis twice would be a
waste. It would also seem useful, as Numba grows, to move to an actual
AST. Going straight from Python bytecode to LLVM has several
disadvantages:

    - it is less convenient to associate data with operations; you
have to map bytecode indices to data in a bunch of dicts or lists
    - it is harder to infer intent from the code, e.g. you have to
pattern-match the bytecode to recognize loops, etc.
    - most importantly, you cannot conveniently write transforms
that iteratively rewrite the original AST into a final, optimized AST.
You don't want to hack all your optimizations (e.g. tiling, collapsing
axes) into code generation time

I think we should think about this carefully, since it is still
relatively easy to switch at this point.
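To illustrate the kind of pipeline an AST makes possible, here is a
minimal sketch using CPython's stdlib ast module. DoubleConstants is a
made-up demonstration pass, not a proposed Numba transform; the point
is only that passes can rewrite the tree and hand the result to the
next stage, which is awkward to do over raw bytecode:

```python
import ast

class DoubleConstants(ast.NodeTransformer):
    # Toy pass: rewrite every integer constant to twice its value.
    # A real compiler pass (type annotation, loop tiling, ...) would
    # follow the same pattern: visit nodes, return rewritten subtrees.
    def visit_Constant(self, node):
        if isinstance(node.value, int):
            return ast.copy_location(ast.Constant(node.value * 2), node)
        return node

source = "def f():\n    return 21\n"
tree = ast.parse(source)
tree = ast.fix_missing_locations(DoubleConstants().visit(tree))

# The transformed tree compiles and runs like normal Python.
namespace = {}
exec(compile(tree, "<ast>", "exec"), namespace)
print(namespace["f"]())  # 42
```

Several such passes can be chained, each consuming the previous pass's
output tree, with code generation as the final stage.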

Regarding semantics, how would we handle the following case?

    if some_condition:
        x = 2
    else:
        x = "string"

We could either disallow that outright, or create a new SSA variable
for each assignment and only reject the program when the value is
later read outside the branch, since that read forces a merge of
inconsistent types:

    if/else code
    print x

whereas the following would be legal:

    if/else code
    x = 2.0
    print x

since 'x' doesn't need to be merged into a single variable after the
branch. Although it would be nice to support this, I personally think
it would be confusing for users that merely reading the value suddenly
breaks compilation.
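The merge rule above can be sketched as follows. The helper name
'unify' is hypothetical; real SSA construction would track a version
per definition and unify types only at reads that cross a merge point:

```python
def unify(t1, t2):
    # Merge the types flowing in from two branches; inconsistent
    # incoming types are rejected, as in the 'print x' example above.
    if t1 == t2:
        return t1
    raise TypeError("cannot merge %s and %s" % (t1.__name__, t2.__name__))

# Illegal case: x is int in one branch, str in the other, then read
# after the branch, so the two SSA versions must be merged.
try:
    unify(int, str)
except TypeError as exc:
    error = str(exc)  # "cannot merge int and str"

# Legal case: x is unconditionally reassigned after the branch, so the
# later read sees only the new definition and no merge is required.
merged = unify(float, float)
```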

Thoughts?

Re: [numba] Type inference

From:
Travis Oliphant
Date:
2012-06-26 @ 13:20
I think we can switch to using the AST if that is deemed helpful. I
think it's important to be able to work with the function object
directly instead of a "string" (although there are ways to use import
hooks to potentially make that distinction unnecessary).

Basically, if both you and Jon agree that the AST would be better to
use, then let's do it.

-Travis


