librelist archives

Numpy dtype string for arrays?

From: Jon Riehl
Date: 2012-05-30 @ 20:35
Hi,

I'm trying to figure out what string to pass to "numpy.dtype()" to
describe an array with a fixed number of dimensions but arbitrary
shape within those dimensions.  For example, "(2,3)d" is fixed to a
shape of (2, 3), whereas I want to say something like "(*,*)d", where
the result would describe a two-dimensional array of arbitrary length
along both axes.
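For reference, here is how NumPy itself treats these strings. This is a sketch against a current NumPy; the exception raised for an unparseable dtype string may differ between versions, so both TypeError and ValueError are caught. A parenthesized shape yields a sub-array dtype with exactly that shape, there is no wildcard for an axis length, and an array created with a sub-array dtype absorbs the fixed shape into its own shape, leaving plain float64 elements.

```python
import numpy as np

# "(2,3)d" is a sub-array dtype: each element is a fixed 2x3 block of doubles.
fixed = np.dtype("(2,3)d")
print(fixed.shape)  # (2, 3)
print(fixed.base)   # float64

# There is no wildcard for an axis length, so "(*,*)d" is rejected.
# The exception type may vary by NumPy version, hence both are caught.
try:
    np.dtype("(*,*)d")
    wildcard_ok = True
except (TypeError, ValueError):
    wildcard_ok = False
print(wildcard_ok)  # False

# Creating an array with a sub-array dtype appends the fixed shape to the
# array's own shape and leaves a plain float64 element type behind.
arr = np.zeros(4, dtype=fixed)
print(arr.shape)  # (4, 2, 3)
print(arr.dtype)  # float64
```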

Thanks,
-Jon

Re: Numpy dtype string for arrays?

From: Jon Riehl
Date: 2012-05-30 @ 21:07
So it looks like dtype was not intended to describe the arrays, but
just elements of an array.  This leads me to stop beating around the
bush about the following.  I want to compile with array input types,
and I'm wondering if anyone has comments about the following proposal,
demonstrated in code:

test_data = numpy.array([1., 2., 3.])
# arg_types lists one type per argument; ['d'] is a 1-D array of doubles
compiled_fn = numba_compile(arg_types=[['d']])(get_ndarray_ndim)
self.assertEqual(compiled_fn(test_data), 1)

Thus a one-dimensional array of doubles would be ['d'], a
two-dimensional array [['d']], and so on...
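A minimal sketch of how the nested-list spelling could be read, assuming each level of list nesting adds one dimension and the innermost string is an ordinary dtype code. The helper names parse_array_spec and matches are hypothetical, not part of Numba or NumPy. The check accepts any shape as long as the element type and the number of dimensions agree, which is the "fixed dimensions, arbitrary shape" behavior asked about earlier in the thread.

```python
import numpy as np

def parse_array_spec(spec):
    """Hypothetical helper: turn the nested-list spelling into (dtype, ndim).

    ['d'] -> (float64, 1), [['d']] -> (float64, 2), and so on; each level of
    list nesting adds one dimension, and the innermost string is a dtype code.
    """
    ndim = 0
    while isinstance(spec, list):
        if len(spec) != 1:
            raise ValueError("each nesting level must hold exactly one item")
        spec = spec[0]
        ndim += 1
    if ndim == 0:
        raise ValueError("not an array spec: no list nesting")
    return np.dtype(spec), ndim

def matches(spec, arr):
    """Hypothetical check: same element type and ndim, any shape."""
    dtype, ndim = parse_array_spec(spec)
    return arr.dtype == dtype and arr.ndim == ndim

print(parse_array_spec(['d']))    # (dtype('float64'), 1)
print(parse_array_spec([['d']]))  # (dtype('float64'), 2)
print(matches(['d'], np.array([1., 2., 3.])))    # True
print(matches([['d']], np.array([1., 2., 3.])))  # False
```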

How about arrays with arbitrary dimensions?  It is nice to know if an
array has a fixed number of dimensions for typing purposes, assuming
we are not just going to go through the API for everything.

Thanks,
-Jon

Re: [numba] Numpy dtype string for arrays?

From: Travis Oliphant
Date: 2012-05-31 @ 11:58
On May 30, 2012, at 4:07 PM, Jon Riehl wrote:

> So it looks like dtype was not intended to describe the arrays, but
> just elements of an array.  This leads me to stop beating around the
> bush about the following.  I want to compile with array input types,
> and I'm wondering if anyone has comments about the following proposal,
> demonstrated in code:
> 
> test_data = numpy.array([1., 2., 3.])
> compiled_fn = numba_compile(arg_types = [['d']])(get_ndarray_ndim)
> self.assertEqual(compiled_fn(test_data), 1)


I'm not sure what the purpose of get_ndarray_ndim is in this case.  I
think it would be rare that a function should be compiled special-cased to
a particular size (a particular number of dimensions, yes, but not a
particular size).
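The point about specializing on the number of dimensions but not on size can be illustrated with a hypothetical cache keyed on element type and ndim, never on shape: 1-D arrays of any length share one compiled version, while 2-D arrays get a second one. SpecializationCache and fake_compile are illustrative stand-ins; fake_compile only records which specializations would be built and does not call Numba.

```python
import numpy as np

def specialization_key(arr):
    # Specialize on element type and number of dimensions, not on shape.
    return (arr.dtype.str, arr.ndim)

class SpecializationCache:
    """Hypothetical cache: one 'compiled' entry per (dtype, ndim) pair."""

    def __init__(self, compile_for):
        self._compile_for = compile_for  # stand-in for a real compiler
        self._cache = {}

    def get(self, arr):
        key = specialization_key(arr)
        if key not in self._cache:
            self._cache[key] = self._compile_for(key)
        return self._cache[key]

compiled_keys = []

def fake_compile(key):
    compiled_keys.append(key)  # record that a new specialization was built
    return lambda arr: arr.ndim

cache = SpecializationCache(fake_compile)
for a in (np.zeros(3), np.zeros(1000), np.zeros((2, 3)), np.zeros((50, 7))):
    cache.get(a)(a)
print(len(compiled_keys))  # 2: one for 1-D float64, one for 2-D float64
```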

As for spelling, I don't have any particular problem with using ['d'] for
1-d and [['d']] for 2-d arrays of doubles.  On the other hand, it might
be easier to just define

All I know is that I don't like how long it takes to spell a 2-d array of 
doubles with cython:  nd.ndarray[ndim=2, dtype='d'] --- Yuck!

[['d']] is certainly better than that. 

-Travis



Re: [numba] Numpy dtype string for arrays?

From: Jon Riehl
Date: 2012-05-31 @ 16:50
On Thu, May 31, 2012 at 6:58 AM, Travis Oliphant <travis@continuum.io> wrote:
> On May 30, 2012, at 4:07 PM, Jon Riehl wrote:
>> So it looks like dtype was not intended to describe the arrays, but
>> just elements of an array.  This leads me to stop beating around the
>> bush about the following.  I want to compile with array input types,
>> and I'm wondering if anyone has comments about the following proposal,
>> demonstrated in code:
>>
>> test_data = numpy.array([1., 2., 3.])
>> compiled_fn = numba_compile(arg_types = [['d']])(get_ndarray_ndim)
>> self.assertEqual(compiled_fn(test_data), 1)
>
> I'm not sure what the purpose of get_ndarray_ndim is in this case.  I
> think it would be rare that a function should be compiled special-cased to
> a particular size (a particular number of dimensions, yes, but not a
> particular size).

Sorry, this is the problem with grabbing examples from other code out
of context.  I was experimenting with LOAD_ATTR support to demonstrate
our ability to model the array structure as an LLVM data type.  The
compiled function simply returned "ndarr.ndim", though a savvy
compiler might look at our type markup and replace that with a
constant.
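Going by that description, the test function amounts to the first definition below. The rest is a hypothetical plain-Python sketch of the "savvy compiler" idea: when the nested-list markup already fixes the number of dimensions, the attribute lookup can be replaced by a constant. markup_ndim and specialize_ndim are illustrative names, not Numba API.

```python
import numpy as np

def get_ndarray_ndim(ndarr):
    # What the compiled test function returned: a LOAD_ATTR of .ndim.
    return ndarr.ndim

def markup_ndim(spec):
    # Hypothetical: count list nesting in the proposed spelling,
    # so ['d'] -> 1 and [['d']] -> 2.
    depth = 0
    while isinstance(spec, list):
        spec = spec[0]
        depth += 1
    return depth

def specialize_ndim(spec):
    """Hypothetical constant folding: ndim is known from the type markup,
    so the returned function never reads the array's attributes."""
    ndim = markup_ndim(spec)

    def constant_ndim(ndarr):
        return ndim

    return constant_ndim

data = np.array([1., 2., 3.])
folded = specialize_ndim(['d'])
print(folded(data), get_ndarray_ndim(data))  # 1 1
```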

> As for spelling, I don't have any particular problem with using ['d']
> for 1-d and [['d']] for 2-d arrays of doubles.  On the other hand, it
> might be easier to just define
>
> All I know is that I don't like how long it takes to spell a 2-d array
> of doubles with cython:  nd.ndarray[ndim=2, dtype='d'] --- Yuck!
>
> [['d']] is certainly better than that.

Yeah, but I can already see how it doesn't scale very well past
two-dimensional arrays (a rank 5 tensor would be [[[[['d']]]]], for
example).  I can see the makings of another proposal in "arr[dt]", where
"dt" is the array element type.  Maybe in the long run we go with that
and add an optional dimension-count field, say "arr<2>[d]" or "arr2[d]".
At the moment I don't want to write another parser on top of the one
already in use for dtypes unless there is some communal buy-in.
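A rough sketch of how small such a parser could be, assuming the spellings floated above: "arr<2>[d]", "arr2[d]", and plain "arr[d]", where a missing count is treated here as "any number of dimensions" (an assumption). With it, a rank 5 tensor of doubles would be "arr5[d]" rather than [[[[['d']]]]]. The element type inside the brackets is handed to numpy.dtype, so the existing dtype parser is reused rather than duplicated. The grammar and the name parse_arr_type are hypothetical.

```python
import re
import numpy as np

# Hypothetical grammar: "arr", then an optional dimension count written either
# as "<N>" or as bare digits, then the element type in square brackets.
_ARR_RE = re.compile(r"^arr(?:<(\d+)>|(\d+))?\[(.+)\]$")

def parse_arr_type(text):
    """Return (element dtype, ndim); ndim is None when no count is given."""
    m = _ARR_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not an array type: {text!r}")
    ndim_text = m.group(1) or m.group(2)
    ndim = int(ndim_text) if ndim_text is not None else None
    # Reuse NumPy's own dtype parser for the element type.
    return np.dtype(m.group(3)), ndim

print(parse_arr_type("arr[d]"))     # (dtype('float64'), None)
print(parse_arr_type("arr2[d]"))    # (dtype('float64'), 2)
print(parse_arr_type("arr<5>[d]"))  # (dtype('float64'), 5)
```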

Thanks,
-Jon