Introduction

Numpy Structured arrays are a more complex type of array made up of smaller “structures”. This is very similar to an array of structs in the C language.

Previously, we had introduced Numba’s @jitclass feature for grouping together values of different types. However, for many uses (outside of building your own data structures) structured arrays will suffice.

Creation

You can create a numpy structured array by passing in a dtype containing a list of the names and types of the individual fields like this:

import numpy
a = numpy.empty((10,), dtype=[('x', float), ('y', float), ('z', float), ('count', int)])

If you need to specify the exact precision, you can do this with the numpy types:

a = numpy.empty((10,), dtype=[('x', numpy.float32), ('y', numpy.float32), 
    ('z', numpy.float64), ('count', numpy.uint8)])

You can also provide input data using numpy.array:

a = numpy.array([(121.9, 97.1, 9.1, 5), (124.1, 98.0, 8.7, 4)], 
    dtype=[('x', float), ('y', float), ('z', float), ('count', int)])

Access

You can access individual structures with the normal numpy indices, combined with the name if individual fields. To set values, use this syntax:

# set all the `x` values to 10 for all elements of the array
a['x'] = 10
# set all the fields of the first structure to 9
a[0] = 9
# set the 'x' field of the second structure to 100
a[1]['x'] = 100

The same rules apply for reading data out of the array:

# all the 'x' fields in the array
a['x']
# the first structure in the array
a[0]
# the 'x' field of the second structure in the array
a[1]['x']

Strings

Prior to numpy 2.0 you needed to define the length of strings in a structured array, and whether they were unicode or not:

a = numpy.empty((10,), dtype=[('asciistring', 'S8'), ('unicodestring', 'U5')])

Note that asciistring is limited to 8 characters and is accessed as a Python bytes object. unicodestring is limited to 5 characters and is accessed as a normal string.

Since numpy 2.0, you can use the new StringDType which is more flexible - you don’t have to define length and can handle UTF-8 encoded strings:

a = numpy.empty((10,), dtype=[('x', float), ('label', numpy.dtypes.StringDType)])
a['label'] = 'forest'

Numba

Numba can also access structured arrays:

from numba import njit

@njit
def iteratearray(a_struct):
    rows, = a_struct.shape
    totalcount = 0
    for i in range(rows):
        totalcount += a_struct[i]['count']
        
    return totalcount

a = numpy.array([(121.9, 97.1, 9.1, 5), (124.1, 98.0, 8.7, 4)], 
    dtype=[('x', float), ('y', float), ('z', float), ('count', int)])
    
print(iteratearray(a))

If you prefer, you can use the more C style “dot” notation from within numba:

for i in range(rows):
   totalcount += a_struct[i].count

Conclusion

Structured arrays are another handy numpy feature that can be combined with Numba to create fast application specific code. Structured arrays are often using with LiDAR data - for example Riegl Tools returns structured numpy arrays of TLS data.