A broadcast operator processes two tensors of different shapes. Normally, one of the operands has size 1 along a particular dimension, and it is broadcast along the corresponding dimension of the other operand to perform the given calculation. Common scalar calculations, such as elementary arithmetic and logical operations, can all be broadcast. Fig. 3.1.1 illustrates a broadcast add between two 2-dimensional tensors. Broadcast operators are common in deep learning workloads, e.g. batch normalization. In this section we demonstrate how to perform a broadcast add between two 2-dimensional tensors. The following code defines the computation.

import numpy as np
import tvm
from tvm import te

# Save to the d2ltvm package.
def broadcast_add(shape1, shape2):
    """Broadcast add between two 2-dimensional tensors

    shape1, shape2 : the shapes of the input tensors
    """
    assert len(shape1) == 2 and len(shape2) == 2, \
        "broadcast tensors should both be 2-dimension"
    for i in range(len(shape1)):
        assert shape1[i] == shape2[i] or shape1[i] == 1 or shape2[i] == 1, \
            "tensor shapes do not fit for broadcasting"
    A = te.placeholder(shape1, name='A')
    B = te.placeholder(shape2, name='B')
    # Output extent per dimension: whichever input extent is not 1.
    m = shape1[0] if shape2[0] == 1 else shape2[0]
    n = shape1[1] if shape2[1] == 1 else shape2[1]
    # A dimension of size 1 is always read at index 0.
    f = lambda x, y: A[0 if shape1[0]==1 else x, 0 if shape1[1]==1 else y] + \
        B[0 if shape2[0]==1 else x, 0 if shape2[1]==1 else y]
    C = te.compute((m, n), f, name='C')
    return A, B, C
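
The index selection inside f may be easier to follow in plain Python. The sketch below is not part of the d2ltvm package: broadcast_add_ref is a hypothetical helper that applies the same rule to nested lists, assuming well-formed 2-dimensional inputs. A dimension of size 1 is always read at index 0, and any other dimension is read at the loop index.

```python
def broadcast_add_ref(a, b):
    """Reference broadcast add on 2-D nested lists (illustrative only)."""
    shape1 = (len(a), len(a[0]))
    shape2 = (len(b), len(b[0]))
    for d1, d2 in zip(shape1, shape2):
        assert d1 == d2 or d1 == 1 or d2 == 1, \
            "tensor shapes do not fit for broadcasting"
    # Output extent per dimension: whichever input extent is not 1.
    m = shape1[0] if shape2[0] == 1 else shape2[0]
    n = shape1[1] if shape2[1] == 1 else shape2[1]
    c = [[0.0] * n for _ in range(m)]
    for x in range(m):
        for y in range(n):
            # Pin size-1 dimensions to index 0, otherwise follow the loop.
            ax = 0 if shape1[0] == 1 else x
            ay = 0 if shape1[1] == 1 else y
            bx = 0 if shape2[0] == 1 else x
            by = 0 if shape2[1] == 1 else y
            c[x][y] = a[ax][ay] + b[bx][by]
    return c

a = [[10.0], [20.0], [30.0]]                                   # shape (3, 1)
b = [[float(4 * i + j) for j in range(4)] for i in range(3)]   # shape (3, 4)
c = broadcast_add_ref(a, b)
print(c[0])   # [10.0, 11.0, 12.0, 13.0]
print(c[2])   # [38.0, 39.0, 40.0, 41.0]
```

Each row of b is shifted by the single value in the matching row of a, which is the pattern the TVM compute definition expresses through its conditional indices.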


Then we use it to perform the broadcast add illustrated in Fig. 3.1.1.

m = 3
n = 4
shape1 = (m, 1)
shape2 = (m, n)
A, B, C = broadcast_add(shape1, shape2)
s = te.create_schedule(C.op)
print(tvm.lower(s, [A, B], simple_mode=True))
mod = tvm.build(s, [A, B, C])

// attr [C] storage_scope = "global"
allocate C[float32 * 12]
produce C {
  for (x, 0, 3) {
    for (y, 0, 4) {
      C[((x*4) + y)] = (A[x] + B[((x*4) + y)])
    }
  }
}


The printed pseudocode clearly depicts the process of a broadcast add. We verify the results as follows.

# Save to the d2ltvm package.
def get_bcast_data(shape1, shape2, constructor=None):
    """Return random tensors a, b
    and empty tensor c to store broadcast results between a and b

    shape1, shape2: shapes of input tensors
    constructor : user-defined tensor constructor
    """
    np.random.seed(0)
    a = np.random.normal(size=shape1).astype("float32")
    b = np.random.normal(size=shape2).astype("float32")
    out_shape = (shape1[0] if shape2[0] == 1 else shape2[0],
                 shape1[1] if shape2[1] == 1 else shape2[1])
    c = np.empty(out_shape, dtype='float32')
    if constructor:
        a, b, c = [constructor(x) for x in (a, b, c)]
    return a, b, c

a, b, c = get_bcast_data(shape1, shape2, tvm.nd.array)
mod(a, b, c)
# Compare the TVM result against NumPy's own broadcast add.
np.testing.assert_allclose(np.add(a.asnumpy(), b.asnumpy()), c.asnumpy())


Note that broadcasting can be performed along multiple dimensions at once.

shape1 = (m, 1)
shape2 = (1, n)
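A, B, C = broadcast_add(shape1, shape2)
s = te.create_schedule(C.op)
mod = tvm.build(s, [A, B, C])
a, b, c = get_bcast_data(shape1, shape2, tvm.nd.array)
mod(a, b, c)
np.testing.assert_allclose(np.add(a.asnumpy(), b.asnumpy()), c.asnumpy())
print(a.shape, b.shape, c.shape)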

(3, 1) (1, 4) (3, 4)
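
The output shape printed above follows a simple per-dimension rule. As a small sketch (the helper name bcast_shape is hypothetical, and it assumes both shapes have the same number of dimensions, as broadcast_add does), the rule can be written in plain Python:

```python
def bcast_shape(shape1, shape2):
    """Return the broadcast output shape of two same-rank shapes."""
    assert len(shape1) == len(shape2), "shapes must have the same rank"
    out = []
    for d1, d2 in zip(shape1, shape2):
        assert d1 == d2 or d1 == 1 or d2 == 1, \
            "tensor shapes do not fit for broadcasting"
        # Keep whichever extent is not 1 (they are equal otherwise).
        out.append(d1 if d2 == 1 else d2)
    return tuple(out)

print(bcast_shape((3, 1), (1, 4)))   # (3, 4)
print(bcast_shape((3, 1), (3, 4)))   # (3, 4)
print(bcast_shape((3, 4), (3, 4)))   # (3, 4)
```

All three input pairs used in this section map to the same (3, 4) output; only the way each input is indexed differs.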


Lastly, note that when the shapes of the two input tensors are identical, the broadcast add reduces to an element-wise add.

## 3.1.1. Summary

• We can define a broadcast operator in TVM.

• Broadcast can be performed along multiple dimensions.

## 3.1.2. Exercise

• Generalize broadcast_add defined above to more dimensions and more operators.
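
As one possible starting point for thinking about this generalization (not a TVM implementation and not the book's solution), the plain-Python sketch below applies an arbitrary binary operator to two flat, row-major lists of any equal rank. The names broadcast_op and flat_index are hypothetical. The offset computation pins size-1 dimensions to index 0, which for shapes (3, 1) and (3, 4) gives the same A[x] and B[x*4 + y] offsets seen in the lowered code earlier in this section.

```python
import itertools
import operator

def broadcast_op(a, shape1, b, shape2, op=operator.add):
    """Apply `op` with broadcasting to flat row-major lists a and b."""
    assert len(shape1) == len(shape2), "shapes must have the same rank"
    out_shape = []
    for d1, d2 in zip(shape1, shape2):
        assert d1 == d2 or d1 == 1 or d2 == 1, \
            "tensor shapes do not fit for broadcasting"
        out_shape.append(d1 if d2 == 1 else d2)

    def flat_index(idx, shape):
        # Row-major offset; size-1 dimensions are pinned to index 0.
        off = 0
        for i, d in zip(idx, shape):
            off = off * d + (0 if d == 1 else i)
        return off

    # Visit output indices in row-major order.
    out = [op(a[flat_index(idx, shape1)], b[flat_index(idx, shape2)])
           for idx in itertools.product(*(range(d) for d in out_shape))]
    return out, tuple(out_shape)

# 3-dimensional add: shapes (2, 1, 2) and (1, 3, 1) give (2, 3, 2).
out, shape = broadcast_op([1, 2, 3, 4], (2, 1, 2), [10, 20, 30], (1, 3, 1))
print(shape)   # (2, 3, 2)
print(out)     # [11, 12, 21, 22, 31, 32, 13, 14, 23, 24, 33, 34]

# A different operator on 2-dimensional inputs.
prod, _ = broadcast_op([1, 2, 3], (3, 1), [10, 20], (1, 2), op=operator.mul)
print(prod)    # [10, 20, 20, 40, 30, 60]
```

In TVM the same idea would have to be expressed through the index function passed to te.compute; how best to build that function for an arbitrary rank is left to the exercise.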