2.4. Reduction Operations

Reduction is an operation that reduces certain dimension(s) of an input tensor, usually to scalar(s), e.g. np.sum in NumPy. Reductions are often straightforward to implement with for-loops. However, they are a little more complicated to implement in TVM since we cannot use Python for-loops directly. In this section, we will describe how to implement reductions in TVM.

import d2ltvm
import numpy as np
import tvm
from tvm import te

2.4.1. Sum

Let’s start by summing the rows of a 2-D matrix to reduce it to a 1-D vector. In NumPy, we can do it with the sum method.

a = np.random.normal(size=(3,4)).astype('float32')
a.sum(axis=1)
array([ 1.8948135, -2.4319794,  1.9638997], dtype=float32)

As we did before, let’s implement this operation from scratch to help understand the TVM expression.

def sum_rows(a, b):
    """a is an n-by-m 2-D matrix, b is an n-length 1-D vector
    """
    n = len(b)
    for i in range(n):
        b[i] = np.sum(a[i,:])

b = np.empty((3,), dtype='float32')
sum_rows(a, b)
b
array([ 1.8948135, -2.4319794,  1.9638997], dtype=float32)

It’s fairly straightforward: we first iterate over the first dimension, axis=0, and then sum all elements along the second dimension to compute each result. In NumPy, we can use : to slice all elements along a dimension.

Now let’s implement the same thing in TVM. Compared to the vector addition in Section 1.2, we use two new operators here. One is te.reduce_axis, which creates an axis for reduction with a range from 0 to m. It is functionally similar to the : used in sum_rows, but we need to explicitly specify the range in TVM. The other is te.sum, which sums all elements along the reduction axis j and returns a scalar.

n, m = te.var('n'), te.var('m')
A = te.placeholder((n, m), name='a')
j = te.reduce_axis((0, m), name='j')
B = te.compute((n,), lambda i: te.sum(A[i, j], axis=j), name='b')
s = te.create_schedule(B.op)
tvm.lower(s, [A, B], simple_mode=True)
produce b {
  for (i, 0, n) {
    b[(i*stride)] = 0f
    for (j, 0, m) {
      b[(i*stride)] = (b[(i*stride)] + a[((i*stride) + (j*stride))])
    }
  }
}

We can see that the generated pseudo codes expand te.sum into another for loop along axis j. As mentioned before, the pseudo codes are C-like: the 2-D index a[i, j] is flattened to a 1-D offset, conceptually (i*m)+j, by treating a as a 1-D array (since the shape is symbolic, the offsets are printed in terms of stride variables). Also note that b is initialized to all zeros before the summation.
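
To make the flattened indexing concrete, we can lower a fixed-shape variant of the same compute. This is an illustrative sketch of our own, not part of the original example; with a concrete 3-by-4 layout, the symbolic stride variables should become constant offsets such as (i*4) + j.

# Illustrative fixed-shape variant (our own): lowering with concrete shapes
# prints constant 1-D offsets instead of symbolic stride variables.
A_fixed = te.placeholder((3, 4), name='a')
j_fixed = te.reduce_axis((0, 4), name='j')
B_fixed = te.compute((3,), lambda i: te.sum(A_fixed[i, j_fixed], axis=j_fixed), name='b')
s_fixed = te.create_schedule(B_fixed.op)
tvm.lower(s_fixed, [A_fixed, B_fixed], simple_mode=True)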

Now let’s test that the results are as expected.

mod = tvm.build(s, [A, B])
c = tvm.nd.array(np.empty((3,), dtype='float32'))
mod(tvm.nd.array(a), c)
np.testing.assert_equal(b, c.asnumpy())

We know that a.sum() sums all elements in a and returns a scalar. Let’s also implement this in TVM. To do so, we need another reduction axis along the first dimension, whose size is n. The result is a scalar, namely a 0-rank tensor, which can be created with an empty shape tuple ().

i = te.reduce_axis((0, n), name='i')
B = te.compute((), lambda: te.sum(A[i, j], axis=(i, j)), name='b')
s = te.create_schedule(B.op)
tvm.lower(s, [A, B], simple_mode=True)
produce b {
  b[0] = 0f
  for (i, 0, n) {
    for (j, 0, m) {
      b[0] = (b[0] + a[((i*stride) + (j*stride))])
    }
  }
}

Let’s also verify the results.

mod = tvm.build(s, [A, B])
c = tvm.nd.array(np.empty((), dtype='float32'))
mod(tvm.nd.array(a), c)
np.testing.assert_allclose(a.sum(), c.asnumpy(), atol=1e-5)

In this case we use np.testing.assert_allclose instead of np.testing.assert_equal to verify the results, as float32 arithmetic may accumulate numerical error depending on the order of operations.
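
As a quick illustration of our own (not from the original notebook), float32 addition is not associative, so summing the same values in a different order typically yields slightly different results:

x = np.random.normal(size=1000).astype('float32')
# Python's built-in sum accumulates sequentially in float32, while NumPy's
# sum uses pairwise summation; the two orders usually differ slightly.
print(sum(x) - x.sum())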

Beyond te.sum, there are other reduction operators in TVM, such as te.min and te.max, which we can use to implement the corresponding reduction operations.
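
For instance, a row-wise minimum follows the same pattern as the row-wise sum above. The following is a small sketch of our own:

# A row-wise minimum, analogous to the row-wise sum: te.min reduces along
# the given axis, so b_min[i] holds the smallest element of row i.
k = te.reduce_axis((0, m), name='k')
B_min = te.compute((n,), lambda i: te.min(A[i, k], axis=k), name='b_min')
s_min = te.create_schedule(B_min.op)
tvm.lower(s_min, [A, B_min], simple_mode=True)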

2.4.2. Commutative Reduction

In mathematics, an operator \(\circ\) is commutative if \(a\circ b = b\circ a\). TVM allows us to define a customized commutative reduction operator through te.comm_reducer. It accepts two function arguments: one defines how to compute \(a\circ b\); the other specifies the initial value.

Let’s use the product by rows, e.g. a.prod(axis=1), as an example. Again, we first show how to implement it from scratch.

def prod_rows(a, b):
    """a is an n-by-m 2-D matrix, b is an n-length 1-D vector
    """
    n, m = a.shape
    for i in range(n):
        b[i] = 1
        for j in range(m):
            b[i] = b[i] * a[i, j]

As can be seen, we need to first initialize the return values to 1, and then compute the reduction using the scalar product *. Now let’s define these two functions in TVM to serve as the arguments of te.comm_reducer. As discussed, the first one defines \(a\circ b\) with two scalar inputs. The second one accepts a data type argument and returns the initial value of an element. Then we can create the reduction operator.

comp = lambda a, b: a * b
init = lambda dtype: tvm.tir.const(1, dtype=dtype)
product = te.comm_reducer(comp, init)

The usage of product is similar to te.sum. In fact, te.sum is a pre-defined reduction operator defined in the same way.
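
For instance, an equivalent sum reducer could be constructed as follows; this is a sketch of our own (te.sum itself is pre-defined by TVM):

# A sum reducer built with te.comm_reducer: combine with +, initialize to 0.
my_sum = te.comm_reducer(lambda a, b: a + b,
                         lambda dtype: tvm.tir.const(0, dtype=dtype))

Now let’s use product to compute the products by rows.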

n = te.var('n')
m = te.var('m')
A = te.placeholder((n, m), name='a')
k = te.reduce_axis((0, m), name='k')
B = te.compute((n,), lambda i: product(A[i, k], axis=k), name='b')
s = te.create_schedule(B.op)
tvm.lower(s, [A, B], simple_mode=True)
produce b {
  for (i, 0, n) {
    b[(i*stride)] = 1f
    for (k, 0, m) {
      b[(i*stride)] = (b[(i*stride)]*a[((i*stride) + (k*stride))])
    }
  }
}

The generated pseudo codes are similar to those for summing by rows, except for the initial value and the reduction arithmetic.

Again, let’s verify the results.

mod = tvm.build(s, [A, B])
b = tvm.nd.array(np.empty((3,), dtype='float32'))
mod(tvm.nd.array(a), b)
np.testing.assert_allclose(a.prod(axis=1), b.asnumpy(), atol=1e-5)

2.4.3. Summary

  • We can apply a reduction operator, e.g. te.sum, over a reduction axis created by te.reduce_axis.

  • We can implement customized commutative reduction operators with te.comm_reducer.