# 2.4. Reduction Operations¶

Reduction is an operation to reduce certain dimension(s) of an input
tensor, usually to scalar(s), e.g. `np.sum`

in NumPy. Reduction is
often straightforward to implement with for-loops. But it’s a little bit
more complicated in TVM since we cannot use a Python for-loop directly.
In this section, we will describe how to implement reduction in TVM.

```
import d2ltvm
import numpy as np
import tvm
from tvm import te
```

## 2.4.1. Sum¶

Let’s start with summing the rows of a 2-D matrix to reduce it to be a
1-D vector. In NumPy, we can do it with the `sum`

method.

```
a = np.random.normal(size=(3,4)).astype('float32')
a.sum(axis=1)
```

```
array([ 1.8948135, -2.4319794, 1.9638997], dtype=float32)
```

As we did before, let’s implement this operation from scratch to help understand the TVM expression.

```
def sum_rows(a, b):
"""a is an n-by-m 2-D matrix, b is an n-length 1-D vector
"""
n = len(b)
for i in range(n):
b[i] = np.sum(a[i,:])
b = np.empty((3,), dtype='float32')
sum_rows(a, b)
b
```

```
array([ 1.8948135, -2.4319794, 1.9638997], dtype=float32)
```

It’s fairly straightforward, we first iterate on the first dimension,
`axis=0`

, and then sum all elements on the second dimension to write
the results. In NumPy, we can use `:`

to slice all elements along that
dimension.

Now let’s implement the same thing in TVM. Comparing to the vector
addition in Section 1.2, we used two new operators here.
One is `tvm.reduce_axis`

, which create an axis for reduction with
range from 0 to `m`

. It’s functionally similar to the `:`

used in
`sum_rows`

, but we need to explicitly specify the range in TVM. The
other one is `tvm.sum`

, which sums all elements along the reducing
axis `k`

and returns a scalar.

```
n, m = te.var('n'), te.var('m')
A = te.placeholder((n, m), name='a')
j = te.reduce_axis((0, m), name='j')
B = te.compute((n,), lambda i: te.sum(A[i, j], axis=j), name='b')
s = te.create_schedule(B.op)
tvm.lower(s, [A, B], simple_mode=True)
```

```
produce b {
for (i, 0, n) {
b[(i*stride)] = 0f
for (j, 0, m) {
b[(i*stride)] = (b[(i*stride)] + a[((i*stride) + (j*stride))])
}
}
}
```

We can see that the generated pseudo codes expand `tvm.sum`

into
another for loop along axis `k`

. As mentioned before, the pseudo codes
are C-like, so the index of `a[i,j]`

is expanded to `(i*m)+j`

by
treating `a`

as a 1-D array. Also note that `b`

is initialized to be
all-zero before summation.

Now test the results are as expected.

```
mod = tvm.build(s, [A, B])
c = tvm.nd.array(np.empty((3,), dtype='float32'))
mod(tvm.nd.array(a), c)
np.testing.assert_equal(b, c.asnumpy())
```

We know that `a.sum()`

will sum all elements in `a`

and returns a
scalar. Let’s also implement this in TVM. To do it, we need another
reduction axis along the first dimension, whose size is `n`

. The
result is a scalar, namely a 0-rank tensor, can be created with an empty
tuple `()`

.

```
i = te.reduce_axis((0, n), name='i')
B = te.compute((), lambda: te.sum(A[i, j], axis=(i, j)), name='b')
s = te.create_schedule(B.op)
tvm.lower(s, [A, B], simple_mode=True)
```

```
produce b {
b[0] = 0f
for (i, 0, n) {
for (j, 0, m) {
b[0] = (b[0] + a[((i*stride) + (j*stride))])
}
}
}
```

Let’s also verify the results.

```
mod = tvm.build(s, [A, B])
c = tvm.nd.array(np.empty((), dtype='float32'))
mod(tvm.nd.array(a), c)
np.testing.assert_allclose(a.sum(), c.asnumpy(), atol=1e-5)
```

In this case we use `np.testing.assert_allclose`

instead of
`np.testing.assert_equal`

to verify the results as the calculation on
`float32`

numbers may differ due to the numerical error.

Beyond `tvm.sum`

, there are other reduction operators in TVM such as
`tvm.min`

and `tvm.max`

. We can also use them to implement the
corresponding reduction operations as well.

## 2.4.2. Commutative Reduction¶

In mathematics, an operator \(\circ\) is commutative if
\(a\circ b = b\circ a\). TVM allows to define a customized
commutative reduction operator through `tvm.comm_reducer`

. It accepts
two function arguments, one defines how to compute \(a\circ b\), the
other one specifies the initial value.

Let’s use the production by rows, e.g `a.prod(axis=1)`

, as an example.
Again, we first show how to implement it from scratch.

```
def prod_rows(a, b):
"""a is an n-by-m 2-D matrix, b is an n-length 1-D vector
"""
n, m = a.shape
for i in range(n):
b[i] = 1
for j in range(m):
b[i] = b[i] * a[i, j]
```

As can be seen, we need to first initialize the return values to be 1,
and then compute the reduction using scalar product `*`

. Now let’s
define these two functions in TVM to serve as the arguments of
`te.comm_reducer`

. As discussed, the first one defines
\(a\circ b\) with two scalar inputs. The second one accepts a data
type argument to return the initial value of an element. Then we can
create the reduction operator.

```
comp = lambda a, b: a * b
init = lambda dtype: tvm.tir.const(1, dtype=dtype)
product = te.comm_reducer(comp, init)
```

The usage of `product`

is similar to `te.sum`

. Actually, `te.sum`

is a pre-defined reduction operator using the same way.

```
n = te.var('n')
m = te.var('m')
A = te.placeholder((n, m), name='a')
k = te.reduce_axis((0, m), name='k')
B = te.compute((n,), lambda i: product(A[i, k], axis=k), name='b')
s = te.create_schedule(B.op)
tvm.lower(s, [A, B], simple_mode=True)
```

```
produce b {
for (i, 0, n) {
b[(i*stride)] = 1f
for (k, 0, m) {
b[(i*stride)] = (b[(i*stride)]*a[((i*stride) + (k*stride))])
}
}
}
```

The generated pseudo codes are similar to the one for summing by rows, except for the initialized value and the reduction arithmetic.

Again, let’s verify the results.

```
mod = tvm.build(s, [A, B])
b = tvm.nd.array(np.empty((3,), dtype='float32'))
mod(tvm.nd.array(a), b)
np.testing.assert_allclose(a.prod(axis=1), b.asnumpy(), atol=1e-5)
```

## 2.4.3. Summary¶

We can apply a reduction operator, e.g.

`te.sum`

over a reduction axis`te.reduce_axis`

.We can implement customized commutative reduction operators by

`te.comm_reducer`

.