# 1.2. Vector Add

Now that you have installed all the libraries, let's write our first program: summing two `n`-dimensional vectors `a` and `b`. This is straightforward in NumPy, where we can write `c = a + b`.

## 1.2.1. Implementing with NumPy

```
import numpy as np
np.random.seed(0)
n = 100
a = np.random.normal(size=n).astype(np.float32)
b = np.random.normal(size=n).astype(np.float32)
c = a + b
```

Here we create two random vectors of length 100 and sum them element-wise. Note that NumPy uses 64-bit floating-point numbers or 64-bit integers by default, which differs from the 32-bit floating-point format typically used in deep learning, so we explicitly cast the data type.
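To see the default data type and the effect of the cast, the following minimal sketch inspects the `dtype` of a freshly sampled array before and after `astype`. The variable names `x` and `x32` are only for illustration.

```python
import numpy as np

x = np.random.normal(size=4)      # sampled with NumPy's default float type
x32 = x.astype(np.float32)        # explicit cast, as in the snippet above

print(x.dtype, x.itemsize)        # 64-bit float, 8 bytes per element
print(x32.dtype, x32.itemsize)    # 32-bit float, 4 bytes per element

# Mixing a float64 array with a float32 array promotes the result to float64.
mixed = x + x32
print(mixed.dtype)
```

Because mixing the two types promotes the result back to 64 bits, we cast both inputs so that the whole computation stays in `float32`.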

Although we can use the built-in `+` operator in NumPy to perform element-wise addition, let's try to implement it using only scalar operators. It will help us understand the implementation with TVM. The following function uses a for-loop to iterate over every element of the vectors, adding two elements together with the scalar `+` operator each time.

```
def vector_add(a, b, c):
    for i in range(n):
        c[i] = a[i] + b[i]

d = np.empty(shape=n, dtype=np.float32)
vector_add(a, b, d)
np.testing.assert_array_equal(c, d)
```

Since the following chapters frequently need two random ndarrays and an empty one to store the result, we save this routine for reuse.

```
# Save to the d2ltvm package.
def get_abc(shape, constructor=None):
    """Return random a, b and empty c with the same shape.
    """
    np.random.seed(0)
    a = np.random.normal(size=shape).astype(np.float32)
    b = np.random.normal(size=shape).astype(np.float32)
    c = np.empty_like(a)
    if constructor:
        a, b, c = [constructor(x) for x in (a, b, c)]
    return a, b, c
```

Note that we fix the random seed so that we always get the same results, which facilitates comparison between NumPy, TVM, and others. In addition, the function accepts an optional `constructor` to convert the data into a different format.
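As a quick check of both properties, the sketch below calls `get_abc` twice and compares the inputs, then passes a constructor. The routine is repeated so that the snippet runs on its own, and the list-converting lambda is a hypothetical constructor chosen only for illustration.

```python
import numpy as np

def get_abc(shape, constructor=None):
    """Same routine as above, repeated so this snippet is self-contained."""
    np.random.seed(0)
    a = np.random.normal(size=shape).astype(np.float32)
    b = np.random.normal(size=shape).astype(np.float32)
    c = np.empty_like(a)
    if constructor:
        a, b, c = [constructor(x) for x in (a, b, c)]
    return a, b, c

# Two calls return identical inputs because the seed is reset on every call.
a1, b1, _ = get_abc(5)
a2, b2, _ = get_abc(5)
np.testing.assert_array_equal(a1, a2)
np.testing.assert_array_equal(b1, b2)

# Hypothetical constructor: convert each array to a plain Python list.
a_list, b_list, c_list = get_abc(5, lambda x: x.tolist())
print(type(a_list), len(a_list))
```

In later sections the constructor will typically be a function that converts a NumPy ndarray into the array type of another framework.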

## 1.2.2. Defining the TVM Computation

Now let's implement `vector_add` in TVM. The TVM implementation differs from the one above in two ways:

1. We don't need to write the complete function, only to specify how each element of the output, i.e. `c[i]`, is computed.
2. TVM is symbolic: we create symbolic variables by specifying their shapes, and define how the program will be computed.

In the following program, we first declare the placeholders `A` and `B` for both inputs by specifying their shapes, `(n,)`, through `tvm.te.placeholder`. Both `A` and `B` are `Tensor` objects, which we can feed with data later. We assign names to them so we can print an easy-to-read program later.

Next we define how the output `C` is computed with `te.compute`. It accepts two arguments: the output shape, and a function that computes each element given its index. Since the output is a vector, its elements are indexed by a single integer. The lambda function passed to `te.compute` accepts a single argument `i` and returns the expression for `c[i]`, which is identical to `c[i] = a[i] + b[i]` defined in `vector_add`. One difference is that we don't write the for-loop, which will be filled in by TVM later.

```
import tvm
from tvm import te # te stands for tensor expression
# Save to the d2ltvm package.
def vector_add(n):
    """TVM expression for vector add"""
    A = te.placeholder((n,), name='a')
    B = te.placeholder((n,), name='b')
    C = te.compute(A.shape, lambda i: A[i] + B[i], name='c')
    return A, B, C

A, B, C = vector_add(n)
type(A), type(C)
```

```
(tvm.te.tensor.Tensor, tvm.te.tensor.Tensor)
```

We can see that `A`, `B`, and `C` are all `Tensor` objects, which can be viewed as a symbolic version of NumPy's ndarray. We can access attributes such as the data type and shape, but the tensors don't hold concrete values yet.

```
(A.dtype, A.shape), (C.dtype, C.shape)
```

```
(('float32', [100]), ('float32', [100]))
```

The operation that generates the tensor object can be accessed by `A.op`.

```
type(A.op), type(C.op)
```

```
(tvm.te.tensor.PlaceholderOp, tvm.te.tensor.ComputeOp)
```

We can see that the types of the operations for `A` and `C` are different, but they share the same base class `Operation`, which represents an operation that generates a tensor object.

```
A.op.__class__.__bases__[0]
```

```
tvm.te.tensor.Operation
```
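As a rough mental model of what the compute declaration leaves out, the following toy helper takes an output shape and a per-element function and fills in the loop itself. Note that `toy_compute` is a hypothetical name and not part of TVM: it runs eagerly on NumPy arrays and handles only 1-D outputs, whereas TVM merely records the expression and generates the loop later.

```python
import numpy as np

def toy_compute(shape, fcompute):
    """Toy, eager analogue of a compute declaration for 1-D outputs.

    Hypothetical helper for illustration only; not a TVM API.
    """
    out = np.empty(shape, dtype=np.float32)
    for i in range(shape[0]):   # the loop the caller never writes
        out[i] = fcompute(i)    # per-element rule supplied by the caller
    return out

a = np.arange(4, dtype=np.float32)
b = np.ones(4, dtype=np.float32)

# The caller only states how element i of the output is obtained.
c = toy_compute(a.shape, lambda i: a[i] + b[i])
np.testing.assert_array_equal(c, a + b)
```

The key difference is timing: here the lambda is evaluated immediately on concrete data, while the TVM tensors above stay symbolic until a module is compiled and called.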

## 1.2.3. Creating a Schedule

To run the computation, we need to specify how to execute the program, for example, the order in which data is accessed and how to parallelize across multiple threads. Such an execution plan is called a *schedule*. Since `C` is the output tensor, let's create a default schedule on its operation; we will print the resulting pseudocode shortly.

```
s = te.create_schedule(C.op)
```

A schedule consists of several stages. Each stage corresponds to an operation and describes how it is scheduled. We can access a particular stage by either `s[C]` or `s[C.op]`.

```
type(s), type(s[C])
```

```
(tvm.te.schedule.Schedule, tvm.te.schedule.Stage)
```

Later on we will see how to change the execution plan to make better use of the hardware resources and improve efficiency. For now, let's look at the default execution plan by printing the C-like pseudocode.

```
tvm.lower(s, [A, B, C], simple_mode=True)
```

```
produce c {
  for (i, 0, 100) {
    c[i] = (a[i] + b[i])
  }
}
```

The `tvm.lower` function accepts the schedule together with the input and output tensors. Setting `simple_mode=True` prints the program in a simple and compact form. Note that TVM has added the proper for-loop according to the output shape. Overall, it's quite similar to the previous function `vector_add`.

Now you see that TVM separates the computation from the schedule. The computation defines how the results are computed, which does not change no matter what hardware platform you run the program on. An efficient schedule, on the other hand, is often hardware dependent, but changing a schedule does not affect correctness. TVM inherits the idea of separating computation from schedule from Halide [Ragan-Kelley et al., 2013].
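To illustrate why changing the execution plan leaves the result unchanged, the sketch below runs the same element-wise rule in two different loop orders. It is plain Python rather than TVM's scheduling API, and the names `add_plain` and `add_tiled` as well as the chunk size are assumptions made for illustration.

```python
def add_plain(a, b, c):
    """Default plan: a single loop over all elements."""
    for i in range(len(c)):
        c[i] = a[i] + b[i]

def add_tiled(a, b, c, factor=4):
    """Same computation, visited in chunks of `factor` elements."""
    n = len(c)
    for outer in range((n + factor - 1) // factor):
        for inner in range(factor):
            i = outer * factor + inner
            if i < n:  # guard the last, possibly partial chunk
                c[i] = a[i] + b[i]

a = [float(i) for i in range(10)]
b = [2.0 * i for i in range(10)]
c1, c2 = [0.0] * 10, [0.0] * 10

add_plain(a, b, c1)
add_tiled(a, b, c2)
assert c1 == c2  # a different execution plan, the same result
```

Both functions share the rule `c[i] = a[i] + b[i]`; only the order in which indices are visited differs, which is the part a schedule controls.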

## 1.2.4. Compilation and Execution

Once both the computation and the schedule are defined, we can compile them into an executable module with `tvm.build`. It accepts the same arguments as `tvm.lower`. In fact, it first calls `tvm.lower` to generate the program and then compiles it to machine code.

```
mod = tvm.build(s, [A, B, C])
type(mod)
```

```
tvm.runtime.module.Module
```

It returns an executable module object. Now we can feed data for `A`, `B`, and `C` to run it. The tensor data must be `tvm.runtime.ndarray.NDArray` objects. The easiest way is to create NumPy ndarray objects first and then convert them into TVM ndarrays with `tvm.nd.array`. We can convert them back to NumPy with the `asnumpy` method.

```
x = np.ones(2)
y = tvm.nd.array(x)
type(y), y.asnumpy()
```

```
(tvm.runtime.ndarray.NDArray, array([1., 1.]))
```

Now let’s construct data and return them as TVM ndarrays.

```
a, b, c = get_abc(100, tvm.nd.array)
```

Do the computation, and verify the results.

```
mod(a, b, c)
np.testing.assert_array_equal(a.asnumpy() + b.asnumpy(), c.asnumpy())
```

## 1.2.5. Argument Constraints

Remember that we specified both inputs to be 100-length vectors when declaring `A` and `B`.

```
A.shape, B.shape, C.shape
```

```
([100], [100], [100])
```

TVM will check if the input shapes satisfy this specification.

```
try:
    a, b, c = get_abc(200, tvm.nd.array)
    mod(a, b, c)
except tvm.TVMError as e:
    print(e)
```

```
Traceback (most recent call last):
[bt] (1) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(TVMFuncCall+0x61) [0x7f8eec7a60f1]
[bt] (0) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(+0xcab5a1) [0x7f8eec78d5a1]
File "/home/ubuntu/tvm/src/runtime/library_module.cc", line 89
TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (100 == int32(arg0.shape[0])), Argument arg0.shape[0] has an unsatisfied constraint
```

The default data type in TVM is `float32`.

```
A.dtype, B.dtype, C.dtype
```

```
('float32', 'float32', 'float32')
```

An error is raised if an input has a different data type.

```
try:
    a, b, c = get_abc(100, tvm.nd.array)
    a = tvm.nd.array(a.asnumpy().astype('float64'))
    mod(a, b, c)
except tvm.TVMError as e:
    print(e)
```

```
Traceback (most recent call last):
[bt] (1) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(TVMFuncCall+0x61) [0x7f8eec7a60f1]
[bt] (0) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(+0xcab5a1) [0x7f8eec78d5a1]
File "/home/ubuntu/tvm/src/runtime/library_module.cc", line 89
TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (((tvm_struct_get(arg0, 0, 5) == (uint8)2) && (tvm_struct_get(arg0, 0, 6) == (uint8)32)) && (tvm_struct_get(arg0, 0, 7) == (uint16)1)), arg0.dtype is expected to be float32
```
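The two error messages above suggest that the compiled module asserts the shape and data type of each argument before running. A rough NumPy analogue of such checks might look as follows; `check_args` is a hypothetical helper, and its exception types and messages are our own, not TVM's.

```python
import numpy as np

def check_args(args, shape=(100,), dtype=np.float32):
    """Hypothetical stand-in for the argument checks described above."""
    for k, arg in enumerate(args):
        if arg.shape != shape:
            raise ValueError(f"arg{k}.shape is {arg.shape}, expected {shape}")
        if arg.dtype != dtype:
            raise TypeError(
                f"arg{k}.dtype is {arg.dtype}, expected {np.dtype(dtype)}")

good = [np.zeros(100, dtype=np.float32) for _ in range(3)]
check_args(good)  # passes silently

# Wrong length: rejected before any computation would happen.
bad_shape = [np.zeros(200, dtype=np.float32) for _ in range(3)]
try:
    check_args(bad_shape)
except ValueError as e:
    print(e)

# Wrong data type for the first argument only.
bad_dtype = [good[0].astype(np.float64), good[1], good[2]]
try:
    check_args(bad_dtype)
except TypeError as e:
    print(e)
```

Failing early in this way means a mismatched buffer is never read or written with the wrong layout.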

## 1.2.6. Saving and Loading a Module

A compiled module can be saved to disk,

```
mod_fname = 'vector-add.tar'
mod.export_library(mod_fname)
```

and then loaded back later.

```
loaded_mod = tvm.runtime.load_module(mod_fname)
```

Verify the results.

```
a, b, c = get_abc(100, tvm.nd.array)
loaded_mod(a, b, c)
np.testing.assert_array_equal(a.asnumpy() + b.asnumpy(), c.asnumpy())
```

## 1.2.7. Summary

Implementing an operator using TVM has three steps:

1. Declare the computation by specifying input and output shapes and how each output element is computed.
2. Create a schedule to (hopefully) fully utilize the machine resources.
3. Compile to the hardware target.

In addition, we can save the compiled module to disk so that we can load it back later.