1.2. Vector Add

Now that you have installed all the libraries, let’s write our first program: summing two n-dimensional vectors a and b. This is straightforward in NumPy, where we can simply write c = a + b.

1.2.1. Implementing with NumPy

import numpy as np

n = 100
a = np.random.normal(size=n).astype(np.float32)
b = np.random.normal(size=n).astype(np.float32)
c = a + b

Here we create two random vectors of length 100 and sum them element-wise. Note that NumPy uses 64-bit floating-point numbers or 64-bit integers by default, which differs from the 32-bit floating-point numbers typically used in deep learning, so we explicitly cast the data type.
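
To make the default precision concrete, the short check below inspects the data type before and after the cast. The names x64 and x32 are illustrative only and do not appear elsewhere in this chapter.

```python
import numpy as np

x64 = np.random.normal(size=4)     # NumPy's default floating-point type
x32 = x64.astype(np.float32)       # explicit cast, as done for a and b

print(x64.dtype, x32.dtype)        # float64 float32
print(x64.itemsize, x32.itemsize)  # 8 4 (bytes per element)
```

The cast halves the memory needed per element, at the cost of reduced precision.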

Although we can use NumPy’s built-in + operator for element-wise addition, let’s try to implement it using only scalar operators. This will help us understand the TVM implementation. The following function uses a for-loop to iterate over every element of the vectors and adds two elements together with the scalar + operator each time.

def vector_add(a, b, c):
    for i in range(n):
        c[i] = a[i] + b[i]

d = np.empty(shape=n, dtype=np.float32)
vector_add(a, b, d)
np.testing.assert_array_equal(c, d)

Given that we will frequently create two random ndarrays and an empty one to store the results in the following chapters, we save this routine for future reuse.

# Save to the d2ltvm package.
def get_abc(shape, constructor=None):
    """Return random a, b and empty c with the same shape."""
    np.random.seed(0)  # fix the seed so that results are reproducible
    a = np.random.normal(size=shape).astype(np.float32)
    b = np.random.normal(size=shape).astype(np.float32)
    c = np.empty_like(a)
    if constructor:
        # optionally convert all three arrays, e.g. with tvm.nd.array
        a, b, c = [constructor(x) for x in (a, b, c)]
    return a, b, c

Note that we fix the random seed so that we always get the same results, which facilitates comparison between NumPy, TVM and others. In addition, the function accepts an optional constructor to convert the data into a different format.
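
The two properties just described can be checked with a small sketch. It repeats the helper so that the block runs on its own, and it assumes the seed is fixed inside the function as the text states. The list conversion is only a stand-in for a framework-specific constructor such as tvm.nd.array.

```python
import numpy as np

def get_abc(shape, constructor=None):
    """Copy of the helper above, with the seed fixed as the text describes."""
    np.random.seed(0)
    a = np.random.normal(size=shape).astype(np.float32)
    b = np.random.normal(size=shape).astype(np.float32)
    c = np.empty_like(a)
    if constructor:
        a, b, c = [constructor(x) for x in (a, b, c)]
    return a, b, c

# Two calls return identical data because the seed is reset each time.
a1, b1, _ = get_abc(5)
a2, b2, _ = get_abc(5)
np.testing.assert_array_equal(a1, a2)
np.testing.assert_array_equal(b1, b2)

# Any callable can act as the constructor; here each array becomes a list.
a3, b3, c3 = get_abc(3, constructor=lambda x: x.tolist())
print(type(a3), len(c3))  # <class 'list'> 3
```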

1.2.2. Defining the TVM Computation

Now let’s implement vector_add in TVM. The TVM implementation differs from above in two ways:

  1. We don’t need to write the complete function, but only to specify how each element of the output, i.e. c[i], is computed.

  2. TVM is symbolic: we create symbolic variables by specifying their shapes, and define how the program will be computed.

In the following program, we first declare the placeholders A and B for both inputs by specifying their shapes, (n,), through tvm.te.placeholder. Both A and B are Tensor objects, to which we can feed data later. We assign names to them so we can print an easy-to-read program later.

Next we define how the output C is computed with te.compute. It accepts two main arguments: the output shape, and a function that computes each element given its index. Since the output is a vector, its elements are indexed by integers. The lambda function passed to te.compute accepts a single argument i and returns c[i], which is identical to c[i] = a[i] + b[i] defined in vector_add. One difference is that we don’t write the for-loop, which will be filled in by TVM later.

import tvm
from tvm import te # te stands for tensor expression

# Save to the d2ltvm package.
def vector_add(n):
    """TVM expression for vector add"""
    A = te.placeholder((n,), name='a')
    B = te.placeholder((n,), name='b')
    C = te.compute(A.shape, lambda i: A[i] + B[i], name='c')
    return A, B, C

A, B, C = vector_add(n)
type(A), type(C)
(tvm.te.tensor.Tensor, tvm.te.tensor.Tensor)

We can see that A, B, and C are all Tensor objects, which can be viewed as a symbolic version of NumPy’s ndarray. We can access the variables’ attributes such as data type and shape, but the tensors don’t hold concrete values yet.

(A.dtype, A.shape), (C.dtype, C.shape)
(('float32', [100]), ('float32', [100]))

The operation that generates the tensor object can be accessed by A.op.

type(A.op), type(C.op)
(tvm.te.tensor.PlaceholderOp, tvm.te.tensor.ComputeOp)

We can see that the types of the operations for A and C are different, but they share the same base class Operation, which represents an operation that generates a tensor object.


1.2.3. Creating a Schedule

To run the computation, we need to specify how to execute the program, for example, the order in which data is accessed and how the work is parallelized across threads. Such an execution plan is called a schedule. Since C is the output tensor, let’s create a default schedule on its operator and print the pseudocode.

s = te.create_schedule(C.op)

A schedule consists of several stages. Each stage corresponds to an operation and describes how that operation is scheduled. We can access a particular stage by either s[C] or s[C.op].

type(s), type(s[C])
(tvm.te.schedule.Schedule, tvm.te.schedule.Stage)

Later on we will see how to change the execution plan to make better use of the hardware resources and improve efficiency. Here let’s look at the default execution plan by printing the C-like pseudocode.

tvm.lower(s, [A, B, C], simple_mode=True)
produce c {
  for (i, 0, 100) {
    c[i] = (a[i] + b[i])
  }
}

The lower method accepts the schedule and the input and output tensors. Setting simple_mode=True prints the program in a simple and compact way. Note that the program has added the proper for-loop according to the output shape. Overall, it’s quite similar to the previous function vector_add.
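
The way a per-element rule plus an output shape turns into a loop can be sketched in plain Python. The compute function below is a hypothetical stand-in, not part of TVM, and it evaluates eagerly on Python lists, whereas TVM builds a symbolic program first.

```python
def compute(shape, fcompute):
    """Hypothetical 1-D stand-in for te.compute followed by lowering:
    wraps the per-element rule in the loop that the user did not write."""
    (n,) = shape
    out = [None] * n
    for i in range(n):        # loop derived from the output shape
        out[i] = fcompute(i)  # per-element rule supplied by the user
    return out

a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]
c = compute((len(a),), lambda i: a[i] + b[i])
print(c)  # [11.0, 22.0, 33.0]
```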

Now you see that TVM separates the computation from the schedule. The computation defines how the results are computed, which does not change regardless of the hardware platform on which you run the program. An efficient schedule, on the other hand, is often hardware dependent, but changing a schedule does not affect correctness. TVM inherits the idea of separating computation from schedule from Halide [Ragan-Kelley et al., 2013].
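
As a rough illustration of this separation, the sketch below runs the same computation under two hand-written execution plans and checks that the results agree. The function names and the split factor are assumptions made for this example; in TVM such a plan would be expressed through the schedule rather than written by hand.

```python
import numpy as np

def add_default(a, b, c):
    """Default plan: a single loop over all elements."""
    for i in range(len(c)):
        c[i] = a[i] + b[i]

def add_split(a, b, c, factor=8):
    """Alternative plan: the same loop split into outer and inner parts."""
    n = len(c)
    for outer in range((n + factor - 1) // factor):
        for inner in range(factor):
            i = outer * factor + inner
            if i < n:  # guard for the incomplete last block
                c[i] = a[i] + b[i]

a = np.arange(100, dtype=np.float32)
b = np.ones(100, dtype=np.float32)
c1, c2 = np.empty_like(a), np.empty_like(a)
add_default(a, b, c1)
add_split(a, b, c2)
np.testing.assert_array_equal(c1, c2)  # different loop structure, same result
```

Only the loop structure differs between the two functions; the per-element rule c[i] = a[i] + b[i] is untouched, which is why the outputs match.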

1.2.4. Compilation and Execution

Once both the computation and the schedule are defined, we can compile them into an executable module with tvm.build. It accepts the same arguments as tvm.lower. In fact, it first calls tvm.lower to generate the program and then compiles it to machine code.

mod = tvm.build(s, [A, B, C])

It returns an executable module object. We can now feed data for A, B and C to run it. The tensor data must be TVM NDArray objects (tvm.runtime.ndarray.NDArray). The easiest way is to create NumPy ndarray objects first and then convert them into TVM NDArrays with tvm.nd.array. We can convert them back to NumPy with the asnumpy method.

x = np.ones(2)
y = tvm.nd.array(x)
type(y), y.asnumpy()
(tvm.runtime.ndarray.NDArray, array([1., 1.]))

Now let’s construct data and return them as TVM ndarrays.

a, b, c = get_abc(100, tvm.nd.array)

Do the computation, and verify the results.

mod(a, b, c)
np.testing.assert_array_equal(a.asnumpy() + b.asnumpy(), c.asnumpy())

1.2.5. Argument Constraints

Remember that we specified both inputs to be vectors of length 100 when declaring A and B.

A.shape, B.shape, C.shape
([100], [100], [100])

TVM will check if the input shapes satisfy this specification.

try:
    # Vectors of length 200 violate the declared shape (100,).
    a, b, c = get_abc(200, tvm.nd.array)
    mod(a, b, c)
except tvm.TVMError as e:
    print(e)
Traceback (most recent call last):
  [bt] (1) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(TVMFuncCall+0x61) [0x7f8eec7a60f1]
  [bt] (0) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(+0xcab5a1) [0x7f8eec78d5a1]
  File "/home/ubuntu/tvm/src/runtime/library_module.cc", line 89
TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (100 == int32(arg0.shape[0])), Argument arg0.shape[0] has an unsatisfied constraint

The default data type in TVM is float32.

A.dtype, B.dtype, C.dtype
('float32', 'float32', 'float32')

An error is raised if an input has a different data type.

try:
    a, b, c = get_abc(100, tvm.nd.array)
    # Cast a to float64, which violates the declared float32 type.
    a = tvm.nd.array(a.asnumpy().astype('float64'))
    mod(a, b, c)
except tvm.TVMError as e:
    print(e)
Traceback (most recent call last):
  [bt] (1) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(TVMFuncCall+0x61) [0x7f8eec7a60f1]
  [bt] (0) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(+0xcab5a1) [0x7f8eec78d5a1]
  File "/home/ubuntu/tvm/src/runtime/library_module.cc", line 89
TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (((tvm_struct_get(arg0, 0, 5) == (uint8)2) && (tvm_struct_get(arg0, 0, 6) == (uint8)32)) && (tvm_struct_get(arg0, 0, 7) == (uint16)1)), arg0.dtype is expected to be float32
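
The two assertions above can be mimicked in plain NumPy to show what is being checked. The helpers check_arg and checked_vector_add are hypothetical and only approximate the behavior; the compiled TVM module performs its own checks and raises tvm.TVMError instead.

```python
import numpy as np

def check_arg(name, arr, shape, dtype):
    """Hypothetical check mirroring the assertions a compiled module makes."""
    if arr.shape != shape:
        raise ValueError(f"{name}.shape is {arr.shape}, expected {shape}")
    if arr.dtype != np.dtype(dtype):
        raise TypeError(f"{name}.dtype is {arr.dtype}, expected {dtype}")

def checked_vector_add(a, b, c, n=100, dtype="float32"):
    """Validate every argument, then compute c = a + b in place."""
    for name, arr in (("a", a), ("b", b), ("c", c)):
        check_arg(name, arr, (n,), dtype)
    np.add(a, b, out=c)

ok = np.zeros(100, dtype=np.float32)
errors = []
for bad in (np.zeros(200, dtype=np.float32), np.zeros(100, dtype=np.float64)):
    try:
        checked_vector_add(bad, ok, ok.copy())
    except (ValueError, TypeError) as e:
        errors.append(type(e).__name__)
print(errors)  # ['ValueError', 'TypeError']
```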

1.2.6. Saving and Loading a Module

A compiled module can be saved to disk,

mod_fname = 'vector-add.tar'
mod.export_library(mod_fname)  # write the compiled module to disk

and then loaded back later.

loaded_mod = tvm.runtime.load_module(mod_fname)

Verify the results.

a, b, c = get_abc(100, tvm.nd.array)
loaded_mod(a, b, c)
np.testing.assert_array_equal(a.asnumpy() + b.asnumpy(), c.asnumpy())

1.2.7. Summary

Implementing an operator using TVM has three steps:

  1. Declare the computation by specifying input and output shapes and how each output element is computed.

  2. Create a schedule to (hopefully) fully utilize the machine resources.

  3. Compile to the hardware target.

In addition, we can save the compiled module to disk so that we can load it back later.
