.. _ch_vector_add:

Vector Add
==========

Now that you have installed all the libraries, let's write our first program:
summing two ``n``-dimensional vectors ``a`` and ``b``. It's straightforward in
NumPy, where we can do it with ``c = a + b``.

Implementing with NumPy
-----------------------

.. code:: python

    import numpy as np

    np.random.seed(0)
    n = 100
    a = np.random.normal(size=n).astype(np.float32)
    b = np.random.normal(size=n).astype(np.float32)
    c = a + b

Here we create two random vectors of length 100 and sum them element-wise.
Note that NumPy by default uses 64-bit floating-point numbers or 64-bit
integers, which differ from the 32-bit floating point typically used in deep
learning, so we explicitly cast the data type.

Although we can use the built-in ``+`` operator in NumPy to realize
element-wise addition, let's try to implement it using only scalar operators.
It will help us understand the implementation with TVM. The following function
uses a for-loop to iterate over every element of the vectors, and adds two
elements together with the scalar ``+`` operator each time.

.. code:: python

    def vector_add(a, b, c):
        for i in range(n):
            c[i] = a[i] + b[i]

    d = np.empty(shape=n, dtype=np.float32)
    vector_add(a, b, d)
    np.testing.assert_array_equal(c, d)

Since we will frequently create two random ndarrays and another empty one to
store the results in the following chapters, we save this routine so we can
reuse it later.

.. code:: python

    # Save to the d2ltvm package.
    def get_abc(shape, constructor=None):
        """Return random a, b and empty c with the same shape.
        """
        np.random.seed(0)
        a = np.random.normal(size=shape).astype(np.float32)
        b = np.random.normal(size=shape).astype(np.float32)
        c = np.empty_like(a)
        if constructor:
            a, b, c = [constructor(x) for x in (a, b, c)]
        return a, b, c

Note that we fix the random seed so that we always get the same results,
which facilitates the comparison between NumPy, TVM and others. In addition,
the function accepts an optional ``constructor`` to convert the data into a
different format.
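The short sketch below (an illustration added here, not part of the saved
routine) demonstrates these two points: the fixed seed makes repeated calls
return identical data, and the ``shape`` argument also accepts a tuple for
multi-dimensional arrays.

.. code:: python

    # Usage sketch for get_abc (illustration only).
    # The fixed seed makes repeated calls return identical data.
    a1, b1, _ = get_abc(100)
    a2, b2, _ = get_abc(100)
    np.testing.assert_array_equal(a1, a2)
    np.testing.assert_array_equal(b1, b2)

    # The shape argument also accepts a tuple for multi-dimensional arrays.
    a3, _, c3 = get_abc((2, 3))
    print(a3.shape, c3.dtype)  # (2, 3) float32
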
Defining the TVM Computation
----------------------------

Now let's implement ``vector_add`` in TVM. The TVM implementation differs from
the one above in two ways:

1. We don't need to write the complete function, but only to specify how each
   element of the output, i.e. ``c[i]``, is computed.
2. TVM is symbolic: we create symbolic variables by specifying their shapes,
   and define how the program will be computed.

In the following program, we first declare the placeholders ``A`` and ``B``
for both inputs by specifying their shapes, ``(n,)``, through
``tvm.te.placeholder``. Both ``A`` and ``B`` are ``Tensor`` objects, to which
we can feed data later. We assign names to them so we can print an
easy-to-read program later.

Next we define how the output ``C`` is computed with ``te.compute``. It
accepts two arguments: the output shape, and a function that computes each
element given its index. Since the output is a vector, its elements are
indexed by integers. The lambda function defined in ``te.compute`` accepts a
single argument ``i`` and returns the value of ``c[i]``, which is identical to
``c[i] = a[i] + b[i]`` defined in ``vector_add``. One difference is that we
don't write the for-loop, which will be filled in by TVM later.

.. code:: python

    import tvm
    from tvm import te  # te stands for tensor expression

    # Save to the d2ltvm package.
    def vector_add(n):
        """TVM expression for vector add"""
        A = te.placeholder((n,), name='a')
        B = te.placeholder((n,), name='b')
        C = te.compute(A.shape, lambda i: A[i] + B[i], name='c')
        return A, B, C

    A, B, C = vector_add(n)
    type(A), type(C)

.. parsed-literal::
    :class: output

    (tvm.te.tensor.Tensor, tvm.te.tensor.Tensor)

We can see that ``A``, ``B``, and ``C`` are all ``Tensor`` objects, which can
be viewed as a symbolic version of NumPy's ndarray. We can access their
attributes such as the data type and shape, but they don't hold concrete
values right now.

.. code:: python

    (A.dtype, A.shape), (C.dtype, C.shape)

.. parsed-literal::
    :class: output

    (('float32', [100]), ('float32', [100]))

The operation that generates a tensor object can be accessed by ``A.op``.

.. code:: python

    type(A.op), type(C.op)

.. parsed-literal::
    :class: output

    (tvm.te.tensor.PlaceholderOp, tvm.te.tensor.ComputeOp)

We can see that the types of the operations for ``A`` and ``C`` are different,
but they share the same base class ``Operation``, which represents an
operation that generates a tensor object.

.. code:: python

    A.op.__class__.__bases__[0]

.. parsed-literal::
    :class: output

    tvm.te.tensor.Operation

Creating a Schedule
-------------------

To run the computation, we need to specify how to execute the program, for
example, the order in which to access data and how to do multi-threading
parallelization. Such an execution plan is called a *schedule*. Since ``C`` is
the output tensor, let's create a default schedule on its operator and print
the pseudo code.

.. code:: python

    s = te.create_schedule(C.op)

A schedule consists of several stages. Each stage corresponds to an operation
and describes how it is scheduled. We can access a particular stage by either
``s[C]`` or ``s[C.op]``.

.. code:: python

    type(s), type(s[C])

.. parsed-literal::
    :class: output

    (tvm.te.schedule.Schedule, tvm.te.schedule.Stage)

Later on we will see how to change the execution plan to better utilize the
hardware resources and improve efficiency. Here let's look at the default
execution plan by printing the C-like pseudo code.

.. code:: python

    tvm.lower(s, [A, B, C], simple_mode=True)

.. parsed-literal::
    :class: output

    produce c {
      for (i, 0, 100) {
        c[i] = (a[i] + b[i])
      }
    }

The ``lower`` method accepts the schedule and the input and output tensors.
Setting ``simple_mode=True`` prints the program in a simple and compact way.
Note that the program has added proper for-loops according to the output
shape. Overall, it's quite similar to the previous function ``vector_add``.

Now you see that TVM separates the computation and the schedule. The
computation defines how the results are computed, which does not change no
matter which hardware platform you run the program on. On the other hand, an
efficient schedule is often hardware dependent, but changing the schedule will
not affect correctness. The idea of separating computation from schedule is
inherited by TVM from Halide :cite:`Ragan-Kelley.Barnes.Adams.ea.2013`.
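To make this separation concrete, here is a brief preview sketch (not part of
the original walkthrough) that creates a second schedule on the same
computation and marks its only loop axis as parallel before lowering again.
The expression for ``c[i]`` stays untouched; only the generated loop structure
changes. Scheduling primitives such as ``parallel`` are explored further in
later chapters.

.. code:: python

    # Preview sketch: a second schedule on the same computation.
    s_par = te.create_schedule(C.op)
    # Mark the single loop axis of C as parallel; the computation is unchanged.
    s_par[C].parallel(C.op.axis[0])
    # Lowering again shows the same element-wise add, now in a parallel loop.
    tvm.lower(s_par, [A, B, C], simple_mode=True)
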
Compilation and Execution
-------------------------

Once both the computation and the schedule are defined, we can compile them
into an executable module with ``tvm.build``. It accepts the same arguments as
``tvm.lower``. In fact, it first calls ``tvm.lower`` to generate the program
and then compiles it into machine code.

.. code:: python

    mod = tvm.build(s, [A, B, C])
    type(mod)

.. parsed-literal::
    :class: output

    tvm.runtime.module.Module

It returns an executable module object. Now we can feed data for ``A``, ``B``
and ``C`` to run it. The tensor data must be ``tvm.runtime.ndarray.NDArray``
objects. The easiest way is to create NumPy ndarray objects first and then
convert them into TVM ndarrays with ``tvm.nd.array``. We can convert them back
to NumPy with the ``asnumpy`` method.

.. code:: python

    x = np.ones(2)
    y = tvm.nd.array(x)
    type(y), y.asnumpy()

.. parsed-literal::
    :class: output

    (tvm.runtime.ndarray.NDArray, array([1., 1.]))

Now let's construct the data and return them as TVM ndarrays.

.. code:: python

    a, b, c = get_abc(100, tvm.nd.array)

Do the computation, and verify the results.

.. code:: python

    mod(a, b, c)
    np.testing.assert_array_equal(a.asnumpy() + b.asnumpy(), c.asnumpy())

Argument Constraints
--------------------

Remember that we specified both inputs to be 100-length vectors when declaring
``A`` and ``B``.

.. code:: python

    A.shape, B.shape, C.shape

.. parsed-literal::
    :class: output

    ([100], [100], [100])

TVM will check whether the input shapes satisfy this specification.

.. code:: python

    try:
        a, b, c = get_abc(200, tvm.nd.array)
        mod(a, b, c)
    except tvm.TVMError as e:
        print(e)

.. parsed-literal::
    :class: output

    Traceback (most recent call last):
      [bt] (1) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(TVMFuncCall+0x61) [0x7f8eec7a60f1]
      [bt] (0) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(+0xcab5a1) [0x7f8eec78d5a1]
      File "/home/ubuntu/tvm/src/runtime/library_module.cc", line 89
    TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (100 == int32(arg0.shape[0])), Argument arg0.shape[0] has an unsatisfied constraint

The default data type in TVM is ``float32``.

.. code:: python

    A.dtype, B.dtype, C.dtype

.. parsed-literal::
    :class: output

    ('float32', 'float32', 'float32')

An error will appear if an input has a different data type.

.. code:: python

    try:
        a, b, c = get_abc(100, tvm.nd.array)
        a = tvm.nd.array(a.asnumpy().astype('float64'))
        mod(a, b, c)
    except tvm.TVMError as e:
        print(e)

.. parsed-literal::
    :class: output

    Traceback (most recent call last):
      [bt] (1) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(TVMFuncCall+0x61) [0x7f8eec7a60f1]
      [bt] (0) /var/lib/jenkins/miniconda3/envs/d2l-tvm-0/lib/python3.7/site-packages/tvm/libtvm.so(+0xcab5a1) [0x7f8eec78d5a1]
      File "/home/ubuntu/tvm/src/runtime/library_module.cc", line 89
    TVMError: Check failed: ret == 0 (-1 vs. 0) : Assert fail: (((tvm_struct_get(arg0, 0, 5) == (uint8)2) && (tvm_struct_get(arg0, 0, 6) == (uint8)32)) && (tvm_struct_get(arg0, 0, 7) == (uint16)1)), arg0.dtype is expected to be float32

Saving and Loading a Module
---------------------------

A compiled module can be saved to disk,

.. code:: python

    mod_fname = 'vector-add.tar'
    mod.export_library(mod_fname)

and then loaded back later.

.. code:: python

    loaded_mod = tvm.runtime.load_module(mod_fname)

Verify the results.

.. code:: python

    a, b, c = get_abc(100, tvm.nd.array)
    loaded_mod(a, b, c)
    np.testing.assert_array_equal(a.asnumpy() + b.asnumpy(), c.asnumpy())

Summary
-------

Implementing an operator using TVM has three steps:

1. Declare the computation by specifying the input and output shapes and how
   each output element is computed.
2. Create a schedule to (hopefully) fully utilize the machine resources.
3. Compile to the hardware target.

In addition, we can save the compiled module to disk so we can load it back
later.

`Discussions `__
-------------------------------------------------------------------------------