.. _ch_remote:

Running on a Remote Machine
===========================

In this book, we will run and optimize programs on various hardware
platforms. One way is to log into the machine with the desired hardware,
install the required packages, and then run the workloads there. This,
however, makes maintaining source code and data difficult, especially
when the target hardware has minimal computing power. In this section,
we will describe another solution: running a daemon on the remote
machine and then sending only the compiled module and input data to it
for execution.

.. code:: python

    import d2ltvm
    import numpy as np
    import mxnet as mx
    import tvm
    from tvm import te, rpc, relay
    from PIL import Image

Note that we imported the ``rpc`` module from TVM. RPC, short for remote
procedure call, enables executing a program on a remote machine.

Set Up the Remote Machine
-------------------------

We first need to install the TVM ``runtime`` module on the remote
machine. The installation procedure is almost identical to that of the
full TVM (refer to :numref:`ch_install`), except that we only need to
build the runtime, i.e. ``make runtime``, instead of the whole TVM
library. The runtime size is often less than 1MB, which makes it
suitable for devices with memory constraints. You also need to enable
the proper backend, e.g. ``CUDA`` or ``OpenCL``, if necessary.

Once the runtime is installed, we can start the daemon by running the
following command on the remote machine.

``python -m tvm.exec.rpc_server --host 0.0.0.0 --port=9090``

It will start an RPC server that listens on port 9090 of the remote
machine. You should see the following output, indicating that the server
has started.

``INFO:RPCServer:bind to 0.0.0.0:9090``

In addition, you need to check two things on the remote machine. One is
the remote machine's IP. On Linux or macOS, you can get it by
``ifconfig | grep inet``. Also remember to open the 9090 port if there
is a firewall. The other one is the target architecture.
For GPUs it is straightforward, and we will cover it later. For CPUs,
the easiest way is to install LLVM on the remote machine and then check
``llvm-config --host-target``. The output on the remote machine we are
using is ``x86_64-pc-linux-gnu``. This target triplet has the general
format ``<arch><sub>-<vendor>-<sys>-<abi>``, where

- arch: x86, x86\_64, arm, thumb, mips, etc.
- sub: for ARM, there are v5, v6m, v7a, v7m, v8, etc.
- vendor: pc, apple, nvidia, ibm, etc.
- sys: linux, win32, darwin, cuda, none, unknown, etc.
- abi: eabi, gnu, android, macho, elf, etc.

For example, it's ``x86_64-apple-darwin17.7.0`` for the MacBook Pro I'm
using, and ``armv6k-unknown-linux-gnueabihf`` for the Raspberry Pi 4B.

Compile the Program for the Remote Machine
------------------------------------------

Let's run the vector addition defined in :numref:`ch_vector_add` on the
remote machine. Note that we specify the remote machine's target through
the ``-target`` argument for LLVM.

.. code:: python

    n = 100
    target = 'llvm -target=x86_64-pc-linux-gnu'

    args = d2ltvm.vector_add(n)
    s = te.create_schedule(args[-1].op)
    mod = tvm.build(s, args, target)

Then we save the compiled module to disk. It will be uploaded to the
remote machine later.

.. code:: python

    mod_fname = 'vector-add.tar'
    mod.export_library(mod_fname)

Evaluate on the Remote Machine
------------------------------

We first connect to the remote machine with the IP we checked before.

.. code:: python

    remote = rpc.connect('172.31.0.149', 9090)

Next, we send the compiled library to the remote machine and load it
into its memory.

.. code:: python

    remote.upload(mod_fname)
    remote_mod = remote.load_module(mod_fname)

When creating the data, we specify the device context as the CPU on the
remote machine. The data is created on the local machine as before and
is then sent to the remote machine. Note that we used NumPy to create
the data, but the remote machine does not need to have NumPy installed.

.. code:: python

    ctx = remote.cpu()
    a, b, c = d2ltvm.get_abc(n, lambda x: tvm.nd.array(x, ctx=ctx))

Since both the data and the library are ready on the remote machine,
let's execute the program there as well.

.. code:: python

    remote_mod(a, b, c)

Finally, the ``.asnumpy()`` method sends the data back to the local
machine and converts it to a NumPy array, so we can verify the results
as before.

.. code:: python

    np.testing.assert_equal(a.asnumpy()+b.asnumpy(), c.asnumpy())

Running Neural Network Inference
--------------------------------

Let's run the ResNet-18 used in :numref:`ch_from_mxnet` on the remote
machine. As before, we load a sample image and the ImageNet 1K labels.

.. code:: python

    image = Image.open('../data/cat.jpg').resize((224, 224))
    x = d2ltvm.image_preprocessing(image)
    with open('../data/imagenet1k_labels.txt') as f:
        labels = eval(f.read())

Then we convert, compile, and save the module. Note that we only need to
save to disk the shared library, which contains the machine code of the
compiled operators.

.. code:: python

    mod_fname = 'resnet18.tar'
    model = mx.gluon.model_zoo.vision.resnet18_v2(pretrained=True)

    relay_mod, relay_params = relay.frontend.from_mxnet(model, {'data': x.shape})
    with relay.build_config(opt_level=3):
        graph, mod, params = relay.build(relay_mod, target, params=relay_params)

    mod.export_library(mod_fname)

.. parsed-literal::
    :class: output

    Cannot find config for target=llvm -target=x86_64-pc-linux-gnu, workload=('dense_nopack.x86', ('TENSOR', (1, 512), 'float32'), ('TENSOR', (1000, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.

Next, we upload the saved library to the remote machine and load it into
memory. Then we can create a runtime using the model definition, the
remote library, and the remote context.

.. code:: python

    remote.upload(mod_fname)
    remote_mod = remote.load_module(mod_fname)
    remote_rt = tvm.contrib.graph_runtime.create(graph, remote_mod, ctx)

The inference is identical to :numref:`ch_from_mxnet`, where both the
parameters and the input are on the local machine. The runtime uploads
them to the remote machine as needed.

.. code:: python

    remote_rt.set_input(**params)
    remote_rt.run(data=tvm.nd.array(x))
    scores = remote_rt.get_output(0).asnumpy()[0]
    scores.shape
    a = np.argsort(scores)[-1:-5:-1]
    labels[a[0]], labels[a[1]]

.. parsed-literal::
    :class: output

    ('tiger cat', 'Egyptian cat')

Summary
-------

- We can install a TVM runtime on a remote machine and start an RPC
  server on it to accept workloads to run.
- A program can be compiled locally by specifying the remote machine's
  architecture target (called cross-compilation), and then run on the
  remote machine via RPC.
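The target triplet format described in this section can be made concrete with a small helper. The following is only a minimal sketch and is not part of TVM or ``d2ltvm``: the names ``Triple``, ``parse_triple`` and ``llvm_target`` are hypothetical, and the parser assumes the vendor field is always present, as it is in the three examples given in this section. It does not try to separate the sub-architecture from the architecture.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """Fields of a target triplet (hypothetical helper, not a TVM type)."""
    arch: str           # architecture plus optional sub-architecture, e.g. "armv6k"
    vendor: str         # e.g. "pc", "apple", "unknown"
    sys: str            # e.g. "linux", "darwin17.7.0"
    abi: Optional[str]  # e.g. "gnu", "gnueabihf"; None when omitted


def parse_triple(triple: str) -> Triple:
    """Split a triplet such as 'x86_64-pc-linux-gnu' into its fields.

    Assumption: the vendor field is present. Shortened forms that omit
    it are not handled by this sketch.
    """
    parts = triple.split('-', 3)
    if len(parts) < 3 or not all(parts):
        raise ValueError('unexpected target triplet: %r' % triple)
    abi = parts[3] if len(parts) == 4 else None
    return Triple(parts[0], parts[1], parts[2], abi)


def llvm_target(triple: str) -> str:
    """Build a target string in the form used in this section."""
    parse_triple(triple)  # raises ValueError on malformed input
    return 'llvm -target=' + triple


print(parse_triple('armv6k-unknown-linux-gnueabihf'))
print(llvm_target('x86_64-pc-linux-gnu'))
```

Validating the string reported by ``llvm-config --host-target`` before passing it to the compiler can catch copy-and-paste mistakes early, before a module is built for the wrong machine.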