References

Chen et al., 2015

Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., … Zhang, Z. (2015). MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. arXiv preprint arXiv:1512.01274.

Chen et al., 2018

Chen, T., Moreau, T., Jiang, Z., Zheng, L., Yan, E., Shen, H., … others. (2018). TVM: an automated end-to-end optimizing compiler for deep learning. 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18) (pp. 578–594).

Howard et al., 2017

Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., … Adam, H. (2017). MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

Lai & Seznec, 2013

Lai, J., & Seznec, A. (2013). Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs. Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (pp. 1–10).

Liu et al., 2019

Liu, Y., Wang, Y., Yu, R., Li, M., Sharma, V., & Wang, Y. (2019). Optimizing CNN model inference on CPUs. 2019 USENIX Annual Technical Conference (USENIX ATC 19) (pp. 1025–1040).

Nath et al., 2010

Nath, R., Tomov, S., & Dongarra, J. (2010). An improved MAGMA GEMM for Fermi graphics processing units. The International Journal of High Performance Computing Applications, 24(4), 511–515.

Ragan-Kelley et al., 2013

Ragan-Kelley, J., Barnes, C., Adams, A., Paris, S., Durand, F., & Amarasinghe, S. (2013). Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 519–530). ACM.

Roesch et al., 2019

Roesch, J., Lyubomirsky, S., Kirisame, M., Pollock, J., Weber, L., Jiang, Z., … Tatlock, Z. (2019). Relay: a high-level IR for deep learning. arXiv preprint arXiv:1904.08368.