I am a research intern in TuSimple’s HPC group, taking a gap year after obtaining a bachelor’s degree in Electrical Engineering from Beihang University.
My research interests lie at the intersection of HPC and deep learning systems.
I have been working on binary neural networks, int8 inference acceleration, and automatic optimization of deep network workloads through tensor compilation and kernel fusion.
I am applying to Fall 2018 PhD programs in Electrical and Computer Engineering.
BitFlow: Exploiting Computing Power of Binary Neural Networks on CPU
Yuwei Hu, Dinghua Li, Yifan Gong, Jiangming Jin. In submission to the SC17 poster session.
TVM: Tensor IR Stack for Deep Learning Systems (dmlc/tvm)
TVM is a novel framework that can represent and optimize common deep learning computation workloads for CPUs, GPUs, and other specialized hardware, and automatically transform the computation graph to optimize data layout and fuse computation patterns.
- Schedule optimization of depthwise convolution
- MobileNet end-to-end compilation (GTX 1080)
Here is the TVM tutorial blog, summarizing what I have learned from writing depthwise convolution.
MXNet float32 to TensorRT int8 Model Converter (in progress)
Make things as simple as possible, but no simpler.