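Shouyi Yin (尹首一), male, born January 1977, Han ethnicity, member of the Communist Party of China, holds a doctorate and is an associate professor, deputy department head, and doctoral supervisor in the Department of Microelectronics and Nanoelectronics at Tsinghua University.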
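Basic information
- Chinese name: 尹首一
- Occupation: Teacher
- Degree: Doctorate
- Research areas: reconfigurable computing, neural network computing chips
- Institution: Department of Microelectronics and Nanoelectronics, Tsinghua University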
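Career
He received his bachelor's, master's, and doctoral degrees in engineering from Tsinghua University in 2000, 2002, and 2005, respectively, and worked as a postdoctoral researcher at Imperial College London from 2005 to 2007.
Research areas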
1) Reconfigurable neural network chip: "Thinker" is an energy-efficient hybrid neural network (NN) processor fabricated in 65nm technology. It contains two 16x16 reconfigurable heterogeneous processing-element (PE) arrays. To accelerate hybrid NNs, the PE arrays support on-demand partitioning and reconfiguration so that different NNs can be processed in parallel. To improve energy efficiency, each PE supports bit-width adaptive computing to match the varying bit-widths of different neural layers. Measurement results show that the processor achieves a peak performance of 409.6GOPS at 200MHz and a peak energy efficiency of 5.09TOPS/W, up to 5.2X better in energy efficiency than the state-of-the-art design at the time.
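The peak-throughput figure can be cross-checked against the array size and clock frequency quoted above. The following is a minimal arithmetic sketch, not taken from the Thinker design itself: the function names are hypothetical, and the operations-per-PE-per-cycle value is only what the quoted numbers imply, not a documented property of the chip.

```python
# Back-of-the-envelope check of the quoted Thinker figures.
# All inputs come from the text; function names are illustrative only.

NUM_ARRAYS = 2        # two PE arrays
ARRAY_ROWS = 16       # each array is 16x16
ARRAY_COLS = 16
CLOCK_HZ = 200e6      # 200 MHz
PEAK_GOPS = 409.6     # quoted peak performance


def total_pes(num_arrays: int, rows: int, cols: int) -> int:
    """Total number of processing elements across all arrays."""
    return num_arrays * rows * cols


def implied_ops_per_pe_per_cycle(peak_gops: float, num_pes: int, clock_hz: float) -> float:
    """Operations each PE must complete per cycle to reach the quoted peak.

    This value is inferred from the quoted numbers (an assumption),
    not a specification of the hardware.
    """
    return peak_gops * 1e9 / (num_pes * clock_hz)


pes = total_pes(NUM_ARRAYS, ARRAY_ROWS, ARRAY_COLS)
ops = implied_ops_per_pe_per_cycle(PEAK_GOPS, pes, CLOCK_HZ)
print(f"{pes} PEs, about {ops:.1f} operations per PE per cycle at peak")
```

Under these figures the 512 PEs would each need to complete roughly four operations per cycle at peak, which would be consistent with, for example, two multiply-accumulates per cycle counted as two operations each; the text does not state how operations are counted, so this reading is an assumption.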
2) Reconfigurable cloud computing platform: Cloud computing provides shared computer processing resources and data to computers and other devices on demand. Most cloud platforms are based on CPUs and GPUs, whose power consumption can be very high. We designed a reconfigurable cloud platform that uses our group's own CHAMELEON CGRA chip as the accelerator. Each CHAMELEON chip contains four 8x8 reconfigurable PE arrays and is implemented in 65nm technology. We integrate two CHAMELEON chips on an FPGA-assisted PCI-E board and insert four such boards into each server. An elastic management system is built over a small five-node cluster (1 master + 4 slaves). The computing speed grows near-linearly with the number of computing nodes, and the computing energy efficiency is about three orders of magnitude better than that of a Xeon CPU, with the chips running at a 200MHz clock.
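The hardware hierarchy described above (arrays per chip, chips per board, boards per server) can be tallied with a short sketch. The function names are hypothetical, and the text does not say whether the master node also hosts accelerator boards, so the number of accelerator-equipped servers is left as a parameter rather than assumed.

```python
# Tally of reconfigurable PEs at each level of the platform described above.
# Counts come from the text; function names are illustrative only.

PES_PER_ARRAY = 8 * 8     # each array is 8x8
ARRAYS_PER_CHIP = 4       # four arrays per CHAMELEON chip
CHIPS_PER_BOARD = 2       # two chips per PCI-E board
BOARDS_PER_SERVER = 4     # four boards per server


def pes_per_chip() -> int:
    return PES_PER_ARRAY * ARRAYS_PER_CHIP


def pes_per_board() -> int:
    return pes_per_chip() * CHIPS_PER_BOARD


def pes_per_server() -> int:
    return pes_per_board() * BOARDS_PER_SERVER


def pes_in_cluster(num_accelerated_servers: int) -> int:
    """PEs across the cluster, given how many servers carry boards.

    The caller supplies the server count because the text does not
    specify which of the five nodes hold accelerator boards.
    """
    return pes_per_server() * num_accelerated_servers


print(pes_per_chip(), pes_per_board(), pes_per_server())
```

If, as one possible reading, only the four slave nodes carry boards, `pes_in_cluster(4)` gives the cluster-wide total under that assumption.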
3) Energy-efficient compiling for CGRA: Coarse-grained reconfigurable architecture (CGRA) is a promising solution for high-performance, energy-efficient computing and can be reconfigured dynamically at runtime. However, there are currently no effective design automation methods or high-level synthesis (HLS) theory for mapping software applications onto CGRAs. Our research focuses on automated compilation and mapping methods for general-purpose CGRAs and addresses four major challenges: exploiting application parallelism for high performance; memory management to reduce access conflicts; configuration context compression to reduce reconfiguration cost; and energy management for high energy efficiency.
Academic achievements
Publications
[C] Shouyi Yin, Peng Ouyang, Shibin Tang, Fengbin Tu, Leibo Liu, Shaojun Wei: A 1.06-to-5.09 TOPS/W Reconfigurable Hybrid-Neural-Network Processor for Deep Learning Applications. VLSI 2017
[C] Fengbin Tu, Shouyi Yin, Peng Ouyang, Shibin Tang, Leibo Liu, Shaojun Wei: A Reconfigurable Multi-modal Neural Processor for Cognitive Intelligence Applications. ISSCC-SRP 2017
[J] Fengbin Tu, Shouyi Yin, Peng Ouyang, Shibin Tang, Leibo Liu, Shaojun Wei: Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns. IEEE TVLSI 2017
[C] Zhaoshi Li, Leibo Liu, Yangdong Deng, Shouyi Yin, Yao Wang, Shaojun Wei: Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware. ISCA 2017
[J] Shuang Liang, Shouyi Yin, Leibo Liu, Yike Guo, Shaojun Wei: A Coarse-Grained Reconfigurable Architecture for Compute-Intensive MapReduce Acceleration. IEEE CAL 2016
[J] Shouyi Yin, Jiangyuan Gu, Dajiang Liu, Leibo Liu, Shaojun Wei: Joint Modulo Scheduling and Vdd Assignment for Loop Mapping on Dual-Vdd CGRAs. IEEE TCAD 2016
[J] Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei: A Fast and Power-Efficient Memory-Centric Architecture for Affine Computation. IEEE TCAS II 2016
[J] Shouyi Yin, Xinhan Lin, Leibo Liu, Shaojun Wei: Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures. IEEE TPDS 2016
[J] Shouyi Yin, Dajiang Liu, Yu Peng, Leibo Liu, Shaojun Wei: Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures. IEEE TVLSI 2016
[J] Shouyi Yin, Peng Ouyang, Tianbao Chen, Leibo Liu, Shaojun Wei: A Configurable Parallel Hardware Architecture for Efficient Integral Histogram Image Computing. IEEE TVLSI 2016
[J] Shouyi Yin, Xianqing Yao, Dajiang Liu, Leibo Liu, Shaojun Wei: Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures. IEEE TVLSI 2016
[J] Shouyi Yin, Pengcheng Zhou, Leibo Liu, Shaojun Wei: Trigger-Centric Loop Mapping on CGRAs. IEEE TVLSI 2016
[C] Shouyi Yin, Xianqing Yao, Tianyi Lu, Leibo Liu, Shaojun Wei: Joint loop mapping and data placement for coarse-grained reconfigurable architecture with multi-bank memory. ICCAD 2016
[J] Peng Ouyang, Shouyi Yin, Yuchi Zhang, Leibo Liu, Shaojun Wei: A Fast Integral Image Computing Hardware Architecture With High Power and Area Efficiency. IEEE TCAS II 2015
[J] Dajiang Liu, Shouyi Yin, Yu Peng, Leibo Liu, Shaojun Wei: Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures. IEEE TVLSI 2015
[J] Peng Ouyang, Shouyi Yin, Leibo Liu, Shaojun Wei: Energy Management on Battery-Powered Coarse-Grained Reconfigurable Platforms. IEEE TVLSI 2015
[C] Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei: A 83fps 1080P resolution 354 mW silicon implementation for computing the improved robust feature in affine space. CICC 2015
[C] Fengbin Tu, Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei: RNA: a reconfigurable architecture for hardware neural acceleration. DATE 2015
[C] Shouyi Yin, Pengcheng Zhou, Leibo Liu, Shaojun Wei: Acceleration of Nested Conditionals on CGRAs via Trigger Scheme. ICCAD 2015
[C] Fengbin Tu, Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei: Neural approximating architecture targeting multiple application domains. ISCAS 2015
[C] Dajiang Liu, Shouyi Yin, Leibo Liu, Shaojun Wei: Polyhedral model based mapping optimization of loop nests for CGRAs. DAC 2013
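He has led more than 10 projects, including sub-projects of National Science and Technology Major Projects, National 863 Program projects, and National Natural Science Foundation of China grants. He has published more than 70 journal papers and more than 50 international conference papers, filed 45 invention patent applications (6 granted), and obtained 5 software copyrights. He also led the research and design of a reconfigurable multi-modal hybrid neural computing chip.
Honors and awards
He received a First Prize of the Ministry of Education Technology Invention Award.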