Model-Based Parallelization for Simulink Models on Multicore CPUs and GPUs
In this paper, we propose a model-based approach that parallelizes Simulink models of image processing algorithms at the block level on homogeneous multicore CPUs and NVIDIA GPUs and generates CUDA C code for parallel execution on the target hardware. The Simulink models are first converted to directed acyclic graphs (DAGs) based on their block diagrams, in which the nodes represent tasks (grouped blocks or subsystems in the model) and the edges represent communication between blocks. Next, a path analysis is conducted on the DAGs to extract all execution paths and calculate their lengths, where the length of a path comprises the execution times of the tasks and the communication times of the edges on that path. Then, an integer linear programming (ILP) formulation is used to minimize the length of the critical path of the DAG, which represents the execution time of the Simulink model. The ILP formulation also balances the workload across CPU cores to improve hardware utilization. Using this approach, we parallelized image processing models on a platform with two homogeneous CPU cores and two GPUs and observed speedups between 8.78x and 15.71x.
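The path analysis step can be illustrated with a minimal sketch. It is not the authors' implementation. The task names, execution times, and communication times below are hypothetical values chosen for illustration, and the helper functions `all_paths`, `path_length`, and `critical_path` are assumed names. The sketch enumerates every source-to-sink path in a small DAG, sums task execution times and edge communication times along each path, and reports the longest one as the critical path. It does not model the ILP-based mapping or the workload balancing.

```python
# Hypothetical toy DAG (illustrative values, not taken from the paper).
# Node weights: execution time of each task, in arbitrary time units.
exec_time = {"A": 2, "B": 3, "C": 4, "D": 1}
# Edge weights: communication time between two tasks, same units.
comm_time = {("A", "B"): 1, ("A", "C"): 2, ("B", "D"): 1, ("C", "D"): 1}


def all_paths(exec_time, comm_time):
    """Enumerate every path from a source task to a sink task."""
    successors = {task: [] for task in exec_time}
    has_predecessor = set()
    for (src, dst) in comm_time:
        successors[src].append(dst)
        has_predecessor.add(dst)
    sources = [task for task in exec_time if task not in has_predecessor]

    paths = []

    def walk(node, path):
        if not successors[node]:  # sink reached: the path is complete
            paths.append(path)
            return
        for nxt in successors[node]:
            walk(nxt, path + [nxt])

    for source in sources:
        walk(source, [source])
    return paths


def path_length(path, exec_time, comm_time):
    """Sum of task execution times plus edge communication times on a path."""
    tasks = sum(exec_time[task] for task in path)
    edges = sum(comm_time[(u, v)] for u, v in zip(path, path[1:]))
    return tasks + edges


def critical_path(exec_time, comm_time):
    """Return (length, path) for the longest path in the DAG."""
    paths = all_paths(exec_time, comm_time)
    longest = max(paths, key=lambda p: path_length(p, exec_time, comm_time))
    return path_length(longest, exec_time, comm_time), longest


if __name__ == "__main__":
    length, path = critical_path(exec_time, comm_time)
    print(length, path)
```

Exhaustive path enumeration is exponential in the worst case, so it is only practical for small task graphs like this one. In the approach described above, the critical path length is the quantity that the ILP formulation minimizes when mapping tasks to cores.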
Copyright (c) 2020 Zhaoqian Zhong, Masato Edahiro
This work is licensed under a Creative Commons Attribution 4.0 International License.