
Advanced Computer Architecture
Prof. M. Ferretti, 2016-2017
Parallel programming project

Exam final verification

The final exam of the course “Advanced Computer Architecture” consists of three compulsory parts: a written test, a programming project (using OpenMP), and the discussion of the project. An oral exam is always possible, but not mandatory.

The project should be carried out by a group of at most two students; it must comply with the specifications listed below, and it must be described in a technical project report to be defended in an oral presentation.

The project report must be delivered by the day of one of the scheduled dates for the written test ("appelli"), as published on the Faculty of Engineering web site. The discussion will be held about one week later (the exact schedule will be published on the course web site). The technical project report must be sent as a PDF or DOC document to Prof. Ferretti ([email protected]) and to Dr. Musci ([email protected]). No printed version is accepted. A late delivery prevents access to the final exam.

The presentation of the project by the group can last a maximum of 10 minutes. Students should present their work using a PC of their own. Students in the same group must defend their project together.

Aim of the project

The focus of the project is on understanding the basic concepts of shared-memory parallel programming. No advanced programming skills are expected. The OpenMP model is the preferred paradigm. Candidates can freely adopt any other paradigm of their choice (e.g. Pthreads, CUDA, etc.), provided they communicate this choice in advance.

Structure of the report

Students must choose their project among those listed in the following section (standard procedure). Alternatively, students may propose a project of their own, provided the choice is approved by either the professor or the assistant. The report must be structured as follows:

1. Analysis of the serial algorithm.
The serial version must be explained first. It is suggested to produce graphs, diagrams, measurements, application instances, and the like.
2. A-priori study of available parallelism. It is mandatory to discuss the data structures and/or functionalities that are deemed suitable for parallelization. Specifically, all shared data structures must be discussed, together with the synchronization they require.
3. OpenMP parallel implementation. Coding in C or C++ is mandatory. The code must be properly commented. Multiple versions are welcome, and their pros and cons should be highlighted.
4. Testing and debugging. Care must be taken in designing test cases and in debugging, since parallel programming can hide potential flaws in the code. A static code analysis is advised, with emphasis on the access to shared resources and memory.
5. Performance analysis and speedup. A theoretical assessment (through Amdahl’s law) is required, as well as an analysis of the results, on multiple architectures if possible. A scalability analysis is also strongly suggested (even small systems currently allow for a minimal scalability analysis in terms of number of cores). Problem size analysis is mandatory: as a piece of advice, test your code on large input data, as performance testing on small inputs can be deceptive.

The evaluation of the project will take into account:
- Clarity of presentation;
- Correctness and originality of the theoretical analysis;
- Quality of the implementation;
- Presence of multiple parallel solutions;
- Depth of the performance evaluation and analysis;
- Significance of the results: as parallel programming is mostly concerned with performance, a good project must also provide significant speedup.

Please keep in mind that we take plagiarism very seriously. The use of external sources is not prohibited, and is in fact encouraged; however, every occurrence must be referenced.

Proposed algorithms

In the following, some well-known algorithms are listed. Some are from mostly theoretical areas, such as graph analysis or sorting, and some from applied ones, such as image processing, artificial intelligence and physics. A large degree of freedom is granted to each group in defining the scope and the extent of their project. In case of doubt, please contact the teacher and/or the assistant.

For most projects, a reference serial implementation is easily found by searching the web or by referring to any book on Algorithms and Data Structures (such as the “bible” on informatics, The Art of Computer Programming by Donald Knuth). For image processing, there exists a well-established library (OpenCV); it is advised to use this library for all ancillary operations on images (such as I/O and memory management).

Any of the algorithms listed below is approved for the project. Further proposals can be put forward, and will be evaluated by the professor and the assistant.

Image processing:
1. Mathematical morphology (with applications)
2. Optical flow
3. Distance Transform (hard!)
4. Other transforms
   a. Haar (see ref. below)
   b. Wavelet
   c. Hough / Generalized Hough
5. (any other Computer Vision problem, to be discussed with the professors)

Physics:
1. Heat propagation (2D or 3D)
2. Wave equation (1D or 2D)
3. N-body problem
4. (any finite element problem, to be discussed with the professors)

Other:
1. Game of Life / Cellular Automata
2. Fast Fourier Transform (2D)
3. Pathfinding (e.g. “ant colony”)
4. Substring matching
5. Minimum spanning tree / Shortest path (e.g. travelling salesman problem)
6. K-means clustering

References and suggestions:

On the Haar transform, see: http://cnx.org/content/m11087/latest/#figure1a

On physics, see:
• Heat propagation. For suggestions, see: http://coitweb.uncc.edu/~abw/ITCS4145S13/Assignments/assign3S13.pdf. Look at part 2, tasks 1 and 3 (task 2 would be optional).
• 1D wave equation. For suggestions, see item 1 of Homework 2 in http://ocw.mit.edu/courses/earth-atmospheric-and-planetary-sciences/12-950-parallel-programming-for-multicore-machines-using-openmp-and-mpi-january-iap-2010/assignments/MIT12_950IAP10_hw2.pdf.
• N-body problem. For suggestions, see: http://coitweb.uncc.edu/~abw/ITCS4145S13/Assignments/assign2S13.pdf.
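
As a rough illustration of the kind of shared-memory loop parallelism the project is about, the heat propagation problem listed above could start from a sketch like the following. It is not taken from the referenced assignments: the function name `heat_step`, the row-major grid layout, the four-neighbour averaging (Jacobi) update and the fixed-boundary convention are all assumptions made for this example. It shows one shared data structure read by every thread (`cur`), one written at disjoint positions (`next`), and one shared scalar (`max_diff`) handled through an OpenMP reduction.

```c
#include <assert.h>
#include <stddef.h>

/*
 * One Jacobi relaxation step for 2D heat propagation (illustrative sketch;
 * the name heat_step and the grid layout are assumptions of this example).
 * The grid is stored row-major in `cur`; results go to the separate buffer
 * `next`, so no thread reads a cell that another thread is writing.
 * Boundary cells are not touched: the caller is assumed to have set the same
 * fixed boundary temperatures in both buffers.
 * Returns the largest absolute change of any interior cell.
 */
double heat_step(const double *cur, double *next, int rows, int cols)
{
    assert(rows >= 3 && cols >= 3);
    const size_t stride = (size_t)cols;
    double max_diff = 0.0;

    /* Rows are independent, so they can be split among threads.
       max_diff is the only shared variable that every thread updates; the
       reduction clause gives each thread a private copy and merges the
       copies at the end of the loop. If OpenMP is not enabled, the pragma
       is ignored and the loop simply runs serially. */
    #pragma omp parallel for reduction(max : max_diff)
    for (int i = 1; i < rows - 1; i++) {
        for (int j = 1; j < cols - 1; j++) {
            const size_t k = (size_t)i * stride + (size_t)j;
            const double v = 0.25 * (cur[k - stride] + cur[k + stride]
                                     + cur[k - 1] + cur[k + 1]);
            const double d = (v > cur[k]) ? v - cur[k] : cur[k] - v;
            if (d > max_diff)
                max_diff = d;
            next[k] = v;
        }
    }
    return max_diff;
}
```

A caller would typically invoke `heat_step` in a loop, swapping the `cur` and `next` pointers after each call and stopping once the returned change drops below a chosen tolerance. With GCC the pragma takes effect when compiling with `-fopenmp`; timing the same loop with and without that flag, on grids of increasing size, is one simple way to gather the speedup data that the report asks for.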