Jornadas de Seguimiento de Proyectos, 2007
Programa Nacional de Tecnologías Informáticas

Parallel Architectures and Applications (Cooperative Project)
TIN2004-07440-C02 (01 and 02)

Ramón Beivide
Universidad de Cantabria

José Miguel-Alonso
Universidad del País Vasco

Abstract

This research project deals with interconnection technologies for parallel and distributed computing systems. Nowadays, interconnection networks are ubiquitous in the computing field. At the chip level, networks are present in clustered microprocessors, on-chip multiprocessors and systems on chip. Some workstations and servers are beginning to use switched networks to interconnect their different modules. Clusters, cc-NUMAs, MPPs and SANs rely on their underlying interconnection subsystem to obtain high performance and robustness. Some designs for IP switches are beginning to consider internal structures based on interconnection networks. Finally, Grid Computing is a hot topic that pursues efficient designs for wide area networks. This project pursues new research contributions in the fields mentioned above, including architectural and technological proposals and software development.

Keywords: Interconnection networks, performance evaluation, parallel and distributed computing.

1. Project Goals

This report summarizes the main outcomes of a cooperative research project. Two coordinated research teams, belonging to the University of Cantabria (subproject 1) and to the University of the Basque Country (subproject 2), support this project. The group in Cantabria is officially composed of five regular professors, one system manager and three PhD students. The group in the Basque Country is composed of three professors (and two PhD students who were not part of the grant-requesting team). The research project began in September 2004 and ends in September 2007. Subproject 1 received direct funding of 157800 € (40800 € in manpower) plus a grant for one PhD student.
Subproject 2 received direct funding of 34400 €.

The goals of this research project were organized around five main research lines:

1. Bibliographic study and state-of-the-art in parallel computing. As the research project deals with parallel architectures and applications, a permanent goal is to acquire in some cases, or to maintain in others, enough know-how in the research field to propose new ideas and solutions. When the project was proposed, its emphasis was on interconnection subsystems. Consequently, a continuous bibliographic analysis had to be carried out across the broad spectrum of networks present today in different platforms: multi-core on-chip architectures, multiprocessors, clusters, supercomputers and grids.

2. Simulation Tools. The design, development and maintenance of accurate simulation tools are key aspects for a research group in interconnection networks. The objectives of this project included the development of simulation tools designed by our research groups and the adaptation of external ones to our simulation platforms. Likewise, the design and analysis of adequate benchmarks and load models were also targets to be considered.

3. Topological Issues. Many of the design problems of interconnection networks have a topological nature. Some of the targets planned in this project revolved around topological issues, including the proposal of new network geometries and layouts. Routing, fault-tolerance, partitioning and support for collective communication were also targets of our research project. Special emphasis was placed on very large networks, such as those used by the “top ten” supercomputers. In addition, contributions to the design of algebraic error-correcting codes were envisaged due to their intrinsic topological nature.

4. Architectural proposals.
The proposal of complete network and router architectures constitutes another goal of this project. First, fault-tolerance was considered a critical topic for forthcoming network technologies. In addition, routing and deadlock avoidance were included as research topics. The design of scalable high-performance routers was considered another important issue. Special emphasis was also placed on small networks for current multi-core on-chip architectures and large networks for high-end supercomputers. Finally, the architecture of network interfaces was also considered a research goal of the project.

5. Parallel Applications. The targets of this research line were related to the parallelization of scientific codes that require high computational power. Two application suites were considered. The first one is based on a set of bio-inspired algorithms known as Estimation of Distribution Algorithms. The second one is a physics application, based on artificial neural networks, devoted to the search for the Higgs boson. Clusters and Grid computing were declared as target platforms to run the resulting parallel codes. In addition, load monitoring and load modeling were announced as other research topics of the project.

2. Project Achievements

2.1. Simulation Tools

Supporting the advances in the other project goals, a complete evaluation framework has been developed and maintained. Currently, we are working with two main simulation platforms, SICOSYS (Simulator of Communicating Systems) and INSEE (Interconnection Network Simulation and Evaluation Environment), and around these tools we have a complex infrastructure that allows us to evaluate different interconnection network usage scenarios. SICOSYS was initially planned, almost 10 years ago, with the aim of achieving the precision of hardware simulators with much lower computational effort.
In order to be able to analyze current proposals, the simulator is under continuous development. The tool has more than 80,000 lines of C++ code. Its flexible software design has made it possible for different researchers to use a single coherent tool for very different scenarios without problems. The main new features added to the tool support the analysis of fault-tolerant techniques in the interconnection network; in addition, the simulator is able to report power consumption measures. Different architectural router proposals have also been added to the simulator.

SICOSYS has been optimized to extend its scalability to current computing infrastructures. Nowadays, commercial multi-core processors have made it possible to acquire affordable small-scale SMP systems, and our computing infrastructure follows this trend. Evaluating different architectural proposals using HTC (High Throughput Computing) requires large amounts of DRAM, which can increase the infrastructure cost dramatically. To mitigate this drawback, we have parallelized SICOSYS. It is now possible to run one simulation efficiently with 2 to 4 threads per computing node. We have improved the response time for multi-billion cycle simulations, and are able to simulate very large networks without reducing the accuracy of the original tool [1].

SICOSYS has been designed with a functional interface to external simulators. In the past, that interface was used with execution-driven simulators such as Rsim and SimOS. Now, we have added GEMS (General Execution-driven Multiprocessor Simulator). GEMS is able to run unmodified Solaris® 8/9/10 software on top of a simulated state-of-the-art superscalar processor with a complex memory hierarchy, using SICOSYS as the interconnection network simulator. The tool relies on Simics® to simulate the full system activity, including I/O. Consequently, we now have a complete framework to perform full-system simulation with precise processor/memory/network details.
Using the complete tool, we are able to evaluate OS-dependent workloads, such as transactional or commercial applications, with a high level of confidence. Moreover, the development of new benchmarks has been greatly simplified. GEMS is actively supported by the group of Prof. Mark D. Hill at the University of Wisconsin-Madison, and has more than one hundred registered users. Many multiprocessor researchers use this tool in their recent papers. Learning and later integrating GEMS was possible through a direct collaboration with the original developers. One of the members of the U. Cantabria (Valentín Puente) spent one sabbatical year at UW-Madison collaborating with the group of Prof. Mark D. Hill.

Although not as expensive as a hardware simulator in terms of memory and CPU utilization, SICOSYS is still a heavyweight tool. It is not possible, using current hardware, to run this tool to simulate a network with thousands of routers. For this reason, we are working on a complementary, lightweight simulation environment called INSEE. It is less precise than SICOSYS in terms of timing, but allows us to experiment with very large networks. The two main modules of INSEE are a network simulator (FSIN) and a traffic generation module (TrGen).

Regarding improvements to FSIN, the most relevant one has been the extension of its capabilities to deal with many different topologies, including k-ary n-cubes and trees. An important redesign has allowed us to describe all its behavior in a single configuration file, so a single binary can simulate many different architectural proposals. Output capabilities have also been enhanced, generating files that can be imported directly by data analysis tools.

The traffic generation capabilities of TrGen have also been greatly improved. Originally, it was only able to generate traffic using a small selection of synthetic traffic patterns. In the current version, the selection is wider.
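Synthetic traffic patterns of the kind mentioned above are usually simple mappings from a source node to a destination node. The following Python sketch illustrates three patterns that are common in the interconnection-network literature (uniform random, bit-complement and transpose), together with a Bernoulli injection process. It is an illustration only, not TrGen code: all function names and parameters are hypothetical, and the report does not state which patterns TrGen implements.

```python
import random


def uniform(src, n_nodes, rng):
    """Uniform random traffic: any node other than the source, equally likely."""
    dst = rng.randrange(n_nodes - 1)
    return dst if dst < src else dst + 1


def bit_complement(src, n_nodes, rng=None):
    """Bit-complement traffic: flip every address bit (n_nodes must be a power of two)."""
    return (n_nodes - 1) ^ src


def transpose(src, n_nodes, rng=None):
    """Transpose traffic on a k x k layout: node (x, y) sends to node (y, x).

    Nodes on the diagonal map to themselves; a real generator would have to
    decide how to treat them.
    """
    k = int(round(n_nodes ** 0.5))
    x, y = src % k, src // k
    return x * k + y


def generate(pattern, n_nodes, load, cycles, seed=0):
    """Yield (cycle, src, dst); each node injects with probability `load` per cycle."""
    rng = random.Random(seed)
    for cycle in range(cycles):
        for src in range(n_nodes):
            if rng.random() < load:
                yield cycle, src, pattern(src, n_nodes, rng)
```

For example, `list(generate(bit_complement, 16, 0.1, 1000))` would produce a list of injection events that a simulator could consume; the independent, memoryless sources used here are the "classic" assumption that the burst mode described later in this report relaxes.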
We have also added the capability to inject traffic into the FSIN simulator using traces obtained from actual applications. This has allowed us to perform more realistic performance evaluations. Finally, TrGen allows FSIN to interact with a full-system simulator, Simics®, allowing simulations with workloads generated by actual, running applications. The details of these tools are described in [2, 3, 4, 5].

The availability of these tools has allowed us to perform experiments in the fields of topologies for interconnection networks [6] and congestion control [7, 8, 9]. Also, experimentation with different ways of injecting traffic into the network has allowed us to detect some pitfalls in some “classic” approaches; that is the reason we propose new ways of generating traffic in [10, 11].

Note that the development of the INSEE tools has been carried out at the UPV/EHU, but input for their design and use has come from cooperation with the group at the U. of Cantabria, and with Dr. Cruz Izu of the University of Adelaide.

Recently we have started a line of collaboration with the BSC/CNS and IBM Research Laboratories–Zurich. One of our immediate objectives is to share our knowledge on simulation tools and application characterization using traces. Later, we will work on topologies and scheduling strategies for large-scale high-performance supercomputers. We have held several meetings hosted by the BSC/CNS, and one of the members of the UPV/EHU team (Javier Navaridas, a PhD student under the supervision of Dr. Miguel-Alonso) visited the Zurich laboratories from November 2006 to February 2007.
Concerning parallel workloads, we are actively using the NAS Parallel Benchmarks (OpenMP or MPI versions), older benchmarks such as SPLASH 1 & 2, and some transactional benchmarks such as static web applications (SURGE) or Online Transaction Processing (SPECjbb).

Regarding traffic modeling, we have been working on different ways of providing realistic traffic to our network simulators. This has been done using three different approaches:

i) Full-system simulation: A collection of machines is simulated using Simics®, running unmodified MPI applications. The network simulator performs the interchange of messages.
ii) Trace-based simulation: Instrumented versions of MPI applications are run on actual machines (our own clusters, or machines such as BSC/CNS Mare Nostrum). Detailed traces of the interchanged messages are obtained. These traces are used to feed simulators. The challenge here is to respect causal relationships.
iii) Synthetic traffic: The usual way of working is to consider a collection of independent traffic sources that use probability distribution functions to compute message sizes, destinations, and inter-generation times. We have introduced a "burst" mode in which traffic sources are not totally independent, which reflects the coupled nature of most parallel applications.

A recent paper submitted to EuroPar 2007 [12] compares the last two options. Paper [2] evaluates the first one.

2.2. Topological Issues

The objectives achieved within this work package can be grouped around two topics: distance-reduced practicable topologies and algebraic codes.

With respect to topologies, dozens of parallel computers of different sizes have been designed around Torus interconnection networks. Typically, a 2D Torus arranges its N nodes in a square Mesh with wraparound links. Above a few thousand nodes, it has been shown that parallel computers should use 3D topologies, with a cubic 3D Torus being the most desirable solution.
Parallel machines such as the HP GS1280, based on the Alpha 21364, and the Cray X1E vector computer have used two-dimensional Tori. Others, such as the Cray T3D and T3E, have used three-dimensional Tori built by stacking several 2D Torus planes. The IBM BlueGene is a notable example of a massively parallel computer that joins 2^6×2^5×2^5 (that is, 64×32×32) nodes in a mixed-radix 3D Torus.

Each dimension of a Torus network may have a different number of nodes, leading to rectangular and prismatic topologies for two and three dimensions respectively. These topologies are denoted as mixed-radix networks. Mixed-radix Tori are often built for practical reasons of packaging and modularity. The HP GS1280 employs a 2D rectangular network and the IBM BlueGene a 3D prismatic one. However, mixed-radix Tori have two important drawbacks: first, they are no longer edge-symmetric; second, their distance-related network parameters (diameter and average distance) are quite far from the optimum values of square and cubic topologies. The edge asymmetry introduces load imbalance in these networks, and for many traffic patterns the load on the longer dimensions is larger than the load on the shorter ones. In addition, maximum and average delays are too long, as they depend on the poor values of diameter and average distance exhibited by these networks.

During this project we have analyzed and introduced a family of 2D toroidal networks which includes standard and twisted Tori configured as square or rectangular networks. We have proposed the use of the Gaussian integers as the appropriate tool for defining, analyzing and exploiting this set of networks. By using this algebraic tool we have solved some interesting problems over this type of network, such as optimal unicast and broadcast routing and the perfect placement of resources. Some of the results of this research line have been published in several papers and others have been submitted for publication.
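The distance penalty of rectangular Tori discussed above can be checked numerically on small instances. The following Python sketch computes the diameter and average distance of an a×b 2D Torus by breadth-first search, optionally shifting the wraparound links of the Y dimension by `twist` columns. The twist convention (a shift of a columns in a 2a×a network) and all names are assumptions made for illustration; they are not taken from the simulators or from the definitions used in the cited papers.

```python
from collections import deque


def torus_neighbors(x, y, a, b, twist=0):
    """Neighbors of node (x, y) in a torus with a columns and b rows.

    With twist > 0, the wraparound links of the Y dimension are shifted
    by `twist` columns; twist = 0 gives the standard torus.
    """
    yield ((x + 1) % a, y)
    yield ((x - 1) % a, y)
    if y + 1 < b:
        yield (x, y + 1)
    else:
        yield ((x + twist) % a, 0)       # upward wraparound link
    if y - 1 >= 0:
        yield (x, y - 1)
    else:
        yield ((x - twist) % a, b - 1)   # downward wraparound link


def distance_stats(a, b, twist=0):
    """Return (diameter, average distance) computed by BFS from every node."""
    nodes = [(x, y) for x in range(a) for y in range(b)]
    diameter, total = 0, 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in torus_neighbors(u[0], u[1], a, b, twist):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        diameter = max(diameter, max(dist.values()))
        total += sum(dist.values())
    return diameter, total / (len(nodes) * (len(nodes) - 1))
```

Calling `distance_stats(8, 4)` and `distance_stats(8, 4, twist=4)` compares a standard 8×4 Torus with a twisted one of the same size; the exhaustive search is only practical for small networks, which is precisely why an algebraic model is attractive for large ones.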
Most of the material generated on these toroidal networks has been collected in [38]. In [13], we have considered practicable layouts for this kind of network. Also, in [14] we have explored a particular network case as a suitable topology for on-chip networks. In [47, 48] the routing problem over circulant graphs has been revisited.

Moreover, we have proposed and analyzed alternative mixed-radix 2D and 3D Torus topologies that avoid the two problems mentioned above by adequately twisting the wraparound links of one or two network dimensions. The performance exhibited by these twisted Tori is notably higher than that obtained with their standard mixed-radix counterparts. Some of the results of this research line have been published in a recently accepted paper [6].

With respect to the field of Algebraic Codes, the design of error-correcting codes for two-dimensional signal spaces has recently been considered in the technical literature. Hamming and Lee distances have been shown to be inappropriate metrics for QAM signal sets and other related constellations. To our knowledge, the first author who modeled certain QAM constellations with quotient rings of Gaussian integers was Klaus Huber. In his papers, Huber introduced a new distance for use in QAM-like constellations, denoted as the Mannheim metric. The rationale behind this metric is to consider the Manhattan (or taxicab) metric modulo a two-dimensional grid. Based on this seminal work, we have proposed perfect codes for different multidimensional signal spaces. To solve these problems, we have introduced an original relationship among the fields of Graph Theory, Number Theory and Coding Theory. One of our main findings is the proposal of a suitable metric over quadratic, hexagonal and four-dimensional constellations of signal points.
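The idea of a Manhattan distance taken modulo a two-dimensional grid can be made concrete with a few lines of integer arithmetic. The following Python sketch reduces a Gaussian integer x + yi modulo π = a + bi by rounding the quotient to the nearest Gaussian integer, and takes the Manhattan norm of the residue as its weight. This rounding-based reduction is the usual textbook formulation of Huber's construction and is stated here as an assumption; it is not the graph-based metric proposed in this project, and the function names are hypothetical.

```python
def round_div(n, d):
    """Divide n by d > 0 and round to the nearest integer (ties round up)."""
    return (2 * n + d) // (2 * d)


def gaussian_mod(x, y, a, b):
    """Reduce alpha = x + yi modulo pi = a + bi; return the residue (re, im)."""
    norm = a * a + b * b
    # alpha * conj(pi) = (x*a + y*b) + (y*a - x*b)i; divide by norm and round.
    qr = round_div(x * a + y * b, norm)
    qi = round_div(y * a - x * b, norm)
    # residue = alpha - q*pi, where q*pi = (qr*a - qi*b) + (qr*b + qi*a)i
    return x - (qr * a - qi * b), y - (qr * b + qi * a)


def mannheim_weight(x, y, a, b):
    """Manhattan norm of the residue of x + yi modulo a + bi."""
    re, im = gaussian_mod(x, y, a, b)
    return abs(re) + abs(im)


def mannheim_distance(u, v, pi):
    """Weight of the difference u - v; u, v and pi are (re, im) pairs."""
    return mannheim_weight(u[0] - v[0], u[1] - v[1], pi[0], pi[1])
```

With π = 3 + 2i (norm 13), the integer 5 reduces to the residue i, so `mannheim_distance((5, 0), (0, 0), (3, 2))` is 1 even though 5 and 0 are five steps apart on the integer line: the 13 symbols are arranged as a compact two-dimensional constellation around the origin.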
The proposed metric is the distance between vertices of a new class of Cayley graphs defined over integer rings, which includes Gaussian, Eisenstein-Jacobi and Lipschitz graphs. Hence, such graphs represent mathematical models of the multidimensional constellations under study. A problem in Graph Theory known as the perfect dominating set calculation has been solved over these graphs, and a sufficient condition for obtaining such a set has been given for each case. Obtaining these dominating sets leads directly to the construction of perfect codes for the alphabets under consideration.

Papers concerning codes over Gaussian integers have been published in the proceedings of notable Information Theory conferences or are under review in journals. Specifically, in [39] and [40] perfect codes over quotient rings of Gaussian integers are considered. The metric applied to these codes is the distance induced by a circulant Gaussian graph. Also, the weight distribution of these circulants has been presented in [41]. In addition, perfect codes over any quotient ring of Gaussian integers are presented in [42]. In that paper we have also shown that Lee perfect codes are a particular case of our Gaussian perfect codes.

In [39], 1-perfect error-correcting codes over hexagonal constellations modeled by quotient rings of Eisenstein-Jacobi integers have been introduced. Recently, in [43], a method for finding certain perfect t-dominating sets over degree-six circulant graphs has been considered.

A preliminary approach to degree-eight graphs and codes over them is being considered. In this case, the graphs will be built over certain subsets of the integer quaternions. Perfect 1-dominating sets are obtained, and perfect codes over these sets are compared, in some cases, with perfect Lee codes, which we have considered in [44]. Most of this work corresponds to the PhD dissertation of C. Martínez [50].

2.3. Architectural Proposals

Among the different tasks raised in this objective, most effort has been directed towards the improvement of packet routers and the increase of the fault-tolerant capabilities of the network. Very significant results were obtained in [15] (not completely explored yet), giving rise to a new strategy to implement adaptive routing in irregular networks. This method relies on the use of a deadlock avoidance mechanism developed by our groups in previous projects.

Another aspect in which our work has produced results of impact has been the development of a new fault-tolerant mechanism for medium and large size k-ary n-cube networks [16, 17]. This mechanism makes it possible to tolerate, at network level, every time-space combination of failures, provided that the network topology remains connected. The application of this new mechanism to other network structures can give new interesting results. In [45], we present an exhaustive evaluation of the proposal under different realistic scenarios.

Work has also been carried out on many other issues related to hardware support for improving communication operations inside the router. New solutions to several congestion problems have been proposed in [18, 19, 20], with special emphasis on a phenomenon discovered (or at least published for the first time) by our groups: the instability of channel occupation in networks whose diameter exceeds twenty nodes [21].

Additionally, with respect to network interfaces, an in-depth analysis of the main characteristics of the network interfaces currently on the market (and of those with future prospects) has been carried out in collaboration with the BSC/CNS.

It is worth remarking that an important part of the challenges of interconnection networks has moved inside the chip.
The ability to integrate several processors inside the same chip has turned the interconnection network into a key element for the performance of new multi-core architectures. For this reason, our groups are making an effort to acquire the expertise and tools necessary to make contributions in interconnection mechanisms suitable for this new environment. As a consequence of this effort, a line of collaboration with the BSC/CNS has produced significant results [22, 23] on kilo-instruction processors as a way of removing bottlenecks in the access to shared resources. Specifically, a new router structure for this type of system has been developed whose performance surpasses that of the architectures used so far. These results have been submitted for publication [46].

Finally, we want to mention that, in cooperation with Dr. Izu of the U. of Adelaide, we have studied some congestion-control mechanisms that can be incorporated in large-scale interconnection networks. In particular, we have studied the mechanism included in IBM’s BlueGene/L (which gives priority to in-transit traffic, even at the cost of delaying new injections), and have proposed and studied LBR (Local Buffer Restriction), an extension of the Bubble routing mechanism to adaptive virtual channels. Results of these studies are summarized in [7, 8, 9].

2.4. Parallel Applications

One of the objectives of this project was to perform, in cooperation with the Intelligent Systems Group of the UPV/EHU, the parallelization of a particular class of bio-inspired algorithms called EDAs (Estimation of Distribution Algorithms). These algorithms have proven very successful at solving complex optimization problems, but are very CPU-consuming. If we were able to accelerate them via parallelization, we could reach better solutions in shorter times, and/or explore more solutions. This is precisely what we have done.
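For readers unfamiliar with EDAs, the general loop (sample a population from a probability model, select the best individuals, re-estimate the model from them) can be sketched in a few lines of Python. The example below is UMDA, one of the simplest EDAs, applied to the toy OneMax problem. It is a deliberately simplified stand-in: the EDAs parallelized in this project are based on probabilistic graphical models, and the sketch does not reproduce the parallelization described in [24]. All names and parameter values are illustrative assumptions.

```python
import random


def onemax(bits):
    """Toy fitness function: the number of ones in the bit string."""
    return sum(bits)


def umda(n_bits=20, pop_size=50, n_select=25, generations=30, seed=0):
    """Univariate Marginal Distribution Algorithm on OneMax; returns the best individual seen."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits          # model: one independent probability per bit
    best = None
    for _ in range(generations):
        # Sampling: each individual is drawn independently from the model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        # Evaluation and selection: individuals are scored independently,
        # so sampling and scoring could be distributed across workers.
        ranked = sorted(population, key=onemax, reverse=True)
        if best is None or onemax(ranked[0]) > onemax(best):
            best = ranked[0]
        selected = ranked[:n_select]
        # Learning: re-estimate each marginal probability from the selected set.
        probs = [sum(ind[i] for ind in selected) / n_select
                 for i in range(n_bits)]
    return best
```

In more elaborate EDAs the learning step builds a graphical model of dependencies between variables rather than a vector of independent marginals, and which step dominates the run time depends on the algorithm and on the cost of the fitness function.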
The details of the way EDAs work, and how parallelization has been done, are in [24]. This paper also includes a preliminary performance evaluation of the parallel solutions, extended in [25].

We have collaborated with the group of Prof. Ubide at the Faculty of Chemistry of the UPV/EHU in order to apply our fast, parallel version of EDAs to a complex problem in quantitative chemistry: the creation of multivariate calibration models for chemical applications. The problem consists of obtaining data (light spectra) of the concentration of species taking part in controlled chemical reactions, selecting the most relevant data, and creating a prediction model. This model can be fed with measured data from new reactions with unknown concentrations of the same species; the model should predict those concentrations. The application of EDAs has proven to be a very successful approach, because of the quality of the models obtained. In [26, 27, 28] we discuss the approaches to the problem traditionally used by chemists, and compare them with new approaches based on the “intelligent” search in the space of solutions performed by an EDA. In [29] we extend this work by evaluating different mechanisms of input-data reduction, and different EDAs.

As a result of this work, Alexander Mendiburu presented his PhD dissertation in January 2006 [30] under the supervision of Dr. Jose Miguel-Alonso and Dr. Jose Antonio Lozano.

In the field of grid computing, we have been working on economic models to schedule applications in a grid environment. The results of this work are in [31]. We have extended the grid architecture with a collection of modules that, when launching an application, select resources taking into account not only availability and adequacy, but also a cost model. The implementation has been done using a Globus 4 environment [www.globus.org/toolkit] and the GridWay [www.gridway.org] meta-scheduler.
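The selection criterion described above (availability, adequacy and cost) can be illustrated with a small Python sketch. Everything in it is hypothetical: the `Resource` and `Job` fields, the linear price model and the "cheapest feasible resource" policy are assumptions chosen for illustration, and none of it corresponds to the Globus or GridWay interfaces or to the modules described in [31].

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    name: str
    free_cpus: int               # availability
    price_per_cpu_hour: float    # hypothetical cost unit


@dataclass
class Job:
    cpus: int                    # adequacy requirement
    est_hours: float             # estimated run time
    budget: float                # maximum amount the user is willing to pay


def job_cost(job: Job, res: Resource) -> float:
    """Linear cost model: CPUs x hours x unit price."""
    return job.cpus * job.est_hours * res.price_per_cpu_hour


def select_resource(job: Job, resources: List[Resource]) -> Optional[Resource]:
    """Return the cheapest resource that is adequate and within budget, if any."""
    feasible = [r for r in resources
                if r.free_cpus >= job.cpus and job_cost(job, r) <= job.budget]
    return min(feasible, key=lambda r: job_cost(job, r), default=None)
```

A real economic scheduler would also have to account for data transfer, queue waiting times and price changes over time; the sketch only shows how a cost term can be added to the usual availability and adequacy filters.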
Another objective of this project was to perform, in cooperation with the IFCA (Instituto de Física de Cantabria - Institute of Physics of Cantabria), the parallelization of an Artificial Neural Network algorithm. Currently, a multi-layer perceptron (MLP) is used in the search for the Higgs boson at CERN (European Nuclear Research Centre). MLPfit is a sequential application for the design, implementation and use of multi-layer perceptrons. It was developed by DAPNIA, which depends on the French Commission of Atomic Energy, and adopted by CERN for the implementation of artificial neural networks for High-Energy Physics. The details of how the parallelization has been carried out are in [32]. This work has also produced several Master's theses [33, 34, 35, 36]. One of these students has joined our group as a PhD student.

In addition, a formal collaboration with IFCA has been established under a new contract [37] for participating in the Crossgrid initiative. Crossgrid is a European project whose objective is the creation, management and exploitation of a Europe-wide Grid computing environment, permitting the interactive use of applications that are extremely intensive in terms of calculation and data. The applications making use of the Crossgrid project’s infrastructure are related to the fields of biomedicine, meteorology and High-Energy Physics. Also, in [49] we have presented an evaluation of openMosix environments.

3. Indicators of results

Collaboration with research groups:
- Dr. Cruz Izu – Dep. of Computer Science, U. of Adelaide, Australia.
- Prof. Ernst Gabidulin. Moscow Institute of Physics and Technology. Russia.
- Prof. Mark D. Hill, UW-Madison. USA.
- Prof. Mateo Valero and Jesús Labarta. BSC/UPC.
- Dr. Carlos Ubide – Faculty of Chemistry, UPV/EHU.
- Intelligent Systems Group – Computer Science Sc., UPV/EHU (www.sc.ehu.es/isg)
- BSC / CNS (www.bsc.es)
- IBM Research Laboratories–Zurich (www.zurich.ibm.com)
- I2Bask – Research Network of the Basque Country (www.i2bask.net)
- IFCA (CSIC/U. Cantabria) (www.ifca.unican.es)

Patents: “Mecanismo de encaminamiento tolerante a fallos altamente escalable”. Inventors: Valentín Puente Varona, José Ángel Gregorio Manasterio, Fernando Vallejo Alonso, Ramón Beivide Palacio. U. Cantabria. Request #: P200500530. Request date: 01-03-05.

PhD dissertations:
- Dr. Alex Mendiburu (supervised by Dr. Miguel-Alonso) UPV/EHU
- Dr. Carmen Martínez (supervised by Dr. Ramón Beivide) U. Cantabria

Current PhD students: 2 at the UPV/EHU, 3 at U. of Cantabria, 1 at U. of Burgos
Master’s theses: 1 at the UPV/EHU, 4 at U. of Cantabria
Participation in other projects: HiPEAC Network of Excellence (European Project IST-004408)
Articles in international journals: 7 accepted plus 2 submitted
Papers in international conferences: 21 accepted and 3 submitted
Papers in national conferences: 7 accepted
Other publications (technical reports): 3

4. References

1. P. Prieto, V. Puente, P. Abad and J.Á. Gregorio, "Paralelización de un Entorno de Evaluación de Redes de Interconexión", XVII Jornadas de Paralelismo, 2006.
2. Javier Navaridas, Fco. Javier Ridruejo, J. Miguel-Alonso. "Evaluation of Interconnection Networks Using Full-System Simulators: Lessons Learned". Proc. 40th Annual Simulation Symposium, Norfolk, VA, March 26-28, 2007.
3. F.J. Ridruejo, J. Miguel. "Simulación de redes de interconexión utilizando tráfico real". Actas de las XVI Jornadas de Paralelismo. Thomson, 2005 (ISBN 84-9732-430-7). Pages: 109-116.
4. F.J. Ridruejo, J. Miguel-Alonso. "INSEE: an Interconnection Network Simulation and Evaluation Environment". Lecture Notes in Computer Science, Volume 3648 / 2005 (Proc. Euro-Par 2005), Pages 1014-1023.
5. F.J. Ridruejo, A. Gonzalez, J. Miguel-Alonso. "TrGen: a Traffic Generation System for Interconnection Network Simulators". International Conference on Parallel Processing, 2005. 1st Int. Workshop on Performance Evaluation of Networks for Parallel, Cluster and Grid Computing Systems (PEN-PCGCS'05). ICPP 2005 Workshops. 14-17 June 2005. Pages: 547-553.
6. Jose M. Camara, Miquel Moreto, Enrique Vallejo, Ramon Beivide, Jose Miguel-Alonso, Carmen Martinez, Javier Navaridas. "Mixed-radix Twisted Torus Interconnection Networks". Proc. 21st IEEE International Parallel & Distributed Processing Symposium IPDPS '07, Long Beach, CA, March 26-30, 2007.
7. F.J. Ridruejo, J. Navaridas, J. Miguel-Alonso, C. Izu. "Evaluation of Congestion Control Mechanisms using Realistic Loads". Actas de las XVII Jornadas de Paralelismo. Albacete, September 2006.
8. C. Izu, J. Miguel-Alonso, J.A. Gregorio. "Effects of injection pressure on network throughput". Proc. PDP'06 - 14th Euromicro Conference on Parallel, Distributed and Network-based Processing. Montbéliard-Sochaux, France, 15-17 Feb. 2006. Pp. 91-98.
9. J. Miguel-Alonso, C. Izu, J.A. Gregorio. "Improving the Performance of Large Interconnection Networks using Congestion-Control Mechanisms". Technical report EHU-KAT-IK-06-05. Department of Computer Architecture and Technology, The University of the Basque Country.
10. C. Izu, J. Miguel-Alonso, J.A. Gregorio. "Evaluation of Interconnection Network Performance under Heavy Non-uniform Loads". Lecture Notes in Computer Science, Volume 3719 / 2005 (Proc. ICA3PP 2005), Pages 396-405.
11. J. Miguel-Alonso, F.J. Ridruejo, C. Izu, J.A. Gregorio and V. Puente. "Are independent traffic sources suitable for the evaluation of interconnection networks?". Technical report EHU-KAT-IK-07-05. Department of Computer Architecture and Technology, The University of the Basque Country.
12. F.J. Ridruejo, J. Navaridas, J. Miguel-Alonso and Cruz Izu. "Realistic Evaluation of Interconnection Network Performance at High Loads". Submitted to EuroPar 2007.
13. E. Vallejo, R. Beivide and C. Martinez, "Practicable Lay-outs for Optimal Circulant Graphs". The 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP'05).
14. C. Martínez, E. Vallejo, R. Beivide, C. Izu and M. Moretó, "Dense Gaussian Networks: Suitable Topologies for On-Chip Multiprocessors", International Journal of Parallel Programming, Vol. 34, pp. 193-21, June 2006.
15. V. Puente, J.A. Gregorio, F. Vallejo, R. Beivide and C. Izu, "High-performance adaptive routing for networks with arbitrary topology", Journal on System Architecture, Vol. 52, Issue 6, June 2006, pp. 345-358.
16. V. Puente, J.A. Gregorio, F. Vallejo and R. Beivide, "Immunet: A Cheap and Robust Fault-Tolerant Packet Routing Mechanism", International Symposium on Computer Architecture (ISCA-2004).
17. V. Puente, J.A. Gregorio, "Immucube: Scalable Fault Tolerant Routing for k-ary n-cube Networks". Accepted for publication in IEEE Transactions on Parallel and Distributed Systems.
18. C. Izu, R. Beivide and J.A. Gregorio, "The Case of Chaotic Routing Revisited", WDDD Workshop, in conjunction with the International Symposium on Computer Architecture (ISCA-2004).
19. C. Izu and R. Beivide, "Understanding Buffer Management for Cut-through 1D Rings", EURO-PAR 2004.
20. J. Miguel-Alonso, J.A. Gregorio, V. Puente, F. Vallejo, R. Beivide, P. Abad, "Desequilibrio de Carga en Redes k-ary n-cube", XV Jornadas de Paralelismo (2004).
21. J. Miguel-Alonso, J.A. Gregorio, V. Puente, F. Vallejo and Ramón Beivide, "Load unbalance in k-ary n-cube Networks", EURO-PAR 2004.
22. E. Vallejo, M. Galluzzi, A. Cristal, F. Vallejo, R. Beivide, P. Stenström, J.E. Smith and M. Valero, "Implementing Kilo-Instruction Multiprocessors", IEEE International Conference on Pervasive Services 2005 (ICPS'05); Santorini, Greece.
23. E. Vallejo, F. Vallejo, R. Beivide, M. Galluzzi, A. Cristal, P. Stenström, J.E.
Smith, M.Valero, “KIMP: Multicheckpointing Multiprocessors”, XVII Jornadas de Paralelismo, 2005. TIN2004-07440-C02 24. A. Mendiburu, J.A. Lozano, J. Miguel-Alonso. "Parallel implementation of EDAs based on probabilistic graphical models". IEEE Transactions on Evolutionary Computation, Aug. 2005 Vol. 9, N. 4, Pages 406 - 423. 25. A. Mendiburu, J. Miguel-Alonso and J.A. Lozano. "Implementation and performance evaluation of a parallelization of Estimation of Bayesian Network Algorithms". Parallel Processing Letters, Vol. 16, No. 1 (2006) 133-148. 26. A. Mendiburu, J. Miguel-Alonso, J.A. Lozano, "Evaluation of Parallel EDAs to Create Chemical Calibration Model", e-science, p. 118, Second IEEE International Conference on e-Science and Grid Computing (e-Science'06), 2006. 27. A. Mendiburu, J. Miguel-Alonso, J.A. Lozano, M. Ostra and C. Ubide, "Parallel EDAs to create multivariate calibration models for quantitative chemical applications", Journal of Parallel and Distributed Computing, Volume 66, Issue 8, Parallel Bioinspired Algorithms, August 2006, Pages 1002-1013. 28. A. Mendiburu, J. Miguel-Alonso, J. A. Lozano, M. Ostra and C. Ubide. "Calibración multivariante de reacciones químicas utilizando EDAs paralelos". Actas del IV Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB'2005) II. M.G. Arenas et al. (editors). Thomson, 2005 (ISBN 84-9732-467-6). Pages: 881 - 888. 29. A. Mendiburu, J. Miguel-Alonso, J. A. Lozano, M. Ostra and C. Ubide. "Parallel and Multi-objective EDAs to create multivariate calibration models for quantitative chemical applications". International Conference on Parallel Processing, 2005. 1st. Int. Workshop on Parallel Bioinspired Algorithms. ICPP 2005 Workshops. 14-17 June 2005 Page(s): 596603 30. Alexander Mendiburu Alberro. “Parallel implementation of Estimation of Distribution Algorithms based on probabilistic graphical models. Application to chemical calibration models”. 
PhD thesis, Department of Computer Architecture and Technology, UPV/EHU, January 2006. 31. Jose Antonio Pascual Saiz. Sistema de compraventa de recursos para Grid Computing. Proyecto Fin de Carrera dirigido por J. Miguel-Alonso. Facultad de Informática UPV/EHU, 2006. 32. Gutierrez, Alejandro, Cavero, Francisco, Menéndez de Llano, Rafael. Parallelization of a Neural Net Training Program in a Grid Environment. Proceedings of PDP 2004 by IEEE Computer Society. 2004. 33. Pablo Abad. Análisis Y Evaluación De Sistemas De Colas En Entornos De Computación De Alto Rendimiento. Proyecto Fin de Carrera dirigido por Rafael Menéndez de Llano. Universidad de Cantabria. 2004. 34. Alejandro Gutierrez. Implementación Y Evaluación De La Paralelización De Una Aplicación De Física De Altas Energías Basada En Redes Neuronales Artificiales Para La Detección Del Bosón De Higgs. Proyecto Fin de Carrera dirigido por Rafael Menéndez de Llano. Universidad de Cantabria. 2004. 35. David Hernández Sanz. Instalación, Análisis Y Evaluación De Un Cluster Openmosix Para Entornos De Alta Productividad Y De Alto Rendimiento. Proyecto Fin de Carrera dirigido por Rafael Menéndez de Llano. Universidad de Cantabria. 2005 36. Víctor Sancibrián Grijuela. Evaluación De Distintos Bancos De Pruebas Para Diferentes Arquitecturas De Supercomputadores. Proyecto Fin de Carrera dirigido por Rafael Menéndez de Llano. Universidad de Cantabria. 2005. TIN2004-07440-C02 37. “Development of Grid Environment for Interactive Applications (CROSSGRID)” Subcontrato de IST-2001-32243. CSIC / Universidad de Cantabria. Investigador principal: Rafael Menéndez de Llano Rozas. 2004. 38. C. Martínez, R. Beivide, E. Stafford, M. Moretó and E. Gabidulin. ``Modeling Toroidal Networks with the Gaussian Integers". Submitted to IEEE Transactions on Computers. 2007. 39. C. Martínez, R. Beivide, and E. Gabidulin. “Perfect Codes from Circulant Graphs”. Accepted for publication under changes. IEEE Transactions on Information Theory. 2007. 
40. C. Martínez, R. Beivide, J. Gutierrez and E. Gabidulin, "On the Perfect t-Dominating Set Problem in Circulant Graphs and Codes over Gaussian Integers", 2005 IEEE International Symposium on Information Theory (ISIT 2005), Adelaide, Australia.
41. E. Gabidulin, C. Martínez, R. Beivide and J. Gutierrez, "On the Weight Distribution of Gaussian Graphs with an Application to Coding Theory", Eighth International Symposium on Communication Theory and Applications (ISCTA '05), Ambleside, Lake District, UK.
42. C. Martínez, M. Moretó, R. Beivide and E. Gabidulin, "A Generalization of Perfect Lee Codes over Gaussian Integers", 2006 IEEE International Symposium on Information Theory (ISIT'06).
43. C. Martínez, E. Stafford, R. Beivide and E. Gabidulin, "Perfect Codes over Eisenstein-Jacobi Graphs", Proc. of Tenth Int. Workshop on Algebraic and Combinatorial Coding Theory (ACCT-10), 2006.
44. C. Martínez, E. Stafford, R. Beivide and E. Gabidulin. "Perfect Codes over Lipschitz Integers". Submitted to 2007 IEEE International Symposium on Information Theory.
45. V. Puente, J.A. Gregorio, F. Vallejo and R. Beivide, "Dependable Routing in Interconnection Networks of Flexible Topology". Submitted to IEEE Transactions on Computers, 2007.
46. P. Abad, V. Puente, P. Prieto and J.A. Gregorio, "Rotary Router: An Efficient Architecture for CMP Interconnection Networks". Submitted to ISCA 2007, San Diego, California, USA, 2007.
47. D. Gomez-Perez, J. Gutierrez, A. Ibeas, C. Martinez, "On routing on circulant graphs of degree four", International Symposium on Symbolic and Algebraic Computation 2005.
48. D. Gómez, J. Gutierrez, A. Ibeas, C. Martínez and R. Beivide, "On Finding a Shortest Path in Circulant Graphs with Two Jumps", The Eleventh International Computing and Combinatorics Conference (COCOON 2005), Kunming, Yunnan, P.R. China.
49. D. Hernández, R. Menéndez de Llano, P. Abad, C. Izu. "Evaluación de un cluster OpenMosix para entornos de alta productividad y de alto rendimiento", XVI Jornadas de Paralelismo, 2005.
50. C. Martínez. "Codes and Graphs over Complex Integer Rings". PhD Dissertation. To be defended on 12 March 2007.