Preface to the Exam in Software Systems Analysis and Design

After completing his bachelor's degree last year, Danny went to work for a company that provides cloud computing and data storage services (such as Amazon Cloud). The company is now developing a monitoring mechanism for its servers that will collect extensive information about the activity and state of every server. This information will be used to identify faults, detect anomalies, and so on. The mechanism is called Distributed Computer Monitor System (DCMS). Danny will join the team that develops DCMS. He has just received the system documents from his team leader so that he can start getting up to speed. Although the documents do not describe the most up-to-date version of the system and even contain errors, they are sufficient for understanding the structure of DCMS. Danny wants to show that he is a diligent and capable employee. Help him understand the design of DCMS, find the errors in the existing documents, and improve the system where possible.

A detailed description of DCMS follows. The document includes both a verbal description of the system and the requirements. All questions are clearly marked by numbering (1.). Your answers must be written legibly and only in the designated space. It is recommended to first write the answers and draw the diagrams in a draft notebook, and to copy the final answers to the form about half an hour before the end of the exam. An answer of "don't know" earns 20% of the value of the item. The questionnaire is written in the masculine form but refers to both genders equally. A total of 110 points can be earned on the exam.

Distributed Computer Monitoring System (DCMS)

1 Summary

This system is used to collect information about the use of hardware resources (CPU, memory, disk, and network) and software resources (file handles and modules) of each server in a computational cluster in real time, and to save that information in a remote central database. The main function of DCMS is monitoring and collecting various information on processes, events, log data, and more for every computer in an organization. The agent, which can be installed on each server in the computational cluster, allows the system administrator to determine which information is collected about each server, to collect that information, and to save it in central data storage. Later, the collected information will be used to analyze the performance of each computer and of the computational cluster as a whole.
The agent deployment should be dynamic, in the sense that the administrator can add and remove agents while the system is operating.

Figure 1: Central storage, local storage, servers, and agents (left to right)

2 Stakeholders

System Administrator: responsible for setting up the DCMS, including the central server and the agents, configuring the agent, and analyzing the information collected by all agents.

Computational Cluster Users: usually unaware of the DCMS agent that monitors the computational servers they are working on. The agent should consume a minimal amount of resources so that it remains transparent to the cluster users.

Computational Cluster Owners: would like the Computational Cluster Users to perform their tasks without interruption. They also need to maintain the computational cluster itself and ensure the correct, flawless operation of the cluster servers. They support DCMS as a means of determining the "health" of the cluster servers.

3 Requirements

When the DCMS Client Monitor starts, it collects data about:
• Devices installed on the computer
• Local User Accounts
• Environment (computer name, operating system version, physical memory size, etc.)
• Network Adapters
• Active Processes

All active processes (those that were running before the agent started and those that were launched while the agent was running) are continuously monitored for computer resource consumption.

Data on active processes:
• CPU
• Memory
• Network usage
• File system usage
• Registry usage

Process CPU and memory consumption are retrieved using the external library System.Diagnostic. Network, file, and registry usage are tracked using the external library Event Tracing for Windows (ETW).

Information on the overall system performance is collected every second as well.

System performance measurements:
• Total CPU usage
• Total memory usage
• Total network usage

There are two methods of collecting information: 1) the agent listens to events generated by the operating system, and 2) the agent actively queries the operating system APIs.
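The difference between the two collection methods can be illustrated with a minimal Python sketch. It is not taken from the DCMS documents: the class names, field names, and the `query_cpu` callback are hypothetical stand-ins for the ETW event stream and the operating system APIs.

```python
from collections import defaultdict
from typing import Callable, Dict


class ProcessUsage:
    """Hypothetical per-process accumulator; field names are illustrative."""

    def __init__(self) -> None:
        self.network_bytes = 0   # updated by pushed events (method 1)
        self.cpu_percent = 0.0   # updated by polling (method 2)


class Collector:
    """Combines event-driven and polled collection for known processes."""

    def __init__(self, query_cpu: Callable[[int], float]) -> None:
        # query_cpu is an assumed stand-in for an operating system API call.
        self._query_cpu = query_cpu
        self.usage: Dict[int, ProcessUsage] = defaultdict(ProcessUsage)

    def on_network_event(self, pid: int, size: int) -> None:
        """Method 1: invoked whenever the operating system pushes an event."""
        self.usage[pid].network_bytes += size

    def poll(self) -> None:
        """Method 2: invoked on a timer; actively asks for current values."""
        for pid in list(self.usage):
            self.usage[pid].cpu_percent = self._query_cpu(pid)
```

In this sketch the event handler only accumulates what it is given, while `poll` pulls a fresh value for every process seen so far. This mirrors the split between ETW-tracked usage and periodically queried performance measurements.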
The agent should be configurable, in the sense that the administrator can determine the desired measurements collected by each instance of the agent.

3.1 Primary Use Case: UC1: Monitoring

Main Success Scenario

1. Initialization:
1.1. Upon first activation, the agent starts by loading the configuration settings.
1.2. All controllers and data containers are created.
1.3. According to the configuration settings, the agent collects static data about the computer.
1.4. The agent collects basic information on the processes that were active when the agent was activated.
2. Continuous collection of measurements:
2.1. The agent listens to system events, using the external library Event Tracing for Windows (ETW), and records the events in dedicated buffers according to the filter configuration settings.
2.2. The buffers are sampled every second in order to generate process behavior statistics.
2.3. The operating system is queried for process performance measurements every second.
2.4. The operating system is queried for system performance measurements every second.
2.5. Generated measurement records are aggregated in memory.
3. Send to server:
3.1. Every 10 minutes, the agent packs the collected data in files.
3.2. The data files are sent to the server for further analysis.

Alternative Scenarios

1.3.a. Agent collects data about the Devices
1.3.b. Agent collects data about the local User Accounts
1.3.c. Agent collects data about the local Network Adapters
2.1.a. New Process Event is received
  1. The agent records basic information on the process
    a. Command line string
    b. Process ID
    c. Parent process ID
    d. Base priority
    e. Start time
    f. User name
  2. The agent starts monitoring events related to the new active process
2.1.b. ETW events related to network usage by active processes
  1. The relevant process is identified
  2. The new event is used to update measurements associated with the process
2.1.c. ETW events related to file access by active processes
  1. The relevant process is identified
  2. The new event is used to update measurements associated with the process
2.1.d. ETW events related to registry access by active processes
  1. The relevant process is identified
  2. The new event is used to update measurements associated with the process

3.2 Assumptions and Non-Functional Requirements

• All computers are located in a local network.
• Cluster computers run 24/7.
• The system is consistent with the policies and the law.
• The agent's CPU consumption should not exceed 10%.
• No measurement data should be lost due to failures or unavailability of the communication network.

3.3 System boundary and external interfaces

The DCMS system includes monitoring agents and a central storage server. The agent interfaces with several Windows operating system interfaces:
• Win32API is used to collect data on current system performance, such as total CPU usage, memory usage, network usage, etc.
• The agent receives events from the Event Tracing for Windows (ETW) components.
• Agent configuration settings are built on the .NET System.Configuration package.
• System.Diagnostic.Process is used to collect process performance metrics, such as total CPU and memory usage.

4 Conceptual architecture

Figure 2: DCMS architecture

DCMS Server
• Receives monitoring data from clients
• Asynchronously accepts TCP connections from clients via port 43036
• Receives a compressed data file from the client and saves the file locally in a special folder
• The program monitors the special folder, and a dedicated thread is responsible for deserializing the data in the files and saving it to a local SQL Server database

DCMS Client
• Data Providers
  • Third-party components for interfacing with the Windows API
• Data Collectors
  • Collect data via data providers
  • Save the collected data in a special folder
• Data Sender
  • Sends the collected data to the server via a TCP connection

5 Static structure

5.1 DCMS Agent

Figure 3: DCMS agent class diagram

TraceEvents is an external package for working with ETW events.
System.Diagnostic can also be treated as an external library; its responsibility is to provide real-time statistics about running programs (processes).

Win32API is a class developed by the DCMS team. It encapsulates everything that is required to access the Windows API.

<<implements>> is also known as <<realizes>>.

6 Behavior analysis

Figure 4: Device manager data collection (UC1:1.3.a)

Figure 5: Sample File Usage (UC1:2.2.a)
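To complement the figures, the timing described in UC1 (records generated every second, aggregated in memory, and packed every 10 minutes) can be sketched in Python. This is only an illustration under stated assumptions: the class name, the JSON record format, and the use of gzip are hypothetical choices. The documents say only that the data is packed into files and that the server receives a compressed data file.

```python
import gzip
import json
from typing import Any, Dict, List, Optional

SAMPLE_PERIOD_SEC = 1        # sampling interval from UC1 steps 2.2 to 2.4
FLUSH_PERIOD_SEC = 10 * 60   # packing interval from UC1 step 3.1


class MeasurementAggregator:
    """Hypothetical in-memory aggregator; not part of the DCMS class diagram."""

    def __init__(self) -> None:
        self._records: List[Dict[str, Any]] = []
        self._elapsed_sec = 0

    def add_sample(self, record: Dict[str, Any]) -> Optional[bytes]:
        """Store one sampling period's record.

        Returns the packed payload once a full flush period has elapsed,
        otherwise None. JSON and gzip are assumed formats.
        """
        self._records.append(record)
        self._elapsed_sec += SAMPLE_PERIOD_SEC
        if self._elapsed_sec < FLUSH_PERIOD_SEC:
            return None
        packed = gzip.compress(json.dumps(self._records).encode("utf-8"))
        # Start a fresh aggregation window after packing.
        self._records = []
        self._elapsed_sec = 0
        return packed
```

The sketch counts sampling periods instead of reading a clock, which keeps it deterministic. A real agent would drive `add_sample` from a one-second timer and write the returned payload to the folder used by the Data Sender.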