REGEX HELPER USER MANUAL

CONTENTS
1. ABOUT REGEX HELPER
2. SYSTEM REQUIREMENTS
3. DEPLOYING REGEX HELPER
4. MAIN USER INTERFACE
5. USAGE AND FUNCTIONALITY
6. SAMPLE USE CASE (With Screenshots)

ABOUT REGEX HELPER
Regex Helper is a web-based tool that can be used to retrieve synonyms for tokens in a regular expression from a dataset. The Regex Helper interface allows the user to give a regular expression and data location as input and retrieve all synonyms for a token in the regular expression along with the corresponding matches. Further, the user can provide feedback on the relevancy of synonyms to the system to improve the set of results in future iterations. The user can continue this process until a sufficient number of synonyms are retrieved or until no more relevant synonyms are retrieved.

SYSTEM REQUIREMENTS
The Regex Helper tool has been developed in Java. It is available as a Web Archive file which is to be deployed on a Web Server.
Web Server: Apache Tomcat 7.0 (http://tomcat.apache.org/download-70.cgi)
Java: Version 1.7
Tested on Red Hat Linux Release 6.5

DEPLOYING REGEX HELPER
The package is available in the following location:
http://pages.cs.wisc.edu/~gayatrik/RegexHelper/RegexHelper.war
The application is available as a Web Archive file, RegexHelper.war. It needs to be deployed on a Web server. In Apache Tomcat 7.0, the base directory of the server installation is referred to by $CATALINA_BASE. To deploy the application, copy the web application archive file into the directory $CATALINA_BASE/webapps/. When Tomcat is started, it will automatically expand the web application archive file into its unpacked form, and execute the application that way.
Once deployed, the web application can be accessed from the browser in the following manner:
http://localhost:8080/RegexHelper

MAIN USER INTERFACE
The main user interface of the tool is as shown below:
[Screenshot: main user interface]
The inputs to the Regex Helper are:
1.
Regular Expression
   The input regular expression should indicate the word for which the synonyms are to be found. At least one seed word needs to be included in the input regular expression.
   Sample Regular Expression:
   If a rule is of the form
      (athletic|batting|fitness|work[ -]?out) gloves?
   the regular expression can be given as “(athletic|\syn) gloves?”. The token '\syn' is used to indicate the word for which synonyms are to be found. Here, “(athletic|\syn)” indicates that the seed word for which the synonyms are to be found is 'athletic'. This would find all the synonyms that match the rule and are relevant to “athletic”.
2. Data Location
   The data location is the location of the dataset. It can be a file or a directory on the local drive.
3. Additional Options:
   i. Multiline
      This option tells the tool whether the surrounding context of a synonym spans multiple lines. Setting it correctly helps in matching the synonyms that are more relevant to the current context. For example, if each line of the dataset file holds one product title, the context is NOT multiline. For a text file consisting of, say, an e-mail, the context spans several lines, so the option can be set to Multiline. The default value is false.
   ii. Number of context words
      This specifies the number of words near the synonym that are considered as the context of the synonym. The default value is 5.
   iii. Max Number of words in Synonym
      This specifies the maximum number of words that a synonym can contain. The default value is 1. If this option is set to 2, synonyms consisting of both 1 and 2 words would be retrieved.
   iv. Minimum Number of characters in Synonym
      This specifies the minimum number of characters that a synonym can contain. The default value is 2.
   v. Number of words to match if (.*) is used in expression
      For a regular expression of the form (tape|\syn).*dispensers?
, the (.*) indicates that any number of characters might be matched. This option is used to set a bound on the number of words that can match the (.*). The default value is 3. If the option is set to None, all the matches with the initial words as the synonyms would be retrieved. This is because the (.*) is greedy and tries to match as many words as possible.
4. Logging
   The process status messages are displayed in the status messages area. These logs can be saved to a file by checking the “Enable Logging” option (enabling logging for every run might generate a large number of files). At the end of the process, the report of the current run can be generated by clicking on the “Finish and Generate Report” button. The log and report files are saved under the bin/RegexTemp directory on the server. The filenames are of the form log_GUID.html and report_GUID.html respectively, where GUID is a unique identifier for the run. This GUID is displayed in the status message log when the process is submitted.

USAGE AND FUNCTIONALITY
1. Provide the regular expression and the data location as input to RegexHelper along with any necessary additional options.
2. Click on Submit for the process to start mining the dataset. The status of the process is updated in the status messages log.
3. If this log needs to be stored in a log file, enable logging before submitting the process.
4. After the results are returned, the user can provide feedback by selecting the synonyms that seem relevant. The user is shown the data that matches each synonym in order to verify its relevance.
5. After the feedback is submitted, the system provides the user with the next ten most relevant results, incorporating the feedback of the user.
6. This process can be continued until either no more relevant synonyms are retrieved or the required number of synonyms has been retrieved.
7.
When the user wants to finish the process, click on “Finish and Generate Report”. This completes the feedback process, generates an HTML report of the process, and displays it. The logs and the reports are stored in the server folder.
8. To start a new process with another regular expression, refresh the page, or close the current RegexHelper instance and open another instance in the browser, in order to avoid session-related issues.

SAMPLE USE CASE (Screenshots)
1. The regular expression and data location (folder) are provided as input, with the Multiline option checked.
2. The process runs and the status log is displayed.
3. Once the process is complete, the user can select the relevant synonyms by examining the matches.
4. The synonyms, along with a few matches in the dataset, are displayed to the user. The user can select the relevant synonyms by checking them.
5. After examining the results, the user can submit the feedback by clicking on “Submit Feedback”. Each iteration displays 10 synonyms.
6. When the user decides to stop the process and not continue with any further iterations, the user clicks on the “Finish and Generate Report” button.
7. This generates an HTML report and displays it to the user. The report is saved under the RegexTemp directory on the server.
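
To make the role of the '\syn' token in the input regular expression more concrete, the sketch below shows one way such a token could be expanded into an ordinary Java regular expression that captures candidate words from a line of the dataset. It is an illustration only and not the Regex Helper implementation: the class name SynTokenSketch, the methods toCandidatePattern and candidates, and the parameters maxWords and minChars (loosely mirroring the "Max Number of words in Synonym" and "Minimum Number of characters in Synonym" options) are all assumptions made for this example. It does no context matching, ranking, or feedback handling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; not the Regex Helper implementation.
class SynTokenSketch {

    // Hypothetical helper: replaces the literal \syn token with a named group
    // that captures between 1 and maxWords whitespace-separated words.
    // Assumes maxWords >= 1.
    static Pattern toCandidatePattern(String rule, int maxWords) {
        String word = "\\w+";
        String group = "(?<syn>" + word + "(?:\\s+" + word + "){0," + (maxWords - 1) + "})";
        // String.replace performs a literal (non-regex) substitution.
        String expanded = rule.replace("\\syn", group);
        return Pattern.compile("\\b" + expanded, Pattern.CASE_INSENSITIVE);
    }

    // Hypothetical helper: collects the text captured in place of \syn.
    // Matches that hit a seed word leave the group null and are skipped,
    // as are captures shorter than minChars.
    static List<String> candidates(String rule, String line, int maxWords, int minChars) {
        List<String> found = new ArrayList<String>();
        Matcher m = toCandidatePattern(rule, maxWords).matcher(line);
        while (m.find()) {
            String syn = m.group("syn");
            if (syn != null && syn.length() >= minChars) {
                found.add(syn.toLowerCase());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String rule = "(athletic|\\syn) gloves?";
        // A product title containing a non-seed word before "gloves".
        System.out.println(candidates(rule, "Leather batting gloves for youth", 1, 2)); // prints [batting]
        // A title that matches only through the seed word yields no candidate.
        System.out.println(candidates(rule, "Athletic gloves, size M", 1, 2));          // prints []
    }
}
```

Note that this sketch reports only the leftmost, non-overlapping match at each position, so with maxWords set to 2 it returns the longest capture it finds rather than every 1-word and 2-word alternative. Deciding which candidates are actually relevant to the seed word is left to the tool and to the user's feedback.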