
Acrolinx Reuse
User Guide
Version: 4.1
Acrolinx™
Copyright © 2014
Acrolinx GmbH All rights reserved
The software contains proprietary information of Acrolinx GmbH. It is provided under a license agreement
containing restrictions on use and disclosure and is also protected by copyright law. Reverse engineering
of the software is prohibited.
Due to continued product development, this information may change without notice. The information and
intellectual property contained in this document is confidential between Acrolinx GmbH and the customer,
and remains the exclusive property of Acrolinx GmbH. If you find any errors in the documentation, please
report them to us in writing. Acrolinx GmbH does not guarantee that this document is error-free.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
or by any means, without the prior written permission of Acrolinx GmbH.
Acrolinx® is registered in the U.S. Patent and Trademark Office. Acrolinx™ is a trademark of Acrolinx
GmbH. All trademarked and copyrighted names used within this and supplemental documents are the
sole and exclusive property of their registered or common law owners.
Acrolinx GmbH
Friedrichstraße 100
D-10117 Berlin
Germany
Phone: +49 30 288 84 83-30
Fax: +49 30 288 84 83-39
E-mail: [email protected]
Website: http://www.acrolinx.com
DocID: IR-EN-285463-20140919-v4.1-b5065
NOTE: Because the Acrolinx user guides are updated frequently, newer information might be available in the
online version of this guide.
Contents
Introduction (page 4)
  Acrolinx Reuse (page 4)
  Intelligent Reuse Process (page 4)
    Before You Start (page 6)
Creating a Reuse Repository (page 7)
  How Acrolinx Identifies Sentences (page 7)
  Creating and Updating Reuse Repositories with Harvested Sentences (page 7)
    Harvesting Sentences (page 7)
    Adding Harvested Sentences to a Repository (page 10)
    Creating a Reuse Repository from an Import File (page 13)
  Creating Empty Reuse Repositories (page 14)
  Canceling a Repository Task (page 15)
Managing Clusters (page 16)
  The Clusters Page (page 16)
  Representative Sentences (page 18)
  Editing Clusters (page 18)
    Sorting and Filtering the Cluster List (page 19)
    Changing the Cluster Status (page 19)
    Changing Representative Sentences (page 20)
    Removing Sentences from Clusters (page 20)
    Adding Sentences to Clusters (page 20)
    Creating New Clusters (page 21)
Managing Reuse Repositories (page 23)
  Enabling Reuse Repositories for Checking (page 23)
    Assigning a Reuse Repository to a Rule Set (page 24)
    Activating or Deactivating Repositories (page 24)
    Language Server Statuses and Warnings (page 24)
  Exporting a Reuse Repository (page 25)
  Checking for Reuse Issues in the Acrolinx Plug-ins (page 26)
  Deleting a Reuse Repository (page 26)
  Backing up Reuse Repositories (page 27)
Chapter 1
Introduction
Acrolinx Reuse
Acrolinx Reuse is a powerful tool for maintaining consistent authoring
standards and eliminating redundancy in documentation projects. Acrolinx
Reuse uses linguistic analysis to match sentences based on meaning.
Text from content repositories or translation memories is automatically
analyzed to produce small groups of sentences with similar meaning. The
preferred wording can be easily validated, selected, and released for reuse.
For example, the following variations of a sentence might appear in your
documentation:
The following items come in your TopSpin shipment.
Your TopSpin shipment includes the following items.
Your TopSpin package is shipped with the following items.
You can choose one of these sentences to be your standard sentence. Acrolinx
then identifies the variations and proposes your chosen standard sentence.
When authors run a check in an Acrolinx plug-in, Acrolinx recognizes
sentences with similar meanings. The author receives a suggestion with the
preferred wording. All suggestions have been automatically validated by the
system and released by the linguistic administrator. The author can accept
the suggestion with a single mouse click.
Intelligent Reuse Process
The Acrolinx Reuse module is a flexible tool which you can use for a variety
of purposes, such as cleaning a translation memory, or identifying new
terminology. The most common use for Acrolinx Reuse is to analyze and
check consistency in a set of documentation. The following illustration
summarizes the steps and components that are involved in using
Acrolinx Reuse.
Figure 1: Standard Reuse Process
Acrolinx recommends the following major steps when starting up a
documentation consistency project with Acrolinx Reuse.
1 Harvest Sentences: Extract sentences from product information and store
them on the Acrolinx Server.
• Procedure Summary: Create a sentence bank to store sentences and
run checks by using the Acrolinx plug-ins to harvest sentences from
your documents. For more information, see the topic Harvesting
Sentences (page 7).
2 Create Repository: Create a repository of sentences which are grouped
together based on structure and meaning. Unlike sentence banks, the
sentences in a repository are grouped into clusters. When Acrolinx groups
sentences together, a representative sentence is automatically selected
for each cluster.
• Procedure Summary: Add harvested sentences or import sentences
from a text file to a new repository. For more information, see the
chapter Creating a Reuse Repository (page 7).
3 Enable Repository for Statistics and Checking: Enable the repository to
gather statistics and provide suggestions for sentences that differ from
the representative sentence.
• Procedure Summary: Assign your repository to a rule set, and instruct
authors to run checks with the reuse repository. For more information,
see the topics Enabling Reuse Repositories for Checking (see page 23)
and Checking for Reuse Issues in the Acrolinx Plug-ins (see page 26).
4 (Optional) Edit Clusters: Use the statistics to review the most commonly
used clusters and confirm the preferred form of the sentence for each
cluster.
• Procedure Summary: After you have collected enough statistics, sort
clusters by match frequency, and check that the representatives for
frequently used clusters are correct. For more information, see the
topic Editing Clusters (page 18).
Before You Start
To use Acrolinx Reuse, you must have a license and appropriately
configured privileges and linguistic resources.
Your server administrator must install a license that is configured with the
Reuse module and assign you a role that has the privileges in the Reuse
section enabled.
To enable your changes for checking, you also need the privileges in the
Resources section and the privilege Restart servers.
Ensure that your linguistic resources are configured for Reuse. When linguistic
resources are configured for reuse, the ReuseHarvesting rule set is displayed
on the language server page in the Dashboard. If the ReuseHarvesting rule
set is missing, contact your Acrolinx project consultant.
Chapter 2
Creating a Reuse Repository
A reuse repository is a repository of sentences which are grouped together
based on structure and meaning.
• A group of similar sentences is called a cluster.
• A reuse repository is normally used to store clusters which have a similar
subject area or relate to a specific product.
You can create a repository in the following ways:
• Create a reuse repository from harvested sentences (see "Adding
Harvested Sentences to a Repository" on page 10).
• Create a reuse repository from an import file (see "Creating a Reuse
Repository from an Import File" on page 13).
• Create an empty repository from the Repositories page (see "Creating
Empty Reuse Repositories" on page 14).
On the Progress page, you can monitor the progress of a repository which
is being created or updated and view a list of repositories which have been
completed or canceled. You can also cancel the tasks for selected repositories.
How Acrolinx Identifies Sentences
Depending on how your linguistic resources are configured, some characters
can cause Acrolinx to interpret your sentence as two sentences.
Example: Suppose you are importing a TMX file which contains a segment
with the following sentence:
The Topspin package always contains at least three items: a fan tray, a
system controller, and a warranty card.
The sentence contains a colon, and colons are interpreted by standard
linguistic resources as the end of the sentence. In this example, the server
interprets the segment as two separate sentences.
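The following Python sketch is a simplified, hypothetical model of this behavior. It is not the segmentation logic that Acrolinx actually uses. It assumes that the set of sentence terminators includes the colon, as the standard linguistic resources do in the example above. The names TERMINATORS and split_sentences are illustrative only.

```python
import re

# Hypothetical terminator set: the colon is treated like a full stop,
# mirroring the behavior of the standard linguistic resources described above.
TERMINATORS = ".!?:"


def split_sentences(segment: str, terminators: str = TERMINATORS) -> list[str]:
    """Split a segment wherever a terminator character is followed by whitespace."""
    pattern = r"(?<=[" + re.escape(terminators) + r"])\s+"
    return [part for part in re.split(pattern, segment.strip()) if part]


segment = ("The Topspin package always contains at least three items: "
           "a fan tray, a system controller, and a warranty card.")
for sentence in split_sentences(segment):
    print(sentence)
```

If you pass a terminator set without the colon, for example `split_sentences(segment, ".!?")`, the same segment stays a single sentence. This is why the behavior depends on how your linguistic resources are configured.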
Creating and Updating Reuse Repositories with Harvested Sentences
Harvesting Sentences
Sentence harvesting is the process of detecting new sentences whenever a
document is checked. After you have harvested enough sentences, you can
add them to a new or existing repository.
The first step in harvesting sentences is to create sentence banks to store the
sentences (see "Creating a Sentence Bank" on page 8).
You can harvest sentences with:
• the Acrolinx Plug-ins (see "Harvesting Sentences with an Acrolinx Plug-in"
on page 9)
• the Acrolinx Batch Checker (see "Harvesting Sentences with the Acrolinx
Batch Checker" on page 9)
Working with Sentence Banks
A sentence bank stores the raw source data which you use to create a reuse
repository. Unlike a reuse repository, the sentences in a sentence bank are
not grouped into clusters.
You can keep building up a sentence bank and experiment with different
clustering settings by creating several reuse repositories from the same set
of harvested sentences.
You can also create several sentence banks to store sentences from different
types of documentation.
Creating a Sentence Bank
To create a sentence bank, follow these steps:
1 Open the page Reuse > Create and Update > Harvest Sentences and click New.
2 In the New Sentence Bank dialog box, enter a name and select a language
for the sentence bank and click OK.
The new sentence bank is created and is automatically selected as the
default sentence bank for the language you defined.
Selecting Default Sentence Banks
To collect sentences in the Acrolinx Plug-ins, a default sentence bank must
be defined for each checking language.
To set a default sentence bank for a checking language, follow these steps:
1 Open the page Reuse > Create and Update > Harvest Sentences.
2 In the Default column, select a sentence bank to store the harvested
sentences in your required checking language.
If you have sentence banks in several languages, you can select one
sentence bank for each language.
The changes take effect immediately.
Viewing the Contents of a Sentence Bank
To view the contents of a sentence bank:
• Click the name of a sentence bank on the Sentence Banks Page.
You can click the column headers to sort the sentences alphabetically or
by Acrolinx score. Sentences with a high Acrolinx score are lower quality
sentences.
TIP: If you find malformed sentences, you can use the Delete Button to remove
them from the sentence bank.
Setting the Language for Legacy Sentence Banks
Legacy sentence banks are sentence banks which were created with an
Acrolinx Server version earlier than 1.5. In server versions 1.5 or later, all
sentence banks must have a language defined before you can start the
clustering wizard.
If your installation contains legacy sentence banks, the Set Language button
appears on the sentence banks page. This procedure is possible only if the
Set Language button is visible.
To set the language for legacy sentence banks, follow these steps:
1 Open the page Reuse > Harvest Sentences.
2 Use the checkboxes in the Name column to select legacy sentence banks.
Legacy sentence banks do not have a language defined in the Language
column.
3 Click Set Language.
4 In the Set Language dialog box, select a language and click OK.
5 The selected language appears in the Language column for the selected
sentence banks.
If all sentence banks have a language defined after completing this
procedure, the Set Language button is hidden.
Harvesting Sentences with an Acrolinx Plug-in
To harvest sentences with an Acrolinx plug-in, follow these steps:
1 (Follow this step if you have not yet selected a default sentence bank)
Select a default sentence bank for your required checking language.
2 In your editor application, open a document which contains the content
that you intend to cluster.
3 Open the Acrolinx Plug-in Options and select a rule set which has been
configured to harvest sentences.
Normally this rule set is called ReuseHarvesting.
If you are unsure about which rule set to use, ask your Acrolinx Server
administrator.
4 Run a check with the checking options Spelling, Grammar, Style, and
Terminology selected.
TIP: When you run a check with the main checking options selected, you
ensure that each sentence receives an accurate Acrolinx score. The Acrolinx
score is used to select a cluster representative during the clustering
process.
5 In the Dashboard, open the Sentence Banks Page.
6 Click Refresh to update the sentence count in the Sentence Bank Table.
7 (Optional) Click the sentence bank name to view the contents of the
sentence bank.
Harvesting Sentences with the Acrolinx Batch Checker
To harvest sentences with the Acrolinx Batch Checker, follow these steps:
1 Open the Acrolinx Batch Checker and locate the files which contain the
content that you intend to cluster.
2 Configure your file settings and server connection settings (for more
information, see the Acrolinx Batch Checker User Guide).
3 In the Acrolinx Batch Checker check options, select a rule set which has
been configured to populate sentence banks.
Normally this rule set is called ReuseHarvesting.
If you are unsure about which rule set to use, ask your Acrolinx Server
administrator.
4 Select your sentence bank from the Reuse Sentence Bank dropdown.
NOTE: The Reuse Sentence Bank dropdown is only visible when at least one
sentence bank is detected. If you have created a sentence bank and cannot
see the Reuse Sentence Bank dropdown, refresh your server connection.
5 Run a check with the checking options Spelling, Grammar, Style, and
Terminology selected.
TIP: When you run a check with the main checking options selected, you
ensure that each sentence receives an accurate Acrolinx score. The Acrolinx
score is used to select a cluster representative during the clustering
process.
6 In the Dashboard, open the Sentence Banks Page.
7 Click Refresh to update the sentence count in the Sentence Bank Table.
8 (Optional) Click the sentence bank name to view the contents of the
sentence bank.
Adding Harvested Sentences to a Repository
After you have harvested enough sentences (see "Harvesting Sentences" on
page 7), you can add them to a new or existing reuse repository.
IMPORTANT: All sentence banks must have a language defined before you
add the sentences to a reuse repository. If your installation contains legacy
sentence banks which were created in a server version earlier than 1.5, set
the language for the legacy sentence banks first (see "Setting the Language
for Legacy Sentence Banks" on page 9).
To create and update reuse repositories with harvested sentences, follow
these steps:
1 Open the page Reuse > Create and Update > Harvest Sentences.
2 Select one or more sentence banks using the checkboxes next to the
sentence bank names.
3 Click Add to Repository.
4 On the Repository Options page, create or update a repository and click
Next.
When you update, you can merge or replace the contents of an existing
repository.
5 On the Cluster Settings page, configure the clustering settings (see "Cluster
Settings" on page 11). The cluster settings define how the sentences
should be grouped together into clusters (see "Managing Clusters" on
page 16).
• The minimum word count defines the minimum number of words that
must be in a sentence before the sentence can be added to a cluster.
• The minimum cluster size defines the minimum number of sentences
that must be in a cluster before the cluster can be added to the
repository.
• The cluster strictness defines the quality of clusters to add to the reuse
repository.
• The initial cluster status defines the initial status of all clusters in the
reuse repository.
6 Click Finish.
The Acrolinx Server begins grouping the sentences into clusters and adding
the clusters to the reuse repository.
After the repository is created or updated, open the Repositories page and
select the repository to view the clusters.
If your repository is ready to be used, enable the repository for checking
(see page 23).
Cluster Settings
You can use the cluster settings to influence the average number of sentences
and the similarity of the sentences in each cluster.
Acrolinx recommends that you experiment with the cluster settings, create
several repositories from the same set of harvested sentences, and compare
the results.
Minimum Word Count
The minimum word count defines the minimum number of words that must
be in a sentence before the sentence can be added to a cluster.
For example, titles are treated as individual sentences, but in some
documents, titles often contain only one word. You can raise the minimum
word count to eliminate short titles from being added to a cluster.
Stop words such as "and", "to", and "the" are included in the minimum word
count.
To a certain extent, the lower the minimum word count, the more likely you
are to get irrelevant sentences in your clusters.
For example, consider the two titles "Configuring Browsers" and "Configuring
Servers". If the minimum word count is set to two, both variants might be
included in the same cluster. However, these sentences do not represent the
same idea.
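The following sketch shows how this setting filters sentences. It assumes that words are whitespace-separated tokens, and the Acrolinx Server might tokenize differently. The functions word_count and eligible_sentences are hypothetical names used only for illustration.

```python
def word_count(sentence: str) -> int:
    """Count whitespace-separated tokens. Stop words such as "the" are included."""
    return len(sentence.split())


def eligible_sentences(sentences: list[str], min_words: int) -> list[str]:
    """Keep only the sentences that meet the minimum word count (hypothetical model)."""
    return [s for s in sentences if word_count(s) >= min_words]


titles = ["Overview", "Configuring Browsers", "Configuring Servers",
          "Open the configuration file"]
print(eligible_sentences(titles, min_words=2))
print(eligible_sentences(titles, min_words=3))
```

With a minimum of two words, both two-word titles remain eligible and could end up in the same cluster. Raising the minimum to three keeps only the longer instruction.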
Minimum Cluster Size
The minimum cluster size defines the minimum number of sentences that
must be in a cluster before the cluster can be added to the repository.
You can use this setting to prioritize sentences which have a large degree of
variation.
For example, the sentence "Open the configuration file" might be written the
same way in all of your documentation with only one other variant such as
"Launch the configuration file". You might have many clusters that contain
sentences with only one variant. These clusters can be time consuming to
review and edit.
However, another sentence such as "End Date cannot be before the Start
Date" might also be written in the following ways:
End Date must be greater than Start Date.
End Date must be greater than or equal to Start Date.
End Date must be later than Start Date.
End Time must be later than the Start Time.
A larger number of variants in sentence structure leads to higher translation
costs, so a high minimum cluster size can help you focus on the most
problematic sentences.
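The following sketch roughly illustrates this setting. It models a repository as a hypothetical mapping from cluster IDs to lists of sentences and keeps only the clusters that meet the minimum cluster size. It is not the Acrolinx Server's implementation, and the cluster IDs are invented.

```python
def filter_clusters(clusters: dict[int, list[str]], min_size: int) -> dict[int, list[str]]:
    """Drop clusters with fewer than min_size sentences (hypothetical model)."""
    return {cid: sents for cid, sents in clusters.items() if len(sents) >= min_size}


clusters = {
    1: ["Open the configuration file.",
        "Launch the configuration file."],
    2: ["End Date cannot be before the Start Date.",
        "End Date must be greater than Start Date.",
        "End Date must be greater than or equal to Start Date.",
        "End Date must be later than Start Date.",
        "End Time must be later than the Start Time."],
}
print(sorted(filter_clusters(clusters, min_size=3)))
```

With a minimum cluster size of three, only the cluster with many variants remains for review.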
Cluster Strictness
The cluster strictness defines the quality of clusters to add to the reuse
repository.
There are five levels of cluster strictness ranging from lowest to highest.
At the lowest level, sentences which share only a few keywords are grouped.
Clusters created with the lowest cluster strictness are usually large clusters
which can contain ten or more sentences.
For example, the following sentences are grouped with a setting of Lowest.
End Date cannot be before the Start Date.
End Date must be greater than Start Date.
End Date must be greater than or equal to Start Date.
End Date must be later than Start Date.
End Time must be later than the Start Time.
End date must be equal to or later than the start date.
End date should be greater than start date.
Please enter a start date that is before the end date.
Please enter an End Date that is later than or the same as the Start Date.
Please enter an end date that is later than the start date.
The Start Date cannot be after the End Date.
The actual end date must be on or after the actual start date.
The end date cannot be before the start date.
The end date must be later than or the same as the start date.
At the highest level, only sentences which are very similar are grouped.
Clusters created with the highest cluster strictness are usually smaller clusters
with two or more sentences.
For example, the following sentences are grouped with a setting of Highest.
End Date must be later than Start Date.
Start date must be before end date!
The start date must be on or before the end date.
The start date must be prior to the end date.
Your start date must be before your end date.
The choice of strictness depends on the type of data and on the intended
purpose of the reuse repository.
• A lower strictness can result in a repository that contains a lot of variation,
which might be useful for testing.
• To reduce the degree of variation and to eliminate clusters that are too
large, you can set the cluster strictness to a higher setting.
The more harvested sentences you have, the more likely it is that you need
to use a higher strictness.
Initial Cluster Status
When you add harvested sentences to a repository you can select the initial
status for all clusters. You cannot change the initial status after you create
the repository. You can only change the status of clusters individually (see
"Changing the Cluster Status" on page 19). To change the status for all
clusters at the same time, you must create the repository again and select
a different initial cluster status.
You must set the clusters to Enabled if you want to make them available for
checking.
You must set the clusters to Proposed or Disabled if you want to edit the clusters
further and do not want them to be available for checking.
Creating a Reuse Repository from an Import File
You can create reuse repositories by importing sentences from a text or TMX
file. This feature is useful if you already have an externally validated file
which contains sentences that you want to add to a new reuse repository.
What You Should Know before Importing Sentences
• When you add harvested sentences from a sentence bank, you can add
new sentences to existing repositories. When you add sentences from an
import file, you can add sentences only to a new repository which you
create during the import process.
• Unlike when you add harvested sentences, no linguistic intelligence is used
to group imported sentences. You will have a cluster for every sentence in
the import file.
• Text files must contain one sentence per line and TMX files must contain
only one sentence per segment. If a line or segment contains more than
one sentence, the affected line or segment is ignored and logged. If a
sentence contains special characters, it might be interpreted as two
sentences and ignored (see "How Acrolinx Identifies Sentences" on page
7).
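The following sketch models the text-file rules above. Each accepted line becomes its own single-sentence cluster, and any line that contains more than one sentence is skipped and logged. This is a hypothetical illustration. The sentence-break pattern, which treats the colon as a terminator, and the skipping of blank lines are assumptions rather than documented Acrolinx behavior.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("reuse-import")  # hypothetical logger name

# Assumption: a sentence ends at ".", "!", "?", or ":" followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.!?:])\s+")


def load_text_import(lines: list[str]) -> list[list[str]]:
    """Return one single-sentence cluster per valid line (no linguistic grouping)."""
    clusters = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are skipped silently
        if len(SENTENCE_BREAK.split(line)) > 1:
            # More than one sentence on the line: ignore it and log it.
            log.warning("Line %d ignored (more than one sentence): %s", number, line)
            continue
        clusters.append([line])
    return clusters


lines = [
    "Open the configuration file.",
    "Click OK. Then close the dialog box.",
    "The package contains three items: a fan tray, a controller, and a card.",
    "Your TopSpin shipment includes the following items.",
]
print(load_text_import(lines))
```

The second line holds two sentences. The third line is split at the colon. The sketch therefore ignores and logs both lines and returns two clusters.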
Importing a Text or TMX File
To import a text or TMX file, follow these steps:
1 Open the page Reuse > Create and Update > Import Sentences.
2 In the File Options, select the desired File format.
3 (Follow this step if you are importing a text file) Select the Encoding.
4 Locate the import file using the Browse button.
5 Click Next.
The Import Preview page displays the first few rows of your import file.
6 Confirm that the preview of the import file looks correct and click Next.
If you are importing a text file and some characters are not rendered
correctly in the import preview, click Back and adjust the Encoding field.
7 In the Repository Options, select the Repository language and enter the
Repository name.
8 (Follow this step if you are importing a TMX file) In the TMX language
dropdown, select the language of the sentences to import.
9 Click Finish.
The Import Summary page is displayed.
The import begins and a progress bar displays underneath the Navigation
Menu until the import operation completes.
• You can also use the Progress page to see more details on the import
progress and estimated completion time.
• You can continue to use the Dashboard while the import is running.
10 Verify that the import was successful by viewing the import log messages
(see page 14).
11 (Optional) Click Start New Import to import another file.
Viewing Import Log Messages
The Dashboard features a generic inbox for viewing log messages for any
tasks that run in the background. This inbox is located at the top right of
your screen in the Dashboard Menu.
Large import tasks run in the background while you continue to use the
Dashboard. When the import is complete, an unread envelope icon displays
next to the Messages menu item in the Dashboard Menu which indicates that
you have a new import log message.
To open an import log message, follow these steps:
1 Click the Messages menu item in the Dashboard Menu.
2 In the Messages Window, click an unread message.
The Log Message Window opens and a summary of the import is displayed
in a text box.
3 Click Download Detailed Import Log at the bottom of the Log Message Window
to download a complete log of the import.
NOTE: All import log messages are removed from the Messages Window
when the Acrolinx core server is restarted. However, the detailed log files
are still stored in the server output directory. For more details ask your
Acrolinx Server administrator.
Creating Empty Reuse Repositories
Creating an empty repository is helpful if:
• you prefer to have your repositories ready before adding harvested
sentences.
• you want to manually add clusters and sentences (see "Creating New
Clusters" on page 21) to the repository.
To create a new reuse repository, follow these steps:
1 Open the page Reuse > Repositories.
2 Click New.
3 Enter a Name, select the Language and click OK.
Canceling a Repository Task
You might want to cancel a repository which is being created or updated
under the following circumstances:
• The task is taking too long, or requires a large amount of server resources.
• You have used the wrong sentence bank or import file, and want to create
or update the repository again.
To cancel a repository task:
• On the Progress page, select the relevant repositories and click Cancel Selected.
The selected repositories display as canceled in the Remaining Time column.
NOTE: It can take a few seconds before the status information is updated.
Chapter 3
Managing Clusters
In a reuse repository, clusters are used to group sentences together based
on structure and meaning. A cluster normally contains several sentences.
However, if there are no other similar sentences in the repository, a cluster
can also contain a single sentence.
At least one sentence in a cluster is selected to be a representative sentence.
The representative is the preferred wording of the sentence when several
variations of the sentence exist.
The Clusters Page
The Clusters Page displays the clusters within a reuse repository.
Figure 2: The Clusters Page
The clusters page has the following parts.
Keyword Filter: Apply a filter based on keywords in the cluster (see
"Sorting and Filtering the Cluster List" on page 19).
New Cluster Button: Create a new cluster (see "Creating New Clusters" on
page 21).
Status Filter: Apply a search filter based on cluster status (see "Sorting
and Filtering the Cluster List" on page 19).
Cluster Size Filter: Apply a filter based on cluster size (see "Sorting and
Filtering the Cluster List" on page 19).
Cluster table columns: Sort the cluster list (see "Sorting and Filtering the
Cluster List" on page 19).
Edit Cluster button: Add sentences to a cluster (see "Adding Sentences to
Clusters" on page 20).
Create as New Cluster button: Create a new cluster from the selected
sentences (see "Creating New Clusters" on page 21).
Delete button: Delete sentences from a cluster (see "Removing Sentences
from Clusters" on page 20).
Set Representatives button: Change the representative sentences for a
cluster (see "Changing Representative Sentences" on page 20).
The Clusters Table
The clusters page shows a list of clusters in a table with the following columns:
Active: Contains buttons to control the cluster status.
• If both buttons are gray, the cluster has not yet been validated, and will
not be used in a check.
• If the On button is green, the cluster is active, and the server will flag
near matches.
• If the Off button is red, the cluster is inactive, and will not be used in a
check.
ID: A unique numeric identifier for each cluster.
Representative: Indicates the current cluster name. The cluster name is
taken from the first cluster representative. The cluster name can change
when users edit the cluster representatives (see "Changing Representative
Sentences" on page 20).
Matches: Indicates how often sentences in each cluster were offered as
suggestions in the Acrolinx Plug-ins.
Last Detected: Indicates when the sentence was last offered as a suggestion.
Size: Indicates how many sentences are in a cluster.
Version: The version number of the cluster and sentences. The version
number of a cluster increases when you use the clustering wizard to merge
new sentences into an existing cluster. Newer sentences in a cluster have
higher version numbers. The cluster inherits the version number of the
newest sentence.
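The following sketch models the version rules for the Version column. It is hypothetical and assumes that each merge gives the new sentences a version one higher than the cluster's current version. The real numbering scheme might differ; for example, it might be based on repository updates rather than on individual clusters. The Sentence and Cluster classes are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    text: str
    version: int


@dataclass
class Cluster:
    sentences: list[Sentence] = field(default_factory=list)

    @property
    def version(self) -> int:
        # The cluster inherits the version number of its newest sentence
        # (0 for an empty cluster in this sketch).
        return max((s.version for s in self.sentences), default=0)

    def merge(self, texts: list[str]) -> None:
        # Assumption: merged sentences receive the next version number.
        next_version = self.version + 1
        self.sentences.extend(Sentence(t, next_version) for t in texts)


cluster = Cluster([Sentence("End Date must be later than Start Date.", 1)])
cluster.merge(["The end date cannot be before the start date."])
print(cluster.version)  # 2
```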
Representative Sentences
A representative sentence is the preferred sentence within a cluster. The
Acrolinx plug-ins display the representative sentence as a suggested
replacement if a variation is found.
During the clustering process, the Acrolinx Server selects the first sentence
in the cluster with an Acrolinx score of zero to be the representative sentence.
The Acrolinx score ranks a sentence on how closely the sentence adheres to
Acrolinx style and grammar standards. If a cluster does not contain any
sentences with an Acrolinx score of zero, the cluster is not added to the
repository.
After you have created a repository, you can change the representative
sentence or select additional representative sentences. You might choose
more than one representative sentence if the sentences can be used in
different ways depending on the context.
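The default selection rule can be sketched as follows. This is a minimal illustration, assuming a hypothetical representation of each cluster as an ordered list of (sentence, Acrolinx score) pairs; `pick_representative` and `build_repository` are illustrative names, not Acrolinx Server functions.

```python
from typing import Optional, Sequence


def pick_representative(members: Sequence[tuple[str, int]]) -> Optional[str]:
    """Return the first sentence whose Acrolinx score is zero, or None."""
    for text, score in members:
        if score == 0:
            return text
    return None  # no zero-score sentence in this cluster


def build_repository(clusters: Sequence[Sequence[tuple[str, int]]]) -> list[dict]:
    """Keep only clusters that have a representative sentence."""
    repository = []
    for members in clusters:
        representative = pick_representative(members)
        if representative is None:
            continue  # the cluster is not added to the repository
        repository.append({
            "representative": representative,
            "sentences": [text for text, _ in members],
        })
    return repository
```

In this model, a cluster whose sentences all have non-zero scores simply disappears from the result, which mirrors the rule described above.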
Editing Clusters
You can edit clusters to change the way sentences are grouped and to select
new representative sentences. After you review and edit your clusters, you
can change the status of individual clusters to enable or disable them for
checking.
Sorting and Filtering the Cluster List
Column Sorting
You can sort columns with bold headers in ascending or descending order.
Column sorting is useful for validating large lists of clusters. For example,
you can sort by match frequency to see the most frequently detected
sentences within clusters.
You can also filter the cluster list based on certain attributes of a cluster, the
cluster ID, or keywords within the clustered sentences.
To sort or filter your cluster list, follow these steps:
1 Click a column header.
2 Enter one or more keywords in the search field and click Search.
NOTE: Numerals are not recognized in keyword searches. For example, the search "4 fan trays" returns all sentences that contain "fan trays", not only the sentences that contain "4 fan trays".
3 Enter a cluster ID in the search field and click Search.
4 In the Minimum Cluster Size field, enter the minimum number of sentences
that a cluster must contain in order to appear in the search results.
5 Select a filter checkbox to filter clusters by status.
• The Proposed checkbox shows clusters that are not yet turned on or off.
• The Enabled checkbox shows clusters that are turned on.
• The Disabled checkbox shows clusters that are turned off.
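The filters combine roughly as in the following Python sketch. It is a hypothetical model, not the Acrolinx implementation: it assumes that all active filters must match, that keyword matching is a case-insensitive substring test against any sentence in the cluster, and that numerals are simply dropped from the query, as the note above describes. `ClusterRow` and `filter_clusters` are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ClusterRow:
    cluster_id: int
    status: str  # "Proposed", "Enabled", or "Disabled"
    sentences: list[str]


def keywords(query: str) -> list[str]:
    # Numerals are not recognized in keyword searches, so drop them.
    return [word for word in query.lower().split() if not word.isdigit()]


def filter_clusters(rows: Iterable[ClusterRow], query: str = "",
                    cluster_id: Optional[int] = None, min_size: int = 1,
                    statuses: Optional[set[str]] = None) -> list[ClusterRow]:
    terms = keywords(query)
    result = []
    for row in rows:
        if cluster_id is not None and row.cluster_id != cluster_id:
            continue  # cluster ID search
        if len(row.sentences) < min_size:
            continue  # Minimum Cluster Size field
        if statuses and row.status not in statuses:
            continue  # Proposed / Enabled / Disabled checkboxes
        if terms and not any(all(t in s.lower() for t in terms) for s in row.sentences):
            continue  # keyword search: some sentence must contain every term
        result.append(row)
    return result
```

Under these assumptions, searching for "4 fan trays" also returns a cluster that only contains "Install 2 fan trays.", because the numeral is ignored.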
Changing the Cluster Status
After you activate a cluster and restart the language server, sentences that
vary from the representative sentence are flagged when users check their
documents.
To change the cluster status:
1 Click the On button to enable the cluster or click the Off button to disable
the cluster.
2 Restart the relevant language server to make your changes available for
checking.
You can also make additional changes before you restart the language
server.
You cannot delete a cluster, because a deleted cluster might be created again when you add new harvested sentences to your repository. Instead, disable the cluster: a disabled cluster is not used for checking and is never re-created.
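The following sketch models how the three cluster statuses relate to checking. It assumes, for illustration only, that the language server takes a snapshot of the enabled clusters when it starts, which is why status changes take effect only after a restart. `Status` and `LanguageServerModel` are hypothetical names, not Acrolinx classes.

```python
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"  # both buttons gray: not yet validated, not used
    ENABLED = "enabled"    # On button green: used for checking
    DISABLED = "disabled"  # Off button red: not used, never re-created


class LanguageServerModel:
    """Hypothetical model: the server sees status changes only after a restart."""

    def __init__(self, statuses: dict[int, Status]) -> None:
        self.statuses = statuses  # statuses as edited on the Clusters page
        self.loaded: set[int] = set()
        self.restart()

    def restart(self) -> None:
        # Only enabled clusters are loaded and used for checking.
        self.loaded = {cid for cid, st in self.statuses.items() if st is Status.ENABLED}

    def flags_near_matches(self, cluster_id: int) -> bool:
        return cluster_id in self.loaded
```

In this model, switching a cluster from Proposed to Enabled has no effect on checks until `restart()` runs, which corresponds to restarting the relevant language server.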
Changing Representative Sentences
By default, Acrolinx automatically selects a representative sentence when
you create a repository. However, you can change the representative sentence
and select additional representative sentences.
To edit representative sentences, follow these steps:
1 Click the cluster name to see the sentences in the cluster.
2 Select or deselect the checkboxes next to the relevant sentences.
3 Click the Set Representatives button.
4 Restart the relevant language server to make your changes available for checking.
You can also make additional changes before you restart the language
server.
Removing Sentences from Clusters
You can remove a sentence from a cluster if the sentence is not relevant to
the cluster. If you notice many clusters that contain irrelevant sentences,
you might need to adjust your cluster settings (see page 11) and create the
repository again.
Although the sentences are deleted from the cluster, they are still kept in
the repository to ensure that they are not clustered again. Deleted sentences
are moved to a new cluster with the status 'Disabled'.
To remove sentences from a cluster, follow these steps:
1 Click the cluster name to see the sentences in the cluster.
2 Select the checkboxes next to the relevant sentences.
3 Click Delete.
After you click Delete, the removed sentence is moved to its own cluster with the status Disabled.
4 Restart the relevant language server to make your changes available for
checking.
You can also make additional changes before you restart the language
server.
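The following Python sketch illustrates what Delete does to the repository. It assumes a hypothetical in-memory repository keyed by cluster ID, that each removed sentence gets its own new cluster (as the step above describes), and that new cluster IDs are allocated as the highest existing ID plus one; the real ID scheme is not documented in this guide. `Cluster` and `remove_sentences` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    cluster_id: int
    status: str
    sentences: list[str] = field(default_factory=list)


def remove_sentences(repo: dict[int, Cluster], cluster_id: int,
                     selected: list[str]) -> list[int]:
    """Move each selected sentence into its own new Disabled cluster.

    The sentence stays in the repository, so it is not clustered again.
    Returns the IDs of the newly created clusters.
    """
    source = repo[cluster_id]
    new_ids = []
    for text in selected:
        if text not in source.sentences:
            continue  # ignore sentences that are not in this cluster
        source.sentences.remove(text)
        new_id = max(repo) + 1  # hypothetical ID allocation
        repo[new_id] = Cluster(new_id, "Disabled", [text])
        new_ids.append(new_id)
    return new_ids
```

Note that the total number of sentences in the repository does not change; only their grouping and status do.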
Adding Sentences to Clusters
You can add sentences to a cluster by entering a new sentence or by moving an existing sentence from another cluster.
To search clusters, ensure that a language server in the search language is
running. For example, to search Japanese clusters, ensure that a language
server configured with Japanese is running.
To add sentences to a cluster, follow these steps:
1 In the cluster list, expand the cluster that you want to edit and click the
Edit Cluster button.
2 Add a sentence:
• To add a new sentence:
a Enter a new sentence in the Enter new sentence field.
b Click Add Sentence.
• To move an existing sentence:
a Enter a sentence or set of keywords and click Search Clusters.
b Select the sentences that you want to add to the cluster and click Move Selected Sentences.
The selected sentences are removed from the clusters in the Clusters
with Similar Sentences section and added to the Target Cluster.
3 Restart the relevant language server to make your changes available for
checking.
You can also make additional changes before you restart the language
server.
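Moving and adding sentences can be sketched as simple list operations. This is a hypothetical model in which a repository is a dictionary that maps cluster IDs to lists of sentences; `move_sentences` and `add_sentence` are illustrative names, and the sketch ignores anything the server does beyond moving the text.

```python
def move_sentences(clusters: dict[int, list[str]], target_id: int,
                   selected: list[tuple[int, str]]) -> None:
    """Move (source_cluster_id, sentence) pairs into the target cluster."""
    for source_id, text in selected:
        if source_id == target_id:
            continue  # already in the Target Cluster
        clusters[source_id].remove(text)  # raises ValueError if the sentence is missing
        clusters[target_id].append(text)


def add_sentence(clusters: dict[int, list[str]], target_id: int, text: str) -> None:
    """Add a sentence typed into the Enter new sentence field."""
    if text not in clusters[target_id]:
        clusters[target_id].append(text)
```

For example, moving "Install gateways." from cluster 2 into cluster 1 removes it from cluster 2, just as the selected sentences disappear from the Clusters with Similar Sentences section.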
Creating New Clusters
You can manually create a cluster to contain new sentences that are not in
your repository.
For example, suppose that you have a small document that contains variations of the sentence "Install gateways and switch cards". The sentences in this document were not harvested, but you need a quick way to add them to your repository.
You can also create a new cluster if an existing cluster is too big and needs
to be split into two smaller clusters.
If you want your new clusters to be available for checking, ensure that you
enable the clusters.
To create a cluster that contains new sentences, follow these steps:
1 In a repository, click the New Cluster button at the top of the
cluster list.
The cluster has the placeholder name "Missing Representative" until you
add sentences to the cluster.
2 Add sentences to the cluster (see "Adding Sentences to Clusters" on page
20).
3 Restart the relevant language server to make your changes available for
checking.
You can also make additional changes before you restart the language
server.
To create a new cluster from existing sentences, follow these steps:
1 In the cluster list, expand the cluster that contains the sentences for the new cluster.
2 Select the desired sentences.
3 Click Create as New Cluster.
The Edit Cluster page opens for the new cluster.
4 Restart the relevant language server to make your changes available for
checking.
You can also make additional changes before you restart the language
server.
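The following sketch shows how splitting a cluster and the cluster name placeholder fit together. It assumes a hypothetical dictionary of cluster IDs to sentence lists and a highest-ID-plus-one ID scheme; only the placeholder text "Missing Representative" and the rule that a cluster is named after its first representative come from this guide. `cluster_name` and `split_cluster` are illustrative names.

```python
PLACEHOLDER = "Missing Representative"


def cluster_name(representatives: list[str]) -> str:
    # The cluster name is taken from the first representative sentence;
    # a cluster without one shows the placeholder name.
    return representatives[0] if representatives else PLACEHOLDER


def split_cluster(clusters: dict[int, list[str]], source_id: int,
                  selected: list[str]) -> int:
    """Create a new cluster from sentences selected in an existing cluster."""
    new_id = max(clusters) + 1  # hypothetical ID allocation
    clusters[new_id] = []
    for text in selected:
        clusters[source_id].remove(text)  # raises ValueError if not in the source
        clusters[new_id].append(text)
    return new_id
```

Splitting an oversized cluster this way leaves two smaller clusters; remember that, in the product, the new cluster must still be enabled before it is used for checking.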
Chapter 4
Managing Reuse Repositories
The Repositories page displays a list of reuse repositories. A reuse repository stores sentences that are grouped into clusters (see "Managing Clusters" on page 16).
The Repositories table has the following columns:
Column Name
Details
Language
The language of the repository. A repository can contain sentences in one language only.
Repository
The name of the repository. You enter the name when you create a new repository.
Active In
The rule sets in which the repository is active for checking.
Clusters
The number of clusters in the repository. The number of clusters that you can manage depends on your available hardware resources.
Sentences
The number of sentences in the repository. The number of sentences that you can manage depends on your available hardware resources.
Matches
Indicates how often sentences in the repository were offered as suggestions in the Acrolinx Plug-ins. You can use this statistic to prioritize clusters for editing.
Version
The version number of the repository. The version number changes when you use the clustering wizard to update or replace an existing repository.
Enabling Reuse Repositories for Checking
A repository is not enabled for checking until you assign the repository to a
rule set. You assign new reuse repositories to rule sets on the Reuse
Repositories page in the Resources section. You can also deactivate or activate
existing assignments. A reuse repository configuration page is available for
each of the languages configured in your resources.
Assigning a Reuse Repository to a Rule Set
To enable a repository for checking, you must assign the repository to a rule
set. You can assign the same repository to one or more rule sets, but you
can assign only one repository to each rule set.
To assign a reuse repository to a rule set, follow these steps:
1 Navigate to Resources > Reuse Repositories.
2 Navigate to the reuse repository configuration page for the relevant
language.
3 In the row for the rule set that you want to assign the repository to, select the repository in the Repository column.
4 Select the checkbox in the Active column to activate the assignment for
checking.
5 Click Save.
Activating or Deactivating Repositories
You can control whether a reuse repository is loaded by the language servers
by using the checkboxes in the Active column. If you want to update the
contents of the repository but do not want users to check with a repository
that might change, you can deactivate the repository.
To activate or deactivate a repository assignment:
• In the relevant rows, select or deselect the checkbox in the Active column to activate or deactivate the assignment between the rule set and the reuse repository, and then click Save.
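The assignment rules (each rule set holds one repository, but a repository can be assigned to several rule sets) map naturally onto a dictionary keyed by rule set, as in this hypothetical Python model of one language's configuration page. The class and method names are illustrative, and the comma-separated format of the Active In value is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    repository: str
    active: bool = False  # the checkbox in the Active column


class ReuseConfiguration:
    """Hypothetical model of one language's reuse repository configuration page."""

    def __init__(self) -> None:
        # Keyed by rule set: assigning a new repository replaces the old one,
        # but the same repository may appear under several rule sets.
        self.assignments: dict[str, Assignment] = {}

    def assign(self, rule_set: str, repository: str, active: bool = True) -> None:
        self.assignments[rule_set] = Assignment(repository, active)

    def set_active(self, rule_set: str, active: bool) -> None:
        self.assignments[rule_set].active = active

    def active_in(self, repository: str) -> str:
        # Mirrors the Active In column: "n/a" when not active in any rule set.
        names = sorted(rs for rs, a in self.assignments.items()
                       if a.repository == repository and a.active)
        return ", ".join(names) if names else "n/a"
```

Because the dictionary key is the rule set, the model cannot hold two repositories for one rule set, which matches the one-repository-per-rule-set restriction.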
Language Server Statuses and Warnings
The table on the Reuse Repositories Configuration page contains a Status
column which displays the language server status of each reuse repository.
The language server status indicates the availability of the reuse repository
to plug-in users.
Language Server Statuses
The following table describes the possible loading statuses.
Status
Description
Loaded
The language server loaded the reuse repository, and the reuse repository is available to plug-in users.
Loading
The language server is in the process of loading the reuse repository.
Not loaded
The reuse repository was not loaded by the language server and is not available to plug-in users.
Changes not loaded
The reuse repository was edited on the reuse repository configuration page, but the changes are not yet loaded by the language server.
Language configuration unavailable
The reuse repository was created for a language that is no longer configured. This status is usually displayed when the server cannot locate the language configuration file configuration.properties in either of the following directories:
• <INSTALL_DIR>\data\<LANG_ID>\
• %ACROLINX_CONFIGURATION_ROOT%\data\<LANG_ID>\
If the language configuration file does not exist, copy your backup of this file to the following location:
%ACROLINX_CONFIGURATION_ROOT%\data\<LANG_ID>\
If no backup copy exists, reinstall your linguistic resources.
Language server unavailable
A language server is configured with the language that this repository requires, but the language server is not running. Start the language server for the language that you are working with.
Reuse Repository Warnings
If the directory which stores the reuse repository has been deleted from your
resources, the name of the reuse repository appears with a red border on
the Reuse Repository Configuration page.
Figure 3: A Reuse Repository Warning
After you select another valid repository and save your changes, the red
border is removed and the missing repository is removed from the repositories
dropdown.
Exporting a Reuse Repository
You can use the Export Repository feature to export a list of clustered
sentences to a text file for users to review offline. After you have reviewed
the export file, you might want to create a revised list of representative
sentences which you can import to a new repository.
To export a reuse repository, follow these steps:
1 Open the Clusters Page for your repository.
2 Click Export Repository.
The Export Reuse Repository dialog box appears.
3 Select one of the following options:
• Include repository summary to include the repository summary at the beginning of the file. The repository summary contains information about when the export was created and the number and size of clusters in the repository.
• Include clusters summary to include a cluster summary above each cluster. The cluster summary contains information about the number of sentences and representatives in each cluster.
• Include representatives only to export only the representative sentences from each cluster.
4 Click OK.
5 Right-click the Download Link that appears, and click Save Target As.
Checking for Reuse Issues in the Acrolinx Plug-ins
To check for reuse issues in the Acrolinx Plug-ins:
• Select a rule set that has a reuse repository assigned.
• Select the Reuse option.
Deleting a Reuse Repository
You can permanently delete repositories that are neither being updated nor active in a rule set.
The Active In column on the Repositories page displays the value 'n/a' for
repositories which are not active in any rule set. Repositories with this value
can usually be deleted unless they are also being updated.
If the repository that you want to delete is active in one or more rule sets, remove all associations to the repository on the Reuse Repository Configuration page (see "Assigning a Reuse Repository to a Rule Set" on page 24) before you start this procedure.
To delete a reuse repository, follow these steps:
1 On the Repositories page, select the repositories to delete.
2 Click Delete.
The selected repositories are marked for deletion on the Repositories page.
Repositories that are marked for deletion appear in strike-through formatting. Deleted repositories are removed from the list after the core server is restarted.
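The deletion conditions can be summarized in a short sketch. `RepositoryRow`, the `being_updated` flag, and the helper functions are hypothetical. In particular, this guide asks you to remove rule set assignments yourself before deleting, so the skip-and-report behavior below is an illustrative precondition check, not documented product behavior.

```python
from dataclasses import dataclass


@dataclass
class RepositoryRow:
    name: str
    active_in: str = "n/a"        # value shown in the Active In column
    being_updated: bool = False   # hypothetical: an update is still in progress
    marked_for_deletion: bool = False


def can_delete(row: RepositoryRow) -> bool:
    # Deletable only if neither active in a rule set nor being updated.
    return row.active_in == "n/a" and not row.being_updated


def mark_for_deletion(rows: list[RepositoryRow], names: set[str]) -> list[str]:
    """Mark deletable repositories; return the names that were skipped."""
    skipped = []
    for row in rows:
        if row.name not in names:
            continue
        if can_delete(row):
            row.marked_for_deletion = True  # shown in strike-through formatting
        else:
            skipped.append(row.name)
    return skipped


def after_core_server_restart(rows: list[RepositoryRow]) -> list[RepositoryRow]:
    # Marked repositories disappear from the list only after a restart.
    return [row for row in rows if not row.marked_for_deletion]
```

The two-stage design (mark, then remove on restart) reflects the behavior described in the steps above: a marked repository stays visible until the core server restarts.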
Backing up Reuse Repositories
You should back up your repositories in case your installation is corrupted
or lost.
Reuse repositories are stored in the directory
<INSTALL_DIR>\data\reuse\<LANGUAGE_ID>\<REPOSITORY_NAME>.
Example: C:\Program_Files\acrolinx\acrolinx\data\reuse\EN\Topspin
To back up all reuse repositories and sentence banks, follow these steps:
1 Stop the core server.
2 Make a copy of the directory <INSTALL_DIR>\data\reuse\.
3 Restart the core server.
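If you script the copy in step 2, a minimal Python sketch could look like the following. It assumes that the core server is already stopped and that you pass in your own installation directory and backup location; the function name and the timestamped folder name are illustrative choices, not an Acrolinx convention.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_reuse_dir(install_dir: Path, backup_root: Path) -> Path:
    """Copy <INSTALL_DIR>/data/reuse into a timestamped folder under backup_root.

    Run this only while the core server is stopped.
    """
    source = install_dir / "data" / "reuse"
    if not source.is_dir():
        raise FileNotFoundError(f"No reuse directory at {source}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"reuse-backup-{stamp}"
    # copytree creates missing parent folders and fails if target already exists.
    shutil.copytree(source, target)
    return target
```

For the example installation above, you might call `backup_reuse_dir(Path(r"C:\Program_Files\acrolinx\acrolinx"), Path(r"D:\backups"))`, where the backup location is a placeholder for your own.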
Index
A
Acrolinx Reuse, overview • 4
C
clusters
  adding sentences • 20
  editing • 18
  initial status • 13
  interface • 16
  list view • 19
  managing • 16
  minimum cluster size • 11
  minimum word count • 11
  new • 21
  removing sentences • 20
  representative sentences • 18, 20
  status • 19
  strictness • 12
H
harvesting sentences
  with Acrolinx Batch Checker • 9
  with Acrolinx Plug-in • 9
I
importing
  import log messages • 14
  text or TMX files • 13
L
language servers
  statuses • 24
  warnings • 24
R
reuse
  prerequisites • 6
  process overview • 4
  repositories • 7
  using • 26
reuse repositories
  activating • 24
  adding harvested sentences • 10
  assigning • 24
  backup • 27
  cancelling • 15
  clusters • 11
  creating • 7
  creating from import files • 13
  deactivating • 24
  deleting • 26
  empty • 14
  enabling • 23
  exporting • 25
  harvesting sentences • 7
  importing sentences • 13
  interface • 23
  managing • 23
  updating • 7
rule sets • 24
S
sentence banks
  compatibility • 9
  contents • 8
  creating • 8
  default • 8
  legacy sentence banks • 9
  overview • 8
sentences, identifying • 7