Federated search engine and PAPIzed Wiki

Federated search engine
and the (PAPIzed) TF-emc2Wiki
TF-EMC2
The Searchy Architecture
● Each source incorporates an agent, available through a
SOAP interface
● Uses RDF as internal representation
● Agents for LDAP, SQL, the Google API, and Searchy itself
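The fan-out idea behind this architecture can be sketched as follows. This is a minimal illustration, not Searchy's implementation: the real agents are reached through SOAP, whereas here each agent is a hypothetical in-memory function, and the names `federated_search`, `make_stub_agent` and `broken_agent` are assumptions made for the example.

```python
from typing import Callable, Dict, List

# A record is a dict of Dublin Core style fields; an agent is any callable
# that takes a query string and returns a list of records.
Record = Dict[str, str]
Agent = Callable[[str], List[Record]]


def federated_search(agents: Dict[str, Agent], query: str) -> List[Record]:
    """Send the query to every agent and merge the answers.

    An agent that raises is skipped, so one unreachable source does not
    break the whole search (an assumed policy, chosen for the sketch).
    """
    merged: List[Record] = []
    for name, agent in agents.items():
        try:
            results = agent(query)
        except Exception:
            continue
        for record in results:
            tagged = dict(record)
            tagged["source"] = name  # remember which agent answered
            merged.append(tagged)
    return merged


def make_stub_agent(records: List[Record]) -> Agent:
    """Build an in-memory stand-in for a remote agent (illustration only)."""
    def search(query: str) -> List[Record]:
        needle = query.lower()
        return [r for r in records if needle in r.get("dc:title", "").lower()]
    return search


def broken_agent(query: str) -> List[Record]:
    """Simulate an agent that cannot be reached."""
    raise ConnectionError("agent unreachable")
```

Tagging each record with its source keeps the merged list traceable to the organization that provided it.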
Federated search engine and the (PAPIzed) TF-emc2Wiki
Searchy test installation
● To evaluate federated data access using Searchy
● Build a directory of middleware resources
● Using each organization's data sources
● Installing a Searchy agent in your systems
● Initially, RedIRIS runs the main search interface
http://www.rediris.es/busquedas/searchy/middleware/index.en.phtml
● Prepare a report with your feedback as a deliverable
Installing your Searchy agent
● Download and unpack the latest Searchy distribution
● http://jsearchy.sourceforge.net/
● You only need J2SE >= 1.4
● Select your data sources (backends)
● SQL
● LDAP
● Web servers (Google API for a restricted search)
● Configure your agent
● Use the sample agent configuration file in the conf directory
● Or the simplified configuration to be distributed in the list
● Support at http://lists.sourceforge.net/lists/listinfo/jsearchy-users
● Register your agent
● Host and port
● [email protected]
Configuring your Searchy agent
● Searchy configuration is contained in an XML file
● conf/agent.xml
● Three main elements
● <transport>
● General parameters of the agent
● <provider>
● Access parameters to the different data sources
● More than one provider can be used for an agent
● <map>
● Takes care of the data transformations
● Queries received by the agent into queries to the provider(s)
● Responses from the providers into metadata to be sent by the agent
The <transport> element
● Basic configuration parameters
● Identifier for the agent
● Providers to be used
● Port to listen at and maximum number of connections
● Log configuration (using log4j)
● Vocabulary to be used by the metadata
● A subset of Dublin Core is going to be used:
● dc:title, dc:subject and dc:description for queries
● dc:title, dc:subject, dc:description, dc:creator (and URL!) for responses
● ACLs to be applied to incoming connections
● Simple rules based on hostname or IP addresses
● Pilot config only accepts connections from certain RedIRIS hosts
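An address-based rule of this kind can be sketched as below. This is only an illustration of the idea, not Searchy's ACL syntax or code: the function name `is_allowed` is hypothetical, hostname-based rules are left out, and the addresses used are reserved documentation ranges rather than real RedIRIS hosts.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_networks: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any allowed network.

    Each entry may be a single address ("203.0.113.7") or a network in
    CIDR notation ("192.0.2.0/24").
    """
    address = ipaddress.ip_address(client_ip)
    for network in allowed_networks:
        if address in ipaddress.ip_network(network):
            return True
    return False
```

A connection would be accepted only when `is_allowed` returns True for the peer address, which mirrors the "certain hosts only" policy of the pilot configuration.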
The <provider> element
● Identifier, type and applicable map
● The remaining parameters depend on the type
● Three types included in the pilot config
● Google
● The account key to be used when connecting to the WS interface
● SQL
● A valid JDBC driver class name
● Connection data: URL using the jdbc method, hostname, port,
database, username, password
● LDAP
● URL for the LDAP server
● Root and search scope
● Other LDAP parameters: follow referrals, timeout,...
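Since the transport lists the providers to be used and each provider names its applicable map, a configuration is only consistent when those references resolve. The check below is a hedged sketch: it models the configuration as plain Python dictionaries instead of parsing the real conf/agent.xml, whose attribute names are not shown here, and the keys `transport`, `providers`, `maps` and `map` as well as the function `check_references` are assumptions made for the example.

```python
from typing import Dict, List


def check_references(config: Dict) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems: List[str] = []
    providers = config.get("providers", {})
    maps = config.get("maps", {})

    # Every provider named by the transport must be defined.
    for provider_id in config.get("transport", {}).get("providers", []):
        if provider_id not in providers:
            problems.append(f"transport uses unknown provider '{provider_id}'")

    # Every provider must point at a defined map.
    for provider_id, provider in providers.items():
        map_name = provider.get("map")
        if map_name not in maps:
            problems.append(
                f"provider '{provider_id}' uses unknown map '{map_name}'"
            )
    return problems
```

Running such a check before starting the agent would surface a misspelled identifier early instead of at query time.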
The <map> element
● Map name and applicable vocabulary
● Elements describing input/output transformations
● <URL>: Do not fiddle with it unless you know what you're doing!
● One element per input term (type="query")
● How the query term is translated into the backend query language
<dc:title filter="query">
SELECT titleDB, subjectDB, creatorDB, descriptionDB FROM table
WHERE (titleDB="%query%")
</dc:title>
● One element per output term (type="response")
● How result fields (enclosed between % signs) are transformed to build the
term contents in the response
<dc:description type="response">
%snippet%
</dc:description>
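The placeholder mechanism used in both examples can be sketched as a simple substitution. This is an illustration of the technique under stated assumptions, not Searchy's own code: `fill_template` is a hypothetical helper, and plain substitution does no escaping, so a real SQL backend would need proper quoting of the query term.

```python
import re
from typing import Dict

# Matches placeholders such as %query% or %snippet%.
PLACEHOLDER = re.compile(r"%(\w+)%")


def fill_template(template: str, values: Dict[str, str]) -> str:
    """Replace each %name% placeholder with values[name].

    Unknown placeholders are left untouched so that mistakes stay visible.
    Surrounding whitespace of the template is stripped.
    """
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template.strip())
```

For a query map the `values` dictionary would carry the incoming term under `query`; for a response map it would carry the backend's result fields, such as `snippet`.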
The (PAPIzed) TF-emc2Wiki
● Available at
http://www.rediris.es/wiki/tf-emc2/
● Protected by PAPI
● Possibility of full and read-only access
● We'll be happy to run interoperability tests with other AAIs
● We'll include all the users in the mailing list
● Username: your e-mail address
● Password: you'll receive one that you can (should) change
● Those who already have access to the JRA5Wiki will be enabled
automatically