Unify Enterprise Search Remote indexing template

module: Content module
supplier: VYRE Ltd.


The remote indexing template dictates what information from a content store is stored and indexed, and how.

Creating

To create a remote indexing template (RIT) for a content store, go to Publishing → Files → Code templates → Remote indexing templates. Create a template there, then assign it to a content store in the content store's edit screen (under Remote Indexing).

To submit all items for that store to the remote index, click Edit → Actions → Send all items to remote index.

It is also possible to reindex a single selected item (item → Edit → Actions → Reindex this item).

Template Code

The template uses FreeMarker to represent the content store item as XML.

Here is a sample template that stores two attributes from a content store (title and description), plus a multi-valued tags field with hard-coded values:

  <SearchableItem>
    <documentBoost>1.0</documentBoost>
    <properties>
      <entry>
        <SearchableProperty>
          <name>title</name>
          <value><![CDATA[${item.name}]]></value>
          <boost>1.0</boost>
        </SearchableProperty>
      </entry>
      <entry>
        <SearchableProperty>
          <name>description</name>
          <value><![CDATA[${item.description}]]></value>
          <boost>1.0</boost>
        </SearchableProperty>
      </entry>
      <!-- in 4.6.3 you can declare a multiValued field -->
      <entry>
        <SearchableProperty>
          <name>tags</name>
          <values>
            <value>movie</value>
            <value>action</value>
            <value>batman</value>
          </values>
          <boost>1.0</boost>
        </SearchableProperty>
      </entry>
    </properties>
  </SearchableItem>

As the sample shows, the template represents the overall structure of the item as XML, with content store attribute values inserted or calculated using FreeMarker expressions and interpolations.

Configuration

SearchableItem

This is the item indexed by the search engine. It requires two child nodes: documentBoost, and properties, which holds one entry per SearchableProperty describing each attribute.

documentBoost

A float value representing the overall score multiplier given to this document at index time. For example, a documentBoost value of 1.5 gives the document an inflated score that is 50% higher than its natural rank. This allows you to specify that documents from one content store are to be considered more relevant than documents from others.

properties

A map of searchable properties that represent each stored value.

SearchableProperty

A searchable property encapsulates the value of a single attribute from a content store item and dictates how the search server should treat it at index time.

name

The identifier or name of the indexed attribute, e.g. 'title'.

value

The value of the indexed attribute.

values

A list of values used for a multiValued field. When values is specified, any value element that is a direct child of SearchableProperty is ignored.
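A minimal sketch of building a multi-valued property from a list rather than typing each value by hand. The tagList sequence is a stand-in introduced for illustration only; in a real template the values would be derived from the item.

```freemarker
<#-- Hypothetical tag list; in practice this would be derived from the item -->
<#assign tagList = ["movie", "action", "batman"]>
<#-- Emit the property only when there is at least one value -->
<#if tagList?has_content>
<entry>
  <SearchableProperty>
    <name>tags</name>
    <values>
      <#list tagList as tag>
      <value><![CDATA[${tag}]]></value>
      </#list>
    </values>
    <boost>1.0</boost>
  </SearchableProperty>
</entry>
</#if>
```

Wrapping each value in CDATA, as the sample template does for title and description, keeps characters such as & and < from breaking the XML.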

boost

A float value representing the overall score multiplier given to keyword hits in this attribute at query time. This means that a hit in a given attribute can be given a higher or lower weighting than hits in other attributes. For example, a hit in a 'title' field could be considered more valuable to natural rank than a hit in a 'description' or 'body' field.
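As an illustration of how the two kinds of boost sit together in one template, the sketch below raises the whole document by 50% and weights title hits twice as heavily as description hits. The numbers are arbitrary examples, not recommended settings.

```freemarker
<SearchableItem>
  <#-- Document-level multiplier: 50% above the natural rank -->
  <documentBoost>1.5</documentBoost>
  <properties>
    <entry>
      <SearchableProperty>
        <name>title</name>
        <value><![CDATA[${item.name}]]></value>
        <#-- Hits in the title count double -->
        <boost>2.0</boost>
      </SearchableProperty>
    </entry>
    <entry>
      <SearchableProperty>
        <name>description</name>
        <value><![CDATA[${item.description}]]></value>
        <#-- Neutral weighting -->
        <boost>1.0</boost>
      </SearchableProperty>
    </entry>
  </properties>
</SearchableItem>
```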

Helper Object

To help with retrieving values in the RIT, a helper object is added. Methods can be called on this object to retrieve different information. (Source code for this helper can be found in RemoteIndexTemplateHelper.java)

For example ${helper.getAttributeValue( '18' )} will return the String value for attribute 18 of the current item (or the default value, see below).
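A sketch of using the helper inside a property. Attribute id '18' is taken from the example above; the field name subtitle_s, and the idea that attribute 18 holds a short piece of text, are assumptions made for illustration. The sketch also assumes the helper returns the supplied default unchanged, so that passing an empty string lets the template skip the property when the attribute has no value.

```freemarker
<#-- Fetch the value once, with an empty string as the fallback -->
<#assign subtitle = helper.getAttributeValue('18', '')>
<#-- Skip the whole entry when there is nothing to index -->
<#if subtitle?has_content>
<entry>
  <SearchableProperty>
    <#-- Hypothetical field name -->
    <name>subtitle_s</name>
    <value><![CDATA[${subtitle}]]></value>
    <boost>1.0</boost>
  </SearchableProperty>
</entry>
</#if>
```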

Method Call | Return Object | Description
getAttributeValue(String attributeId) | String | Returns the attribute value for the current item, or the helper's default value if the attribute is null.
getAttributeValue(String attributeId, String defaultValue) | String | Returns the attribute value for the current item, or defaultValue if the attribute is null.
getAttributeValue(Item item, String attributeId, String defaultValue) | String | Returns the string value for the given attribute id. Assumes the item's content and metadata are already loaded.
getAttributeValue(Item item, String attributeId, String defaultValue, boolean loadContentMetadata) | String | Returns the value for the given attribute id for the item passed, or defaultValue if the value is not present. If true is passed for loadContentMetadata, the content and metadata are loaded; this should be used sparingly. For example, if only one attribute is needed for a linked item then this can be used, but if many attributes are needed then .loadContentAndMetadata() should be called on the item once and false passed to each attribute call. Deprecated since 4.6. N.B. If null is passed as the defaultValue, the default value for the helper is used.
getAttributeValueByName(String attributeName) | String | Returns the attribute value for the current item, or the helper's default value if the attribute is null.
getAttributeValueByName(String attributeName, String defaultValue) | String | Returns the attribute value for the current item, or defaultValue if the attribute is null.
getAttributeValueByName(Item item, String attributeName, String defaultValue) | String | Returns the string value for the given attribute name. Assumes the item's content and metadata are already loaded.
getAttributeValueByName(Item item, String attributeName, String defaultValue, boolean loadContentMetadata) | String | Returns the value for the given attribute name for the item passed, or defaultValue if the value is not present. If true is passed for loadContentMetadata, the content and metadata are loaded; this should be used sparingly. For example, if only one attribute is needed for a linked item then this can be used, but if many attributes are needed then .loadContentAndMetadata() should be called on the item once and false passed to each attribute call. Deprecated since 4.6. N.B. If null is passed as the defaultValue, the default value for the helper is used.
getLinkedItemInfos(String linkDefId) | List<ItemInfo> | Gets the linked ItemInfos for the current item (that have not been deleted).
getLinkedItemInfos(Item item, String linkDefId) | List<ItemInfo> | Gets the linked ItemInfos for the passed item (that have not been deleted).
getLinkedItems(String linkDefId, boolean loadContentMetadata) | List<Item> | Gets the linked items for the current item. Note that the content and metadata should not be loaded unless they are needed for all the items.
getLinkedItems(Item item, String linkDefId, boolean loadContentMetadata) | List<Item> | Gets the linked items for the passed item (that have not been deleted). Note that the content and metadata should not be loaded unless they are needed for all the items. Deprecated since 4.6.
getLinkedItems(Profile userProfile, String linkDefTitle) | List<Item> | Gets the linked items for a profile (user profile) by UserLink definition title. Since 4.8.
getItemFromItemInfo(ItemInfo itemInfo, boolean loadContentMetadata) | Item | Returns the item for the passed ItemInfo. If loadContentMetadata is true, the content and metadata are loaded. Deprecated since 4.6.
getDefaultValue() | String | Gets the default value that the helper will use if an attribute is null. Removed in 4.6.3.
setDefaultValue(String defaultValue) | void | Sets the default value that the helper will use if an attribute is null. Removed in 4.6.3.
getLinkedUserProfiles(Item item, String userLinkDefinitionId) | List<Profile> | Returns the list of linked user profiles for the specified item. Since 4.7. Deprecated since 4.8; use getLinkedUserProfilesByTitle.
getLinkedAttributeValue(Item item, String linkDefId, int attributeId) | Map<String, String> | Returns a map of linked item ids to attribute values. Since 4.7.
getLinkedAttributeValue(Item item, String linkDefId, String attributeName) | Map<String, String> | Returns a map of linked item ids to attribute values. Since 4.7.
getItemSubcategoriesByPath(Item item, String categoryPath) | List<Category> | Gets all taxonomised categories under a parent category (given as a taxonomy path) for the specified item. Since 4.7.
getItemSubcategories(Item item, String categoryId) | List<Category> | Gets all taxonomised categories under a parent category for the specified item. Since 4.7.
getSolrDate(String unifyDate, String defaultValue) | String | Converts a Unify date ("dd.MM.yyyy" or "dd.MM.yyyy HH:mm") into an Apache Solr-compatible ISO 8601 date string. The output timezone is always UTC. Since 4.7.
getSolrDate(Date date, String defaultValue) | String | Converts a Date into an Apache Solr-compatible ISO 8601 date string. The output timezone is always UTC. Since 4.7.
getAttributePresentationValue(String attributeId, String itemValue) | String | Gets the attribute presentation rule value for an item value. Since 4.7.
getAttributePresentationValue(String attributeName, CollectionSchema collectionSchema, String itemValue) | String | Gets the attribute presentation rule value for an item value. Since 4.7.
getAttributePresentationValue(AttributeDefinition def, String itemValue) | String | Gets the attribute presentation rule value for an item value. Since 4.7.
getBeanFactory() | BeanFactory | Provides access to the global bean factory. Since 4.7.
generateSearchableProperty(String fieldName, boolean value) | String | Generates SearchableProperty XML for a boolean field. Since 4.7.
generateSearchableProperty(String fieldName, Number value) | String | Generates SearchableProperty XML for a Number field. Since 4.7.
generateSearchableProperty(String fieldName, Date value) | String | Generates SearchableProperty XML for a Date field, if the date is not empty. Since 4.7.
generateSearchableProperty(String fieldName, String value) | String | Generates SearchableProperty XML for a String field, if the String is not empty. Since 4.7.
generateSearchableProperty(String fieldName, Collection<String> value) | String | Generates SearchableProperty XML for a list of String values, if the collection is not empty. Since 4.7.
getDerivedFileExtensionByServiceName(String fileServiceName, String defaultValue) | String | Gets the file extension of the derived file of the indexed item. Since 4.8.
getDerivedFilePrettySizeByName(String fileServiceName, String defaultValue) | String | Gets the pretty-printed file size of the derived file of the indexed item. Since 4.8.
getLinkedUserProfilesByTitle(Item item, String userLinkTitle) | List<Profile> | Gets a list of the user profiles linked to the specified item via a user link definition.
getLinkedItemsByTitle(Item item, String linkTitle) | List<Item> | Gets a list of the items linked to the specified item via a link definition.
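A sketch of indexing the names of linked items with getLinkedItemsByTitle. The link definition title "Authors" and the field name authorNames_com are hypothetical. The example assumes that the item variable used in the sample template can be passed as the Item argument, and that linked items expose a name in the same way as the current item (item.name).

```freemarker
<#-- Fall back to an empty sequence in case the helper returns nothing -->
<#assign authors = helper.getLinkedItemsByTitle(item, "Authors")![]>
<#if authors?has_content>
<entry>
  <SearchableProperty>
    <#-- Hypothetical field name; *_com is the comma delimited dynamic field -->
    <name>authorNames_com</name>
    <#-- Join the linked item names with commas -->
    <value><![CDATA[<#list authors as author>${author.name}<#if author_has_next>,</#if></#list>]]></value>
    <boost>1.0</boost>
  </SearchableProperty>
</entry>
</#if>
```

Only the name of each linked item is read here, so nothing in the sketch asks for content and metadata to be loaded.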

Default fields indexed

The fields below are all stored automatically when an item is sent to the remote index. Note: a common mistake is forgetting to add these fields to the retrieved fields in the portlet configuration.

Variable Name | Stored Value
uuid | id of the item. Stored and indexed.
item | id of the item; alias of uuid. Since 4.6.3.
active | whether this item is active; will be "true" or "false". Stored and indexed.
title | name field of the item. Stored and indexed.
description | description field of the item. Stored and indexed.
keywords | keyword field of the item. Stored and indexed.
collection | id of the collection. Stored and indexed.
creationDate | date the item was created (in Solr format, e.g. 2007-12-24T23:59:59Z). Stored and indexed.
lastModifiedDate | date the item was last modified (in Solr format, e.g. 2007-12-24T23:59:59Z). Stored and indexed.
creator | profile id of the item creator. Stored and indexed.
lastModifier | profile id of the last modifier of the item. Stored and indexed.
category | the string values of the taxonomy categories. Stored and indexed.
categoryIds | the id values of the taxonomy categories, or "none" if the item is not taxonomised. Stored and indexed.
leafCategoryIds | the leaf categories. Stored and indexed. N.B. If an item belongs to category X and to none of the descendants of X, then X is a "leaf category" of that item.
locale | locale of the item. Stored and indexed.
secondary | "true" if this is a secondary item. Stored and indexed.
primary | "false" if this is a secondary item. Stored and indexed.
primaryItem | id of the primary item ("none" if this is a primary item). Stored and indexed.
defaultSearch | the field that is searched if no field is specified explicitly in a search. By default this contains title, description, derived file information, keywords and category data. Indexed only; it is not stored and cannot be obtained from a set of results. It can only be searched against.
secondaryItem_{locale} | e.g. secondaryItem_ca_ES is the id of the Catalan version of this primary item (or of this secondary item's primary item).

Access to static methods

Since Unify 4.7 it has been possible to call static methods from remote indexing templates. To do this, use the statics hash, keyed by the fully qualified class name:

  <#assign iterableNumbers = statics["com.google.common.base.Splitter"].on(",").split("1,2,3,4")>
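A further sketch of the same mechanism, using only a JDK class. It URL-encodes the item name and stores the result in a string field; the field name encodedName_s is hypothetical, and the sketch assumes item.name is never null, as the sample template does.

```freemarker
<#-- java.net.URLEncoder.encode(String, String) is a static JDK method -->
<#assign encodedName = statics["java.net.URLEncoder"].encode(item.name, "UTF-8")>
<entry>
  <SearchableProperty>
    <#-- Hypothetical field name -->
    <name>encodedName_s</name>
    <value><![CDATA[${encodedName}]]></value>
    <boost>1.0</boost>
  </SearchableProperty>
</entry>
```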

Changes in the Remote Indexing Template since the retirement of the Rabida project

The old Rabida project was a wrapper around a Solr search server, and has since been replaced with calls made directly to a Solr search server configured to work with Unify content items. The old remote indexing template allowed you to specify a document UUID. However, if the template developer made a mistake and provided the wrong id for this value, the result was a document in Solr that could not be deleted by Unify. This feature has therefore been removed.

In addition, the old remote indexing templates allowed you to specify if a field was stored, indexed or tokenised. Calls made directly to Solr do not allow this. If you need to set up fields that deviate from the standard field settings then you can do so by editing the Solr schema file. Currently the only option you are allowed to specify for a field other than name and value is a boost, as explained above. A separate document-level boost can also be provided in the template.

Changes in Solr configuration for use with Unify 4.6

No changes have been made to the way the remote indexing template works between versions 4.5.0.1 and 4.6, but some of the indexing behavior on the Solr side has changed. The new index uses Solr Trie numeric types for numeric and date fields, which allows for faster range queries. Fields that contain boolean values now use the boolean type rather than the text "true" and "false". Unnecessary default copy fields have been removed, some field types have been renamed to have more sensible names, and the field that searches occur on by default is now called "defaultSearch".

Any of the default indexed fields can be overridden by supplying your own value for them in the remote indexing template.
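For example, a template might replace the automatically indexed title with a value of its own. This sketch assumes that attribute '18' (a hypothetical id) holds a display title, and falls back to the item name when that attribute is empty.

```freemarker
<#assign displayTitle = helper.getAttributeValue('18', '')>
<entry>
  <SearchableProperty>
    <#-- Same name as the default field, so this value is supplied in its place -->
    <name>title</name>
    <value><![CDATA[<#if displayTitle?has_content>${displayTitle}<#else>${item.name}</#if>]]></value>
    <boost>1.0</boost>
  </SearchableProperty>
</entry>
```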

If, for 4.6, your remote indexing templates still contain a UUID element that specifies the item id, you should drop this element from the template. Frequently, a call to a FreeMarker method that loads supporting data, usually called something like loadDataModel, is embedded within this XML element. The pattern for this method is to load and assign a number of template-level variables and then return the UUID of the item. You can still call the method by placing it earlier in the template, outside of any XML tag, but change it to return some form of white space instead.
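The migration described above could look roughly like the following sketch. The body of loadDataModel is hypothetical, since real templates define their own; attribute id '18' and the field name subtitle_s are illustrative only.

```freemarker
<#-- Hypothetical loadDataModel following the pattern described above:
     it assigns template-level variables and now returns an empty string
     instead of the item UUID. -->
<#function loadDataModel>
  <#assign subtitle = helper.getAttributeValue('18', '')>
  <#return "">
</#function>
<#-- Called before the XML starts, outside any element; prints nothing -->
${loadDataModel()}
<SearchableItem>
  <documentBoost>1.0</documentBoost>
  <properties>
    <entry>
      <SearchableProperty>
        <name>subtitle_s</name>
        <value><![CDATA[${subtitle}]]></value>
        <boost>1.0</boost>
      </SearchableProperty>
    </entry>
  </properties>
</SearchableItem>
```

Because assign inside a function sets a template-level variable, subtitle is still available to the rest of the template after the call.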

You can also supply your own fields in the remote indexing template. How you name these fields will determine how they are indexed. At the moment, the default schema.xml file in Solr is set up to allow a number of dynamic fields. These are as follows.

Pattern of variable name | Matching data type configured in schema.xml
*_s | A plain string. This string is stored as-is, and is not tokenised before it is stored.
*_i | An integer.
*_l | A long.
*_t | A block of text, which is parsed using the rules for the "textBlock" field type. It is tokenised on white space, and then filtered for stop words, stemmed words and synonyms, and is lower cased.
*_b | A boolean.
*_f | A float value.
*_d | A double value.
*_dt | A date and time value in Solr date time format.
*_ws | White space delimited text.
*_com | Comma delimited text.
*_tight | Present for legacy purposes only. Do not use.
*_sortable | A sortable text field. Fields of this type are indexed but not stored: you cannot get their original value back from Solr, you can only search and sort on them.
* | A fall-through value for fields that match none of the above patterns. These are always stored as the "textBlock" type described above.
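A sketch of custom fields that rely on these naming patterns. The attribute ids '21' and '22' and the field names pageCount_i and publishedDate_dt are hypothetical. The date field uses helper.getSolrDate (available since 4.7) and assumes it returns the supplied default when given an empty input.

```freemarker
<#-- Hypothetical attributes: '21' holds a page count, '22' a Unify date such as 24.12.2007 -->
<#assign pageCount = helper.getAttributeValue('21', '')>
<#assign published = helper.getSolrDate(helper.getAttributeValue('22', ''), '')>
<#-- The _i suffix makes Solr treat the value as an integer -->
<#if pageCount?has_content>
<entry>
  <SearchableProperty>
    <name>pageCount_i</name>
    <value>${pageCount}</value>
    <boost>1.0</boost>
  </SearchableProperty>
</entry>
</#if>
<#-- The _dt suffix expects a Solr date such as 2007-12-24T23:59:59Z -->
<#if published?has_content>
<entry>
  <SearchableProperty>
    <name>publishedDate_dt</name>
    <value>${published}</value>
    <boost>1.0</boost>
  </SearchableProperty>
</entry>
</#if>
```

Skipping the entry when a value is empty avoids sending an empty string to a typed field.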
