Index Call - Managing Index Data


Service Call:
http://api.semantichacker.com/TOKEN/index/INDEXID/COMMAND?

Overview

The index service provides calls for retrieving data from an index and managing a custom index. Individual items and facet data can be retrieved from all public indexes and from custom indexes. Additional calls are provided for adding, updating, and deleting items from a custom index. These calls are only available once you have separately licensed a Custom Content index from TextWise LLC. Licensing your own content index allows you to use the SemanticHacker API to perform Similarity Search operations against your own custom content. Contact us for pricing and other information regarding Custom Content index services.

Upon licensing a Custom Content index, an index ID will be assigned to your Semantic Hacker token and you will be able to add, delete, and update your index in addition to performing Similarity Search operations. Our authorization system will ensure that only your token has access to the content contained within your index.

Differences From Other API Services

The index service differs from the other available API services in a few ways. First, the upload size limits are much more generous. By default they are 15 Megabytes. In addition, the uploaded file can be compressed. This allows even more information to be sent in a single upload. Second, the index service does not require a call to specify content to be processed. Instead, all needed information is uploaded in the specified xml format to be processed.

Index Commands

The following index control commands are used to load data into the index, obtain the current state of the index, retrieve a specific item contained within the index, or obtain facet data from the index.

Command Purpose Availability
batch A 'batch' command is used for instructing the system to apply a set of changes to the index in a manner that will not directly affect the runtime performance of the index. Batch commands require a content payload file which contains the modifications to the index to be provided. When a batch command is received by the system, the content payload will be saved and a batch ID will be returned to the caller. The batch ID can be used by the status call to determine the state of the batch process.
A batch process call will apply all changes specified within the content payload to a copy of the index. Once all changes are applied, the copy will be swapped into production and the changes will be available. This method of modifying an index is the preferred method when dealing with large change sets.
Your custom indexes
live A 'live' command is used for instructing the system to apply a set of changes to the index immediately. Live commands require a content payload file which contains the modifications to the index to be provided. Once the payload file is received, the system will immediately apply the changes to the running index and return a summary of the operations performed.
While this command enables the ability to update an index immediately, it can have a negative impact on performance if a large number of modifications are provided in a single call and care should be taken when used. By default the command restricts the number of modifications performed by a live call to 100.
Your custom indexes
status The 'status' command provides the ability to understand the general status of an index. If a batch ID returned from a batch command is provided, the status of that batch will be included within the response. Your custom indexes
item The 'item' command provides the ability to obtain all information about an item we have stored within an index. Items are identified by providing their external ID within the item call. All public indexes and your custom indexes
facets The 'facets' command provides the ability to obtain a list of all facets which have been defined for the index.
The facets command supports an optional "includeFacetValueCounts=true" parameter. When this parameter is included with the facets command, the system will return the list of all known facets, all known values for each facet, and the count of documents that have a specific facet value.
All public indexes and your custom indexes
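
The default limit of 100 modifications per live call suggests splitting larger change sets on the client side, or sending them with a batch command instead. The following is a minimal sketch of that idea; the names chunk_for_live and choose_command are hypothetical helpers, not part of the API, and the limit configured for your index may differ from the default.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Default cap on modifications per live call, as documented above.
# Assumption: your index uses the default; adjust if it was configured otherwise.
LIVE_MAX_ITEMS = 100


def chunk_for_live(items: Sequence[T], limit: int = LIVE_MAX_ITEMS) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `limit` long."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(items), limit):
        yield list(items[start:start + limit])


def choose_command(item_count: int, limit: int = LIVE_MAX_ITEMS) -> str:
    """Pick 'live' for small change sets and 'batch' for larger ones."""
    return "live" if item_count <= limit else "batch"
```

Sending many live chunks back to back can still affect runtime performance, so for large change sets the batch command remains the documented preference.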

Index Call Parameters

The index call differs from the other Semantic Hacker API calls in that it does not support the common request parameters. The only parameters available are 'batchId' for the status command; 'externalId', 'showLabels', and 'fields' for the item command; and 'includeFacetValueCounts' and 'facetValueOrderBy' for the facets command.

Name Value Description
batchId A batch ID returned from a previous batch command. Only supported with the status command. The parameter is optional. If not included the status command will return the general status of the index.
externalId The external ID of an item in the index. Only supported, and required, for the item command.
showLabels 'true' or 'false'. 'false' is the default. Include labels with the signature returned for a single item. Only supported, and optional, for the item command.
fields A comma-separated list of field names to return for the selected item. Only supported, and optional, for the item command. Defaults can be configured on a per-index basis at installation time. If defaults are not configured, all stored fields on the item will be returned.
includeFacetValueCounts 'true' or 'false'. 'false' is the default. Only supported, and optional, for the facets command. Instructs the system to include the possible values or facet ranges for all facets as well as the number of items which have a specific facet value or facet range.
facetValueOrderBy 'value' or 'documentCount'. 'value' is the default. Only supported, and optional, for the facets command. It will be ignored if the includeFacetValueCounts parameter is not included. Instructs the system to order the facet values by name or document count.
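
As an illustration of how the service call URL and the parameters in the table above fit together, here is a minimal sketch in Python. The function build_index_url and the ALLOWED_PARAMS mapping are hypothetical; they only encode the rules stated in this section and do not send any request.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://api.semantichacker.com"

# Parameters each command accepts, taken from the table above.
ALLOWED_PARAMS = {
    "batch": set(),
    "live": set(),
    "status": {"batchId"},
    "item": {"externalId", "showLabels", "fields"},
    "facets": {"includeFacetValueCounts", "facetValueOrderBy"},
}


def build_index_url(token: str, index_id: str, command: str, **params: str) -> str:
    """Build http://api.semantichacker.com/TOKEN/index/INDEXID/COMMAND?..."""
    if command not in ALLOWED_PARAMS:
        raise ValueError(f"unknown command: {command}")
    unknown = set(params) - ALLOWED_PARAMS[command]
    if unknown:
        raise ValueError(f"{command} does not support: {sorted(unknown)}")
    # externalId is the only required parameter, and only for 'item'.
    if command == "item" and "externalId" not in params:
        raise ValueError("the item command requires externalId")
    url = f"{BASE_URL}/{quote(token, safe='')}/index/{quote(index_id, safe='')}/{command}"
    if params:
        # Sorted only to make the output deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url
```

For example, build_index_url("TOKEN", "INDEXID", "status", batchId="1234") yields the status URL for a previously submitted batch.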

Content Payload format

The content payload must be provided to a 'live' or 'batch' call either as a file included within a multipart/form-data POST or as the body of a POST or PUT. In the case of multipart/form-data files, the file extension is used to determine the file type. Uncompressed XML, gzip-compressed XML, and bzip2-compressed XML are supported with the file extensions '.xml', '.xml.gz', and '.xml.bz2', respectively. In the case of POST or PUT body requests, the Content-Encoding header of the request is used to determine the file encoding. The table below details the mappings.

Request Type For XML For Gzip XML For Bzip2 XML
multipart/form-data (by file extension) .xml .xml.gz .xml.bz2
POST or PUT body (by Content-Encoding) UTF-8 (or blank) 'gzip' or 'x-gzip' 'bzip2' or 'x-bzip2'
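
The mapping above can be sketched on the client side as follows. This is only an illustration: encode_payload is a hypothetical helper that compresses the payload and returns the matching file extension and Content-Encoding header, and it assumes that leaving the header out is acceptable for uncompressed XML (the "blank" case in the table).

```python
import bz2
import gzip
from typing import Dict, Tuple

# File extensions for multipart/form-data uploads, from the table above.
EXTENSION_BY_FORMAT = {"xml": ".xml", "gzip": ".xml.gz", "bzip2": ".xml.bz2"}


def encode_payload(xml_text: str, fmt: str = "gzip") -> Tuple[bytes, str, Dict[str, str]]:
    """Return (body, file_extension, headers) for a payload in the given format."""
    raw = xml_text.encode("utf-8")
    headers: Dict[str, str] = {}
    if fmt == "xml":
        body = raw  # Content-Encoding left blank for uncompressed XML
    elif fmt == "gzip":
        body = gzip.compress(raw)
        headers["Content-Encoding"] = "gzip"
    elif fmt == "bzip2":
        body = bz2.compress(raw)
        headers["Content-Encoding"] = "bzip2"
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return body, EXTENSION_BY_FORMAT[fmt], headers
```

For a POST or PUT body request only the body and headers are needed; for a multipart/form-data upload the extension is what the service inspects.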

semantichacker_content_payload.xsd can be used to generate programming bindings for the purpose of creating a content payload file. The following sections provide an overview of the required elements.

content-payload

This is the root element of the payload. This element supports the following optional attributes for use within a batch command only -- the attributes are ignored otherwise.

Attribute Value Purpose
clearIndex true or false (default is false) Clears all items from the index before performing any item operations contained within the payload.
deleteItemsOlderThan XML duration type. example: P6D == 6 days Removes all items from the index that have been in the index longer than the specified duration. Since the updating of an item resets the age of an item to zero, the specification of this attribute causes the delete to occur after all item operations contained within the payload are performed.

The following example demonstrates a content payload request that will delete any items older than 10 days. It also explicitly declares 'false' for the clearIndex attribute.

<content-payload deleteItemsOlderThan="P10D" clearIndex="false">

facets

Facets are simply attributes of an item that exist within the index and provide a mechanism to restrict the search space of the index based on specific values (e.g. all items with a color of blue). Once facets have been defined within an index payload file and loaded into the index, they can be specified as constraints within a match request.

Facets are defined for the index using the facets element inside an index payload file and are processed by the system when the payload file is sent via a batch load call. The system will ignore any facet definitions that are included in a payload file submitted via a live load call.

In addition to containing the 'facet' elements, the facets element has an optional reset="true|false" attribute. If the reset attribute is set to true, all facets previously defined for the index will be removed from the system and any new facet definitions that were provided will be applied. If the attribute is not set to true and the facets element contains new facet definitions, the new and existing facets will be applied to all items within the index. A facets definition with only reset="true" set may be provided for the purpose of clearing all facets from the index. The reset attribute is not required and defaults to false.

Supported Data Types for Facets

Type Description Supports multiple values per item Supports Value Range Counts Supports Value Counts
string Facet values are character strings. The maximum number of unique values for the index is 65536. This is a good choice for attributes with a known small set of string values, such as a retailer category attribute with values like 'Books', 'DVDs', 'Sports', etc. true false true
short Facet values must parse into integers in the range -32768 to 32767. This is a good choice for a custom numeric category attribute with a small range of values. Another example use case is for a user rating attribute where it is useful to track the number of items that have each rating. true true true
int Facet values must parse into integers in the range -2147483648 to 2147483647. This is a good choice for attributes with a wider range of values that will not fit into a short facet, e.g. the number of times an item was sold in a year. true true false
long Facet values must parse into integers in the range -2^63 to 2^63-1. This is a good choice for attributes with a wider range of values that will not fit into an int facet, e.g. mapping a wide range of dates with seconds precision that do not fit in the int or timestamp facets. true true false
float Facet values must parse into floating point numbers in the range 2^-149 to (2-2^-23)*2^127. true true false
price Wrapper around the int facet with additional parsing support. Value strings can be whole amounts, or can have a decimal point with one or two digits, and must be in the range 0 - 21474836.47. Value strings that end in a decimal point or that do not start with a digit are invalid. When printing facet values the value is divided by 100 and printed with a decimal point and two digits after. false true false
timestamp Wrapper around the int facet with date specific parsing support. The dateformat attribute must be specified when defining a facet of this type. Values are parsed into Unix timestamps (seconds since midnight Jan 1, 1970 UTC) and stored as integers. Values must be in the range midnight Jan 1, 1970 UTC to 03:14:07 Jan 19, 2038 UTC. For requirements with wider date ranges, timestamps with millisecond precision, or dates without time component, the long or int facet can be used with an application specific mapping. false false false
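
The parsing rules described for the price facet can be checked on the client before a payload is uploaded. The sketch below is an assumption-laden illustration of those rules, not the service's parser: parse_price and format_price are hypothetical names, and the value is stored as an integer number of hundredths, as the "divided by 100" wording implies.

```python
import re

# Whole amount, optionally followed by a decimal point and one or two digits.
_PRICE_RE = re.compile(r"[0-9]+(\.[0-9]{1,2})?")
INT_MAX = 2147483647  # upper bound of the underlying int facet


def parse_price(text: str) -> int:
    """Parse a price string into hundredths, enforcing the documented rules."""
    if not _PRICE_RE.fullmatch(text):
        raise ValueError(f"invalid price: {text!r}")
    whole, _, frac = text.partition(".")
    # Pad "9" to "90" so that "19.9" means 19.90; an empty fraction becomes "00".
    hundredths = int(whole) * 100 + int(frac.ljust(2, "0"))
    if hundredths > INT_MAX:
        raise ValueError(f"price out of range: {text!r}")
    return hundredths


def format_price(hundredths: int) -> str:
    """Print the stored value divided by 100, with two digits after the point."""
    return f"{hundredths // 100}.{hundredths % 100:02d}"
```

Strings such as "5." and ".5" are rejected by the pattern, matching the rule that values must start with a digit and must not end in a decimal point.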

Attributes For Defining a Facet

Attribute Description Required
type Indicates the facet type and must be equal to one of the facet types defined above. Yes
multivalue If set to true an item can have one or more values for the facet. Otherwise each item must have exactly one value for the facet. Note that a parsing error will occur if this is set to true for facets that do not support multiple values. Examples where multivalue would need to be set to true include a category attribute where items can be assigned to more than one category, or for a product index where a product may be available in more than one color. No. Default value is 'false'.
multivalueDelimiter Indicates the character string that will be used to separate individual values for multi-valued facets. No. Default value is ',' (except for the timestamp facet which uses '|').
name The name of the item attribute that will be used for the facet. Must not be equal to 'SEMANTIC_CATEGORY' (see below). Note that the attribute for the facet will automatically be added to the attributes that will be stored for the index (if it is not already defined there). Yes
dateformat Special attribute for the timestamp facet that specifies the format string to use when parsing value strings. The dateformat value must conform to Java's SimpleDateFormat pattern specification. Required for the timestamp facet, ignored for other facets.

A numeric facet definition may have a ranges element containing 1 or more range elements. The system will keep track of how many items have a value for the facet that falls into each range. A range definition must have either the start attribute defined, the end attribute defined, or both defined. If the start attribute is omitted, it defaults to negative infinity. If the end attribute is omitted, it defaults to positive infinity. The starting point for a range is inclusive, the endpoint is exclusive. It is OK for range definitions to overlap.
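
To make the range semantics concrete, the following sketch counts values per range using an inclusive start, an exclusive end, and infinite defaults for omitted bounds. The function count_in_ranges is a hypothetical illustration of the rules above, not the system's implementation.

```python
import math
from typing import List, Optional, Sequence, Tuple

# A range is (start, end); None stands for an omitted attribute.
Range = Tuple[Optional[float], Optional[float]]


def count_in_ranges(values: Sequence[float], ranges: Sequence[Range]) -> List[int]:
    """Count how many values fall into each range (start inclusive, end exclusive)."""
    counts = []
    for start, end in ranges:
        lo = -math.inf if start is None else start  # omitted start
        hi = math.inf if end is None else end       # omitted end
        counts.append(sum(1 for v in values if lo <= v < hi))
    return counts
```

With the ranges (0, 10), (10, 20), and (20, None), a value of exactly 10 is counted in the second range only. Because overlapping ranges are allowed, a single value may be counted in more than one range.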

Value range definitions can be used in applications to allow users to narrow their match results such as selecting only those matches that fall into a certain price range or for reducing a match list of items to only those that are above a desired popularity rating.

The following example shows a facets element containing several facet definitions:

   <facets>
      <facet  name="productGroup" type="string" multivalue="true" multivalueDelimiter="," />
      <facet  name="productWeight" type="short" multivalue="false" multivalueDelimiter=",">
         <ranges>
            <range start="0" end="10" />
            <range start="10" end="20" />
            <range start="20" />
         </ranges>
      </facet>
      <facet name="created" type="timestamp" dateformat="yyyy-MM-dd HH:mm:ss Z"/>
      <facet name="price" type="price">
         <ranges>
            <range start="0.0" end="100.0" />
            <range start="100.0" end="200.0" />
            <range start="200.0" end="300.0" />
            <range start="300.0" end="400.0" />
            <range start="400.0" end="500.0" />
            <range start="500.0" end="1000.0" />
            <range start="1000.0" />
         </ranges>
      </facet>
      <facet name="popularityFunc" type="float" multivalue="false" multivalueDelimiter=",">
         <ranges>
            <range start="0.0" end="50.0" />
            <range start="50.0" end="60.0" />
            <range start="60.0" end="70.0" />
            <range start="70.0" end="80.0" />
            <range start="80.0" end="90.0" />
            <range start="90.0" />
         </ranges>
      </facet>
   </facets>

The 'SEMANTIC_CATEGORY' facet is provided automatically for each index. During the item load process the category service is used to generate a list of one or more category IDs for the item. The category IDs are stored in the item's attribute map using 'SEMANTIC_CATEGORY' as the attribute name. This facet will always exist for the index and cannot be removed by the reset attribute of a facets element definition.

item-signature-attributes

Indexes are built using the Semantic Signatures of the provided content. Depending on the mutually agreed-upon index creation rules, this may or may not be an optional element within the content payload. Regardless, the purpose of this element is to specify which attributes of a provided content item are to be used when creating the Semantic Signature of the item.
When this element is provided, at least one child element named attribute must be provided or the payload will be considered invalid by the system. If included, this element must appear before any item elements in the content payload.

The following example demonstrates an item-signature-attributes element that the system will use to determine which fields of each content item are used to construct the item's Semantic Signature.

<item-signature-attributes>
	<attribute>title</attribute>
	<attribute>description</attribute>
</item-signature-attributes>

item-stored-attributes

The external ID for an item will always be returned when performing a Similarity Search on an index. Additionally, an index can return any number of named attributes of a content item. Depending on index creation rules at initial setup, this element is used to instruct the system which attributes of a content item to store so that they can be returned to the caller of a match (Similarity Search) or item call. When this element is provided within the content file, at least one child element named attribute must be provided or the payload will be considered invalid by the system. If this element is not provided within the payload, then no attributes of the content item will be stored by the system. When included, this element must appear before any item elements in the content payload.

The following example demonstrates an item-stored-attributes element that will be used by the system to determine the fields to store for each content item.

<item-stored-attributes>
	<attribute>title</attribute>
	<attribute>landingPageUrl</attribute>
	<attribute>author</attribute>
</item-stored-attributes>

item

The item element represents an operation to perform on the index for a specific content item. At least one item element must be contained within the payload file unless the payload file represents a batch operation for clearing an index or deleting items that are older than a period of time. The item element must contain both an externalId attribute and an operation attribute. The external ID can be any value that is less than or equal to 1024 characters in length, and is typically used to associate an item returned from an index with an external metadata store. The operation attribute must be one of the following:

Value Purpose
ADD Adds the item to the index. An error will be reported for the item if the external ID already exists within the index. At least one attribute element must exist for the item element or an error will be reported for the item.
UPDATE Updates the item within the index by replacing all data within the index for that item with what has been provided in the payload. An error will be reported for the item if the external ID does not exist within the index. At least one attribute element must exist for the item element or an error will be reported for the item.
ADD_OR_UPDATE Adds the item to the index or updates the item if it already exists within the index. At least one attribute element must exist for the item element or an error will be reported for the item.
DELETE Deletes the item from the index. No attribute elements are required for the element. An error will be reported for the item if the external ID does not exist within the index.
The following example demonstrates an operation that will be performed on an item:
<item externalId="http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/" operation="ADD">
	<attribute name="author">nepheron</attribute>
	<attribute name="title">Copper Fire Log Heater: Burns clean with alcohol!</attribute>
	<attribute name="description">
		I have entered this instructable into the Stay Warm Contest so give me a +1
		of you like it! In this instructable I am going to show you how I made an
		alcohol log style heater. The process is very simple and requires knowledge
		of soldering and filing. This device put out allot of heat/light and sme...
	</attribute>
	<attribute name="landingPageUrl">
		http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/
	</attribute>
	<attribute name="categories">craft,science</attribute>
	<attribute name="pubDate">Thu, 15 Jan 2009 20:02:40 PST</attribute>
</item>
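
The rules for the four operations can be modeled with a small in-memory sketch. This is only an illustration of the documented behavior, using a plain dict in place of an index; apply_operation and its error strings are hypothetical and are not the messages the service returns.

```python
from typing import Dict, Optional, Tuple

MAX_EXTERNAL_ID_LENGTH = 1024  # documented limit on external ID length

Index = Dict[str, Dict[str, str]]


def apply_operation(
    index: Index,
    external_id: str,
    operation: str,
    attributes: Optional[Dict[str, str]] = None,
) -> Tuple[str, Optional[str]]:
    """Apply one item operation; return (outcome, error).

    outcome is 'add', 'update', 'delete', or 'failed'.
    """
    attributes = attributes or {}
    exists = external_id in index
    if len(external_id) > MAX_EXTERNAL_ID_LENGTH:
        return "failed", "external ID too long"
    if operation == "DELETE":
        # DELETE needs no attributes but the item must exist.
        if not exists:
            return "failed", "did not exist"
        del index[external_id]
        return "delete", None
    if operation not in ("ADD", "UPDATE", "ADD_OR_UPDATE"):
        return "failed", "unknown operation"
    if not attributes:
        return "failed", "no attributes provided"
    if operation == "ADD" and exists:
        return "failed", "already exists"
    if operation == "UPDATE" and not exists:
        return "failed", "did not exist"
    # UPDATE replaces all stored data for the item, so assign a fresh copy.
    index[external_id] = dict(attributes)
    return ("update" if exists else "add"), None
```

An ADD_OR_UPDATE is reported as an add or an update depending on whether the external ID was already present, which mirrors how the status response counts such operations.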

Full Content Payload Example

The following example demonstrates a content payload that can be used in either a 'batch' or 'live' command. If the payload is sent with a 'batch' command, the process will also remove any items from the index that are older than 7 days; as described for the deleteItemsOlderThan attribute, this delete occurs after the item operations in the payload are performed. If the payload is sent with a 'live' command, the 'deleteItemsOlderThan' attribute is ignored.

<?xml version="1.0" encoding="UTF-8"?>
<content-payload deleteItemsOlderThan="P7D" xmlns="http://www.semantichacker.com/api">
	<item-signature-attributes>
		<attribute>title</attribute>
		<attribute>description</attribute>
	</item-signature-attributes>
	<item-stored-attributes>
		<attribute>title</attribute>
		<attribute>landingPageUrl</attribute>
		<attribute>author</attribute>
	</item-stored-attributes>

	<item externalId="http://www.instructables.com/id/Go_Green_Upside_Down_Hanging_Planters/"
		  operation="ADD">
		<attribute name="author">DebH57</attribute>
		<attribute name="title">Go Green Upside Down Hanging Planters</attribute>
		<attribute name="description">
			The concept is that it keeps plants off the ground away...
		</attribute>
		<attribute name="landingPageUrl">
			http://www.instructables.com/id/Go_Green_Upside_Down_Hanging_Planters/
		</attribute>
		<attribute name="categories">green,home</attribute>
		<attribute name="pubDate">Sat, 10 Jan 2009 03:27:42 PST</attribute>
	</item>

	<item externalId="http://www.instructables.com/id/Lined_Messenger_Bag/"
		  operation="ADD">
		<attribute name="author">cwickham</attribute>
		<attribute name="title">Lined Messenger Bag</attribute>
		<attribute name="description">
			How to make a messenger bag with inner lining.  This is a wool felt bag
			lined with vinyl (for some waterproofness). The basic idea is to make a
			bag shape out of both fabrics then slot one inside the other and join
			them around the opening.  This means there are no rough seams on
			the inside.  It also...
		</attribute>
		<attribute name="landingPageUrl">
			http://www.instructables.com/id/Lined_Messenger_Bag/
		</attribute>
		<attribute name="categories">craft,ride</attribute>
		<attribute name="pubDate">Sun, 11 Jan 2009 17:49:15 PST</attribute>
	</item>

	<item externalId="http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/"
		  operation="ADD">
		<attribute name="author">nepheron</attribute>
		<attribute name="title">Copper Fire Log Heater: Burns clean with alcohol!</attribute>
		<attribute name="description">
			I have entered this instructable into the Stay Warm Contest so give me a +1
			of you like it! In this instructable I am going to show you how I made an
			alcohol log style heater. The process is very simple and requires knowledge
			of soldering and filing. This device put out allot of heat/light and sme...
		</attribute>
		<attribute name="landingPageUrl">
			http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/
		</attribute>
		<attribute name="categories">craft,science</attribute>
		<attribute name="pubDate">Thu, 15 Jan 2009 20:02:40 PST</attribute>
	</item>
</content-payload>
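
A payload like the one above can also be generated programmatically. The sketch below uses Python's standard xml.etree.ElementTree rather than bindings generated from the XSD; build_payload and the shape of its items argument are hypothetical, and the output has not been validated against semantichacker_content_payload.xsd.

```python
import xml.etree.ElementTree as ET

NS = "http://www.semantichacker.com/api"  # namespace used in the example above


def build_payload(items, signature_attrs=(), stored_attrs=(), delete_older_than=None):
    """Return a content-payload document as a string.

    items: list of dicts with 'externalId', 'operation', and optional 'attributes'.
    """
    ET.register_namespace("", NS)  # serialize NS as the default namespace
    root = ET.Element(f"{{{NS}}}content-payload")
    if delete_older_than:
        root.set("deleteItemsOlderThan", delete_older_than)
    # Both attribute lists must appear before any item elements.
    for tag, names in (
        ("item-signature-attributes", signature_attrs),
        ("item-stored-attributes", stored_attrs),
    ):
        if names:
            parent = ET.SubElement(root, f"{{{NS}}}{tag}")
            for name in names:
                ET.SubElement(parent, f"{{{NS}}}attribute").text = name
    for item in items:
        el = ET.SubElement(
            root,
            f"{{{NS}}}item",
            externalId=item["externalId"],
            operation=item["operation"],
        )
        for name, value in item.get("attributes", {}).items():
            ET.SubElement(el, f"{{{NS}}}attribute", name=name).text = value
    return ET.tostring(root, encoding="unicode")
```

ElementTree escapes special characters in attribute values and text, which is useful when external IDs are URLs containing ampersands.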

Index Service Response Format

All calls made to the index service will result in a response that adheres to the XML structure defined within the semantichacker_index_response.xsd schema. The following sections provide an overview of the various elements that can be returned by the service.

about

The about section for the index service is a subset of the about section used for other API service calls. The following table lists the elements that could be provided in the about section for an index service call.

Field Description
requestId TextWise-generated request ID unique for each request.
systemType Identifier of the service call that was executed ("signature", "match", "category", etc).
externalId A caller-defined value that is echoed back in the response.
requestDate An ISO_8601 formatted date that represents when the request was submitted.
systemVersion Identifies the version of the system that serviced the request.

contentIndexStatus

This element will always be included within the response regardless of the command and provides basic information about the status of the index. If the response is the result of a 'status' command without a 'batchId', then the contentIndexStatus element will be the only element contained within the response besides the about element.

<contentIndexStatus indexId="myCustomIndex"
	dictionaryId="odp_2007_l1_1.7k"
	dateCreated="2009-01-31T12:00:00+0000"
	currentSizetype="45367"
	capacity="5000000"
	lastModificationDate="2009-02-13T16:43:00+0000"
	modificationInProcess="false" />

batch-process-info

This element will be contained within the contentIndex element if either a 'batch' command was made or if a 'status' command was made which included a 'batchId' parameter.

Example response for a 'batch' command.
<batch-process-info batchId="1234" state="CREATED"/>

Example response for a 'status' command that includes a 'batchId' parameter for a completed batch. In this example the batch completed in just under 3.5 seconds. The index changes were 34 adds, 5 updates, 2 deletes, and one delete failed because the item specified did not exist. Note that ADD_OR_UPDATE commands will be reported as either an add, update, or failure, depending on their outcome, in the status.

<batch-process-info	processingTime="3456ms" batchId="1234" state="COMPLETED"
					totalItems="42" adds="34" updates="5" deletes="2" failed="1">
	<failures>
		<item externalId="1020">Did not exist - not Deleted</item>
	</failures>
</batch-process-info>
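
A caller will typically want to pull the state, counts, and failures out of such a response. The following sketch parses the example above with Python's standard library; summarize_process_info is a hypothetical helper, and it matches elements by local name because this page does not show whether the response elements carry an XML namespace.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<batch-process-info processingTime="3456ms" batchId="1234" state="COMPLETED"
    totalItems="42" adds="34" updates="5" deletes="2" failed="1">
  <failures>
    <item externalId="1020">Did not exist - not Deleted</item>
  </failures>
</batch-process-info>"""

COUNT_FIELDS = ("totalItems", "adds", "updates", "deletes", "failed")


def _local(tag: str) -> str:
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_process_info(xml_text: str) -> dict:
    """Extract batch ID, state, counts, and per-item failures."""
    root = ET.fromstring(xml_text)
    # Counts are absent on a freshly created batch, so default them to 0.
    counts = {name: int(root.get(name, "0")) for name in COUNT_FIELDS}
    failures = {
        el.get("externalId"): (el.text or "").strip()
        for el in root.iter()
        if _local(el.tag) == "item"
    }
    return {
        "batchId": root.get("batchId"),
        "state": root.get("state"),
        "counts": counts,
        "failures": failures,
    }
```

The same function works for a live-process-info element, since it carries the same count attributes and failures list but no batchId or state.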

live-process-info

This element will be contained within the contentIndex element when a 'live' command is used. The following example shows a response from a 'live' command that completed in just over 1 second with two item failures.

<live-process-info	processingTime="1036ms"
					totalItems="42" adds="24" updates="16" deletes="0" failed="2">
	<failures>
		<item externalId="1021">Did not exist - not Updated</item>
		<item externalId="1022">Did not exist - not Updated</item>
	</failures>
</live-process-info>

contentIndexResponse ( item command )

The contentIndexResponse element will be contained within the contentIndex element when an 'item' command is used. An index can be configured on installation to return a set of default fields or all stored attributes for the item when a 'fields' parameter is not given by the caller. If no attributes were stored, then the attributes element will be omitted. The contentIndexResponse element will also include the signature of the content item and the labels if the 'showLabels' parameter is set to 'true' by the caller.

Here is an example of a contentIndexResponse. Most of the signature dimensions have been truncated for brevity.

<contentIndexResponse>
	<contentId>1561</contentId>
	<externalId>1236691578280</externalId>
	<signature>
		<dimension weight="0.53375244" index="1528" />
		<dimension weight="0.34802264" index="1474" />
		...
	</signature>
	<attributes>
		<attribute name="title">Go Green Upside Down Hanging Planters</attribute>
		<attribute name="author" >DebH57</attribute>
		<attribute name="landingPageUrl">
			http://www.instructables.com/id/Go_Green_Upside_Down_Hanging_Planters/
		</attribute>
	</attributes>
</contentIndexResponse>

facets ( facets command )

Below are examples of facets call responses. The first is with includeFacetValueCounts set to 'false'; the second is with it set to 'true'. The value and range counts have been truncated for brevity.

   <facets>
      <facet  name="productGroup" type="string" multivalue="true" multivalueDelimiter="," />
      <facet  name="SEMANTIC_CATEGORY" type="short" multivalue="true" multivalueDelimiter="," />
      <facet  name="productWeight" type="short" multivalue="false" multivalueDelimiter=",">
         <ranges>
            <range start="0" end="10" />
            <range start="10" end="20" />
            <range start="20" />
         </ranges>
      </facet>
      <facet name="popularityFunc" type="float" multivalue="false" multivalueDelimiter=",">
         <ranges>
            <range start="0.0" end="50.0" />
            <range start="50.0" end="60.0" />
            <range start="60.0" end="70.0" />
            <range start="70.0" end="80.0" />
            <range start="80.0" end="90.0" />
            <range start="90.0" />
         </ranges>
      </facet>
   </facets>

   <facets>
      <facet  name="productGroup" type="string" multivalue="true" multivalueDelimiter=",">
         <valueCounts>
            <valueCount count="1932">books</valueCount>
                        ...
         </valueCounts>
      </facet>
      <facet  name="SEMANTIC_CATEGORY" type="short" multivalue="true" multivalueDelimiter=",">
         <valueCounts>
            <valueCount count="4522">45</valueCount>
            ...
         </valueCounts>
      </facet>
      <facet  name="productWeight" type="short" multivalue="false" multivalueDelimiter=",">
         <ranges>
            <range start="0" end="10" />
            <range start="10" end="20" />
            <range start="20" />
         </ranges>
         <valueCounts>
            <valueCount count="3901">1</valueCount>
            ...
         </valueCounts> 
         <rangeCounts>
            <rangeCount start="0" end="10" count="65002" />
            ...
         </rangeCounts>
      </facet>
      <facet name="popularityFunc" type="float" multivalue="false" multivalueDelimiter=",">
         <ranges>
            <range start="0.0" end="50.0" />
            <range start="50.0" end="60.0" />
            <range start="60.0" end="70.0" />
            <range start="70.0" end="80.0" />
            <range start="80.0" end="90.0" />
            <range start="90.0" />
         </ranges>
         <rangeCounts>
            <rangeCount start="0.0" end="50.0" count="45919" />
            ...
         </rangeCounts>
      </facet>
   </facets>
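
As a sketch of how a client might consume a facets response that includes value counts, the code below collects the counts per facet and sorts them by document count on the client side. The function facet_value_counts is hypothetical, the descending order is a choice made for this example, and the second valueCount in the sample is made-up data added only to show the sorting.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<facets>
  <facet name="productGroup" type="string" multivalue="true" multivalueDelimiter=",">
    <valueCounts>
      <valueCount count="1932">books</valueCount>
      <valueCount count="2710">dvds</valueCount>
    </valueCounts>
  </facet>
  <facet name="productWeight" type="short" multivalue="false" multivalueDelimiter=",">
    <rangeCounts>
      <rangeCount start="0" end="10" count="65002" />
    </rangeCounts>
  </facet>
</facets>"""


def facet_value_counts(xml_text: str) -> dict:
    """Map each facet name to its (value, count) pairs, largest count first."""
    root = ET.fromstring(xml_text)
    result = {}
    for facet in root.iter("facet"):
        pairs = [
            ((vc.text or "").strip(), int(vc.get("count")))
            for vc in facet.iter("valueCount")
        ]
        result[facet.get("name")] = sorted(pairs, key=lambda p: p[1], reverse=True)
    return result
```

Facets that report only range counts, such as productWeight in the sample, map to an empty list here.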

Error Codes

Several error codes have been added to the system that will only be returned for the index service. They are duplicated here from the API Response Reference for completeness. The format of the error message response is unchanged.

Code Message Explanation
413 'Update In Progress' A previously sent index update is being processed by the system and this live update cannot be performed at this time.
414 'Too Many Items For Live Update' A live update has been requested with more items than the configured maximum for your index. The update will not be performed.
415 'XML Schema or Parsing Error' The uploaded XML file for an index update cannot be parsed or does not comply with the index upload schema.