
Index Service

Service Call

http://api.semantichacker.com/TOKEN/index/INDEXID/COMMAND?

Overview

The index service provides calls for retrieving data from an index and managing a custom index. Individual items and facet data can be retrieved from all public indexes and from custom indexes. Additional calls are provided for adding, updating, and deleting items from a custom index. These calls are only available once you have separately licensed a Custom Content index from TextWise LLC. Licensing your own content index allows you to use the TextWise API to perform Similarity Search operations against this custom content. Contact TextWise for pricing and other information regarding Custom Content index services.

Upon licensing a Custom Content index, an index ID will be assigned to your API token and you will be able to add, delete, and update your index in addition to performing Similarity Search operations. Our authorization system will ensure that only your token has access to the content contained within your index.

Differences From Other API Services

The index service differs from the other available API services in two ways. First, the upload size limit is much more generous: the default is 15 megabytes. The uploaded file can also be compressed, which allows even more information to be sent in a single upload. Second, the index service does not require a call to specify content to be processed. Instead, all needed information is uploaded in the specified XML format.

Index Commands

The following index commands are used to load data into a custom index, to obtain the current state of an index, to retrieve a specific item from an index, or to obtain facet data from an index.

Command | Purpose | Availability
batch | The 'batch' command instructs the system to apply a set of changes to the index in a manner that will not directly affect the runtime performance of the index. A batch command requires a content payload file that contains the modifications to apply to the index. When the system receives a batch command, it saves the content payload and returns a batch ID to the caller. The batch ID can be passed to the status command to determine the state of the batch process. A batch process applies the changes specified within the content payload using offline processing as much as possible, which makes it the preferred method for large change sets. 'batch' is the only supported method of updating an index via the API. | Custom indexes
status | The 'status' command reports the general status of an index. If a batch ID returned from a batch command is provided, the status of that batch will be included within the response. | Custom indexes
item | The 'item' command returns all information stored within an index about an item. Items are identified by providing their external ID within the item call. | All public indexes and custom indexes
facets | The 'facets' command returns a list of all facets that have been defined for the index. | All public indexes and custom indexes

 

Index Call Parameters

The index call differs from the other TextWise API calls in that it does not support the common request parameters. The only parameters available are 'batchId' for the status command, and 'externalId', 'showLabels', and 'fields' for the item command.

Name | Value | Description
batchId | A batch ID returned from a previous batch command. | Only supported with the status command. The parameter is optional. If it is not included, the status command will return the general status of the index.
externalId | The external ID of an item in the index. | Only supported, and required, for the item command.
fields | A comma-separated list of field names to return for the selected item. | Only supported, and optional, for the item command. Defaults can be configured on a per-index basis at installation time. If defaults are not configured, all stored fields on the item will be returned.
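
As a rough illustration of how the service call URL and these parameters fit together, the following Python sketch assembles a request URL and rejects parameters that a command does not accept. The helper name `build_index_url` and the `ALLOWED_PARAMS` mapping are hypothetical and not part of the TextWise API. 'showLabels' is accepted for the item command only because the text above names it; its values are not documented here.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://api.semantichacker.com"  # from the Service Call section

# Parameters each command accepts, as described in this section.
# 'showLabels' is named in the text but its values are not documented here.
ALLOWED_PARAMS = {
    "batch": set(),
    "status": {"batchId"},
    "item": {"externalId", "showLabels", "fields"},
    "facets": set(),
}


def build_index_url(token, index_id, command, **params):
    """Build an index service URL (hypothetical helper, not an official client)."""
    if command not in ALLOWED_PARAMS:
        raise ValueError(f"unknown index command: {command}")
    unexpected = set(params) - ALLOWED_PARAMS[command]
    if unexpected:
        raise ValueError(f"{command} does not accept: {sorted(unexpected)}")
    if command == "item" and "externalId" not in params:
        raise ValueError("the item command requires externalId")
    # Accept a list of field names and join it into a comma-separated string.
    if "fields" in params and not isinstance(params["fields"], str):
        params["fields"] = ",".join(params["fields"])
    path = "/".join(quote(part, safe="") for part in (token, index_id, command))
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{path}?{query}"
```

For example, `build_index_url("TOKEN", "myCustomIndex", "status", batchId="1234")` produces a status URL for a single batch, while omitting `batchId` produces the URL for the general index status.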

 

Content Payload Format

The content payload must be provided to a 'batch' call either as a file included within a multipart/form-data POST or as the body of a POST or PUT. For multipart/form-data files, the file extension is used to determine the file type. Uncompressed XML, gzip-compressed XML, and bzip2-compressed XML are supported with the file extensions '.xml', '.xml.gz', and '.xml.bz2', respectively. For POST or PUT body requests, the Content-Encoding header of the request is used to determine the file encoding. The table below details the mappings.

Request Type | For XML | For Gzip XML | For Bzip2 XML
multipart/form-data (by file extension) | .xml | .xml.gz | .xml.bz2
POST or PUT body (by Content-Encoding) | UTF-8 (or blank) | 'gzip' or 'x-gzip' | 'bzip2' or 'x-bzip2'
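
The body-upload row of the table can be sketched in Python as follows. This is a minimal illustration, not an official client: the function name `build_batch_request` is hypothetical, and the `Content-Type` value is an assumption, since this page only specifies how `Content-Encoding` is interpreted. The request object is built but not sent.

```python
import gzip
import urllib.request


def build_batch_request(url, xml_bytes, compress=True):
    """Prepare (but do not send) a POST carrying a content payload as the body."""
    headers = {"Content-Type": "text/xml"}  # assumption: not specified on this page
    body = xml_bytes
    if compress:
        # Gzip the payload and declare it via Content-Encoding, per the table above.
        body = gzip.compress(xml_bytes)
        headers["Content-Encoding"] = "gzip"
    # When uncompressed, Content-Encoding is left blank, which the table allows.
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would submit the batch; compression is optional and mainly useful for payloads that approach the upload size limit.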

 

The schema at http://textwise.com/api_docs/api/semantichacker_content_payload.xsd can be used to generate language bindings for creating a content payload file. The following sections provide an overview of the required elements.

content-payload

This is the root element of the payload. This element supports the following optional attributes for use within a batch command only -- the attributes are ignored otherwise.

Attribute | Value | Purpose
clearIndex | true or false (default is false) | Clears all items from the index before performing any item operations contained within the payload.
deleteItemsOlderThan | XML duration type. Example: P6D == 6 days | Removes all items from the index that have been in the index longer than the specified duration. Since the updating of an item resets the age of an item to zero, the specification of this attribute causes the delete to occur after all item operations contained within the payload are performed.

 

The following example demonstrates a content payload request that will delete any items older than 10 days. It also explicitly declares 'false' for the clearIndex attribute.

<content-payload deleteItemsOlderThan="P10D" clearIndex="false">

facets

Facets are attributes of an item in the index. Facets provide a mechanism to restrict the search space of the index based on specific values (e.g. all items with a color of blue). Once facets have been defined within an index payload file and loaded into the index, they can be specified as constraints within a match request.

Facets are defined for the index using the facets element inside an index payload file. The facets element is processed by the system on a batch load call only if the index is empty or if the clearIndex attribute in the payload file is set to 'true'.

Supported Data Types for Facets

Type | Description | Supports multiple values per item | Supports Value Range Counts | Supports Value Counts
string | Facet values are case-sensitive character strings. The maximum number of unique values for the index is 65536. This is a good choice for attributes with a known small set of string values, such as a retailer category attribute with values like 'Books', 'DVDs', 'Sports', etc. | true | false | true
short | Facet values must parse into integers in the range -32768 to 32767. This is a good choice for a custom numeric category attribute with a small range of values. Another example use case is for a user rating attribute where it is useful to track the number of items that have each rating. | true | true | true
int | Facet values must parse into integers in the range -2147483648 to 2147483647. This is a good choice for attributes with a wider range of values that will not fit into a short facet, e.g. the number of times an item was sold in a year. | true | true | false
long | Facet values must parse into integers in the range -2^63 to 2^63-1. This is a good choice for attributes with a wider range of values that will not fit into an int facet, e.g. mapping a wide range of dates with seconds precision that do not fit in the int or timestamp facets. | true | true | false
float | Facet values must parse into floating point numbers in the range 2^-149 to (2-2^-23)*2^127. | true | true | false
price | Wrapper around the int facet with additional parsing support. Value strings can be whole amounts, or can have a decimal point with one or two digits, and must be in the range 0 - 21474836.47. Value strings that end in a decimal point or that do not start with a digit are invalid. When printing facet values the value is divided by 100 and printed with a decimal point and two digits after. | false | true | false
timestamp | Wrapper around the int facet with date specific parsing support. The dateformat attribute must be specified when defining a facet of this type. Values are parsed into Unix timestamps (seconds since midnight Jan 1, 1970 UTC) and stored as integers. Values must be in the range midnight Jan 1, 1970 UTC to 03:14:07 Jan 19, 2038 UTC. For requirements with wider date ranges, timestamps with millisecond precision, or dates without time component, the long or int facet can be used with an application specific mapping. | false | false | false
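
Because values that fail to parse cause an item to be filtered, it can be useful to check values on the client before uploading. The Python sketch below mirrors the documented ranges for the integer facet types and the documented rules for price strings. It is only an approximation of the service's behavior: the function names are hypothetical, and Python's `int()` is more lenient than the service may be (for example, it tolerates surrounding whitespace).

```python
import re

# Inclusive bounds for the integer facet types, taken from the table above.
INTEGER_RANGES = {
    "short": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
}
MAX_PRICE_CENTS = 2**31 - 1  # 21474836.47 expressed in hundredths

# Whole amount, optionally followed by a decimal point and one or two digits.
_PRICE_PATTERN = re.compile(r"[0-9]+(\.[0-9]{1,2})?")


def parse_integer_facet(value, facet_type):
    """Parse a short, int, or long facet value, raising ValueError if invalid."""
    low, high = INTEGER_RANGES[facet_type]
    number = int(value)  # raises ValueError for non-integer strings
    if not low <= number <= high:
        raise ValueError(f"{value!r} is out of range for a {facet_type} facet")
    return number


def parse_price_facet(value):
    """Parse a price string into hundredths, following the rules in the table."""
    if not _PRICE_PATTERN.fullmatch(value):
        raise ValueError(f"{value!r} is not a valid price string")
    whole, _, fraction = value.partition(".")
    cents = int(whole) * 100 + int(fraction.ljust(2, "0"))
    if cents > MAX_PRICE_CENTS:
        raise ValueError(f"{value!r} exceeds the maximum price of 21474836.47")
    return cents


def format_price(cents):
    """Print a stored price with a decimal point and two digits after it."""
    return f"{cents // 100}.{cents % 100:02d}"
```

Storing prices as integer hundredths explains the upper bound of 21474836.47: it is the largest int facet value divided by 100.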

 

Attributes For Defining a Facet

Attribute | Description | Required
type | Indicates the facet type and must be equal to one of the facet types defined above. | Yes
multivalue | If set to true an item can have one or more values for the facet. Otherwise each item must have exactly one value for the facet. Note that a parsing error will occur if this is set to true for facets that do not support multiple values. Examples where multivalue would need to be set to true include a category attribute where items can be assigned to more than one category, or for a product index where a product may be available in more than one color. | No. Default value is 'false'.
multivalueDelimiter | To specify more than one value for an attribute you can either include that attribute multiple times for an item or use the multivalueDelimiter to define the separator between multiple values within a single string. | No. Default value is ',' (except for the timestamp facet which uses '|').
name | The name of the item attribute that will be used for the facet. Note that the attribute for the facet will automatically be added to the attributes that will be stored for the index (if it is not already defined there). | Yes
dateformat | Special attribute for the timestamp facet that specifies the format string to use when parsing value strings. The dateformat value must conform to Java's SimpleDateFormat pattern specification. | Required for the timestamp facet, ignored for other facets.

 

A numeric facet definition may have a ranges element containing 1 or more range elements. The system will keep track of how many items have a value for the facet that falls into each range. A range definition must have either the start attribute defined, the end attribute defined, or both defined. If the start attribute is omitted, it defaults to negative infinity. If the end attribute is omitted, it defaults to positive infinity. The starting point for a range is inclusive, the endpoint is exclusive. It is OK for range definitions to overlap.

Value range definitions can be used in applications to allow users to narrow their match results such as selecting only those matches that fall into a certain price range or for reducing a match list of items to only those that are above a desired popularity rating.
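
The range semantics described above (inclusive start, exclusive end, open-ended defaults, and overlapping ranges allowed) can be illustrated with a short Python sketch. The function `count_by_range` is a hypothetical client-side model, not the service's implementation; `None` stands in for an omitted start or end attribute.

```python
import math


def count_by_range(values, ranges):
    """Count how many values fall into each (start, end) range.

    A start of None means negative infinity and an end of None means
    positive infinity. The start is inclusive and the end is exclusive.
    """
    counts = []
    for start, end in ranges:
        low = -math.inf if start is None else start
        high = math.inf if end is None else end
        counts.append(sum(1 for value in values if low <= value < high))
    return counts
```

Because each range is counted independently, a value that falls into two overlapping ranges is counted once in each of them.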

The following example shows a facets element containing several facet definitions:

    <facets>
      <facet  name="productGroup" type="string" multivalue="true" multivalueDelimiter="," />
      <facet  name="productWeight" type="short" multivalue="false" multivalueDelimiter=",">

         <ranges>
            <range start="0" end="10" />
            <range start="10" end="20" />
            <range start="20" />
         </ranges>
      </facet>

      <facet name="created" type="timestamp" dateformat="yyyy-MM-dd HH:mm:ss Z"/>
      <facet name="price" type="price">
         <ranges>
            <range start="0.0" end="100.0" />
            <range start="100.0" end="200.0" />
            <range start="200.0" end="300.0" />

            <range start="300.0" end="400.0" />
            <range start="400.0" end="500.0" />
            <range start="500.0" end="1000.0" />
            <range start="1000.0" />
         </ranges>
      </facet>

      <facet name="popularityFunc" type="float" multivalue="false" multivalueDelimiter=",">
         <ranges>
            <range start="0.0" end="50.0" />
            <range start="50.0" end="60.0" />
            <range start="60.0" end="70.0" />
            <range start="70.0" end="80.0" />

            <range start="80.0" end="90.0" />
            <range start="90.0" />
         </ranges>
      </facet>
   </facets>

Notes:

  • Think carefully about what facets need to be defined for the index and what facet type is best suited for each facet. Choose smaller data types over larger data types whenever possible; they use fewer resources and will perform better than larger data types.
  • It is OK for an item to not have a value for a facet, although it is a good rule of thumb to provide values for all attributes that have been configured as facets. See the match service documentation for details on how items that are missing values for facets are handled in match requests.
  • If the default multivalue delimiter string ',' does not meet your application's requirements for a particular facet and you are setting the multivalueDelimiter attribute in the facet definition, be sure to choose a string that will not appear inside of any individual values for the facet, e.g. don't use '.' for a price or float facet.
  • An item will be marked as filtered if it has a value string for an attribute that has been configured as a facet and that value does not parse into the facet's data type. An item that has a value containing the multivalue delimiter for a single value facet will also be marked as filtered.
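
The last two notes can be modeled with a small Python check that returns the parsed values for a facet, or `None` when the item would be marked as filtered. This is a sketch under assumptions: the function name is hypothetical, the `parse` callable stands in for the facet's type-specific parser, and the service's exact handling of whitespace around values is not documented here.

```python
def facet_values_or_filtered(raw, parse, multivalue=False, delimiter=","):
    """Return the parsed facet values, or None if the item would be filtered.

    `parse` is any callable that raises ValueError for values that do not
    fit the facet's data type (for example, int for an int facet).
    """
    if not multivalue and delimiter in raw:
        # A single-value facet must not contain the multivalue delimiter.
        return None
    parts = raw.split(delimiter) if multivalue else [raw]
    try:
        return [parse(part) for part in parts]
    except ValueError:
        # At least one value does not parse into the facet's data type.
        return None
```

For instance, the 'categories' value 'craft,science' yields two values when the facet is declared with multivalue set to true, but causes the item to be filtered when the facet is single-valued with the default delimiter.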

 

item-signature-attributes

Indexes are built using semantic information from the provided content, and depending on mutually-agreed-upon index creation rules, this may or may not be an optional element within the content payload. Regardless, the purpose of this element is to specify which attributes of a provided content item are to be used when creating the semantic model of the item. When this element is provided, at least one child element named attribute must be provided or the payload will be considered invalid by the system. If included, this element must appear before any item elements in the content payload.

The following example demonstrates an item-signature-attributes element that will be used by the system to determine which fields to use for each content item for semantic analysis of the item.

  <item-signature-attributes>
    <attribute>title</attribute>
    <attribute>description</attribute>
  </item-signature-attributes>

item-stored-attributes

The external ID for an item will always be returned when performing a Similarity Search on an index. Additionally, an index can return any number of named attributes of a content item. Depending on index creation rules at initial setup, this element is used to instruct the system which attributes of a content item to store so that they can be returned to the caller of a match or item call. When this element is provided within the content file, at least one child element named attribute must be provided or the payload will be considered invalid by the system. If this element is not provided within the payload, then no attributes of the content item will be stored by the system. When included, this element must appear before any item elements in the content payload.

The following example demonstrates an item-stored-attributes element that will be used by the system to determine the fields to store for each content item.

  <item-stored-attributes>
    <attribute>title</attribute>
    <attribute>landingPageUrl</attribute>
    <attribute>author</attribute>
  </item-stored-attributes>

item

The item element represents an operation to perform on the index for a specific content item. At least one item element must be contained within the payload file unless the payload file represents a batch operation for clearing an index or deleting items that are older than a period of time. The item element must contain both an externalId attribute and an operation attribute. The external ID can be any value that is less than or equal to 1024 characters in length, and is typically used to associate an item returned from an index with an external metadata store. The operation attribute must be one of the following:

Value | Purpose
ADD | Adds the item to the index. An error will be reported for the item if the external ID already exists within the index. At least one attribute element must exist for the item element or an error will be reported for the item.
UPDATE | Updates the item within the index by replacing all data within the index for that item with what has been provided in the payload. An error will be reported for the item if the external ID does not exist within the index. At least one attribute element must exist for the item element or an error will be reported for the item.
ADD_OR_UPDATE | Adds the item to the index or updates the item if it already exists within the index. At least one attribute element must exist for the item element or an error will be reported for the item.
DELETE | Deletes the item from the index. No attribute elements are required for the item element. An error will be reported for the item if the external ID does not exist within the index.
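
The rules in this table can be summarized as a small in-memory model. The Python sketch below is purely illustrative: it treats the index as a dictionary keyed by external ID, and the function names and error strings are hypothetical rather than taken from the service. The summary keys mirror the counters that appear in the batch status response.

```python
VALID_OPERATIONS = {"ADD", "UPDATE", "ADD_OR_UPDATE", "DELETE"}


def check_operation(index, external_id, operation, attributes):
    """Return an error message for an invalid operation, or None if it is valid."""
    exists = external_id in index
    if operation not in VALID_OPERATIONS:
        return "unknown operation"
    if operation != "DELETE" and not attributes:
        return "at least one attribute is required"
    if operation == "ADD" and exists:
        return "external ID already exists"
    if operation in ("UPDATE", "DELETE") and not exists:
        return "external ID does not exist"
    return None


def apply_operations(index, operations):
    """Apply (external_id, operation, attributes) tuples to a dict-based index."""
    summary = {"adds": 0, "updates": 0, "deletes": 0, "failed": 0}
    failures = []
    for external_id, operation, attributes in operations:
        error = check_operation(index, external_id, operation, attributes)
        if error:
            summary["failed"] += 1
            failures.append((external_id, error))
        elif operation == "DELETE":
            del index[external_id]
            summary["deletes"] += 1
        elif external_id in index:
            # UPDATE, or ADD_OR_UPDATE on an existing item: replace all data.
            index[external_id] = dict(attributes)
            summary["updates"] += 1
        else:
            # ADD, or ADD_OR_UPDATE on a new item.
            index[external_id] = dict(attributes)
            summary["adds"] += 1
    return summary, failures
```

In this model an ADD_OR_UPDATE is tallied as either an add or an update depending on whether the external ID was already present, which matches how the status response reports it.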

The following example demonstrates an operation that will be performed on an item:

<item externalId="http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/" operation="ADD">
	<attribute name="author">nepheron</attribute>
	<attribute name="title">Copper Fire Log Heater: Burns clean with alcohol!</attribute>
	<attribute name="description">
		I have entered this instructable into the Stay Warm Contest so give me a +1
		of you like it! In this instructable I am going to show you how I made an
		alcohol log style heater. The process is very simple and requires knowledge
		of soldering and filing. This device put out allot of heat/light and sme...
	</attribute>

	<attribute name="landingPageUrl">
		http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/
	</attribute>
	<attribute name="categories">craft,science</attribute>
	<attribute name="pubDate">Thu, 15 Jan 2009 20:02:40 PST</attribute>

</item>

Full Content Payload Example

The following example demonstrates a content payload that can be used in a 'batch' command. If the payload is sent with a 'batch' command, the process will remove any items that have been in the index longer than 7 days once the item operations in the payload have been performed. If the payload is sent with any other command, the 'deleteItemsOlderThan' attribute is ignored.

<?xml version="1.0" encoding="UTF-8"?>
  <content-payload deleteItemsOlderThan="P7D" xmlns="http://www.semantichacker.com/api">
	<item-signature-attributes>

		<attribute>title</attribute>
		<attribute>description</attribute>
	</item-signature-attributes>
	<item-stored-attributes>
		<attribute>title</attribute>

		<attribute>landingPageUrl</attribute>
		<attribute>author</attribute>
	</item-stored-attributes>

	<item externalId="http://www.instructables.com/id/Go_Green_Upside_Down_Hanging_Planters/"
		  operation="ADD">

		<attribute name="author">DebH57</attribute>
		<attribute name="title">Go Green Upside Down Hanging Planters</attribute>
		<attribute name="description">
			The concept is that it keeps plants off the ground away...
		</attribute>

		<attribute name="landingPageUrl">
			http://www.instructables.com/id/Go_Green_Upside_Down_Hanging_Planters/
		</attribute>
		<attribute name="categories">green,home</attribute>
		<attribute name="pubDate">Sat, 10 Jan 2009 03:27:42 PST</attribute>

	</item>

	<item externalId="http://www.instructables.com/id/Lined_Messenger_Bag/"
		  operation="ADD">
		<attribute name="author">cwickham</attribute>
		<attribute name="title">Lined Messenger Bag</attribute>

		<attribute name="description">
			How to make a messenger bag with inner lining.  This is a wool felt bag
			lined with vinyl (for some waterproofness). The basic idea is to make a
			bag shape out of both fabrics then slot one inside the other and join
			them around the opening.  This means there are no rough seams on
			the inside.  It also...
		</attribute>
		<attribute name="landingPageUrl">
			http://www.instructables.com/id/Lined_Messenger_Bag/
		</attribute>
		<attribute name="categories">craft,ride</attribute>

		<attribute name="pubDate">Sun, 11 Jan 2009 17:49:15 PST</attribute>
	</item>

	<item externalId="http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/"
		  operation="ADD">
		<attribute name="author">nepheron</attribute>

		<attribute name="title">Copper Fire Log Heater: Burns clean with alcohol!</attribute>
		<attribute name="description">
			I have entered this instructable into the Stay Warm Contest so give me a +1
			of you like it! In this instructable I am going to show you how I made an
			alcohol log style heater. The process is very simple and requires knowledge
			of soldering and filing. This device put out allot of heat/light and sme...
		</attribute>
		<attribute name="landingPageUrl">
			http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/
		</attribute>

		<attribute name="categories">craft,science</attribute>
		<attribute name="pubDate">Thu, 15 Jan 2009 20:02:40 PST</attribute>
	</item>
  </content-payload>
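
A payload like the one above can also be generated programmatically. The following Python sketch uses the standard library's ElementTree and is only one possible approach. The function name `build_payload` and its parameters are hypothetical; the namespace URI is copied from the example, and the element order follows the rule that the attribute-list elements must appear before any item elements.

```python
import xml.etree.ElementTree as ET

NAMESPACE = "http://www.semantichacker.com/api"  # copied from the example above


def build_payload(items, signature_attrs=(), stored_attrs=(),
                  delete_older_than=None, clear_index=False):
    """Serialize a content payload; items are (externalId, operation, attributes)."""
    ET.register_namespace("", NAMESPACE)  # emit the namespace as the default xmlns
    root = ET.Element(f"{{{NAMESPACE}}}content-payload")
    if delete_older_than:
        root.set("deleteItemsOlderThan", delete_older_than)  # e.g. "P7D"
    if clear_index:
        root.set("clearIndex", "true")
    # The attribute-list elements must come before any item elements.
    for tag, names in (("item-signature-attributes", signature_attrs),
                       ("item-stored-attributes", stored_attrs)):
        if names:
            parent = ET.SubElement(root, f"{{{NAMESPACE}}}{tag}")
            for name in names:
                ET.SubElement(parent, f"{{{NAMESPACE}}}attribute").text = name
    for external_id, operation, attributes in items:
        item = ET.SubElement(root, f"{{{NAMESPACE}}}item",
                             externalId=external_id, operation=operation)
        for name, value in (attributes or {}).items():
            ET.SubElement(item, f"{{{NAMESPACE}}}attribute", name=name).text = value
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

The function returns UTF-8 encoded bytes that include the XML declaration, so the result can be written to a '.xml' file or sent directly as a request body.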

Index Service Response Format

All calls made to the index service will result in a response that adheres to the XML structure defined within the http://textwise.com/api_docs/api/semantichacker_index_response.xsd schema. The following sections provide an overview of the various elements that can be returned by the service.

about

The about section for the index service is a subset of the about section used for other API service calls. The following table lists the elements that could be provided in the about section for an index service call.

Field | Description
requestId | TextWise-generated request ID unique for each request.
systemType | Identifier of the service call that was executed ("match", "category", etc).
externalId | A caller-defined value that is echoed back in the response.
requestDate | An ISO 8601 formatted date that represents when the request was submitted.
systemVersion | Identifies the version of the system that serviced the request.

 

contentIndexStatus

This element will be included within the response for the batch and status index commands. If the response is the result of a status command without a 'batchId', then the contentIndexStatus element will be the only element contained within the response besides the about element.

	<contentIndexStatus indexId="myCustomIndex"
	dictionaryId="odp_2007_l1_1.7k"
	dateCreated="2009-01-31T12:00:00+0000"
	currentSizetype="45367"
	capacity="5000000"
	lastModificationDate="2009-02-13T16:43:00+0000"
	modificationInProcess="false" />

batch-process-info

This element will be contained within the contentIndex element if either a 'batch' command was made or if a 'status' command was made which included a 'batchId' parameter.

Example response for a 'batch' command.

	<batch-process-info batchId="1234" state="CREATED"/>

Example response for a 'status' command that includes a 'batchId' parameter for a completed batch. In this example the batch completed in just under 3.5 seconds. The index changes were 34 adds, 5 updates, 2 deletes, and one delete failed because the item specified did not exist. Note that ADD_OR_UPDATE commands will be reported as either an add, update, or failure, depending on their outcome, in the status.

<batch-process-info	processingTime="3456ms" batchId="1234" state="COMPLETED"
					totalItems="42" adds="34" updates="5" deletes="2" failed="1">
	<failures>
		<item externalId="1020">Did not exist - not Deleted</item>
	</failures>
</batch-process-info>
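
A client would typically read the state and counters from this element after a status call. The Python sketch below parses a batch-process-info element with ElementTree. It is an illustration under assumptions: the function name is hypothetical, element names are matched by local name because real responses may carry an XML namespace that the excerpts above omit, and only the 'CREATED' and 'COMPLETED' states shown above are known from this page.

```python
import xml.etree.ElementTree as ET

COUNTERS = ("totalItems", "adds", "updates", "deletes", "failed")


def _local_name(element):
    """Strip any '{namespace}' prefix from an element tag."""
    return element.tag.rsplit("}", 1)[-1]


def parse_batch_info(xml_text):
    """Extract batch state, counters, and failures from a response document."""
    root = ET.fromstring(xml_text)
    info = next((el for el in root.iter()
                 if _local_name(el) == "batch-process-info"), None)
    if info is None:
        raise ValueError("no batch-process-info element found")
    # Counters are absent until the batch has been processed, so keep None.
    counts = {name: (int(info.get(name)) if info.get(name) is not None else None)
              for name in COUNTERS}
    failures = [(el.get("externalId"), (el.text or "").strip())
                for el in info.iter() if _local_name(el) == "item"]
    return {
        "batchId": info.get("batchId"),
        "state": info.get("state"),
        "processingTime": info.get("processingTime"),
        "counts": counts,
        "failures": failures,
    }
```

The function accepts either the bare element or a full response document that contains it, since it searches the whole tree for the first matching element.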

contentIndexResponse ( item command )

The contentIndexResponse element will be contained within the contentIndex element when an 'item' command is used. An index can be configured on installation to return a set of default fields or all stored attributes for the item when a 'fields' parameter is not given by the caller. If no attributes were stored, then the attributes element will be omitted.

An example contentIndexResponse is shown below.

<contentIndexResponse>
	<contentId>1561</contentId>
	<externalId>1236691578280</externalId>

	<attributes>
		<attribute name="title">Go Green Upside Down Hanging Planters</attribute>
		<attribute name="author" >DebH57</attribute>
		<attribute name="landingPageUrl">
			http://www.instructables.com/id/Go_Green_Upside_Down_Hanging_Planters/
		</attribute>

	</attributes>
</contentIndexResponse>
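
The item response can be flattened into a dictionary for easier use. The Python sketch below is a hypothetical helper, not part of the API. It matches elements by local name in case real responses carry a namespace, and it strips the surrounding whitespace that appears around multi-line values such as landingPageUrl in the example. If an attribute name occurs more than once for an item, this simple version keeps only the last value.

```python
import xml.etree.ElementTree as ET


def parse_item_response(xml_text):
    """Convert a contentIndexResponse document into a plain dictionary."""
    root = ET.fromstring(xml_text)
    result = {"contentId": None, "externalId": None, "attributes": {}}
    for element in root.iter():
        name = element.tag.rsplit("}", 1)[-1]  # drop any namespace prefix
        text = (element.text or "").strip()
        if name in ("contentId", "externalId"):
            result[name] = text
        elif name == "attribute":
            result["attributes"][element.get("name")] = text
    return result
```

When no attributes were stored for the item, the attributes element is omitted from the response and the returned dictionary simply contains an empty "attributes" mapping.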

facets ( facets command )

An example response for the facets call is shown below.

<facets>

      <facet  name="productGroup" type="string" multivalue="true" multivalueDelimiter="," />
      <facet  name="productWeight" type="short" multivalue="false" multivalueDelimiter=",">
         <ranges>
            <range start="0" end="10" />
            <range start="10" end="20" />
            <range start="20" />

         </ranges>
      </facet>
      <facet name="popularityFunc" type="float" multivalue="false" multivalueDelimiter=",">
         <ranges>
            <range start="0.0" end="50.0" />
            <range start="50.0" end="60.0" />

            <range start="60.0" end="70.0" />
            <range start="70.0" end="80.0" />
            <range start="80.0" end="90.0" />
            <range start="90.0" />
         </ranges>
      </facet>

   </facets>

Error Codes

Several error codes have been added to the system that will only be returned for the index service. They are duplicated here from the API Response Reference for completeness. The format of the error message response is unchanged.

TW Code | HTTP Status Code | Message | Explanation
415 | 400 | 'XML Schema or Parsing Error' | The uploaded XML file for an index update cannot be parsed or does not comply with the index upload schema.