Index Service
Service Call
http://api.semantichacker.com/TOKEN/index/INDEXID/COMMAND?
Overview
The index service provides calls for retrieving data from an index and managing a custom index. Individual items and facet data can be retrieved from all public indexes and from custom indexes. Additional calls are provided for adding, updating, and deleting items from a custom index. These calls are only available once you have separately licensed a Custom Content index from TextWise LLC. Licensing your own content index allows you to use the TextWise API to perform Similarity Search operations against this custom content. Contact TextWise for pricing and other information regarding Custom Content index services.
Upon licensing a Custom Content index, an index ID will be assigned to your API token and you will be able to add, delete, and update your index in addition to performing Similarity Search operations. Our authorization system will ensure that only your token has access to the content contained within your index.
Differences From Other API Services
The index service differs from the other available API services in two ways. First, the upload size limit is much more generous: 15 megabytes by default. The uploaded file can also be compressed, which allows even more information to be sent in a single upload. Second, the index service does not require a call to specify content to be processed. Instead, all needed information is uploaded in the specified XML format to be processed.
Index Commands
The following index commands are used to load data into a custom index, to obtain the current state of an index, to retrieve a specific item from an index, or to obtain facet data from an index.
Index Call Parameters
The index call differs from the other TextWise API calls in that it does not support the common request parameters. The only parameters available are 'batchId' for the status command, and 'externalId', 'showLabels', and 'fields' for the item command.
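As a rough illustration of the URL pattern and the per-command parameters described above, the following Python sketch assembles a request URL. The function name `build_index_url` and the choice to reject unlisted parameters on the client side are assumptions made for this example; they are not part of the TextWise API.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://api.semantichacker.com"

# Parameters this page lists for each command; other commands take none.
ALLOWED_PARAMS = {
    "status": {"batchId"},
    "item": {"externalId", "showLabels", "fields"},
}


def build_index_url(token, index_id, command, **params):
    """Build an index service URL (hypothetical helper, not an official client)."""
    unknown = set(params) - ALLOWED_PARAMS.get(command, set())
    if unknown:
        raise ValueError(f"{command!r} does not accept: {sorted(unknown)}")
    url = f"{BASE_URL}/{quote(token, safe='')}/index/{quote(index_id, safe='')}/{command}"
    if params:
        # urlencode percent-escapes values such as external IDs that are URLs.
        url += "?" + urlencode(params)
    return url


print(build_index_url("TOKEN", "myCustomIndex", "status", batchId="1234"))
```

Rejecting unknown parameters locally is only a convenience; how the service itself reacts to unsupported parameters is not described here.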
Content Payload Format
The content payload needs to be provided to a 'batch' call as either a file included within a multipart/form-data POST, or sent as the body of a POST or PUT. In the case of multipart/form-data files, the file extension will be used to determine the file type. Uncompressed XML, gzip-compressed XML, and bzip2-compressed XML are supported with the file extensions '.xml', '.xml.gz', and '.xml.bz2', respectively. In the case of POST or PUT body requests, the Content-Encoding header of the request will be used to determine the file encoding. The table below details the mappings.
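The extension rule for multipart uploads can be sketched in Python as follows. This is a minimal sketch: the helper name `package_payload`, the base file name `payload`, and the `compression` keywords are assumptions chosen for illustration. It only prepares the file name and bytes and does not send a request.

```python
import bz2
import gzip


def package_payload(xml_text, compression=None):
    """Return (filename, data) suitable for a multipart/form-data file part.

    Hypothetical helper: the service is described as choosing the file type
    from the extension, so the name and the bytes must agree.
    """
    raw = xml_text.encode("utf-8")
    if compression is None:
        return "payload.xml", raw
    if compression == "gzip":
        return "payload.xml.gz", gzip.compress(raw)
    if compression == "bzip2":
        return "payload.xml.bz2", bz2.compress(raw)
    raise ValueError(f"unsupported compression: {compression!r}")


name, data = package_payload("<content-payload/>", compression="gzip")
print(name, len(data), "bytes")
```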
http://textwise.com/api_docs/api/semantichacker_content_payload.xsd can be used to generate programming bindings for the purpose of creating a content payload file. The following sections provide an overview of the required elements.
content-payload
This is the root element of the payload. This element supports the following optional attributes for use within a batch command only -- the attributes are ignored otherwise.
The following example demonstrates a content payload request that will delete any items older than 10 days. It also explicitly declares 'false' for the clearIndex attribute.
<content-payload deleteItemsOlderThan="P10D" clearIndex="false">
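One way to generate this root element programmatically is sketched below using Python's standard `xml.etree.ElementTree`. The helper name `make_payload_root` is hypothetical, and the sketch assumes that the attribute value is always a whole number of days written as `P<n>D`, which is inferred only from the 'P10D' example above. The namespace declaration is omitted here, as it is in the fragment.

```python
import xml.etree.ElementTree as ET


def make_payload_root(delete_older_than_days=None, clear_index=None):
    """Create a content-payload root element (hypothetical helper)."""
    root = ET.Element("content-payload")
    if delete_older_than_days is not None:
        # Assumption: durations are expressed in whole days, e.g. "P10D".
        root.set("deleteItemsOlderThan", f"P{int(delete_older_than_days)}D")
    if clear_index is not None:
        root.set("clearIndex", "true" if clear_index else "false")
    return root


root = make_payload_root(delete_older_than_days=10, clear_index=False)
print(ET.tostring(root, encoding="unicode"))
```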
facets
Facets are attributes of an item in the index. Facets provide a mechanism to restrict the search space of the index based on specific values (e.g. all items with a color of blue). Once facets have been defined within an index payload file and loaded into the index, they can be specified as constraints within a match request.
Facets are defined for the index using the facets element inside an index payload file. The facets element is processed by the system on a batch load call only if the index is empty or if the clearIndex attribute in the payload file is set to 'true'.
Supported Data Types for Facets
Attributes For Defining a Facet
A numeric facet definition may have a ranges element containing 1 or more range elements. The system will keep track of how many items have a value for the facet that falls into each range. A range definition must have either the start attribute defined, the end attribute defined, or both defined. If the start attribute is omitted, it defaults to negative infinity. If the end attribute is omitted, it defaults to positive infinity. The starting point for a range is inclusive, the endpoint is exclusive. It is OK for range definitions to overlap.
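The range semantics above (inclusive start, exclusive end, open-ended defaults, overlap allowed) can be illustrated with a short Python sketch. The function `matching_ranges` is a hypothetical client-side model of the rule, not the service's implementation.

```python
import math


def matching_ranges(value, ranges):
    """Return every (start, end) range that contains value.

    ranges is a list of (start, end) pairs; None stands for an omitted
    attribute, i.e. negative infinity for start, positive infinity for end.
    """
    hits = []
    for start, end in ranges:
        lo = -math.inf if start is None else start
        hi = math.inf if end is None else end
        if lo <= value < hi:  # start is inclusive, end is exclusive
            hits.append((start, end))
    return hits


weight_ranges = [(0, 10), (10, 20), (20, None)]
print(matching_ranges(10, weight_ranges))
```

Because the end point is exclusive, a value of 10 falls into the 10 to 20 range and not into the 0 to 10 range. With overlapping definitions, a single value can be counted in more than one range.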
Value range definitions can be used in applications to allow users to narrow their match results such as selecting only those matches that fall into a certain price range or for reducing a match list of items to only those that are above a desired popularity rating.
The following example shows a facets element containing several facet definitions:
<facets>
<facet name="productGroup" type="string" multivalue="true" multivalueDelimiter="," />
<facet name="productWeight" type="short" multivalue="false" multivalueDelimiter=",">
<ranges>
<range start="0" end="10" />
<range start="10" end="20" />
<range start="20" />
</ranges>
</facet>
<facet name="created" type="timestamp" dateformat="yyyy-MM-dd HH:mm:ss Z"/>
<facet name="price" type="price">
<ranges>
<range start="0.0" end="100.0" />
<range start="100.0" end="200.0" />
<range start="200.0" end="300.0" />
<range start="300.0" end="400.0" />
<range start="400.0" end="500.0" />
<range start="500.0" end="1000.0" />
<range start="1000.0" />
</ranges>
</facet>
<facet name="popularityFunc" type="float" multivalue="false" multivalueDelimiter=",">
<ranges>
<range start="0.0" end="50.0" />
<range start="50.0" end="60.0" />
<range start="60.0" end="70.0" />
<range start="70.0" end="80.0" />
<range start="80.0" end="90.0" />
<range start="90.0" />
</ranges>
</facet>
</facets>
Notes:
- Think carefully about which facets need to be defined for the index and which facet type is best suited for each facet. Choose smaller data types over larger data types whenever possible; they use fewer resources and perform better than larger data types.
- It is OK for an item to not have a value for a facet, although it is a good rule of thumb to provide values for all attributes that have been configured as facets. See the match service documentation for details on how items that are missing values for facets are handled in match requests.
- If the default multivalue delimiter string ',' does not meet your application's requirements for a particular facet and you are setting the multivalueDelimiter attribute in the facet definition, be sure to choose a string that will not appear inside of any individual values for the facet, e.g. don't use '.' for a price or float facet.
- An item will be marked as filtered if it has a value string for an attribute that has been configured as a facet and that value does not parse into the facet's data type. An item that has a value containing the multivalue delimiter for a single value facet will also be marked as filtered.
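The filtering rule in the last note can be approximated with a client-side pre-check before upload. The Python sketch below is illustrative only: the parsers, the 16-bit bounds for 'short', and the trimming of whitespace around values are all assumptions, and the service's own parsing rules may differ. Only the 'string', 'short', and 'float' types are covered.

```python
def _parse_short(text):
    # Assumption: 'short' is a signed 16-bit integer.
    number = int(text)
    if not -32768 <= number <= 32767:
        raise ValueError("out of range for a 16-bit short")
    return number


# Hypothetical parsers for a few of the facet types used in the examples.
PARSERS = {"string": str, "short": _parse_short, "float": float}


def would_be_filtered(value, facet_type, multivalue=False, delimiter=","):
    """Guess whether a facet value string would cause an item to be filtered."""
    if multivalue:
        parts = value.split(delimiter)
    elif delimiter in value:
        # A single-value facet must not contain the multivalue delimiter.
        return True
    else:
        parts = [value]
    parse = PARSERS[facet_type]
    try:
        for part in parts:
            parse(part.strip())
    except ValueError:
        return True
    return False


print(would_be_filtered("12,13", "short"))
```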
item-signature-attributes
Indexes are built using semantic information from the provided content, and depending on mutually-agreed-upon index creation rules, this may or may not be an optional element within the content payload. Regardless, the purpose of this element is to specify which attributes of a provided content item are to be used when creating the semantic model of the item. When this element is provided, at least one child element named attribute must be provided or the payload will be considered invalid by the system. If included, this element must appear before any item elements in the content payload.
The following example demonstrates an item-signature-attributes element that will be used by the system to determine which fields to use for each content item for semantic analysis of the item.
<item-signature-attributes>
<attribute>title</attribute>
<attribute>description</attribute>
</item-signature-attributes>
item-stored-attributes
The external ID for an item will always be returned when performing a Similarity Search on an index. Additionally, an index can return any number of named attributes of a content item. Depending on index creation rules at initial setup, this element is used to tell the system which attributes of a content item to store so that they can be returned to the caller of a match or item call. When this element is provided within the content file, at least one child element named attribute must be provided or the payload will be considered invalid by the system. If this element is not provided within the payload, then no attributes of the content item will be stored by the system. When included, this element must appear before any item elements in the content payload.
The following example demonstrates an item-stored-attributes element that will be used by the system to determine the fields to store for each content item.
<item-stored-attributes>
<attribute>title</attribute>
<attribute>landingPageUrl</attribute>
<attribute>author</attribute>
</item-stored-attributes>
item
The item element represents an operation to perform on the index for a specific content item. At least one item element must be contained within the payload file unless the payload file represents a batch operation for clearing an index or deleting items that are older than a period of time. The item element must contain both an externalId attribute and an operation attribute. The external ID can be any value that is less than or equal to 1024 characters in length, and is typically used to associate an item returned from an index with an external metadata store. The operation attribute must be one of the following:
The following example demonstrates an operation that will be performed on an item:
<item externalId="http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/" operation="ADD">
  <attribute name="author">nepheron</attribute>
  <attribute name="title">Copper Fire Log Heater: Burns clean with alcohol!</attribute>
  <attribute name="description">
    I have entered this instructable into the Stay Warm Contest so give me a +1 of you like it! In this instructable I am going to show you how I made an alcohol log style heater. The process is very simple and requires knowledge of soldering and filing. This device put out allot of heat/light and sme...
  </attribute>
  <attribute name="landingPageUrl">
    http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/
  </attribute>
  <attribute name="categories">craft,science</attribute>
  <attribute name="pubDate">Thu, 15 Jan 2009 20:02:40 PST</attribute>
</item>
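An item element like the one above could be generated with Python's standard `xml.etree.ElementTree`, as in the following sketch. The helper `make_item` is hypothetical. It enforces only the 1024-character limit on the external ID stated above and does not validate the operation value, since the list of operations is not reproduced in this sketch.

```python
import xml.etree.ElementTree as ET

MAX_EXTERNAL_ID_LENGTH = 1024  # limit stated for the externalId attribute


def make_item(external_id, operation, attributes):
    """Build an item element from a dict of attribute names to values."""
    if len(external_id) > MAX_EXTERNAL_ID_LENGTH:
        raise ValueError("externalId must be at most 1024 characters long")
    item = ET.Element("item", {"externalId": external_id, "operation": operation})
    for name, value in attributes.items():
        child = ET.SubElement(item, "attribute", {"name": name})
        child.text = value  # ElementTree escapes &, < and > on output
    return item


item = make_item(
    "http://www.instructables.com/id/Lined_Messenger_Bag/",
    "ADD",
    {"author": "cwickham", "title": "Lined Messenger Bag"},
)
print(ET.tostring(item, encoding="unicode"))
```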
Full Content Payload Example
The following example demonstrates a content payload that can be used in a 'batch' command. When the payload is sent with a 'batch' command, the process removes any items from the index that are older than 7 days before processing the items in the payload. When the payload is sent with any other command, the 'deleteItemsOlderThan' attribute is ignored.
<?xml version="1.0" encoding="UTF-8"?>
<content-payload deleteItemsOlderThan="P7D" xmlns="http://www.semantichacker.com/api">
  <item-signature-attributes>
    <attribute>title</attribute>
    <attribute>description</attribute>
  </item-signature-attributes>
  <item-stored-attributes>
    <attribute>title</attribute>
    <attribute>landingPageUrl</attribute>
    <attribute>author</attribute>
  </item-stored-attributes>
  <item externalId="http://www.instructables.com/id/Go_Green_Upside_Down_Hanging_Planters/" operation="ADD">
    <attribute name="author">DebH57</attribute>
    <attribute name="title">Go Green Upside Down Hanging Planters</attribute>
    <attribute name="description">
      The concept is that it keeps plants off the ground away...
    </attribute>
    <attribute name="landingPageUrl">
      http://www.instructables.com/id/Go_Green_Upside_Down_Hanging_Planters/
    </attribute>
    <attribute name="categories">green,home</attribute>
    <attribute name="pubDate">Sat, 10 Jan 2009 03:27:42 PST</attribute>
  </item>
  <item externalId="http://www.instructables.com/id/Lined_Messenger_Bag/" operation="ADD">
    <attribute name="author">cwickham</attribute>
    <attribute name="title">Lined Messenger Bag</attribute>
    <attribute name="description">
      How to make a messenger bag with inner lining. This is a wool felt bag lined with vinyl (for some waterproofness). The basic idea is to make a bag shape out of both fabrics then slot one inside the other and join them around the opening. This means there are no rough seams on the inside. It also...
    </attribute>
    <attribute name="landingPageUrl">
      http://www.instructables.com/id/Lined_Messenger_Bag/
    </attribute>
    <attribute name="categories">craft,ride</attribute>
    <attribute name="pubDate">Sun, 11 Jan 2009 17:49:15 PST</attribute>
  </item>
  <item externalId="http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/" operation="ADD">
    <attribute name="author">nepheron</attribute>
    <attribute name="title">Copper Fire Log Heater: Burns clean with alcohol!</attribute>
    <attribute name="description">
      I have entered this instructable into the Stay Warm Contest so give me a +1 of you like it! In this instructable I am going to show you how I made an alcohol log style heater. The process is very simple and requires knowledge of soldering and filing. This device put out allot of heat/light and sme...
    </attribute>
    <attribute name="landingPageUrl">
      http://www.instructables.com/id/Copper_Fire_Log_Heater_Burns_clean_with_alcohol/
    </attribute>
    <attribute name="categories">craft,science</attribute>
    <attribute name="pubDate">Thu, 15 Jan 2009 20:02:40 PST</attribute>
  </item>
</content-payload>
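Some of the structural rules stated in the sections above can be checked locally before uploading a payload. The Python sketch below covers three of them: each attribute list needs at least one attribute child, attribute lists must precede any item element, and each item needs externalId and operation attributes. The function `check_payload` is a hypothetical pre-flight check and is not a substitute for validating against the published XSD.

```python
import xml.etree.ElementTree as ET

ATTRIBUTE_LISTS = ("item-signature-attributes", "item-stored-attributes")


def local_name(element):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return element.tag.rsplit("}", 1)[-1]


def check_payload(xml_text):
    """Return a list of human-readable rule violations (empty if none found)."""
    root = ET.fromstring(xml_text)
    problems = []
    seen_item = False
    for child in root:
        name = local_name(child)
        if name == "item":
            seen_item = True
            for required in ("externalId", "operation"):
                if child.get(required) is None:
                    problems.append(f"item is missing the {required} attribute")
        elif name in ATTRIBUTE_LISTS:
            if seen_item:
                problems.append(f"{name} must appear before any item element")
            if not any(local_name(c) == "attribute" for c in child):
                problems.append(f"{name} needs at least one attribute child")
    return problems


bad = '<content-payload><item externalId="1"/><item-signature-attributes/></content-payload>'
for problem in check_payload(bad):
    print(problem)
```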
Index Service Response Format
All calls made to the index service will result in a response that adheres to the XML structure defined within the http://textwise.com/api_docs/api/semantichacker_index_response.xsd schema. The following sections provide an overview of the various elements that can be returned by the service.
about
The about section for the index service is a subset of the about section used for other API service calls. The following table lists the elements that could be provided in the about section for an index service call.
contentIndexStatus
This element will be included within the response for the batch and status index commands. If the response is the result of a status command without a 'batchId', then the contentIndexStatus element will be the only element contained within the response besides the about element.
<contentIndexStatus indexId="myCustomIndex" dictionaryId="odp_2007_l1_1.7k" dateCreated="2009-01-31T12:00:00+0000" currentSizetype="45367" capacity="5000000" lastModificationDate="2009-02-13T16:43:00+0000" modificationInProcess="false" />
batch-process-info
This element will be contained within the contentIndex element if either a 'batch' command was made or if a 'status' command was made which included a 'batchId' parameter.
Example response for a 'batch' command.
<batch-process-info batchId="1234" state="CREATED"/>
Example response for a 'status' command that includes a 'batchId' parameter for a completed batch. In this example the batch completed in just under 3.5 seconds. The index changes were 34 adds, 5 updates, 2 deletes, and one delete failed because the item specified did not exist. Note that ADD_OR_UPDATE commands will be reported as either an add, update, or failure, depending on their outcome, in the status.
<batch-process-info processingTime="3456ms" batchId="1234" state="COMPLETED" totalItems="42" adds="34" updates="5" deletes="2" failed="1">
  <failures>
    <item externalId="1020">Did not exist - not Deleted</item>
  </failures>
</batch-process-info>
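A client could turn a batch-process-info element into a plain dictionary along the lines of the Python sketch below. It assumes the element has already been extracted from the full response and carries no XML namespace, as in the fragments shown here; a namespaced response would need namespace-aware lookups. The function name `parse_batch_info` is hypothetical.

```python
import xml.etree.ElementTree as ET

COUNTERS = ("totalItems", "adds", "updates", "deletes", "failed")


def parse_batch_info(xml_text):
    """Parse a batch-process-info fragment into a dict (illustrative only)."""
    node = ET.fromstring(xml_text)
    info = dict(node.attrib)
    for key in COUNTERS:
        if key in info:
            info[key] = int(info[key])
    # Map each failed item's external ID to the reported reason.
    info["failures"] = {
        item.get("externalId"): (item.text or "").strip()
        for item in node.iter("item")
    }
    return info


created = parse_batch_info('<batch-process-info batchId="1234" state="CREATED"/>')
print(created["state"], created["failures"])
```

For the completed batch shown above, the counters are consistent with one another: 34 adds, 5 updates, 2 deletes, and 1 failure add up to the 42 total items.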
contentIndexResponse ( item command )
The contentIndexResponse element will be contained within the contentIndex element when an 'item' command is used. An index can be configured on installation to return a set of default fields or all stored attributes for the item when a 'fields' parameter is not given by the caller. If no attributes were stored, then the attributes element will be omitted.
An example contentIndexResponse is shown below.
<contentIndexResponse>
  <contentId>1561</contentId>
  <externalId>1236691578280</externalId>
  <attributes>
    <attribute name="title">Go Green Upside Down Hanging Planters</attribute>
    <attribute name="author">DebH57</attribute>
    <attribute name="landingPageUrl">
      http://www.instructables.com/id/Go_Green_Upside_Down_Hanging_Planters/
    </attribute>
  </attributes>
</contentIndexResponse>
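Reading such a response on the client side might look like the following Python sketch. It assumes a stand-alone, non-namespaced contentIndexResponse fragment like the one above, and it tolerates a missing attributes element, which the text says is omitted when no attributes were stored. The function name `parse_item_response` is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_item_response(xml_text):
    """Convert a contentIndexResponse fragment into a dict (illustrative only)."""
    node = ET.fromstring(xml_text)
    attributes = {
        attr.get("name"): (attr.text or "").strip()
        for attr in node.findall("attributes/attribute")
    }
    return {
        "contentId": node.findtext("contentId"),
        "externalId": node.findtext("externalId"),
        "attributes": attributes,  # empty when the attributes element is omitted
    }


sample = (
    "<contentIndexResponse><contentId>1561</contentId>"
    "<externalId>1236691578280</externalId></contentIndexResponse>"
)
print(parse_item_response(sample))
```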
facets ( facets command )
An example response for the facets call is shown below.
<facets>
<facet name="productGroup" type="string" multivalue="true" multivalueDelimiter="," />
<facet name="productWeight" type="short" multivalue="false" multivalueDelimiter=",">
<ranges>
<range start="0" end="10" />
<range start="10" end="20" />
<range start="20" />
</ranges>
</facet>
<facet name="popularityFunc" type="float" multivalue="false" multivalueDelimiter=",">
<ranges>
<range start="0.0" end="50.0" />
<range start="50.0" end="60.0" />
<range start="60.0" end="70.0" />
<range start="70.0" end="80.0" />
<range start="80.0" end="90.0" />
<range start="90.0" />
</ranges>
</facet>
</facets>
Error Codes
Several error codes have been added to the system that will only be returned for the index service. They are duplicated here from the API Response Reference for completeness. The format of the error message response is unchanged.