Match Call (Similarity Search)


Service Call:
http://api.semantichacker.com/TOKEN/match/INDEXID?

Overview

The match service call matches the content provided in the call against the content held in one of our indexes; this is Similarity Search. The provided content is converted into a Semantic Signature® and compared against the Semantic Signatures® of all items in the index specified by the index ID. The service returns a list of the index items that are most relevant to the provided content, ordered by their "match score" from highest to lowest.

Each content item in the response match list includes a document ID, a match score, and the requested attributes for that content item. An optional fields parameter can be provided in the call to specify which attributes to return for each content item. If the fields parameter is not provided, then a default set of attributes will be used.


Available Content Indexes

The INDEXID node of the request path specifies which content index the provided content is matched against. The publicly accessible content indexes are listed below. In addition to the public indexes, we can support custom indexes that are made available to you. Please contact us at development@semantichacker.com for more information.

indexId: rssnews
Description: RSS News Articles
Update Frequency: Approximately six times per day
Expiration Information: Old articles are removed from the index after five days
Approximate Size: 35,000
Default Attributes:
  • title
  • description
  • landingPageUrl
Available Attributes:
  • title
  • description
  • landingPageUrl
  • author
  • channelTitle
  • channelLink
  • pubDateMillis
  • categories
Facets:
  • pubDateMillis (long)
  • SEMANTIC_CATEGORY

indexId: rssblogs
Description: Blogs Articles
Update Frequency: Once a day
Expiration Information: Old articles are removed from the index after thirty days
Approximate Size: 115,000
Default Attributes:
  • title
  • description
  • landingPageUrl
Available Attributes:
  • title
  • description
  • landingPageUrl
  • author
  • channelTitle
  • channelLink
  • pubDateMillis
  • categories
Facets:
  • pubDateMillis (long)
  • SEMANTIC_CATEGORY

indexId: rsscombined
Description: Blogs and RSS News Articles
Update Frequency: Approximately seven times per day
Expiration Information: Old articles are removed from the index after ten days
Approximate Size: 120,000
Default Attributes:
  • title
  • description
  • landingPageUrl
  • feedType
Available Attributes:
  • title
  • description
  • landingPageUrl
  • author
  • channelTitle
  • channelLink
  • pubDateMillis
  • categories
  • feedType
Facets:
  • feedType (string)
  • pubDateMillis (long)
  • SEMANTIC_CATEGORY

indexId: youtube
Description: YouTube videos
Update Frequency: Once a day
Expiration Information: Old videos that are not updated are removed from the index after thirty days
Approximate Size: 265,000
Default Attributes:
  • title
  • landingPageUrl
  • enclosureUrl
Available Attributes:
  • title
  • landingPageUrl
  • enclosureUrl
  • pubDateMillis
  • thumbnailUrl
Facets:
  • pubDateMillis (long)
  • SEMANTIC_CATEGORY

indexId: wikipedia
Description: Wikipedia Articles (English)
Update Frequency: Monthly
Expiration Information: All articles in the previous load are removed, only articles in the current load are kept in the index
Approximate Size: 3,900,000
Default Attributes:
  • title
  • description
  • landingPageUrl
Available Attributes:
  • title
  • description
  • landingPageUrl
  • channelTitle
  • channelLink
Facets:
  • SEMANTIC_CATEGORY

indexId: images
Description: Wikipedia Images, Flickr
Update Frequency: Once a day
Expiration Information: All images in the previous load are removed, only images in the current load are kept in the index
Approximate Size: 525,000
Default Attributes:
  • title
  • description
  • landingPageUrl
  • license
  • creator
  • imageUrl
  • thumbnailUrl
Available Attributes:
  • title
  • description
  • landingPageUrl
  • license
  • creator
  • imageUrl
  • thumbnailUrl
  • creatorUrl
  • source
Facets:
  • SEMANTIC_CATEGORY

indexId: amazon
Description: Amazon.com Products
Update Frequency: Two times a week
Expiration Information: All products in the previous load are removed, only products in the current load are kept in the index
Approximate Size: 13,700,000
Default Attributes:
  • title
  • description
  • landingPageUrl
Available Attributes:
  • title
  • description
  • landingPageUrl
  • imageUrl
  • price
Facets:
  • price (price)
  • retailerCategory (string)
  • SEMANTIC_CATEGORY

Index Item Input Parameters

In addition to the URI and content input mechanisms documented on the common request page, the match call supports using an item from the index as input for the match request. The item is retrieved using either a content ID or an external ID and is then used as the input for the request.

Name: sourceContentId
Value: An integer content identifier for an item that exists in the index
Required (Yes/No): No
Purpose: Indicates the content ID of the item to retrieve from the index that will be used as input for the match request.

Name: sourceExternalId
Value: A string external identifier for an item that exists in the index
Required (Yes/No): No
Purpose: Indicates the external ID of the item to retrieve from the index that will be used as input for the match request.

The content ID (also known as document ID or ID) for an item is the value for the id attribute of a match element in a match call response. The value for the id attribute can be used as the value for the sourceContentId parameter in a match request using the index item input method. The external ID for an item is returned as the externalId element inside of a match element in a match call response. The value for the externalId element can be used as the value for the sourceExternalId parameter in a match request using the index item input method.

Examples

curl -F "sourceExternalId=B001P77X70"  "http://api.semantichacker.com/TOKEN/match/amazon"
curl -F "sourceContentId=1048"  "http://api.semantichacker.com/TOKEN/match/rssnews"

Error Codes

There are several cases where error codes will be returned when using the index item input method.

TW Code: 210
HTTP Status Code: 400
Message: 'Invalid Argument'
Explanation:
  • If both the sourceContentId and sourceExternalId parameters are included with the request.
  • If the value of the sourceContentId parameter does not parse into an integer.
  • If a uri or content parameter is included with a request that already includes an index item input parameter.

TW Code: 417
HTTP Status Code: 404
Message: 'Item not found for match request'
Explanation: No item was found for the sourceContentId or sourceExternalId provided with the request. If the item did exist in one of the public indexes at some point in the past, it may no longer exist due to periodic deletion of old items.
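
The three 'Invalid Argument' cases above can be caught on the client before a request is sent. The following is a minimal Python sketch, not part of the API: the function name index_item_params and its argument names are hypothetical, and the strict decimal check is an assumption, since the server's exact integer parsing rules are not documented here.

```python
import re
from typing import Dict, Optional, Union


def index_item_params(source_content_id: Union[int, str, None] = None,
                      source_external_id: Optional[str] = None,
                      uri: Optional[str] = None,
                      content: Optional[str] = None) -> Dict[str, str]:
    """Build form parameters for the index item input method.

    Raises ValueError for the three cases listed under TW code 210.
    """
    if source_content_id is not None and source_external_id is not None:
        raise ValueError("use sourceContentId or sourceExternalId, not both")
    if source_content_id is None and source_external_id is None:
        # Helper-specific rule: this function only builds index item requests.
        raise ValueError("one of sourceContentId or sourceExternalId is needed")
    if uri is not None or content is not None:
        raise ValueError("uri/content cannot be combined with an index item")
    if source_external_id is not None:
        return {"sourceExternalId": source_external_id}
    text = str(source_content_id)
    # Assumption: an optional sign followed by decimal digits is an integer.
    if not re.fullmatch(r"[+-]?[0-9]+", text):
        raise ValueError("sourceContentId must parse as an integer")
    return {"sourceContentId": text}
```

The returned dictionary holds the form fields to send with the request. A 417 'Item not found for match request' response cannot be predicted this way, because it depends on the current state of the index.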

Match Call Parameters

In addition to the common request parameters, the match call has the following optional parameters.

Name: format
Value: 'xml' or 'json'
Required (Yes/No): No
Purpose: The default format is 'xml'.

Name: nMatches
Value: An integer > 0 and <= 100
Required (Yes/No): No, default = 10
Purpose: Specify how many content items should be returned in the response.

Name: offset
Value: An integer >= 0 (zero based)
Required (Yes/No): No, default = 0
Purpose: The starting point to return results from in the canonical list of matches. For example, if offset is set to 4, the first item returned will be the 5th item in the full match list. The end of the canonical list of matches has been reached when the number of matches returned from the call is less than nMatches.

Name: fields
Value: A comma separated list of field names to return for each content item
Required (Yes/No): No
Purpose: Overrides the default content attributes returned for each content item in the match list and returns only those that were specified.
Note: if you have licensed an index that contains your content, there are no default fields that are provided back for a match call. You can specify which fields you want returned using this parameter. If this parameter is omitted for a custom index, all stored fields for each item will be returned.

Name: minMatchScore
Value: A non-negative floating point number
Required (Yes/No): No
Purpose: This feature has been removed and the results of using this parameter in match requests are undefined.
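
As an illustration of the nMatches, offset, and fields parameters, the Python sketch below builds a match URL and walks the full match list page by page. It is an assumption-laden example rather than an official client: build_match_url, iter_matches, and the caller-supplied fetch_page function are hypothetical names, and no network request is made here.

```python
from typing import Callable, Dict, Iterator, List, Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "http://api.semantichacker.com"


def build_match_url(token: str, index_id: str, uri: str,
                    n_matches: int = 10, offset: int = 0,
                    fields: Optional[Sequence[str]] = None,
                    response_format: str = "xml") -> str:
    """Return a match URL, enforcing the documented parameter ranges."""
    if not 0 < n_matches <= 100:
        raise ValueError("nMatches must be > 0 and <= 100")
    if offset < 0:
        raise ValueError("offset must be >= 0")
    if response_format not in ("xml", "json"):
        raise ValueError("format must be 'xml' or 'json'")
    params = [("uri", uri), ("nMatches", str(n_matches)),
              ("offset", str(offset)), ("format", response_format)]
    if fields:
        params.append(("fields", ",".join(fields)))
    # safe="," keeps the comma separated field list readable in the URL.
    return f"{BASE_URL}/{token}/match/{index_id}?{urlencode(params, safe=',')}"


def iter_matches(fetch_page: Callable[[int, int], List[Dict]],
                 n_matches: int = 10) -> Iterator[Dict]:
    """Yield every match, calling fetch_page(offset, n_matches) per page.

    Stops when a page holds fewer than n_matches items, which is the
    documented signal that the end of the match list has been reached.
    """
    offset = 0
    while True:
        page = fetch_page(offset, n_matches)
        yield from page
        if len(page) < n_matches:
            return
        offset += n_matches
```

Because the end of the list is signalled only by a short page, a match list whose length is an exact multiple of nMatches costs one extra request that returns no items.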

Match Call Facet Parameters

The match call supports several parameters, all optional, for using the facets that have been defined for the index. See the index service facets documentation for details about the facet types that are available and how to define facets for a custom index.

Name: facetQuery
Value: A query string that conforms to the facet query syntax defined below.
Purpose: Restricts the results of the match request to those items that meet the criteria of the facet query.

Name: matchRank
Value: 'relevance' (the default), or the name of a single value facet that has been defined for the index
Purpose: If set to 'relevance' the match list will be returned in the default relevance order. If set to a single value facet name, the relevance ranked match list will be re-ranked using the values for the facet. Ties are broken using the relevance score, always in descending order. For facet only queries ties are broken using the contentId, always in ascending order. Note that for facet only queries this parameter must be included with the match call.

Name: matchRankOrder
Value: 'desc' (descending, the default), or 'asc' (ascending)
Purpose: Controls the ordering for the match results when they are re-ranked by a facet via the matchRank parameter. For facets, this means sorting based on the natural order of the facet values. For numeric facets 'asc' would mean lowest to highest and 'desc' would mean highest to lowest. For string facets 'asc' would mean in alphabetical order, 'desc' would mean reverse alphabetical order.

Name: includeFacetValueCounts
Value: 'false' (the default), or 'true'
Purpose: If set to true the match call response will include the accumulated facet value and range counts for the items in the full match results list (not just the section of results returned using the offset and nMatches parameters).

Name: facetValueOrderBy
Value: 'value' (the default), or 'documentCount'
Purpose: Determines the ordering for the facet value and range counts printed when the includeFacetValueCounts parameter is set to true. If set to 'value' the entries will be printed in sorted order based on value and in range definition order for range counts. If set to 'documentCount' the entries will be printed in document count order, highest to lowest.

Facet Query Syntax

The examples below show the syntax for a facet query string, which is similar to the "where" clause in SQL and supports parenthetical notation.

Examples:

color != blue,green
(prodColor = blue,green AND productGroup = shirts)
SEMANTIC_CATEGORY != 568,452,340
(price >= 599.99 OR popularityFunc > 88.5)
(prodColor = blue,green AND (price <= 50.00 AND popularityFunc > 95.0))
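
Query strings shaped like the examples above can be assembled programmatically. The Python sketch below is one possible approach, with hypothetical helper names (term and group). It accepts only the operators and joiners that appear in the examples, and it assumes that comma separated value lists are limited to = and !=, which the examples suggest but do not state. Quoting or escaping of values is not handled, since those rules are not given here.

```python
# Operators and joiners that appear in the documented examples.
_OPERATORS = ("=", "!=", ">=", ">", "<=")
_JOINERS = ("AND", "OR")


def term(field: str, operator: str, *values: str) -> str:
    """Return a single comparison such as 'price <= 50.00'."""
    if operator not in _OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    if not values:
        raise ValueError("at least one value is required")
    # Assumption: value lists are only meaningful for = and !=.
    if len(values) > 1 and operator not in ("=", "!="):
        raise ValueError("multiple values need the = or != operator")
    return f"{field} {operator} {','.join(values)}"


def group(joiner: str, *parts: str) -> str:
    """Join terms or nested groups with AND/OR inside parentheses."""
    if joiner not in _JOINERS:
        raise ValueError(f"unsupported joiner: {joiner}")
    if not parts:
        raise ValueError("at least one part is required")
    return "(" + f" {joiner} ".join(parts) + ")"


# Rebuilds the last example above from its parts.
query = group(
    "AND",
    term("prodColor", "=", "blue", "green"),
    group("AND", term("price", "<=", "50.00"),
          term("popularityFunc", ">", "95.0")),
)
```

Values are passed as strings so that formatting such as 50.00 is preserved exactly as written.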


Facet Only Queries

It is valid to submit a match call that only contains a facetQuery parameter and does not have a URI or text content included for relevance matching. In this case, items that match the facet query are collected and sorted according to the matchRank parameter. Since the value for the matchRank parameter must be the name of a single value facet, facet only queries are not possible on indexes for which only multivalue facets are defined.

An example use case for submitting a facet only query would be to query an index of timestamped items for the items in a specified date range, ordered by their timestamp.
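
The date range use case above could be prepared along the following lines. This Python sketch only builds the request parameters. The function names are hypothetical, and it assumes that the chosen facet (for example pubDateMillis) is a single value facet holding milliseconds since the Unix epoch, which the facet name suggests but this page does not state.

```python
from datetime import datetime, timezone
from typing import Dict


def to_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    return round(moment.timestamp() * 1000)


def date_range_params(facet: str, start: datetime, end: datetime,
                      order: str = "asc") -> Dict[str, str]:
    """Build parameters for a facet only query over [start, end]."""
    if order not in ("asc", "desc"):
        raise ValueError("matchRankOrder must be 'asc' or 'desc'")
    if end < start:
        raise ValueError("end must not be earlier than start")
    query = (f"({facet} >= {to_millis(start)} "
             f"AND {facet} <= {to_millis(end)})")
    return {
        "facetQuery": query,
        # matchRank is mandatory for facet only queries.
        "matchRank": facet,
        "matchRankOrder": order,
    }


params = date_range_params(
    "pubDateMillis",
    datetime(2011, 8, 29, tzinfo=timezone.utc),
    datetime(2011, 8, 30, tzinfo=timezone.utc),
)
```

Note that no uri or content parameter is included, so the items are selected by the facet query alone and ordered by the facet named in matchRank.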

Example Match Calls With Facet Parameters

Return index items that are relevant to the http://www.linux.org site, returning facet value counts ordered by document count:

curl -F "includeFacetValueCounts=true" -F "facetValueOrderBy=documentCount" \
    "http://api.semantichacker.com/TOKEN/match/INDEX?uri=http%3a%2f%2fwww.linux.org"

Return index items that have a price <= 25.00 and belong to the tshirts product group, and are relevant to the http://www.linux.org site:

curl -F "facetQuery=(price <= 25.00 AND productGroup = tshirts)" \
    "http://api.semantichacker.com/TOKEN/match/INDEX?uri=http%3a%2f%2fwww.linux.org"

Match Call XML Response Example

http://api.semantichacker.com/TOKEN/match/youtube?uri=http%3a%2f%2fwww.nfl.com&nMatches=3&fields=title,landingPageUrl,enclosureUrl

<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://www.semantichacker.com/api">
        <about>
                <requestId>994E0EF89781BB3B0A71F930240B4E5F</requestId>
                <docId>12B9CC519FA32800883C988C03ED726A</docId>
                <systemType>match</systemType>
                <contentType>text/html</contentType>
                <contentDigest>E38F117A6ACDEA42CADE983E5666EC92</contentDigest>
                <requestDate>2011-08-30T20:04:34+00:00</requestDate>
                <systemVersion>2.1</systemVersion>
                <sourceUri>http://www.nfl.com</sourceUri>
        </about>
        <contentMatch>
                <contentMatchResponse>
                        <matches>
                                <match id="539774" score="0.47137815" indexId="youtube" >
                                        <externalId>http://gdata.youtube.com/feeds/api/videos/kTgVYrOo4DA</externalId>
                                        <attribute name="title">NFL 2011 Chicago Bears Versus New York Giants: Giants 3rd Field Goal HD.</attribute>
                                        <attribute name="landingPageUrl">http://www.youtube.com/watch?v=kTgVYrOo4DA&amp;feature=youtube_gdata</attribute>
                                        <attribute name="enclosureUrl">http://www.youtube.com/v/kTgVYrOo4DA?f=videos&amp;app=youtube_gdata</attribute>
                                </match>
                                <match id="539775" score="0.43644464" indexId="youtube" >
                                        <externalId>http://gdata.youtube.com/feeds/api/videos/IxE7hUgk31Q</externalId>
                                        <attribute name="title">NFL 2011 Chicago Bears Versus New York Giants: Bears 2nd Field Goal HD.</attribute>
                                        <attribute name="landingPageUrl">http://www.youtube.com/watch?v=IxE7hUgk31Q&amp;feature=youtube_gdata</attribute>
                                        <attribute name="enclosureUrl">http://www.youtube.com/v/IxE7hUgk31Q?f=videos&amp;app=youtube_gdata</attribute>
                                </match>
                                <match id="930248" score="0.40170527" indexId="youtube" >
                                        <externalId>http://gdata.youtube.com/feeds/api/videos/GsvzLrgIX04</externalId>
                                        <attribute name="title">D NATIONAL 2010 SEASON</attribute>
                                        <attribute name="landingPageUrl">http://www.youtube.com/watch?v=GsvzLrgIX04&amp;feature=youtube_gdata</attribute>
                                        <attribute name="enclosureUrl">http://www.youtube.com/v/GsvzLrgIX04?f=videos&amp;app=youtube_gdata</attribute>
                                </match>
                        </matches>
                </contentMatchResponse>
        </contentMatch>
</response>
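
A response of this shape can be read with Python's standard xml.etree.ElementTree module. The sketch below is illustrative only: parse_matches is a hypothetical name, and SAMPLE is a shortened copy of the response above. The elements live in the http://www.semantichacker.com/api namespace, so lookups must be namespace-qualified.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

# Namespace declared on the <response> element; the prefix is arbitrary.
NS = {"sh": "http://www.semantichacker.com/api"}

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://www.semantichacker.com/api">
  <contentMatch><contentMatchResponse><matches>
    <match id="539774" score="0.47137815" indexId="youtube" >
      <externalId>http://gdata.youtube.com/feeds/api/videos/kTgVYrOo4DA</externalId>
      <attribute name="title">NFL 2011 Chicago Bears Versus New York Giants: Giants 3rd Field Goal HD.</attribute>
      <attribute name="landingPageUrl">http://www.youtube.com/watch?v=kTgVYrOo4DA&amp;feature=youtube_gdata</attribute>
    </match>
  </matches></contentMatchResponse></contentMatch>
</response>"""


def parse_matches(body: bytes) -> List[Dict[str, Any]]:
    """Return one dictionary per <match> element, in response order."""
    root = ET.fromstring(body)
    matches = []
    for match in root.iterfind(".//sh:match", NS):
        attributes = {a.get("name"): a.text
                      for a in match.findall("sh:attribute", NS)}
        matches.append({
            "id": match.get("id"),
            "score": float(match.get("score")),
            "indexId": match.get("indexId"),
            "externalId": match.findtext("sh:externalId", namespaces=NS),
            "attributes": attributes,
        })
    return matches


matches = parse_matches(SAMPLE)
```

The parser decodes entities, so the &amp; in each URL comes back as a plain & in the attribute values.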

Match Call JSON Response Example

http://api.semantichacker.com/TOKEN/match/youtube?uri=http%3a%2f%2fwww.nfl.com&nMatches=3&fields=title,landingPageUrl,enclosureUrl&format=JSON

{
    "about":     {
        "requestId": "9012BEE686DC750914E5A14CD82DE530",
        "docId": "12B9CC519FA32800883C988C03ED726A",
        "systemType": "match",
        "contentType": "text/html",
        "contentDigest": "E38F117A6ACDEA42CADE983E5666EC92",
        "requestDate": "2011-08-30T20:05:56+00:00",
        "systemVersion": "2.1",
        "sourceUri": "http://www.nfl.com"
    },
    "contentMatch": {"contentMatchResponse": {"matches":     [
                {
            "id": "539774",
            "score": "0.47137815",
            "indexId": "youtube",
            "externalId": "http://gdata.youtube.com/feeds/api/videos/kTgVYrOo4DA",
            "attributes":             [
                                {
                    "name": "title",
                    "value": "NFL 2011 Chicago Bears Versus New York Giants: Giants 3rd Field Goal HD."
                },
                                {
                    "name": "landingPageUrl",
                    "value": "http://www.youtube.com/watch?v=kTgVYrOo4DA&feature=youtube_gdata"
                },
                                {
                    "name": "enclosureUrl",
                    "value": "http://www.youtube.com/v/kTgVYrOo4DA?f=videos&app=youtube_gdata"
                }
            ]
        },
                {
            "id": "539775",
            "score": "0.43644464",
            "indexId": "youtube",
            "externalId": "http://gdata.youtube.com/feeds/api/videos/IxE7hUgk31Q",
            "attributes":             [
                                {
                    "name": "title",
                    "value": "NFL 2011 Chicago Bears Versus New York Giants: Bears 2nd Field Goal HD."
                },
                                {
                    "name": "landingPageUrl",
                    "value": "http://www.youtube.com/watch?v=IxE7hUgk31Q&feature=youtube_gdata"
                },
                                {
                    "name": "enclosureUrl",
                    "value": "http://www.youtube.com/v/IxE7hUgk31Q?f=videos&app=youtube_gdata"
                }
            ]
        },
                {
            "id": "930248",
            "score": "0.40170527",
            "indexId": "youtube",
            "externalId": "http://gdata.youtube.com/feeds/api/videos/GsvzLrgIX04",
            "attributes":             [
                                {
                    "name": "title",
                    "value": "D NATIONAL 2010 SEASON"
                },
                                {
                    "name": "landingPageUrl",
                    "value": "http://www.youtube.com/watch?v=GsvzLrgIX04&feature=youtube_gdata"
                },
                                {
                    "name": "enclosureUrl",
                    "value": "http://www.youtube.com/v/GsvzLrgIX04?f=videos&app=youtube_gdata"
                }
            ]
        }
    ]}}
}
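
In the JSON form, id and score arrive as strings and the attributes as a list of name/value pairs. The Python sketch below shows one way to turn each match into a flat dictionary. The name parse_json_matches is hypothetical, and SAMPLE is a shortened copy of the response above.

```python
import json
from typing import Any, Dict, List

SAMPLE = """
{
    "contentMatch": {"contentMatchResponse": {"matches": [
        {
            "id": "539774",
            "score": "0.47137815",
            "indexId": "youtube",
            "externalId": "http://gdata.youtube.com/feeds/api/videos/kTgVYrOo4DA",
            "attributes": [
                {"name": "title",
                 "value": "NFL 2011 Chicago Bears Versus New York Giants: Giants 3rd Field Goal HD."},
                {"name": "landingPageUrl",
                 "value": "http://www.youtube.com/watch?v=kTgVYrOo4DA&feature=youtube_gdata"}
            ]
        }
    ]}}
}
"""


def parse_json_matches(body: str) -> List[Dict[str, Any]]:
    """Return one dictionary per match, with typed id and score."""
    document = json.loads(body)
    raw = document["contentMatch"]["contentMatchResponse"]["matches"]
    return [
        {
            "id": int(item["id"]),          # sent as a string
            "score": float(item["score"]),  # sent as a string
            "indexId": item["indexId"],
            "externalId": item["externalId"],
            # Collapse the name/value list into a single mapping.
            "attributes": {a["name"]: a["value"]
                           for a in item.get("attributes", [])},
        }
        for item in raw
    ]


matches = parse_json_matches(SAMPLE)
```

The integer id can then be supplied as the sourceContentId parameter, and externalId as sourceExternalId, in a later match request that uses the index item input method.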