Match Service (Similarity Search)

Service Call:

http://api.semantichacker.com/TOKEN/match/INDEXID?

Overview

The match service call matches the content provided in the call against the content in one of our indexes. This is Similarity Search. The provided content is converted into a Semantic Signature® and compared against the Semantic Signatures® of all items in the index specified by the index ID. The service returns a list of the content items from the index that are most relevant to the provided content, ordered by "match score" from highest to lowest.

Each content item in the response match list includes a document ID, a match score, and the requested attributes for that content item. An optional fields parameter can be provided in the call to specify which attributes to return for each content item. If the fields parameter is not provided, then a default set of attributes will be used.

Available Content Indexes

The INDEXID node of the request path specifies which content index the provided content is matched against. The publicly accessible content indexes that we have made available are listed below. In addition to the public indexes, we can support custom indexes that are made available to you. Please contact us at development@semantichacker.com for more information.

Match Call Chart

 

Notes:
  • If you have licensed an index that contains your content, no default fields are returned for a match call against it. You can specify which fields you want returned using the 'fields' parameter. If the 'fields' parameter is omitted for a custom index, all stored fields for each item will be returned.
  • See the facets section on the index service page for details about the SEMANTIC_CATEGORY facet and the facet data types that are available.

Index Item Input Parameters

In addition to the URI and content input mechanisms documented on the common request page, the match call supports using an item from the index as input for the match request. The item is retrieved by using either a content ID or an external ID and is then used as the input for the request.

Name | Value | Required (Yes/No) | Purpose
sourceContentId | An integer content identifier for an item that exists in the index | No | Indicates the content ID of the item to retrieve from the index that will be used as input for the match request.
sourceExternalId | A string external identifier for an item that exists in the index | No | Indicates the external ID of the item to retrieve from the index that will be used as input for the match request.

The content ID (also known as document ID or ID) for an item is the value for the id attribute of a match element in a match call response. The value for the id attribute can be used as the value for the sourceContentId parameter in a match request using the index item input method. The external ID for an item is returned as the externalId element inside of a match element in a match call response. The value for the externalId element can be used as the value for the sourceExternalId parameter in a match request using the index item input method.

Examples

	curl -F "sourceExternalId=B001P77X70" "http://api.semantichacker.com/TOKEN/match/amazon"
	curl -F "sourceContentId=1048" "http://api.semantichacker.com/TOKEN/match/rssnews"

Error Codes

There are several cases where error codes will be returned when using the index item input method.

TW Code | HTTP Status Code | Message | Explanation
210 | 400 | 'Invalid Argument' | Returned in any of the following cases:
  • If both the sourceContentId and sourceExternalId parameters are included with the request.
  • If the value of the sourceContentId parameter does not parse into an integer.
  • If a uri or content parameter is included with a request that already includes an index item input parameter.
417 | 404 | 'Item not found for match request' | No item was found for the sourceContentId or sourceExternalId provided with the request. If the item did exist in one of the public indexes at some point in the past, it may no longer exist due to periodic deletion of old items.
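The three 'Invalid Argument' conditions can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `check_index_item_input` is hypothetical, and Python's `int()` is only an approximation of however the service parses sourceContentId.

```python
from typing import Optional


def check_index_item_input(
    source_content_id: Optional[str] = None,
    source_external_id: Optional[str] = None,
    uri: Optional[str] = None,
    content: Optional[str] = None,
) -> Optional[str]:
    """Return a reason if the request would likely get TW code 210, else None."""
    # Case 1: both index item parameters supplied together.
    if source_content_id is not None and source_external_id is not None:
        return "both sourceContentId and sourceExternalId were supplied"
    # Case 2: sourceContentId is not an integer (int() is an assumed stand-in
    # for the service's own parsing rules).
    if source_content_id is not None:
        try:
            int(source_content_id)
        except ValueError:
            return "sourceContentId does not parse as an integer"
    # Case 3: uri or content combined with an index item input parameter.
    uses_index_item = source_content_id is not None or source_external_id is not None
    if uses_index_item and (uri is not None or content is not None):
        return "uri or content was combined with an index item input parameter"
    return None
```

A return value of None means none of the documented 210 conditions apply; a 417 can still occur if the item does not exist in the index.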

Match Call Parameters

In addition to the common request parameters, the match call has the following optional parameters.

Name | Value | Required (Yes/No) | Purpose
format | 'xml' or 'json' | No | The default format is 'xml'.
nMatches | An integer > 0 and <= 100 | No, default = 10 | Specify how many content items should be returned in the response.
offset | An integer >= 0 (zero based) | No, default = 0 | The starting point to return results from in the canonical list of matches. For example, if offset is set to 4, the first item returned will be the 5th item in the full match list. The end of the canonical list of matches has been reached when the number of matches returned from the call is less than nMatches.
fields | A comma separated list of field names to return for each content item | No | Overrides the default content attributes returned for each content item in the match list and returns only those that were specified.
minMatchScore | A non-negative floating point number | No | This feature has been removed and the results of using this parameter in match requests are undefined.

Note on the 'fields' parameter: if you have licensed an index that contains your content, there are no default fields that are provided back for a match call. You can specify which fields you want returned using the 'fields' parameter. If the 'fields' parameter is omitted for a custom index, all stored fields for each item will be returned.
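The offset and nMatches parameters can be combined to walk the full match list, stopping when a page comes back with fewer than nMatches items. The sketch below assumes a hypothetical `fetch_page(offset, n_matches)` callable that issues the match request and returns the list of matches; `fake_fetch` is a stand-in used only to show the loop without network access.

```python
from typing import Callable, Iterator, List


def iter_all_matches(
    fetch_page: Callable[[int, int], List[dict]],
    n_matches: int = 10,
) -> Iterator[dict]:
    """Yield every match by advancing offset one page at a time."""
    if not 0 < n_matches <= 100:
        raise ValueError("nMatches must be > 0 and <= 100")
    offset = 0
    while True:
        page = fetch_page(offset, n_matches)
        yield from page
        # A short page signals the end of the canonical match list.
        if len(page) < n_matches:
            return
        offset += n_matches


def fake_fetch(offset: int, n_matches: int) -> List[dict]:
    """Hypothetical stand-in for a real match call: 23 items in total."""
    all_items = [{"id": str(i)} for i in range(23)]
    return all_items[offset:offset + n_matches]


ids = [m["id"] for m in iter_all_matches(fake_fetch, n_matches=10)]
```

When the total number of matches is an exact multiple of nMatches, the loop makes one extra request that returns an empty page before it stops.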

Match Call Facet Parameters

The match call supports several parameters, all optional, for using the facets that have been defined for the index. See the index service facets documentation for details about the facet types that are available and how to define facets for a custom index.

Name | Value | Purpose
facetQuery | A query string that conforms to the facet query syntax defined below. | Restricts the results of the match request to those items that meet the criteria of the facet query.
matchRank | 'relevance' (the default), or the name of a single value facet that has been defined for the index | If set to 'relevance' the match list will be returned in the default relevance order. If set to a single value facet name, the relevance ranked match list will be re-ranked using the values for the facet. Ties are broken using the relevance score, always in descending order. For facet only queries ties are broken using the contentId, always in ascending order. Note that for facet only queries this parameter must be included with the match call.
matchRankOrder | 'desc' (descending, the default), or 'asc' (ascending) | Controls the ordering for the match results when they are re-ranked by a facet via the matchRank parameter. For facets, this means sorting based on the natural order of the facet values. For numeric facets 'asc' would mean lowest to highest and 'desc' would mean highest to lowest. For string facets 'asc' would mean in alphabetical order, 'desc' would mean reverse alphabetical order.
includeFacetValueCounts | 'false' (the default), or 'true' | If set to true the match call response will include the accumulated facet value and range counts for the items in the full match results list (not just the section of results returned using the offset and nMatches parameters).
facetValueOrderBy | 'value' (the default), or 'documentCount' | Determines the ordering for the facet value and range counts printed when the includeFacetValueCounts parameter is set to true. If set to 'value' the entries will be printed in sorted order based on value and in range definition order for range counts. If set to 'documentCount' the entries will be printed in document count order, highest to lowest.
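To make the matchRank and matchRankOrder semantics concrete, the following sketch reproduces the described ordering locally: results are ordered by the facet value, and ties fall back to the relevance score, highest first. It illustrates the documented behavior and is not the service's implementation; the `rerank` function and the `price` facet are hypothetical, and every item is assumed to carry a value for the facet.

```python
from typing import List


def rerank(matches: List[dict], facet: str, order: str = "desc") -> List[dict]:
    """Order matches by a single value facet; ties use score, descending."""
    if order not in ("asc", "desc"):
        raise ValueError("matchRankOrder must be 'asc' or 'desc'")
    # Python's sort is stable, so sort by the tie-breaker first and by the
    # primary key second; equal facet values keep the score ordering.
    by_score = sorted(matches, key=lambda m: m["score"], reverse=True)
    return sorted(by_score, key=lambda m: m[facet], reverse=(order == "desc"))


matches = [
    {"id": 1, "score": 0.2, "price": 10.0},
    {"id": 2, "score": 0.9, "price": 10.0},
    {"id": 3, "score": 0.5, "price": 5.0},
]
ascending_ids = [m["id"] for m in rerank(matches, "price", "asc")]
descending_ids = [m["id"] for m in rerank(matches, "price", "desc")]
```

In both orders, items 2 and 1 share a price, so item 2 (the higher relevance score) comes before item 1.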

Facet Query Syntax

The syntax for a facet query string is detailed below. It is similar to the "where" clause in SQL and supports parenthetical notation.

  • EXPR
  • EXPR: '(' EXPR 'AND' EXPR ')'
  • EXPR: '(' EXPR 'OR' EXPR ')'
  • EXPR: ATTR OP VALUES
  • ATTR: string
  • OP: '=' | '!=' | '<' | '>' | '<=' | '>='
  • VALUES: VALUE [DELIM VALUE]*
  • VALUE: number | string
  • DELIM: string

Examples:

	color != blue,green
	(prodColor = blue,green AND productGroup = shirts)
	SEMANTIC_CATEGORY != 568,452,340
	(price >= 599.99 OR popularityFunc > 88.5)
	(prodColor = blue,green AND (price <= 50.00 AND popularityFunc > 95.0))

 

Notes:

  • DELIM is the facet's multivalue delimiter string that was set in the facet's definition, or ',' (the default).
  • Match calls with invalid facet query strings will be returned as illegal argument errors by the API.
  • When constructing queries keep in mind that queries using 'OR' clauses may execute slower than queries with 'AND' clauses.
  • Multiple values for an '=' clause indicate an implicit OR condition, e.g. color = blue,green means color = blue OR color = green.
  • Multiple values for a '!=' clause indicate an implicit AND condition, e.g. SEMANTIC_CATEGORY != 568,452 means SEMANTIC_CATEGORY != 568 AND SEMANTIC_CATEGORY != 452.
  • Queries using '>', '<', '>=', or '<=' against numeric facets with multiple values per item will match an item if just one of the item's values meets the query, regardless of whether the other values for the item do.
  • An argument string for '>', '<', '>=', or '<=' must be a single value. A multivalue argument string will be rejected.
  • Queries using '!=' will match items that do not have a value for that facet.
  • Queries using any operation other than '!=' can match an item only if that item has a value for that facet.
  • If a match call uses a GET request, the facetQuery parameter value must be URL encoded.
  • Expressions that work against strings are case sensitive.
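The matching rules in the notes above can be summarized as a small evaluator for a single ATTR OP VALUES clause. This is a sketch of the documented semantics, not the service's code. The function name `clause_matches` is hypothetical, and one point is an assumption the notes do not settle: for a multivalue item, '!=' is treated as matching only when none of the item's values equals any query value.

```python
import operator
from typing import List, Optional, Union

Value = Union[str, float]

_COMPARISONS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def clause_matches(
    item_values: Optional[List[Value]], op: str, query_values: List[Value]
) -> bool:
    """Evaluate one clause against the values an item has for the facet."""
    has_values = bool(item_values)
    if op == "!=":
        # Implicit AND over the query values; items with no value also match.
        return not has_values or all(q not in item_values for q in query_values)
    if not has_values:
        # Every other operator requires the item to have a value.
        return False
    if op == "=":
        # Implicit OR over the query values; string comparison is case sensitive.
        return any(q in item_values for q in query_values)
    if op in _COMPARISONS:
        if len(query_values) != 1:
            raise ValueError("comparison operators take a single value")
        compare = _COMPARISONS[op]
        # One qualifying value is enough for a multivalue numeric facet.
        return any(compare(v, query_values[0]) for v in item_values)
    raise ValueError(f"unknown operator: {op}")
```

For example, `clause_matches(["blue"], "=", ["blue", "green"])` is true, while `clause_matches(None, "=", ["blue"])` is false because the item has no value for the facet.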

 

Facet Only Queries

It is valid to submit a match call that only contains a facetQuery parameter and does not have a URI or text content included for relevance matching. In this case, items that match the facet query are collected and sorted according to the matchRank parameter. Since the value for the matchRank parameter must be the name of a single value facet, facet only queries are not possible on indexes for which only multivalue facets are defined.

An example use case for submitting a facet only query would be to query an index of timestamped items for the items in a specified date range, ordered by their timestamp.
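A facet only request for that use case might be assembled as follows. This is a sketch under stated assumptions: `facet_only_params` is a hypothetical helper, `timestamp` is a hypothetical single value numeric facet, and the numeric range values are placeholders. The helper only enforces the documented rule that matchRank must name a single value facet.

```python
from typing import Dict, Optional


def facet_only_params(
    facet_query: str, match_rank: Optional[str], order: str = "desc"
) -> Dict[str, str]:
    """Build form parameters for a match call with no uri or content input."""
    # 'relevance' is not usable here because there is no content to score.
    if not match_rank or match_rank == "relevance":
        raise ValueError(
            "facet only queries need matchRank set to a single value facet name"
        )
    if order not in ("asc", "desc"):
        raise ValueError("matchRankOrder must be 'asc' or 'desc'")
    return {
        "facetQuery": facet_query,
        "matchRank": match_rank,
        "matchRankOrder": order,
    }


# Hypothetical facet name and placeholder range values.
params = facet_only_params(
    "(timestamp >= 1314662400 AND timestamp < 1314748800)", "timestamp", "asc"
)
```

The resulting dictionary can be sent as form fields, in the same way as the -F options in the curl examples.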

Example Match Calls With Facet Parameters

Return index items that are relevant to the http://www.linux.org site, returning facet value counts ordered by document count:

	curl -F "includeFacetValueCounts=true" -F "facetValueOrderBy=documentCount" \
    "http://api.semantichacker.com/TOKEN/match/INDEX?uri=http%3a%2f%2fwww.linux.org"

 

Return index items that have a price <= 25.00 and belong to the tshirts product group, and are relevant to the http://www.linux.org site:

	curl -F "facetQuery=(price <= 25.00 AND productGroup = tshirts)" \
    "http://api.semantichacker.com/TOKEN/match/INDEX?uri=http%3a%2f%2fwww.linux.org"

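When the same request is sent as a GET instead of a form POST, the facetQuery value has to be URL encoded, as noted in the facet query syntax section. The sketch below builds such a URL with Python's standard library. The helper name `match_get_url` is hypothetical, and TOKEN and INDEX are the same placeholders used in the examples above.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def match_get_url(token: str, index_id: str, params: dict) -> str:
    """Return a match call URL with every parameter value percent-encoded."""
    base = f"http://api.semantichacker.com/{token}/match/{index_id}"
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{base}?{urlencode(params, quote_via=quote)}"


url = match_get_url(
    "TOKEN",
    "INDEX",
    {
        "uri": "http://www.linux.org",
        "facetQuery": "(price <= 25.00 AND productGroup = tshirts)",
    },
)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
```

Parentheses, spaces, and the comparison operators are all percent-encoded in the resulting URL, so the query string stays unambiguous.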
 

Match Call XML Response Example

http://api.semantichacker.com/TOKEN/match/youtube?uri=http%3a%2f%2fwww.n...

<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://www.semantichacker.com/api">
        <about>
                <requestId>994E0EF89781BB3B0A71F930240B4E5F</requestId>

                <docId>12B9CC519FA32800883C988C03ED726A</docId>
                <systemType>match</systemType>
                <contentType>text/html</contentType>
                <contentDigest>E38F117A6ACDEA42CADE983E5666EC92</contentDigest>

                <requestDate>2011-08-30T20:04:34+00:00</requestDate>
                <systemVersion>2.1</systemVersion>
                <sourceUri>http://www.nfl.com</sourceUri>
        </about>

        <contentMatch>
                <contentMatchResponse>
                        <matches>
                                <match id="539774" score="0.47137815" indexId="youtube" >
                                        <externalId>http://gdata.youtube.com/feeds/api/videos/kTgVYrOo4DA</externalId>

                                        <attribute name="title">NFL 2011 Chicago Bears Versus New York Giants: Giants 3rd Field Goal HD.</attribute>
                                        <attribute name="landingPageUrl">http://www.youtube.com/watch?v=kTgVYrOo4DA&amp;feature=youtube_gdata</attribute>
                                        <attribute name="enclosureUrl">http://www.youtube.com/v/kTgVYrOo4DA?f=videos&amp;app=youtube_gdata</attribute>

                                </match>
                                <match id="539775" score="0.43644464" indexId="youtube" >
                                        <externalId>http://gdata.youtube.com/feeds/api/videos/IxE7hUgk31Q</externalId>
                                        <attribute name="title">NFL 2011 Chicago Bears Versus New York Giants: Bears 2nd Field Goal HD.</attribute>
                                        <attribute name="landingPageUrl">http://www.youtube.com/watch?v=IxE7hUgk31Q&amp;feature=youtube_gdata</attribute>

                                        <attribute name="enclosureUrl">http://www.youtube.com/v/IxE7hUgk31Q?f=videos&amp;app=youtube_gdata</attribute>
                                </match>
                                <match id="930248" score="0.40170527" indexId="youtube" >
                                        <externalId>http://gdata.youtube.com/feeds/api/videos/GsvzLrgIX04</externalId>

                                        <attribute name="title">D NATIONAL 2010 SEASON</attribute>
                                        <attribute name="landingPageUrl">http://www.youtube.com/watch?v=GsvzLrgIX04&amp;feature=youtube_gdata</attribute>
                                        <attribute name="enclosureUrl">http://www.youtube.com/v/GsvzLrgIX04?f=videos&amp;app=youtube_gdata</attribute>

                                </match>
                        </matches>
                </contentMatchResponse>
        </contentMatch>
</response>
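A response of this shape can be read with a namespace-aware XML parser. The following sketch uses Python's standard `xml.etree.ElementTree`; the function name `parse_matches` is hypothetical, and `SAMPLE` is a trimmed copy of the response above that keeps only one match and one attribute.

```python
import xml.etree.ElementTree as ET

# The response declares a default namespace, so lookups must be qualified.
NS = {"sh": "http://www.semantichacker.com/api"}

SAMPLE = """<response xmlns="http://www.semantichacker.com/api">
  <contentMatch><contentMatchResponse><matches>
    <match id="539774" score="0.47137815" indexId="youtube">
      <externalId>http://gdata.youtube.com/feeds/api/videos/kTgVYrOo4DA</externalId>
      <attribute name="title">NFL 2011 Chicago Bears Versus New York Giants: Giants 3rd Field Goal HD.</attribute>
    </match>
  </matches></contentMatchResponse></contentMatch>
</response>"""


def parse_matches(xml_text: str) -> list:
    """Return one dict per match element, with attributes keyed by name."""
    root = ET.fromstring(xml_text)
    results = []
    for match in root.iterfind(".//sh:match", NS):
        results.append({
            "id": match.get("id"),
            "score": float(match.get("score")),
            "externalId": match.findtext("sh:externalId", namespaces=NS),
            "attributes": {
                a.get("name"): a.text for a in match.iterfind("sh:attribute", NS)
            },
        })
    return results


parsed = parse_matches(SAMPLE)
```

The `id` and `externalId` values extracted this way are the ones that can be passed back as sourceContentId or sourceExternalId in a follow-up match request.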

Match Call JSON Response Example

http://api.semantichacker.com/TOKEN/match/youtube?uri=http%3a%2f%2fwww.n...

	{
    "about":     {
        "requestId": "9012BEE686DC750914E5A14CD82DE530",
        "docId": "12B9CC519FA32800883C988C03ED726A",
        "systemType": "match",
        "contentType": "text/html",
        "contentDigest": "E38F117A6ACDEA42CADE983E5666EC92",
        "requestDate": "2011-08-30T20:05:56+00:00",
        "systemVersion": "2.1",
        "sourceUri": "http://www.nfl.com"
    },
    "contentMatch": {"contentMatchResponse": {"matches":     [
                {
            "id": "539774",
            "score": "0.47137815",
            "indexId": "youtube",
            "externalId": "http://gdata.youtube.com/feeds/api/videos/kTgVYrOo4DA",
            "attributes":             [
                                {
                    "name": "title",
                    "value": "NFL 2011 Chicago Bears Versus New York Giants: Giants 3rd Field Goal HD."
                },
                                {
                    "name": "landingPageUrl",
                    "value": "http://www.youtube.com/watch?v=kTgVYrOo4DA&feature=youtube_gdata"
                },
                                {
                    "name": "enclosureUrl",
                    "value": "http://www.youtube.com/v/kTgVYrOo4DA?f=videos&app=youtube_gdata"
                }
            ]
        },
                {
            "id": "539775",
            "score": "0.43644464",
            "indexId": "youtube",
            "externalId": "http://gdata.youtube.com/feeds/api/videos/IxE7hUgk31Q",
            "attributes":             [
                                {
                    "name": "title",
                    "value": "NFL 2011 Chicago Bears Versus New York Giants: Bears 2nd Field Goal HD."
                },
                                {
                    "name": "landingPageUrl",
                    "value": "http://www.youtube.com/watch?v=IxE7hUgk31Q&feature=youtube_gdata"
                },
                                {
                    "name": "enclosureUrl",
                    "value": "http://www.youtube.com/v/IxE7hUgk31Q?f=videos&app=youtube_gdata"
                }
            ]
        },
                {
            "id": "930248",
            "score": "0.40170527",
            "indexId": "youtube",
            "externalId": "http://gdata.youtube.com/feeds/api/videos/GsvzLrgIX04",
            "attributes":             [
                                {
                    "name": "title",
                    "value": "D NATIONAL 2010 SEASON"
                },
                                {
                    "name": "landingPageUrl",
                    "value": "http://www.youtube.com/watch?v=GsvzLrgIX04&feature=youtube_gdata"
                },
                                {
                    "name": "enclosureUrl",
                    "value": "http://www.youtube.com/v/GsvzLrgIX04?f=videos&app=youtube_gdata"
                }
            ]
        }
    ]}}
}
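In the JSON form, the id and score values arrive as strings and the attributes as a list of name/value pairs. The sketch below converts them into a more convenient shape. The function name `flatten_matches` is hypothetical, and `SAMPLE` is a trimmed copy of the response above with a single match.

```python
import json

SAMPLE = """{
  "contentMatch": {"contentMatchResponse": {"matches": [
    {
      "id": "539774",
      "score": "0.47137815",
      "indexId": "youtube",
      "externalId": "http://gdata.youtube.com/feeds/api/videos/kTgVYrOo4DA",
      "attributes": [
        {"name": "title", "value": "NFL 2011 Chicago Bears Versus New York Giants: Giants 3rd Field Goal HD."}
      ]
    }
  ]}}
}"""


def flatten_matches(json_text: str) -> list:
    """Convert string ids and scores to numbers and attributes to a dict."""
    body = json.loads(json_text)
    matches = body["contentMatch"]["contentMatchResponse"]["matches"]
    return [
        {
            "id": int(m["id"]),
            "score": float(m["score"]),
            "externalId": m.get("externalId"),
            "attributes": {a["name"]: a["value"] for a in m.get("attributes", [])},
        }
        for m in matches
    ]


flat = flatten_matches(SAMPLE)
```

Converting the score to a float makes it possible to filter or sort results on the client, for instance to apply a minimum score threshold locally.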

Semantic Signature is a registered trademark - © 2010 TextWise, LLC. All rights reserved.