Categorization

The SemanticHacker API category service call identifies the main topic categories of the input text or URI and returns them ordered by weight. It categorizes content by analyzing the dimensions and weights of the content's Semantic Signature and selecting the dimensions most relevant to that Signature. Each dimension in a Semantic Signature represents a path in a tree of categories, and its weight indicates how relevant that category is to the topics contained in the input.
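As a rough illustration of "ordered by weight", the sketch below ranks a handful of dimensions by their weights. The data shape (a list of path and weight pairs), the function name top_categories, and the weight values are all assumptions made for this example; they are not the service's actual response format.

```python
from typing import List, Tuple

# Hypothetical representation of a Semantic Signature: (category path, weight).
# The real response format is not described here; this shape is an assumption.
Signature = List[Tuple[str, float]]


def top_categories(signature: Signature, limit: int = 3) -> Signature:
    """Return up to `limit` dimensions, heaviest first."""
    return sorted(signature, key=lambda dim: dim[1], reverse=True)[:limit]


# Illustrative weights only; they were not produced by the service.
example = [
    ("Sports/Football/American/Players", 0.31),
    ("Computers/Software/Operating Systems/Linux", 0.05),
    ("Sports/Football/American/NFL", 0.52),
]

ranked = top_categories(example, limit=2)
for path, weight in ranked:
    print(f"{weight:.2f}  {path}")
```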

Categories produced by the SemanticHacker API are based on the Open Directory Project (ODP) categorization scheme. In some cases, similar dimensions are combined to provide a broader theme, so not every category corresponds directly to a category in the ODP scheme.
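How the service combines similar dimensions is not specified here. One plausible way to picture it, purely as an assumption, is to group dimensions that share a leading path and add up their weights. The function combine_by_prefix and the depth parameter below are hypothetical names for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def combine_by_prefix(
    signature: List[Tuple[str, float]], depth: int
) -> List[Tuple[str, float]]:
    """Merge dimensions that share their first `depth` path segments.

    Summing the weights is an assumed rule for illustration; the service
    may combine dimensions differently.
    """
    totals: Dict[str, float] = defaultdict(float)
    for path, weight in signature:
        prefix = "/".join(path.split("/")[:depth])
        totals[prefix] += weight
    # Heaviest combined theme first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative weights only.
dimensions = [
    ("Sports/Football/American/NFL", 0.5),
    ("Sports/Football/American/Players", 0.25),
    ("Computers/Software/Operating Systems/Linux", 0.125),
]

themes = combine_by_prefix(dimensions, depth=3)
```

With these sample values, the two football dimensions collapse into a single 'Sports/Football/American' theme, which has no one-to-one counterpart among the original dimensions.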

Examples include:

  • 'Sports/Football/American/NFL' and 'Sports/Football/American/Players' for the input URI 'http://www.nfl.com'
  • 'Computers/Software/Operating Systems/Linux' for the input URI 'http://www.kernel.org'

There are two ways in which you can implement categorization through the SemanticHacker API:

  1. If you have a small data set and prefer high-level categories, use our category tags.
  2. If you have a large data set and prefer a more refined distinction, use the top dimension to categorize. (See the Semantic Signatures overview for more about dimensions.)
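The choice between the two approaches can be sketched as a small helper that reads a label from either the category tags or the Signature dimensions. Both inputs are assumed to be lists of (label, weight) pairs, and pick_label is a hypothetical name; neither reflects the documented response format.

```python
from typing import List, Optional, Tuple

Weighted = List[Tuple[str, float]]


def pick_label(
    category_tags: Weighted, dimensions: Weighted, refined: bool
) -> Optional[str]:
    """Return the heaviest label from the chosen source, or None if empty.

    refined=False: use the high-level category tags (approach 1).
    refined=True:  use the top Signature dimension (approach 2).
    """
    source = dimensions if refined else category_tags
    if not source:
        return None
    return max(source, key=lambda item: item[1])[0]


# Illustrative values only; not produced by the service.
tags = [("Sports", 0.75), ("Computers", 0.125)]
dims = [
    ("Sports/Football/American/NFL", 0.5),
    ("Sports/Football/American/Players", 0.25),
]

coarse = pick_label(tags, dims, refined=False)
fine = pick_label(tags, dims, refined=True)
```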

Winter 2012 Technology Preview

A new version of the categorization service with improved relevance is available for preview. This version is backward-compatible with the current default categorization configuration (configuration ID: odp_2010_categorization), including its category labels and category IDs. To try the new categorization, see the Configuration IDs for the Winter 2012 Category and Concept Tag Technology Preview.

Semantic Signature is a registered trademark - © 2010 TextWise, LLC. All rights reserved.