Filter Call


Service Call:
http://api.semantichacker.com/TOKEN/filter[/FilterID]?

Overview

The filter call attempts to filter out useless text from the indicated content. The text produced depends on the filter algorithm used. The filter call exposes the same functions that are applied when the 'filter' parameter is provided for other API calls, so it is a useful way to "see" which text from a content page is actually being utilized by the underlying Semantic Signature® technology. The following table details the available algorithms:

Name | Filter ID | Description
HTML | html | Removes all HTML markup from the provided input. This is a simple algorithm that only removes the HTML tags for the purpose of returning the text contained within. External resources specified within any HTML markup or routines that dynamically add content will not be processed. This process does not guarantee that all text appearing on a page as rendered in a browser will be produced in the output of this call.
Web Page | web | Web pages often have different sections of text on them besides the page's main topic, such as advertisements, links, headers, footers, etc. The Web Page filter uses an advanced algorithm that looks at each text block on the page and attempts to identify the text segments that contain the page's main subject text. This filter often does a much better job of finding the 'useful' text from web pages than the simple HTML filter.
Plain Text | text | 'Cleans up' plain text by removing extra whitespace, foreign characters, etc.
Wiki | wp | Produces useful subject text from MediaWiki formatted text.

The desired filter type is specified in the call by adding its identifier to the end of the request path. The filter ID is not required, but including it in each call is highly recommended. If it is not provided, the filter is selected by looking at several properties of the API request: the method of indicating content (URI, parameter, or upload) and the 'Content-Type' header are used to determine the proper filter.
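As a minimal sketch of how the request path is assembled, the following Python helper appends an optional filter ID to the service URL. The helper name `build_filter_url` and its signature are illustrative and not part of the API; the `uri` query parameter is taken from the examples on this page, and leaving ':' and '/' unescaped is an assumption made only to match the form of those example URLs.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://api.semantichacker.com"
# Filter IDs listed in the table of available algorithms.
FILTER_IDS = {"html", "web", "text", "wp"}


def build_filter_url(token: str, content_uri: str,
                     filter_id: Optional[str] = None) -> str:
    """Build a filter call URL (hypothetical helper, not part of the API)."""
    if filter_id is not None and filter_id not in FILTER_IDS:
        raise ValueError(f"unknown filter ID: {filter_id!r}")
    path = f"{BASE_URL}/{token}/filter"
    if filter_id is not None:
        # The filter ID goes at the end of the request path.
        path += f"/{filter_id}"
    # Assumption: ':' and '/' stay unescaped, as in the documented examples.
    return f"{path}?{urlencode({'uri': content_uri}, safe=':/')}"


print(build_filter_url("TOKEN", "http://www.linux.com", "web"))
# http://api.semantichacker.com/TOKEN/filter/web?uri=http://www.linux.com
```

Omitting `filter_id` yields a URL ending in `/filter?uri=...`, in which case the service chooses the filter itself as described above.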

The filter call can be used to see the final text that would be used as input by other TextWise API calls. To use a filter type other than the default with any of those calls, be sure to include the 'filter' parameter with the request.
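A rough Python sketch of adding the 'filter' parameter to a request for another call is shown here. The helper `with_filter` is hypothetical, `CALL` is a placeholder for whichever API call is being made, and it is an assumption that the parameter's value is one of the filter IDs from the table above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_filter(call_url: str, filter_id: str) -> str:
    """Return call_url with a 'filter' query parameter set to filter_id."""
    parts = urlsplit(call_url)
    # Drop any existing 'filter' value so the parameter appears only once.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "filter"]
    # Assumption: the parameter value is a filter ID such as 'web'.
    query.append(("filter", filter_id))
    return urlunsplit(parts._replace(query=urlencode(query, safe=":/")))


# 'CALL' is a placeholder, not a real call name.
print(with_filter("http://api.semantichacker.com/TOKEN/CALL?uri=http://www.linux.com", "web"))
# http://api.semantichacker.com/TOKEN/CALL?uri=http://www.linux.com&filter=web
```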

The filter call currently has no request parameters beyond the common request parameters.

The filter call supports the 'xml' and 'json' output formats.

Examples

http://api.semantichacker.com/TOKEN/filter/web?uri=http://www.linux.com

<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://www.semantichacker.com/api">
        <about>
                <requestId>910A5DD9C997C44AD55C27D77FD76429</requestId>
                <docId>C3B51B2F00040CF563AB5EB111BCF036</docId>
                <systemType>filter</systemType>
                <configId>odp_2007_l1_1.7k</configId>
                <contentType>text/html</contentType>
                <contentDigest>A7CABFB691D893395C211CAE701824DC</contentDigest>
                <requestDate>2011-08-30T20:28:04+00:00</requestDate>
                <systemVersion>2.1</systemVersion>
                <sourceUri>http://www.linux.com</sourceUri>
        </about>
        <filter>
                <filterResponse>
                        <filteredTextLength>3645</filteredTextLength>
                        <filteredText>Linux.com   The source for Linux information .
Reflections from LinuxCon by Philip Koltun I just returned from a great trip to Vancouver for LinuxCon North America 2011. Here&apos;s a quick (and visual) report on the proceedings, with a bias toward t... .
What We Know For Sure on Linux&apos;s 20th Anniversary by Jim Zemlin Twenty years ago today Linux creator Linus Torvalds posted a message online that would c... . the world:
A Look at Oregon State University&apos;s Open Source Lab by Joe &apos;Zonker&apos; Brockmeier One of the key themes at LinuxCon North America 2011 is the ubiquity of Linux. Many people use Linux in many ways, often totally unaware that they&apos;re depending on ... .
IBM&apos;s Irving Wladawsky-Berger Talks Linux Then and Now by Joe &apos;Zonker&apos; Brockmeier The second day of LinuxCon North America 2011 kicked off with a key figure to Linux&apos;s success, Dr. Irving Wladawsky-Berger. Formerly responsible for IBM&apos;s resp... .
Last week I wrote about Samsung releasing code to a new DRM driver for one of their ARM SoCs, the Exynos 4210 that&apos;s used by the Samsung Galaxy S II and other mobile devices. It looks like this open-source kernel driver from Samsung stands a chance as being the first...      Read more... .
Toshiba&apos;s first Honeycomb-powered tablet has a 10.1-inch screen and a 1 GHz NVIDIA Tegra 250 processor, and is now available in Europe. At 15.8 mm deep and 765 grams, it is one of the thickest and heaviest Android devices yet...        Read more... .
Last week, I had the opportunity to present for the Maryland Association of Counties (MACO) summer meeting along with Baltimore County Executive Kevin Kamenetz. Earlier this year, he announced that Baltimore County would be launching 23 technology initiatives and he shared a bit about their social...    Read more... .
Not long ago, when Amazon first reported that it would be selling the Kindle eBook device, many analysts scoffed, and they had good reasons to do so. Read more... .
Here&apos;s some very exciting news coming out of the Google Chromium OS team for upstream work they continue making to Mesa... They have enabled GLX_EXT_texture_from_pixmap in the software drivers! This means that it may now be possible to use compositing window managers nicely from the Gallium3D software drivers like LLVMpipe...     Read more... .
Geeks Without Frontiers is at the final stage of building a low-cost, open source Wi-Fi software that aims to provide affordable broadband in new locations. Read more... .
Hold on to something tight, webOS geeks. Your favorite tablet, which I can only assume is the TouchPad, might not be the last webOS tablet incarnation from HP. An HP executive and former webOS VP recently stated that the company could resurrect the TouchPad stating to Reuters, &quot;tablet computing is a segment of the market that&apos;s relevant, absolutely.&quot;  Read more... .
GIMP 2.7.3 has added one of the most requested features in the program&apos;s history: a single window mode. Version 2.7 is part of the development branch, so unfortunately, the feature wont hit most distro repositories for a while.         Read more... .
If it reaches a price point below $300, Amazon could sell 3 to 5 million tablets in the fourth quarter, according to Forrester analyst Sarah Rotman Epps.        Read more... .
20th Anniversary of Linux T-shirt Celebrate 20 Years of Linux! This official 20th Anniversary of Linux T-shirt was designed by Sweden-based designer Kim Blanche, the winner of this year&apos;s Linux.com Store T-shirt Design Contest...
 Read More... .
The Linux Foundation is a non-profit consortium dedicated to the growth of Linux. More About the foundation... Frequent Questions Join / Linux Training / Board .

</filteredText>
                </filterResponse>
        </filter>
</response>
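
A response like the one above could be read with Python's standard library along the following lines. This is a sketch only: the function name `extract_filtered_text` is illustrative, and the sample document is a heavily trimmed version of the response shown above (the `<about>` block is omitted and the filtered text is cut to its first line, so the reported length no longer matches the text).

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <response> element.
NS = {"api": "http://www.semantichacker.com/api"}


def extract_filtered_text(xml_bytes: bytes):
    """Return (filteredTextLength, filteredText) from a filter response."""
    root = ET.fromstring(xml_bytes)
    node = root.find("api:filter/api:filterResponse", NS)
    if node is None:
        raise ValueError("no filterResponse element in response")
    length_text = node.findtext("api:filteredTextLength", namespaces=NS)
    if length_text is None:
        raise ValueError("no filteredTextLength element in response")
    text = node.findtext("api:filteredText", default="", namespaces=NS)
    return int(length_text), text


# Trimmed sample based on the response above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://www.semantichacker.com/api">
  <filter>
    <filterResponse>
      <filteredTextLength>3645</filteredTextLength>
      <filteredText>Linux.com   The source for Linux information .</filteredText>
    </filterResponse>
  </filter>
</response>"""

length, text = extract_filtered_text(SAMPLE)
print(length, repr(text))
```

The XML parser decodes entities such as `&apos;` and `&quot;` automatically, so the returned text contains the plain characters.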

http://api.semantichacker.com/TOKEN/filter/html?uri=http://www.kde.org/

<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://www.semantichacker.com/api">
        <about>
                <requestId>6645A36EED6A645DDD0987E131DB35FF</requestId>
                <docId>75DEEBB640C14FC4DBF33DBDCD69E72E</docId>
                <systemType>filter</systemType>
                <configId>odp_2007_l1_1.7k</configId>
                <contentType>text/html</contentType>
                <contentDigest>D052105CA400E7661921B480B05EA89B</contentDigest>
                <requestDate>2011-08-31T18:22:07+00:00</requestDate>
                <systemVersion>2.1</systemVersion>
                <sourceUri>http://www.kde.org</sourceUri>
        </about>
        <filter>
                <filterResponse>
                        <filteredTextLength>3775</filteredTextLength>
                        <filteredText>&quot;KDE Homepage, KDE.org&quot;.

KDE - Experience Freedom! Skip to content Skip to link menu KDE Community Community About KDE Software Compilation Project Management Development Model Internationalization KDE e.V. Foundation Free Qt Foundation History Awards Press Contact Announcements Events Get Involved Donate Code Of Conduct Press Page Workspaces Workspaces Plasma Desktop Plasma Netbook Applications Applications Development Education Games Graphics Internet Multimedia Office System Utilities Developer Platform Dev. Platform Techbase Wiki API Docs Tutorials Support Support International Sites Documentation Userbase Wiki Sys Admin Wiki Forums Report a Bug Mailing Lists Security Advisories Join The Game Experience Freedom! Get KDE Software Latest News Wrap Up - Desktop Summit 2011 Berlin KDE Commit-Digest for 21st August 2011 KDE in France - the View from RMLL Call for Hosts for Akademy 2012 KDE Commit-Digest for 14th August 2011 More News Community Blogs GSoC 2011 finished COSCUP and Taiwan Future of Webbaverse Broadcast Network kde email list unsubs Akademy 2012 a &quot; looking for hosts More Blogs Forum News &amp; Discussion :: Re: Krita on non-KDE systems Other KDE software :: kdecache is huge, for God&apos;s sake! Semantic Desktop :: Re: Please add option to disable Semantic Desktop Calligra :: Re: Source code missing. KOffice :: Re: Kexi MySQL Decimal More Forum Topics The KDE  Community is an international technology team dedicated to creating a free and user-friendly computing experience, offering an advanced graphical desktop, a wide variety of applications for communication, work, education and entertainment and a platform to easily build new applications upon. We have a strong focus on finding innovative solutions to old and new problems, creating a vibrant atmosphere open for experimentation. Learn more... 
Latest Announcements Release 4.7 - New Features, Improved Stability and Performance On 27th July 2011, KDE has released 4.7.0, containing compelling new features and improvements to the Plasma Workspaces , the KDE Applications and the KDE Development Platform . KDE Shows Second Release Candidates of Summer Release July, 11th, 2011. KDE Ships Second 4.7 Release Candidates of this summer&apos;s release of the Workspaces, Applications and Platform.
KDE Ships July Updates July, 7th, 2011. KDE has released a series of updates to its workspaces, applications and development platform. This lifts the version of the KDE Software Compilation to 4.6.5.
View more announcements... Latest Applications PCNC Numerical control software for 3 axis over parallel port for step based CNC-Machines and others.
The navigation programs can only be get on the projectA s homepage
urlhttp://pcnc.freeoda.com//url ... skreener Skreener is a screencast recording tool, primarily aimed at creating tutorials on how to use software. Ita  s closest proprietary analog is Wink.
Skreener records your actions by taking screenshots ... PeaZip PeaZip is an open source file and archive manager: cross platform, available as portable and installable software for Windows (9x, 2k, XP, Vista) and Linux x86 and x86-64.
PeaZip is a desktop neutral ... indoLyrics Here is an update to the old indic lyrics script @http://kde-apps.org/content/show.php/iTRANS+Amarok+Script?content=50890 to work with Amarok v2. The bundled lyrics_lyricwiki has been enhanced to show ... KCM Qt Graphics System This KCM allows you to easily configure the standard Qt graphics system.
Please note that this requires Qt 4.7.0 or greater to work.
... More Applications Global navigation links KDE Home KDE Accessibility Home Description of Access Keys Back to content Back to menu Maintained by KDE Webmasters KDE   and the K Desktop Environment   logo are registered trademarks of KDE e.V.  
Legal 
</filteredText>
                </filterResponse>
        </filter>
</response>