Analyzing Content

We arrive at this step once you have finished your raw content. Raw doesn’t mean unpolished here; it mostly means plain, unlinked text. This is the module where we get to use the whole variety of tools available to analyze the content and find ways to add layers of meaning, and media, and links, and, and, and…

So basically this step provides the information we need for the following steps, which are augmentation and monetization of content.
The resources we will use in this step are the Alchemy API and libots, the “Open Text Summarizer”. Both are free to use: libots because it is an open source tool, and the Alchemy API because a free account allows up to 1,000 requests per day, or 30,000 for approved academic users (know a student?).

If you’d like to dive right in, try the web demo of Alchemy or one web frontend for the text summarizer.

Using the libots library

This tool is actually meant to be used to summarize text. The default setting pares any source down to 20% of its original length. That said, it works best with technical documentation or any rather technical text. Feel free to use it on other texts, but be prepared for some quirky results.
The way this works is that the library finds the most salient phrases in your texts and cuts the text down to those. Download the free Windows client and you even get a highlight of those passages.
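To make the idea of "most salient phrases" a bit more concrete, here is a minimal sketch of a frequency-based extractive summarizer. It is not the libots algorithm, just an illustration of the general technique under simple assumptions: sentences are scored by how often their words occur in the whole text, and only the top-scoring share is kept. The names `summarize`, `split_sentences`, and `STOP_WORDS` are made up for this example.

```python
import re
from collections import Counter

# Tiny stop-word list for illustration only; a real tool ships a full dictionary.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "that", "this", "for", "on", "with", "as", "are", "be", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased words with the stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, ratio=0.2):
    """Keep roughly `ratio` of the sentences, chosen by average word frequency."""
    sentences = split_sentences(text)
    if not sentences:
        return ""
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:keep])  # put the survivors back in reading order
    return " ".join(sentences[i] for i in chosen)
```

Calling `summarize(text)` keeps about a fifth of the sentences, mirroring the 20% default mentioned above; passing a different `ratio` changes how much survives.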
We will go to the land of unintended uses now. As always, this is where the fun is found.

Don’t summarize me, bro!
Another way to use this tool is to trim a text down just a tiny bit.
For this, set the compression number to anywhere between 80% and 95%, so that most of the text is kept, and see what gets cut away. Those are the superfluous phrases you can get rid of.
Basically, your own editor in the machine.
Reduce to the max!
Using an extreme compression level also lets you find out whether your text hits the sweet spot, that is, whether your main points really are the most salient phrases in your text.
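Both tricks boil down to comparing the original with what the summarizer gives back. The sketch below is tool-agnostic and assumes the summarizer drops whole sentences and leaves the rest unchanged; `cut_sentences` and `missing_points` are hypothetical helper names, and the main points are simply short phrases you supply yourself.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def cut_sentences(original, shortened):
    """Sentences of `original` that no longer appear in `shortened`."""
    kept = set(split_sentences(shortened))
    return [s for s in split_sentences(original) if s not in kept]


def missing_points(summary, main_points):
    """Main points (short phrases) that the summary fails to mention."""
    lowered = summary.lower()
    return [p for p in main_points if p.lower() not in lowered]
```

Run `cut_sentences` on the output of a light trim to see which sentences the machine editor considered superfluous, and run `missing_points` on an extreme summary to see which of your main points did not make the cut.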

Using the Alchemy API
Now, if you got the impression that the old ape was drooling all over WordAI in the last installment of this series… well, you would be right.

And now is the time for the old simian to introduce yet another one of his favorites.

The Alchemy API is a huge collection of tools to analyze text. It is big enough that it would need its own series to cover completely. For this reason we’ll just look at it on a surface level and leave the deep exploration to the dear reader – or to a follow-up if there is enough interest.

What can Alchemy API do for us?

  • Content scraping
    Well, this should probably have been mentioned in the “procure content” section. Alchemy API has the ability to grab the content from any URL. Now this is not just a simple download: using advanced analysis tools, it tries to grab the actual content, without navigational elements, footer text, etc., from any given address. Sound useful?
  • Categorize the text
    Alchemy gives you 12 categories –
    1. Arts & Entertainment
    2. Business
    3. Computers & Internet
    4. Culture & Politics
    5. Gaming
    6. Health
    7. Law & Crime
    8. Religion
    9. Recreation
    10. Science & Technology
    11. Sports
    12. Weather
  • Identify and Extract People and Locations
    Just the possibilities of linking out (to people) or adding to your site (maps, anyone?) are endless.
  • Tagging
    Alchemy API tries to tag the text for you.
    For this, it tries to find the concepts in the text. This is called “concept tagging”.
    It is capable of making abstractions – the given example is
    (“Hillary Clinton + Barbara Bush + Laura Bush == First Ladies of the United States”)
    which makes this extremely useful for a variety of tasks.
  • Sentiment extraction
    Is the text positive or negative regarding the topics it discusses?
  • Keyword Extraction
    Find relevant keywords for the text.
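To give a feel for the content scraping item in the list above, here is a rough sketch of the basic idea using only the Python standard library. It is far simpler than whatever analysis Alchemy runs and makes a big assumption: that the page marks its navigation and footer with tags such as `nav` and `footer`. The names `ContentExtractor`, `extract_content`, and `SKIP_TAGS` are invented for this example.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as page furniture rather than article text.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class ContentExtractor(HTMLParser):
    """Collects text that is not nested inside any of the SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # how many skipped tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_content(html):
    """Return the visible text of `html` minus navigation, footer, and scripts."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Pages that build their navigation out of plain `div` elements will slip through this filter, which is exactly where a smarter analysis earns its keep.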

Now, if you don’t find enough data in all this to help you further… what… sigh.
OK, let the simian sensei outline some ideas here.

Tagging – let the machine do the tagging work for you. Instead of having to think up tags (or having some poor writer think them up), you’ll have a hassle-free and consistent source for tags.
Extracting people and locations – Maps are always a nice extra for visitors. Linking to biographies (or Wikipedia pages) of people makes for a nice outbound link. With some programming work you can also link to your own internal articles that mention the same entities (locations or people).
Categorization – Just grab an offer for each category and link to it on articles with that categorization – easy.
The most basic use of all this would be to take all the data for further media exploration – use locations and people to link out to Wikipedia, use concept tags to find images and videos on YouTube, etc.
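The ideas above can be sketched in a few lines once the analysis results are in hand. The example below does not call any API. It assumes the results have already been collected into a plain dictionary with the keys `category`, `people`, `locations`, and `concepts`, which is a made-up shape for illustration and not the format Alchemy returns. The offer URLs are placeholders as well.

```python
from urllib.parse import quote, quote_plus

# Hypothetical offers, one per category; swap in your own.
OFFERS = {
    "Health": "https://example.com/offers/health",
    "Sports": "https://example.com/offers/sports",
}


def wikipedia_url(name):
    """Guess the English Wikipedia page for a person or location name."""
    return "https://en.wikipedia.org/wiki/" + quote(name.replace(" ", "_"))


def youtube_search_url(concept):
    """Build a YouTube search link for a concept tag."""
    return "https://www.youtube.com/results?search_query=" + quote_plus(concept)


def augment(analysis):
    """Turn analysis results into tags, outbound links, and a category offer."""
    concepts = analysis.get("concepts", [])
    entities = analysis.get("people", []) + analysis.get("locations", [])
    return {
        "tags": sorted(set(concepts)),
        "outbound_links": {name: wikipedia_url(name) for name in entities},
        "video_searches": {c: youtube_search_url(c) for c in concepts},
        "offer": OFFERS.get(analysis.get("category")),  # None if no offer fits
    }
```

The Wikipedia link is only a guess based on the name, so it is worth checking that the page exists before publishing it.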

But those are entry-level ideas. If you are of a more scraping / automation mindset, this set of tools alone will have your mind spinning already.

As it should.

Deeper Content System

  1. Introduction
  2. Tools
  3. Topic research
  4. Creating basic content
  5. Analyzing content
  6. Augmenting content
  7. Monetization