Creating Basic Content
In this part of the Deep Content series we will deal with getting or creating our basic content.
There are two ways to get that content.
First: Procured content.
Second: Manually created content.
1. Procured content
As was said before, we would never just blatantly steal content. Nooooo.
This step is about how to use search engines and other sources to get raw content.
Any content you get this way will have to be reworked at the end. The best way is to automate that rewrite using Word AI.
So, where to get the content?
This really is a two-step process.
In the first step you crawl the search result pages.
In the second step you crawl each result individually.
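The two steps above can be sketched in a few lines of Python. This is a minimal sketch, not a production scraper: the link-filtering rule is a naive assumption (real result pages wrap their links in engine-specific markup you'd have to target), and `fetch` does no error handling.

```python
import urllib.request
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects every absolute href found in anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # Naive filter: keep only absolute links.
                if name == "href" and value and value.startswith("http"):
                    self.links.append(value)


def extract_result_links(serp_html):
    """Step 1: pull the result URLs out of a search result page."""
    parser = LinkExtractor()
    parser.feed(serp_html)
    return parser.links


def fetch(url):
    """Step 2: download each individual result page."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

You'd run `extract_result_links` over each result page, then feed every URL it returns into `fetch` to collect the raw content.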
I’m not going to go into the details of web scraping here. You can write a scraper in the language of your choice or use a visual tool to quickly bang one out. A good resource on programming bots is the book Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL
A good tool to use to bang out those bots is – of course – uBot Studio. I have mentioned it previously in this series under Tools and have also reviewed it here.
However, what I want to go into more detail on is what to look out for when crawling the search engines.
Search engines protect their result pages. You can employ two essential tactics to get around this.
1. Make your bot look like a human
This is actually the easier of the two techniques. What you do is simply space out your requests over time. A simple random wait interval between requests can take care of that. Make the bot wait 1-10 seconds before sending off the next query, done.
Pro: This is simple to implement and quite effective.
Con: Bigger jobs need more time, and having to prepare your searches beforehand makes ad hoc idea exploration difficult. In our example (1-10 seconds wait), the average wait between queries will be about 5 seconds. Assuming a full roundtrip (send the query, wait for the response, save the result) takes 2 seconds, an exploration of 1’000 queries balloons from roughly half an hour to about two hours.
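Here is that idea as a sketch: a random-wait helper plus the runtime math from the example above (the function names are mine, and the defaults are assumptions you'd tune per engine):

```python
import random
import time


def polite_delay(min_wait=1.0, max_wait=10.0):
    """Sleep a random interval so the request pattern looks human."""
    time.sleep(random.uniform(min_wait, max_wait))


def estimated_runtime(n_queries, roundtrip=2.0, min_wait=0.0, max_wait=0.0):
    """Estimated total seconds: roundtrip plus the average wait, per query."""
    avg_wait = (min_wait + max_wait) / 2
    return n_queries * (roundtrip + avg_wait)
```

Run `estimated_runtime(1000)` with and without the wait arguments before a big job and you'll know up front whether it fits in an afternoon.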
2. Use proxies
Loads of proxies, loads of them.
Pro: Effective; makes huge amounts of scraping possible.
Con: This is harder to implement and quality proxy services cost money.
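The core of the proxy approach is just rotation: cycle through the pool so no single IP hammers the engine. A minimal sketch, assuming a list of proxies from whatever service you pay for (the addresses below are placeholders):

```python
import itertools
import urllib.request

# Placeholder proxies -- a paid service would hand you real ones.
PROXIES = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]

proxy_pool = itertools.cycle(PROXIES)


def next_proxy():
    """Round-robin through the pool."""
    return next(proxy_pool)


def fetch_via_proxy(url):
    """Route one request through the next proxy in the pool."""
    proxy = next_proxy()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    )
    return opener.open(url, timeout=10).read()
```

Combine this with the random wait from the previous tactic and you get both volume and cover.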
3. Bonus Technique
Another way to get around this protection is to use selected sources.
Explore your niche space and find the important sites, forums, and article directories related to your topic. Collect all of these into a custom search engine (CSE), then hit that one for more focused results, either manually or automatically.
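If you go the automated route with Google's Custom Search JSON API, the query is a single GET request. A sketch of building that request URL (you'd need your own API key and engine ID; the values below are placeholders):

```python
from urllib.parse import urlencode


def cse_query_url(api_key, engine_id, query, num=10):
    """Build a request URL for Google's Custom Search JSON API."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)
```

Fetch that URL and you get JSON back with the result links, ready to feed into your page crawler.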
2. Manually created content
This means using writers for your content.
This can be yourself, guest writers for your blogs, or paid writers. If you’re going to pay for writing services, I would suggest you check out writesources.com or textbroker.com
Protip: Don’t be a cheap bastard. With content, you truly get what you pay for.
Trust your old sensei, he tried. (sigh)
Deeper Content System