Filling autoblogs with Google News
Google News is a nice way to get some content into your niche autoblog, but you will have some preparation to do.
(For starters, download and install WP-o-Matic.)
- Point your browser to news.google.com
- Enter your search phrase. For this example we will be using “chimpanzee birth” (always good to keep up with the family)
- Now look at the navigation on the lower left and click the “RSS” link
- You are taken to the page of the RSS feed. Grab that URL; it will be an unbelievably long Google link that looks like this:
http://news.google.ch/news?ie=UTF-8&oe=UTF-8&rls=org.mozilla%3Aen-US%3Aofficial&client=firefox-a&um=1&tab=wn&q=chimpanzee+birth&output=rss&ned=:ePkh8BM9EwLbwQq0w4AFqy1AqVyIFNweIwH_SKsbIrNcC6-xa3UWy88sBgBW2w2e
- Enter that as the feed address in a new WP-o-Matic campaign
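If you would rather assemble the feed address than copy it from the browser, here is a minimal PHP sketch. It assumes the shorter parameter set (hl, ned, q, ie, output) seen in feed URLs quoted on this page; the function name is made up for illustration, and Google may change or drop these parameters at any time.

```php
<?php
// Hypothetical helper: builds a Google News search feed URL.
// The parameter names are an assumption based on feed URLs quoted on
// this page; they are not a documented, stable API.
function build_news_feed_url(string $phrase): string
{
    $params = [
        'hl'     => 'en',
        'ned'    => 'us',
        'q'      => $phrase,   // http_build_query() encodes spaces as "+"
        'ie'     => 'UTF-8',
        'output' => 'rss',
    ];
    // Pass "&" explicitly so the result does not depend on php.ini settings.
    return 'http://news.google.com/news?' . http_build_query($params, '', '&');
}

echo build_news_feed_url('chimpanzee birth'), "\n";
```

Paste the printed address into the campaign's feed field just as you would the copied one.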
Now we have to get rid of the nastiness that is the Google News link. Basically, Google does not give you the direct links in the feed, but redirects them through its own servers. Luckily, we can strip those redirects with the rewrite feature of WP-o-Matic.
- Go to the “Rewrite” tab
- Enter this for origin:
/http...news.*amp;url=(.*)&cid.*"{1}/
- Check “Regex”
- Under “Rewrite to” enter
$1
- Check “Rewrite”
- Submit
Done.
13 Comments so far
Well, I can’t agree more.
The rewrite does not seem to work for me. Did Google change the format? I am hopeless with regex.
Sometimes Google changes the format (redirects are quite common).
Regular expressions are really worth your time, though.
Get yourself a book and learn. Or bribe me to write a tutorial.
Hi,
The link I get following your instructions is:
http://news.google.com/news?hl=en&ned=us&q=chimpanzee+birth&ie=UTF-8&output=rss
WP-o-Matic is then not able to fetch the individual news items from it.
Any idea?
Hi,
I followed the instructions mentioned above. The problem I get is that no posts are fetched. Although I get no error message, WP-o-Matic says the following after trying to fetch for the first time: “Campaign processed. 0 posts fetched”.
Any idea?
You are running into what is known as the “inherent brittleness” of scraping.
This basically means that as soon as your source (in this case, Google News) changes its code, you are out of luck.
Google has been known to add tracking code every once in a while. This might be the case, just check the source code and see if you need to adjust the regular expression.
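One way to do that check without reading raw XML by eye is to list the link targets found in a feed item. A rough PHP sketch, run here on a made-up sample string rather than a live feed; the sample markup and the function name are assumptions for illustration only.

```php
<?php
// Hypothetical helper: returns every href value found in a chunk of feed
// HTML, so you can see what the redirect links currently look like.
function list_hrefs(string $html): array
{
    preg_match_all('/href="([^"]*)"/i', $html, $matches);
    return $matches[1]; // contents of the first capture group, one per link
}

// Made-up sample in the style of a feed item description.
$sample = '<a href="http://news.google.com/news/url?sa=T&amp;url=http://example.com/story&amp;cid=1">Story</a>';

print_r(list_hrefs($sample));
```

If the printed links no longer contain the pieces your expression looks for, that is the part to adjust.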
thanks..
All,
There are a few reasons why the above code does not work. First of all, Google News has definitely tinkered with their code since the original post was first published. Not much, but just enough to make the REGEX code obsolete.
Second, the code itself has flaws, anyway. I’m not sure how the original author ever got it to work, honestly. It is what is commonly referred to as a “greedy” expression. It ends up deleting most of what is in the RSS feed, after “fixing” the URL. As of a few minutes ago, the below REGEX code should work:
Find:
/"http:\/\/news\.google\.com.+?url=(.+?)&.+?"/
Replace With:
"$1" (yes, include the quotation marks)
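To see why greediness matters, here is a small PHP sketch that runs the expression above, and a greedy variant of it, through preg_replace(). The sample markup is made up for illustration and is only an assumption about what a feed item looks like; real feeds may differ.

```php
<?php
// Made-up sample: two redirect-style links on one line, as they might
// appear inside a feed item's description. Not copied from a real feed.
$html = '<a href="http://news.google.com/news/url?sa=T&amp;url=http://example.com/story&amp;cid=1">Story</a> '
      . '<a href="http://news.google.com/news/url?sa=T&amp;url=http://example.org/other&amp;cid=2">Other</a>';

// Lazy quantifiers (.+?) stop at the first opportunity, so each link is
// matched and rewritten on its own.
$lazy = preg_replace('/"http:\/\/news\.google\.com.+?url=(.+?)&.+?"/', '"$1"', $html);

// Same pattern with greedy quantifiers (.+), for contrast: a single match
// runs from the first link's opening quote to the last link's closing
// quote, swallowing everything in between.
$greedy = preg_replace('/"http:\/\/news\.google\.com.+url=(.+)&.+"/', '"$1"', $html);

echo $lazy, "\n";
echo $greedy, "\n";
```

With this sample, the lazy version keeps both stories with clean links, while the greedy version leaves only the last one.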
However, when you go to paste this code into WP-O-Matic, there is a bug in the software that gives a false warning about the code not working. The only way to fix this without recoding WP-O-Matic itself is to enter these values into the database directly. So, you will need to get familiar with PHPMyAdmin, or some other MySQL database editor. Once you enter these values into the database directly, DO NOT try altering them via the WP-O-Matic GUI! It will just mess things up, and you will have to start all over again.
Now, as for why people who enter the correct Google News Search RSS feeds into WP-O-Matic get “0” results or matches: that is a different problem altogether, which has been discussed at length elsewhere. There is a hack that you can use to fix this, though.
First, enter the feed in WP-O-Matic via the WP admin and set everything up as usual. Be sure to enter something in the REGEX field in order to activate the feature at this time. You can edit it directly in the MySQL database later. Don’t download anything new until you have completed all of these steps.
Second, go to PHPMyAdmin, or whatever MySQL editor you happen to use. Find the table and field that contain the value for your Google News RSS feed(s). These will need to be edited. You’ll notice that some RSS feed URLs, particularly the Google News ones, look different from the way you entered them in WP-O-Matic. There are two ways of fixing this. One is to copy and paste the Google News Search RSS URL, exactly as you originally found it, directly into this field, overwriting the value that WP-O-Matic has entered, and save. The other is to edit the URL by hand, replacing every instance of “&amp;” (which WP-O-Matic adds by default, for whatever bizarre reason) with a plain “&”, and save. Next, find the correct table and field for the REGEX and add the values I gave above. Save. You will now be able to download any Google News Search via an RSS feed, AND the “Googled” URL will be unmangled, back to its original.
Another plugin that I found EXTREMELY helpful during my testing is the WordPress plugin “Search REGEX”. You can download it from wordpress.org. It allowed me to search through the entire MySQL database using REGEX code and test the search and replace values without making any direct modifications to the database. It’s great for testing “what if” scenarios, and for finding the location of text data. It will also save these search and replace sessions to the database, if you want it to. A very, very handy tool!
WP-O-Matic has a lot of bugs, to be sure. It’s a shame that the developer is no longer working on it. From his website, it looks like he was all of 17 years old when he wrote it! Unless someone else revives the project, it will always have little problems that need to be fixed. The unstable CRON feature is one that I have yet to solve. Hope this helps!!!
Van
Regarding the “Campaign processed. 0 posts fetched.” issue.
It is a one-line fix to the wpomatic.php code: you just need to apply PHP’s htmlspecialchars_decode() function to the feed URL.
http://www.davewooding.com/wp-o-matic-wordpress-plugin-does-not-fetch-results/
Dave
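For anyone unsure what that function does, here is a tiny standalone PHP sketch. It is not the actual patch to wpomatic.php (see the link above for where the call goes); the variable names and the escaped sample URL are assumptions for illustration.

```php
<?php
// A feed URL in the escaped form the plugin is reported to end up with,
// where every "&" has become "&amp;". Sample value only.
$stored = 'http://news.google.com/news?hl=en&amp;ned=us&amp;q=chimpanzee+birth&amp;ie=UTF-8&amp;output=rss';

// htmlspecialchars_decode() turns "&amp;" back into a plain "&".
$feedUrl = htmlspecialchars_decode($stored);

echo $feedUrl, "\n";
```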
@Van that REGEX you offered:
/"http:\/\/news\.google\.com.+?url=(.+?)&.+?"/
*almost* works… although it doesn’t seem to strip out the trailing “USG” parameter that is in the Google News URL after the original news story URL…
Wish I was more of a REGEX pro…
Rob
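If regex is not your thing, another option is to skip it and read the “url” query parameter directly, which ignores whatever other parameters (cid, usg and so on) come along. This is a different technique from the rewrite rule discussed above, sketched here in PHP; the function name and the sample link layout are assumptions, and it would have to run in your own script, since WP-O-Matic’s rewrite tab only takes search and replace values.

```php
<?php
// Hypothetical helper: pulls the original story address out of a
// redirect-style link by reading its "url" query parameter. Returns the
// input unchanged when there is no such parameter.
function extract_story_url(string $redirect): string
{
    // Feed markup escapes "&" as "&amp;", so undo that before parsing.
    $query = parse_url(htmlspecialchars_decode($redirect), PHP_URL_QUERY);
    if (!is_string($query)) {
        return $redirect; // no query string (null) or malformed URL (false)
    }
    parse_str($query, $params);
    return isset($params['url']) && is_string($params['url'])
        ? $params['url']
        : $redirect;
}

// Made-up example link; the parameter layout is an assumption.
echo extract_story_url(
    'http://news.google.com/news/url?sa=T&amp;url=http://example.com/story&amp;cid=1&amp;usg=abc'
), "\n";
```

One caveat: if the story address itself contains an unencoded “&”, everything after it would be cut off, so check a few real links first.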
Or… you could just run the Google News feed through FeedBurner and use that. Works like a champ!
Informative post.. keep writing
Another way to do this is to write a PHP script that pulls in the Google News RSS feed and rewrites it, then enter that script’s URL into WP-O-Matic as the feed address. Something like this:
<?php
// Identify URL
$url = 'http://news.google.com/news?pz=1&cf=all&ned=us&hl=en&q=%22investing+in+gold%22+-GLD&cf=all&output=rss';
// Fetch contents (file_get_contents() returns false on failure)
$rss = file_get_contents($url);
if ($rss === false) {
    exit('Could not fetch the feed');
}
// Perform some regex here
// $rss = preg_replace( pattern, replacement, $rss );
// Echo contents
echo $rss;