Updating Proxy Database
The senile simian is sharing this concept simply because it has been gathering dust for too long.
If you code it, please let me know.
Here is the flow diagram, please view in full.
The idea is to take advantage of the free proxies provided all over the net.
Gather a few text files with a simple Google query like this:
+":8080" +":3128" +":80" filetype:txt
Dump all the proxies from the text files into a database consisting simply of three fields:
URL, last_date, schedule
URL is the address of the proxy, last_date is the date of the last check, and schedule is the number of days that must pass before it gets checked again.
For a new proxy, we set schedule to 0, which will be a special case.
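Here is a minimal sketch of that table and the dump step, in Python with the standard sqlite3 module. The table name proxies, the helper names create_db and load_proxies, storing the proxy as host:port, and the IPv4-only regular expression are all assumptions made for illustration; nothing in the original design fixes them.

```python
import re
import sqlite3

# Matches host:port pairs such as 10.0.0.1:8080 (IPv4 only, an assumption).
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b")


def create_db(path=":memory:"):
    """Open the database and create the three-field table if it is missing."""
    conn = sqlite3.connect(path)
    # url: host:port of the proxy
    # last_date: ISO date of the last check, NULL if never checked
    # schedule: days to wait before the next check, 0 marks a new proxy
    conn.execute(
        """CREATE TABLE IF NOT EXISTS proxies (
               url       TEXT PRIMARY KEY,
               last_date TEXT,
               schedule  INTEGER NOT NULL DEFAULT 0
           )"""
    )
    return conn


def load_proxies(conn, text):
    """Insert every host:port found in text; rows already present are kept."""
    found = {f"{host}:{port}" for host, port in PROXY_RE.findall(text)}
    conn.executemany(
        "INSERT OR IGNORE INTO proxies (url, schedule) VALUES (?, 0)",
        [(url,) for url in found],
    )
    conn.commit()
    return len(found)
```

Calling load_proxies(conn, open("list.txt").read()) for each downloaded file (list.txt being a placeholder name) would fill the table; duplicates across files collapse because url is the primary key.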
Your “check_proxies” function simply queries the database for all URLs that are due for a check today, as well as those with a 0 in “schedule”.
It then performs a simple functional check on each of them (get the code from a website through the proxy, for example).
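The query and the check could look roughly like this. It is a sketch under assumptions: a SQLite table named proxies with the three fields above, last_date stored as an ISO date string, proxies stored as host:port, and http://example.com/ standing in for whatever test page you prefer. The function names due_proxies and check_proxy are made up for the example.

```python
import http.client
import sqlite3
import time
import urllib.request
from datetime import date


def due_proxies(conn: sqlite3.Connection, today=None):
    """Return URLs that are new (schedule 0) or whose waiting time has passed."""
    today = (today or date.today()).isoformat()
    rows = conn.execute(
        """SELECT url FROM proxies
           WHERE schedule = 0
              OR last_date IS NULL
              OR julianday(?) - julianday(last_date) >= schedule
           ORDER BY url""",
        (today,),
    )
    return [url for (url,) in rows]


def check_proxy(url, test_url="http://example.com/", timeout=10):
    """Fetch test_url through the proxy.

    Returns the elapsed time in seconds on success, None on any failure.
    """
    handler = urllib.request.ProxyHandler({"http": "http://" + url})
    opener = urllib.request.build_opener(handler)
    start = time.monotonic()
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            if resp.status != 200:
                return None
            resp.read()
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP errors and bad replies.
        return None
    return time.monotonic() - start
```

Returning the elapsed time instead of a plain True/False means the same call can feed the rating step later.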
Rate and schedule proxies
The logic now is simple.
If the proxy works, we time it and rate it.
It also gets bumped up to a 1-day schedule, regardless of where it was before.
If it does not work, bump it down.
That is it in a nutshell.
Let it run once per day, feed in a few new proxies every now and then, and enjoy.
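The rating and scheduling rule can be sketched as below. Two things here are guesses rather than part of the original design: the post never says how far a failed proxy is bumped down, so this version doubles the waiting time up to a 30-day cap, and it stores the rating in an extra rating column (simply the response time in seconds). The names next_schedule and record_result are hypothetical.

```python
import sqlite3
from datetime import date


def next_schedule(schedule, response_time, max_days=30):
    """Return (new_schedule, rating) for one check result.

    response_time is the measured time in seconds, or None if the check failed.
    """
    if response_time is not None:
        # Working proxy: back to a daily check, rated by how fast it was.
        return 1, response_time
    # Failed proxy: assumed meaning of "bump it down" is to double the wait.
    return min(max(schedule, 1) * 2, max_days), None


def record_result(conn: sqlite3.Connection, url, response_time, today=None):
    """Store the outcome of a check and reschedule the proxy."""
    today = (today or date.today()).isoformat()
    (schedule,) = conn.execute(
        "SELECT schedule FROM proxies WHERE url = ?", (url,)
    ).fetchone()
    new_schedule, rating = next_schedule(schedule, response_time)
    conn.execute(
        "UPDATE proxies SET last_date = ?, schedule = ?, rating = ? WHERE url = ?",
        (today, new_schedule, rating, url),
    )
    conn.commit()
```

With doubling, a dead proxy gets checked after 2, 4, 8, 16 and then every 30 days, so it costs almost nothing to keep around, and one successful check puts it straight back on the daily list.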
Room for improvements
Keep track of how often a proxy was used by your scripts (to rotate), make it easy to dump new proxies into the DB, check if a proxy supports what you need (SSL, HTTPS, etc.), check if the proxy is passing only what you want, and so on.
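As one example, the rotation idea could be as small as this. It assumes an extra use_count column (not part of the original three fields) and treats a schedule of 1 as the mark of a proxy that passed its last check, which only holds if failed proxies end up with some other value. pick_proxy is a made-up name.

```python
import sqlite3


def pick_proxy(conn: sqlite3.Connection):
    """Return the least-used working proxy and count the use, or None."""
    row = conn.execute(
        """SELECT url FROM proxies
           WHERE schedule = 1
           ORDER BY use_count ASC, url ASC
           LIMIT 1"""
    ).fetchone()
    if row is None:
        return None
    conn.execute(
        "UPDATE proxies SET use_count = use_count + 1 WHERE url = ?", (row[0],)
    )
    conn.commit()
    return row[0]
```

Each call hands out whichever working proxy has been used the least so far, so repeated calls cycle through the whole working set.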
Knock yourself out.