Updating Proxy Database

The senile simian is sharing this concept simply because it has been gathering dust for too long.

If you code it, please let me know.

Here is the flow diagram, please view in full.

Updating Proxy Database – flow

 

The idea is to take advantage of the free proxies provided all over the net.

Get proxies

Gather a few text files with a simple Google query like this:

+":8080" +":3128" +":80" filetype:txt

Dump all of the proxies from the text files into a database table, consisting simply of

URL, last_date, schedule

URL is the address of the proxy, last_date is the date of the last check, and schedule is the number of days to pass before it gets checked again.
For a new proxy, we set schedule to 0, which is treated as a special case.
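
As a rough sketch of this step, the snippet below pulls host:port pairs out of raw text and stores them with schedule 0. SQLite, the table name proxies, the file name proxies.db, the IPv4-only regex and the function names are all assumptions made for illustration; only the three columns come from the description above.

```python
import re
import sqlite3

# Matches host:port pairs such as 10.0.0.1:8080 (IPv4 only, an assumption).
PROXY_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})\b")


def open_db(path="proxies.db"):
    """Open (or create) the database with the three columns from the text."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proxies ("
        " url TEXT PRIMARY KEY,"
        " last_date TEXT,"
        " schedule INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def extract_proxies(text):
    """Return the unique host:port strings found in a blob of text."""
    return sorted({f"{host}:{port}" for host, port in PROXY_RE.findall(text)})


def add_proxies(conn, text):
    """Insert proxies as new (schedule 0); already known ones are left alone."""
    rows = [(proxy,) for proxy in extract_proxies(text)]
    conn.executemany(
        "INSERT OR IGNORE INTO proxies (url, last_date, schedule)"
        " VALUES (?, NULL, 0)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Calling add_proxies(conn, open("list.txt").read()) for each downloaded file (list.txt being a placeholder name) would fill the table. Because url is the primary key, feeding in the same list twice does not reset proxies that have already been scheduled.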

Check proxies

Your “check_proxies” function simply queries the database to grab all URLs that are due for a check today, as well as those with that 0 entry in “schedule”.

It then performs a simple functional check on each one (fetching a page from a website through the proxy, for example).
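
One possible shape for this step, again assuming an SQLite table named proxies with the three columns described earlier and last_date stored as an ISO date string. The helper names, the test URL http://example.com/ and the 10 second timeout are placeholders, not part of the original concept.

```python
import http.client
import sqlite3
import time
import urllib.request
from datetime import date


def due_proxies(conn, today=None):
    """Return proxies that are new (schedule 0) or whose schedule has elapsed."""
    today = (today or date.today()).isoformat()
    rows = conn.execute(
        "SELECT url FROM proxies"
        " WHERE schedule = 0"
        "    OR last_date IS NULL"
        "    OR julianday(?) - julianday(last_date) >= schedule"
        " ORDER BY url",
        (today,),
    ).fetchall()
    return [row[0] for row in rows]


def check_proxy(proxy, test_url="http://example.com/", timeout=10):
    """Fetch test_url through the proxy.

    Returns the elapsed time in seconds, or None if the request failed.
    """
    handler = urllib.request.ProxyHandler({"http": f"http://{proxy}"})
    opener = urllib.request.build_opener(handler)
    start = time.monotonic()
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            if resp.status != 200:
                return None
            resp.read()
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP errors and broken replies.
        return None
    return time.monotonic() - start


def check_proxies(conn):
    """Check every due proxy and return {proxy: elapsed seconds or None}."""
    return {proxy: check_proxy(proxy) for proxy in due_proxies(conn)}
```

The check here only covers plain HTTP. The returned timing is what the next step can use for rating.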

Rate and schedule proxies

The logic now is simple.

If the proxy works, we time it and rate it.

It also gets bumped up to a 1-day schedule, regardless of where it was before.

If it does not work, bump it down, so that it gets checked less often.
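
The text does not say how far to bump a failing proxy down, so the sketch below makes its own choices: the interval doubles on each failure up to a cap of 32 days, and the rating is simply the response time in seconds. The rating column, the cap and the function names are assumptions added for illustration.

```python
import sqlite3
from datetime import date

MAX_SCHEDULE = 32  # assumed upper limit for the check interval, in days


def reschedule(schedule, elapsed):
    """Return (new_schedule, rating) for one check result.

    elapsed is the response time in seconds, or None if the check failed.
    """
    if elapsed is not None:
        # Working proxy: back to a daily check, rated by its response time.
        return 1, round(elapsed, 3)
    # Failed proxy: assumed back-off, doubling the wait up to MAX_SCHEDULE.
    return min(max(schedule, 1) * 2, MAX_SCHEDULE), None


def record_result(conn: sqlite3.Connection, url, elapsed, today=None):
    """Store the outcome of a check; expects an extra 'rating' column."""
    today = (today or date.today()).isoformat()
    (schedule,) = conn.execute(
        "SELECT schedule FROM proxies WHERE url = ?", (url,)
    ).fetchone()
    new_schedule, rating = reschedule(schedule, elapsed)
    conn.execute(
        "UPDATE proxies SET last_date = ?, schedule = ?, rating = ?"
        " WHERE url = ?",
        (today, new_schedule, rating, url),
    )
    conn.commit()
    return new_schedule
```

With these rules a new proxy that fails its first check moves from 0 to 2 days, so it leaves the special case instead of being rechecked on every run, and a dead proxy ends up being looked at roughly once a month.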

That is it in a nutshell.

Let it run once per day, feed in a few new proxies every now and then, and enjoy.

Room for improvements

Of course!

Keep track of how often a proxy was used by your scripts (to rotate), make it easy to dump new proxies into the DB, check if a proxy supports what you need (SSL, HTTPS, etc.), check if the proxy is passing only what you want, and so on.

Knock yourself out.