By continuously rotating the IP addresses in your list, you use each address more sparingly and your Python scraping operations run with far fewer interruptions.
Proxy addresses can hide your IP address, but when you send many requests from the same IP, you will run into the target site's request limits, known as rate limits, and because the site perceives you as an attacker, your IP address may be banned. If you are not using a rotating proxy service but instead have a proxy pool of static IP addresses, this article explains how you can easily rotate those IP addresses yourself.
What is a Rotating Proxy?
Rotating proxies are IP addresses that are rotated continuously, usually as part of a residential proxy service. However, when a residential proxy is not fast enough, or when you already have your own IP list, you can rotate the addresses yourself inside the software you are using. This is a more convenient solution for people who do not want to deal with per-GB billing or who already own their IP addresses.
IP rotation is the process of dynamically changing the IP address used for outgoing network requests. The change usually happens after each request, at regular intervals, or whenever you need it. The idea behind IP rotation is simple: instead of using the same IP address for every web request, you switch between multiple IPs.
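As a minimal sketch of the idea (using placeholder addresses rather than real proxies), rotating a small pool simply means each outgoing request takes the next address in line:
from itertools import cycle
# placeholder proxy addresses -- replace them with your own pool
proxy_pool = cycle([
    "proxy-1.example.com:8000",
    "proxy-2.example.com:8000",
])
# each request takes the next address from the pool
for request_number in range(4):
    proxy = next(proxy_pool)
    print(f"request {request_number} would go through {proxy}")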
How to Rotate Proxies in Python
In this section, you will learn how to rotate proxies in Python step by step. To test the proxy connection, you will use https://httpbin.io/ip, an endpoint that returns the IP address it sees. But first, let’s look at the requirements we need to get started.
Step 1: What we need
To follow this tutorial, make sure that the following are available on your machine:
- Python 3: Pre-installed on some systems, but make sure it is up to date (version 3 or later).
- An IDE: Although this tutorial uses VS Code, you can follow along with your preferred IDE.
- Requests: You will use Python’s Requests library as the HTTP client. Install it with pip:
pip3 install requests
Step 2: Buy a Proxy
Once the package is installed, you are ready. Next, you will need a proxy list. Depending on the websites you plan to scrape, you can browse the Proxynet Self-Service customer panel, choose proxies in the locations you need for those sites, and optionally buy IPv4 or IPv6 proxies. You can always contact a sales representative beforehand for help.
Step 3: Send a request without a proxy
By default, Requests uses your local IP address if you do not specify a proxy. To check, let’s send an initial request to the target test site without a proxy.
Import the requests library, visit the target site and print its response:
# pip3 install requests
import requests
# send a request to the test endpoint
response = requests.get("https://httpbin.io/ip")
# validate the response
if response.status_code != 200:
    print(f"The request failed with {response.status_code}")
else:
    print(response.text)
The above script will output your default local IP address. Now, let’s enhance it to route the request through a proxy.
Step 4: Let’s make a request with a proxy
Update your existing code with the proxy address, specifying HTTP and HTTPS protocols in a dictionary. Then route your request through it:
# pip3 install requests
import requests
# specify the proxy server address
proxies = {
    "http": "http://proxy.proxynet.io:60001",
    "https": "http://proxy.proxynet.io:60001",
}
# send a request to the test endpoint
response = requests.get(
    "https://httpbin.io/ip",
    proxies=proxies,
)
# validate the response
if response.status_code != 200:
    print(f"The request failed with {response.status_code}")
else:
    print(response.text)
When you run the code, it prints the IP address behind the proxy, as shown below. If the IP address differs between the request sent without a proxy and the request sent through it, your request was successful and the proxy is working.
{
"origin": "120.36.89.225:30223"
}
Step 5: Create a rotating proxy list
You can save the proxies you have purchased from our site in a file such as proxies_list.txt, with one address per line, like this:
proxy.proxynet.io:60001
proxy.proxynet.io:60002
You can rotate proxies in two ways:
- Iterate through the proxy pool in order.
- Pick a proxy from the pool at random.
Below we will show you how to apply both methods.
Selecting an IP address from the proxy list in order
Sequential proxy rotation is suitable for distributing traffic evenly across your proxies. It is useful if you have a small proxy pool and want to avoid overusing some proxies more than others.
However, the limitation of this method is that the target server can detect the pattern and ban the whole pool. To rotate the proxies in order, you loop through the list sequentially. Let’s implement this now!
# pip3 install requests
import requests
from itertools import cycle
# read the proxies from the proxy list file
proxies_list = open("proxies_list.txt", "r").read().strip().split("\n")
# create a proxy generator
proxy_pool = cycle(proxies_list)
# iterate through the proxy list
for _ in range(4):
    # get the next proxy from the generator
    proxy = next(proxy_pool)
    # prepare the proxy address
    proxies = {
        "http": f"http://{proxy}",
        "https": f"http://{proxy}",
    }
    # send a request to the target site with the proxy
    response = requests.get(
        "https://httpbin.io/ip",
        proxies=proxies,
    )
    if response.status_code != 200:
        print(f"The request failed with {response.status_code}")
    else:
        print(response.text)
Selecting an IP address from the proxy list at random
Proxy randomization can prevent the target server from detecting a pattern in your requests. This method selects proxies from the pool at random. A disadvantage, however, is that some proxies may end up being used more than others.
Let’s modify the previous code to randomize the proxies. To pick a random proxy directly in the for loop, use Python’s built-in random.choice function:
# pip3 install requests
import requests
import random
# read the proxies from the proxy list file
proxies_list = open("proxies_list.txt", "r").read().strip().split("\n")
# iterate through the proxy list
for _ in range(4):
    # choose a proxy at random from the list
    random_proxy = random.choice(proxies_list)
    # prepare the proxy address
    proxies = {
        "http": f"http://{random_proxy}",
        "https": f"http://{random_proxy}",
    }
    # send a request to the target site with the proxy
    response = requests.get(
        "https://httpbin.io/ip",
        proxies=proxies,
    )
    if response.status_code != 200:
        print(f"The request failed with {response.status_code}")
    else:
        print(response.text)
With the two methods above, you can rotate through your proxy pool while sending requests to the websites you target and scraping their data.
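In real scraping runs, some proxies in the pool will occasionally time out or get blocked. The snippet below is only a rough sketch of one way to skip a failing proxy and retry with another; the timeout value, the number of attempts, and the fetch_with_rotation helper name are illustrative assumptions rather than part of the tutorial's required code:
# pip3 install requests
import requests
import random
# read the proxies from the proxy list file
proxies_list = open("proxies_list.txt", "r").read().strip().split("\n")
def fetch_with_rotation(url, attempts=3):
    # try up to `attempts` different proxies before giving up
    for _ in range(attempts):
        proxy = random.choice(proxies_list)
        proxies = {
            "http": f"http://{proxy}",
            "https": f"http://{proxy}",
        }
        try:
            response = requests.get(url, proxies=proxies, timeout=10)
            if response.status_code == 200:
                return response.text
        except requests.exceptions.RequestException:
            # this proxy failed; try the next one
            continue
    return None
print(fetch_with_rotation("https://httpbin.io/ip"))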
Use different User-Agents
It is always useful to vary the User-Agent header you send in your code as well, so that your traffic does not all look like it comes from the same client (see the sketch below). If you want to automate this, you can take a look at libraries like fake-useragent.
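As a rough sketch of how User-Agent rotation can be combined with the proxy rotation above, the example below picks a User-Agent string from a small hand-written list; the strings themselves are only illustrative, and a library such as fake-useragent could supply them instead:
# pip3 install requests
import requests
import random
# a small, illustrative list of User-Agent strings -- extend it with your own
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
# read the proxies from the proxy list file
proxies_list = open("proxies_list.txt", "r").read().strip().split("\n")
for _ in range(4):
    proxy = random.choice(proxies_list)
    # send a different User-Agent header with each request
    headers = {"User-Agent": random.choice(user_agents)}
    response = requests.get(
        "https://httpbin.io/ip",
        proxies={"http": f"http://{proxy}", "https": f"http://{proxy}"},
        headers=headers,
    )
    print(response.status_code, response.text)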
Conclusion
Python is ultimately an ideal tool for data scraping. With free libraries such as Requests and a simple rotation routine like the ones above, you can make your data scraping more effective and keep it running smoothly.