Scrapy: a simple example of how to use TTProxy

If you have read the first two blog posts about Scrapy, <Scrapy: A Python web scraping framework> and <Web crawler development - Get Started with Scrapy>, you should already have a general understanding of Scrapy.
The next thing you will probably want to know is how to use a proxy to crawl data in a Scrapy project.
This post uses a simple example to show how to use TTProxy proxy IPs in Scrapy.
First, you need to purchase a proxy certificate in the TTProxy Management Console. If you are a newly registered user, TTProxy will give you a free 100 MB traffic certificate for testing, and you can use that certificate to crawl data first.
Then you need a Scrapy project, either a new one or one you already have. You can create a new Scrapy project with the following command:
scrapy startproject tutorial
Add credential configuration
In settings.py, add two settings to hold the TTProxy credential information:
TTPROXY_LICENSE = "your license"
TTPROXY_SECRET = "your license's secret"
Then enable TutorialDownloaderMiddleware in the same file:
DOWNLOADER_MIDDLEWARES = {
    'tutorial.middlewares.TutorialDownloaderMiddleware': 543,
}
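As a rough sketch of how these two settings might be read and checked before they are used: the helper name `read_ttproxy_credentials` is hypothetical and not part of Scrapy or TTProxy. It only assumes that the settings object offers a dict-style `get`, which both Scrapy's settings object and a plain dict do.

```python
def read_ttproxy_credentials(settings):
    """Return (license, secret) from a settings mapping, or raise ValueError.

    `settings` can be any object with a dict-style get(), for example
    spider.settings in Scrapy or a plain dict in a quick test.
    """
    license_id = settings.get("TTPROXY_LICENSE")
    secret = settings.get("TTPROXY_SECRET")
    if not license_id or not secret:
        raise ValueError("TTPROXY_LICENSE and TTPROXY_SECRET must both be set")
    return license_id, secret


demo_settings = {
    "TTPROXY_LICENSE": "your license",
    "TTPROXY_SECRET": "your license's secret",
}
license_id, secret = read_ttproxy_credentials(demo_settings)
```

Failing early on a missing value makes a misconfigured project easier to spot than a failed API call later on.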
Modify the downloader middleware
The downloader middleware is a framework of hooks into Scrapy’s request/response processing. It’s a light, low-level system for globally altering Scrapy’s requests and responses.
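To make the hook idea concrete, here is a simplified stand-alone sketch that does not import Scrapy. `FakeRequest` and `StaticProxyMiddleware` are hypothetical stand-ins, and the proxy address is a placeholder from a documentation-only IP range. In Scrapy, `process_request(request, spider)` is called for each request that passes through the middleware, and returning None lets processing continue.

```python
class FakeRequest:
    """Stand-in for a Scrapy request: only url and meta are modeled."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


class StaticProxyMiddleware:
    """Hypothetical middleware that attaches one fixed proxy to every request."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url

    def process_request(self, request, spider):
        # Scrapy's built-in proxy support reads the proxy from request.meta["proxy"].
        request.meta["proxy"] = self.proxy_url
        # Returning None means "continue processing this request normally".
        return None


request = FakeRequest("https://example.com/")
middleware = StaticProxyMiddleware("http://203.0.113.10:8080")  # placeholder address
result = middleware.process_request(request, spider=None)
print(request.meta["proxy"])
```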
Before a proxy can be used, a proxy IP must be obtained, and the requests library can do that. Define a get_proxy method on TutorialDownloaderMiddleware in middlewares.py to obtain the proxy. The code is as follows:
def get_proxy(self, license, secret):
    try:
        params = {
            "license": license,
            "time": int(time.time()),
            "cnt": 1,
        }
        # Sign the request: MD5 of license + timestamp + secret.
        params["sign"] = hashlib.md5((params["license"] + str(params["time"]) + secret).encode('utf-8')).hexdigest()
        response = requests.get(
            url=TTPROXY_API_URL,  # placeholder name: take the API endpoint from the TTProxy documentation
            params=params,
            headers={
                "Content-Type": "text/plain; charset=utf-8",
            },
        )
        res = json.loads(response.content)
        return res["data"]["proxies"][0]
    except requests.exceptions.RequestException:
        print('HTTP Request failed')
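The signature step can be looked at in isolation. The sketch below mirrors the expression used in get_proxy (the MD5 of license + timestamp + secret, hex encoded). The function names `make_sign` and `build_params` and the sample values are made up for illustration, and the exact signing rule should be confirmed against TTProxy's own documentation.

```python
import hashlib
import time


def make_sign(license_id, timestamp, secret):
    """Return the hex MD5 digest of license + timestamp + secret."""
    raw = license_id + str(timestamp) + secret
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def build_params(license_id, secret, count=1, now=None):
    """Build the query parameters used by get_proxy, including the signature."""
    timestamp = int(time.time()) if now is None else int(now)
    return {
        "license": license_id,
        "time": timestamp,
        "cnt": count,
        "sign": make_sign(license_id, timestamp, secret),
    }


# Passing a fixed timestamp makes the result reproducible.
params = build_params("demo-license", "demo-secret", now=1700000000)
print(params["sign"])  # a 32-character hexadecimal string
```

Because the timestamp is part of the signed string, the signature changes every second, so it has to be computed fresh for each API call.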
Then call get_proxy in the process_request method and set the obtained proxy on the request:
request.meta["proxy"] = "http://" + self.get_proxy(spider.settings["TTPROXY_LICENSE"], spider.settings["TTPROXY_SECRET"])
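One caveat: get_proxy as written returns None when the HTTP request fails, and concatenating "http://" with None raises a TypeError. A small guard, sketched here with a hypothetical helper `apply_proxy`, sets the proxy only when one was obtained. It assumes, as the line above does, that the API returns the proxy as a host:port string.

```python
def apply_proxy(meta, proxy):
    """Set meta["proxy"] from a "host:port" string; return True if it was set.

    `meta` is a dict such as request.meta. When `proxy` is None or empty
    (for example because the API call failed), meta is left unchanged.
    """
    if not proxy:
        return False
    meta["proxy"] = "http://" + proxy
    return True


meta = {}
failed = apply_proxy(meta, None)                   # nothing is set
succeeded = apply_proxy(meta, "203.0.113.10:8080")  # placeholder address
print(failed, succeeded, meta)
```

Whether a request without a proxy should still go out directly or be dropped is a design choice. For a crawler that must never expose its own IP address, dropping or retrying is the safer option.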
The complete TutorialDownloaderMiddleware code is as follows:
import base64
import hashlib
import json
import time

import requests
from scrapy import signals

# Placeholder: set this to the API endpoint given in the TTProxy documentation.
TTPROXY_API_URL = "<TTProxy API endpoint URL>"

# ...

class TutorialDownloaderMiddleware(object):

    def get_proxy(self, license, secret):
        try:
            params = {
                "license": license,
                "time": int(time.time()),
                "cnt": 1,
            }
            # Sign the request: MD5 of license + timestamp + secret.
            params["sign"] = hashlib.md5((params["license"] + str(params["time"]) + secret).encode('utf-8')).hexdigest()
            response = requests.get(
                url=TTPROXY_API_URL,
                params=params,
                headers={
                    "Content-Type": "text/plain; charset=utf-8",
                },
            )
            res = json.loads(response.content)
            return res["data"]["proxies"][0]
        except requests.exceptions.RequestException:
            print('HTTP Request failed')

    @classmethod
    def from_crawler(cls, crawler):
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):
        # Fetch a proxy from TTProxy and attach it to the outgoing request.
        request.meta["proxy"] = "http://" + self.get_proxy(spider.settings["TTPROXY_LICENSE"], spider.settings["TTPROXY_SECRET"])
        return None

    def process_response(self, request, response, spider):
        return response

    def process_exception(self, request, exception, spider):
        pass

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)
A simple crawler
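Use the following command to create a simple crawler that verifies whether the proxy IP has been set successfully (genspider also expects the domain of the site to crawl as a second argument):
scrapy genspider myip <domain>
Then modify the generated spider, spiders/myip.py, to get the current IP address by requesting an IP lookup service: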
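# -*- coding: utf-8 -*-
import scrapy


class MyipSpider(scrapy.Spider):
    name = 'myip'
    allowed_domains = ['']  # domain of the IP lookup service
    start_urls = ['']  # URL of the IP lookup service

    def parse(self, response):
        # Print the response body, which should contain the IP address the service saw.
        print(response.text)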
Now you can run the myip crawler using the following command:
scrapy crawl myip --nolog
You will then see the IP address reported by the lookup service. By comparing it with your local public IP address, you can easily tell whether the TTProxy proxy has been set successfully.
With this simple example, you should now know how to use TTProxy in Scrapy.