Fetching Encrypted Ajax Data in Python with BrowserMob Proxy + Selenium



BrowserMob Proxy (BMP for short) is an HTTP proxy service. By routing browser traffic through it, we can capture the contents of HTTP requests and responses.
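To make the idea of "capturing requests and responses" concrete: BMP hands back the recorded traffic as a HAR structure, a nested dict with one entry per request. The sketch below filters such a structure by URL. The `sample_har` data and the `find_entries` helper are made up for illustration; only the key layout (`log`, `entries`, `request`, `response`, `content`, `text`) matches what the code in this article reads.

```python
# A hand-written stand-in for a captured HAR; the URLs and bodies are invented.
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"method": "GET", "url": "https://example.com/index.html"},
                "response": {"status": 200, "content": {"text": "<html></html>"}},
            },
            {
                "request": {"method": "GET", "url": "https://example.com/api/items?page=1"},
                "response": {"status": 200, "content": {"text": '{"data": [1, 2, 3]}'}},
            },
        ]
    }
}


def find_entries(har, url_fragment):
    """Return the HAR entries whose request URL contains url_fragment."""
    return [
        entry
        for entry in har["log"]["entries"]
        if url_fragment in entry["request"]["url"]
    ]


matches = find_entries(sample_har, "/api/items")
for entry in matches:
    # Each match carries both the request and the response body text.
    print(entry["request"]["url"], entry["response"]["content"]["text"])
```

Because the Ajax response is recorded after the browser has already received it, there is no need to reproduce whatever signing or encryption the page applies to its request parameters.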

Step 1: Install the BrowserMob Proxy Python package.

pip install browsermob-proxy


Step 2: Download the browsermob-proxy binary release, which is used to start BrowserMob Proxy.

Download page: https://github.com/lightbody/browsermob-proxy/releases


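Step 3: Place the downloaded files directly in the project directory. The code in this article expects the launcher at ./browsermob-proxy-2.1.4/bin/browsermob-proxy.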
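Since a wrong path is an easy mistake at this step, it can help to check the launcher location before starting anything. The following is a minimal sketch, not part of the original article: `resolve_bmp_binary` is a hypothetical helper, the directory layout is assumed from the path used in the article's code, and the `.bat` launcher name on Windows is an assumption about the release archive.

```python
import os


def resolve_bmp_binary(project_dir, version="2.1.4", is_windows=None):
    """Build the path to the BMP launcher script and check that it exists.

    Hypothetical helper: the layout browsermob-proxy-<version>/bin/ is
    assumed from the path used in this article.
    """
    if is_windows is None:
        is_windows = os.name == "nt"
    # Assumption: the archive ships a shell script plus a .bat launcher in bin/.
    script = "browsermob-proxy.bat" if is_windows else "browsermob-proxy"
    path = os.path.join(project_dir, f"browsermob-proxy-{version}", "bin", script)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"BrowserMob Proxy launcher not found: {path}")
    return path
```

The returned path can then be passed to `Server(...)`. BMP itself is a Java program, so a Java runtime also needs to be available on the machine.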

Here is the code:

# _*_ coding:utf-8 _*_
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from browsermobproxy import Server
import time
import json


class BaseFramework(object):

    def __init__(self):
        # Start the BMP server from the downloaded launcher and open a proxy on it.
        self.server = Server('./browsermob-proxy-2.1.4/bin/browsermob-proxy')
        self.server.start()
        self.proxy = self.server.create_proxy()
        chrome_options = Options()
        chrome_options.add_argument('--ignore-certificate-errors')
        # Route all browser traffic through the proxy so it can be recorded.
        chrome_options.add_argument('--proxy-server={0}'.format(self.proxy.proxy))
        chrome_options.add_argument('--headless')  # headless mode
        self.browser = webdriver.Chrome(options=chrome_options)

    def process_request(self, request, response):
        pass

    def process_response(self, response, request):
        pass

    def run(self, func, *args):
        # Begin a new HAR recording that includes headers and response bodies.
        self.proxy.new_har(options={
            'captureContent': True,
            'captureHeaders': True
        })
        func(*args)
        # Hand every recorded request/response pair to the hooks.
        result = self.proxy.har
        for entry in result['log']['entries']:
            request = entry['request']
            response = entry['response']
            self.process_request(request, response)
            self.process_response(response, request)

    def __del__(self):
        # Shut down the browser, the proxy, and the BMP server process.
        self.browser.quit()
        self.proxy.close()
        self.server.stop()


class Framework(BaseFramework):

    name_id = None  # ID of the record being fetched; set before each run()

    def load(self, url):
        self.browser.get(url)
        time.sleep(3)  # give the page's Ajax requests time to finish

    def process_request(self, request, response):
        pass

    def process_response(self, response, request):
        # print(request['url'])
        if '/item/timemap/cn/' in request['url']:
            # Once you have found the URL that carries the data you need,
            # parsing it is straightforward.
            try:
                text = response['content']['text']
                text_dict = json.loads(text)
                data_result = text_dict['data']
            except KeyError:
                print('----KeyError: text----')
                return
            name = data_result['name']  # name
            id_name = self.name_id + '_' + name
            print(id_name)
            time_map_list = data_result['timeMap']
            if not time_map_list:
                return
            # Key each timeMap item by its position in the list.
            time_map_dict = {str(i): time_map for i, time_map in enumerate(time_map_list)}
            path = f'./****/{id_name}.json'
            if os.path.exists(path):
                print(f'------{id_name}--already exists------')
                return
            with open(path, 'w', encoding='utf-8') as f:
                f.write(json.dumps(time_map_dict, ensure_ascii=False, indent=4))


if __name__ == '__main__':
    framework = Framework()
    id_list = ['********']
    for name_id in id_list:
        framework.name_id = name_id
        url = "************************"
        framework.run(framework.load, url)
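
The response handling above only guards against a missing key. As a hedged sketch of a slightly more defensive variant: the helpers `extract_json` and `index_by_position` below are hypothetical names, not part of the original code. The sketch assumes the HAR convention that a body may be marked with "encoding": "base64", and it treats an empty or non-JSON body as "no data" instead of raising.

```python
import base64
import json


def extract_json(response):
    """Return the parsed JSON body of a HAR response dict, or None."""
    content = response.get("content", {})
    text = content.get("text")
    if not text:
        return None
    # HAR allows bodies to be stored base64-encoded; decode those first.
    if content.get("encoding") == "base64":
        text = base64.b64decode(text).decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


def index_by_position(items):
    """Turn a list into a dict keyed by the stringified index."""
    return {str(i): item for i, item in enumerate(items)}


# Usage with a made-up response that mimics the fields read in this article.
fake_response = {"content": {"text": '{"data": {"name": "n", "timeMap": ["a", "b"]}}'}}
payload = extract_json(fake_response)
if payload is not None:
    print(index_by_position(payload["data"]["timeMap"]))
```

Returning None for unusable bodies keeps the calling hook simple: it can skip the entry with a single check rather than wrapping every field access in try/except.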