2025 [Web Scraping] The scrapy Framework



Contents

What is a framework

How to learn a framework

What is scrapy

Basic usage of scrapy

Environment setup

Basic usage


What is a framework

A project template that integrates many common features and is general enough to reuse across projects

How to learn a framework

  1. Learn the detailed usage of the features the framework encapsulates
  2. Go deeper: read the underlying source code to understand how those features are implemented

What is scrapy

  1. A well-known, ready-made framework for web scraping.
  2. High-performance persistent storage, asynchronous data downloading, high-performance data parsing, and distributed crawling

Basic usage of scrapy

Environment setup

Mac/Linux:

pip install scrapy 

Windows:

In a PyCharm virtual environment, running pip install scrapy directly will also download and build the required whl files, as the log below shows:

(venv) D:\pychram\spider>pip install scrapy
Collecting scrapy
  Downloading Scrapy-2.3.0-py2.py3-none-any.whl (237 kB)
     |████████████████████████████████| 237 kB 20 kB/s
Collecting protego>=0.1.15
  Downloading Protego-0.1.16.tar.gz (3.2 MB)
     |████████████████████████████████| 3.2 MB 8.9 kB/s
Collecting w3lib>=1.17.0
  Downloading w3lib-1.22.0-py2.py3-none-any.whl (20 kB)
Collecting PyDispatcher>=2.0.5
  Downloading PyDispatcher-2.0.5.tar.gz (34 kB)
Collecting Twisted>=17.9.0
  Downloading Twisted-20.3.0-cp37-cp37m-win_amd64.whl (3.1 MB)
     |████████████████████████████████| 3.1 MB 26 kB/s
Requirement already satisfied: lxml>=3.5.0 in d:\pychram\spider\venv\lib\site-packages (from scrapy) (4.5.2)
Collecting parsel>=1.5.0
  Downloading parsel-1.6.0-py2.py3-none-any.whl (13 kB)
Collecting pyOpenSSL>=16.2.0
  Downloading pyOpenSSL-19.1.0-py2.py3-none-any.whl (53 kB)
     |████████████████████████████████| 53 kB 26 kB/s
Collecting cssselect>=0.9.1
  Downloading cssselect-1.1.0-py2.py3-none-any.whl (16 kB)
Collecting queuelib>=1.4.2
  Downloading queuelib-1.5.0-py2.py3-none-any.whl (13 kB)
Collecting zope.interface>=4.1.3
  Downloading zope.interface-5.1.0-cp37-cp37m-win_amd64.whl (194 kB)
     |████████████████████████████████| 194 kB 26 kB/s
Collecting cryptography>=2.0
  Downloading cryptography-3.0-cp37-cp37m-win_amd64.whl (1.5 MB)
     |████████████████████████████████| 1.5 MB 18 kB/s
Collecting itemloaders>=1.0.1
  Downloading itemloaders-1.0.2-py3-none-any.whl (11 kB)
Collecting service-identity>=16.0.0
  Downloading service_identity-18.1.0-py2.py3-none-any.whl (11 kB)
Collecting itemadapter>=0.1.0
  Downloading itemadapter-0.1.0-py3-none-any.whl (7.0 kB)
Collecting six
  Using cached six-1.15.0-py2.py3-none-any.whl (10 kB)
Requirement already satisfied: attrs>=19.2.0 in d:\pychram\spider\venv\lib\site-packages (from Twisted>=17.9.0->scrapy) (19.3.0)
Collecting constantly>=15.1
  Downloading constantly-15.1.0-py2.py3-none-any.whl (7.9 kB)
Collecting Automat>=0.3.0
  Downloading Automat-20.2.0-py2.py3-none-any.whl (31 kB)
Collecting PyHamcrest!=1.10.0,>=1.9.0
  Downloading PyHamcrest-2.0.2-py3-none-any.whl (52 kB)
     |████████████████████████████████| 52 kB 28 kB/s
Collecting hyperlink>=17.1.1
  Downloading hyperlink-20.0.1-py2.py3-none-any.whl (48 kB)
     |████████████████████████████████| 48 kB 36 kB/s
Collecting incremental>=16.10.1
  Downloading incremental-17.5.0-py2.py3-none-any.whl (16 kB)
Requirement already satisfied: setuptools in d:\pychram\spider\venv\lib\site-packages (from zope.interface>=4.1.3->scrapy) (47.3.1)
Requirement already satisfied: cffi!=1.11.3,>=1.8 in d:\pychram\spider\venv\lib\site-packages (from cryptography>=2.0->scrapy) (1.14.0)
Collecting jmespath>=0.9.5
  Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Collecting pyasn1
  Downloading pyasn1-0.4.8-py2.py3-none-any.whl (77 kB)
     |████████████████████████████████| 77 kB 31 kB/s
Collecting pyasn1-modules
  Downloading pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
     |████████████████████████████████| 155 kB 13 kB/s
Requirement already satisfied: idna>=2.5 in d:\pychram\spider\venv\lib\site-packages (from hyperlink>=17.1.1->Twisted>=17.9.0->scrapy) (2.10)
Requirement already satisfied: pycparser in d:\pychram\spider\venv\lib\site-packages (from cffi!=1.11.3,>=1.8->cryptography>=2.0->scrapy) (2.20)
Building wheels for collected packages: protego, PyDispatcher
  Building wheel for protego (setup.py) ... done
  Created wheel for protego: filename=Protego-0.1.16-py3-none-any.whl size=7769 sha256=2cbf8d0ea70c25086daef895e28b8c6bd052d72dcbecb18dc
  Stored in directory: c:\users\inspur\appdata\local\pip\cache\wheels\ca\44\01\3592ccfbcfaee4ab297c4097e6e9dbe1c7697e3531a39877ab
  Building wheel for PyDispatcher (setup.py) ... done
  Created wheel for PyDispatcher: filename=PyDispatcher-2.0.5-py3-none-any.whl size=12552 sha256=b8a36f20c079dabe458ecddd530daeb0506a376f8de22f17714f0d811b
  Stored in directory: c:\users\inspur\appdata\local\pip\cache\wheels\dc\d0\bf\0cc715c01fce0bace63b46283acf5cc630d5e5dbb4602c54e5
Successfully built protego PyDispatcher
Installing collected packages: six, protego, w3lib, PyDispatcher, constantly, Automat, zope.interface, PyHamcrest, hyperlink, incremental, Twisted, cssselect, parsel, cryptography, pyOpenSSL, queuelib, itemadapter, jmespath, itemloaders, pyasn1, pyasn1-modules, service-identity, scrapy
Successfully installed Automat-20.2.0 PyDispatcher-2.0.5 PyHamcrest-2.0.2 Twisted-20.3.0 constantly-15.1.0 cryptography-3.0 cssselect-1.1.0 hyperlink-20.0.1 incremental-17.5.0 itemadapter-0.1.0 itemloaders-1.0.2 jmespath-0.10.0 parsel-1.6.0 protego-0.1.16 pyOpenSSL-19.1.0 pyasn1-0.4.8 pyasn1-modules-0.2.8 queuelib-1.5.0 scrapy-2.3.0 service-identity-18.1.0 six-1.15.0 w3lib-1.22.0 zope.interface-5.1.0
 

Basic usage

  • Create a project: scrapy startproject <project_name>, e.g. scrapy startproject firstBlood
  • cd firstBlood
  • Create a spider file: scrapy genspider example example.com

  Open the generated example file; this is where the spider code goes:

 

Console command and result:

  • Run the spider: scrapy crawl <spider_name>, e.g. scrapy crawl example

 

 

Source: https://51itzy.com/kjqy/50036.html