Web Scraping using Python Scrapy_BS4 - using Scrapy and Python(2)

Scrapy Architecture

Creating a Spider.

Spiders are classes that you define and that Scrapy uses to scrape (extract) information from one or more websites.

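import scrapy


class QuoteSpider(scrapy.Spider):
    # Name used to identify this spider
    name = "quote"
    # Pages the spider starts crawling from
    start_urls = [
        'https://bluelimelearning.github.io/my-fav-quotes/'
    ]

    def parse(self, response):
        # Loop over every <div class="quotes"> element on the page
        for quote in response.css('div.quotes'):
            yield {
                # extract() returns a list of all matching text nodes
                'quote': quote.css('p.aquote::text').extract(),
                # extract_first() returns only the first match (or None)
                'author': quote.css('p.author::text').extract_first(),
            }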

Running your spider and saving scraped data.

scrapy runspider quotes_spiders.py -o quotes.xml

The XML tags can be stripped from the output with an online tool such as:

https://www.cleancss.com/strip-xml/
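The exported file can also be read back with Python's standard library rather than stripping tags by hand. The sketch below is only an illustration: the sample XML is made up, and its layout (an `items` root, one `item` per scraped record, list fields wrapped in `value` elements) is an assumption about what the XML export looks like, so check it against your own quotes.xml. The helper name `read_quotes` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the layout the XML export is ASSUMED to have.
# A real quotes.xml would also begin with an XML declaration line.
SAMPLE_XML = """<items>
<item><quote><value>First sample quote.</value></quote><author>Author One</author></item>
<item><quote><value>Second sample quote.</value></quote><author>Author Two</author></item>
</items>"""


def read_quotes(xml_text):
    """Turn exported XML text into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall('item'):
        # 'quote' was scraped with extract(), so it holds a list of values
        parts = [(v.text or '') for v in item.findall('quote/value')]
        rows.append({
            'quote': ' '.join(parts),
            'author': item.findtext('author'),
        })
    return rows


for row in read_quotes(SAMPLE_XML):
    print(row['author'], '-', row['quote'])
```

For a real export, `ET.parse('quotes.xml').getroot()` could replace the `ET.fromstring` call.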

Scraping data with Scrapy Shell

scrapy shell "https://bluelimelearning.github.io/my-fav-quotes/"

response.css('title')

response.css('title::text').extract()

response.css('h1::text').extract()

quote = response.css("div.quotes")[0]

aquote = quote.css("p.aquote::text").extract()

aquote
