
Crawling (3)

[Big Data Analysis] Crawling - (3)
1. Crawling dynamic web pages with the selenium library and chromedriver
[Practice]: crawling CoffeeBean store names
'''
from bs4 import BeautifulSoup
import urllib.request
import pandas as pd
import datetime
from selenium import webdriver
import time

def CoffeeBean_store(result):
    CoffeeBean_URL = "https://www.coffeebeankorea.com/store/store.asp"
    wd = webdriver.Chrome(executable_path = 'path to chromedriver')
    for i in range(1,370):
        w..
2021. 4. 12.

[Big Data Analysis] Crawling - (2)
Crawling static web pages with the beautifulsoup package (install with pip install beautifulsoup4)
'''
from bs4 import BeautifulSoup
html = ''' Search engines Naver Google Daum '''
soup = BeautifulSoup(html,'html.parser')
print(soup.prettify())
'''
[Opening the HTML file on a Mac] Clicking a search engine's name takes you to that site.
*Reference: Beautiful Soup Documentation - Beautiful Soup 4.9.0 documentation. Non-pretty printing: If you just want a string, with no fancy formatting..
2021. 4. 12.
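The HTML in the excerpt above lost its tags during extraction, so the markup below is a hypothetical stand-in: a heading plus three anchor links, one per search engine. This is a minimal sketch, assuming `beautifulsoup4` is installed, of parsing that markup and pulling out each link's text and target.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the original post's tags are not recoverable
# from the excerpt, so this structure is an assumption.
html = """
<h1>Search engines</h1>
<ul>
  <li><a href="https://www.naver.com">Naver</a></li>
  <li><a href="https://www.google.com">Google</a></li>
  <li><a href="https://www.daum.net">Daum</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all("a") returns every anchor tag in document order.
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

for name, url in links:
    print(name, url)
```

`soup.prettify()` from the post is handy for inspecting the parsed tree, while `find_all` is what typically does the extraction work once you know which tags hold the data.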

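[Big Data Analysis] Crawling - (1)
What is crawling? A technique that collects documents stored across countless computers and adds them to a search index.
API-based collection: some services provide an API for programs that gather information.
1. Naver Crawling
NAVER Developers: provides API guides and SDKs so that developers can build a variety of applications with Naver's open APIs. The open APIs on offer include Naver Login, Search, Short URL, and CAPTCHA, as well as machine translation, .. developers.naver.com
1) Sign up at the developer center
2) Select a service API
3) Apply to use the open API
[Example] Web crawling with Python's urllib package
1) Search for blogs related to "인공지능" (artificial intelligence) using the Search API
'''
import os
import sy..
2021. 4. 10.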
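The post's own code is cut off in this excerpt, so the following is an independent sketch of the blog search step rather than the author's version. It assumes you have already registered an application and received a client ID and secret. The function names and the `display` default are illustrative choices, and the credential strings in the usage comment are placeholders.

```python
import json
import urllib.parse
import urllib.request

NAVER_BLOG_SEARCH_URL = "https://openapi.naver.com/v1/search/blog"


def build_blog_search_request(query, client_id, client_secret, display=10):
    """Build (but do not send) a blog search request for `query`."""
    # urlencode percent-encodes the UTF-8 bytes of non-ASCII queries.
    params = urllib.parse.urlencode({"query": query, "display": display})
    request = urllib.request.Request(f"{NAVER_BLOG_SEARCH_URL}?{params}")
    # The open API authenticates each call with these two headers.
    request.add_header("X-Naver-Client-Id", client_id)
    request.add_header("X-Naver-Client-Secret", client_secret)
    return request


def search_blogs(query, client_id, client_secret, display=10):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_blog_search_request(query, client_id, client_secret, display)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (needs network access and real credentials; placeholders shown):
# result = search_blogs("인공지능", "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
# for item in result["items"]:
#     print(item["title"], item["link"])
```

Separating request construction from the network call keeps the URL and header logic easy to check without spending API quota.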
