Python Web Crawler Workshop 2015/05/30
A crawler is simply a program that imitates the way a browser is used.
Let's try crawling the Dcard website.
Right-click the page -> Inspect Element -> Network -> turn on Preserve log.
Copy the request URL that shows up in the Network panel and paste it into the browser.
The raw result looks ugly,
so I installed the JSONView extension.
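JSONView only tidies the response inside the browser. The same readable view can be produced in Python with the standard `json` module. This is a minimal sketch: the sample string and its field names (`id`, `title`) are made up for illustration and are not Dcard's actual response format.

```python
import json

# Hypothetical compact JSON, standing in for a raw API response.
raw = '[{"id": 1, "title": "Hello"}, {"id": 2, "title": "你好"}]'


def prettify(text):
    """Parse a JSON string and return it re-indented for reading."""
    parsed = json.loads(text)
    # ensure_ascii=False keeps Chinese characters readable instead of \uXXXX escapes
    return json.dumps(parsed, indent=2, ensure_ascii=False)


print(prettify(raw))
```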
The Dcard request above uses GET,
so pasting the URL straight into the browser is enough to see the data.
But with the 7-11 site,
pasting the URL gets you nothing QQ
GET vs POST
The 7-11 site sends its query with POST: the parameters travel in the request body (the Form Data) rather than in the URL, so the URL alone is not enough to get the data back.
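To make the difference concrete, the sketch below builds one GET and one POST request with the Python 3 standard library, without sending anything. The URL `http://example.com/api` and the `city` parameter are placeholders, not taken from either site.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"city": "台北市"}   # placeholder parameter
query = urlencode(params)     # percent-encodes the value as UTF-8

# GET: the parameters are part of the URL, so the URL alone reproduces the request.
get_req = Request("http://example.com/api?" + query)

# POST: the URL carries no parameters; they go in the request body instead.
post_req = Request("http://example.com/api", data=query.encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

This is why a GET URL can be pasted into the address bar while a POST request has to be replayed with a tool or a script.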
====================================================
Let's look at how the request is structured.
On the 7-11 site, select the parking lot and restroom filters,
then look at the request in the Network panel.
Click view source next to Form Data and you will see this string:
commandid=SearchStore&city=%E8%8A%B1%E8%93%AE%E7%B8%A3&town=%E5%A3%BD%E8%B1%90%E9%84%89&roadname=&ID=&StoreName=&SpecialStore_Kind=&isDining=False&isParking=True&isLavatory=True&isATM=False&is7WiFi=False&isIce=False&isHotDog=False&isHealthStations=False&isIceCream=False&isOpenStore=False&isFruit=False&isCityCafe=False&address=
Paste the whole string above into the HackBar add-on in Firefox.
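The captured string is ordinary URL-encoded form data, so Python can decode it to show which fields the site sends. This is a sketch using the Python 3 standard library; the string is shortened here to a handful of the fields from the capture.

```python
from urllib.parse import parse_qsl

# A shortened copy of the captured Form Data string.
form_data = (
    "commandid=SearchStore"
    "&city=%E8%8A%B1%E8%93%AE%E7%B8%A3"
    "&town=%E5%A3%BD%E8%B1%90%E9%84%89"
    "&roadname=&isParking=True&isLavatory=True&isATM=False"
)

# keep_blank_values=True keeps empty fields such as roadname
fields = dict(parse_qsl(form_data, keep_blank_values=True))

for name, value in fields.items():
    print(name, "=", value)
```

The resulting dict has the same shape as the `data` argument that `requests.post` accepts, so the filters chosen on the page (`isParking`, `isLavatory`) could be sent from a script in the same way.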
Now write some Python to fetch this data:

    import requests
    import lxml.etree

    # Form fields copied from the POST request seen in the Network panel
    data = {"commandid": "SearchStore", "city": "台北市", "town": "大安區"}
    res = requests.post("http://emap.pcsc.com.tw/EMapSDK.aspx", data=data)
    print(res.content)

    # Parse the XML response and print the text of every Address element
    elem = lxml.etree.fromstring(res.content)
    for e in elem.xpath("//Address/text()"):
        print(e)
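To see what the XPath step does without hitting the network, here is a sketch that runs the same kind of query on a small hand-written XML document. The element layout and the addresses are invented for illustration; only the `Address` tag name comes from the script above. It uses the standard library's `xml.etree.ElementTree` instead of `lxml`; its `findall` supports only a limited subset of XPath, which is enough here.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; NOT the real response from the 7-11 server.
sample = """
<Stores>
  <Store><Name>Store A</Name><Address>No. 1, Example Rd.</Address></Store>
  <Store><Name>Store B</Name><Address>No. 2, Example Rd.</Address></Store>
</Stores>
""".strip()

root = ET.fromstring(sample)

# ".//Address" matches Address elements at any depth below the root
addresses = [node.text for node in root.findall(".//Address")]

for address in addresses:
    print(address)
```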
Oops! I forgot to declare the file encoding as UTF-8.
Add one line at the very top: # coding=UTF-8
(Python 2 needs this declaration because the script contains Chinese string literals.)
Data fetched successfully!!