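Description: After finding a species on the PPBC website, the geographic location information cannot be downloaded in bulk, so a script is used to collect the locations directly from the web pages.
1. Search for 羊踯躅 (Rhododendron molle), which gives the following URL: https://ppbc.iplant.cn/sp/25162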

2. Use a Python script (get_address.py) to fetch the page data and de-duplicate the results
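import requests
from bs4 import BeautifulSoup
import pandas as pd

# Paged endpoint that returns the photo list for species 25162
URL = 'https://ppbc.iplant.cn/ashx/getphotopage.ashx?page={}&n=2&group=sp&cid=25162'

try:
    index = 1
    result = []
    while True:
        # Send a GET request for the current page. The 10-second timeout is an
        # arbitrary choice; without one, the Timeout handler below may never fire.
        response = requests.get(URL.format(index), timeout=10)
        response.raise_for_status()  # Check the HTTP status code
        html = response.text
        if len(html) == 0:
            break  # An empty body is treated as "no more pages"
        # Parse the HTML
        soup = BeautifulSoup(html, 'html.parser')
        items = soup.find_all('div', class_='item3 masonry_brick')
        for item in items:
            # Extract the uploader name and the location that follows the "@"
            span = item.find('span')
            username = span.find('a').text.strip()
            location = span.text.replace(username, '').replace('<font>@</font>', '').replace("@", "").strip()
            result.append({'Author': username, 'Location': location})
        index += 1
    # Convert each dict to a tuple of items and de-duplicate with a set
    unique_tuples = set(tuple(d.items()) for d in result)
    # Convert back to a list of dicts
    unique_dicts = [dict(t) for t in unique_tuples]
    print(unique_dicts)
    df = pd.DataFrame(unique_dicts)
    # Export to an Excel file (writing .xlsx requires the openpyxl package)
    df.to_excel("output.xlsx", index=False)
except requests.exceptions.Timeout:
    print("Request timed out; please check your network connection")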
3. Run the script to obtain all the location information, which is written to output.xlsx
$ python get_address.py
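
To reuse the script for another species, two small pieces of logic can be pulled out and checked without any network access. The sketch below is only an illustration: `page_url` and `dedupe` are hypothetical helper names, not part of get_address.py, and `page_url` assumes that the number at the end of the species URL (25162 above) is the same value the endpoint expects as `cid`. `dedupe` is an alternative to the set-based step in the script that keeps the records in the order they were first seen.

```python
from urllib.parse import urlparse

# Endpoint template copied from get_address.py; n=2 and group=sp are kept unchanged.
PAGE_URL = ("https://ppbc.iplant.cn/ashx/getphotopage.ashx"
            "?page={page}&n=2&group=sp&cid={cid}")


def page_url(species_url, page):
    """Build the paged request URL from a species page URL (hypothetical helper).

    Assumption: the last path segment of the species URL is the cid.
    """
    cid = urlparse(species_url).path.rstrip("/").split("/")[-1]
    if not cid.isdigit():
        raise ValueError(f"no numeric id found in {species_url!r}")
    return PAGE_URL.format(page=page, cid=cid)


def dedupe(records):
    """Drop repeated dicts while keeping first-seen order (hypothetical helper)."""
    seen = set()
    unique = []
    for record in records:
        # Sort the items so that key order inside a dict does not matter
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    print(page_url("https://ppbc.iplant.cn/sp/25162", 1))
    # Placeholder records, not real PPBC data
    sample = [
        {"Author": "user_a", "Location": "place_x"},
        {"Author": "user_b", "Location": "place_y"},
        {"Author": "user_a", "Location": "place_x"},
    ]
    print(dedupe(sample))
```

Keeping the first-seen order makes repeated runs produce the same row order in output.xlsx, which the set-based version does not guarantee.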