(3) Baidu AI Open Platform: Calling the API in Practice

xiaoxiao, 2021-02-28

First choose the API you want to call, e.g.:

the lexical analysis API

Then read the official API documentation carefully.

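Interface description and request details

HTTP method: POST

(General version) Request URL: https://aip.baidubce.com/rpc/2.0/nlp/v1/lexer

(Custom version) Request URL: https://aip.baidubce.com/rpc/2.0/nlp/v1/lexer_custom

URL parameters:

Parameter | Value
access_token | The access_token obtained with your API Key and Secret Key; see “Access Token获取” (Obtaining an Access Token)

Headers:

Parameter | Value
Content-Type | application/json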
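The access_token mentioned above comes from the OAuth token endpoint that the code later in this post calls. As a minimal sketch, the token URL can be assembled with `urlencode` instead of string concatenation. The function name `build_token_url` and the placeholder credentials are my own; only the endpoint and the three query parameters come from this post.

```python
from urllib.parse import urlencode

# Token endpoint as used in the code later in this post.
TOKEN_ENDPOINT = "https://aip.baidubce.com/oauth/2.0/token"


def build_token_url(api_key, secret_key):
    """Return the token URL for the client_credentials grant (hypothetical helper)."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,         # API Key (AK) from the console
        "client_secret": secret_key,  # Secret Key (SK) from the console
    })
    return TOKEN_ENDPOINT + "?" + query


# Placeholder credentials, not real ones.
print(build_token_url("YOUR_API_KEY", "YOUR_SECRET_KEY"))
```

Building the query this way also takes care of URL-escaping if a key ever contains reserved characters.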

Example request body:

{ "text": "百度是一家高科技公司" }

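Request parameters

Parameter name | Type | Description
text | string | Text to analyze (currently only GBK encoding is supported), no longer than 20000 bytes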
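Because the limit is stated in bytes rather than characters, it can help to measure the text after GBK encoding. The sketch below is an illustration only: `gbk_length` and `split_by_gbk_bytes` are hypothetical helpers, and cutting at an arbitrary character is a simplification (splitting on sentence boundaries would probably give better analysis results).

```python
MAX_BYTES = 20000  # limit quoted in the parameter table above


def gbk_length(text):
    """Length of the text in bytes once encoded as GBK."""
    return len(text.encode("gbk"))


def split_by_gbk_bytes(text, limit=MAX_BYTES):
    """Split text into chunks whose GBK-encoded size is at most `limit` bytes."""
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("gbk"))  # raises UnicodeEncodeError if ch is not in GBK
        if current and size + n > limit:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


print(gbk_length("百度是一家高科技公司"))
print(split_by_gbk_bytes("百度是一家高科技公司", limit=8))
```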

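Calling via POST

Note: the content of a request must be described as a JSON structure. **By default, the entire body must be GBK-encoded when it is sent.** To use UTF-8 instead, add charset=UTF-8 (case-sensitive) to the URL parameters, for example: https://aip.baidubce.com/rpc/2.0/nlp/v1/lexer?charset=UTF-8&access_token=24.f9ba9c5241b67688bb4adbed8bc91dec.2592000.1485570332.282335-8574074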
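The two encoding options described above can be sketched as one helper that returns the URL, body bytes, and headers for a request. `build_lexer_request` is a hypothetical name of mine, and `ensure_ascii=False` is my own choice so that the body really contains GBK or UTF-8 characters; with the default setting `json.dumps` emits only ASCII `\u` escapes. I have not verified how the service treats each variant, so treat this as an assumption-laden illustration.

```python
import json
from urllib.parse import urlencode

LEXER_URL = "https://aip.baidubce.com/rpc/2.0/nlp/v1/lexer"


def build_lexer_request(text, access_token, charset="GBK"):
    """Return (url, body, headers) for the lexer API (hypothetical helper)."""
    if charset not in ("GBK", "UTF-8"):
        raise ValueError("charset must be 'GBK' or 'UTF-8'")
    params = {}
    if charset == "UTF-8":
        # The documentation says this parameter is case-sensitive.
        params["charset"] = "UTF-8"
    params["access_token"] = access_token
    url = LEXER_URL + "?" + urlencode(params)
    # Keep non-ASCII characters as-is, then encode the whole body.
    body = json.dumps({"text": text}, ensure_ascii=False).encode(charset)
    headers = {"Content-Type": "application/json"}
    return url, body, headers


url, body, headers = build_lexer_request("百度是一家高科技公司", "YOUR_TOKEN", "UTF-8")
print(url)
print(len(body), headers)
```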

Response format

JSON; the response content is GBK-encoded.

All of these notes point to the same thing: pay attention to encoding.

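Method 1: using urllib. My first attempt returned the error below. After some searching, the cause turned out to be that the data I passed in was a dict, which has to be serialized and encoded first: post_data=json.dumps(post_data).encode('GBK')

Error: 282004 invalid parameter(s) (the request contains invalid parameters; check them and try again)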
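An error like the one above is easier to spot if the response is checked before it is used. The sketch below assumes the error arrives as a JSON object with `error_code` and `error_msg` fields; those field names are my assumption, not something this post shows, and `BaiduAPIError` and `parse_response` are hypothetical names.

```python
import json


class BaiduAPIError(Exception):
    """Raised when the response looks like an error (hypothetical class)."""


def parse_response(raw_bytes, encoding="gbk"):
    """Decode and parse a response, raising if it carries an error code."""
    parsed = json.loads(raw_bytes.decode(encoding))
    # Assumption: error responses contain "error_code" and "error_msg".
    if "error_code" in parsed:
        raise BaiduAPIError(
            "{}: {}".format(parsed["error_code"], parsed.get("error_msg", ""))
        )
    return parsed


# Hand-written sample bytes, not a captured response.
sample_error = b'{"error_code": 282004, "error_msg": "invalid parameter(s)"}'
try:
    parse_response(sample_error)
except BaiduAPIError as exc:
    print("API error:", exc)
```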

I won't go into exactly what went wrong; here is the working code.

    # -*- coding: utf-8 -*-
    # @Time    : 2018/3/28 16:48
    # @Author  : LinYimeng
    # @File    : baidulearn.py
    # @Software: PyCharm
    import json
    import sys
    import urllib.request

    # Step 1: get the access_token.
    # client_id is the AK and client_secret is the SK obtained from the official site.
    host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=pd152AVsnYc6Nbfk5Yuh9XTh&client_secret=ZTn7ASCKAGgPFEFmMxszEoQmAlVb2LRM'
    request = urllib.request.Request(host)
    request.add_header('Content-Type', 'application/json; charset=UTF-8')
    response = urllib.request.urlopen(request)
    content = response.read()
    if content:
        print(type(content))  # <class 'bytes'>
    content_str = str(content, encoding="utf-8")
    # The response is JSON, so parse it with json.loads rather than eval.
    content_dir = json.loads(content_str)
    access_token = content_dir['access_token']

    # Step 2: call the API.
    print(sys.getdefaultencoding())
    url = "https://aip.baidubce.com/rpc/2.0/nlp/v1/lexer?access_token=" + access_token
    data = {"text": "今天北京天气怎么样？"}
    # Serialize the dict to JSON and encode it as GBK bytes for the request body.
    encode_data = json.dumps(data).encode('GBK')
    request = urllib.request.Request(url, encode_data)
    request.add_header('Content-Type', 'application/json')
    response = urllib.request.urlopen(request)
    content = response.read()  # GBK-encoded <class 'bytes'>
    # bytes => str: decode the GBK bytes into a str
    content_str = str(content, encoding="gbk")
    print(type(content_str))
    if content_str:
        print(content_str)

Method 2: using urllib3

    # -*- coding: utf-8 -*-
    # @Time    : 2018/3/28 16:48
    # @Author  : LinYimeng
    # @File    : baidulearn.py
    # @Software: PyCharm
    import json
    import sys
    import urllib.request

    import urllib3

    # Step 1: get the access_token.
    # client_id is the AK and client_secret is the SK obtained from the official site.
    host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=pd152AVsnYc6Nbfk5Yuh9XTh&client_secret=ZTn7ASCKAGgPFEFmMxszEoQmAlVb2LRM'
    request = urllib.request.Request(host)
    request.add_header('Content-Type', 'application/json; charset=UTF-8')
    response = urllib.request.urlopen(request)
    content = response.read()
    if content:
        print(type(content))  # <class 'bytes'>
    content_str = str(content, encoding="utf-8")
    # The response is JSON, so parse it with json.loads rather than eval.
    content_dir = json.loads(content_str)
    access_token = content_dir['access_token']

    # Step 2: call the API.
    print(sys.getdefaultencoding())
    http = urllib3.PoolManager()
    url = "https://aip.baidubce.com/rpc/2.0/nlp/v1/lexer?access_token=" + access_token
    print(url)
    data = {"text": "今天北京天气怎么样？"}
    encode_data = json.dumps(data).encode('GBK')
    # JSON: to send already-serialized JSON, pass it as the body argument
    # and set the Content-Type header.
    request = http.request('POST', url, body=encode_data,
                           headers={'Content-Type': 'application/json'})
    result = str(request.data, 'GBK')
    print(result)

Result: {"log_id": 7265183527788840706, "text": "今天北京天气怎么样？", "items": [{"loc_details": [], "byte_offset": 0, "uri": "", "pos": "", "ne": "TIME", "item": "今天", "basic_words": ["今天"], "byte_length": 4, "formal": ""}, {"loc_details": [], "byte_offset": 4, "uri": "", "pos": "", "ne": "LOC", "item": "北京", "basic_words": ["北京"], "byte_length": 4, "formal": ""}, {"loc_details": [], "byte_offset": 8, "uri": "", "pos": "n", "ne": "", "item": "天气", "basic_words": ["天气"], "byte_length": 4, "formal": ""}, {"loc_details": [], "byte_offset": 12, "uri": "", "pos": "r", "ne": "", "item": "怎么样", "basic_words": ["怎么", "样"], "byte_length": 6, "formal": ""}, {"loc_details": [], "byte_offset": 18, "uri": "", "pos": "w", "ne": "", "item": "？", "basic_words": ["？"], "byte_length": 2, "formal": ""}]}

As you can see, the API recognizes, for example, that “北京” (Beijing) is a place name (LOC).
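The `byte_offset` and `byte_length` fields in the result appear to count GBK bytes (two per Chinese character here) rather than characters. As a small sketch, the items can be sliced back out of the original text by encoding it first. The `sample` dict is a trimmed copy of the result above, and `slice_by_bytes` is a hypothetical helper.

```python
# Trimmed copy of the result shown above (only the fields used here).
sample = {
    "text": "今天北京天气怎么样？",
    "items": [
        {"item": "今天", "ne": "TIME", "pos": "", "byte_offset": 0, "byte_length": 4},
        {"item": "北京", "ne": "LOC", "pos": "", "byte_offset": 4, "byte_length": 4},
        {"item": "天气", "ne": "", "pos": "n", "byte_offset": 8, "byte_length": 4},
        {"item": "怎么样", "ne": "", "pos": "r", "byte_offset": 12, "byte_length": 6},
        {"item": "？", "ne": "", "pos": "w", "byte_offset": 18, "byte_length": 2},
    ],
}


def slice_by_bytes(text, offset, length, encoding="gbk"):
    """Cut `length` bytes starting at `offset` out of the encoded text."""
    raw = text.encode(encoding)
    return raw[offset:offset + length].decode(encoding)


for it in sample["items"]:
    piece = slice_by_bytes(sample["text"], it["byte_offset"], it["byte_length"])
    # Each item carries either an entity label (ne) or a part-of-speech tag (pos).
    tag = it["ne"] or it["pos"]
    print(piece, tag, piece == it["item"])
```

Slicing with character indices instead would give the wrong substrings, which is another place where the encoding matters.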

June 20, 2018: a new version of the code that I worked out, with clearer output.

    import json

    import urllib3

    # The text can also be read in from a file.
    lawText = """昆明，云南省辖下地级市，地处云贵高原中部，北与凉山彝族自治州相连，西南与玉溪市、东南与红河哈尼族彝族自治州毗邻，西与楚雄彝族自治州接壤，东与曲靖市交界，是滇中城市群的核心圈、亚洲5小时航空圈的中心，国家一级物流园区布局城市之一。
    """


    def baiduNER(myData, APIurl, access_token):
        url = APIurl + access_token
        data = {"text": myData}
        encode_data = json.dumps(data).encode('GBK')
        http = urllib3.PoolManager()
        # JSON: to send already-serialized JSON, pass it as the body argument
        # and set the Content-Type header.
        request = http.request('POST', url, body=encode_data,
                               headers={'Content-Type': 'application/json'})
        result = str(request.data, "GBK")
        # The response is JSON, so parse it with json.loads rather than eval.
        result_dir = json.loads(result)
        NNP = {}
        if "items" in result_dir:
            print(result_dir["items"])
            for eachWord in result_dir["items"]:
                # Keep only the items that carry a named-entity label.
                if eachWord['ne'] != "":
                    word = eachWord.get("item")
                    nnP = eachWord['ne']
                    NNP[word] = nnP
        return NNP


    def main():
        # Baidu AI Open Platform access token (masked here)
        access_token = "2###################我的呦2308"
        # URL of the specific API
        APIurl = "https://aip.baidubce.com/rpc/2.0/nlp/v1/lexer?access_token="
        # myData = ' '.join(each for each in lawText.split())  # a single short text
        # NNP = baiduNER(myData, APIurl, access_token)         # a single short text
        myDataAll = lawText.split("\n")  # works for both long and short texts
        NNP = {}
        for each in myDataAll:
            NNPnew = baiduNER(each, APIurl, access_token)
            NNP.update(NNPnew)
        for key in NNP.keys():
            print(key, ':', NNP[key])


    if __name__ == '__main__':
        main()

A note on the labels: PER = person name, LOC = place name, ORG = organization name, TIME = time.

Result:

玉溪市 : LOC
红河哈尼族彝族自治州 : LOC
云南省 : LOC
亚洲 : LOC
云贵高原 : LOC
西南 : LOC
楚雄彝族自治州 : LOC
曲靖市 : LOC
5小时 : TIME
一级物流园区 : LOC
凉山彝族自治州 : LOC
滇中城市群 : LOC
昆明 : LOC
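A flat word-to-label listing like this gets hard to scan as the text grows. One possible follow-up, sketched here with a hand-written sample dict in the same shape as the `NNP` dictionary returned by `baiduNER`, is to group the words by label. `group_by_label` and `LABELS` are hypothetical names of mine.

```python
from collections import defaultdict

# Label meanings as noted in this post.
LABELS = {
    "PER": "person name",
    "LOC": "place name",
    "ORG": "organization name",
    "TIME": "time",
}


def group_by_label(nnp):
    """Turn {word: label} into {label: [words]} (hypothetical helper)."""
    grouped = defaultdict(list)
    for word, label in nnp.items():
        grouped[label].append(word)
    return dict(grouped)


# Hand-written sample taken from the output above.
nnp = {"昆明": "LOC", "云南省": "LOC", "5小时": "TIME"}
for label, words in group_by_label(nnp).items():
    print(LABELS.get(label, label), ":", ", ".join(words))
```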

