How do you extract JSON data with BeautifulSoup in Django

Views: 316 Posted: 2022-09-16 Tags: json django-views beautifulsoup



Good day. I'm having trouble extracting values from JSON. My BeautifulSoup code works fine in the shell but not in Django, and I haven't managed to extract data from the JSON I receive. Here's the class in my view that does it:

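import json
import re

import requests
from bs4 import BeautifulSoup
from django.views import generic


class FetchWeather(generic.TemplateView):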
    template_name = 'forecastApp/pages/weather.html'

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        url = 'http://weather.news24.com/sa/cape-town'
        city = 'cape town'
        url_request = requests.get(url)
        soup = BeautifulSoup(url_request.content, 'html.parser')
        city_list = soup.find(id="ctl00_WeatherContentHolder_ddlCity")
        city_as_on_website = city_list.find(text=re.compile(city, re.I)).parent
        cityId = city_as_on_website['value']
        json_url = "http://weather.news24.com/ajaxpro/TwentyFour.Weather.Web.Ajax,App_Code.ashx"

        headers = {
            'Content-Type': 'text/plain; charset=UTF-8',
            'Host': 'weather.news24.com',
            'Origin': 'http://weather.news24.com',
            'Referer': url,
            'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.82 Chrome/48.0.2564.82 Safari/537.36',
            'X-AjaxPro-Method': 'GetCurrentOne'}

        payload = {
            "cityId": cityId
        }
        request_post = requests.post(json_url, headers=headers, data=json.dumps(payload))
        context['Observations'] = request_post.content
        return context


The JSON contains an array, "Observations", from which I'm trying to get the city name and the high and low temperatures.

But when I tried this:

cityDict = json.loads(str(html))


I get an error. Here's the traceback:

 Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 4067 (char 4066)

Any help will be gladly appreciated.


There are two problems with your JSON data inside request_post.content:


There are JS Date object values in it, for instance:

"Date":new Date(Date.UTC(2016,1,26,22,0,0,0))

There are unwanted characters at the end: ;/*
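
To see where parsing breaks, it can help to print the text around the position reported in the traceback ("char N"). The sketch below is only an illustration: the `sample` string is made up to mimic the shape of the response, and `show_json_error` is a hypothetical helper name, not part of any library.

```python
import json

def show_json_error(text, width=20):
    """Return the slice of text around the spot where json.loads fails, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset shown as "char N" in the traceback
        start = max(err.pos - width, 0)
        return text[start:err.pos + width]
    return None

# Made-up sample (assumption): a JS Date value plus the trailing ";/*"
sample = '{"value":{"Date":new Date(Date.UTC(2016,1,26,22,0,0,0))}};/*'
print(show_json_error(sample))
```

Running the same helper on `request_post.text` should show the first offending fragment in the real response.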

Let's clean the JSON data so that it can be loaded with json:

from datetime import datetime

data = request_post.text

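def convert_date(match):
    year, month, day, hour, minute, second, _ms = map(int, match.groups())
    # JavaScript's Date.UTC counts months from 0, Python's datetime from 1
    value = datetime(year, month + 1, day, hour, minute, second)
    return '"' + value.strftime("%Y-%m-%dT%H:%M:%S") + '"'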

data = re.sub(r"new Date\(Date\.UTC\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)\)",
              convert_date,
              data)

data = data.strip(";/*")
data = json.loads(data)

context['Observations'] = data
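
Once the data is a plain Python structure, pulling out individual values is ordinary dictionary and list access. The following is a minimal, self-contained sketch of the whole clean-up plus extraction, run against a made-up sample. The key names `value`, `City`, `HighTemp` and `LowTemp` are assumptions for illustration only (check the real response for the actual names), and `clean_ajax_json` is a hypothetical helper name. The sketch also shifts the month by one, since JavaScript's `Date.UTC` counts months from 0.

```python
import json
import re
from datetime import datetime

DATE_RE = re.compile(
    r"new Date\(Date\.UTC\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)\)"
)

def _convert_date(match):
    year, month, day, hour, minute, second, _ms = map(int, match.groups())
    # JavaScript months are 0-based, Python's are 1-based
    return '"' + datetime(year, month + 1, day, hour, minute, second).isoformat() + '"'

def clean_ajax_json(text):
    """Replace JS Date values with ISO strings, drop trailing ';/*', then parse."""
    text = DATE_RE.sub(_convert_date, text)
    return json.loads(text.strip(";/*"))

# Made-up sample; the key names are assumptions, not the site's real schema
sample = (
    '{"value":{"Observations":[{"City":"Cape Town","HighTemp":"27",'
    '"LowTemp":"17","Date":new Date(Date.UTC(2016,1,26,22,0,0,0))}]}};/*'
)

data = clean_ajax_json(sample)
for obs in data["value"]["Observations"]:
    print(obs["City"], obs["HighTemp"], obs["LowTemp"], obs["Date"])
```

In the view, the same loop could build a list of small dictionaries to put into `context`, so the template only has to iterate over it.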