How do you extract JSON data from BeautifulSoup using Django

Views: 316 Published: 2022-09-16 Tags: json django-views beautifulsoup

Question

Good day. I'm facing an issue while trying to extract values from JSON. First of all, my BeautifulSoup code works fine in the shell, but not in Django. What I'm trying to achieve is to extract data from the JSON I receive, but so far without success. Here's the class in my view that does it:

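import json
import re

import requests
from bs4 import BeautifulSoup
from django.views import generic


class FetchWeather(generic.TemplateView):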
    template_name = 'forecastApp/pages/weather.html'

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        url = 'http://weather.news24.com/sa/cape-town'
        city = 'cape town'
        url_request = requests.get(url)
        soup = BeautifulSoup(url_request.content, 'html.parser')
        city_list = soup.find(id="ctl00_WeatherContentHolder_ddlCity")
        print(soup.head)
        city_as_on_website = city_list.find(text=re.compile(city, re.I)).parent
        cityId = city_as_on_website['value']
        json_url = "http://weather.news24.com/ajaxpro/TwentyFour.Weather.Web.Ajax,App_Code.ashx"

        headers = {
            'Content-Type': 'text/plain; charset=UTF-8',
            'Host': 'weather.news24.com',
            'Origin': 'http://weather.news24.com',
            'Referer': url,
            'User-Agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/48.0.2564.82 Chrome/48.0.2564.82 Safari/537.36',
            'X-AjaxPro-Method': 'GetCurrentOne'}

        payload = {
            "cityId": cityId
        }
        request_post = requests.post(json_url, headers=headers, data=json.dumps(payload))
        print(request_post.content)
        context['Observations'] = request_post.content
        return context

In the JSON, there is an array "Observations" from which I'm trying to get the city name and the high and low temperatures.

But when I tried to do this:

cityDict = json.loads(str(html))

I received an error. Here is the traceback:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 4067 (char 4066)

Any help will be gladly appreciated.

Recommended answer

There are two problems with the JSON data inside request_post.content:

There are JS Date object values in it, for instance:

"Date":new Date(Date.UTC(2016,1,26,22,0,0,0))

There are unwanted characters at the end: ;/*
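Problems like these can be located from the traceback itself: the JSONDecodeError carries the character offset where parsing stopped, so printing the text around that offset shows what the parser could not handle. The following is a minimal sketch, assuming a shortened, made-up payload in place of the real response; the helper name show_json_error and the "Temp" key are hypothetical.

```python
import json


def show_json_error(text, width=30):
    """Return the text surrounding the first JSON parse error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the 0-based offset reported as "char N" in the traceback
        start = max(exc.pos - width, 0)
        return text[start:exc.pos + width]
    return None


# Made-up sample shaped like the response described above, not the real data
sample = '{"Date":new Date(Date.UTC(2016,1,26,22,0,0,0)),"Temp":21};/*'
snippet = show_json_error(sample)
print(snippet)
```

Run against the real request_post.text, the same helper should print the neighbourhood of char 4066 from the traceback.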

Let's clean the JSON data so that it can be loaded with json:

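import json
import re
from datetime import datetime

data = request_post.text

def convert_date(match):
    # JS Date.UTC(year, month, day, hour, minute, second, ms) counts months from 0
    year, month, day, hour, minute, second, ms = map(int, match.groups())
    utc = datetime(year, month + 1, day, hour, minute, second, ms * 1000)
    return '"' + utc.strftime("%Y-%m-%dT%H:%M:%S") + '"'

# replace every JS Date expression with a quoted ISO 8601 string
data = re.sub(r"new Date\(Date\.UTC\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)\)",
              convert_date,
              data)

# remove the stray ';', '/' and '*' characters around the payload
data = data.strip(";/*")
data = json.loads(data)

context['Observations'] = data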
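To try the cleanup outside the view, the same steps can be wrapped in a function and run on a small string. This is only a sketch under stated assumptions: the function name parse_ajaxpro and the keys "City", "High" and "Low" are hypothetical, since the real key names inside "Observations" are not shown in the question. It also shifts the month by one, because JavaScript's Date.UTC counts months from 0 while Python's datetime counts from 1.

```python
import json
import re
from datetime import datetime

# Matches new Date(Date.UTC(y,m,d,h,min,s,ms)) with plain integer arguments
JS_DATE = re.compile(
    r"new Date\(Date\.UTC\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)\)"
)


def _to_iso(match):
    year, month, day, hour, minute, second, ms = map(int, match.groups())
    # JavaScript months are 0-based, Python's are 1-based
    moment = datetime(year, month + 1, day, hour, minute, second, ms * 1000)
    return '"' + moment.strftime("%Y-%m-%dT%H:%M:%S") + '"'


def parse_ajaxpro(text):
    """Turn an AjaxPro-style response body into Python objects."""
    cleaned = JS_DATE.sub(_to_iso, text).strip(";/*")
    return json.loads(cleaned)


# Made-up payload: only "Observations" and the Date format come from the question
sample = (
    '{"Observations":[{"City":"Cape Town","High":"27","Low":"17",'
    '"Date":new Date(Date.UTC(2016,1,26,22,0,0,0))}]};/*'
)
result = parse_ajaxpro(sample)
first = result["Observations"][0]
print(first["City"], first["High"], first["Low"], first["Date"])
```

In the view, the parsed list could then be placed in the context (for example context['Observations'] = result["Observations"]) and looped over in the template, once the real key names have been checked against the actual response.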
