Spark Scala: update/add a new column inside a JSON object using values from another dataframe

Views: 704 Published: 2022-10-16 Tags: json struct scala dataframe apache-spark

Problem description

I want to update an array of objects within an existing JSON object with the content of another JSON object.

Initial object:

{
    "user": "gT35Hhhre9m",
    "date": "2016-01-29",
    "status": "OK",
    "reason": "some reason",
    "content": [
        {
            "foo": 123,
            "bar": "val1"
        }
    ]
}

Supplementary object:

{
    "id": "gT35Hhhre9m"
}

Merged object structure:

{
    "user": "gT35Hhhre9m",
    "date": "2016-01-29",
    "status": "OK",
    "reason": "some reason",
    "content": [{
        "foo": 123,
        "bar": "val1",
        "id": "gT35Hhhre9m"
    }]
}

Recommended answer

Flatten the "Initial object" and treat the Spark dataframe as columnar data, similar to a SQL table. Do the transformations on the flattened columns, then rebuild the nested structure and convert the dataframe back to JSON.

The trick is not to think of the dataframe as JSON.
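
The flatten, transform, rebuild steps could look roughly like the sketch below. It is not taken from the original answer: the object name `MergeIdIntoContent`, the helper `merge`, and the join condition `user == id` are assumptions made for illustration, based on the two objects in the question sharing the value "gT35Hhhre9m". It also assumes Spark 2.2 or later, where `spark.read.json` accepts a `Dataset[String]`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, explode, struct}

// Hypothetical names: MergeIdIntoContent and merge are illustrative only.
object MergeIdIntoContent {

  // Adds `id` from `extra` to every element of `content` in `initial`.
  // Assumption: rows are matched on initial.user == extra.id.
  def merge(initial: DataFrame, extra: DataFrame): DataFrame = {
    // 1. Flatten: one row per element of the `content` array,
    //    with the struct fields pulled up into ordinary columns.
    val flat = initial
      .withColumn("item", explode(col("content")))
      .select(
        col("user"), col("date"), col("status"), col("reason"),
        col("item.foo").as("foo"), col("item.bar").as("bar"))

    // 2. Transform as plain columns: attach `id` with a join.
    val joined = flat.join(extra, flat("user") === extra("id"), "left")

    // 3. Rebuild the nested structure: regroup the rows and collect
    //    the enriched structs back into a `content` array.
    joined
      .groupBy("user", "date", "status", "reason")
      .agg(collect_list(struct(col("foo"), col("bar"), col("id"))).as("content"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("merge-json")
      .getOrCreate()
    import spark.implicits._

    // The two objects from the question, read as one-row dataframes.
    val initial = spark.read.json(Seq(
      """{"user":"gT35Hhhre9m","date":"2016-01-29","status":"OK","reason":"some reason","content":[{"foo":123,"bar":"val1"}]}"""
    ).toDS())
    val extra = spark.read.json(Seq("""{"id":"gT35Hhhre9m"}""").toDS())

    // Convert the merged dataframe back to JSON strings, one per row.
    merge(initial, extra).toJSON.show(truncate = false)

    spark.stop()
  }
}
```

Two caveats with this sketch: `explode` drops rows whose `content` array is empty or null (`explode_outer` keeps them), and `collect_list` does not guarantee the original element order after the shuffle. Grouping on `user`, `date`, `status`, `reason` also merges any input objects that share all four values, so a real job would probably group on a unique key instead.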
