Dates: Not yet implemented NAbounds=TRUE for this non-numeric and non-character type

Views: 257 Posted: 2022-10-16 Tags: r dplyr na

Question

I have this data frame:

df1 <- structure(list(ID = c(1, 2, 2, 2, 3, 4, 5, 6, 6, 7, 8, 8, 9, 
10), dateA = structure(c(14974, 18628, 18628, 18628, 14882, 16800, 
14882, 17835, 17835, 16832, 16556, 16556, 15949, 16801), class = "Date"), 
dateB = structure(c(14610, 15340, 15706, 17501, 14730, NA, 
14700, 16191, 17106, 16801, 15810, 16436, 14655, 15431), class = "Date"), 
dateC = structure(c(18628, 15705, 17500, 18628, 18628, NA, 
18628, 17105, 18628, 18628, 16435, 16556, 15706, 18628), class = "Date")), row.names = c(NA, 
-14L), class = c("data.table", "data.frame"))

    ID      dateA      dateB      dateC
 1:  1 2010-12-31 2010-01-01 2021-01-01
 2:  2 2021-01-01 2012-01-01 2012-12-31
 3:  2 2021-01-01 2013-01-01 2017-11-30
 4:  2 2021-01-01 2017-12-01 2021-01-01
 5:  3 2010-09-30 2010-05-01 2021-01-01
 6:  4 2015-12-31       <NA>       <NA>
 7:  5 2010-09-30 2010-04-01 2021-01-01
 8:  6 2018-10-31 2014-05-01 2016-10-31
 9:  6 2018-10-31 2016-11-01 2021-01-01
10:  7 2016-02-01 2016-01-01 2021-01-01
11:  8 2015-05-01 2013-04-15 2014-12-31
12:  8 2015-05-01 2015-01-01 2015-05-01
13:  9 2013-09-01 2010-02-15 2013-01-01
14: 10 2016-01-01 2012-04-01 2021-01-01

I want to check whether dateA falls between dateB and dateC. My code:

library(dplyr)
df1 %>% 
  mutate(match= ifelse(between(dateA, dateB, dateC), 1, 0))

This gives:

Error: Problem with `mutate()` column `match`.
i `match = ifelse(between(dateA, dateB, dateC), 1, 0)`.
x Not yet implemented NAbounds=TRUE for this non-numeric and non-character type

If I remove the row that contains NA, the code works:

df1 %>% 
  slice(-6) %>% 
  mutate(match= ifelse(between(dateA, dateB, dateC), 1, 0))

Is there a way to keep the NA row and still run the code?

Recommended answer

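It is not clear which between the OP is using, because the input object is a data.table while the code uses dplyr. If we assume both packages are loaded, each provides a between function, and the one from the package loaded last masks the other. If dplyr::between is used, it is not fully vectorized, as documented in ?dplyr::between:

left, right: Boundary values (must be scalars).

Applying it row by row with rowwise works around this: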

df1 %>%
    rowwise %>% 
    mutate(match = +(dplyr::between(dateA, dateB, dateC))) %>%
    ungroup

-output

# A tibble: 14 × 5
      ID dateA      dateB      dateC      match
   <dbl> <date>     <date>     <date>     <int>
 1     1 2010-12-31 2010-01-01 2021-01-01     1
 2     2 2021-01-01 2012-01-01 2012-12-31     0
 3     2 2021-01-01 2013-01-01 2017-11-30     0
 4     2 2021-01-01 2017-12-01 2021-01-01     1
 5     3 2010-09-30 2010-05-01 2021-01-01     1
 6     4 2015-12-31 NA         NA            NA
 7     5 2010-09-30 2010-04-01 2021-01-01     1
 8     6 2018-10-31 2014-05-01 2016-10-31     0
 9     6 2018-10-31 2016-11-01 2021-01-01     1
10     7 2016-02-01 2016-01-01 2021-01-01     1
11     8 2015-05-01 2013-04-15 2014-12-31     0
12     8 2015-05-01 2015-01-01 2015-05-01     1
13     9 2013-09-01 2010-02-15 2013-01-01     0
14    10 2016-01-01 2012-04-01 2021-01-01     1
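
To find out which between an unqualified call resolves to (the masking issue described above), one option is to inspect the search path. This is a minimal sketch, assuming both data.table and dplyr are installed; the variable names where and owner are made up for illustration.

```r
# Assumes both packages are installed; load order decides which `between` wins.
suppressPackageStartupMessages({
  library(data.table)
  library(dplyr)  # attached last, so its exports sit earlier on the search path
})

# All attached packages that export `between`, in search-path order.
where <- utils::find("between")
print(where)

# Namespace of the function that an unqualified `between` resolves to.
owner <- environmentName(environment(between))
print(owner)
```

Writing the call with an explicit prefix, as in dplyr::between(...) or data.table::between(...), avoids depending on load order altogether.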

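However, that is not the case for ?data.table::between (and judging by the error shown in the OP's post, the between in use appears to come from data.table):

lower: Lower range bound. Either length 1 or the same length as x.

upper: Upper range bound. Either length 1 or the same length as x.

The class may be the problem, although the documentation says otherwise:

x: Any orderable vector, i.e., one with relevant methods for <=, such as numeric, character, Date, etc. in the case of between, and a numeric vector in the case of inrange.

Converting the Date columns to integer/numeric should work: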

df1 %>%
   mutate(match = +(data.table::between(as.numeric(dateA), 
       as.numeric(dateB), as.numeric(dateC))))

-output

    ID      dateA      dateB      dateC match
 1:  1 2010-12-31 2010-01-01 2021-01-01     1
 2:  2 2021-01-01 2012-01-01 2012-12-31     0
 3:  2 2021-01-01 2013-01-01 2017-11-30     0
 4:  2 2021-01-01 2017-12-01 2021-01-01     1
 5:  3 2010-09-30 2010-05-01 2021-01-01     1
 6:  4 2015-12-31       <NA>       <NA>     1
 7:  5 2010-09-30 2010-04-01 2021-01-01     1
 8:  6 2018-10-31 2014-05-01 2016-10-31     0
 9:  6 2018-10-31 2016-11-01 2021-01-01     1
10:  7 2016-02-01 2016-01-01 2021-01-01     1
11:  8 2015-05-01 2013-04-15 2014-12-31     0
12:  8 2015-05-01 2015-01-01 2015-05-01     1
13:  9 2013-09-01 2010-02-15 2013-01-01     0
14: 10 2016-01-01 2012-04-01 2021-01-01     1

On closer inspection, the problem lies in the NAbounds argument, which defaults to TRUE. The OP's data contains a row whose bounds are NA:

df1 %>% 
    mutate(match = data.table::between(dateA, dateB, dateC))
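Error: Problem with `mutate()` column `match`.
ℹ `match = data.table::between(dateA, dateB, dateC)`.
✖ Not yet implemented NAbounds=TRUE for this non-numeric and non-character type
Run `rlang::last_error()` to see where the error occurred.

We may need to set it to FALSE: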

df1 %>% 
   mutate(match = +(data.table::between(dateA, dateB, dateC, NAbounds = FALSE)))
    ID      dateA      dateB      dateC match
 1:  1 2010-12-31 2010-01-01 2021-01-01     1
 2:  2 2021-01-01 2012-01-01 2012-12-31     0
 3:  2 2021-01-01 2013-01-01 2017-11-30     0
 4:  2 2021-01-01 2017-12-01 2021-01-01     1
 5:  3 2010-09-30 2010-05-01 2021-01-01     1
 6:  4 2015-12-31       <NA>       <NA>    NA
 7:  5 2010-09-30 2010-04-01 2021-01-01     1
 8:  6 2018-10-31 2014-05-01 2016-10-31     0
 9:  6 2018-10-31 2016-11-01 2021-01-01     1
10:  7 2016-02-01 2016-01-01 2021-01-01     1
11:  8 2015-05-01 2013-04-15 2014-12-31     0
12:  8 2015-05-01 2015-01-01 2015-05-01     1
13:  9 2013-09-01 2010-02-15 2013-01-01     0
14: 10 2016-01-01 2012-04-01 2021-01-01     1
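
Row 6 is 1 in the numeric conversion shown earlier but NA here, and that difference comes from NAbounds. The sketch below illustrates it on a single made-up value, assuming data.table is installed and that the documented behaviour holds: with NAbounds = TRUE a missing bound is treated as unbounded, and with NAbounds = NA the result is NA.

```r
# Assumes data.table is installed; `x` is an illustrative value, not from df1.
library(data.table)

x <- as.numeric(as.Date("2015-12-31"))

# Default NAbounds = TRUE: a missing bound counts as "no bound".
res_default <- between(x, NA_real_, NA_real_)

# NAbounds = NA: a missing bound propagates NA to the result.
res_na <- between(x, NA_real_, NA_real_, NAbounds = NA)

print(res_default)
print(res_na)
```

Which behaviour is wanted depends on whether a row with unknown bounds should count as a match or stay unknown.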

Alternatively, pass an NA converted with as.Date:

df1 %>% 
    mutate(match = +(data.table::between(dateA, dateB, dateC, 
         NAbounds = as.Date(NA))))
    ID      dateA      dateB      dateC match
 1:  1 2010-12-31 2010-01-01 2021-01-01     1
 2:  2 2021-01-01 2012-01-01 2012-12-31     0
 3:  2 2021-01-01 2013-01-01 2017-11-30     0
 4:  2 2021-01-01 2017-12-01 2021-01-01     1
 5:  3 2010-09-30 2010-05-01 2021-01-01     1
 6:  4 2015-12-31       <NA>       <NA>    NA
 7:  5 2010-09-30 2010-04-01 2021-01-01     1
 8:  6 2018-10-31 2014-05-01 2016-10-31     0
 9:  6 2018-10-31 2016-11-01 2021-01-01     1
10:  7 2016-02-01 2016-01-01 2021-01-01     1
11:  8 2015-05-01 2013-04-15 2014-12-31     0
12:  8 2015-05-01 2015-01-01 2015-05-01     1
13:  9 2013-09-01 2010-02-15 2013-01-01     0
14: 10 2016-01-01 2012-04-01 2021-01-01     1
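
As a cross-check that does not depend on either package, the same NA-preserving result can be sketched with plain elementwise comparisons in base R. This is not part of the original answer; the three short vectors and the name in_range are made up for illustration and mirror a matching row, the NA row, and a non-matching row.

```r
# Base R only: comparisons on Date vectors are vectorized and propagate NA.
dateA <- as.Date(c("2010-12-31", "2015-12-31", "2021-01-01"))
dateB <- as.Date(c("2010-01-01", NA,           "2012-01-01"))
dateC <- as.Date(c("2021-01-01", NA,           "2012-12-31"))

# Inclusive bounds on both sides; unary + turns TRUE/FALSE/NA into 1/0/NA.
in_range <- +(dateA >= dateB & dateA <= dateC)
print(in_range)
```

Inside mutate() the same expression could be written directly on the columns, without between at all.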
