Problem description
I have this data frame:
df1 <- structure(list(ID = c(1, 2, 2, 2, 3, 4, 5, 6, 6, 7, 8, 8, 9,
10), dateA = structure(c(14974, 18628, 18628, 18628, 14882, 16800,
14882, 17835, 17835, 16832, 16556, 16556, 15949, 16801), class = "Date"),
dateB = structure(c(14610, 15340, 15706, 17501, 14730, NA,
14700, 16191, 17106, 16801, 15810, 16436, 14655, 15431), class = "Date"),
dateC = structure(c(18628, 15705, 17500, 18628, 18628, NA,
18628, 17105, 18628, 18628, 16435, 16556, 15706, 18628), class = "Date")), row.names = c(NA,
-14L), class = c("data.table", "data.frame"))
ID dateA dateB dateC
1: 1 2010-12-31 2010-01-01 2021-01-01
2: 2 2021-01-01 2012-01-01 2012-12-31
3: 2 2021-01-01 2013-01-01 2017-11-30
4: 2 2021-01-01 2017-12-01 2021-01-01
5: 3 2010-09-30 2010-05-01 2021-01-01
6: 4 2015-12-31 <NA> <NA>
7: 5 2010-09-30 2010-04-01 2021-01-01
8: 6 2018-10-31 2014-05-01 2016-10-31
9: 6 2018-10-31 2016-11-01 2021-01-01
10: 7 2016-02-01 2016-01-01 2021-01-01
11: 8 2015-05-01 2013-04-15 2014-12-31
12: 8 2015-05-01 2015-01-01 2015-05-01
13: 9 2013-09-01 2010-02-15 2013-01-01
14: 10 2016-01-01 2012-04-01 2021-01-01
I want to check whether dateA falls between dateB and dateC.
My code:
library(dplyr)
df1 %>%
mutate(match= ifelse(between(dateA, dateB, dateC), 1, 0))
This gives:
Error: Problem with `mutate()` column `match`.
i `match = ifelse(between(dateA, dateB, dateC), 1, 0)`.
x Not yet implemented NAbounds=TRUE for this non-numeric and non-character type
If I remove the row containing NA, the code works:
df1 %>%
slice(-6) %>%
mutate(match= ifelse(between(dateA, dateB, dateC), 1, 0))
Is there a way to keep the NA row and still run the code?
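Recommended answer
It is not clear which between the OP used, because the input object is a data.table while the code is dplyr. If we assume both packages are loaded, each of them provides a between function, and the one from the package loaded last masks the one from the package loaded earlier. dplyr::between is not fully vectorized; ?dplyr::between (for the version used here) describes the bounds as:
left, right - Boundary values (must be scalars).
With dplyr::between, the check can therefore be applied row by row: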
df1 %>%
rowwise %>%
mutate(match = +(dplyr::between(dateA, dateB, dateC))) %>%
ungroup
-output
# A tibble: 14 × 5
ID dateA dateB dateC match
<dbl> <date> <date> <date> <int>
1 1 2010-12-31 2010-01-01 2021-01-01 1
2 2 2021-01-01 2012-01-01 2012-12-31 0
3 2 2021-01-01 2013-01-01 2017-11-30 0
4 2 2021-01-01 2017-12-01 2021-01-01 1
5 3 2010-09-30 2010-05-01 2021-01-01 1
6 4 2015-12-31 NA NA NA
7 5 2010-09-30 2010-04-01 2021-01-01 1
8 6 2018-10-31 2014-05-01 2016-10-31 0
9 6 2018-10-31 2016-11-01 2021-01-01 1
10 7 2016-02-01 2016-01-01 2021-01-01 1
11 8 2015-05-01 2013-04-15 2014-12-31 0
12 8 2015-05-01 2015-01-01 2015-05-01 1
13 9 2013-09-01 2010-02-15 2013-01-01 0
14 10 2016-01-01 2012-04-01 2021-01-01 1
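Since the behavior depends on which between an unqualified call resolves to, it can help to check this directly. The following is a minimal sketch, assuming both dplyr and data.table are installed; the load order shown is only an illustration and may differ from the OP's session.

```r
# Sketch: find out which package's between() a bare call resolves to.
# Assumes dplyr and data.table are both installed.
library(data.table)
library(dplyr)   # attached last here, so its between() masks data.table's

# Attached packages that provide between(), in lookup order
find("between")

# Name of the namespace that owns the between() a bare call will use
which_between <- environmentName(environment(between))
which_between
```

Calling the function with an explicit prefix, dplyr::between or data.table::between, avoids the ambiguity regardless of load order.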
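However, data.table::between has no such scalar restriction (judging by the error shown in the OP's post, the between in use appears to be the one from data.table). From ?data.table::between:
lower - Lower range bound. Either length 1 or same length as x.
upper - Upper range bound. Either length 1 or same length as x.
The class of the columns may still be a problem, even though the documentation suggests otherwise:
x - Any orderable vector, i.e., those with relevant methods for `<=`, such as numeric, character, Date, etc. in case of between and a numeric vector in case of inrange.
Converting from Date class to integer/numeric should work: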
df1 %>%
mutate(match = +(data.table::between(as.numeric(dateA),
as.numeric(dateB), as.numeric(dateC))))
-output
ID dateA dateB dateC match
1: 1 2010-12-31 2010-01-01 2021-01-01 1
2: 2 2021-01-01 2012-01-01 2012-12-31 0
3: 2 2021-01-01 2013-01-01 2017-11-30 0
4: 2 2021-01-01 2017-12-01 2021-01-01 1
5: 3 2010-09-30 2010-05-01 2021-01-01 1
6: 4 2015-12-31 <NA> <NA> 1
7: 5 2010-09-30 2010-04-01 2021-01-01 1
8: 6 2018-10-31 2014-05-01 2016-10-31 0
9: 6 2018-10-31 2016-11-01 2021-01-01 1
10: 7 2016-02-01 2016-01-01 2021-01-01 1
11: 8 2015-05-01 2013-04-15 2014-12-31 0
12: 8 2015-05-01 2015-01-01 2015-05-01 1
13: 9 2013-09-01 2010-02-15 2013-01-01 0
14: 10 2016-01-01 2012-04-01 2021-01-01 1
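Note that row 6 is reported as 1 in the output above. This is consistent with the default NAbounds = TRUE, under which a missing bound is interpreted as unbounded. A small sketch with made-up numeric values (x, lower and upper are illustrative and not taken from df1), assuming data.table is installed:

```r
# Sketch: with the default NAbounds = TRUE, an NA bound means "no bound".
# x, lower and upper are made-up numeric values for illustration only.
x     <- c(5, 5, 5)
lower <- c(NA, NA, 7)    # first two elements have no lower bound
upper <- c(10, NA, NA)   # last two elements have no upper bound

res <- data.table::between(x, lower, upper)
res
```

The first two elements come back TRUE because the missing bounds are ignored, while the third is FALSE because 5 is below the lower bound 7.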
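Digging further, the problem lies in the NAbounds argument, which is TRUE by default. The OP's data contains a row whose bounds are NA, and on the Date columns this triggers the error: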
df1 %>%
mutate(match = data.table::between(dateA, dateB, dateC))
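Error: Problem with `mutate()` column `match`.
ℹ `match = data.table::between(dateA, dateB, dateC)`.
✖ Not yet implemented NAbounds=TRUE for this non-numeric and non-character type
Run `rlang::last_error()` to see where the error occurred.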
We may need to set it to FALSE:
df1 %>%
mutate(match = +(data.table::between(dateA, dateB, dateC, NAbounds = FALSE)))
ID dateA dateB dateC match
1: 1 2010-12-31 2010-01-01 2021-01-01 1
2: 2 2021-01-01 2012-01-01 2012-12-31 0
3: 2 2021-01-01 2013-01-01 2017-11-30 0
4: 2 2021-01-01 2017-12-01 2021-01-01 1
5: 3 2010-09-30 2010-05-01 2021-01-01 1
6: 4 2015-12-31 <NA> <NA> NA
7: 5 2010-09-30 2010-04-01 2021-01-01 1
8: 6 2018-10-31 2014-05-01 2016-10-31 0
9: 6 2018-10-31 2016-11-01 2021-01-01 1
10: 7 2016-02-01 2016-01-01 2021-01-01 1
11: 8 2015-05-01 2013-04-15 2014-12-31 0
12: 8 2015-05-01 2015-01-01 2015-05-01 1
13: 9 2013-09-01 2010-02-15 2013-01-01 0
14: 10 2016-01-01 2012-04-01 2021-01-01 1
Alternatively, pass NAbounds an NA converted to Date class with as.Date:
df1 %>%
mutate(match = +(data.table::between(dateA, dateB, dateC,
NAbounds = as.Date(NA))))
ID dateA dateB dateC match
1: 1 2010-12-31 2010-01-01 2021-01-01 1
2: 2 2021-01-01 2012-01-01 2012-12-31 0
3: 2 2021-01-01 2013-01-01 2017-11-30 0
4: 2 2021-01-01 2017-12-01 2021-01-01 1
5: 3 2010-09-30 2010-05-01 2021-01-01 1
6: 4 2015-12-31 <NA> <NA> NA
7: 5 2010-09-30 2010-04-01 2021-01-01 1
8: 6 2018-10-31 2014-05-01 2016-10-31 0
9: 6 2018-10-31 2016-11-01 2021-01-01 1
10: 7 2016-02-01 2016-01-01 2021-01-01 1
11: 8 2015-05-01 2013-04-15 2014-12-31 0
12: 8 2015-05-01 2015-01-01 2015-05-01 1
13: 9 2013-09-01 2010-02-15 2013-01-01 0
14: 10 2016-01-01 2012-04-01 2021-01-01 1
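As a package-independent cross-check, the same result can be sketched with plain comparison operators, which propagate NA on their own. This is not part of the original answer; toy is a made-up three-row stand-in for df1 (rows 1, 2 and 6), used only to keep the example self-contained.

```r
# Sketch: base R comparisons return NA when a bound is NA,
# so no NAbounds handling is needed.
# toy is a made-up stand-in for df1 (its rows 1, 2 and 6).
toy <- data.frame(
  dateA = as.Date(c("2010-12-31", "2021-01-01", "2015-12-31")),
  dateB = as.Date(c("2010-01-01", "2012-01-01", NA)),
  dateC = as.Date(c("2021-01-01", "2012-12-31", NA))
)

# Inclusive bounds; as.integer() turns TRUE/FALSE/NA into 1/0/NA
toy$match <- with(toy, as.integer(dateA >= dateB & dateA <= dateC))
toy$match
```

The row with missing bounds yields NA, matching the NAbounds = FALSE output above.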