Problem description
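Consider the MWE below, where Amt is the amount of each Food item (from 1 to 40) and a second variable, Site, records where that food item is. I want a summary with the median and the count n() for each food, but without counting the NAs.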
MWE
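mwe <- data.frame(
  # `size` is not an argument of rep(): permute the levels once, then recycle them to 884 rows.
  # Recycling a 2-level and a 13-level vector gives exactly 34 rows per Food/Site pair.
  Site = rep(sample(c("Home", "Office")), length.out = 884),
  Food = rep(sample(c("Banana","Apple","Egg","Berry","Tomato","Potato","Bean","Pea","Nuts","Onion","Carrot","Cabbage","Eggplant")), length.out = 884),
  Amt = sample(seq(1, 40, by = 0.25), size = 884, replace = TRUE),
  stringsAsFactors = TRUE # factor columns, as in the printed output (<fct>)
)
# draw 100 distinct row indices (no replacement) so exactly 100 NAs are introduced in Amt
random <- sample(884, size = 100)
mwe$Amt[random] <- NA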
Data frame (first rows)
Site Food Amt
1 Office Cabbage 16.50
2 Home Apple 36.00
3 Office Egg 7.25
4 Home Onion 16.00
5 Office Eggplant 36.50
6 Home Nuts NA
Summary code
library(dplyr) # provides %>%, n() and ungroup()
dfsummary <- mwe %>%
  dplyr::group_by(Food, Site) %>%
  dplyr::summarise(Median = round(median(Amt, na.rm = TRUE), digits = 2), N = n()) %>%
  ungroup()
Output
# A tibble: 6 x 4
Food Site Median N
<fct> <fct> <dbl> <int>
1 Apple Home 17 34
2 Apple Office 22.2 34
3 Banana Home 19.5 34
4 Banana Office 19.9 34
5 Bean Home 20 34
6 Bean Office 18 34
Some foods have NA values in Amt, but those rows are still included in the N count. I only want to count the rows where Amt is not NA.
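The mismatch comes from how the two functions treat missing values: n() counts every row in the group, while median(..., na.rm = TRUE) drops the NAs before computing. A minimal base-R sketch on a made-up vector x (hypothetical data, not taken from the MWE) shows the difference, with length() standing in for n():

```r
# Hypothetical toy vector with one missing value
x <- c(4, NA, 10)

n_rows   <- length(x)               # counts every element, like n() inside summarise()
n_non_na <- sum(!is.na(x))          # counts only the non-missing values
med      <- median(x, na.rm = TRUE) # the NA is dropped before the median is computed

# n_rows = 3, n_non_na = 2, median = 7
print(c(n_rows = n_rows, n_non_na = n_non_na, median = med))
```

So the median already ignores the NA, but a plain row count does not.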
Recommended answer
We can filter out the rows with a missing Amt first and then run summarise without otherwise changing the code:
library(dplyr)
mwe %>%
  filter(!is.na(Amt)) %>%
  dplyr::group_by(Food, Site) %>%
  dplyr::summarise(Median = round(median(Amt, na.rm = TRUE), digits = 2),
                   N = n()) %>%
  ungroup()
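One side effect of filtering first is worth knowing: a Food/Site group in which every Amt is NA disappears from the result instead of showing N = 0, because group_by() only builds groups from the rows that survive the filter (with its default .drop = TRUE). The sketch below illustrates this in base R on a hypothetical four-row data frame toy, with tapply() standing in for group_by() plus n(); it is an analogue of the dplyr pipeline, not that code itself.

```r
# Hypothetical toy data: "Pea" has no non-missing Amt values at all
toy <- data.frame(
  Food = c("Apple", "Apple", "Pea", "Pea"),
  Amt  = c(1, NA, NA, NA),
  stringsAsFactors = FALSE # character column: only values present after filtering form groups
)

kept <- toy[!is.na(toy$Amt), ]                     # analogue of filter(!is.na(Amt))
n_filtered <- tapply(kept$Amt, kept$Food, length)  # analogue of group_by(Food) + n()

print(n_filtered) # only "Apple" remains; the all-NA "Pea" group is gone
```

If every Food/Site combination is guaranteed to have at least one non-NA Amt, as is very likely with the MWE, this makes no difference.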
Another option is to change n() to sum(!is.na(Amt)), which counts only the non-missing values in each group:
mwe %>%
  dplyr::group_by(Food, Site) %>%
  dplyr::summarise(Median = round(median(Amt, na.rm = TRUE), digits = 2),
                   N = sum(!is.na(Amt))) %>%
  ungroup()
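With sum(!is.na(Amt)) nothing is filtered, so every group is kept: a group whose Amt values are all NA appears with N = 0 and, because median() of an empty vector returns NA, Median = NA. A base-R sketch of the same per-group logic on a hypothetical data frame toy, with tapply() standing in for group_by() and summarise():

```r
# Hypothetical toy data: "Pea" has only missing Amt values
toy <- data.frame(
  Food = c("Apple", "Apple", "Pea", "Pea"),
  Amt  = c(1, NA, NA, NA),
  stringsAsFactors = FALSE
)

# analogue of N = sum(!is.na(Amt)): counts non-missing values, keeps every group
n_non_na <- tapply(toy$Amt, toy$Food, function(a) sum(!is.na(a)))

# analogue of Median = median(Amt, na.rm = TRUE): NA when a group has no values left
med <- tapply(toy$Amt, toy$Food, median, na.rm = TRUE)

print(n_non_na) # Apple 1, Pea 0
print(med)      # Apple 1, Pea NA
```

Which variant to prefer depends on whether all-NA groups should be reported (this one) or dropped (the filter variant).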