Error in chartr(.....): invalid input in 'utf8towcs'

Popularity: 504 Posted: 2022-10-16 Tags: special-characters utf-8 r

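Problem description

I need your help, because I get the same error no matter which approach I try. I want to remove special characters (accented letters and symbols) from the columns of a data frame. Thanks!

First I tried this: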

trata<-function(Campo){
  Campo<-Campo %>% chartr('ÇÆ£ØÞß&@Ð','XXXXXXXXX',.) %>%
    str_to_upper(locale = "es") %>% str_trim(side = "both") %>%
    str_replace_all("['´`^]","") %>% chartr('ÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÅÃÕÑ','AEIOUAEIOUAEIOUAEIOUAAOX', .)
  return(Campo)
}


trataRS<-function(Campo){
  Campo<-Campo %>% chartr('ÇÆ£ØÞßÐ','XXXXXXXXX',.) %>%
    str_to_upper(locale = "es") %>% str_trim(side = "both") %>%
    str_replace_all("['´`^]","") %>% chartr('ÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÅÃÕ','AEIOUAEIOUAEIOUAEIOUAAO', .)
  return(Campo)
}

Then I applied these functions to the columns:

Base$paterno_originador<-trata(Base$paterno_originador)
Base$razon_originador <- trataRS(Base$razon_originador)

But I get this error:

Error in chartr("ÇÆ£ØÞßÐ","XXXXXXXXX",.) : invalid input 'HÉCTOR' in 'utf8towcs'

So I tried another approach, which I found from @alexandre_lima:

rm_accent <- function(str,pattern="all") {
  if(!is.character(str))
    str <- as.character(str)
  
  pattern <- unique(pattern)
  
  if(any(pattern=="Ç"))
    pattern[pattern=="Ç"] <- "ç"
  
  symbols <- c(
    acute = "áéíóúÁÉÍÓÚýÝ",
    grave = "àèìòùÀÈÌÒÙ",
    circunflex = "âêîôûÂÊÎÔÛ",
    tilde = "ãõÃÕñÑ",
    umlaut = "äëïöüÄËÏÖÜÿ",
    cedil = "çÇ"
  )
  
  nudeSymbols <- c(
    acute = "aeiouAEIOUyY",
    grave = "aeiouAEIOU",
    circunflex = "AEIOUAEIOU",
    tilde = "AOAOXX",
    umlaut = "AEIOUAEIOUX",
    cedil = "XX"
  )
  
  accentTypes <- c("´","`","^","~","¨","ç")
  
  if(any(c("all","al","a","todos","t","to","tod","todo")%in%pattern)) # opcao retirar todos
    return(chartr(paste(symbols, collapse=""), paste(nudeSymbols, collapse=""), str))
  
  for(i in which(accentTypes%in%pattern))
    str <- chartr(symbols[i],nudeSymbols[i], str) 
  
  return(str)
}

But I got a similar error:

Error in chartr(paste(symbols, collapse = ""), paste(nudeSymbols, collapse = ""),  : 
  invalid input 'RUÍZ' in 'utf8towcs'

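I am including this to show you the encoding. It reports UTF-8 in the positions of the column that contain special characters:

Encoding(Base$NOMBRE_INCRENTATOR)
[1] "unknown"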

Recommended answer

The fix for the invalid input in 'utf8towcs' error is to declare the encoding when you import the .csv file into R.

When you import the file with read.csv() or read.delim(), specify encoding = "UTF-8" or encoding = "latin1" (the name R uses for Latin-1). I tried Latin-1, and it solved the problem.
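
To see why the declared encoding matters, here is a minimal sketch, assuming the file was really saved as Latin-1. The vector `x` is a made-up stand-in for a column such as Base$paterno_originador: it holds the Latin-1 bytes of HÉCTOR and RUÍZ with no encoding mark, which is the kind of input that chartr() rejects in a UTF-8 session. The sketch flags the invalid strings with validUTF8() and converts them with iconv(); both the sample values and the Latin-1 assumption are mine, not taken from the question's data.

```r
# Hypothetical stand-in for a column that was read without declaring its encoding:
# raw Latin-1 bytes (0xC9 = É, 0xCD = Í), so the strings carry no encoding mark.
x <- c(rawToChar(as.raw(c(0x48, 0xC9, 0x43, 0x54, 0x4F, 0x52))),  # HÉCTOR
       rawToChar(as.raw(c(0x52, 0x55, 0xCD, 0x5A))))              # RUÍZ

Encoding(x)    # "unknown" "unknown"
validUTF8(x)   # FALSE FALSE: these bytes are not valid UTF-8

# Convert from the assumed source encoding (Latin-1) to UTF-8.
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(x_utf8)   # TRUE TRUE

# The accented capitals ÁÉÍÓÚ, written as escapes so the script itself is ASCII.
old <- "\u00c1\u00c9\u00cd\u00d3\u00da"
chartr(old, "AEIOU", x_utf8)   # "HECTOR" "RUIZ"
```

If validUTF8() returns FALSE for some rows of a real column, those are the rows worth inspecting before deciding which source encoding to assume.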

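You may also want to check what your system encoding is and make the two match. You can do this with Sys.getlocale() (and change it with Sys.setlocale()). For example, on my system:

Sys.getlocale()
[1] "en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8"

Example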
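
As a small sketch of the same check narrowed to what matters for character handling, the lines below query only the LC_CTYPE category and ask R whether the session is running in a UTF-8 locale. Both functions are part of base R; the variable names are only illustrative, and the values they return depend on your system.

```r
# Character-type locale only, e.g. "en_GB.UTF-8" on the system shown above.
ctype <- Sys.getlocale("LC_CTYPE")

# l10n_info() reports, among other things, whether the session is UTF-8.
info <- l10n_info()
is_utf8_session <- info[["UTF-8"]]

print(ctype)
print(is_utf8_session)
```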

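# Semicolon-separated text file; mark non-ASCII strings as Latin-1.
# The values R recognises for this argument are "latin1" and "UTF-8".
data <- read.delim("input/data/data.txt", sep = ";",
                   encoding = "latin1", stringsAsFactors = FALSE)

# The same import for a .csv file.
data <- read.csv("input/data/data.csv", sep = ";",
                 encoding = "latin1", stringsAsFactors = FALSE)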
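
The import step can be tried end to end with a throwaway file. This is only a sketch under stated assumptions: the file contents, the column name `nombre`, and the temporary path are invented for illustration, and the file is written byte by byte so that it is Latin-1 regardless of your session's locale. After reading with the encoding declared, enc2utf8() converts the marked strings to UTF-8 so that chartr() can process them.

```r
# Write a tiny Latin-1 file: header "nombre", then HÉCTOR and RUÍZ (hypothetical data).
path <- tempfile(fileext = ".csv")
bytes <- c(charToRaw("nombre\nH"), as.raw(0xC9), charToRaw("CTOR\nRU"),
           as.raw(0xCD), charToRaw("Z\n"))
writeBin(bytes, path)

# Declare the encoding on import so the non-ASCII strings are marked as Latin-1.
dat <- read.csv(path, sep = ";", encoding = "latin1", stringsAsFactors = FALSE)
Encoding(dat$nombre)

# Convert the marked strings to UTF-8, then strip the accents.
nombre_utf8 <- enc2utf8(dat$nombre)
old <- "\u00c1\u00c9\u00cd\u00d3\u00da"   # ÁÉÍÓÚ as escapes
clean <- chartr(old, "AEIOU", nombre_utf8)
print(clean)

unlink(path)
```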

Best regards
