Python regex: finding words and emoticons

Views: 465 Published: 2022-10-16 Tags: regex python emoticons

Question

I want to find matches between a tweet and a list of strings containing words, phrases, and emoticons. Here is my code:

import re

words = [':)','and i','sleeping','... :)','! <3','facebook']
regex = re.compile(r'\b%s\b|(:\(|:\))+' % '\\b|\\b'.join(words), flags=re.IGNORECASE)

I keep getting this error:

error: unbalanced parenthesis

Apparently there is something wrong with the code and it cannot match emoticons. Any idea how to fix it?

Recommended answer

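The error occurs because entries such as ':)' contain an unescaped ')', which the regex engine reads as the end of a group. The re module has a function, re.escape, that takes care of escaping the words correctly, so you could just use: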

words = map(re.escape, [':)','and i','sleeping','... :)','! <3','facebook'])
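As a minimal sketch of how the escaped words plug back into the pattern from the question: the pattern shape is the asker's, while the sample tweet and the names `escaped` and `matches` are made up for illustration.

```python
import re

words = [':)', 'and i', 'sleeping', '... :)', '! <3', 'facebook']

# Escape every entry so that ')' and '.' are matched literally
# instead of being read as regex syntax.
escaped = [re.escape(w) for w in words]

# Same pattern shape as in the question, now built from the escaped entries.
regex = re.compile(
    r'\b%s\b|(:\(|:\))+' % r'\b|\b'.join(escaped),
    flags=re.IGNORECASE,
)

# Hypothetical tweet, used only to show the pattern compiling and matching.
tweet = 'Sleeping all day and I love Facebook :)'
matches = [m.group(0) for m in regex.finditer(tweet)]
print(matches)  # ['Sleeping', 'and I', 'Facebook', ':)']
```

Note that the ':)' here is found by the trailing `(:\(|:\))+` alternative, not by the `\b:\)\b` one.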

Note that word boundaries might not work as you expect when used with words that don't start or end with actual word characters.
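To illustrate the caveat: `\b` only matches between a word character and a non-word character, so an entry like '! <3' wrapped in `\b` cannot match when it is preceded by a space. One possible workaround, which is not part of the original answer, is to replace `\b` with the lookarounds `(?<!\w)` and `(?!\w)`. The sample strings and variable names below are assumptions for illustration only.

```python
import re

words = [':)', 'and i', 'sleeping', '... :)', '! <3', 'facebook']

# '\b' before '!' needs a word character on the other side, so this
# pattern finds nothing in a string where '!' follows a space.
with_boundaries = re.compile(r'\b%s\b' % re.escape('! <3'))
print(with_boundaries.search('wow ! <3'))  # None

# Alternative sketch (an assumption, not the answer's method): require only
# that the match is not glued to a word character on either side.
# Longer entries go first so '... :)' is tried before ':)'.
alternatives = sorted(words, key=len, reverse=True)
pattern = r'(?<!\w)(?:%s)(?!\w)' % '|'.join(re.escape(w) for w in alternatives)
regex = re.compile(pattern, flags=re.IGNORECASE)

# Hypothetical tweet containing words, a phrase, and emoticons.
tweet = 'Sleeping on Facebook ... :) and I ! <3'
found = [m.group(0) for m in regex.finditer(tweet)]
print(found)  # ['Sleeping', 'Facebook', '... :)', 'and I', '! <3']
```

The lookarounds still reject partial-word hits such as 'facebooking', which is the job `\b` was doing for the plain words.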
