How to use maketrans in Python with UTF-8 files

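I wrote a script that processes text by replacing every symbol in it with a space, using Python's maketrans and translate. It works fine on ASCII-encoded files, but on a UTF-8 file it raises an error saying the maketrans arguments are not the same length, even though they look the same length to me: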

File "/Users/lgq/Desktop/p3.py", line 10, in text_to_words

"abcdefghijklmnopqrstuvwxyz                                                   " 

ValueError: the first two maketrans arguments must have equal length

I read that maketrans cannot be used with UTF-8. How should I replace characters in UTF-8 text, then? Any pointers would be appreciated.

def text_to_words(the_text):
    """ 
        Return a list of words with all punctuation removed,
        and all in lowercase.
    """
    my_substitutions = the_text.maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&*+,-./:;<=>?@[]^_`{|}~\\",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz                                            ")
    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds


def get_words_in_book(filename):
    """ Read a book from filename, and return a list of its words."""
    f = open(filename, "r", encoding="utf-8")
    content = f.read()
    f.close()
    wds = text_to_words(content)
    return wds


book_words = get_words_in_book("alice.txt")
print("There are {0} words in the book, the first 100 are\n{1}".
        format(len(book_words), book_words[:100]))

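First, these two strings are not the same length: \" is a single character, and \\ is also a single character.
You can check this with len.
Also, when asking about string problems, it is best to state your Python version.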
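
The len check suggested above might look like the following sketch for Python 3. The names `find` and `replace` are only illustrative, and the 44 spaces in `replace` are an assumed count chosen to show a mismatch, not a measurement of the original code.

```python
# Escape sequences count as one character each.
assert len("\"") == 1
assert len("\\") == 1

# Illustrative names; `find` is the first argument as it appears in the question.
find = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&*+,-./:;<=>?@[]^_`{|}~\\"
# Assumed padding: 44 spaces, deliberately not matching len(find).
replace = "abcdefghijklmnopqrstuvwxyz" + " " * 44

print(len(find), len(replace))

try:
    str.maketrans(find, replace)
    error_message = None
except ValueError as exc:
    # Same error as in the question: the two strings differ in length.
    error_message = str(exc)
    print(error_message)
```

Printing both lengths before calling maketrans makes the mismatch visible, which is hard to see by eye in a long run of spaces.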

The maketrans arguments are not the same length

 my_substitutions = the_text.maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&*+,-./:;<=>?@[]^_`{|}~\\",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz                                            ")

Test code:

# -*- coding: utf-8 -*-
from string import maketrans  # Python 2 only

def text_to_words(the_text):
    """ 
        Return a list of words with all punctuation removed,
        and all in lowercase.
    """
    # If you find any of these
    find = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&*+,-./:;<=>?@[]^_`{|}~\\"
    # Replace them by these (padded with spaces to the same length as find)
    replace = "abcdefghijklmnopqrstuvwxyz" + " " * (len(find) - 26)
    my_substitutions = maketrans(find, replace)
    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds

print(text_to_words("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&*+,-./:;<=>?@[]^_`{|}~\\测试"))

output

['abcdefghijklmnopqrstuvwxyz', '\xe6\xb5\x8b\xe8\xaf\x95']

This is the result when run under Python 2.
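
For Python 3, where str is already Unicode, one possible way to avoid the length-matching problem is to pass a dict to str.maketrans instead of two strings. The sketch below is not the original poster's code. It assumes that only ASCII punctuation and digits (taken from the standard string module) should become spaces, and it uses lower() for lowercasing, which also affects non-ASCII letters.

```python
import string

def text_to_words(the_text):
    """Return lowercase words with ASCII punctuation and digits removed."""
    # Building the table from a dict means there are no two strings
    # whose lengths have to match.
    # Assumption: only ASCII punctuation and digits are separators;
    # other characters (for example Chinese text) are left untouched.
    table = str.maketrans({ch: " " for ch in string.punctuation + string.digits})
    return the_text.lower().translate(table).split()

words = text_to_words("Hello, World! 测试 42 times... it's OK")
print(words)
```

When reading the book from disk, opening the file with encoding="utf-8" as in the question already yields str objects, so the same function can be applied to the file contents directly.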
