Python - How can I count specific bigram words?
I want to find and count specific bigrams, such as "red apple", in a text file. I turned the text file into a word list, so I couldn't use a regex to count the whole phrase (i.e. the bigram). (Or can I?)
How can I count specific bigrams in a text file without using NLTK or another module? Could a regex be the solution?
Why did you turn the text file into a list? That's not memory efficient. Instead of building a list, you can get the text directly with the file.read() method and search it as a single string.
import re

text = 'i red apples , green apples red apples more.'
bigram = ['red apples', 'green apples']

# Count how many times each phrase occurs in the text
for i in bigram:
    print('found', i, len(re.findall(i, text)))
out:
found red apples 2
found green apples 1
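If you want this to work on a real file, here is a rough sketch of how it could look. The helper names count_bigrams and count_bigrams_in_file and the path argument are made up for illustration; they are not from the question. It assumes you want whole-word, case-insensitive matches, so it escapes each word with re.escape, adds \b word boundaries, and allows any whitespace (including a line break) between the two words.

```python
import re

# Hypothetical helper names, chosen for this sketch only.
def count_bigrams(text, bigrams):
    """Return {bigram: count} for whole-word, case-insensitive matches."""
    counts = {}
    for bigram in bigrams:
        # re.escape protects against regex metacharacters in the phrase.
        # \b keeps "red apples" from matching inside "bored apples".
        # \s+ lets the two words be split by any whitespace, even a newline.
        words = [re.escape(w) for w in bigram.split()]
        pattern = r'\b' + r'\s+'.join(words) + r'\b'
        counts[bigram] = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts

def count_bigrams_in_file(path, bigrams):
    """Read the whole file with file.read() and count the bigrams in it."""
    with open(path) as f:
        return count_bigrams(f.read(), bigrams)

sample = 'i red apples , green apples red apples more.'
print(count_bigrams(sample, ['red apples', 'green apples']))
# {'red apples': 2, 'green apples': 1}
```

Note that file.read() loads the entire file into memory, which is fine for ordinary text files. For very large files you would have to read in chunks and handle bigrams that straddle a chunk boundary, which this sketch does not do.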