
JacyYYMode

SanghounSong edited this page Dec 19, 2014 · 6 revisions

MeCab

To use YY mode as the input pipeline of Jacy, three components must be installed on your machine: (1) the mecab binary, (2) a mecab dictionary, and (3) the mecab python wrapper.

mecab binary

wget https://mecab.googlecode.com/files/mecab-0.996.tar.gz
tar xvf mecab-0.996.tar.gz 
mv mecab-0.996 mecab
cd mecab
./configure
make
make check
sudo make install

mecab dictionary

wget https://mecab.googlecode.com/files/mecab-ipadic-2.7.0-20070801.tar.gz
tar xvf mecab-ipadic-2.7.0-20070801.tar.gz 
mv mecab-ipadic-2.7.0-20070801 mecab-ipadic
cd mecab-ipadic
./configure --with-charset=utf8
make
sudo make install

mecab python wrapper

wget https://mecab.googlecode.com/files/mecab-python-0.996.tar.gz
tar xvf mecab-python-0.996.tar.gz 
mv mecab-python-0.996 mecab-python
cd mecab-python
python setup.py build
sudo python setup.py install

test

>>> import MeCab
>>> m = MeCab.Tagger('-Ochasen')
>>> print(m.parse('私は昨日、引っ越ししました。'))
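
If the test above fails, it helps to know which component is at fault. The following is a minimal sketch, not part of Jacy; `check_mecab` is a hypothetical helper name. It assumes that a missing wrapper surfaces as `ImportError` and that a tagger which cannot be initialized (for example, because no dictionary is found) surfaces as `RuntimeError`.

```python
# -*- coding: utf-8 -*-
def check_mecab(sentence):
    """Return (ok, message): ok is True only if MeCab analysed the sentence."""
    try:
        import MeCab  # the mecab python wrapper
    except ImportError:
        return False, 'mecab python wrapper not importable'
    try:
        tagger = MeCab.Tagger('-Ochasen')
    except RuntimeError as err:
        # Assumption: initialization problems (e.g. no dictionary) land here.
        return False, 'MeCab.Tagger could not be created: %s' % err
    return True, tagger.parse(sentence)


ok, message = check_mecab('私は昨日、引っ越ししました。')
print(message)
```

When `ok` is false, the message points to either the wrapper or the tagger setup rather than stopping with a bare traceback.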

jpn2yy.py

You can also run this script (under jacy/utils) on its own. It reads Japanese text from standard input and prints the corresponding YY tokens. From within that directory:

$ echo "雨が降った." | python jpn2yy.py
(0, 0, 1, <0:1>, 1, "雨", 0, "null", "名詞-一般:n-n" 1.0)(1, 1, 2, <1:2>, 1, "が", 0, "null", "助詞-格助詞-一般:n-n" 1.0)(2, 2, 3, <2:4>, 1, "降っ", 0, "null", "動詞-自立:五段・ラ行-連用タ接続" 1.0)(3, 3, 4, <4:5>, 1, "た", 0, "null", "助動詞:特殊・タ-基本形" 1.0)
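
To inspect this output programmatically, the tokens can be split into fields. The sketch below is an illustration only and is not part of Jacy. The field names in `YYToken` are labels chosen here, based on one reading of the YY token layout (id, start and end vertices, character span, path, form, ipos, lrule, tag with probability). The regular expression assumes exactly the shape shown in the output above.

```python
# -*- coding: utf-8 -*-
import re
from collections import namedtuple

# Field names are illustrative labels, not an official schema.
YYToken = namedtuple(
    'YYToken',
    'id start end char_from char_to path form ipos lrule tag prob')

TOKEN_RE = re.compile(
    r'\((\d+),\s*(\d+),\s*(\d+),\s*<(\d+):(\d+)>,\s*(\d+),\s*'
    r'"([^"]*)",\s*(\d+),\s*"([^"]*)",\s*"([^"]*)"\s+([\d.]+)\)')


def parse_yy(text):
    """Return a list of YYToken tuples found in a YY string."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        g = match.groups()
        tokens.append(YYToken(
            int(g[0]), int(g[1]), int(g[2]), int(g[3]), int(g[4]),
            int(g[5]), g[6], int(g[7]), g[8], g[9], float(g[10])))
    return tokens


# The output of jpn2yy.py shown above, copied verbatim.
SAMPLE = (
    '(0, 0, 1, <0:1>, 1, "雨", 0, "null", "名詞-一般:n-n" 1.0)'
    '(1, 1, 2, <1:2>, 1, "が", 0, "null", "助詞-格助詞-一般:n-n" 1.0)'
    '(2, 2, 3, <2:4>, 1, "降っ", 0, "null", "動詞-自立:五段・ラ行-連用タ接続" 1.0)'
    '(3, 3, 4, <4:5>, 1, "た", 0, "null", "助動詞:特殊・タ-基本形" 1.0)')

for token in parse_yy(SAMPLE):
    print(token.form, token.char_from, token.char_to, token.tag)
```

Each token carries its character span, so `降っ` is reported as covering characters 2 to 4 of the input.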

The YY output can be fed directly to ACE, which reads YY input when invoked with the -y option. From the top-level jacy directory, compile the grammar and then parse:

$ ace -g ace/config.tdl -G jacy.dat
$ echo "雨が降った." | python utils/jpn2yy.py | ace -g jacy.dat -y1Tf
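
The same pipeline can be driven from Python. This is a minimal sketch, assuming Python 3.5 or later (for `subprocess.run`), a compiled `jacy.dat` in the current directory, and `ace` on the `PATH`. `build_commands` and `parse_sentence` are hypothetical helper names; the command-line arguments are exactly those used above.

```python
# -*- coding: utf-8 -*-
import subprocess


def build_commands(grammar='jacy.dat', script='utils/jpn2yy.py'):
    """Return the argv lists for the tokenizer and for ACE in YY mode."""
    tokenize = ['python', script]
    parse = ['ace', '-g', grammar, '-y1Tf']
    return tokenize, parse


def parse_sentence(sentence, grammar='jacy.dat', script='utils/jpn2yy.py'):
    """Pipe one sentence through jpn2yy.py and ACE; return ACE's stdout."""
    tokenize, parse = build_commands(grammar, script)
    # Step 1: Japanese text in, YY tokens out.
    yy = subprocess.run(tokenize, input=(sentence + '\n').encode('utf-8'),
                        stdout=subprocess.PIPE, check=True).stdout
    # Step 2: YY tokens in, ACE output out.
    result = subprocess.run(parse, input=yy,
                            stdout=subprocess.PIPE, check=True).stdout
    return result.decode('utf-8')


# Example call (requires the tools above to be installed):
#     print(parse_sentence('雨が降った.'))
```

Passing `check=True` makes a failure in either step raise `subprocess.CalledProcessError` instead of silently handing empty input to the next stage.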