A quick-and-dirty implementation of kanji conversion

It only does one-shot conversion and lists conversion candidates. That is probably the most basic form there is.

Doing Japanese input properly requires morphological analysis and a lot more, which is beyond me. There are libraries for that, but I am not that interested.

Someone please update Mozc.

The dictionary is a JSON file. Not being able to write comments in it is a pain, but Python supports JSON in its standard library, which is convenient.

I used the Wikipedia article on tries as a reference: https://en.wikipedia.org/wiki/Trie

Source code (trie.py)

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Copyright (C) 2018 Vanaestea

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""

from collections import deque

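class Node:
    """One trie node: the candidates for the reading that ends here,
    plus child nodes keyed by the next character."""

    def __init__(self):
        self.values = []
        self.children = {}


class Trie:
    """Maps a reading (kana string) to a list of conversion candidates."""

    def __init__(self, vocabulary=None):
        # vocabulary: dict of reading -> list of candidates (e.g. loaded from JSON)
        self.__root = Node()
        if vocabulary is not None:
            for key, values in vocabulary.items():
                self.insert(key, values)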

    def insert(self, key, values):
        """Add candidates for the reading `key`, creating nodes as needed."""
        node = self.__root
        for char in key:
            # Follow the existing child for this character, or create one.
            if char not in node.children:
                node.children[char] = Node()
            node = node.children[char]

        node.values.extend(values)

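    def query(self, key):
        """Return the candidates stored for exactly `key` (one-shot conversion)."""
        node = self.__root
        for char in key:
            if char not in node.children:
                return []
            node = node.children[char]
        # Return a copy so callers cannot modify the trie's internal list.
        return list(node.values)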

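    def suggest(self, key, n=10):
        """Return up to n candidates for readings that start with `key`.

        If only part of `key` is in the trie, the longest matching prefix is
        used; if not even the first character matches, the result is empty.
        """
        node = self.__root
        for char in key:
            if char in node.children:
                node = node.children[char]
            else:
                break

        if node is self.__root:
            return []

        suggestions = list(node.values)

        # Breadth-first walk, so candidates for shorter readings come first.
        que = deque(node.children.values())
        while que and len(suggestions) < n:
            child = que.popleft()
            suggestions.extend(child.values)
            que.extend(child.children.values())

        # The last node visited may overshoot n, so trim.
        return suggestions[:n]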

vocabulary.json

{
    "い": ["胃", "異", "煎"],
    "いい": ["良い", "謂", "易々"],
    "いいん": ["委員", "医院"],
    "いいんちょう": ["委員長"]
}
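
As noted above, JSON has no comment syntax. One possible workaround, sketched here, is to keep notes under a reserved key and drop it after parsing. The "//" key and the load_vocabulary helper are assumptions made for illustration; neither is part of the original file or code.

```python
import json

# Same shape as vocabulary.json: reading -> list of candidates.
# The "//" entry is a hypothetical convention for notes, not part of the original file.
RAW = """
{
    "//": "JSON has no comments, so notes live under a reserved key",
    "い": ["胃", "異", "煎"],
    "いいん": ["委員", "医院"]
}
"""


def load_vocabulary(text):
    """Parse the JSON text, drop the note key, and check the value types."""
    data = json.loads(text)
    data.pop("//", None)
    for reading, candidates in data.items():
        if not isinstance(candidates, list):
            raise ValueError("candidates for %r must be a list" % reading)
    return data


vocabulary = load_vocabulary(RAW)
print(vocabulary["いいん"])  # ['委員', '医院']
```

The resulting dict can be passed to Trie(vocabulary) unchanged, since the note key is gone before the trie sees it.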

Usage example

$ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from trie import Trie
>>> import json
>>> with open('vocabulary.json', 'r') as f:
...     vocabulary = json.load(f)
...
>>> trie = Trie(vocabulary)
>>> trie.query('いいんちょう')
['委員長']
>>> trie.query('い')
['胃', '異', '煎']
>>> trie.query('がっきゅういいん')
[]
>>> trie.suggest('いい')
['良い', '謂', '易々', '委員', '医院', '委員長']
>>>
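
The order of the suggest output above follows from the breadth-first walk: candidates for shorter readings come first. The sketch below shows the same idea as a lazy generator, so the caller decides how many candidates to take. It uses nested dicts instead of the Node class, and the END marker, build, and iter_suggestions are names made up for this sketch. Unlike Trie.suggest, it yields nothing when the prefix is not fully present.

```python
from collections import deque
from itertools import islice

END = ""  # hypothetical marker key for the candidates stored at a node


def build(vocabulary):
    """Build a nested-dict trie from reading -> candidates."""
    root = {}
    for reading, candidates in vocabulary.items():
        node = root
        for char in reading:
            node = node.setdefault(char, {})
        node.setdefault(END, []).extend(candidates)
    return root


def iter_suggestions(root, prefix):
    """Yield candidates for every reading starting with prefix, breadth-first."""
    node = root
    for char in prefix:
        if char not in node:
            return  # prefix not in the trie: no suggestions
        node = node[char]
    queue = deque([node])
    while queue:
        node = queue.popleft()
        for key, value in node.items():
            if key == END:
                yield from value      # candidates stored at this node
            else:
                queue.append(value)   # visit the child node later


vocabulary = {
    "い": ["胃", "異", "煎"],
    "いい": ["良い", "謂", "易々"],
    "いいん": ["委員", "医院"],
    "いいんちょう": ["委員長"],
}
root = build(vocabulary)
print(list(islice(iter_suggestions(root, "いい"), 4)))
# ['良い', '謂', '易々', '委員']
```

Because the generator stops as soon as the caller stops asking, taking the first few candidates never walks the whole subtree.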