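A rough implementation of kana-to-kanji conversion
It only does one-shot conversion (look up a reading, get its kanji) and lists conversion candidates. Probably the most basic approach there is.
Doing Japanese input properly requires morphological analysis and a lot more, which is beyond me. There are libraries for it, but I'm not that interested either.
Mozc, please ship an update.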
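The vocabulary is stored as JSON. Not being able to write comments in it is annoying, but Python supports JSON in the standard library, which is convenient.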
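As a rough sketch of what loading such a vocabulary could look like with the standard `json` module: the helper name `load_vocabulary`, the inline `RAW` string, and the shape check are assumptions made for illustration, not part of the original code.

```python
import json

# Hypothetical inline sample; the real data lives in vocabulary.json.
RAW = '{"い": ["胃", "異"], "いい": ["良い"]}'


def load_vocabulary(text):
    """Parse JSON text into a dict mapping readings to candidate lists."""
    vocabulary = json.loads(text)
    # Assumption: every reading maps to a list of candidate strings.
    for reading, candidates in vocabulary.items():
        if not isinstance(candidates, list):
            raise ValueError("candidates for %r must be a list" % reading)
    return vocabulary


vocabulary = load_vocabulary(RAW)
print(vocabulary["い"])  # ['胃', '異']
```

When reading from a file instead, passing `encoding='utf-8'` to `open()` avoids depending on the system locale.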
I used the Wikipedia article on tries as a reference: https://en.wikipedia.org/wiki/Trie
Source code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""
Copyright (C) 2018 Vanaestea

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""

from collections import deque


class Node(object):
    def __init__(self):
        self.values = []    # conversion candidates for the reading ending here
        self.children = {}  # next character -> child Node


class Trie(object):
    def __init__(self, vocabulary=None):
        self.__root = Node()
        if vocabulary is not None:
            for key, values in vocabulary.items():
                self.insert(key, values)

    def insert(self, key, values):
        node = self.__root
        i = 0
        # Follow the part of the key that is already in the trie.
        while i < len(key):
            if key[i] in node.children:
                node = node.children[key[i]]
                i += 1
            else:
                break
        # Create nodes for the remaining characters.
        while i < len(key):
            node.children[key[i]] = Node()
            node = node.children[key[i]]
            i += 1
        node.values.extend(values)

    def query(self, key):
        # Exact match only: an unknown reading yields an empty list.
        node = self.__root
        for char in key:
            if char in node.children:
                node = node.children[char]
            else:
                return list()
        return node.values

    def suggest(self, key, n=10):
        suggestions = []
        node = self.__root
        # Descend as far as the key matches; the unmatched rest is ignored.
        for char in key:
            if char in node.children:
                node = node.children[char]
            else:
                break
        if node is not self.__root:
            suggestions.extend(node.values)
        # Breadth-first walk, so shorter readings come first.
        que = deque()
        que.append(node.children)
        while len(que) > 0:
            for child in que.popleft().values():
                que.append(child.children)
                suggestions.extend(child.values)
                if len(suggestions) >= n:
                    # Cap the result at n candidates.
                    return suggestions[:n]
        return suggestions[:n]
vocabulary.json
{ "い": ["胃", "異", "煎"], "いい": ["良い", "謂", "易々"], "いいん": ["委員", "医院"], "いいんちょう": ["委員長"] }
Usage example
$ python3
Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from trie import Trie
>>> import json
>>> with open('vocabulary.json', 'r') as f:
...     vocabulary = json.load(f)
...
>>> trie = Trie(vocabulary)
>>> trie.query('いいんちょう')
['委員長']
>>> trie.query('い')
['胃', '異', '煎']
>>> trie.query('がっきゅういいん')
[]
>>> trie.suggest('いい')
['良い', '謂', '易々', '委員', '医院', '委員長']
>>>