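OpenCV, imported in Python as cv2, is an open-source library for computer vision. It is widely used to build image-processing and computer-vision applications. With cv2 you can carry out many image-processing tasks, one of which is finding patterns in an image. Pattern detection is an important area of computer vision: applications such as face detection and character recognition commonly rely on it.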

This article explains how to use computer vision and a neural network to find a word in cursive text that was handwritten more than 100 years ago.

This short example uses the cv2 package, which focuses on computer vision, to parse an image containing cursive handwriting and extract a specific word, guided by a model trained with TensorFlow / Keras.

import cv2
import numpy as np
import pandas as pd
from google.colab.patches import cv2_imshow
from statistics import mean
import tensorflow as tf

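The image is loaded in color and converted to HSV (hue, saturation, value). The reduction to two tones, black and white, happens in a later step.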

image = cv2.imread('test.jpg')
# Convert from BGR to HSV; the value (brightness) channel is thresholded later
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
cv2_imshow(image)

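Behind the scenes the image is held as a tensor (a NumPy array) with three values per pixel. After the HSV conversion, these three values are the hue, saturation, and value (brightness) of the pixel.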

image

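Next, the inRange function sets each pixel to one of two values, white or black. The result then has to be inverted, because the usual convention in machine learning is a white foreground on a black background.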

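# Bounds per channel: hue 0, saturation 0, value between 120 and 255 (light, unsaturated paper)
lower = np.array([0, 0, 120])
upper = np.array([0, 0, 255])
msk = cv2.inRange(image, lower, upper)
# Invert so that the ink is white and the paper is black
msk = cv2.bitwise_not(msk)
cv2_imshow(msk)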

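The next two functions are erosion and dilation. Erosion removes small white specks that are probably noise (think of sand eroding a stone). Dilation then enlarges the white areas to produce a blurred layout in which one continuous block should, in most cases, correspond to one word.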

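# A kernel is required; it sets the shape of the structuring element (rectangle, ellipse, cross). Here it is a rectangle.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
# Erode the white areas to remove noise
rrmsk = cv2.erode(msk, kernel, iterations=1)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10, 5))
# Dilate the eroded mask so that the letters of a word merge into one block
rrmsk = cv2.dilate(rrmsk, kernel, iterations=1)
cv2_imshow(rrmsk)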

This section serves as an example of locating the words in the original image. I also created a words list that stores the cropped rectangle of each word found this way.

contours, hierarchy = cv2.findContours(rrmsk, mode=cv2.RETR_EXTERNAL, method=cv2.CHAIN_APPROX_SIMPLE)
minw = 50
minh = 10
image = cv2.imread('test.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
cleancontours = []
words = []
for contour in contours:
    x, y, w, h = cv2.boundingRect(contour)
    # Keep only boxes large enough to be a word
    if w >= minw and h >= minh:
        cleancontours.append(contour)
        # Crop with a 5-pixel margin; clamp at 0 so a negative index cannot wrap around
        word = image[max(y - 5, 0):y + h + 5, max(x - 5, 0):x + w + 5]
        words.append(word)
cleancontours = tuple(cleancontours)
# Draw the kept contours on a fresh copy of the image
imageC = cv2.imread('test.jpg')
imageC = cv2.cvtColor(imageC, cv2.COLOR_BGR2HSV)
cv2.drawContours(imageC, cleancontours, -1, (0, 255, 0), 3)
cv2_imshow(imageC)

These word crops are then processed in the same way as in the previous steps.

image = cv2.imread('test.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
lower = np.array([0, 0, 150])
upper = np.array([0, 0, 255])
masks = []
for word in words:
    # Skip empty crops (zero rows or zero columns)
    if word.size > 0:
        mask = cv2.inRange(word, lower, upper)
        mask = cv2.bitwise_not(mask)
        masks.append(mask)

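Here we work on the positive training samples. The word we are looking for in the image is "Garcia". I collected sample images of the same word written by the same person. These images were cleaned in the same way as above and resized to a common size (the mean height and width over all samples). The third image shown here is the mean of all 16 images.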

lower = np.array([0, 0, 150])
upper = np.array([0, 0, 255])
# sample1.jpg ... sample16.jpg: the same word written by the same person
samples = [f'sample{i}.jpg' for i in range(1, 17)]
train = []
hs = []
ws = []
for path in samples:
    im = cv2.imread(path)
    im = cv2.cvtColor(im, cv2.COLOR_BGR2HSV)
    im = cv2.inRange(im, lower, upper)
    im = cv2.bitwise_not(im)
    height, width = im.shape
    hs.append(height)
    ws.append(width)
    train.append(im)
from statistics import mean
# Resize every sample to the mean height and width
hh = int(mean(hs)) + 1
ww = int(mean(ws)) + 1
trainr = []
for im in train:
    im = cv2.resize(im, (ww, hh), interpolation=cv2.INTER_CUBIC)
    trainr.append(im)
cv2_imshow(trainr[2])
cv2_imshow(trainr[1])
# Per-pixel mean of all resized samples
meanimg = np.mean(trainr, axis=0)
cv2_imshow(meanimg)

Take the mean image and convert it to binary values.

lower = 80
upper = 255
meanimgT = cv2.inRange(meanimg, lower, upper)
cv2_imshow(meanimgT)
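
Averaging and re-thresholding can be followed by hand on a few pixels. The sketch below uses three made-up 2x3 masks in place of the 16 resized samples, and it swaps cv2.inRange for a plain NumPy comparison (np.where) so that it runs without OpenCV; the threshold of 80 is the one used above.

```python
import numpy as np

# Three tiny binary "samples" (illustrative stand-ins for the resized word masks)
samples = [
    np.array([[255, 255, 0], [0, 255, 0]], dtype=np.uint8),
    np.array([[255, 0, 0], [0, 255, 0]], dtype=np.uint8),
    np.array([[255, 255, 0], [0, 255, 255]], dtype=np.uint8),
]

# Per-pixel average: a float image where brighter means "ink in more samples"
mean_img = np.mean(samples, axis=0)

# Binarize: white where the average is at least 80 (out of 255)
mean_bin = np.where(mean_img >= 80, 255, 0).astype(np.uint8)

print(mean_img)
print(mean_bin)
```

A pixel that is white in one sample out of three averages 85, which is just above the threshold. A low threshold like this keeps strokes that appear in only part of the samples.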

In this section we take the mean image and compute its Euclidean distance to each word in the text. Let's see whether this simple method gives good results.

minh = min(hs) - 20
minw = min(ws) - 20
maxh = max(hs) + 20
maxw = max(ws) + 20
testr = []
for im in masks:
    height, width = im.shape
    # Keep only crops whose size is close to that of the samples
    if minh <= height <= maxh and minw <= width <= maxw:
        im = cv2.resize(im, (ww, hh), interpolation=cv2.INTER_CUBIC)
        testr.append(im)
a, b = meanimg.shape
leng = a * b
# Euclidean distance from the mean image, divided by the number of pixels
distances = [np.sqrt(np.sum(np.square(meanimg - im))) / leng for im in testr]
distc = pd.DataFrame({'position': range(len(testr)), 'distance': distances})
distc.sort_values('distance')
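
The distance used above is the square root of the summed squared pixel differences, divided by the pixel count. Here is a minimal sketch on 2x2 arrays; the helper name scaled_distance and the pixel values are assumptions for illustration.

```python
import numpy as np

def scaled_distance(reference, candidate):
    """Euclidean distance between two equally sized images, divided by the pixel count."""
    # Cast to float first: subtracting two uint8 arrays would wrap around below zero
    diff = reference.astype(np.float64) - candidate.astype(np.float64)
    return float(np.sqrt(np.sum(np.square(diff))) / diff.size)

reference = np.array([[255, 0], [0, 255]], dtype=np.uint8)
candidates = [
    np.array([[0, 255], [255, 0]], dtype=np.uint8),    # every pixel differs
    np.array([[255, 0], [0, 255]], dtype=np.uint8),    # identical to the reference
    np.array([[255, 0], [255, 255]], dtype=np.uint8),  # one pixel differs
]

distances = [scaled_distance(reference, c) for c in candidates]
ranking = np.argsort(distances)  # positions ordered from closest to farthest

print(distances)
print(ranking)
```

In the article the reference is the float mean image, so the subtraction is already done in floating point. The explicit cast only matters when both images are uint8.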

And... the closest word is indeed Garcia.

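# Show the crop with the smallest distance (position 23 in the run described here)
best = int(distc.sort_values('distance')['position'].iloc[0])
cv2_imshow(testr[best])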

In the next section we take images of 7 bodies of text and process them in the same way as the test image. The goal is to extract images of dummy words that can be used as a negative training set.

lower = np.array([0, 0, 120])
upper = np.array([0, 0, 255])
samples2 = [f'dummy{i}.jpg' for i in range(1, 8)]
minw = 50
minh = 10
train2r = []
for path in samples2:
    image = cv2.imread(path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    im = cv2.inRange(image, lower, upper)
    im = cv2.bitwise_not(im)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 2))
    im = cv2.erode(im, kernel, iterations=1)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10, 5))
    im = cv2.dilate(im, kernel, iterations=1)
    # Find the word blobs of this page, not those of the test image
    contours, hierarchy = cv2.findContours(im, mode=cv2.RETR_EXTERNAL, method=cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w >= minw and h >= minh:
            # Crop with a 5-pixel margin, clamped at the image border
            word = image[max(y - 5, 0):y + h + 5, max(x - 5, 0):x + w + 5]
            if word.size > 0:
                try:
                    mask = cv2.inRange(word, lower, upper)
                    mask = cv2.bitwise_not(mask)
                    height, width = mask.shape
                    # Keep only crops close in size to the positive samples
                    if minh <= height <= maxh and minw <= width <= maxw:
                        mask = cv2.resize(mask, (ww, hh), interpolation=cv2.INTER_CUBIC)
                        train2r.append(mask)
                except Exception as error:
                    print(error)

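This is the core of the Keras neural network model. It uses layers that accept a two-dimensional input and produce a single binary output. Because the training set was very small and the model quickly jumped into overfitting territory, I had to use a low batch size, which allows some "jitter" in the data selection, and run the training many times on different samples.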

from random import sample
# Draw 30 random negatives to go with the positive samples
train2r_s = sample(train2r, 30)
training = trainr + train2r_s
ys = ([1] * len(trainr)) + ([0] * len(train2r_s))
# Stack into (n, height, width, 1): Conv2D expects a trailing channel axis
training = np.expand_dims(np.array(training), -1)
ys = np.array(ys)
# The input shape is needed to build the first layer
nn, xx, yy, _ = training.shape
# Initialize the model
neur = tf.keras.models.Sequential()
# Convolutional layers
neur.add(tf.keras.layers.Conv2D(5, 3, activation='relu', input_shape=(xx, yy, 1)))
neur.add(tf.keras.layers.Conv2D(15, 3, activation='tanh'))
neur.add(tf.keras.layers.Conv2D(15, 3, activation='tanh'))
neur.add(tf.keras.layers.Flatten())
neur.add(tf.keras.layers.Dense(10, activation='tanh'))
# Output layer: one sigmoid unit for the binary decision
neur.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
neur.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
neur.fit(training, ys, batch_size=15, epochs=3)
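
One way to see why such a small training set overfits so quickly is to count the trainable parameters. The sketch below does the arithmetic by hand for the layer stack above, assuming the Keras defaults for Conv2D ('valid' padding, stride 1, bias terms) and a hypothetical 40x120 input. The real input size comes from the mean sample height and width, which the article does not state.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return (inputs + 1) * units

# Hypothetical input size (height, width); not taken from the article
height, width = 40, 120

# Each 3x3 'valid' convolution trims 2 pixels from both dimensions
out_h, out_w = height - 3 * 2, width - 3 * 2
flat = out_h * out_w * 15  # size of the Flatten output (15 filters in the last conv)

layers = {
    "conv2d_1": conv_params(3, 1, 5),
    "conv2d_2": conv_params(3, 5, 15),
    "conv2d_3": conv_params(3, 15, 15),
    "dense": dense_params(flat, 10),
    "output": dense_params(10, 1),
}
total = sum(layers.values())

for name, count in layers.items():
    print(f"{name}: {count}")
print("total:", total)
```

Under these assumptions almost all of the parameters sit in the first Dense layer, and the total is several orders of magnitude larger than the few dozen training images available per fit.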

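To reduce overfitting, I wrote a loop that retrains the model repeatedly on different random training sets. This was possible because there was a large pool of negative samples. The strong imbalance between positive and negative cases also had to be taken into account.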

for i in range(30):
    # Draw a fresh random set of 30 negatives for every round
    train2r_s = sample(train2r, 30)
    training = trainr + train2r_s
    ys = ([1] * len(trainr)) + ([0] * len(train2r_s))
    # Add the channel axis expected by the Conv2D input
    training = np.expand_dims(np.array(training), -1)
    ys = np.array(ys)
    print("training part: " + str(i + 1))
    neur.fit(training, ys, batch_size=30, epochs=3)
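
The resampling idea can be separated from the model. This is a minimal sketch with a hypothetical helper, draw_training_set, and made-up 4x6 arrays in place of the word masks; the call to fit is left as a comment so that the block runs without TensorFlow.

```python
import random
import numpy as np

def draw_training_set(positives, negatives, n_negatives, rng):
    """Combine all positives with a fresh random subset of the negatives."""
    chosen = rng.sample(negatives, n_negatives)
    # Stack into (n, height, width, 1) and build matching 1/0 labels
    x = np.expand_dims(np.array(positives + chosen), -1)
    y = np.array([1] * len(positives) + [0] * len(chosen))
    return x, y

rng = random.Random(0)  # seeded so that repeated runs draw the same subsets
positives = [np.full((4, 6), 255, dtype=np.uint8) for _ in range(3)]
negatives = [np.full((4, 6), i, dtype=np.uint8) for i in range(20)]

for round_number in range(3):
    x, y = draw_training_set(positives, negatives, 5, rng)
    # A call such as model.fit(x, y, ...) would go here
    print(round_number + 1, x.shape, y.tolist())
```

Keeping every positive in each round while rotating the negatives limits the class imbalance within a single fit, and over many rounds the model still sees most of the negative pool.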

Test run on unseen data.

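# Add the channel axis expected by the Conv2D input, then score every candidate crop
test_out = neur.predict(np.expand_dims(np.array(testr), -1))
pd.DataFrame(test_out).sort_values(0, ascending=False)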
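
Picking the best candidate from the scores is a plain sort. In this minimal sketch the score values are made up; they only mimic the (n, 1) shape that predict returns for a single sigmoid output.

```python
import numpy as np

# Made-up sigmoid outputs for four candidate crops, shaped (n, 1)
scores = np.array([[0.12], [0.91], [0.40], [0.05]])

# Sort positions by score, highest first
order = np.argsort(-scores[:, 0])
best = int(order[0])

print(order)
print(best, scores[best, 0])
```

A score near 1 means the model considers the crop likely to be the target word. With so little training data it is worth inspecting the top few positions rather than only the first.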

Extracting the top-ranked test image (the value closest to 1) shows that it is indeed the word "Garcia".

# Position of the highest score (23 in the run described here)
check = int(np.argmax(test_out))
print(test_out[check])
cv2_imshow(testr[check])
