An Application of Numbers and Characters Recognition on Radar Images

In a project on reading the date and time from Radar images, an OCR templates technique was applied because some Radar images carry no date and time information in their file properties. That method had a drawback: when the month name changed, the Radar images could not be retrieved until new month-name templates had been created. This work therefore applies the Tesseract OCR library to read the date and time text in the images, and uses the Fast Normalized Cross-Correlation theory to improve the performance of template matching and to retrieve more Radar images. The program using Fast Normalized Cross-Correlation with OCR templates achieved an accuracy of 86 percent.


Introduction
Over the past several years, Thailand has experienced many severe floods and extreme droughts. Rainfall monitoring from Radar images is one module that has been developed to monitor rainfall in any area of Thailand. OCR is now widely applied to convert images of text or numbers into an editable format. In previous work [1], a program was developed that applies the template matching (OCR templates) technique to recognize the date and time text on Radar and Satellite images so that the images can be managed in storage. This application works only with stored character and number templates [2]. Consequently, in [1], whenever the image layout or the appearance of the date and time text changes, new character templates have to be generated manually.
For more flexible detection, this work reads the characters and converts them into an editable format even when their appearance changes. The Tesseract OCR library is applied to solve the above-mentioned problem. The OCR templates method used in [1] is also retained in this application. Moreover, the Fast Normalized Cross-Correlation theory is combined with the OCR templates method to improve the accuracy of text image reading.
The remainder of this paper is organized as follows: Section 2 presents general background information about this application. Section 3 presents the algorithm for text image recognition. Section 4 considers the experimental results. Finally, Section 5 gives the discussion and conclusion.

The sets of Radar Images
The input data of the program are Radar images of some provinces and districts in Thailand. In this work, we focused on the set of Radar images that covers 7 provinces and 1 district: Surat Thani province, Narathiwat province, Nan province, Khon Kaen province, Krabi province, Samut Songkhram province, Songkhla province, and Phanom district. Examples of the Radar images are shown in Figure 1.

Methodologies
In this work, two methods are applied to read the text in the images, as follows.

Tesseract OCR
Tesseract is an open-source tool for recognizing text in images by Optical Character Recognition (OCR), that is, extracting text from images. Recent versions of Tesseract apply deep learning through an LSTM-based recognition engine; the engine is complex but generally achieves high accuracy on clean text. The Tesseract library is used to perform OCR on the images, and the output is stored in a text file [3].
In Python, the pytesseract module is imported into the code. The basic usage is to pass the image to the image_to_string function of the pytesseract module.
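The basic usage described above might look like the following minimal sketch. It assumes that Pillow and pytesseract are installed and that the Tesseract binary is available on the system. The helper names read_date_text and normalize_ocr_text, the crop_box parameter, and the choice of page segmentation mode 7 (a single text line) are illustrative assumptions and are not taken from the original program.

```python
def normalize_ocr_text(raw_text):
    """Collapse runs of whitespace (newlines, form feeds) into single spaces."""
    return " ".join(raw_text.split())


def read_date_text(image_path, crop_box=None):
    """Run Tesseract on an image and return the cleaned text.

    crop_box is a hypothetical (left, upper, right, lower) pixel box
    around the date stamp; when None, the whole image is passed to OCR.
    """
    # Imported lazily so normalize_ocr_text can be used without the OCR stack.
    from PIL import Image
    import pytesseract

    image = Image.open(image_path)
    if crop_box is not None:
        image = image.crop(crop_box)
    # "--psm 7" asks Tesseract to treat the input as a single text line
    # (an assumption that suits a one-line date stamp).
    raw_text = pytesseract.image_to_string(image, config="--psm 7")
    return normalize_ocr_text(raw_text)
```

Cropping to the date region before calling image_to_string is a design choice rather than a requirement; it keeps the radar echo and map overlays out of the OCR input.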

Fast Normalized Cross Correlation Theory
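Normalized Cross Correlation (NCC) is a correlation measure that quantifies the similarity between two images. NCC is a simple template matching method that finds the location of a desired pattern, represented by a template function t, inside a two-dimensional image function f. The NCC coefficient $\gamma(u,v)$ between the template t and the block of the reference image at shift (u,v) is given by [4]:

$$\gamma(u,v) = \frac{\sum_{x,y}\left[f(x,y)-\bar{f}_{u,v}\right]\left[t(x-u,y-v)-\bar{t}\right]}{\sqrt{\sum_{x,y}\left[f(x,y)-\bar{f}_{u,v}\right]^{2}\,\sum_{x,y}\left[t(x-u,y-v)-\bar{t}\right]^{2}}}$$

Let $f(x,y)$ be the intensity value of the image f at pixel (x,y), and let $\bar{t}$ be the mean value of the template t. For a template of size $N_x \times N_y$, $\bar{f}_{u,v}$ denotes the mean value of $f(x,y)$ within the area of the template t shifted by (u,v) steps and is defined as:

$$\bar{f}_{u,v} = \frac{1}{N_x N_y}\sum_{x=u}^{u+N_x-1}\;\sum_{y=v}^{v+N_y-1} f(x,y)$$

The sums over x, y in $\gamma(u,v)$ run over the same window.

However, the direct normalized correlation operation does not meet the speed requirements of time-critical applications [5]. Even when Fast Fourier Transform (FFT) methods are used, it is too computationally intensive for rapidly processing several large images. Therefore, the Fast Normalized Cross Correlation applied in this work is a fast way of calculating the Normalized Cross Correlation to solve the template matching problem.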
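One common speed-up in the fast NCC literature is to precompute a running-sum (summed-area) table of the image, so that the mean of f under the template at any shift (u,v) is obtained from four table look-ups instead of a fresh sum over the window. The source does not state which speed-up the program uses, so the sketch below is only an illustration of that idea. It uses plain Python lists indexed as image[y][x], and the function names are hypothetical.

```python
def summed_area_table(image):
    """Return a table S where S[y][x] is the sum of image rows < y and columns < x.

    The table has one extra row and column of zeros so that window sums
    need no boundary checks.
    """
    height, width = len(image), len(image[0])
    table = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += image[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def window_sum(table, u, v, n_x, n_y):
    """Sum of the n_x-by-n_y window whose top-left pixel is (x=u, y=v)."""
    return (table[v + n_y][u + n_x] - table[v][u + n_x]
            - table[v + n_y][u] + table[v][u])


def window_mean(table, u, v, n_x, n_y):
    """Mean of f under a template of size n_x-by-n_y shifted to (u, v)."""
    return window_sum(table, u, v, n_x, n_y) / (n_x * n_y)
```

Building a second table from the squared pixel values gives the local energy term of the denominator in the same way.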

Data classification
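The Radar images of each province are different, so the data were classified into two groups according to the location of the date text and the image resolution. The first group contains the images whose date text is at the upper right corner (Songkhla province, Krabi province, and Phanom district). The second group contains the images whose date text is on the bottom line and which have low resolution, small fonts, and more noise (Surat Thani province, Narathiwat province, Nan province, Khon Kaen province, and Samut Songkhram province).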

The algorithm of text reading program
Since the data were classified into two groups, we first applied the Tesseract OCR library in Python to read the date text from the Radar images of both groups. We found that Tesseract OCR worked well with the first group, in which the date text is at the upper right corner of the images. However, the library could not be used for the second group, in which the date text is on the bottom line of the images, because these images have low resolution, small font sizes, and much more noise. Therefore, it was necessary to design two different date text reading processes, described by the following two algorithms.
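The choice between the two algorithms can be sketched as a simple lookup from the radar site to its group. The site names and group membership follow the text; the dictionary, the function name, and the method labels are hypothetical and only illustrate the dispatch.

```python
# Group 1: date text at the upper right corner (read with Tesseract OCR).
# Group 2: date text on the bottom line, low resolution and noisy
#          (read with Fast Normalized Cross-Correlation and OCR templates).
SITE_GROUP = {
    "Songkhla": 1,
    "Krabi": 1,
    "Phanom": 1,
    "Surat Thani": 2,
    "Narathiwat": 2,
    "Nan": 2,
    "Khon Kaen": 2,
    "Samut Songkhram": 2,
}


def choose_method(site):
    """Return the name of the reading method used for a radar site."""
    group = SITE_GROUP[site]  # raises KeyError for an unknown site
    return "tesseract" if group == 1 else "fast_ncc_templates"
```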

The algorithm using Tesseract OCR library
The input for this process was the first group: the Radar images of Songkhla province, Krabi province, and Phanom district. Figure 4 shows the main process flow for reading the date text from the Radar images using the Tesseract OCR library. The Radar images of each province in this group differ in details such as the shape of the numerals, the location of the date text, and the image resolution. As a result, some processing steps differ depending on these details; however, all images in this group are read with the Tesseract OCR library.

The algorithm using Fast Normalized Cross-Correlation
When the Tesseract OCR library was applied to the Radar images in the second group, the results contained more errors, so this group was not suitable for the Tesseract OCR method. Thus, the Fast Normalized Cross-Correlation theory was applied to read the date text in the second group of Radar images. This theory is widely used to measure the similarity between two images. The result lies in the range of -1 to 1. A result of 1 means the two images match perfectly. On the other hand, a result of -1 means the two images are perfectly anti-correlated (one is the inverse of the other), which is treated here as completely dissimilar.
Figure 5 shows the process flow of the date reading program that applies the Fast Normalized Cross-Correlation theory. The input for this method was the second group: the Radar images of Surat Thani province, Narathiwat province, Nan province, Khon Kaen province, and Samut Songkhram province. In this group, too, the Radar images of each province differ in details such as the shape of the numerals, the location of the date text, and the image resolution. As a result, some processing steps differ depending on these details; however, all images in this group are read by applying the Fast Normalized Cross-Correlation theory.
Figure 5. The process diagram of the algorithm using Fast Normalized Cross-Correlation with OCR templates.
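To illustrate how a correlation score in the range of -1 to 1 can be combined with stored OCR templates, the following sketch scores one character patch against each template and keeps the best match. It computes the correlation directly rather than with a fast method, uses plain Python lists of equal size, and assumes a hypothetical acceptance threshold of 0.8; the function names and the threshold are not taken from the original program. In practice an optimized routine such as OpenCV's cv2.matchTemplate with the cv2.TM_CCOEFF_NORMED mode could play the same role.

```python
import math


def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equal-size grayscale patches."""
    a = [value for row in patch for value in row]
    b = [value for row in template for value in row]
    if len(a) != len(b):
        raise ValueError("patch and template must have the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    numerator = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    denominator = math.sqrt(sum((p - mean_a) ** 2 for p in a)
                            * sum((q - mean_b) ** 2 for q in b))
    # A flat patch has zero variance; report no correlation in that case.
    return numerator / denominator if denominator > 0 else 0.0


def classify_glyph(patch, templates, threshold=0.8):
    """Return (label, score) of the best template, or (None, score) if below threshold."""
    best_label, best_score = None, -1.0
    for label, template in templates.items():
        score = ncc(patch, template)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score
```

Because the score is normalized by the mean and the variance of both patches, a glyph that is brighter or has higher contrast than its template still scores close to 1.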

1. Testing using Tesseract OCR
The first group of Radar images, from Krabi province and Phanom district, was tested in this section. The image set covered 24 hours per day, with one image every 10 minutes for Phanom district, giving 168 images in total for each day. We ran this test for 15 days and compared the results with those obtained using OCR templates. The results are shown in graphs (a) and (b) of Figure 6. According to Figure 6, the Tesseract OCR method detected the date and time with 73 percent accuracy, whereas the OCR templates method reached 82 percent. Nevertheless, programs using the Tesseract OCR library can read the date immediately when the month name changes, without creating new templates every time. However, Tesseract works best when the foreground text is cleanly segmented from the background and the image is free of noise.

2. Testing using Fast Normalized Cross-Correlation
The second group of Radar images, from Narathiwat province, Nan province, Khon Kaen province, and Samut Songkhram province, was tested. The image set again covered 24 hours per day at 10-minute intervals, giving 312 images in total for each day. We ran this test for 11 days and compared the results with those obtained using OCR templates only. The results are shown in graphs (a) and (b) of Figure 7. According to Figure 7, Fast Normalized Cross-Correlation with OCR templates detected the date and time with 86 percent accuracy, whereas OCR templates alone reached 83 percent. The errors from using Fast Normalized Cross-Correlation with OCR templates may be due to the shape of the numerals or the resolution of the images.

Discussion and Conclusion
In the experiments, the program applying Fast Normalized Cross-Correlation with OCR templates achieved a higher accuracy than the OCR templates method alone: 86 percent compared with 83 percent, which we consider a high level. The program using Tesseract OCR had a lower accuracy of 73 percent, but it helps solve the problem that arises when the month name changes. In future work, we will develop the program further so that it detects the date and time from the Radar images more effectively.