Gani, Danny and Purnama, James and Eng, Kho I (2019) Deep Learning Analysis in Development of Handwritten and Plain Text Classification API. Bachelor thesis, Swiss German University.
File | Size | Access |
---|---|---|
Danny Gani 11502016 TOC.pdf | 1MB | Unrestricted |
Danny Gani 11502016 1.pdf | 704kB | Restricted to registered users only |
Danny Gani 11502016 2.pdf | 1MB | Restricted to registered users only |
Danny Gani 11502016 3.pdf | 1MB | Restricted to registered users only |
Danny Gani 11502016 4.pdf | 952kB | Restricted to registered users only |
Danny Gani 11502016 5.pdf | 116kB | Restricted to registered users only |
Danny Gani 11502016 Ref.pdf | 580kB | Unrestricted |

Abstract
Optical Character Recognition (OCR) and Handwritten Text Recognition (HTR) are technologies that enable text recognition. They differ in their target: OCR is designed for digital (printed) text, whereas HTR is designed for handwritten text. Various implementations of OCR and HTR are already available online. However, such services do not guarantee that the system runs on premises. To solve this problem, the OCR and HTR system must be built from scratch. The purpose of this research is to improve recognition by first classifying the text as handwritten or printed, so that it can then be forwarded to the appropriate recognition system. An application programming interface (API) was also created to make the classification system usable in real-world settings. In this research, the classification system was developed using a convolutional neural network (CNN). To reach the highest possible classification accuracy, experiments were conducted on hyperparameters, dataset format, and data augmentation, together with an analysis of three CNN architectures. At the end of the research, two architectures were in tight competition: VGG-16 with 90.63% accuracy and AlexNet with 90.17% accuracy on ideal test data. However, AlexNet was chosen as the winner after testing with real data.
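
The routing idea in the abstract (classify first, then forward to the matching recognizer) can be illustrated with a minimal sketch. This is not the thesis's API: the function name `route_text_image`, the `threshold` parameter, and the three callables are hypothetical stand-ins, and the classifier is assumed to return a probability that the input is handwritten.

```python
from typing import Any, Callable, Tuple


def route_text_image(
    image: Any,
    classify: Callable[[Any], float],
    recognize_printed: Callable[[Any], str],
    recognize_handwritten: Callable[[Any], str],
    threshold: float = 0.5,
) -> Tuple[str, str]:
    """Forward an image to OCR or HTR based on a classifier score.

    `classify` is assumed (hypothetically) to return the probability
    that the image contains handwritten text, e.g. from a CNN.
    """
    p_handwritten = classify(image)
    if p_handwritten >= threshold:
        # Handwritten text goes to the HTR system.
        return "handwritten", recognize_handwritten(image)
    # Everything else is treated as printed text and goes to OCR.
    return "printed", recognize_printed(image)


# Usage with stub components standing in for real models.
label, text = route_text_image(
    image="sample",
    classify=lambda img: 0.9,
    recognize_printed=lambda img: "ocr result",
    recognize_handwritten=lambda img: "htr result",
)
```

Passing the classifier and recognizers in as callables keeps the routing step independent of whichever architecture is chosen for classification.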
Item Type: | Thesis (Bachelor) |
---|---|
Uncontrolled Keywords: | CNN ; Text Classification ; Text Categorization ; In Premises ; OCR |
Subjects: | T Technology > T Technology (General) > T58.6 Management information systems |
Divisions: | Faculty of Engineering and Information Technology > Department of Information Technology |
Depositing User: | Adityatama Ratangga |
Date Deposited: | 19 May 2020 13:46 |
Last Modified: | 19 May 2020 13:46 |
URI: | http://repository.sgu.ac.id/id/eprint/615 |