Administering Text Processors
The User's Guide to the OCR Data Pipeline provides useful information for setting up the environment and configuring the system. System...
aramhayr
Aug 15, 2024 · 7 min read


OCR Data Pipeline
OCR Processor The OCR Processor implements the Corpus Data Pipeline, the a2-b2-c2-d2 path on the Diagram of Digital Humanities (Թվային հումանիտար գիտություններ). The Diagram below shows the main modules of the system. The dotted-line modules are in progress. The rest comprises three independent processors: PDFProcessor - Reads PDF files and converts them into single-page JPEG images. There is an option to slice the page into 2 or 3 columns to improve OCR quality for Dictionaries, which frequently for
aramhayr
Jul 25, 2024 · 3 min read
Armenian Characters set review
Unicode Standard overview To evaluate the Armenian character encoding, we need to understand the principles of the Unicode Standard and determine how well the specified character set covers textual sources. Coverage evaluation has two aspects: the percentage of 1) sources (units of text), and 2) the characters specified in the set versus those used in all texts. I am going to address only the principles that are relevant to the Armenian character set and affect rendering as well as proces
aramhayr
Apr 18, 2024 · 8 min read
Running the Armenian Parser
Overview The Parser performs spell-checking, tagging, and lemmatization of Eastern Armenian [plain] text typed in the revised orthography (see the exception in the book). It is a Linux command-line application distributed as a Java .jar executable, a bash script to run it, and morpheme Dictionaries as plain text (JSON objects). The language model used for text processing is described in detail in Բնական խոսքի ընդհանրական ներկայացման մի տարբերակի մասին. To install the applica
aramhayr
Dec 11, 2023 · 8 min read
Reviewing Thought-Based Linguistics by Wallace Chafe
Overview I started reading the [Cha2018] book by looking at the Table of Contents and glancing through the Prologue and the first couple of sections. I noticed some similarities with my [Հայ2022] book in the selection of topics (judging by section titles) and the overall structure. The phrase “The human brain has justifiably been called the most complicated object in the known universe” [Cha2018::7] appears almost literally in my book [Հայ2022::153]. This coincidence is remarkable
aramhayr
Aug 30, 2023 · 16 min read
What is language?
Overview While rereading technical and popular publications on linguistics – this time closely – I realized that loose usage of the terms language and speech, consciousness, thinking, thought, semantics, meaning, etc. causes misunderstanding, so writing a book titled What is language? might be a good idea. For now it is just a blog. Language and speech “To speak clearly and accurately we need a precise and well-defined language” [Aho1972::1]. Let us start with defining the terms l
aramhayr
Aug 30, 2023 · 32 min read
Interviewing chatGPT: Armenian Corpora and morphology test
Dr. Gayane Hovhanisian shared this Q&A with permission to use it in my research. I added comments and links to the chatGPT answers. Can...
aramhayr
Aug 19, 2023 · 5 min read
Interviewing chatGPT: Corpus linguistics
Date: 2023-03-26, chatGPT version: Mar 14, 2023 1. What is a text corpus? A text corpus is a large and structured collection of written...
aramhayr
Mar 26, 2023 · 12 min read
Interviewing chatGPT: Natural Language Processing (NLP)
Date: 2023-02-25; chatGPT version: Feb 13, 2023 1. Ferdinand de Saussure was one of the first to articulate the difference between...
aramhayr
Feb 25, 2023 · 14 min read
Interviewing ChatBotGPT: Language basics
Interview Date: 2023-02-21; chatGPT version: Feb 13, 2023 1. What is consciousness? Consciousness is a term used to describe a state of...
aramhayr
Feb 21, 2023 · 13 min read
On Syntactic Structure Representation
(author's summary – v.1.7.6) Հայերեն A. Hayrapetyan. On Syntactic Structure Representation (in Eastern Armenian: Բնական...
aramhayr
Feb 5, 2023 · 2 min read