Question 1 of 30
Dr. Anya Sharma, a computational linguist, is developing an automated script identification system for a digital archive containing historical documents from various regions of the Ottoman Empire. The archive includes texts in Ottoman Turkish, Arabic, Persian, and various Balkan languages, all potentially using different script variants or even code-switching between scripts within the same document. Her initial system relies solely on algorithmic script detection based on character frequency analysis. However, she notices that the system frequently misidentifies the dominant script in documents containing Ottoman Turkish, particularly when the text includes a high proportion of Arabic loanwords or regional script variations common in the Balkans. Considering the complexities of script usage within the Ottoman Empire and the limitations of purely algorithmic approaches, which of the following strategies would MOST effectively improve the accuracy of Dr. Sharma's script identification system?
Integrating linguistic context analysis to leverage language-specific features and statistical models that account for script variants and code-switching patterns.
Increasing the size of the training dataset used for the algorithmic script detection model to include a wider range of scripts and character variations.
Implementing a rule-based system that prioritizes scripts based on the geographical origin of the document, assuming a strong correlation between location and script.
Focusing on improving the accuracy of optical character recognition (OCR) to ensure that all characters are correctly identified before script detection is performed.
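
For reference, the kind of purely frequency-based baseline described in the scenario can be sketched as follows. This is a minimal illustration, not Dr. Sharma's actual system: the function name `dominant_script` and the shortcut of taking the first word of each character's Unicode name as its script label are assumptions made for this example.

```python
import unicodedata
from collections import Counter


def dominant_script(text: str) -> str:
    """Return the most frequent script label among the letters in `text`.

    Assumption for this sketch: the first word of a character's Unicode
    name (e.g. "ARABIC", "LATIN", "CYRILLIC") stands in for its script.
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split()[0]] += 1
    if not counts:
        return "UNKNOWN"
    return counts.most_common(1)[0][0]


# An Arabic word and a word using the Persian/Ottoman letters
# peh, tcheh, and gaf both receive the same label, because all of
# these letters live in the same Unicode script.
print(dominant_script("كتاب"))
print(dominant_script("پچگ"))
print(dominant_script("книга"))
```

Counting characters this way separates Arabic-script text from Latin or Cyrillic text, but it assigns one label to Arabic, Persian, and Ottoman Turkish alike, since character counts alone carry no information about which language the letters are spelling.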
