Strings of Natural Languages: Unsupervised Analysis and Segmentation on the Expression Level

Author: Markus Stengel
ISBN: 9783836656276

Publisher: Diplomica Verlag
Category: Computers

ISBN-10: 3836656272

Description

Learning a second language is often difficult. One major reason for this is the way we learn: we try to translate the words and concepts of the other language into those of our own. As long as the languages are fairly similar, this works quite well. However, when the languages differ to a great degree, problems are bound to appear. For example, to someone whose first language is French, English is not difficult to learn. In fact, he can pick up any English book and at the very least recognize words and sentences. But if he is tasked with reading a Japanese text, he will be completely lost: there are no familiar letters and no whitespace, and only occasionally does a glyph appear that looks similar to a punctuation mark.

Nevertheless, anyone can learn any language. Pronouncing words correctly and understanding unfamiliar utterances may be hard for the individual, but as soon as the words are transcribed into some kind of script, they can be studied and - given some time - understood. The script thus offers itself as a reliable medium of communication. Sometimes the script can be very complex, though. For instance, the Japanese language is not much more difficult than German - but the Japanese script is.

If someone untrained in the language is given a Japanese book and told to create a list of its vocabulary, he will likely be defeated by the task. Or will he? Are there perhaps ways to analyze the text regardless of his unfamiliarity with this type of script and language? Should there not be characteristics shared by all languages that can be exploited?

This thesis takes the point of view of such a person and shows how to segment a corpus in an unfamiliar language while employing as little prior knowledge as possible. To this end, a methodology for the analysis of unknown languages is developed. The single requirement is that a large corpus in electronic form, which has undergone only a minimum of preprocessing, is available. Analysis is limited strictly to the expression level…
