SeslKoreanGeneralizer
Utility class for naturalizing Korean text by selecting the correct Josa (postpositional particles) based on the preceding character.
Korean Josa are grammatical particles that follow nouns or pronouns and indicate their grammatical function. The form of a Josa often changes depending on whether the preceding syllable ends with a final consonant (Jongseong) or with a vowel.
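The consonant-or-vowel check can be illustrated with Unicode arithmetic. The following is a minimal sketch, not the class's actual implementation: the class name JongseongSketch and its methods are hypothetical. It relies only on the layout of the precomposed Hangul syllable block (U+AC00 to U+D7A3), where each initial/medial combination occupies 28 consecutive code points and the first of them has no final consonant.

```java
// Hypothetical helper for illustration; not part of SeslKoreanGeneralizer's documented API.
final class JongseongSketch {
    private static final int HANGUL_BASE = 0xAC00; // '가', first precomposed syllable
    private static final int HANGUL_LAST = 0xD7A3; // '힣', last precomposed syllable
    private static final int FINAL_SLOTS = 28;     // 27 final consonants + "no final"

    static boolean isHangulSyllable(char c) {
        return c >= HANGUL_BASE && c <= HANGUL_LAST;
    }

    /** Returns 0 if the syllable has no final consonant, 1..27 otherwise, -1 for non-Hangul input. */
    static int jongseongIndex(char c) {
        if (!isHangulSyllable(c)) {
            return -1;
        }
        return (c - HANGUL_BASE) % FINAL_SLOTS;
    }

    static boolean hasJongseong(char c) {
        return jongseongIndex(c) > 0;
    }

    public static void main(String[] args) {
        System.out.println(hasJongseong('성')); // true: ends in ㅇ
        System.out.println(hasJongseong('과')); // false: ends in a vowel
    }
}
```

Returning the index rather than only a boolean is a deliberate choice in this sketch, because some particles need to know which final consonant is present (index 8 is ㄹ).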
This class provides a method, naturalizeText, that takes a Korean string containing Josa placeholders (e.g., "사과(이)가 맛있다", "the apple is delicious") and replaces each placeholder with the grammatically correct Josa (here, "사과가 맛있다", because 사과 ends with a vowel).
The class handles Josa placeholders such as:
- 은(는) / (은)는
- 이(가) / (이)가
- 을(를) / (을)를
- 와(과) / (와)과
- 아(야) / (아)야
- (이)여
- (으)로
- (이)라
- (이에)예 / 이에(예)
- (이었)였 / 이었(였)
- (이)네
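To make the placeholder notation concrete, here is a minimal sketch of how a few of the listed forms could be resolved. It is an assumption-laden illustration, not the real naturalizeText: the class JosaPlaceholderSketch and its methods are hypothetical, it covers only five placeholders, and it only looks at a Hangul syllable directly before the placeholder. It also shows the usual rule that (으)로 becomes 로 after a vowel or after a final ㄹ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; the real class supports more placeholders and inputs.
final class JosaPlaceholderSketch {
    // One Hangul syllable followed by one of five supported placeholders.
    private static final Pattern PLACEHOLDER = Pattern.compile(
            "([가-힣])(\\(이\\)가|\\(와\\)과|\\(은\\)는|\\(을\\)를|\\(으\\)로)");

    private static final int RIEUL = 8; // final-consonant index of ㄹ

    static String naturalize(String text) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            char prev = m.group(1).charAt(0);
            // 0 means no final consonant; 1..27 identify the final consonant.
            int jong = (prev - 0xAC00) % 28;
            String josa = pick(m.group(2), jong);
            m.appendReplacement(out, Matcher.quoteReplacement(prev + josa));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String pick(String placeholder, int jong) {
        boolean hasFinal = jong != 0;
        switch (placeholder) {
            case "(이)가": return hasFinal ? "이" : "가";
            case "(와)과": return hasFinal ? "과" : "와";
            case "(은)는": return hasFinal ? "은" : "는";
            case "(을)를": return hasFinal ? "을" : "를";
            // 으로 only after a final consonant other than ㄹ.
            case "(으)로": return (hasFinal && jong != RIEUL) ? "으로" : "로";
            default: return placeholder;
        }
    }

    public static void main(String[] args) {
        System.out.println(naturalize("삼성(와)과 애플(이)가 경쟁한다."));
        System.out.println(naturalize("레벨(으)로"));
    }
}
```

Text that the pattern does not match is copied through unchanged, so unsupported placeholders in this sketch are simply left as they are.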
The class also considers special pronunciation rules for certain symbols and numbers, which are not Hangul syllables themselves, when determining the appropriate Josa.
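How the class maps symbols and numbers to pronunciations is not shown here. As one possible approach, a digit can be replaced by its Sino-Korean reading and then checked like any other syllable. The sketch below assumes single digits only and uses the hypothetical names DigitReadingSketch, readingOf, endsWithConsonant, and topicParticle.

```java
// Hypothetical illustration; multi-digit numbers and symbols are out of scope.
final class DigitReadingSketch {
    // Sino-Korean readings of 0 through 9, one syllable each.
    private static final String READINGS = "영일이삼사오육칠팔구";

    static char readingOf(char digit) {
        if (digit < '0' || digit > '9') {
            throw new IllegalArgumentException("not an ASCII digit: " + digit);
        }
        return READINGS.charAt(digit - '0');
    }

    /** True when the digit's reading ends in a final consonant. */
    static boolean endsWithConsonant(char digit) {
        // Precomposed Hangul syllables repeat every 28 code points; slot 0 has no final.
        return (readingOf(digit) - 0xAC00) % 28 != 0;
    }

    /** Resolves the (은)는 placeholder for a single digit. */
    static String topicParticle(char digit) {
        return endsWithConsonant(digit) ? "은" : "는";
    }

    public static void main(String[] args) {
        System.out.println("1" + topicParticle('1')); // 1은 (1 is read 일)
        System.out.println("2" + topicParticle('2')); // 2는 (2 is read 이)
    }
}
```

Deriving the answer from the reading itself, rather than from a hand-written table of booleans, keeps the digit case on the same code path as ordinary Hangul text.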
Example Usage:
SeslKoreanGeneralizer generalizer = new SeslKoreanGeneralizer();
String inputText = "삼성(와)과 애플(이)가 경쟁한다.";
String naturalizedText = generalizer.naturalizeText(inputText);
// naturalizedText will be "삼성과 애플이 경쟁한다."
String inputText2 = "레벨(으)로";
String naturalizedText2 = generalizer.naturalizeText(inputText2);
// naturalizedText2 will be "레벨로"
String inputText3 = "1(은)는 홀수이다.";
String naturalizedText3 = generalizer.naturalizeText(inputText3);
// naturalizedText3 is expected to be "1은 홀수이다." (1 is read as "일", which ends in a consonant)