
Detect Language of NSString

Developer https://www.devze.com 2023-03-12 15:30 Source: Internet

Somebody told me about a class for language recognition in Cocoa. Does anybody know which one it is?

This is not working:

NSSpellChecker *spellChecker = [NSSpellChecker sharedSpellChecker];
[spellChecker setAutomaticallyIdentifiesLanguages:YES];
NSString *spellCheckText = @"Guten Tag Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.";
[spellChecker checkSpellingOfString:spellCheckText startingAt:0];
NSLog(@"%@", [spellChecker language]);

The result is 'en' but should be 'de'.


There is an API in Cocoa for checking the language of a string, and it is always best to use Foundation over Core Foundation whenever possible.

NSArray *tagschemes = [NSArray arrayWithObjects:NSLinguisticTagSchemeLanguage, nil];
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:tagschemes options:0];
[tagger setString:@"Das ist ein bisschen deutscher Text. Bitte löschen Sie diesen nicht."];
NSString *language = [tagger tagAtIndex:0 scheme:NSLinguisticTagSchemeLanguage tokenRange:NULL sentenceRange:NULL];

Alternatively, if you happen to have mixed language text, you can use the enumerateLinguisticTagsInRange API to get the language of each word in the text.
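For mixed-language text, one simple alternative is to split the text into sentences and guess each sentence separately. The Swift sketch below swaps in a different technique from the enumerateLinguisticTagsInRange approach mentioned above: it uses Foundation's sentence enumeration plus NLLanguageRecognizer from the NaturalLanguage framework (iOS 12 / macOS 10.14 or later). The helper name sentenceLanguages(in:) is hypothetical.

```swift
import Foundation
import NaturalLanguage

/// Hypothetical helper: guesses a language for each sentence of `text`.
/// Sentences are found with Foundation's .bySentences enumeration, and each
/// one is passed to NLLanguageRecognizer on its own.
func sentenceLanguages(in text: String) -> [(sentence: String, language: NLLanguage?)] {
    var results: [(sentence: String, language: NLLanguage?)] = []
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex, options: .bySentences) { substring, _, _, _ in
        // Drop surrounding whitespace and skip empty fragments.
        guard let sentence = substring?.trimmingCharacters(in: .whitespacesAndNewlines),
              !sentence.isEmpty else { return }
        results.append((sentence: sentence,
                        language: NLLanguageRecognizer.dominantLanguage(for: sentence)))
    }
    return results
}

let mixed = "Das ist ein bisschen deutscher Text. This sentence, on the other hand, is written in English."
for (sentence, language) in sentenceLanguages(in: mixed) {
    print(language?.rawValue ?? "unknown", "-", sentence)
}
```

Very short sentences give the recognizer little to work with, so per-sentence guesses are less reliable than a guess over the whole text.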


That's the result:

- (NSString *)languageForString:(NSString *)text {
    // Examine at most the first 100 characters (fewer if the string is shorter).
    CFRange range = CFRangeMake(0, MIN(text.length, 100));
    // The "Copy" function returns a +1 reference; CFBridgingRelease hands ownership to ARC.
    return CFBridgingRelease(CFStringTokenizerCopyBestStringLanguage((__bridge CFStringRef)text, range));
}
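The same CFStringTokenizerCopyBestStringLanguage call can be made from Swift. This is a minimal sketch (Apple platforms only, since it relies on Core Foundation's string tokenizer); the function name bestStringLanguage(for:maxLength:) and the default limit of 100 UTF-16 units are assumptions carried over from the Objective-C helper in this answer.

```swift
import Foundation

/// Hypothetical Swift counterpart of languageForString:.
/// Only the first `maxLength` UTF-16 units are examined.
func bestStringLanguage(for text: String, maxLength: Int = 100) -> String? {
    let length = min((text as NSString).length, maxLength)
    let range = CFRange(location: 0, length: length)
    // Swift manages the returned CFString, so no manual release is needed.
    let language: CFString? = CFStringTokenizerCopyBestStringLanguage(text as CFString, range)
    return language.map { $0 as String }
}

print(bestStringLanguage(for: "Guten Tag Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.") ?? "unknown")
```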


You can use -requestCheckingOfString:… instead. NSTextCheckingTypeOrthography attempts to identify the language used in the string, and the completion handler receives an NSOrthography parameter that can be used to get information about the orthography in the string, including its dominant language.

The following example outputs dominant language = de:

NSSpellChecker *spellChecker = [NSSpellChecker sharedSpellChecker];
[spellChecker setAutomaticallyIdentifiesLanguages:YES];
NSString *spellCheckText = @"Guten Tag Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.";

// The check is performed asynchronously, so the result arrives in the completion handler.
[spellChecker requestCheckingOfString:spellCheckText
                                range:NSMakeRange(0, spellCheckText.length)
                                types:NSTextCheckingTypeOrthography
                              options:nil
               inSpellDocumentWithTag:0
                    completionHandler:^(NSInteger sequenceNumber, NSArray *results, NSOrthography *orthography, NSInteger wordCount) {
    NSLog(@"dominant language = %@", orthography.dominantLanguage);
}];
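If a completion handler is inconvenient, NSSpellChecker also has a synchronous checkString(_:range:types:options:inSpellDocumentWithTag:orthography:wordCount:) method that fills in an NSOrthography directly. The Swift sketch below assumes macOS (AppKit); the helper name spellCheckerDominantLanguage(of:) is hypothetical.

```swift
import AppKit

/// Hypothetical helper: asks NSSpellChecker for the orthography of `text`
/// synchronously and returns its dominant language code (for example "de").
func spellCheckerDominantLanguage(of text: String) -> String? {
    let checker = NSSpellChecker.shared
    checker.automaticallyIdentifiesLanguages = true

    var orthography: NSOrthography?
    var wordCount = 0
    // Only orthography checking is requested; the returned results are not needed here.
    _ = checker.checkString(
        text,
        range: NSRange(location: 0, length: (text as NSString).length),
        types: NSTextCheckingResult.CheckingType.orthography.rawValue,
        options: nil,
        inSpellDocumentWithTag: 0,
        orthography: &orthography,
        wordCount: &wordCount)
    return orthography?.dominantLanguage
}

print(spellCheckerDominantLanguage(of: "Guten Tag Herr Mustermann. Dies ist ein deutscher Text. Bitte löschen Sie diesen nicht.") ?? "unknown")
```

Because this call blocks until checking finishes, it is better suited to background work or command-line tools than to the main thread of an app.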


A Swift String extension based on Jennifer's answer:

import Foundation

extension String {
    func language() -> String? {
        // tag(at:) needs at least one character to inspect.
        guard !isEmpty else { return nil }
        let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
        tagger.string = self
        return tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil)?.rawValue
    }
}

Usage:

let language = "What language is this?".language()


With Swift 5, you can use one of the following approaches to detect the language of a given string.


#1. Using NSLinguisticTagger's dominantLanguage property

Since iOS 11, NSLinguisticTagger has a property called dominantLanguage. dominantLanguage has the following declaration:

var dominantLanguage: String? { get }

Returns the dominant language of the string set for the linguistic tagger.

The Playground sample code below shows how to use dominantLanguage to find the dominant language of a string:

import Foundation

let text = "あなたはそれを行うべきではありません。"
let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
tagger.string = text
let language = tagger.dominantLanguage
print(language) // Optional("ja")

#2. Using NSLinguisticTagger's dominantLanguage(for:) method

As an alternative, NSLinguisticTagger has a convenience class method, dominantLanguage(for:), that creates a new linguistic tagger, sets its string property and returns its dominantLanguage property. dominantLanguage(for:) has the following declaration:

class func dominantLanguage(for string: String) -> String?

Returns the dominant language for the specified string.

Usage:

import Foundation

let text = "Die Kleinen haben friedlich zusammen gespielt."
let language = NSLinguisticTagger.dominantLanguage(for: text)
print(language) // Optional("de")

#3. Using NLLanguageRecognizer's dominantLanguage property

Since iOS 12, NLLanguageRecognizer has a property called dominantLanguage. dominantLanguage has the following declaration:

var dominantLanguage: NLLanguage? { get }

The most likely language for the processed text.

Here’s how to use dominantLanguage to guess the dominant language of natural language text:

import NaturalLanguage

let string = "J'ai deux amours. Mon pays et Paris."
let recognizer = NLLanguageRecognizer()
recognizer.processString(string)
let language = recognizer.dominantLanguage
print(language?.rawValue) // Optional("fr")
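NLLanguageRecognizer can also report how confident it is and can be limited to a set of candidate languages. The sketch below uses languageConstraints and languageHypotheses(withMaximum:); restricting the candidates to English, German and French is an assumption made only for illustration.

```swift
import Foundation
import NaturalLanguage

let recognizer = NLLanguageRecognizer()
// Assumption for this example: the app only expects these three languages.
recognizer.languageConstraints = [.english, .german, .french]
recognizer.processString("Guten Tag Herr Mustermann. Dies ist ein deutscher Text.")

// Up to three candidate languages, each with a probability between 0 and 1.
let hypotheses = recognizer.languageHypotheses(withMaximum: 3)
for (language, probability) in hypotheses.sorted(by: { $0.value > $1.value }) {
    print(language.rawValue, String(format: "%.3f", probability))
}

// Clear the recognizer before reusing it for another string.
recognizer.reset()
```

languageHints (a dictionary of NLLanguage to prior probability) is the softer alternative when you want to bias the guess rather than forbid other languages.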


As of iOS 11, you can use the dominantLanguage(for:) (Swift) / dominantLanguageForString: (Objective-C) class method of NSLinguisticTagger.

Swift:

extension String {
    var language: String? {
        return NSLinguisticTagger.dominantLanguage(for: self)
    }
}

print("Good morning".language ?? "unknown")
print("Buenos días".language ?? "unknown")

Objective-C:

@interface NSString (Tagger)

@property (nonatomic, readonly, nullable) NSString *language;
@end

@implementation NSString (Tagger)

- (NSString *)language {
    return [NSLinguisticTagger dominantLanguageForString:self];
}

@end

NSLog(@"%@", @"Good morning".language);
NSLog(@"%@", @"Buenos días".language);

Output (for both):

en
es
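Short strings such as "Good morning" give a recognizer very little to go on. If you would rather get nil than a weak guess, a variant of the extension can check the top probability from NLLanguageRecognizer first. In this sketch the method name detectedLanguage(minimumConfidence:) and the default threshold of 0.5 are arbitrary assumptions, not values from any Apple API.

```swift
import NaturalLanguage

extension String {
    /// Hypothetical helper: the most likely language, or nil when the
    /// recognizer's top probability is below `minimumConfidence`.
    func detectedLanguage(minimumConfidence: Double = 0.5) -> NLLanguage? {
        let recognizer = NLLanguageRecognizer()
        recognizer.processString(self)
        // Only the single best hypothesis is needed.
        guard let best = recognizer.languageHypotheses(withMaximum: 1).first,
              best.value >= minimumConfidence else { return nil }
        return best.key
    }
}

print("Das ist ein bisschen deutscher Text. Bitte löschen Sie diesen nicht.".detectedLanguage()?.rawValue ?? "unknown")
```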

