I use the following code to read data, and it throws java.nio.charset.MalformedInputException. I can open the file normally, but it does include non-ASCII characters. Is there any way I can fix this problem?
    Source.fromInputStream(stream).getLines foreach { line =>
      // store items on the fly
      lineParser(line.trim) match {
        case None => // no-op
        case Some(pair) => // some-op
      }
    }
    stream.close()
The stream construction code is here:
    def getStream(path: String) = {
      if (!fileExists(path)) {
        None
      } else {
        val fileURL = new URL(path)
        val urlConnection = fileURL.openConnection
        Some(urlConnection.getInputStream())
      }
    }
Try Source.fromInputStream(stream)(io.Codec("UTF-8"))
or whatever charset you need.
Jean-Laurent is likely completely right that Source.fromInputStream is using an encoding that doesn't match your stream: most likely the platform default, e.g. windows-1252 on Western-locale Windows, UTF-8 on recent Linux distros and, IIUC, MacRoman on Macs. Since you got an encoding exception, it's likely that it was defaulting to UTF-8, which is a fairly rigid scheme that rejects invalid byte sequences, and that the file was in some other encoding (most likely ISO8859-1).
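To make the mismatch concrete, here is a minimal sketch that decodes the same bytes with two different codecs. The object name CodecMismatchDemo, the helper decodeLines, and the sample text are made up for illustration; the in-memory byte array stands in for your URL stream.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import scala.io.{Codec, Source}
import scala.util.Try

object CodecMismatchDemo {
  // Decode the bytes line by line with the given codec, capturing any
  // decoding exception in a Try instead of letting it propagate.
  def decodeLines(bytes: Array[Byte], codec: Codec): Try[List[String]] =
    Try(Source.fromInputStream(new ByteArrayInputStream(bytes))(codec).getLines().toList)

  def main(args: Array[String]): Unit = {
    // Sample text (an assumption) encoded as ISO-8859-1: the accented letter
    // becomes the single byte 0xE9, which is not valid UTF-8 when followed
    // by an ASCII byte.
    val latin1Bytes = "caf\u00e9 au lait".getBytes(StandardCharsets.ISO_8859_1)

    // Wrong codec: expected to fail with a MalformedInputException.
    println(decodeLines(latin1Bytes, Codec.UTF8))

    // Matching codec: the text decodes cleanly.
    println(decodeLines(latin1Bytes, Codec.ISO8859))
  }
}
```

If the second call succeeds on your real data while the first fails, the file is very probably not UTF-8.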
Broadly, there's no way to tell a priori what character encoding was used to generate some bitstream; you need some out-of-band mechanism to communicate it. In the case of HTTP responses, you can often get it from the Content-Type header, but various web apps get it wrong sometimes. If the file is XML, it's common to declare an encoding in the XML declaration at the top. Some file formats specify a single standard encoding... It's all over the map really.
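For the HTTP case, something along these lines could pull the charset out of a Content-Type value. This is only a sketch: ContentTypeCharset and charsetFrom are hypothetical names, the parsing is deliberately simple, and falling back to UTF-8 when nothing is declared is my assumption rather than anything the server guarantees.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import java.util.Locale
import scala.util.Try

object ContentTypeCharset {
  // Extract the charset parameter from a Content-Type value such as
  // "text/html; charset=ISO-8859-1". Returns None when the header is null,
  // has no charset parameter, or names a charset the JVM does not know.
  def charsetFrom(contentType: String): Option[Charset] =
    Option(contentType).toList
      .flatMap(_.split(";").toList.drop(1)) // skip the media type itself
      .map(_.trim)
      .collectFirst {
        case p if p.toLowerCase(Locale.ROOT).startsWith("charset=") =>
          p.substring("charset=".length).trim.stripPrefix("\"").stripSuffix("\"")
      }
      .flatMap(name => Try(Charset.forName(name)).toOption)

  def main(args: Array[String]): Unit = {
    val declared = charsetFrom("text/html; charset=ISO-8859-1")
    // Assumed fallback: UTF-8 when the header says nothing useful.
    println(declared.getOrElse(StandardCharsets.UTF_8))
  }
}
```

In your getStream, the header value would come from urlConnection.getContentType (which may be null), and the result can be passed on as Source.fromInputStream(stream)(Codec(charset)).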
Your best bet, in the absence of any integration requirement, is to use UTF-8 explicitly everywhere and not rely on the platform default encoding.
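As a rough illustration of "explicit everywhere", the sketch below writes and reads a temporary file with UTF-8 named at both ends, so the platform default never comes into play. ExplicitUtf8RoundTrip, roundTrip, and the temp file name are invented for the example.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import scala.io.{Codec, Source}

object ExplicitUtf8RoundTrip {
  // Write the lines to a temporary file as UTF-8, then read them back
  // with an explicit UTF-8 codec.
  def roundTrip(lines: List[String]): List[String] = {
    val path = Files.createTempFile("utf8-demo", ".txt")
    try {
      Files.write(path, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
      val source = Source.fromFile(path.toFile)(Codec.UTF8)
      try source.getLines().toList
      finally source.close()
    } finally {
      Files.deleteIfExists(path)
      ()
    }
  }

  def main(args: Array[String]): Unit = {
    val original = List("na\u00efve", "\u65e5\u672c\u8a9e")
    // Should print true on any platform, whatever its default encoding is.
    println(roundTrip(original) == original)
  }
}
```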