My script downloads files from the net and then saves them under the name taken from the same web server. I need a filter/remover of invalid characters for file/folder names under Windows NTFS.
I would be happy with a multi-platform filter too.
NOTE: something like htmlentities would be great.
Like Geo said, by using gsub you can easily convert all invalid characters to a valid character. For example:
file_names.map! do |f|
f.gsub(/[<invalid characters>]/, '_')
end
You need to replace <invalid characters> with all the possible characters that your file names might contain that are not allowed on your file system. In the above code each invalid character is replaced with a _.
Wikipedia tells us that the following characters are not allowed on NTFS:
- U+0000 (NUL)
- / (slash)
- \ (backslash)
- : (colon)
- * (asterisk)
- ? (question mark)
- " (quote)
- < (less than)
- > (greater than)
- | (pipe)
So your gsub call could be something like this:
file_names.map! { |f| f.gsub(/[\x00\/\\:\*\?\"<>\|]/, '_') }
which replaces all the invalid characters with an underscore.
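The same idea can be wrapped in a small helper so it is easy to reuse and test. This is only a sketch: the constant `NTFS_INVALID` and the method `sanitize_filename` are made-up names, and the character class covers only the characters listed above.

```ruby
# Characters listed above as invalid on NTFS: NUL / \ : * ? " < > |
# (NTFS_INVALID is a hypothetical name chosen for this sketch.)
NTFS_INVALID = /[\x00\/\\:*?"<>|]/

# Hypothetical helper: replace every invalid character with `replacement`.
def sanitize_filename(name, replacement = '_')
  name.gsub(NTFS_INVALID, replacement)
end

puts sanitize_filename('report: "Q1/Q2" <draft>?.txt')
# prints: report_ _Q1_Q2_ _draft__.txt
```

Note that two different names can map to the same sanitized name (for example `a:b` and `a?b`), so a collision check may still be needed.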
filename_string.gsub(/[^\w\.]/, '_')
Explanation: Replace everything except word characters (letters, digits, underscore) and dots with an underscore.
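One thing to be aware of with this whitelist: in Ruby regular expressions `\w` only matches ASCII `[a-zA-Z0-9_]`, so accented and other non-ASCII letters are replaced too. The sketch below compares it with a Unicode-aware variant built on `\p{Alnum}`; the method names are hypothetical and the variant is a possible alternative, not part of the original answer.

```ruby
# Whitelist from the answer above: keep ASCII word characters and dots.
def ascii_whitelist(name)
  name.gsub(/[^\w.]/, '_')
end

# Assumed variant: also keep non-ASCII letters and digits via \p{Alnum}.
def unicode_whitelist(name)
  name.gsub(/[^\p{Alnum}_.]/, '_')
end

name = "r\u00E9sum\u00E9 v2.pdf" # "résumé v2.pdf" with precomposed é
puts ascii_whitelist(name)   # prints: r_sum__v2.pdf
puts unicode_whitelist(name) # prints: résumé_v2.pdf
```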
I think your best bet would be gsub on the filename. One of the things I know you'll need to delete/replace is the colon (:).
I don't know how you plan to use those files later, but the most reliable solution would be to keep the original filenames in a db table (or an otherwise serialized hash), and name the physical files after a unique ID that you (or the database) generate.
PS: Another advantage of this approach is that you don't have to worry about files with the same names (or different names that filter to the same name).
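A minimal sketch of that approach, assuming a plain Ruby Hash stands in for the db table and `SecureRandom.uuid` supplies the unique ID (a UUID contains only hex digits and hyphens, so it is a valid file name). The method name `store_download` and its parameters are hypothetical.

```ruby
require 'securerandom'
require 'tmpdir'

# Hypothetical helper: write `data` to a file named after a fresh UUID
# and record the original (possibly invalid) name in `index`.
def store_download(dir, index, original_name, data)
  id = SecureRandom.uuid
  File.binwrite(File.join(dir, id), data)
  index[id] = original_name
  id
end

Dir.mktmpdir do |dir|
  index = {} # stand-in for a db table or serialized hash
  a = store_download(dir, index, 'a:b?.txt', 'first')
  b = store_download(dir, index, 'a:b?.txt', 'second')
  puts index[a]                        # original name is preserved
  puts File.binread(File.join(dir, b)) # same name, separate file
end
```

Because the on-disk name never depends on the server-supplied name, no character filtering is needed and identical original names cannot collide.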