How to scan hundreds of log files containing SSNs, and change the files to mask out the SSNs without changing the rest of the contents

开发者 (Developer) https://www.devze.com 2023-04-10 07:56 Source: web
I was asked this question in an interview and I couldn't come up with an efficient way to solve it.

"How to scan hundreds of log files containing SSNs, and change the files to mask out the SSNs without changing the rest of the contents."

Can anybody give me a hint? Thank you.

UPDATE: It was a Java developer position interview.


Don't use Java (the question never indicated you needed to use Java).

sed/awk on a *nix system is easier and less complicated.

Sometimes interviewers want to know if you only have one tool in your basket.

If you had to use Java:

1) read the file line by line
2) use a regex to replace every occurrence of the form nnn-nn-nnnn in each line with the appropriate mask (n is a digit)
3) while doing that, write each line to a new file
4) when done, possibly delete the old file and rename the new file to the old file's name.
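
The four steps above could be sketched roughly like this. It is only a minimal illustration, not a vetted tool: the class name SsnMasker, the mask XXX-XX-XXXX, the UTF-8 encoding, and the assumption that SSNs always appear as nnn-nn-nnnn and never inside a longer run of digits are all mine, not part of the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.regex.Pattern;

public class SsnMasker {
    // Assumption: an SSN is nnn-nn-nnnn, not preceded or followed by another digit.
    private static final Pattern SSN =
            Pattern.compile("(?<!\\d)\\d{3}-\\d{2}-\\d{4}(?!\\d)");

    // Assumption: a same-length mask, so column positions in the log stay stable.
    private static final String MASK = "XXX-XX-XXXX";

    // Step 2: replace every SSN-shaped token in one line.
    public static String maskLine(String line) {
        return SSN.matcher(line).replaceAll(MASK);
    }

    // Steps 1, 3 and 4: stream the file into a temp file, then swap it in.
    public static void maskFile(Path file) throws IOException {
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "ssn-mask", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(maskLine(line));
                out.newLine();
            }
        }
        // Replace the original only after the masked copy is fully written.
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Each command-line argument is treated as the path of one log file.
        for (String arg : args) {
            maskFile(Paths.get(arg));
        }
    }
}
```

Reading line by line keeps memory use flat even for large logs. One caveat: readLine() plus newLine() rewrites line endings with the platform separator and always ends the file with one, so if "without changing the rest of the contents" has to hold byte for byte, the line terminators would need to be preserved some other way.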


I'd use sed. It's not Java, but it's fast and ready-made.


I know this isn't the answer they're looking for, but if I were asked that question my answer would be something along the lines of "I'd never rely on an automated process like this to try to obscure something as sensitive as an SSN." Too many things can go wrong. Say you use a regular expression (with sed, for example), and one of the SSNs is missing its first digit. The first three digits are trivial to guess (figure out someone's birthplace) and your algorithm will miss it. The first time there's a mistake...
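
To make the failure mode concrete, here is a small sketch of how a strict pattern behaves on malformed input. The class name SsnPatternGaps and the sample strings are made up for illustration, and the pattern is just one plausible choice for the nnn-nn-nnnn form.

```java
import java.util.regex.Pattern;

public class SsnPatternGaps {
    // Assumption: the masking job looks only for the strict nnn-nn-nnnn shape.
    private static final Pattern SSN =
            Pattern.compile("(?<!\\d)\\d{3}-\\d{2}-\\d{4}(?!\\d)");

    // True if the line contains something the strict pattern would mask.
    public static boolean containsSsn(String line) {
        return SSN.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "ssn=123-45-6789",  // well-formed, would be masked
            "ssn=23-45-6789",   // first digit missing, slips through
            "ssn=123456789"     // no dashes, slips through
        };
        for (String s : samples) {
            System.out.println(s + " -> detected: " + containsSsn(s));
        }
    }
}
```

Loosening the pattern catches more of these, but it also starts masking unrelated numbers, which is part of why a purely automated pass is hard to trust.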
