I have a semicolon-delimited pandas DataFrame with all dtypes of object. Within some of the cells the string value can have ", a comma (,), or both (ex. TES"T_ING,_VALUE). I am then querying the DF using df.query based on some condition to get a subset of the DataFrame but the rows that have the pattern described in the example are being omitted completely but the remaining rows are being returned just fine. Another requirement is that I need to match all " within the text with a closing quote as well but applying a lambda to replace " with "" is also not being done properly. I have tried several methods and they are listed below
Problem 1:
pd.read_csv("file.csv", delimiter=';')
pd.read_csv("file.csv", delmiter=';', thousands=',')
pd.read_csv("file.csv", delimiter=";", escapechar='"')
pd.read_csv("file.csv", delimiter=";", encoding='utf-8')
All of the above fail to load the data in question.
Problem 2: Input: TES"T_ING,_VALUE to TES""T_ING,_VALUE I have tried:
df.apply(lambda s: s.str.replace('"', '""')
which doesn't do anything.
What exactly is going on? I haven't been able to find any questions tackling this particular type of issue anywhere.
Appreciate your help in advan开发者_如何学运维ce.
精彩评论