I'm trying to load data from a CSV file into a MySQL database, and noticed that a large number of records seem to be skipped when I import the file.
The data comes from a government source, and is very oddly formatted, with single quotes, etc. in unusual places. Here's a sample of a record not getting inserted:
"'050441'","STANFORD HOSPITAL","CA","H_HSP_RATING_7_8","How do patients rate the hospital overall?","Patients who gave a rating of'7' or '8' (medium)","22","300 or more","37",""
This record, however, does get inserted:
"'050441'","STANFORD HOSPITAL","CA","H_HSP_RATING_0_6","How do patients rate the hospital overall?","Patients who gave a rating of '6' or lower (low)","8","300 or more","37",""
The SQL I'm using to load the data is here:
mysql> load data infile "c:\\HQI_HOSP_HCAHPS_MSR.csv" into table hospital_quality_scores fields terminated by "," enclosed by '"' lines terminated by "\n" IGNORE 1 LINES;
The format of the table I'm loading the data into is as follows:
delimiter $$
CREATE TABLE `hospital_quality_scores` (
`ProviderNumber` varchar(8) NOT NULL,
`HospitalName` varchar(50) DEFAULT NULL,
`State` varchar(2) DEFAULT NULL,
`MeasureCode` varchar(25) NOT NULL,
`Question` longtext,
`AnswerDescription` longtext,
`AnswerPercent` int(11) DEFAULT NULL,
`NumberofCompletedSurveys` varchar(50) DEFAULT NULL,
`SurveyResponseRatePercent` varchar(50) DEFAULT NULL,
`Footnote` longtext,
PRIMARY KEY (`ProviderNumber`,`MeasureCode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8$$
Does anyone have any ideas why this is happening? It seems that only half of the records are actually being inserted correctly.
Could it be your primary key is preventing the additional data from being inserted?
Look for a record that has already been inserted with a ProviderNumber of "'050441'" and a MeasureCode of "H_HSP_RATING_7_8". If you have one of those, then it is a duplicate key problem.
You may need to add "AnswerDescription" to the primary key to get round this issue.
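Something like the query below should tell you whether that key is already in the table. It is only a sketch: it assumes the ProviderNumber was stored with its single quotes intact, which is why the quotes are doubled inside the string literal.

```sql
-- Look for an existing row with the same primary key as the skipped record.
-- '''050441''' is the 8-character string '050441' including the single quotes
-- (each doubled '' inside the literal stands for one quote character).
SELECT ProviderNumber, MeasureCode, AnswerDescription
FROM hospital_quality_scores
WHERE ProviderNumber = '''050441'''
  AND MeasureCode = 'H_HSP_RATING_7_8';
```

If this returns a row, the key is already taken and the record in the file collides with it. If it returns nothing, the primary key is probably not the cause.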
Regards,
Dave
Actually I'm thinking maybe your problem has more to do with the first value being double quoted (i.e. it is quoted twice as in "'value'"), which is probably resulting in the value you are trying to insert being '050441', not 050441 like it should be.
At any rate, without special handling, you are going to be INSERTing the extra single quotes, which I am thinking you probably did not mean to do.
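If you do want the single quotes stripped on the way in, LOAD DATA lets you read a field into a user variable and clean it up in a SET clause. A rough sketch follows, assuming the CSV columns come in the same order as the table columns (the sample rows suggest they do); @provider is just a variable name I picked.

```sql
-- Read the first CSV field into a user variable instead of the column,
-- then strip the surrounding single quotes before storing it.
LOAD DATA INFILE 'c:\\HQI_HOSP_HCAHPS_MSR.csv'
INTO TABLE hospital_quality_scores
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@provider, HospitalName, State, MeasureCode, Question, AnswerDescription,
 AnswerPercent, NumberofCompletedSurveys, SurveyResponseRatePercent, Footnote)
SET ProviderNumber = TRIM(BOTH '''' FROM @provider);
```

With this, "'050441'" in the file ends up as 050441 in the table. Only the first field is cleaned here; any other columns with stray quotes would need the same treatment.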
Good Luck and may all your code run flawlessly!
Rodney