开发者

Coldfusion web service consumed as WSDL chokes on control characters

开发者 https://www.devze.com 2023-02-16 15:29 出处:网络
Clients upload files in .doc format to a server directory, and the text within them is extracted using POI as per Ray Camdens posting here The content is saved in a text/memo field in a MySQL database

Clients upload files in .doc format to a server directory, and the text within them is extracted using POI as per Ray Camdens posting here The content is saved in a text/memo field in a MySQL database and is made available as a web service which is consumed as wsdl. All works as intended, until consumers of the web service access records containing certain (I presume) control characters, at which point the web service throws a 500 error.

In the database the problem rows seem to have control characters, and when the text field is displayed in Firefox there's odd characters too.

Coldfusion web service consumed as WSDL chokes on control characters

The web service simply returns a CF query of returntype = "any" and is called as

<cfinvoke webservice="https://nww.someplace.nhs.uk/cfcs/providerapi.cfc?wsdl" 开发者_JAVA技巧
method="getPendingReferrals" returnvariable="getReferrals">
    <cfinvokeargument name="userName" value=#username#>
    <cfinvokeargument name="password" value=#password#>
    <cfinvokeargument name="maxrows" value=#maxrows#>
</cfinvoke>

I presume the WSDL cannot transmit these characters, so is there a way to encode them, or do I just have to strip them out using a regex or something?

<cfcomponent>
    <cffunction  output="false" access="remote"  returntype="any" name="getPendingReferrals">
        <cfargument required="false" name="userName" type="string"/>
        <cfargument required="false" name="password" type="string"/>
        <cfargument required="false" name="maxrows" type="numeric" default="20"/>       
        <cfset var q="">

<cfinvoke component="cfcs.security" method="checkAuthenticated" returnvariable="checkAuth">
    <cfinvokeargument name="username" value="#arguments.userName#">
    <cfinvokeargument name="password" value="#arguments.password#">
</cfinvoke>


<cfif checkAuth.authenticates is "true"> 
    <!--- log the login --->
    <cfset filename=#datepart("yyyy", now())#&#datepart("m", now())#&#datepart("d", now())#&"loginlog.txt">
    <CFSET OUTFILE = "#application.Root#"&"logs\"&"#filename#">
    <cfif #FileExists(OUTFILE)# is "Yes">
        <cffile action="append" file="#OUTFILE#"  output="#checkAuth.userName#, #now()#,  #remote_addr#, #Left(http_user_agent, 50)#">
    <cfelse>
        <CFFILE action="write" output="#checkAuth.userName#, #now()#,  #remote_addr#, #Left(http_user_agent, 50)#" file="#OUTFILE#">
    </cfif> 


        <cfif checkAuth.organisationID is 1>
            <cfset toStr="toID=1">
        <cfelseif checkAuth.organisationID is 28>
            <cfset toStr="(toID=28 OR toID=29)">
        </cfif>

        <cfquery name="q" datasource='mySqlData' maxrows=#arguments.maxrows#>
            SELECT messages.messageID, messages.toID, messages.fromID AS referrerID, (SELECT CONCAT(title, ' ',firstName, ' ', lastname) FROM users WHERE users.userID = messages.fromID) as referrerName,messages.threadID, messages.messageBody, messages.dateCreated, messages.dateSent, 
            messages.deleted, messages.createdByID, (SELECT CONCAT(title, ' ',firstName, ' ', lastname) FROM users WHERE users.userID = messages.createdByID) as createdByName, (SELECT organisationName FROM organisations WHERE messages.originatingOrganisationID = organisations.organisationID) as originatingOrganisationName, messages.originatingOrganisationID, messages.viewed, messages.referral, messages.actioned, messages.patientID, messages.refTypeID, messages.specialtyID, organisations.organisationName AS toOrganisationName, patients.nhsNumber AS patientNHSnumber, patients.patientTitle, patients.patientLastname, patients.patientFirstname,  patients.patientDOB, patients.address1 as patientAddress1, patients.address2 AS patientAddress2, patients.address3 AS patientAddress3, patients.address4 AS patientAddress4, patients.postcode AS patientPostcode, patients.patientPhone1
            FROM users INNER JOIN (organisations INNER JOIN (patients INNER JOIN messages ON patients.patientID = messages.patientID) ON organisations.organisationID = messages.toID) ON users.userID = messages.fromID 
            WHERE #toStr#
            AND NOT actioned
            AND NOT originatingOrganisationID=3
            ORDER BY messageID
            </cfquery>


            <cfif isQuery(q)>
                <cfreturn q>
            <cfelse>
                <cfreturn "Error : in query">
            </cfif>



<cfelse> 
    <cfreturn "Error : failed to authenticate">
</cfif>    

</cffunction>


You should probably strip out all high-ascii characters using a regular expression. One of the best I've found was written up by Ben Nadel, here. (Though it wasn't perfect, and I made some improvements to it in the comments.)

Basically, if you just want to strip the high-ascii characters out, do this:

<cfset result = reReplace(messageBody, "[^\x20-\x7E\x0D\x09]", "", "all") />

This regular expression takes a white-list approach, allowing only printable characters to remain:

  • \x20-\x7E = {space} ! " # $ % & ' ( ) * + , - . / 0-9 : ; < = > ? @ A-Z [ \ ] ^ _ ` a-z { | } ~
  • \x0D = carriage return
  • \x09 = horizontal tab

If you like this approach to sanitization, you could use Sean Coyne's method for updating the query with a loop:

<cfloop query="q">
    <cfset querySetCell(
        q,
        "messageBody",
        clean(q.messageBody[q.currentRow]),
        q.currentRow
    )/>
</cfloop>
<cffunction name="clean">
    <cfargument name="in" />
    <cfreturn reReplace(arguments.in, "[^\x20-\x7E\x0D\x09]", "", "all") />
</cffunction>


It is not ideal, but you could try something like:

<cfloop query="q">
    <cfset querySetCell(q,"messageBody",xmlFormat(q.messageBody[q.currentRow]),q.currentRow) />
</cfloop>

If xmlFormat fails to remove all the characters (it has been known to miss a few), you may need to write a manual method to strip them out.


As Sean is mentioning you will need to escape all kind of special characters to get valid XML - have a look at http://www.petefreitag.com/item/202.cfm for example

0

精彩评论

暂无评论...
验证码 换一张
取 消