开发者

Reading file input from a multipart/form-data POST

开发者 https://www.devze.com 2023-04-05 20:40 出处:网络
I\'m POSTing a file to a WCF REST service through a HTML form, with enctype set to multipart/form-data and a single component: <input type=\"file\" name=\"data\">. The resulting stream being rea

I'm POSTing a file to a WCF REST service through a HTML form, with enctype set to multipart/form-data and a single component: <input type="file" name="data">. The resulting stream being read by the server contain开发者_如何学编程s the following:

------WebKitFormBoundary
Content-Disposition: form-data; name="data"; filename="DSCF0001.JPG"
Content-Type: image/jpeg

<file bytes>
------WebKitFormBoundary--

The problem is that I'm not sure how do extract the file bytes from the stream. I need to do this in order to write the file to the disk.


Sorry for joining the party late, but there is a way to do this with Microsoft public API.

Here's what you need:

  1. System.Net.Http.dll
    • Included in .NET 4.5
    • For .NET 4 get it via NuGet
  2. System.Net.Http.Formatting.dll
    • For .NET 4.5 get this NuGet package
    • For .NET 4 get this NuGet package

Note The Nuget packages come with more assemblies, but at the time of writing you only need the above.

Once you have the assemblies referenced, the code can look like this (using .NET 4.5 for convenience):

public static async Task ParseFiles(
    Stream data, string contentType, Action<string, Stream> fileProcessor)
{
    var streamContent = new StreamContent(data);
    streamContent.Headers.ContentType = MediaTypeHeaderValue.Parse(contentType);

    var provider = await streamContent.ReadAsMultipartAsync();

    foreach (var httpContent in provider.Contents)
    {
        var fileName = httpContent.Headers.ContentDisposition.FileName;
        if (string.IsNullOrWhiteSpace(fileName))
        {
            continue;
        }

        using (Stream fileContents = await httpContent.ReadAsStreamAsync())
        {
            fileProcessor(fileName, fileContents);
        }
    }
}

As for usage, say you have the following WCF REST method:

[OperationContract]
[WebInvoke(Method = WebRequestMethods.Http.Post, UriTemplate = "/Upload")]
void Upload(Stream data);

You could implement it like so

public void Upload(Stream data)
{
    MultipartParser.ParseFiles(
           data, 
           WebOperationContext.Current.IncomingRequest.ContentType, 
           MyProcessMethod);
}


You may take a look at the following blog post which illustrates a technique that could be used to parse multipart/form-data on the server using the Multipart Parser:

public void Upload(Stream stream)
{
    MultipartParser parser = new MultipartParser(stream);
    if (parser.Success)
    {
        // Save the file
        SaveFile(parser.Filename, parser.ContentType, parser.FileContents);
    }
}

Another possibility is to enable aspnet compatibility and use HttpContext.Current.Request but that's not a very WCFish way.


I've had some issues with parser that are based on string parsing particularly with large files I found it would run out of memory and fail to parse binary data.

To cope with these issues I've open sourced my own attempt at a C# multipart/form-data parser here

Features:

  • Handles very large files well. (Data is streamed in and streamed out while reading)
  • Can handle multiple file uploads and automatically detects if a section is a file or not.
  • Returns files as a stream not as a byte[] (good for large files).
  • Full documentation for the library including a MSDN-style generated website.
  • Full unit tests.

Restrictions:

  • Doesn't handle non-multipart data.
  • Code is more complicated then Lorenzo's

Just use the MultipartFormDataParser class like so:

Stream data = GetTheStream();

// Boundary is auto-detected but can also be specified.
var parser = new MultipartFormDataParser(data, Encoding.UTF8);

// The stream is parsed, if it failed it will throw an exception. Now we can use
// your data!

// The key of these maps corresponds to the name field in your
// form
string username = parser.Parameters["username"].Data;
string password = parser.Parameters["password"].Data

// Single file access:
var file = parser.Files.First();
string filename = file.FileName;
Stream data = file.Data;

// Multi-file access
foreach(var f in parser.Files)
{
    // Do stuff with each file.
}

In the context of a WCF service you could use it like this:

public ResponseClass MyMethod(Stream multipartData)
{
    // First we need to get the boundary from the header, this is sent
    // with the HTTP request. We can do that in WCF using the WebOperationConext:
    var type = WebOperationContext.Current.IncomingRequest.Headers["Content-Type"];

    // Now we want to strip the boundary out of the Content-Type, currently the string
    // looks like: "multipart/form-data; boundary=---------------------124123qase124"
    var boundary = type.Substring(type.IndexOf('=')+1);

    // Now that we've got the boundary we can parse our multipart and use it as normal
    var parser = new MultipartFormDataParser(data, boundary, Encoding.UTF8);

    ...
}

Or like this (slightly slower but more code friendly):

public ResponseClass MyMethod(Stream multipartData)
{
    var parser = new MultipartFormDataParser(data, Encoding.UTF8);
}

Documentation is also available, when you clone the repository simply navigate to HttpMultipartParserDocumentation/Help/index.html


I open-sourced a C# Http form parser here.

This is slightly more flexible than the other one mentioned which is on CodePlex, since you can use it for both Multipart and non-Multipart form-data, and also it gives you other form parameters formatted in a Dictionary object.

This can be used as follows:

non-multipart

public void Login(Stream stream)
{
    string username = null;
    string password = null;

    HttpContentParser parser = new HttpContentParser(stream);
    if (parser.Success)
    {
        username = HttpUtility.UrlDecode(parser.Parameters["username"]);
        password = HttpUtility.UrlDecode(parser.Parameters["password"]);
    }
}

multipart

public void Upload(Stream stream)
{
    HttpMultipartParser parser = new HttpMultipartParser(stream, "image");

    if (parser.Success)
    {
        string user = HttpUtility.UrlDecode(parser.Parameters["user"]);
        string title = HttpUtility.UrlDecode(parser.Parameters["title"]);

        // Save the file somewhere
        File.WriteAllBytes(FILE_PATH + title + FILE_EXT, parser.FileContents);
    }
}


Another way would be to use .Net parser for HttpRequest. To do that you need to use a bit of reflection and simple class for WorkerRequest.

First create class that derives from HttpWorkerRequest (for simplicity you can use SimpleWorkerRequest):

public class MyWorkerRequest : SimpleWorkerRequest
{
    private readonly string _size;
    private readonly Stream _data;
    private string _contentType;

    public MyWorkerRequest(Stream data, string size, string contentType)
        : base("/app", @"c:\", "aa", "", null)
    {
        _size = size ?? data.Length.ToString(CultureInfo.InvariantCulture);
        _data = data;
        _contentType = contentType;
    }

    public override string GetKnownRequestHeader(int index)
    {
        switch (index)
        {
            case (int)HttpRequestHeader.ContentLength:
                return _size;
            case (int)HttpRequestHeader.ContentType:
                return _contentType;
        }
        return base.GetKnownRequestHeader(index);
    }

    public override int ReadEntityBody(byte[] buffer, int offset, int size)
    {
        return _data.Read(buffer, offset, size);
    }

    public override int ReadEntityBody(byte[] buffer, int size)
    {
        return ReadEntityBody(buffer, 0, size);
    }
}

Then wherever you have you message stream create and instance of this class. I'm doing it like that in WCF Service:

[WebInvoke(Method = "POST",
               ResponseFormat = WebMessageFormat.Json,
               BodyStyle = WebMessageBodyStyle.Bare)]
    public string Upload(Stream data)
    {
        HttpWorkerRequest workerRequest =
            new MyWorkerRequest(data,
                                WebOperationContext.Current.IncomingRequest.ContentLength.
                                    ToString(CultureInfo.InvariantCulture),
                                WebOperationContext.Current.IncomingRequest.ContentType
                );

And then create HttpRequest using activator and non public constructor

var r = (HttpRequest)Activator.CreateInstance(
            typeof(HttpRequest),
            BindingFlags.Instance | BindingFlags.NonPublic,
            null,
            new object[]
                {
                    workerRequest,
                    new HttpContext(workerRequest)
                },
            null);

var runtimeField = typeof (HttpRuntime).GetField("_theRuntime", BindingFlags.Static | BindingFlags.NonPublic);
if (runtimeField == null)
{
    return;
}

var runtime = (HttpRuntime) runtimeField.GetValue(null);
if (runtime == null)
{
    return;
}

var codeGenDirField = typeof(HttpRuntime).GetField("_codegenDir", BindingFlags.Instance | BindingFlags.NonPublic);
if (codeGenDirField == null)
{
    return;
}

codeGenDirField.SetValue(runtime, @"C:\MultipartTemp");

After that in r.Files you will have files from your stream.


The guy who solved this posted it as LGPL and you're not allowed to modify it. I didn't even click on it when I saw that. Here's my version. This needs to be tested. There are probably bugs. Please post any updates. No warranty. You can modify this all you want, call it your own, print it out on a piece of paper and use it for kennel scrap, ... don't care.

using System.Collections.Generic;
using System.Collections.Specialized;
using System.IO;
using System.Net;
using System.Text;
using System.Web;

namespace DigitalBoundaryGroup
{
    class HttpNameValueCollection
    {
        public class File
        {
            private string _fileName;
            public string FileName { get { return _fileName ?? (_fileName = ""); } set { _fileName = value; } }

            private string _fileData;
            public string FileData { get { return _fileData ?? (_fileName = ""); } set { _fileData = value; } }

            private string _contentType;
            public string ContentType { get { return _contentType ?? (_contentType = ""); } set { _contentType = value; } }
        }

        private NameValueCollection _post;
        private Dictionary<string, File> _files;
        private readonly HttpListenerContext _ctx;

        public NameValueCollection Post { get { return _post ?? (_post = new NameValueCollection()); } set { _post = value; } }
        public NameValueCollection Get { get { return _ctx.Request.QueryString; } }
        public Dictionary<string, File> Files { get { return _files ?? (_files = new Dictionary<string, File>()); } set { _files = value; } }

        private void PopulatePostMultiPart(string post_string)
        {
            var boundary_index = _ctx.Request.ContentType.IndexOf("boundary=") + 9;
            var boundary = _ctx.Request.ContentType.Substring(boundary_index, _ctx.Request.ContentType.Length - boundary_index);

            var upper_bound = post_string.Length - 4;

            if (post_string.Substring(2, boundary.Length) != boundary)
                throw (new InvalidDataException());

            var current_string = new StringBuilder();

            for (var x = 4 + boundary.Length; x < upper_bound; ++x)
            {
                if (post_string.Substring(x, boundary.Length) == boundary)
                {
                    x += boundary.Length + 1;

                    var post_variable_string = current_string.Remove(current_string.Length - 4, 4).ToString();

                    var end_of_header = post_variable_string.IndexOf("\r\n\r\n");

                    if (end_of_header == -1) throw (new InvalidDataException());

                    var filename_index = post_variable_string.IndexOf("filename=\"", 0, end_of_header);
                    var filename_starts = filename_index + 10;
                    var content_type_starts = post_variable_string.IndexOf("Content-Type: ", 0, end_of_header) + 14;
                    var name_starts = post_variable_string.IndexOf("name=\"") + 6;
                    var data_starts = end_of_header + 4;

                    if (filename_index != -1)
                    {
                        var filename = post_variable_string.Substring(filename_starts, post_variable_string.IndexOf("\"", filename_starts) - filename_starts);
                        var content_type = post_variable_string.Substring(content_type_starts, post_variable_string.IndexOf("\r\n", content_type_starts) - content_type_starts);
                        var file_data = post_variable_string.Substring(data_starts, post_variable_string.Length - data_starts);
                        var name = post_variable_string.Substring(name_starts, post_variable_string.IndexOf("\"", name_starts) - name_starts);
                        Files.Add(name, new File() { FileName = filename, ContentType = content_type, FileData = file_data });
                    }
                    else
                    {
                        var name = post_variable_string.Substring(name_starts, post_variable_string.IndexOf("\"", name_starts) - name_starts);
                        var value = post_variable_string.Substring(data_starts, post_variable_string.Length - data_starts);
                        Post.Add(name, value);
                    }

                    current_string.Clear();
                    continue;
                }

                current_string.Append(post_string[x]);
            }
        }

        private void PopulatePost()
        {
            if (_ctx.Request.HttpMethod != "POST" || _ctx.Request.ContentType == null) return;

            var post_string = new StreamReader(_ctx.Request.InputStream, _ctx.Request.ContentEncoding).ReadToEnd();

            if (_ctx.Request.ContentType.StartsWith("multipart/form-data"))
                PopulatePostMultiPart(post_string);
            else
                Post = HttpUtility.ParseQueryString(post_string);

        }

        public HttpNameValueCollection(ref HttpListenerContext ctx)
        {
            _ctx = ctx;
            PopulatePost();
        }


    }
}


I have implemented MultipartReader NuGet package for ASP.NET 4 for reading multipart form data. It is based on Multipart Form Data Parser, but it supports more than one file.


How about some Regex?

I wrote this for a text a file, but I believe this could work for you

(In case your text file contains line starting exactly with the "matched" ones below - simply adapt your Regex)

    private static List<string> fileUploadRequestParser(Stream stream)
    {
        //-----------------------------111111111111111
        //Content-Disposition: form-data; name="file"; filename="data.txt"
        //Content-Type: text/plain
        //...
        //...
        //-----------------------------111111111111111
        //Content-Disposition: form-data; name="submit"
        //Submit
        //-----------------------------111111111111111--

        List<String> lstLines = new List<string>();
        TextReader textReader = new StreamReader(stream);
        string sLine = textReader.ReadLine();
        Regex regex = new Regex("(^-+)|(^content-)|(^$)|(^submit)", RegexOptions.IgnoreCase | RegexOptions.Compiled | RegexOptions.Singleline);

        while (sLine != null)
        {
            if (!regex.Match(sLine).Success)
            {
                lstLines.Add(sLine);
            }
            sLine = textReader.ReadLine();
        }

        return lstLines;
    }


I have dealt WCF with large file (serveral GB) upload where store data in memory is not an option. My solution is to store message stream to a temp file and use seek to find out begin and end of binary data.


In my case (as the OP) I always have just one part returned. So here is a quick and dirty to handle the simplest case.

/*
 * If the byte array has a paragraph of text at the beginning, remove it, and also 
 * remove a number of bytes at the end equal to the length of the first line.
 * Otherwise, just return the rawContent.
 */
public static byte[] RemoveMultipartWrapper(byte[] rawContent)
{
    var i1 = 0;
    var i2 = 0;
    for (var i = 1; i + i1 + 6 < rawContent.Length && i < 1000; i++)
    {
        if (rawContent[i] == 13 && rawContent[i + 1] == 10)  // search for CRLF
        {
            if (i1 == 0) i1 = i;  // first line break
            else if (i == i2 + 2)  // empty line
                return rawContent.Skip(i + 2).Take(rawContent.Length - i1 - i - 8).ToArray();
            i2 = i;
        }
    }
    return rawContent;   // no wrapper found
}
0

精彩评论

暂无评论...
验证码 换一张
取 消