How to load cell array of strings in Matlab mat files into Python list or tuple using Scipy.io.loadmat

Developer https://www.devze.com 2023-02-07 12:01 Source: web
I am a Matlab user new to Python. I would like to write a cell array of strings in Matlab to a Mat file, and load this Mat file using Python (maybe scipy.io.loadmat) into some similar type (e.g., a list of strings or a tuple of strings). But loadmat reads things into an array, and I am not sure how to convert it into a list. I tried the "tolist" function, which does not work as I expected (I have a poor understanding of Python arrays and numpy arrays). For example:

Matlab code:

cell_of_strings = {'thank',  'you', 'very', 'much'};
save('my.mat', 'cell_of_strings');

Python code:

from scipy.io import loadmat
matdata = loadmat('my.mat', chars_as_strings=1, matlab_compatible=1)
array_of_strings = matdata['cell_of_strings']

Then, the variable array_of_strings is:

array([[[[u't' u'h' u'a' u'n' u'k']], [[u'y' u'o' u'u']],
    [[u'v' u'e' u'r' u'y']], [[u'm' u'u' u'c' u'h']]]], dtype=object)

I am not sure how to convert this array_of_strings into a Python list or tuple so that it looks like

list_of_strings = ['thank', 'you', 'very', 'much']

I am not familiar with the array object in Python or numpy. Your help will be highly appreciated.


Have you tried this:

import scipy.io as si

a = si.loadmat('my.mat')
b = a['cell_of_strings']                # type(b) <type 'numpy.ndarray'>
list_of_strings  = b.tolist()           # type(list_of_strings ) <type 'list'>

print(list_of_strings)
# output: [u'thank', u'you', u'very', u'much']


This looks like a job for a list comprehension. Repeating your example, I did this in MATLAB:

cell_of_strings = {'thank',  'you', 'very', 'much'};
save('my.mat', 'cell_of_strings','-v7'); 

I'm using a newer version of MATLAB, which in my setup saves .mat files in the HDF5-based v7.3 format by default. loadmat can't read v7.3 files, so the '-v7' flag forces MATLAB to save an older-format .mat file that loadmat can understand.
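
If you are not sure which format a given file was saved in, you can peek at its header before calling loadmat. The snippet below is only a rough sketch: the function names format_from_header and mat_file_format are my own, not part of scipy, and it relies on the text at the start of the file ("MATLAB 7.3 MAT-file" for the HDF5-based format, "MATLAB 5.0 MAT-file" for files saved with '-v6' or '-v7').

```python
def format_from_header(header):
    """Guess the MAT-file format from the leading bytes of the file."""
    if header.startswith(b'MATLAB 7.3 MAT-file'):
        return 'v7.3'      # HDF5-based, not readable by scipy.io.loadmat
    if header.startswith(b'MATLAB 5.0 MAT-file'):
        return 'v5'        # written by save(..., '-v6') or save(..., '-v7')
    return 'unknown'       # e.g. a v4 file, which has no text header


def mat_file_format(filename):
    """Read the 128-byte header of a file and classify it (hypothetical helper)."""
    with open(filename, 'rb') as f:
        return format_from_header(f.read(128))
```

Calling mat_file_format('my.mat') on a file that returns 'v7.3' tells you to re-save it with '-v7' in MATLAB before using loadmat.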

In Python, I loaded the cell array just like you did:

import scipy.io as sio
# path is the directory that contains my.mat
matdata = sio.loadmat('%s/my.mat' % path, chars_as_strings=1, matlab_compatible=1)
array_of_strings = matdata['cell_of_strings']

Printing array_of_strings gives:

[[array([[u't', u'h', u'a', u'n', u'k']], 
          dtype='<U1')
      array([[u'y', u'o', u'u']], 
          dtype='<U1')
      array([[u'v', u'e', u'r', u'y']], 
          dtype='<U1')
      array([[u'm', u'u', u'c', u'h']], 
          dtype='<U1')]]

The variable array_of_strings is a (1,4) numpy object array, but there are arrays nested within each object. For example, the first element of array_of_strings is a (1,5) array containing the letters for 'thank'. That is,

array_of_strings[0,0]
array([[u't', u'h', u'a', u'n', u'k']], 
      dtype='<U1')

To get at the first letter 't', you have to do something like:

array_of_strings[0,0][0,0]
u't'

Since we are dealing with nested arrays, we need nested iteration (i.e. nested for loops) to extract the data. But first, I'll show you how to extract the first word:

first_word = [str(''.join(letter)) for letter in array_of_strings[0][0]]
first_word
['thank']

Here I am using a list comprehension. Basically, I am looping through each row of array_of_strings[0][0] (there is only one row, holding the letters of the word) and concatenating its letters using the ''.join method. The str() call converts the unicode string into a regular string.

Now, to get the list of strings you want, we just need to loop through each array of letters:

words = [str(''.join(letter)) for letter_array in array_of_strings[0] for letter in letter_array]
words
['thank', 'you', 'very', 'much']
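
If you need this in more than one place, the same idea can be wrapped in a small function. This is just a sketch under a few assumptions: cell_to_list is a name I made up, and the sample array is built by hand to mimic the (1,4) object array shown above rather than loaded from my.mat. As far as I can tell from the scipy documentation, matlab_compatible=1 implies chars_as_strings=False, which would explain why the letters come back one per element; the function below should handle both that case and the case where each cell already holds a whole string.

```python
import numpy as np


def cell_to_list(cell):
    """Convert a loadmat-style cell array of strings into a list of str."""
    words = []
    # ravel() walks the outer object array whatever its (1, N) or (N, 1) shape
    for item in np.asarray(cell, dtype=object).ravel():
        # each item is itself an array: single letters, or one whole string
        pieces = np.asarray(item).ravel().tolist()
        words.append(''.join(pieces))
    return words


# Hand-built stand-in for matdata['cell_of_strings'] (illustration only)
sample = np.empty((1, 4), dtype=object)
for i, word in enumerate(['thank', 'you', 'very', 'much']):
    sample[0, i] = np.array([list(word)])  # shape (1, len(word)), one letter per entry

print(cell_to_list(sample))
```

Wrap the result in tuple(...) if you would rather have a tuple of strings.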

List comprehensions take some getting used to, but they are extremely useful. Hope this helps.
