Here is a brief summary of my aims. I have a list of names or identifiers in the data text file. The names are all on one line and separated by spaces. I want to put each name on a separate line. If a name from the data text file is also present in the big file, I want that line of the big file, i.e. the name and some additional information all on the same line, written to a smaller data file.
This is the program that I have started to attempt such a feat. Perhaps this is pushing the limits of my skills but I hope to be able to complete this.
datafile = open ('C:\\datatext.txt', 'r')
line = [item for item in open('C:\\datatext.txt', 'r').read().split(' ')
        if item.startswith("name") or item.startswith("name2")]
line_list = line.split(" ")
completedataset = open('C:\\bigfile.txt', 'r')
smallerdataset = open('C:\\smallerdataset.txt', 'w')
trials = [ line_list ]
for line in completedataset:
    for t in trials:
        if t in line:
            smallerdataset.write(line)
completedataset.close()
smallerdataset.close()
Here is the error that I receive when I run the program in Python:
Traceback (most recent call last):
File "C:/program3.py", line 7, in <module>
line_list = line.split(" ")
AttributeError: 'list' object has no attribute 'split'
I have tried to be very thorough and look forward to your comments. If you have additional questions I will elaborate promptly as needed. All the best and enjoy the rainy weather.
EDIT:
I have made some changes to the program based on suggestions. I have this as my program now:
with open('C:\\datatext.txt', 'r') as datafile:
    lines = datafile.read().split(' ')
matchedLines = [item for item in lines if item.startswith("name1") or item.startswith("othername")]
completedataset = open('C:\\bigfile.txt', 'r')
smallerdataset = open('C:\\smallerdataset.txt', 'w')
trials = [ matchedLines ]
for line in completedataset:
    for t in trials:
        if t in line:
            smallerdataset.write(line)
completedataset.close()
smallerdataset.close()
and I'm getting this error now:
Traceback (most recent call last):
  File "C:/program5.py", line 17, in <module>
    if t in line:
TypeError: 'in <string>' requires string as left operand, not list
Thank you for your continued help in this matter.
EDIT 2:
I have made several changes and now I'm getting this error:
Traceback (most recent call last):
  File "C:/program6.py", line 9, in <module>
    open('C:\\smallerdataset.txt', 'w')) as (completedataset, smallerdataset):
AttributeError: 'tuple' object has no attribute '__exit__'
Here is my program as it stands now:
with open('C:\\datatext.txt', 'r') as datafile:
    lines = datafile.read().split(' ')
matchedLines = [item for item in lines if item.startswith("nam1") or item.startswith("ndname")]
with (open('C:\\bigfile.txt', 'r'),
      open('C:\\smallerdataset.txt', 'w')) as (completedataset, smallerdataset):
    for line in completedataset:
        for t in matchedLines:
            if t in line:
                smallerdataset.write(line)
completedataset.close()
smallerdataset.close()
How can I get around this hurdle?
line = [item for item in open('C:\chiptext.txt', 'r').read().split(' ')
if item.startswith("SNP") or item.startswith("AFFY")]
This is making line a list of strings. A list object does not have a split method.
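To make the difference concrete, here is a minimal sketch that uses an in-memory string in place of the file contents. The sample names in `text` are made up for illustration and are not taken from your data.

```python
# Hypothetical stand-in for the contents of the names file.
text = "SNP_1 AFFY_2 other_3"

# str.split returns a list of strings.
items = text.split(' ')

# The comprehension filters that list, so the result is again a list.
line = [item for item in items
        if item.startswith("SNP") or item.startswith("AFFY")]

print(type(line).__name__)     # the type name of the filtered result
print(hasattr(line, 'split'))  # does a list have a split method?
print(hasattr(text, 'split'))  # does a string have one?
```

Because `line` already holds the individual names, no further call to `split` is needed.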
It looks like you want a list of all the names in datatext and a subset of that list for names that match some predicate. The best way to do that is the following.
with open('C:\\datatext.txt', 'r') as datafile:
    lines = datafile.read().split(' ')
matchedLines = [item for item in lines if (PREDICATE)]
As a general comment, try not to get too carried away with one-lining code. Your list comprehension line is leaving the file object open.
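As a rough illustration of why the `with` form is safer, the sketch below writes a throwaway file (the path comes from `tempfile` and the contents are invented), reads it inside a `with` block, and then checks that the file object was closed on exit. It also uses `split()` with no argument, which is my own substitution: it splits on any whitespace, so a trailing newline does not end up attached to the last name.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write("name1_a name1_b zzz\n")

# The file is closed automatically when the with block ends.
with open(path, 'r') as datafile:
    names = datafile.read().split()

closed_after_with = datafile.closed
print(names)
print(closed_after_with)

os.remove(path)  # clean up the temporary file
```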
Edit for new edit:
matchedLines is already a list, so I'm not sure why you are wrapping it in another list when you make trials. Below is a simple example of what you are doing.
l = [1, 2, 3]
ll = [l]
print(ll)  # [[1, 2, 3]]
When you get errors that don't make sense based on what you expect the value of a variable to be, you should add in print statements so you can confirm that the values are correct.
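For example, a few prints placed just before the failing test would have shown the problem. This is only a sketch with made-up sample values, not your real data:

```python
matchedLines = ["name1_a", "othername_b"]  # made-up sample names
trials = [matchedLines]                     # the extra wrapping in question
line = "name1_a some extra information\n"  # made-up line from the big file

print(repr(trials))  # shows a list nested inside another list

error_message = None
for t in trials:
    print(type(t).__name__)  # shows what kind of object t actually is
    try:
        t in line            # the same test that fails in the program
    except TypeError as exc:
        error_message = str(exc)

print(error_message)
```

Seeing that `t` is a list rather than a string points directly at the line that builds `trials`.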
This is likely what you need:
with open('C:\\datatext.txt', 'r') as datafile:
    lines = datafile.read().split(' ')
matchedLines = [item for item in lines if item.startswith("name1") or item.startswith("othername")]
# Backslashes are doubled so that '\b' is not read as a backspace escape.
with open('C:\\bigfile.txt', 'r') as completedataset:
    with open('C:\\smallerdataset.txt', 'w') as smallerdataset:
        for line in completedataset:
            for t in matchedLines:
                if t in line:
                    smallerdataset.write(line)
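On the error from EDIT 2: a parenthesised pair of `open(...)` calls is an ordinary tuple, and a tuple is not a context manager, hence the missing `__exit__`. Besides nesting, Python 2.7 and 3.1+ accept several context managers in one `with` statement when they are separated by commas without the surrounding parentheses. The sketch below shows that form. The function name `filter_lines`, its parameters, and the sample data are all hypothetical, and it uses temporary files rather than your `C:\\` paths so that it can be run anywhere.

```python
import os
import shutil
import tempfile


def filter_lines(names_path, big_path, out_path, prefixes):
    """Copy each line of big_path that contains a selected name to out_path."""
    with open(names_path, 'r') as datafile:
        # split() with no argument also drops the trailing newline.
        # prefixes must be a tuple, which str.startswith accepts directly.
        names = [n for n in datafile.read().split() if n.startswith(prefixes)]
    # Two context managers in one with statement, separated by a comma.
    with open(big_path, 'r') as big, open(out_path, 'w') as small:
        for line in big:
            # any() stops at the first hit, so a line is written only once.
            if any(name in line for name in names):
                small.write(line)
    return names


# Demo with made-up data in a temporary directory.
workdir = tempfile.mkdtemp()
names_path = os.path.join(workdir, 'datatext.txt')
big_path = os.path.join(workdir, 'bigfile.txt')
out_path = os.path.join(workdir, 'smallerdataset.txt')

with open(names_path, 'w') as f:
    f.write("name1_a othername_b skip_c\n")
with open(big_path, 'w') as f:
    f.write("name1_a 0.1 0.2\nskip_c 0.3 0.4\nothername_b 0.5 0.6\n")

selected = filter_lines(names_path, big_path, out_path, ("name1", "othername"))
with open(out_path, 'r') as f:
    result = f.read()
print(result)

shutil.rmtree(workdir)  # clean up the temporary files
```

Note that `name in line` is a substring test, so a name such as `name1_a` would also match a line that starts with `name1_ab`. If that matters for your identifiers, comparing against the first whitespace-separated field of each line would be stricter.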