开发者

Extracting line after new line

开发者 https://www.devze.com 2023-03-14 09:27 出处:网络
I have a text file with something like Country1 city1 city2 Country2 city3 city4 I want to separate country and cities. Is there any quick way of doing it? I am thinking of some fil开发者_Python百

I have a text file with something like

Country1
city1
city2

Country2
city3
city4

I want to separate country and cities. Is there any quick way of doing it? I am thinking of some fil开发者_Python百科e handling and then extracting to different files, is it best way or can be done with some regex etc quickly?


countries=[]
cities=[]
with open("countries.txt") as f:
    gap=True
    for line in f:
        line=line.strip()
        if gap:
            countries.append(line)
            gap=False
        elif line=="":
            gap=True
        else:
            cities.append(line)
print countries
print cities

output:

['Country1', 'Country2']
['city1', 'city2', 'city3', 'city4']

if you want to write these to files:

with open("countries.txt","w") as country_file, open("cities.txt","w") as city_file:
    country_file.write("\n".join(countries))
    city_file.write("\n".join(cities))


f = open('b.txt', 'r')
status = True
country = []
city = []
for line in f:
    line = line.strip('\n').strip()
    if line:
        if status:
            country.append(line)
            status = False
        else:
            city.append(line)
    else:
        status = True

print country
print city


output :

>>['city1', 'city2', 'city3', 'city4']
>>['Country1', 'Country2']


$countries = array();
$cities = array();
$gap = false;
$file = file('path/to/file');
foreach($file as $line)
{
  if($line == '') $gap = true;
  elseif ($line != '' and $gap) 
  {
    $countries[] = $line;
    $gap = false;
  }
  elseif ($line != '' and !$gap) $cities[] = $line;
}


Depending on how regular your file is, it may be this simple in python:

with open('inputfile.txt') as fh:
  # To iterate over the entire file.
  for country in fh:
    cityLines = [next(fh) for _i in range(2)]

    # read a blank line to advance countries.
    next(fh)

That's not likely to be exactly right, because I imagine many countries have variable numbers of cities. You could modify it like so to address that:

with open('inputfile.txt') as fh:
  # To iterate over the entire file.
  for country in fh:
    # we assume here that each country has at least 1 city.
      cities = [next(fh).strip()]

      while cities[-1]: # will continue until we encounter a blank line.
        cities.append(next(fh).strip())

That doesn't do anything to put the data into an output file, or store it much past the file handle itself, but it's a start. You really should choose a language for your questions though. A lot of the time until


Another PHP example that doesn't read the entire file in an array.

<?php

$fh = fopen('countries.txt', 'r');

$countries = array();
$cities = array();

while ( $data = fgets($fh) )
{
  // If $country is empty (or not defined), the this line is a country.
  if ( ! isset($country) )
  {
    $country = trim($data);
    $countries[] = $country;
  }
  // If an empty line is found, unset $country.
  elseif ( ! trim($data) )
    unset($country);
  // City
  else
    $cities[$country][] = trim($data);
}

fclose($fh);

The $countries array will contain a list of countries while the $cities array will contain a list of cities by countries.


Is there some pattern that distinguishes countries from cities? Or is it that the first line after a blank line is a country and all subsequent lines are city names until the next blank line? Alternatively are you finding countries based on a look-up table (a "dictionary" in Python; an associative array in PHP; a hash in Perl --- one that includes all the officially recognized countries)?

Is it safe to assume that there are no cities whose names collide with any country? Is there a France, Iowa, USA, or the old Usa, Japan?

What do you want to do with these after you separate them? You mention "some file handling and then extracting to different files" --- are you thinking of something like one file per country containing a list of all the cities therein? Or one directory per country and one file per city?

The obvious approach would be to iterate over the file, line by line, and maintain a little state machine: empty (beginning of file, blank lines between countries?) during which you enter the "country" state (whenever you found any pattern that matches whatever criteria means you've encountered the name of a country). Once you've found a country name then you're in the city loading state. I would create a dictionary using country names as keys and set of cities as cities (though perhaps you might really need county/provice, city name tuples in cases where a country has multiple cities by the same name: Portland, Maine vs. Portland, Oregon, for example). You can also have some "error" state if the contents of your file lead to some sort of ambiguity (city names before you've determined a country, two country names in a row, whatever).

It's hard to suggest a good fragment of code given how vague your spec. here is.


Not sure that this would help, but you can try to use the following code to get dictionary and then work with it(write to files, compare and etc):

res = {}
with open('c:\\tst.txt') as f:
    lines = f.readlines()
    for i,line in enumerate(lines):
        line = line.strip()
        if (i == 0 and line):
            key = line
            res[key] = []
        elif not line and i+1 < len(lines):
            key = lines[i+1].strip()
            res[key] = []
        elif line and line != key:
            res[key].append(line)
print res


This regex would work for your example:

/(?:^|\r\r)(.+?)\r(.+?)(?=\r\r|$)/s

Catches countries in group 1 and cities in group 2. You may have to adjust your newline characters, depending on your system. They can be \n, \r or \r\n. edit: added a $ sign, so you don't need two linebreaks at the end. You will need the flag for dotall for the regex to work as expected.


Print fild 1 with awk - countries

awk 'BEGIN {RS="";FS="\n"} {print $1 > "countries"} {for (i=2;i<=NF;i++) print $i > "cities"}' source.txt 
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号