python: proper usage of global variable

开发者 https://www.devze.com 2023-01-09 00:50 Source: Internet

Related topics: csv python

Here's the code:

import csv

def do_work():
      global data
      global b
      get_file()
      samples_subset1()
      return

def get_file():

      start_file='thefile.csv'

      with open(start_file, 'rb') as f:
        data = list(csv.reader(f))
        import collections
        counter = collections.defaultdict(int)

      for row in data:
        counter[row[10]] += 1
      return

def samples_subset1():

      with open('/pythonwork/samples_subset1.csv', 'wb') as outfile:
          writer = csv.writer(outfile)
          sample_cutoff=5000
          b_counter=0
          global b
          b=[]
          for row in data:
              if counter[row[10]] >= sample_cutoff:
                 global b
                 b.append(row) 
                 writer.writerow(row)
                 #print b[b_counter]
                 b_counter+=1
      return

I am a beginner at Python. The way my code runs is that I call do_work, and do_work calls the other functions. Here are my questions:

If I need data to be seen by only 2 functions, should I make it global? If not, how should I call samples_subset1? Should I call it from get_file or from do_work?
The code works, but can you please point out other good/bad things about the way it is written?
I am processing a csv file and there are multiple steps. I am breaking the steps down into different functions like get_file and samples_subset1, and there are more that I will add. Should I keep doing it the way I am doing it right now, where I call each individual function from do_work?

Here is the new code, based on one of the answers below:

import csv
import collections

def do_work():
      global b
      (data,counter)=get_file('thefile.csv')
      samples_subset1(data, counter,'/pythonwork/samples_subset1.csv')
      return

def get_file(start_file):

      with open(start_file, 'rb') as f:
        global data
        data = list(csv.reader(f))
        counter = collections.defaultdict(int)

      for row in data:
        counter[row[10]] += 1
      return (data,counter)

def samples_subset1(data,counter,output_file):

      with open(output_file, 'wb') as outfile:
          writer = csv.writer(outfile)
          sample_cutoff=5000
          b_counter=0
          global b
          b=[]
          for row in data:
              if counter[row[10]] >= sample_cutoff:
                 global b
                 b.append(row) 
                 writer.writerow(row)
                 #print b[b_counter]
                 b_counter+=1
      return

As a rule of thumb, avoid global variables.

Here, it's easy: let get_file return data, then you can say

data = get_file()
samples_subset1(data)

Also, I'd put all the imports at the top of the file.
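
A slightly fuller sketch of the return-value approach, applied to both functions from the question. It assumes Python 3, so the files are opened in text mode with newline='' rather than 'rb'/'wb' as in the question's Python 2 code. The column and sample_cutoff parameters and the use of collections.Counter are illustrative choices of mine, not something the original code has.

```python
import collections
import csv


def get_file(start_file, column=10):
    """Read the CSV and count how often each value appears in `column`."""
    # Assumption: Python 3, where csv files are opened in text mode with newline=''.
    with open(start_file, newline='') as f:
        data = list(csv.reader(f))
    counter = collections.Counter(row[column] for row in data)
    return data, counter


def samples_subset1(data, counter, output_file, column=10, sample_cutoff=5000):
    """Write the rows whose `column` value occurs at least `sample_cutoff` times."""
    subset = [row for row in data if counter[row[column]] >= sample_cutoff]
    with open(output_file, 'w', newline='') as outfile:
        csv.writer(outfile).writerows(subset)
    return subset  # returned to the caller instead of being stored in a global `b`


def do_work(start_file, output_file):
    # Each step receives what it needs as arguments; nothing is global.
    data, counter = get_file(start_file)
    return samples_subset1(data, counter, output_file)
```

With this shape, do_work stays the single place that wires the steps together, and each new step can be added as another function that takes its inputs as arguments.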

If you must use a global (and sometimes we must), you can define it in a Pythonic way and give only certain modules access to it, without the nasty global keyword at the top of all of your functions/classes.

Create a new module containing only global data (in your case let's say csvGlobals.py):

# create an instance of some data you want to share across modules
data=[]

and then each file you want to have access to this data can do so in this fashion:

import csvGlobals

csvGlobals.data = [1,2,3,4]
for i in csvGlobals.data:
    print(i)

If you want to share data between two or more functions, it is generally better to use a class: turn the functions into methods and the global variable into an attribute on the class instance.
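
A minimal sketch of that idea, assuming Python 3. The class name CsvSampler and its method names are hypothetical, chosen only for illustration; the column index 10 and the cutoff of 5000 are taken from the question.

```python
import collections
import csv


class CsvSampler:
    """Holds the rows and counts that used to be globals (hypothetical name)."""

    def __init__(self, column=10, sample_cutoff=5000):
        self.column = column
        self.sample_cutoff = sample_cutoff
        self.data = []
        self.counter = collections.Counter()

    def load(self, start_file):
        # Assumption: Python 3 text mode; the question's Python 2 code used 'rb'.
        with open(start_file, newline='') as f:
            self.data = list(csv.reader(f))
        self.counter = collections.Counter(row[self.column] for row in self.data)

    def frequent_rows(self):
        # Rows whose value in `column` occurs at least `sample_cutoff` times.
        return [row for row in self.data
                if self.counter[row[self.column]] >= self.sample_cutoff]

    def write_subset(self, output_file):
        rows = self.frequent_rows()
        with open(output_file, 'w', newline='') as outfile:
            csv.writer(outfile).writerows(rows)
        return rows
```

The shared state lives on the instance, so a caller would create one CsvSampler, call load('thefile.csv') and then write_subset('/pythonwork/samples_subset1.csv'), and no function needs a global statement.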

BTW, you do not need a return statement at the end of every function. You only need an explicit return if you want to return a value or to exit in the middle of the function.
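
A small illustration of that point; the function names log_row and first_long_row are made up for the example:

```python
def log_row(row, seen):
    # No return statement: the function returns None when it reaches the end.
    seen.append(row)


def first_long_row(rows, min_len):
    for row in rows:
        if len(row) >= min_len:
            return row  # explicit return: exits early and hands back a value
    # Falling off the end here also returns None.


seen = []
result = log_row(['a', 'b'], seen)
print(result)  # None
```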