开发者

Analyse data using time and date objects

开发者 https://www.devze.com 2023-01-11 06:14 出处:网络
I have a rather unique problem I\'m trying to solve: Based on this sample data (actual data is very many records, and at least 4 per card per day):

I have a rather unique problem I'm trying to solve:

Based on this sample data (actual data is very many records, and at least 4 per card per day):

serial, card, rec_date, rec_time, retrieved_on
2976 00040  2010-07-29 18:57 2010-07-31 13:37:31 
2977 00040  2010-07-30 09:58 2010-07-31 13:37:31
2978 00040  2010-07-30 15:33 2010-07-31 13:37:31
2979 00040  2010-07-30 16:13 2010-07-31 13:37:31
2980 00040  2010-07-30 19:41 2010-07-31 13:37:31

The records are from a time-clock system.

What I want to do is take a certain group of entries, filtered by card and rec_date, then determine how long the person has been working during the day and the length of each workspan, how many breaks he/she has taken, and at the end of the week get the total number of hours worked.

From the above list, 2977 is a check in, then 2978 is a check out and so on.

I'm lost at how to do this though, so I thought someone here would开发者_运维技巧 have an idea.

I'm using a simple class to store this data after importing else from elsewhere:

class TimeClock(models.Model):
  serial = models.CharField(max_length = 16)
  card_no = models.CharField(max_length = 10)
  rec_date = models.DateField()
  rec_time = models.TimeField()
  oper_date = models.DateTimeField(default=datetime.today)


Clearly, class TimeClock -- by itself -- is inadequate for what you're doing.

You need to summarize TimeClock to create WorkIntervals, which you can work with. These are pairs of TimeClock rows that show the (theoretical) start and end of a work span.

If someone fails to clock in, you're completely unable to reason out what's going on. It's not "hard", it's impossible.

Also, if someone works past midnight, you're unable to reason out what's going on. It's not "hard", it's impossible.

But, we'll pretend no one works past midnight and no one fails to clock in or out (hahaha)

def make_pairs( tc_query_set ):
    start = None
    for row in tc_query_set:
         if start is None:
             start= row
             continue
         elif start.card == row.card and start.rec_date == row.rec_date:
             yield start, row
             start= None
         else:
             # May as well raise an exception -- the data cannot the processed
             yield start, None 
             start= row

You use this as follows.

data = TimeClock.objects.order_by('card','rec_date','rec_time').all()
for start, end in make_pairs( data ):
    WorkIntervals.objects.create( start.card, start.rec_date, start.rec_time, end.rec_time, ... )

Now you can work with the intervals. If it was possible to create them.


Well, there are a bunch of separate problems here. I assume that you've already got filtered data, so that your log looks like all the events for a unique card on a particular day. Suppose this data is stored as a list of strings in log. Then:

import datetime
def dates( log ):
    ''' Yields consecutive datetimes in the log. '''
    for event in log:
        yield datetime.datetime.strptime( event[ 12 : 28 ], "%Y-%m-%d %H:%M" )

def time_clocked_in( log ):
    assert not len( log ) % 2
    total_time = datetime.timedelta( 0 )
    event_dates = dates( log )
    try:
        while 1:
            total_time -= next( event_dates ) - next( event_dates )
    except StopIteration:
        pass
    return total_time

log = [
    "2977 00040  2010-07-30 09:58 2010-07-31 13:37:31",
    "2978 00040  2010-07-30 15:33 2010-07-31 13:37:31",
    "2979 00040  2010-07-30 16:13 2010-07-31 13:37:31",
    "2980 00040  2010-07-30 19:41 2010-07-31 13:37:31"
]

print( time_clocked_in( log ) )
>>> 9:03:00
0

精彩评论

暂无评论...
验证码 换一张
取 消