开发者

Tricky add consecutive date row based on category in Pandas

开发者 https://www.devze.com 2022-12-07 20:34 出处:网络
I would like to add a new consecutive date row at the end of a new category based on a column. Ex. based on the Area column.

I would like to add a new consecutive date row at the end of a new category based on a column. Ex. based on the Area column.

Data

Start       End         Area    ID  Stat
2/1/2022    3/1/2022    NY      222 Y
3/1/2022    4/1/2022    NY      111 Y
2/1/2022    3/1/2022    CA      333 Y
3/1/2022    4/1/2022    CA      100 Y
       开发者_如何学Python         
            

Desired

Start       End         Area    ID  Stat
2/1/2022    3/1/2022    NY      222 Y
3/1/2022    4/1/2022    NY      111 Y
4/1/2022    5/1/2022    NY      
2/1/2022    3/1/2022    CA      333 Y
3/1/2022    4/1/2022    CA      100 Y
4/1/2022    5/1/2022    CA  

Doing

I am thinking I can use the append method:

df1 = df.append(new_row, ignore_index=True)

Any suggestion is appreciated. I am currently researching this.


Annotated code

# Convert the cols to datetime
c = ['Start', 'End']
df[c] = df[c].apply(pd.to_datetime, dayfirst=True)

# drop the duplicates rows by Area while keeping only the last row
rows = df[[*c, 'Area']].drop_duplicates('Area', keep='last')

# Add a dateoffset of 1 day
rows[c] += pd.DateOffset(days=1)

# Concat the rows and sort index to maintain order
pd.concat([df, rows]).sort_index(ignore_index=True)
Result
       Start        End Area     ID Stat
0 2022-01-02 2022-01-03   NY  222.0    Y
1 2022-01-03 2022-01-04   NY  111.0    Y
2 2022-01-04 2022-01-05   NY    NaN  NaN
3 2022-01-02 2022-01-03   CA  333.0    Y
4 2022-01-03 2022-01-04   CA  100.0    Y
5 2022-01-04 2022-01-05   CA    NaN  NaN
0

精彩评论

暂无评论...
验证码 换一张
取 消