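Pipes (df.pipe) turn a DataFrame into the “noun” that flows through small, focused “verb” functions. Calling df.pipe(func, ...) is the same as calling func(df, ...), so each step takes a DataFrame and returns one. Instead of mutating globals across cells, you build a linear, readable pipeline.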
# helper processing functions
def filter_active_users(df):
    # keep rows with more than 3 events
    return df[df["events"] > 3]

def drop_bots(df):
    # na=False treats a missing user agent as "not a bot", so the mask stays boolean
    return df[~df["user_agent"].str.contains("bot", case=False, na=False)]

def sessionize(df, timeout_minutes: int):
    ...
    return df_with_sessions

# pipeline
def clean_events(df):
    return (
        df
        .pipe(filter_active_users)
        .pipe(drop_bots)
        .pipe(sessionize, timeout_minutes=30)
    )
Combined with df.query for simple row filters, this gives concise pipelines in which every step is a plain function that can be unit-tested and reused on its own.
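As a rough sketch of how df.query and df.pipe might sit together in one chain, the example below filters with a query string and then pipes into a named step. The sample data, the user_id column, and the min_events parameter are made up for illustration; sessionize is left out because its body is not shown above.

```python
import pandas as pd


def drop_bots(df: pd.DataFrame) -> pd.DataFrame:
    # na=False: rows with a missing user agent are kept rather than raising an error
    return df[~df["user_agent"].str.contains("bot", case=False, na=False)]


def clean_events(df: pd.DataFrame, min_events: int = 3) -> pd.DataFrame:
    # min_events is a hypothetical parameter; "@" lets query() read a local variable
    return (
        df
        .query("events > @min_events")
        .pipe(drop_bots)
    )


# Illustrative sample data (not from a real dataset)
events = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4],
        "events": [5, 2, 10, 7],
        "user_agent": ["Mozilla/5.0", "Mozilla/5.0", "Googlebot/2.1", None],
    }
)

cleaned = clean_events(events)

# Each step is an ordinary function, so it can be checked in isolation
bots_removed = drop_bots(events)
```

Because neither step modifies its input, events is unchanged after the call, and drop_bots can be tested with a tiny hand-built frame without running the rest of the pipeline.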