With an external grouper, there is no way to access the grouped value in a DataFrame(...).groupby(...).apply(...) workflow #9545

@brianthelion

groupby-apply workflows are important pandas idioms. Here's a brief example grouping on a named DataFrame column:

```python
>>> df = pd.DataFrame({'key': [1, 1, 1, 2, 2, 2, 3, 3, 3], 'value': range(9)})
>>> result = df.groupby('key').apply(lambda x: x['key'])
>>> result
key   
1    0    1
     1    1
     2    1
2    3    2
     4    2
     5    2
3    6    3
     7    3
     8    3
Name: key, dtype: int64
```

A key point of this example is that the applied function can reference the grouped value, e.g., x['key'].

pandas also supports grouping on arbitrary mapping functions, iterables, and many other objects. In these cases, the grouped value is not represented as a named column in the DataFrame. Thus, when using apply(...), there is no apparent way to access the group key value. The only alternative is a (slow) explicit for-loop, as in:

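```python
import pandas as pd

foo = lambda _k, _g: ...  # per-group function of (key, group)
grouped = df.groupby(grouper)  # grouper: a mapping function, iterable, etc.
result_list = [foo(key, group) for key, group in grouped]  # Python-level loop
key_list = [key for key, group in grouped]
pd.DataFrame(result_list, index=key_list)
```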
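A concrete, runnable instance of the for-loop workaround may help pin down the problem. This is a minimal sketch under stated assumptions: the grouper is an illustrative function of the index label (`label // 3`), and `apply_with_key` and `summarize` are hypothetical names, not pandas API. Because the grouper is external, each group frame contains only the `value` column; the key reaches the per-group function solely through iteration.

```python
import pandas as pd

df = pd.DataFrame({'value': range(9)})


def grouper(label):
    # Illustrative external grouper: a function of the index label,
    # so the group key is NOT stored in any column of df.
    return label // 3


def apply_with_key(grouped, func):
    """Hypothetical helper: call func(key, group) for every group and
    collect the per-group results into a DataFrame indexed by key."""
    keys, rows = [], []
    for key, group in grouped:  # plain Python loop, one call per group
        keys.append(key)
        rows.append(func(key, group))
    return pd.DataFrame(rows, index=pd.Index(keys, name='key'))


def summarize(key, group):
    # Uses the group key together with the group's data.
    return {'key_times_sum': key * group['value'].sum(), 'n': len(group)}


result = apply_with_key(df.groupby(grouper), summarize)
print(result)
```

The helper only repackages the loop; it still makes one Python-level call per group, so it does not address the performance concern, only the bookkeeping of pairing keys with results.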

IMHO, the ability to access the grouped value in an idiomatic way from within the applied function is ergonomically important; the groupby-apply idiom is at best partially realized without it.