[ArrowStringArray] API: StringDtype parameterized by storage (python or pyarrow) #39908
@@ -431,23 +431,25 @@ def test_arrow_array(dtype):
 
 
 @td.skip_if_no("pyarrow")
-def test_arrow_roundtrip(dtype):
+def test_arrow_roundtrip(dtype, string_storage2):

why aren't you just using

string_storage parameterizes dtype, so string_storage is always the same as dtype and can't test the StringArray against the pyarrow global storage setting, or vice versa. This way we have 4 tests, not 2.

     # roundtrip possible from arrow 1.0.0
     import pyarrow as pa
 
     data = pd.array(["a", "b", None], dtype=dtype)
     df = pd.DataFrame({"a": data})
     table = pa.table(df)
     assert table.field("a").type == "string"
-    result = table.to_pandas()
-    assert isinstance(result["a"].dtype, type(dtype))
-    tm.assert_frame_equal(result, df)
+    with pd.option_context("string_storage", string_storage2):
+        result = table.to_pandas()
+    assert isinstance(result["a"].dtype, pd.StringDtype)
+    expected = df.astype(f"string[{string_storage2}]")
+    tm.assert_frame_equal(result, expected)
     # ensure the missing value is represented by NA and not np.nan or None
     assert result.loc[2, "a"] is pd.NA
 
 
 @td.skip_if_no("pyarrow")
-def test_arrow_load_from_zero_chunks(dtype):
+def test_arrow_load_from_zero_chunks(dtype, string_storage2):
     # GH-41040
     import pyarrow as pa
 
@@ -457,9 +459,11 @@ def test_arrow_load_from_zero_chunks(dtype):
     assert table.field("a").type == "string"
     # Instantiate the same table with no chunks at all
     table = pa.table([pa.chunked_array([], type=pa.string())], schema=table.schema)
-    result = table.to_pandas()
-    assert isinstance(result["a"].dtype, type(dtype))
-    tm.assert_frame_equal(result, df)
+    with pd.option_context("string_storage", string_storage2):
+        result = table.to_pandas()
+    assert isinstance(result["a"].dtype, pd.StringDtype)
+    expected = df.astype(f"string[{string_storage2}]")
+    tm.assert_frame_equal(result, expected)
 
 
 def test_value_counts_na(dtype):
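
The "4 tests, not 2" point in the review thread on `string_storage2` comes down to counting parametrizations. As a minimal sketch in plain Python (not the pandas test suite itself; the helper names are hypothetical and only the two storage names come from the PR title), compare the cases produced when one parameter drives both the dtype and the global option with those produced when the option varies independently:

```python
from itertools import product

# The two storage backends named in the PR title.
STORAGES = ["python", "pyarrow"]


def shared_parameter_cases():
    """Cases when one parameter drives both the dtype and the global option."""
    return [(f"string[{s}]", s) for s in STORAGES]


def independent_parameter_cases():
    """Cases when the global option varies independently of the dtype."""
    return [(f"string[{d}]", opt) for d, opt in product(STORAGES, repeat=2)]


if __name__ == "__main__":
    # List every (dtype, global option) combination the second scheme covers.
    for dtype_name, option in independent_parameter_cases():
        print(f"dtype={dtype_name:<16} string_storage={option}")
```

The shared scheme never produces a mismatched pair such as a python-backed dtype under the pyarrow global setting, which is the combination the extra parameter is meant to exercise.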

the skips should be encompassed in the fixtures themselves, no? (can change this later)

this test is skipped if pyarrow is not installed, since the test needs pyarrow to create `pa.table(df)`. The fixture also skips the ArrowStringArray tests if the installed version of pyarrow is < 1.0.0, so in that case it would only test the python StringDtype with the python global setting.
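
The skip behaviour described in this reply can be summarised as a small decision function. This is an illustrative sketch only: `runnable_cases` and its tuple-based version argument are hypothetical, and the actual tests rely on the skip decorator and fixtures shown in the diff rather than a helper like this.

```python
from itertools import product

# Version threshold quoted in the review reply.
MIN_PYARROW = (1, 0, 0)


def runnable_cases(pyarrow_version):
    """Return the (dtype storage, global storage) pairs that would run.

    pyarrow_version is None when pyarrow is not installed, otherwise a
    tuple of integers such as (0, 17, 1).
    """
    if pyarrow_version is None:
        # The test builds pa.table(df), so it is skipped entirely.
        return []
    if pyarrow_version < MIN_PYARROW:
        # Arrow-backed cases are skipped; only python/python remains.
        return [("python", "python")]
    # Recent pyarrow: every dtype storage paired with every global setting.
    return list(product(["python", "pyarrow"], repeat=2))
```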