Implement TimeStampXXXTZVector for parquet isAdjustedToUTC #926 #576 #577 #927
@fb64 might interest you ;)
dataframe-arrow/src/main/kotlin/org/jetbrains/kotlinx/dataframe/io/arrowReadingImpl.kt
if (isNull(i)) {
    null
} else {
    DateUtility.getLocalDateTimeFromEpochMilli(TimeUnit.SECONDS.toMillis(getObject(it)), this.timeZone)
On second glance, I'm not so sure about this one.
Parquet does not have seconds precision, and my PR is for Parquet, but pyarrow feather can .floor('S') its datetimes.
I'm not sure what will end up in its .feather file: seconds or milliseconds.
I need to test this against a .feather file that has both seconds precision and timezone awareness, perhaps one from https://github.com/Kotlin/dataframe/tree/master/dataframe-arrow/src/test/resources
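To make the seconds-versus-milliseconds concern concrete, here is a minimal sketch of the conversion in question. It uses only `java.time` rather than Arrow's `DateUtility`, on the assumption that the Arrow helper performs an equivalent epoch-millis-to-zone conversion; `epochToLocalDateTime` is a hypothetical name that appears neither in the PR nor in Arrow.

```kotlin
import java.time.Instant
import java.time.LocalDateTime
import java.time.ZoneId
import java.util.concurrent.TimeUnit

// Hypothetical helper (not part of the PR or of Arrow): interprets a raw epoch
// value in the given unit and renders it as wall-clock time in the given zone.
fun epochToLocalDateTime(value: Long, unit: TimeUnit, timeZone: String): LocalDateTime {
    // Normalise to milliseconds first, mirroring TimeUnit.SECONDS.toMillis(...) in the diff.
    val millis = unit.toMillis(value)
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneId.of(timeZone))
}

fun main() {
    val raw = 1_700_000_000L
    // Read as seconds, the value lands in 2023.
    println(epochToLocalDateTime(raw, TimeUnit.SECONDS, "UTC"))      // 2023-11-14T22:13:20
    // The same number read as milliseconds lands in January 1970.
    println(epochToLocalDateTime(raw, TimeUnit.MILLISECONDS, "UTC")) // 1970-01-20T16:13:20
}
```

If the .feather file actually stores milliseconds while the reader assumes seconds (or the other way round), the result is off by a factor of 1000, which is why the unit needs to be confirmed against a real file.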
Your pull request focuses on Parquet, but it would be beneficial to include the upgrades for all Arrow TimeStamp types as well, because Parquet reading (in my PR) relies entirely on Arrow.
The test testTimeStamp covers both the IPC format and the Feather format (by using ArrowStreamWriter and ArrowFileWriter), so you only need to add TimeStampNanoTZVector, TimeStampMicroVector and TimeStampSecTZVector to the writeArrowTimestamp method and to the expected DataFrame in testTimeStamp, as you did for TimeStampMicroTZVector.
Finally, as your PR improves Arrow type compatibility, it could be merged independently of #577 (IMHO) 😃