Inconsistent Parsing of Midnight-Time Component in lubridate::ymd_hms()? #1124

Open
SESjo opened this issue Jun 6, 2023 · 5 comments

@SESjo
SESjo commented Jun 6, 2023

Dear maintainers,

I'm not sure this is a proper issue, since it happens when passing a POSIXct object (I know that's not what the function is for), but the way ymd_hms() handles the 00:00:00 time seems inconsistent:

lubridate::ymd_hms(as.POSIXct("2018-11-21"))
#> [1] NA
#> Warning message:
#> All formats failed to parse. No formats found.
lubridate::ymd_hms(as.POSIXct("2018-11-21 00:00:00"))
#> [1] NA
#> Warning message:
#> All formats failed to parse. No formats found.
lubridate::ymd_hms(as.POSIXct("2018-11-21 00:00:01"))
#> [1] "2018-11-21 00:00:01 UTC"
lubridate::ymd_hms("2018-11-21 00:00:01")
#> [1] "2018-11-21 00:00:01 UTC"
lubridate::ymd_hms("2018-11-21 00:00:00")
#> [1] "2018-11-21 UTC"
lubridate::ymd_hms("2018-11-21")
#> [1] NA

The workaround I found is to convert the POSIXct object to a character string, with the time made explicit:

lubridate::ymd_hms(format(as.POSIXct("2018-11-21 00:00:00"), format = "%Y-%m-%d %T %Z"))
#> [1] "2018-11-21 UTC"

Session info

R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base    

loaded via a namespace (and not attached):
 [1] tidyr_1.3.0         rgeos_0.6-3         utf8_1.2.3          generics_0.1.3      renv_0.17.3        
 [6] class_7.3-22        xml2_1.3.4          KernSmooth_2.23-21  track2KBA_1.0.5     lattice_0.21-8    
[11] magrittr_2.0.3      grid_4.3.0          iterators_1.0.14    fastmap_1.1.1       maps_3.4.1        
[16] foreach_1.5.2       e1071_1.7-13        DBI_1.1.3           httr_1.4.6          rgdal_1.6-7        
[21] purrr_1.0.1         fansi_1.0.4         scales_1.2.1        move_4.1.12         codetools_0.2-18  
[26] CircStats_0.2-6     ade4_1.7-22         cli_3.6.1           rlang_1.1.1         units_0.8-2        
[31] munsell_0.5.0       yaml_2.3.7          cachem_1.0.8        tools_4.3.0         raster_3.6-20      
[36] parallel_4.3.0      geosphere_1.5-18    memoise_2.0.1       dplyr_1.1.2         colorspace_2.1-0  
[41] ggplot2_3.4.2       boot_1.3-28.1       adehabitatHR_0.4.21 vctrs_0.6.2         R6_2.5.1          
[46] proxy_0.4-27        lifecycle_1.0.3     classInt_0.4-9      adehabitatMA_0.3.16 MASS_7.3-60        
[51] adehabitatLT_0.3.27 pkgconfig_2.0.3     terra_1.7-29        pillar_1.9.0        gtable_0.3.3      
[56] glue_1.6.2          Rcpp_1.0.10         sf_1.0-13           tibble_3.2.1        tidyselect_1.2.0  
[61] rstudioapi_0.14     Matching_4.10-8     compiler_4.3.0      sp_1.6-1   
@cjl8zf
cjl8zf commented Jul 19, 2023

I have run into this issue too and came here to ask the same question. Because the HH:MM:SS component is dropped for midnight timestamps, ymd_hms() fails to be idempotent: ymd_hms(ymd_hms(x)) == ymd_hms(x) does not hold for every x. This would be a nice property to have, and it does hold at all non-midnight times.

However, I have tracked down the source of this issue and it turns out it comes from a call to the base R function .POSIXct here:

.POSIXct(parse_dt(x, fmt, TRUE, FALSE, cutoff_2000), "UTC")

For example:

> x <- 1640995200
> base::.POSIXct(x,tz="UTC")
[1] "2022-01-01 UTC"
> base::.POSIXct(x+1,tz="UTC")
[1] "2022-01-01 00:00:01 UTC"

I think this demonstrates that lubridate is consistent with base R. This is simply how base R prints POSIXct values at midnight, so the issue is upstream of lubridate, although I personally wish ymd_hms("2022-01-01 00:00:00 UTC") displayed as "2022-01-01 00:00:00".
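
A minimal base R sketch of that point, assuming nothing beyond base R: the stored number of seconds is intact at midnight, and only the default formatting omits the time of day. The value of `x` is the same epoch value used in the example above.

```R
# Midnight UTC on 2022-01-01, built from seconds since the epoch.
x <- .POSIXct(1640995200, tz = "UTC")

# The underlying value is still the full number of seconds.
as.numeric(x) == 1640995200
# [1] TRUE

# Default formatting drops an all-zero time of day...
format(x)
# [1] "2022-01-01"

# ...but an explicit format string keeps it.
format(x, "%Y-%m-%d %H:%M:%S")
# [1] "2022-01-01 00:00:00"
```

So the information is not lost in the object itself; it is lost only when the default character representation is used as input to a parser that requires a time component.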

@milesalanmoore
milesalanmoore commented May 10, 2024

I just wanted to ping this issue to say that this behavior persists (perhaps unsurprisingly since it is a result of base R's treatment of this class).

@nfaux
nfaux commented Aug 21, 2024

Just pinging this again, as it can cause issues downstream, particularly if a user happens to call ymd_hms() on an object that is already a dttm, e.g.:

library(lubridate)
dt_tm <- "2024-01-01 00:00:00"
dt_tm_ymd_hms <- ymd_hms(dt_tm)
print(dt_tm_ymd_hms)
#> [1] "2024-01-01 UTC"
ymd_hms(dt_tm_ymd_hms)
#> [1] NA
#> Warning message:
#> All formats failed to parse. No formats found.

This leads to data loss, which means one has to be very careful when munging date data.
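
One possible guard against the double-parse problem, as a rough sketch only: `parse_dttm_once()` is a hypothetical helper name, not part of lubridate, and it assumes the input is either character or already POSIXct.

```R
# Hypothetical helper (not a lubridate function): parse character input with
# ymd_hms(), but hand back existing POSIXct values untouched.
parse_dttm_once <- function(x, ...) {
  if (inherits(x, "POSIXct")) {
    return(x)
  }
  lubridate::ymd_hms(x, ...)
}

dt_tm <- parse_dttm_once("2024-01-01 00:00:00")

# A second call is now a no-op instead of returning NA.
identical(parse_dttm_once(dt_tm), dt_tm)
# [1] TRUE
```

This only sidesteps the re-parsing step; it does not change how midnight values are printed.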

@elfatherbrown

Yup. Got hit by this one too. Maybe a warning in the docs could help? One would expect ymd_hms("1980-01-01 00:00:00") to yield a date-time object by default, but what it prints looks like a date instead. This is a common use case: for example, if you write a test with testthat, your fixtures will be hardcoded date-times, and it's natural for the developer to reach for a plain 00:00:00 in a test.

My unrequested 0.02.

Best wishes.

@milesalanmoore

I am not sure why the original poster is passing POSIXct objects to lubridate::ymd_hms() as ymd_hms() itself returns POSIXct values.

However, format() is the way to go for users who want to convert to character without dropping the time component. To my understanding, POSIXct objects have a storage format and a display format inherent to their class. So the hh:mm:ss data is still associated with the midnight timestamp, even if it gets dropped on display or on conversion to character:

```R
"2024-01-01 00:00:00" |> lubridate::ymd_hms() |> as.character()
# [1] "2024-01-01"

"2024-01-01 00:00:00" |> as.POSIXct(tz='UTC') |> as.character()
# [1] "2024-01-01"
```

In my experience, the time component of the timestamp is also dropped when exporting a data frame with `write.csv()`, which is where I first ran into this (when publishing data to [EDI](https://edirepository.org/)). Using `format()` to convert to character before export is probably the best workaround.

```R
"2024-01-01 00:00:00" |> lubridate::ymd_hms() |> format("%Y-%m-%d %H:%M:%S")
# [1] "2024-01-01 00:00:00"
```
