GPS_Screener

The GPS_Screener function is a fast and versatile R-based tool for automatically detecting anomalies in GPS data. It combines movement metrics, optional acceleration data (e.g., VeDBA), and isolation forest models to handle a wide range of GPS data complexities, including burst modes and 1 Hz recordings. It also offers interactive visualization for inspecting anomaly detection results.

[This function is in active development. Feedback, testing, and suggestions for improvement are highly encouraged!]

Key Features

Core Functionality

Data Pre-Processing:
- Processes GPS burst modes and standardizes GPS intervals to a consistent sampling frequency, ensuring robust anomaly detection for diverse datasets (e.g., burst modes or 1 Hz recordings).
Anomaly Detection:
- Derives GPS movement metrics by analyzing three consecutive fixes, including outbound and inbound speed, vertex angles, and distance changes between the first and third fixes.
- Combines within-function defined thresholds with unsupervised multi-dimensional isolation forest models.
Acceleration Data Integration (see 'Eobs_Data_Reader' function to process acceleration data from Eobs devices):
- Leverages VeDBA (or equivalent) as an optional additional metric to improve anomaly detection.
- Matches acceleration data to GPS fixes, even with irregular or non-aligned timestamps, and calculates the mean VeDBA for the interval preceding each fix.
- Requires both GPS and acceleration timestamps to be in the same POSIXct format (Y-m-d H:M:S or Y-m-d H:M:OS).
Interactive Visualization:
- Provides dynamic Leaflet maps to explore anomaly detection results, including filtering layers for intuitive data inspection.
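
The acceleration matching described above can be illustrated with a minimal base R sketch. It is not the code used inside GPS_Screener; the helper name mean_vedba_preceding and the toy values are assumptions made for illustration. Each acceleration record is assigned to the GPS fix that closes the interval it falls in, and VeDBA is then averaged per fix.

```r
# Hypothetical helper (not part of GPS_Screener): mean VeDBA in the
# interval (previous fix, current fix] for every GPS fix.
mean_vedba_preceding <- function(GPS_TS, ACC_TS, VeDBA) {
  gps <- as.numeric(GPS_TS)
  acc <- as.numeric(ACC_TS)
  # findInterval() returns i when gps[i] < acc <= gps[i + 1],
  # so i + 1 is the index of the fix that ends the interval
  idx <- findInterval(acc, gps, left.open = TRUE) + 1
  # Drop records before the first fix or after the last fix
  keep <- idx >= 2 & idx <= length(gps)
  out <- rep(NA_real_, length(gps))
  m <- tapply(VeDBA[keep], idx[keep], mean, na.rm = TRUE)
  out[as.integer(names(m))] <- as.numeric(m)
  out
}

# Toy data: GPS fixes every 4 min, acceleration records at unaligned times
t0 <- as.POSIXct("2024-05-04 10:31:00", tz = "UTC")
gps_ts <- t0 + c(0, 240, 480)
acc_ts <- t0 + c(60, 120, 180, 300, 420, 500)
vedba  <- c(0.1, 0.2, 0.3, 1.0, 2.0, 9.0)

res <- mean_vedba_preceding(gps_ts, acc_ts, vedba)
```

The first fix has no preceding interval and therefore stays NA, and the record after the last fix is ignored.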

Additional Features

Adaptive handling of missing or irregular data.
Post-modelling refinement using biologically meaningful thresholds to improve accuracy.
Optional integration of GPS height as an additional parameter for anomaly detection and timestamp subsetting.
Detailed outputs, including processed data and diagnostic visuals, for comprehensive analysis.
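
As a rough illustration of combining an isolation forest with a biologically meaningful threshold, the sketch below uses the isotree package on simulated movement metrics. How GPS_Screener actually combines the two is not documented here, so the quantile cut-off on the anomaly scores and the speed rule are assumptions, as are the simulated values.

```r
library(isotree)

set.seed(42)
n <- 500
# Simulated metrics with column names borrowed from the function's output
metrics <- data.frame(
  Incoming_speed = rexp(n, rate = 2),
  Outgoing_speed = rexp(n, rate = 2),
  Ang_vertex     = runif(n, 0, 180),
  Dist_circular  = rexp(n, rate = 0.05)
)
# Inject one out-and-back spike: fast in, fast out, sharp turn, small net change
metrics[n, ] <- list(40, 40, 179, 1)

model <- isolation.forest(metrics, ntrees = 100, sample_size = 256,
                          ndim = 1, seed = 1, nthreads = 1)
scores <- predict(model, metrics)  # higher score = more isolated

IF_conf   <- 0.99
max_speed <- 5
# Assumption: flag scores above the IF_conf quantile
flag_if <- scores > quantile(scores, IF_conf)
# Assumption: a simple post-model rule based on max_speed
flag_rule <- metrics$Incoming_speed > max_speed & metrics$Outgoing_speed > max_speed

Verdict_IF <- ifelse(flag_if | flag_rule, "Anomalous", "Not Anomalous")
table(Verdict_IF)
```

A rule like flag_rule guarantees that physically implausible fixes are flagged even when the model score falls just below the cut-off.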

Input Parameters

Required Inputs

GPS_TS: POSIXct vector of GPS timestamps (must be in Y-m-d H:M:S format). Missing values (NA) are not allowed.
GPS_longitude: Numeric vector of GPS longitude values (decimal degrees). May include NAs or zeros.
GPS_latitude: Numeric vector of GPS latitude values (decimal degrees). May include NAs or zeros.
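
Before calling the function, it can help to check the required inputs against the rules listed above. The following is a minimal sketch; check_gps_inputs is a hypothetical helper, not part of GPS_Screener, and the equal-length check is an assumption.

```r
# Hypothetical pre-flight check for the three required inputs
check_gps_inputs <- function(GPS_TS, GPS_longitude, GPS_latitude) {
  stopifnot(
    inherits(GPS_TS, "POSIXct"),   # timestamps must be POSIXct
    !anyNA(GPS_TS),                # NA timestamps are not allowed
    is.numeric(GPS_longitude),
    is.numeric(GPS_latitude),
    length(GPS_TS) == length(GPS_longitude),
    length(GPS_TS) == length(GPS_latitude)
  )
  # NA or zero coordinates are permitted, but worth counting
  bad_fix <- is.na(GPS_longitude) | is.na(GPS_latitude) |
    GPS_longitude == 0 | GPS_latitude == 0
  list(
    n_fixes           = length(GPS_TS),
    n_missing_or_zero = sum(bad_fix, na.rm = TRUE),
    sorted            = all(diff(as.numeric(GPS_TS)) >= 0),
    duplicated_ts     = anyDuplicated(GPS_TS) > 0
  )
}

ts <- as.POSIXct(c("2024-05-04 10:31:00", "2024-05-04 10:35:00", "2024-05-04 10:39:00"),
                 format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
chk <- check_gps_inputs(ts, c(8.95, NA, 0), c(47.68, 47.69, 0))
str(chk)
```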

Optional Inputs

ACC_TS: POSIXct vector of acceleration timestamps. If supplied, it must match the format of GPS_TS (Y-m-d H:M:S or Y-m-d H:M:OS) and cannot contain NAs (default = NULL).
VeDBA: Numeric vector of VeDBA (or equivalent activity metric) values matching the length of ACC_TS (default = NULL).
drop_out: Numeric value (in seconds) defining the maximum temporal gap for identifying large drop-outs in GPS data (default = 3600). This value does not currently influence the anomaly detection itself; the output only includes a corresponding column for reference.
burst_method: Character string specifying the method for processing GPS bursts. Options include "median", "mean", "last", or "none" (default = "last").
burst_len: Numeric value (seconds) for defining the typical maximum burst durations (default = 1).
standardise_time_interval: Numeric value (seconds) for standardizing GPS fix intervals to a consistent time step (default = NULL).
standardise_universal: Logical value (TRUE/FALSE) indicating whether 'standardise_time_interval' is applied across all GPS data or restricted to continuous 1 Hz periods (default = FALSE).
IF_conf: Numeric confidence level for isolation forest anomaly detection (default = 0.99).
iso_sample_size: Numeric value for the sample size used in the isolation forest model (default = 256).
GPS_height: Numeric vector of GPS height values matching the length of GPS data (default = NULL). This is used as an additional parameter within the anomaly detection.
start_timestamp: POSIXct value for subsetting data, defining the start of the time range (optional, but must be supplied together with end_timestamp).
end_timestamp: POSIXct value for subsetting data, defining the end of the time range (optional, but must be supplied together with start_timestamp).
max_speed: Numeric value defining the maximum biologically plausible speed (in m/s) for the species of interest (default = 5).
GPS_accuracy: Numeric value estimating the GPS error radius (in meters) under stationary conditions (default = 25). A value of 25 m is recommended based on empirical data for forested habitats. Values below 10 m are not recommended.
plot: Logical value (TRUE/FALSE) indicating whether to generate diagnostic plots (default = TRUE).
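
To make the burst_method and burst_len options more concrete, here is a minimal base R sketch of collapsing each GPS burst to a single fix. It is not the function's own implementation: collapse_bursts is a hypothetical helper, and the rule that a new burst starts whenever the gap to the previous fix exceeds burst_len is an assumption.

```r
# Hypothetical helper: reduce each burst to one fix
collapse_bursts <- function(ts, lon, lat, burst_len = 1, burst_method = "last") {
  if (burst_method == "none") {
    return(data.frame(timestamp = ts, lon = lon, lat = lat))
  }
  # Assumption: a gap longer than burst_len (seconds) starts a new burst
  gap <- c(Inf, diff(as.numeric(ts)))
  burst_id <- cumsum(gap > burst_len)
  summarise_fun <- switch(burst_method,
    last   = function(x) x[length(x)],
    median = function(x) median(x, na.rm = TRUE),
    mean   = function(x) mean(x, na.rm = TRUE),
    stop("burst_method must be 'median', 'mean', 'last' or 'none'")
  )
  # The timestamp of the last fix in each burst is kept for every method
  last_idx <- as.integer(tapply(seq_along(ts), burst_id, max))
  data.frame(
    timestamp = ts[last_idx],
    lon = as.numeric(tapply(lon, burst_id, summarise_fun)),
    lat = as.numeric(tapply(lat, burst_id, summarise_fun))
  )
}

# Toy data: two 3-fix bursts at 1 Hz and one single fix, 4 min apart
t0  <- as.POSIXct("2024-05-04 10:31:00", tz = "UTC")
ts  <- t0 + c(0, 1, 2, 240, 241, 242, 480)
lon <- c(1, 2, 3, 10, 20, 30, 5)
lat <- c(1, 1, 1, 2, 2, 2, 3)

out_last   <- collapse_bursts(ts, lon, lat, burst_len = 10, burst_method = "last")
out_median <- collapse_bursts(ts, lon, lat, burst_len = 10, burst_method = "median")
```

With "last", the final fix of each burst is kept, which is often the most accurate one because the receiver has had the longest time to settle.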

Outputs

Core Output

A data frame containing:

Observation: A sequential integer vector that uniquely identifies each row in the dataset. Used as a reference index (relative to initial supplied data) for processing and merging data.
Timestamp: Timestamp of the GPS fix.
Time_diff: Time difference (in seconds) between consecutive GPS fixes.
GPS_longitude & GPS_latitude: Raw GPS coordinates (in decimal degrees).
Fix_number: Relative order of each retained, processed GPS fix within its burst or continuous 1 Hz data collection session (unless standardise_time_interval was used).
Window_group: Grouping variable that increments each time the cumulative time exceeds the drop_out threshold.
orig_burst_length: The original length (in seconds) of each burst or continuous GPS session prior to processing.
GPS_longitude_filtered & GPS_latitude_filtered: GPS coordinates (in decimal degrees), corresponding to the processed values.
Time_diff_filtered: Time interval (in seconds) between filtered GPS fixes, replicated across all rows associated with each fix group.
Ang_vertex: The turning angle (in degrees) at each fix, calculated using three consecutive GPS points. Values range from 0° to 180°, where higher angles indicate sharper turns.
Outgoing_speed: Movement speeds (in meters per second) calculated between the current GPS fix and the following one.
Incoming_speed: Movement speeds (in meters per second) calculated between the current GPS fix and the preceding one.
Dist_circular: The straight-line distance (in meters) between the GPS fix preceding the current one and the fix following the current one. Useful for identifying circular or looping movements.
GPS_height (optional): Vertical altitude values (in meters) associated with each GPS fix. This field is included only if the user provides height data as an input.
mean_VeDBA_interval (optional): The mean VeDBA (or equivalent supplied metric) over the time interval leading up to each GPS fix. This field is included only if acceleration data is provided.
Verdict_IF: A categorical variable indicating whether a GPS fix is classified as "Anomalous" or "Not Anomalous" based on isolation forest analysis and custom rules.
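
The three-fix metrics listed above (Incoming_speed, Outgoing_speed, Ang_vertex, Dist_circular) can be sketched in base R as follows. This is an illustration under stated assumptions, not the function's own code: distances use the haversine formula on a spherical Earth, Ang_vertex is taken as the absolute change in heading between the incoming and outgoing steps, and the helper names are hypothetical.

```r
# Great-circle distance in meters (spherical Earth, radius 6371 km)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Initial bearing from point 1 to point 2, in degrees [0, 360)
bearing_deg <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  dlon <- (lon2 - lon1) * to_rad
  y <- sin(dlon) * cos(lat2 * to_rad)
  x <- cos(lat1 * to_rad) * sin(lat2 * to_rad) -
    sin(lat1 * to_rad) * cos(lat2 * to_rad) * cos(dlon)
  (atan2(y, x) / to_rad) %% 360
}

# Hypothetical helper: metrics from each triplet of consecutive fixes
three_fix_metrics <- function(ts, lon, lat) {
  n <- length(ts)
  t <- as.numeric(ts)
  prev <- c(NA, seq_len(n - 1))      # index of the preceding fix
  nxt  <- c(seq_len(n - 1) + 1, NA)  # index of the following fix
  turn <- abs(bearing_deg(lon, lat, lon[nxt], lat[nxt]) -
                bearing_deg(lon[prev], lat[prev], lon, lat))
  turn <- ifelse(turn > 180, 360 - turn, turn)  # fold into 0-180 degrees
  data.frame(
    Incoming_speed = haversine_m(lon[prev], lat[prev], lon, lat) / (t - t[prev]),
    Outgoing_speed = haversine_m(lon, lat, lon[nxt], lat[nxt]) / (t[nxt] - t),
    Ang_vertex     = turn,
    Dist_circular  = haversine_m(lon[prev], lat[prev], lon[nxt], lat[nxt])
  )
}

# Toy out-and-back spike: north by 0.001 degrees, then back to the start
ts  <- as.POSIXct("2024-05-04 10:31:00", tz = "UTC") + c(0, 60, 120)
met <- three_fix_metrics(ts, lon = c(8.95, 8.95, 8.95), lat = c(47.000, 47.001, 47.000))
met
```

In the toy track, the middle fix shows the typical signature of a positional spike: similar incoming and outgoing speeds, a turn of about 180 degrees, and almost no distance between the first and third fixes.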

Optional Outputs (if `plot = TRUE`)

Interactive Map:
- Dynamic leaflet map with color-coded anomaly results.
- Toggleable layers for unfiltered, filtered, and "Not Anomalous" tracks.
- Scale bar for distance estimation.
Summary Plots:
- Histograms showing distributions of key metrics, annotated with quantile thresholds.
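
For readers who want to build a comparable map themselves, the sketch below shows one way to get color-coded fixes, toggleable layers, and a scale bar with the leaflet package. It is not the map produced by GPS_Screener; the toy data frame, layer names, and colors are assumptions.

```r
library(leaflet)

# Toy results with the column names used in the core output
toy <- data.frame(
  GPS_longitude = c(8.950, 8.951, 9.000, 8.952, 8.953),
  GPS_latitude  = c(47.680, 47.681, 47.720, 47.682, 47.683),
  Verdict_IF    = c("Not Anomalous", "Not Anomalous", "Anomalous",
                    "Not Anomalous", "Not Anomalous")
)
ok  <- subset(toy, Verdict_IF == "Not Anomalous")
pal <- colorFactor(palette = c("red", "blue"),
                   domain = c("Anomalous", "Not Anomalous"))

m <- leaflet(toy) %>%
  addTiles() %>%
  # All fixes, colored by verdict
  addCircleMarkers(lng = ~GPS_longitude, lat = ~GPS_latitude,
                   color = ~pal(Verdict_IF), radius = 4, group = "All fixes") %>%
  # Track through the fixes that were not flagged
  addPolylines(data = ok, lng = ~GPS_longitude, lat = ~GPS_latitude,
               color = "blue", group = "Not Anomalous track") %>%
  addLayersControl(overlayGroups = c("All fixes", "Not Anomalous track"),
                   options = layersControlOptions(collapsed = FALSE)) %>%
  addScaleBar(position = "bottomleft")

# Print m in an interactive session to display the map
```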

Example Workflow

Step 1: Load Your Data

library(dplyr)
# GPS data
df <- read.csv("C:/Users/richard/xxxxxx/Gandalf.csv")
# Convert to POSIXct
df$timestamp <- as.POSIXct(df$timestamp, format = "%Y-%m-%d %H:%M:%S")
head(df$timestamp, 1); tail(df$timestamp, 1)
# Remove duplicated timestamps
df <- df[!duplicated(df$timestamp), ]
# Remove rows with NA timestamps
df <- subset(df, !is.na(timestamp))
# Ensure GPS data are sorted by time
df <- df %>% arrange(timestamp)

# Load ACC VeDBA data (use the Eobs_Data_Reader function to process and extract acceleration data from e-obs devices)
setwd("C:/Users/richard/xxxxxx/Eobs data [All studies]/Invisible networks/Processed data")
library(fst) # Fast and easy way to serialize data frames when writing and reading data
df.acc <- read_fst("Gandalf.processed.acc.fst")
# Convert to POSIXct (%OS keeps fractional seconds)
df.acc$interpolated_timestamp <- as.POSIXct(df.acc$interpolated_timestamp, format = "%Y-%m-%d %H:%M:%OS")
head(df.acc$interpolated_timestamp, 1); tail(df.acc$interpolated_timestamp, 1)
# Filter out potentially duplicated timestamps and short ACC bursts
df.acc <- subset(df.acc, duplicate_times == FALSE & standardized_burst_duration >= 5)
# Remove rows with NA timestamps
df.acc <- subset(df.acc, !is.na(interpolated_timestamp))

# Group by standardized_burst_id and compute mean VeDBA and median interpolated_timestamp to obtain a single mean value per burst
df.acc_summary <- df.acc %>%
  group_by(standardized_burst_id) %>% # Mean VeDBA value per burst ID (here, about 10 s bursts every 1 min)
  summarise(
    mean_VeDBA = mean(VeDBA, na.rm = TRUE), # Mean VeDBA
    timestamp = median(interpolated_timestamp, na.rm = TRUE) # Median interpolated timestamp of the burst
  ) %>% arrange(timestamp) %>% # Ensure timestamps are in correct chronological order
  ungroup()

Step 2: Run the Function

results <- GPS_Screener(GPS_TS = df$timestamp,
                 GPS_longitude = df$location.long, 
                 GPS_latitude = df$location.lat, 
                 ACC_TS = df.acc_summary$timestamp,
                 VeDBA = df.acc_summary$mean_VeDBA,
                 drop_out = 1800, 
                 burst_method = "last",
                 standardise_time_interval = 240, #4 min intervals
                 standardise_universal = FALSE,
                 burst_len = 10, #bursts were 10 s
                 IF_conf = 0.99, #99 percentile for anomaly detection
                 iso_sample_size = 500,
                 GPS_height = df$height.above.ellipsoid,
                 start_timestamp = as.POSIXct("2024-05-04 10:31:00"), # POSIXct, as documented under Optional Inputs
                 end_timestamp = as.POSIXct("2024-05-31 19:59:11"),
                 max_speed = 5,
                 GPS_accuracy = 25,
                 plot = TRUE)
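
Once the function has returned, the verdicts can be summarised and the flagged fixes dropped with a few lines of base R. The sketch below uses a small mock data frame in place of a real results object so that it runs on its own; the column names follow the Core Output section, and the values are invented for illustration.

```r
# Mock stand-in for the data frame returned by GPS_Screener
results <- data.frame(
  Observation   = 1:6,
  Timestamp     = as.POSIXct("2024-05-04 10:31:00", tz = "UTC") +
                    seq(0, by = 240, length.out = 6),
  GPS_longitude = c(8.950, 8.951, 9.200, 8.952, 8.953, 8.954),
  GPS_latitude  = c(47.680, 47.681, 47.900, 47.682, 47.683, 47.684),
  Verdict_IF    = c("Not Anomalous", "Not Anomalous", "Anomalous",
                    "Not Anomalous", "Not Anomalous", "Not Anomalous")
)

# How many fixes were flagged?
verdict_counts <- table(results$Verdict_IF)
prop_flagged   <- mean(results$Verdict_IF == "Anomalous")

# Keep only the fixes that passed screening
clean_track <- subset(results, Verdict_IF == "Not Anomalous")
```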

Required R Packages

The function installs and uses the following R packages:

zoo
dplyr
tidyr
data.table
ggplot2
leaflet
assertthat
isotree

Limitations

Input timestamps must consistently use the POSIXct format.
Results are heavily influenced by the quality and consistency of the input data.
A "distance from median" rolling time function, previously implemented in C++ and designed to calculate median fixes within a dynamically adjusted temporal window, was removed. This metric proved ineffective for datasets with temporally sparse fix intervals (e.g., >1 minute), especially for highly dynamic animal movement patterns. For users with high-resolution datasets, it may still be a useful metric; contact the author for more information.

License

This project is licensed under the MIT License.

Contact

For questions, bug reports, suggestions, or contributions, please contact:

Richard Gunner
Email: [email protected]
GitHub: Richard6195
