[FEATURE]: Generate sqlite by frequency #120

Open
joseant opened this issue Jan 7, 2025 · 8 comments

joseant commented Jan 7, 2025

Describe the solution you'd like

I have this GTFS feed: https://github.com/emtpalma/Open-Data/raw/master/lineas-paradas-horarios-gtfs.zip

But it does not contain all stop times, only the first ones, because it works with frequencies, defined in the frequencies.txt file. Can you generate the SQLite database using the frequencies?


https://support.google.com/transitpartners/answer/6388474?hl=en

Describe alternatives you've considered

No response

Additional context

No response

Owner

vingerha commented Jan 7, 2025

Hi, just when one thinks one has seen most of GTFS... then this passes by :)
The challenge: the db is created by an external library, pygtfs, and I have no control there. I can raise a PR, but then I also need to know how to handle this, and even then: I raised a PR before (ages ago) and it is still not implemented, even though the owner seemed to be OK with it.
So, I do not want to get any hopes up at this moment, but still... trying to understand this a bit more.
From what I find on the web, with the data from your source:
These are the frequencies
[image]
This trip runs 06:30 > 08:00 every 1800 s, then 08:00 > 19:40 every 1500 s, etc. Importing this into the db is not an issue, but the handling towards stop_times is challenging, i.e. my code and the HA-default gtfs use stop_times, and rewriting this to use frequencies is not easy at all. So, the solution would be that the stop_times have to be extended for each frequency, e.g. copied every 1800 s, then every 1500 s, then every 1800 s (many, many entries will follow, yes).
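
To make the expansion concrete, here is a minimal sketch of turning frequency windows into individual trip start times. The function names and the tuple layout are hypothetical, and it assumes that end_time is exclusive, i.e. no departure is generated exactly at the end of a window. The two windows are the ones quoted above.

```python
def to_seconds(hhmmss: str) -> int:
    """Convert a GTFS time string (HH:MM:SS, hours may exceed 23) to seconds."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s


def to_hhmmss(seconds: int) -> str:
    """Convert seconds since midnight back to HH:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def expand_frequencies(windows):
    """Expand (start_time, end_time, headway_secs) windows into start times.

    Assumption: end_time is exclusive, so a window ends just before it.
    """
    starts = set()
    for start, end, headway in windows:
        t = to_seconds(start)
        end_s = to_seconds(end)
        while t < end_s:
            starts.add(t)
            t += headway
    return [to_hhmmss(t) for t in sorted(starts)]


# The two windows mentioned in the comment above.
windows = [("06:30:00", "08:00:00", 1800), ("08:00:00", "19:40:00", 1500)]
departures = expand_frequencies(windows)
```

Even for these two windows alone this yields 31 start times, which illustrates how quickly stop_times would grow if every start were copied for every stop.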

But, more important... if I look at the stop times, aside from the recurring start times, I miss the stop times between start and end, and I cannot see how these need to be created. Say your stop is #1318... when does the transport arrive?
[image]

Author

joseant commented Jan 7, 2025

Thank you, I think the data source is a bit incomplete. I'm going to ask the provider to add at least the time it takes to get to each stop on the first trip.

Owner

vingerha commented Jan 7, 2025

I just ran pygtfs and the frequencies are already imported (my code search did not show this). With that, I can modify the data too, but I am still not sure what to do with the in-between stops: they must (!) have an arrival/departure time to make it possible to select and show the stop of interest and its timings (e.g. when I schedule trips between 1318 and 925). Otherwise it will just be a line (trip) with start times for stop 6, as I have no clue when it arrives at stop 23.

Author

joseant commented Jan 7, 2025

I think an approximation is possible: if we do not have the intermediate times, we could divide the time between the first stop and the last by the number of stops. It will not be exact, but it is an approximation.
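
A rough sketch of that approximation, assuming stops are evenly spaced in time. The function name is hypothetical and times are seconds since midnight. Note that the total time is divided by the number of gaps between stops (stops minus one), so that the last stop lands exactly on the known arrival time.

```python
def interpolate_stop_times(first_departure: int, last_arrival: int, stop_count: int):
    """Estimate a time for every stop by spreading the trip duration evenly.

    Assumption: equal travel time between consecutive stops, which real
    routes will generally not satisfy; the result is an estimate only.
    """
    if stop_count < 2:
        raise ValueError("need at least a first and a last stop")
    step = (last_arrival - first_departure) / (stop_count - 1)
    return [round(first_departure + i * step) for i in range(stop_count)]


# A trip leaving at 06:30 (23400 s) and arriving at 07:00 (25200 s), 5 stops.
estimated = interpolate_stop_times(23400, 25200, 5)
```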

Owner

vingerha commented Jan 7, 2025

I found two other sources that speak of using frequencies: Montreal, which seems to have abandoned it, and Sydney, same thing, i.e. their GTFS zips do not have a frequencies.txt in them but a very large stop_times.txt. So, for the moment Palma is the only known provider (there may be more, but I will not search for more).
Guessing stop times is dangerous, as one assumes the timings to match a published schedule, and not to be estimates which may be off by quite a lot.
Here is another example where some (!) in-between stop times are known; it is an odd piece of data all in all.
[image]

I will give it a thought to see what the simplest approach would be, likely ignoring in-between times and using only start/end stops plus an estimate... but I just do not have a good feeling about this approach.

Owner

vingerha commented Jan 7, 2025

Did some checks, and my idea to add trips for each interval does not work, as the trip_id needs to be unique, e.g. for realtime data.
What I am now thinking is:

  • add estimated stop times
  • create a separate (!) entity that allows for the frequency to work

It is however a lot of work, not just the development but also the testing and the maintenance afterwards... It is not very motivating for just 1 case, as you may understand.

Author

joseant commented Jan 7, 2025

Ok, don't worry, if you have time you can do it, although as you indicate for just 1 case it may not be worth it.

vingerha self-assigned this Jan 8, 2025
Owner

vingerha commented Jan 9, 2025

For my reference only: approach / options and thoughts related to frequencies, only for this (!) provider (Palma)
stop_times are not complete, and for some trips the first stop time is not equal to the first departure in the frequencies
stop_times needs to be populated with arrival/departure, but this can only be done as an estimate, based on the first departure and last arrival of the trip
using SELECT count(trip_id), max(arrival_time), min(arrival_time), round((julianday(max(arrival_time)) - julianday(min(arrival_time))) * 86400) AS mint FROM stop_times WHERE trip_id = 'L001I01S1FES'
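
For illustration, the query can be tried against a throwaway in-memory database with Python's sqlite3 module. The table layout and the two sample rows below are made up for this sketch (only the trip_id comes from the query above), and it assumes arrival_time is stored as text that julianday() can parse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stop_times table; a real db has more columns.
conn.execute(
    "CREATE TABLE stop_times (trip_id TEXT, stop_sequence INTEGER, arrival_time TEXT)"
)
# Made-up sample rows: only the first and last stop of the trip are known.
conn.executemany(
    "INSERT INTO stop_times VALUES (?, ?, ?)",
    [
        ("L001I01S1FES", 1, "1970-01-01 06:30:00"),
        ("L001I01S1FES", 2, "1970-01-01 07:00:00"),
    ],
)

# Same aggregation as above, with the trip_id passed as a bound parameter.
row = conn.execute(
    """
    SELECT count(trip_id),
           min(arrival_time),
           max(arrival_time),
           round((julianday(max(arrival_time)) - julianday(min(arrival_time))) * 86400)
               AS duration_secs
    FROM stop_times
    WHERE trip_id = ?
    """,
    ("L001I01S1FES",),
).fetchone()
conn.close()

stop_count, first_time, last_time, duration_secs = row
```

The duration in seconds, together with the stop count, is what an estimate for the in-between stops would have to be based on.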

Even if stop_times have their initial values, i.e. based on the first trip departure, the next challenge is to find the next_departures for a specific stop.
Possible approach:

  • use the diff-time between the trip departure and the actual stop departure
  • create a list of start times based on the initial stop departure, iterating over the different frequencies if more than one is available. In that process, add the 'diff-time' as a separate value, as the start time plus the diff-time represents the actual stop departure time
  • from that list, select the x (10?) departure times that are later than 'now'
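
The three steps above could be sketched as follows. This is only an outline under stated assumptions: the function name, the tuple layout, and the 10-minute diff-time are hypothetical, all times are seconds since midnight, and the end of each frequency window is treated as exclusive.

```python
def next_departures(windows, diff_secs, now_secs, limit=10):
    """Return up to `limit` estimated departures at a stop later than now.

    windows:   (start_secs, end_secs, headway_secs) tuples from frequencies
    diff_secs: estimated offset between trip start and the stop departure
    """
    stop_times = []
    for start_s, end_s, headway in windows:
        t = start_s
        while t < end_s:  # assumption: end of window is exclusive
            stop_times.append(t + diff_secs)
            t += headway
    return [t for t in sorted(stop_times) if t > now_secs][:limit]


# 06:30-08:00 every 1800 s, then 08:00-19:40 every 1500 s.
windows = [(23400, 28800, 1800), (28800, 70800, 1500)]

# Stop assumed to be 10 minutes after trip start; 'now' is 07:45 (27900 s).
upcoming = next_departures(windows, diff_secs=600, now_secs=27900, limit=3)
```

With these inputs the next three estimated departures at the stop are 08:10, 08:35 and 09:00, expressed as seconds since midnight.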

Cons:

  • requires a completely new sensor with its own SQL, as the current one uses the stop_times over the whole day, which are not available here (only the initial ones)
  • additional testing & maintenance

Current point of view:

  • as the stop times will be based on an estimate, I am not sure of the added value
  • highly likely no option to use realtime data or vehicle location
  • only one use case (so far)

Parking it for now, or until I get a moment where I have too much time on my hands :)
