Opening this issue to zoom in on a specific problem I found while investigating SIRI and GTFS matching. It happens with a small operator in Jerusalem, and we should look deeper into it.
Example:
Line 234 of operator ירושלים-דרום איחוד, rides on 8/5 (2023-05-08)
If we search for matches to the planned rides on this day, we only get 7, even though SIRI alone has 39 rows for this day
The siri.scheduled_start_time rows that didn't match are close, but not equal, to gtfs.start_time. They also have a non-zero milliseconds part in the time, which differs from the pattern we've seen when matching works (the same holds for the rides that did match on this day)
-- the rides of this line that will match with 5 min tolerance
select siri_ride.id, siri_route.operator_ref, gtfs_route.agency_name, siri_route.line_ref,
       gtfs_route.route_short_name, gtfs_route.route_long_name,
       gtfs_ride.start_time = scheduled_start_time,
       gtfs_ride.start_time, siri_ride.scheduled_start_time, gtfs_route.date
from gtfs_ride, gtfs_route, siri_route, siri_ride
where gtfs_route.id = gtfs_ride.gtfs_route_id
and gtfs_route.operator_ref = siri_route.operator_ref
and gtfs_route.line_ref = siri_route.line_ref
and siri_route.id = siri_ride.siri_route_id
and gtfs_route.date between '2023-05-04' and '2023-05-10'
-- and siri_ride.scheduled_start_time = gtfs_ride.start_time
and siri_ride.scheduled_start_time > gtfs_ride.start_time - '5 minutes'::interval
and siri_ride.scheduled_start_time < gtfs_ride.start_time + '5 minutes'::interval
and scheduled_time_gtfs_ride_id is null
and DATE_TRUNC('hour', scheduled_start_time) = '2023-05-08 01:00:00.000000'
and route_short_name = '234'

-- the matched to gtfs
select gtfs_route.id, gr.start_time, scheduled_start_time, scheduled_time_gtfs_ride_id, sr.id
from gtfs_route
join gtfs_ride gr on gtfs_route.id = gr.gtfs_route_id
left join siri_ride sr on gr.id = sr.scheduled_time_gtfs_ride_id
where date = '2023-05-08'
and line_ref = 15136
and operator_ref = 50

-- all the siri rides from this day
select scheduled_start_time
from siri_ride
join siri_route sr on siri_ride.siri_route_id = sr.id
where DATE_TRUNC('day', scheduled_start_time) = '2023-05-08'
and line_ref = 15136
and operator_ref = 50
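To see how far off the unmatched rides actually are, a variant of the tolerance query can report the offset in seconds between each unmatched SIRI ride and every GTFS ride within 5 minutes of it. This is only a sketch: it reuses the table and column names from the queries above, and the `offset_seconds` alias is made up for illustration. It assumes `scheduled_start_time` and `start_time` are directly comparable timestamps.

```sql
-- Sketch: offset between unmatched SIRI rides and nearby GTFS rides
-- for line_ref 15136 / operator_ref 50 on GTFS date 2023-05-08.
SELECT siri_ride.id,
       siri_ride.scheduled_start_time,
       gtfs_ride.start_time,
       -- positive = SIRI time is later than the planned GTFS time
       EXTRACT(EPOCH FROM siri_ride.scheduled_start_time - gtfs_ride.start_time) AS offset_seconds
FROM siri_ride
JOIN siri_route ON siri_route.id = siri_ride.siri_route_id
JOIN gtfs_route ON gtfs_route.operator_ref = siri_route.operator_ref
               AND gtfs_route.line_ref = siri_route.line_ref
               AND gtfs_route.date = '2023-05-08'  -- pin a single GTFS date
JOIN gtfs_ride ON gtfs_ride.gtfs_route_id = gtfs_route.id
WHERE siri_route.line_ref = 15136
  AND siri_route.operator_ref = 50
  AND siri_ride.scheduled_time_gtfs_ride_id IS NULL
  AND siri_ride.scheduled_start_time > gtfs_ride.start_time - '5 minutes'::interval
  AND siri_ride.scheduled_start_time < gtfs_ride.start_time + '5 minutes'::interval
ORDER BY siri_ride.scheduled_start_time;
```

If the offsets cluster around a constant value, that would point to a systematic shift rather than random noise in the SIRI times.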
Questions:
In which other operators does this specific issue happen?
How common is it, as a percentage, for those operators?
Does this happen in specific hours? Is there any pattern (e.g. unplanned rides)?
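One possible starting point for these questions is to count, per SIRI operator and hour, how many rides have a non-zero sub-second part in `scheduled_start_time`, and how many of those are unmatched. This is a sketch under assumptions: the date window is just the week used in the example above, the output aliases are invented for illustration, and the hour is extracted in whatever time zone the session uses.

```sql
-- Sketch: share of SIRI rides with a non-zero sub-second part,
-- grouped by operator and hour of day.
SELECT sr.operator_ref,
       EXTRACT(HOUR FROM siri_ride.scheduled_start_time) AS hour_of_day,
       COUNT(*) AS total_rides,
       COUNT(*) FILTER (
           WHERE siri_ride.scheduled_start_time
                 <> DATE_TRUNC('second', siri_ride.scheduled_start_time)
       ) AS nonzero_subsecond_count,
       COUNT(*) FILTER (
           WHERE siri_ride.scheduled_start_time
                 <> DATE_TRUNC('second', siri_ride.scheduled_start_time)
             AND siri_ride.scheduled_time_gtfs_ride_id IS NULL
       ) AS nonzero_subsecond_unmatched_count,
       COUNT(*) FILTER (
           WHERE siri_ride.scheduled_start_time
                 <> DATE_TRUNC('second', siri_ride.scheduled_start_time)
       ) * 100.0 / COUNT(*) AS nonzero_subsecond_percentage
FROM siri_ride
JOIN siri_route sr ON siri_ride.siri_route_id = sr.id
WHERE siri_ride.scheduled_start_time >= '2023-05-04'
  AND siri_ride.scheduled_start_time <  '2023-05-11'
GROUP BY sr.operator_ref, EXTRACT(HOUR FROM siri_ride.scheduled_start_time)
ORDER BY nonzero_subsecond_percentage DESC;
```

Because it only touches `siri_ride` and `siri_route`, this query does not join GTFS tables at all, so it cannot multiply rows per GTFS date.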
Looking at the percentage of misses per operator, the top ones are all small operators from Jerusalem, which made me think this pattern is specific to them (to be verified, of course):
-- misses per operator
SELECT gr.agency_name,
       COUNT(*) FILTER (WHERE scheduled_time_gtfs_ride_id IS NULL) AS new_null_count,
       COUNT(*) FILTER (WHERE scheduled_time_gtfs_ride_id IS NOT NULL) AS new_not_null_count,
       COUNT(*) FILTER (WHERE scheduled_time_gtfs_ride_id IS NULL) * 100.0 / COUNT(*) AS new_null_percentage
FROM siri_ride
join siri_route sr on siri_ride.siri_route_id = sr.id
join gtfs_route gr on sr.line_ref = gr.line_ref
WHERE DATE_TRUNC('day', scheduled_start_time) > '2023-04-29'
and gr.date > '2023-04-29'
GROUP BY gr.agency_name
@ShayAdler when querying gtfs data you must limit by gtfs date, otherwise you are getting duplicated records per gtfs_date
You can see this by doing this query:
SELECT siri_ride.id, gr.date
FROM siri_ride
join siri_route sr on siri_ride.siri_route_id = sr.id
join gtfs_route gr on sr.line_ref = gr.line_ref
WHERE DATE_TRUNC('day', scheduled_start_time) > '2023-04-29'
and gr.date > '2023-04-29'
you will see that each siri_ride.id is duplicated by how many gr.dates there are
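One way the per-operator misses query could be adjusted to avoid that duplication is to pin each SIRI ride to the GTFS route row of its own date. This is a sketch, not a verified fix: it assumes that casting `scheduled_start_time` to a date gives the same calendar day that `gtfs_route.date` uses (time zone differences could shift rides near midnight), and it also joins on `operator_ref`, as the first query in this issue does.

```sql
-- Sketch: misses per operator, with each SIRI ride joined only to the
-- GTFS route row for the ride's own date.
SELECT gr.agency_name,
       COUNT(DISTINCT siri_ride.id) FILTER (
           WHERE siri_ride.scheduled_time_gtfs_ride_id IS NULL
       ) AS null_count,
       COUNT(DISTINCT siri_ride.id) FILTER (
           WHERE siri_ride.scheduled_time_gtfs_ride_id IS NOT NULL
       ) AS not_null_count,
       COUNT(DISTINCT siri_ride.id) FILTER (
           WHERE siri_ride.scheduled_time_gtfs_ride_id IS NULL
       ) * 100.0 / COUNT(DISTINCT siri_ride.id) AS null_percentage
FROM siri_ride
JOIN siri_route sr ON siri_ride.siri_route_id = sr.id
JOIN gtfs_route gr
  ON sr.line_ref = gr.line_ref
 AND sr.operator_ref = gr.operator_ref
 -- assumption: the cast yields the same calendar day as gtfs_route.date
 AND gr.date = siri_ride.scheduled_start_time::date
WHERE siri_ride.scheduled_start_time >= '2023-04-30'
GROUP BY gr.agency_name
ORDER BY null_percentage DESC;
```

`COUNT(DISTINCT siri_ride.id)` is used as a safeguard so that a ride is counted once per operator even if more than one `gtfs_route` row happens to match it.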