URL Dashoboard project (Bonus) : NHTSA
NHTSA is one such department government in the United States focused on reducing the number traffic accident on the highway. Currently NHTSA is brewing a new regulation that will be implemented next year. I was asked to analyze the data collected during 2021. This data is a complete information about accidents that occurred during 2021.
The main purpose is to provide a number of recommendations about how to reduce the number of accidents on the highway. To do this, first I need to identify the following data:
• Top 10 states where the most accidents occur
• The average number of accidents per day by hour accident
• Percentage of accidents caused by drunk drivers
• Percentage of accidents in rural and urban areas
• Number of accidents by day
Data can be access in the following link : Click here
All the data must be checked whetever there is a abnormal data. The queery for data checking and validating :
count(consecutive_number) from crash ;-- to count consecutive number (unique code in every traffic accident)
select count(distinct consecutive_number) from crash ; -- data is equivalent with previous queery
select distinct state_name from crash ; --no abnormal data
select max(number_of_vehicle_forms_submitted_all), min(number_of_vehicle_forms_submitted_all) from crash ; --no abnormal data
select max(number_of_motor_vehicles_in_transport_mvit), min(number_of_motor_vehicles_in_transport_mvit) from crash; --no abnormal data
select max(number_of_parked_working_vehicles), min(number_of_parked_working_vehicles) from crash; --no abnormal data
select max(number_of_forms_submitted_for_persons_not_in_motor_vehicles), min(number_of_forms_submitted_for_persons_not_in_motor_vehicles) from crash --no abnormal data ;
select max(number_of_persons_in_motor_vehicles_in_transport_mvit), min(number_of_persons_in_motor_vehicles_in_transport_mvit) from crash ; --no abnormal data
select max(number_of_persons_not_in_motor_vehicles_in_transport_mvit), min(number_of_persons_not_in_motor_vehicles_in_transport_mvit) from crash ; --no abnormal data
select distinct land_use_name from crash ; --no abnormal data
select distinct functional_system_name from crash ; --no abnormal data
select min(milepoint),max(milepoint) from crash ; --no abnormal data
select distinct manner_of_collision_name from crash ;--no abnormal data
select distinct type_of_intersection_name from crash ;--no abnormal data
select distinct light_condition_name from crash ;--no abnormal data
select distinct atmospheric_conditions_1_name from crash ;--no abnormal data
select distinct number_of_fatalities from crash ; --no abnormal data
select distinct number_of_drunk_drivers from crash ; --no abnormal data
select min(timestamp_of_crash),max(timestamp_of_crash) from crash ; -- found that time is not in local state of USA, so it must to convert to local time
From data validation, found that data time in column timestamp_of_crash isn't displayed in local time, so it must be converted to local time in every states in USA, the step to convert time is shown in this following item:
This table is contain the code of local time is every states in USA, new table can be access in the following link : Click here
output:
Querry for add new coloumn in table crash:
alter table crash
add column timezone_code text,
add column local_time timestamp
output:
Querry for add data in column timezone_code in table crash based on data in table us_timezone:
update crash
set timezone_code =
(select code
from timezone
where crash.state_name = timezone.state_name)
output:
Querry for add data in column local_time in table crash :
update crash
set local_time = timestamp_of_crash at time zone timezone_code
output:
Querry for Remove data before and after 2022 table crash :
delete from crash
where local_time not between '2021-01-01 00:00:00' and '2021-12-31 23:59:59'
The querry for identify the following item :
• Top 10 states where the most accidents occur
Querry
select state_name, count(consecutive_number) as jumlah_kecelakaan
from crash
group by 1
order by 2 desc
limit 10
Output
• The average number of accidents per day by hour accident
Querry
select to_char(local_time,'HH24') as jam,
count(consecutive_number)/(select extract(day from (max(local_time)-min(local_time)))+1 from crash)rata2_kecelakaan
from crash
group by jam
order by rata2_kecelakaan desc
Output
• Percentage of accidents caused by drunk drivers
Querry
select mabuk.kondisi, count(mabuk.kondisi) as jumlah
from
(select consecutive_number,
case
when number_of_drunk_drivers = 0
then 'tidak mabuk'
else 'mabuk'
end kondisi
from crash) as mabuk
group by 1
Output
• Percentage of accidents in rural and urban areas
Querry
select land, sum(count) from
(select land_use_name,
count(consecutive_number),
case
when land_use_name ='Rural'
then 'Rural'
when land_use_name ='Urban'
then 'Urban'
else 'Others'
end land
from crash
group by land_use_name) as x
group by land
Output
• Number of accidents by day
Querry
select hari, jumlah from
(select to_char(local_time,'D') harii, to_char(local_time,'Day') hari, count(consecutive_number) jumlah
from crash
group by hari,harii) as x
order by harii
Output
• Based on data, texas and california is states that have most traffic accident in 2022
• The interval time between 12-15 is the most prone to accident
• 25% percent of traffic accident in usa is because people drunk
• The probability of traffic accident is 25% higher in weekend than weekday
• Based on data, to reduce the number of traffic accident in 2022, We recommend increasing the number of sheriffs on duty during lunch hours, especially on weekends, as this is an accident-prone time in the Texas and California areas.