Answer: Very

Background: The MTA sent letters to some NYC employers asking them to let their employees work flexible hours to avoid the "Summer of Hell." http://nypost.com/2017/06/12/this-might-be-the-mtas-dumbest-solution-for-summer-commuters/

The idea is good, but the execution lacked hard evidence of when, where, and why an employer should allow flexible schedules. I needed to brush up on my Six Sigma training and found a dataset that would be good practice and would also show how data could help riders at least AVOID the peak of the hellish summer.

What you see below is the:

- AVERAGE # of riders (card swipes)
- in 6-minute increments for a specific subway station (Manhattan-bound)
- from a historical year (within the past 5 years)
- during the AM rush hour.
- The data points did not include holidays or blizzards that might have caused outliers.
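
For anyone who wants to build the same kind of view, here is a minimal Python sketch of the bucketing step: group card swipes into 6-minute bins and average the count per bin across days. The timestamps are made up for illustration (they are not MTA data), and the names `bin_start` and `average_by_bin` are my own placeholders, not anything from the original analysis.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical swipe timestamps, invented for illustration only.
swipes = [
    "2016-03-01 08:42:10",
    "2016-03-01 08:44:55",
    "2016-03-01 08:49:30",
    "2016-03-02 08:43:05",
    "2016-03-02 08:47:59",
    "2016-03-02 08:48:00",
]

BIN_MINUTES = 6


def bin_start(ts):
    """Return the 'HH:MM' label of the 6-minute bin that contains ts."""
    minutes = ts.hour * 60 + ts.minute
    start = minutes - minutes % BIN_MINUTES
    return f"{start // 60:02d}:{start % 60:02d}"


def average_by_bin(timestamps):
    """Average number of swipes per bin across the days present in the data."""
    totals = defaultdict(int)  # bin label -> swipes summed over all days
    days = set()               # distinct dates seen
    for raw in timestamps:
        ts = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
        days.add(ts.date())
        totals[bin_start(ts)] += 1
    return {label: total / len(days) for label, total in sorted(totals.items())}


averages = average_by_bin(swipes)
for label, avg in averages.items():
    print(label, avg)
```

Bins are anchored at midnight, so one of them runs from 8:42 through 8:47, matching the interval discussed in this post. Excluding weekends, holidays, and blizzard days would be a filter on the dates before counting.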
So on average, this station sees its highest number of riders entering between 8:42AM and 8:47AM. Now the next question should be: "What is the distribution of riders for this specific period of time over the year being analyzed?" Is that specific 6-minute interval normal as well? Here it is:

So over 240 data points for the year are recorded above. Remember, we are not including weekends and holidays, which is why there are not 365 data points. We can see that 360 riders was the most common count for this 6-minute interval, occurring 17 times. Overall, I would say the data is pretty normal; most data points cluster around the mean of about 355.
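
One rough way to back up an eyeball judgment like "pretty normal" is the 68-95 rule: in normal data, about 68% of points sit within one standard deviation of the mean and about 95% within two. The sketch below shows the check in Python. The daily counts here are a synthetic stand-in generated from a normal curve with an assumed mean of 355 and an assumed standard deviation of 39, so they only demonstrate the mechanics; the real counts would go in their place.

```python
from statistics import NormalDist, mean, stdev


def share_within(values, k):
    """Fraction of values within k sample standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return sum(abs(v - m) <= k * s for v in values) / len(values)


# Synthetic stand-in for the ~240 daily counts (NOT the real data):
# evenly spaced quantiles of a normal curve with assumed mean and SD.
assumed = NormalDist(mu=355, sigma=39)
daily_counts = [assumed.inv_cdf((i + 0.5) / 240) for i in range(240)]

within_1sd = share_within(daily_counts, 1)
within_2sd = share_within(daily_counts, 2)
print(f"within 1 SD: {within_1sd:.0%}, within 2 SD: {within_2sd:.0%}")
```

If the real data gave shares far from 68% and 95%, that would be a hint to look harder at skew or outliers before leaning on normal-based statistics.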
So what can we now do with this? Well, thanks to the normal data, we can use some basic statistics to figure out Variability and Confidence Intervals:

- What is the variability of the # of people swiping through at 8:47AM?
  __Answer: 80% of the time (within 1.28 standard deviations of the mean), the count will fall between 304 and 404 riders.__ (Pure coincidence that it was an even 50 riders from the mean.) With 10% in each tail, this means 9 out of 10 times you should expect a MINIMUM of 304 riders.
- Referring back to the data above, if you came at 8:11AM or earlier, you would almost never (9 out of 10 times) see 304 riders swipe through, yet at 8:47AM that would be considered one of the lowest turnouts.
- How confident am I that the mean of 354 is the true average for this time period? Answer: 95% confident that the average ridership at 8:47AM is between 349 and 359 (Mean +/- t-value * Standard Error).
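
The two calculations above can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the method actually used for this post: the standard deviation is not quoted here, so it is backed out from the 50-rider half-width; the sample size is taken as exactly 240; and a normal z-value stands in for the t-value, which is very close at this sample size. One thing worth noticing is that plus or minus 1.28 standard deviations brackets about 80% of a normal curve, with 10% in each tail, and the lower tail is where the 9-out-of-10 minimum of 304 comes from.

```python
import math
from statistics import NormalDist

mean_riders = 354   # mean quoted in the post
half_width = 50     # 404 - 354, from the quoted range
n = 240             # assumption: the post says "over 240" data points

# z-value that leaves 10% in the upper tail (about 1.28).
z_90 = NormalDist().inv_cdf(0.90)

# Assumption: standard deviation inferred from the range, about 39 riders.
sd = half_width / z_90

low = mean_riders - z_90 * sd
high = mean_riders + z_90 * sd

dist = NormalDist(mean_riders, sd)
share_between = dist.cdf(high) - dist.cdf(low)   # two-sided coverage
share_above_low = 1 - dist.cdf(low)              # one-sided "at least" coverage

# 95% confidence interval for the mean: mean +/- critical value * standard error.
# Assumption: z is used as a close approximation of the t-value for n = 240.
standard_error = sd / math.sqrt(n)
z_95 = NormalDist().inv_cdf(0.975)
ci_low = mean_riders - z_95 * standard_error
ci_high = mean_riders + z_95 * standard_error

print(f"range: {low:.0f} to {high:.0f}")
print(f"share between: {share_between:.0%}, share above low end: {share_above_low:.0%}")
print(f"95% CI for the mean: {ci_low:.0f} to {ci_high:.0f}")
```

Under these assumptions the interval for the mean rounds to the same 349 to 359 quoted above, which suggests the inferred standard deviation is in the right neighborhood.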
In the next post, I will document the DAX formulas that helped me accomplish this.