Problem with Here API
Incident Report for Yuso
Postmortem

SUMMARY

On Sunday, November 3rd at 5:26pm CET (UTC+1), all agents using Yuso’s back office started noticing:

  • a slow or unresponsive dashboard page
  • an inability to book any new ride from any platform

Both symptoms had the same cause: the directions API from Here was unresponsive.

After a first period of issues lasting from about 5:26pm to 5:39pm, everything seemed back in order and the Here directions API appeared to be working fine. However, the same issue occurred again at precisely 6:39pm, at which point all users were switched to the Google API (at ~6:43pm).

Metrics

Incident severity level (SLA): S0

Time to detect service interruption:

  • First occurrence: 10 minutes
  • Second occurrence: ~3 minutes

Time to resolution:

  • First occurrence: ~5 minutes (Here started working again after 15 minutes)
  • Second occurrence: ~2 minutes

ROOT CAUSE ANALYSIS

The Here directions API started malfunctioning: https://status.here.com/status

STEPS TO RESOLUTION

Steps taken to diagnose, assess, and resolve:

  • 5:37: Checking the back office pages; everything seemed OK.
  • 5:38: Checking the AWS dashboard and seeing high latencies on one of our services.
  • 5:39: Upscaling our service from 3 to 6 processes. At this point, users were saying everything was back to normal.
  • 5:40-5:55: Analyzing the latencies in our monitoring tools. At this point it became clear that the problem originated from the Here directions API. Preparing a script to switch all offices to Google, and testing the Google switch on a test account.
  • 6:39: Users are having the same issue again.
  • 6:40: Checking our monitoring tools: same metrics as when the problem first occurred.
  • 6:44: Switching all offices to Google.

LEARNINGS & NEXT STEPS

How do we prevent this issue from happening again?

  • Improve alerting when the Here API is unresponsive
  • Implement a fallback to a second directions provider (Google was used during this incident)
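
The two next steps above could be combined along the following lines. This is only a minimal sketch, not Yuso's actual implementation: the class `DirectionsFallback`, its parameters (`failure_threshold`, `cooldown_seconds`, `on_alert`, `clock`) and the provider callables are all hypothetical names chosen for illustration. It assumes each provider is wrapped in a function that raises an exception on timeout or error.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical types: a provider is any callable that takes an origin and a
# destination and returns a route, or raises on timeout / error.
Route = dict
Provider = Callable[[str, str], Route]


@dataclass
class DirectionsFallback:
    """Try providers in order; skip one for a while after repeated failures."""

    providers: Sequence[Tuple[str, Provider]]
    failure_threshold: int = 3        # consecutive failures before skipping
    cooldown_seconds: float = 60.0    # how long a failing provider is skipped
    on_alert: Callable[[str], None] = print      # hypothetical alert hook
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    _failures: dict = field(default_factory=dict)
    _skip_until: dict = field(default_factory=dict)

    def route(self, origin: str, destination: str) -> Tuple[str, Route]:
        """Return (provider_name, route) from the first provider that answers."""
        last_error: Optional[Exception] = None
        for name, provider in self.providers:
            if self.clock() < self._skip_until.get(name, float("-inf")):
                continue  # provider is cooling down after repeated failures
            try:
                result = provider(origin, destination)
            except Exception as exc:  # providers are assumed to raise on failure
                last_error = exc
                self._record_failure(name)
                continue
            self._failures[name] = 0  # a success resets the failure streak
            return name, result
        raise RuntimeError("all directions providers failed") from last_error

    def _record_failure(self, name: str) -> None:
        count = self._failures.get(name, 0) + 1
        self._failures[name] = count
        if count >= self.failure_threshold:
            # Stop calling this provider for a while and notify the team.
            self._skip_until[name] = self.clock() + self.cooldown_seconds
            self._failures[name] = 0
            self.on_alert(f"{name} directions API unresponsive, using fallback")
```

In such a setup, the list of providers would be something like `[("here", call_here), ("google", call_google)]`, where `call_here` and `call_google` are hypothetical wrappers around the real clients with a short request timeout. The cooldown keeps requests from waiting on an unresponsive primary provider on every call, and the alert hook fires as soon as the failure threshold is reached.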
Posted Nov 04, 2019 - 14:37 CET

Resolved
There was an incident with our mapping provider (Here) that impacted many functionalities.
Posted Nov 03, 2019 - 17:30 CET
This incident affected: Back Office (Booking Page, Dashboard, Dispatch System) and Web Booker.