I know what your API did last summer

Making an API is a thankless task. Most users aren't aware they are using it, and you rarely hear from the developers unless something isn't working. While our weather site Yr is hugely popular, our API doesn't get much publicity. So when we finally get some exposure, it's sad to find only negative publicity, and misguided criticism at that. But before addressing the issues, let's watch the presentation (starting about 8:44 into the video).

"...a company called met.no, seems to be a weather type service..."

To start with, met.no isn't a company, it's the Norwegian Meteorological Institute – a state-owned, non-profit government organisation, one of whose missions is to provide free (both as in beer and as in freedom) weather data to the whole world. So far we've managed to do this without any developer registration or conditions other than asking nicely that you don't overload our services. As a consequence we have no clue about most of the thousands of apps and sites using our services, and no way of contacting them unless they subscribe to our mailing list or read the API front page.

From the interview one would get the impression that we are making apps ourselves using insecure communications. Such is not the case – the only app we have some influence over is the Yr weather app, which doesn't talk to our API at all. So already the (admittedly valid) criticism is aimed at the wrong party.

"... it was making an HTTP request, totally normal"

Disregarding the fact that "it" in this case is an app we have no control over, we have actually supported (and recommended) HTTPS for many years (although in the old API it wasn't the default, for compatibility reasons). Since the rewrite and launch of WeatherAPI v.3 (which came out in beta in 2016 and was gradually phased in on a per-user basis over the following year) we have only supported HTTPS. All unencrypted HTTP traffic is either redirected or blocked (more on this later).
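
You can verify this for yourself; here's a minimal sketch using Python's requests library (the product path, coordinates and User-Agent are illustrative, not copied from our docs):

import requests

resp = requests.get(
    "http://api.met.no/weatherapi/locationforecast/2.0/compact",  # illustrative path
    params={"lat": 60.10, "lon": 9.58},
    headers={"User-Agent": "acme-weather-widget/1.0 contact@example.com"},
    allow_redirects=False,
)
print(resp.status_code)               # expect 301 Moved Permanently
print(resp.headers.get("Location"))   # the same URL, with https://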

To this day we still get complaints about this, e.g. from users no longer able to run Squid as a local caching proxy, or using client libraries which don't support SSL/TLS. Incredible as this may seem in 2019, it is still an issue, and when your application is an irrigation plant run by a Saia-Burgess PLC controller there is no software upgrade path.

About half of the requests to api.met.no are either throttled, blocked or invalid. Of the latter, most are for products which were removed years ago; we're still getting requests for locationforecast/1.8, which was removed in 2014(!). Most developers don't bother checking the response status code, so lots of sites and apps go on blithely ignoring 304, 404 and 429 responses for years. And since many users don't update their mobile apps until they have to, we're getting a lot of legacy traffic, including unencrypted HTTP which we're unable to prevent people from sending us.
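
For comparison, being a well-behaved client takes only a few lines. A sketch, again with Python's requests library (the error handling is our suggestion; the Retry-After fallback is an assumption, not a documented guarantee):

import time
import requests

def fetch_forecast(url, params, headers, cached=None, last_modified=None):
    # Fetch a forecast and actually inspect the status code.
    if last_modified:
        headers = dict(headers, **{"If-Modified-Since": last_modified})
    resp = requests.get(url, params=params, headers=headers, allow_redirects=False)
    if resp.status_code == 304:
        return cached                        # unchanged: reuse the local copy
    if resp.status_code == 301:
        # Moved (e.g. to HTTPS): fix the base URL instead of paying for
        # an extra round trip on every single request.
        raise RuntimeError("update the base URL to " + resp.headers["Location"])
    if resp.status_code == 429:
        time.sleep(int(resp.headers.get("Retry-After", 60)))  # throttled: back off
        return cached
    if resp.status_code == 404:
        raise RuntimeError("product discontinued; check the documentation")
    resp.raise_for_status()                  # anything else unexpected
    return resp.json()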

As an example, let's look at the HTTP traffic from pre-5.0 Android apps where the developer hasn't read the TOS and uses the default "Dalvik" User-Agent header, which amounts to 13% of the total unencrypted HTTP traffic. Here are the request counts by status code between 12:00 and 13:00 UTC today:

requests  status  protocol
   62054     301  HTTP
   44758     429  HTTPS
   35019     200  HTTPS
    1408     404  HTTPS
     313     400  HTTPS
      19     499  HTTPS
      17     503  HTTPS

As the table shows, unencrypted traffic amounts to a whopping 43% of the total identifying itself only as Android (and an obsolete version at that). How much of this actually handles the redirect is difficult to tell since we don't track users, but if every one of those 301s were followed, the resulting requests would account for 76% of the HTTPS traffic, which we find hard to believe. This means that a large portion of clients never follow the redirect.

Of those who do use HTTPS, an incredible 55% of requests are throttled for too much traffic or missing identification, while only 43% get a meaningful answer. The remaining requests (apart from a handful of timeouts) are either for products which have been discontinued (sunrise/1.1 is still very popular) or fail to construct a valid URL. That's a lot of developers who can't be bothered to RTFM.
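
For those checking the maths, the percentages follow directly from the table above:

# Back-of-the-envelope check of the percentages quoted above.
http_reqs  = 62054                                # 301 redirects over plain HTTP
https_reqs = 44758 + 35019 + 1408 + 313 + 19 + 17
total      = http_reqs + https_reqs

print(round(100 * http_reqs / total))             # 43: unencrypted share of the total
print(round(100 * http_reqs / https_reqs))        # 76: if every redirect were followed
print(round(100 * 44758 / https_reqs))            # 55: throttled (429)
print(round(100 * 35019 / https_reqs))            # 43: meaningful answers (200)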

"... trying to figure out when the sun goes up"

Now, this is where it gets interesting. Most of the HTTP traffic is actually for locationforecast; sunrise doesn't account for more than about 5%. Of this, 5 out of 6 sunrise requests are actually zombie apps calling version 1.1, which was discontinued in February and returns a 404. (Incredibly, we also get some for version 1.0, which was EOL in 2016!)

Still, that leaves about 1% of HTTP traffic for sunrise/2.0, which has never worked over unencrypted HTTP. Every piece of documentation, example code and tests we have published has used HTTPS, so why are people still using HTTP? These requests are (if identified at all) coming from Android, so either there is something amiss with the Dalvik HTTP client stack, or developers are intentionally breaking security by rewriting our examples from "https:" to "http:". As the saying goes, it's hard to make stuff fool-proof because the fools are so smart.
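
For the record, a correct sunrise/2.0 call looks roughly like this (a sketch; the coordinates, date and contact address are illustrative):

import requests

# A sunrise/2.0 request done right: HTTPS, an identifying User-Agent,
# and query parameters as documented (values made up for this example).
resp = requests.get(
    "https://api.met.no/weatherapi/sunrise/2.0/.json",
    params={"lat": 59.91, "lon": 10.75, "date": "2019-08-23", "offset": "+02:00"},
    headers={"User-Agent": "acme-weather-widget/1.0 contact@example.com"},
)
resp.raise_for_status()
print(resp.json())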

"Latitude and longitude, that's harmless, right?"

Many of the requests do indeed contain geocoordinates. However, the vast majority of these originate from servers in data centres far removed from the geolocation in the request. Even when an app enables location services on your phone, we have no way of knowing whether the coordinates are your actual GPS position or whether you're just checking the weather for your next trip to Paris. We're still seeing ~25,000 different IP addresses from all over the world downloading the weather forecast for a cornfield in Poznan, PL every 5 minutes, amounting to 3.15% of the total api.met.no traffic.

"... but you've got the MAC address [...] and lat/lon"

I confess this claim has us stumped; even after putting our collective heads together we still can't figure out what they're getting at. Sure, the MAC address is an identifier which can be linked with personal data, but it is never transmitted to us (what the app developers are doing with your personal data, you must ask them about).

Now, the only third parties with access to both your MAC address and your unencrypted traffic are the owner of your Wifi access point/router and your phone operator. Admittedly, both of these can be spoofed and your traffic intercepted by evil middlemen. However, in both cases they can already pretty much tell your geolocation without snooping on your data traffic. When a phone is indoors and connected to (in this case) an 800 mW fake wifi access point, it is pretty much a given that the person is within a 50 m radius. In fact, this is how shops track customer behaviour, logging how much time each person spends in each aisle. Having the GPS coordinates in addition would probably give better resolution, but then GPS reception isn't great inside buildings anyway.

"So now we can track that person when their phone requests updates"

Actually, no. The API doesn't collect any personal data, apart from the ubiquitous access log. We don't even use cookies or Google Analytics, nor do we write any mobile apps ourselves. The only potentially identifying data we log are the IP address (which for the vast majority is either a website server or a NATed 3G/4G address which changes every time you move into a new cell tower area), the User-Agent string (set by the app) and the URL itself. This log is used for monitoring resources, debugging errors and trying to stem abuse, which is necessary to avoid the tragedy of the commons.
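
To make that concrete, this is roughly the extent of what a single request leaves behind (field names and values are illustrative, not our actual log format):

# The only potentially identifying fields in a log entry (values made up):
log_entry = {
    "ip": "10.0.0.1",                     # usually a web server or a NATed mobile address
    "user_agent": "acme-weather-widget/1.0",                       # set by the app
    "url": "/weatherapi/locationforecast/2.0/compact?lat=60.10&lon=9.58",
    "status": 429,
}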

Sure, the app developers can (and probably will) track you, and the same goes for your phone company or free Wifi provider. But that has nothing to do with HTTP encryption, and very little to do with geolocation (which they know already).

HTTP redirects

Now, you might ask why we didn't go straight for HTTPS in the first place. When api.met.no was launched in 2007, this was not usually an option; Facebook didn't offer HTTPS until 2011, and many client libraries did not support it. While that has since changed, our main priority has always been backwards compatibility, something we're proud of having maintained for 12 years now, even though individual products and versions have come and gone.

As mentioned earlier, all unencrypted HTTP traffic is either blocked or redirected, in the latter case to the equivalent HTTPS URL. Since we mirror the original URL in the Location response header, it can be argued that we still "support HTTP", in the sense that legacy apps go on working as long as they handle the 301 properly. If they instead got a 404, the theory goes, they would see the error of their ways and change their code to use HTTPS.
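
For reference, this is all that "handling the 301 properly" amounts to (a sketch with Python's requests library; most HTTP libraries, requests included, follow redirects automatically):

import requests

# Follow the permanent redirect once, then keep using the HTTPS URL.
url = "http://api.met.no/weatherapi/locationforecast/2.0/compact?lat=60.10&lon=9.58"
headers = {"User-Agent": "acme-weather-widget/1.0 contact@example.com"}

resp = requests.get(url, headers=headers, allow_redirects=False)
if resp.status_code == 301:
    url = resp.headers["Location"]        # the same URL, with https://
    resp = requests.get(url, headers=headers)
print(resp.status_code)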

Unfortunately, in practice developers do no such thing. The sites topping the HTTP list have all been blocked for abuse and get a 403 Forbidden response. They have been blacklisted for years, yet still continue to hammer our servers with 14k requests/hour each, with no end in sight. For these large websites there is no connection between the IP address and the geolocation we receive, so any "leak of personal data" here would be between the browser and the website, if they're not using HTTPS.

If we can conclude that policing obsolete websites is futile, this goes double for obsolete mobile apps, where it doesn't matter what the developer does unless the user updates the app. As the table above shows, a lot of the traffic is not only against services which no longer exist, but also comes from apps more than two years old running on obsolete versions of Android. There is not much hope that these apps will be updated any time soon.

Anyway, as an experiment we tried disabling the redirect to HTTPS to see what would happen. Not only did we get a lot of support tickets from developers who couldn't figure out what was wrong, it also broke the website: whereas you could previously type e.g. "api.met.no/apis.json" into the browser location bar and get JSON back, the browser defaults to unencrypted HTTP, which no longer worked. Instead you had to type "https://" before the hostname. Can you think of any other sites in 2019 where this is necessary? Me neither.

In fact, all the major sites we have tested (e.g. Google, GitHub, PayPal, eBay) redirect HTTP requests to the corresponding HTTPS URLs, which is also what every best-practice document we have found recommends, including HTTP Strict Transport Security (HSTS). Here is an example of GitHub doing exactly the same thing:

$ GET -Se http://api.github.com/users/octocat/orgs
GET http://api.github.com/users/octocat/orgs
301 Moved Permanently
Connection: close
Date: Fri, 23 Aug 2019 11:08:56 GMT
Location: https://api.github.com/users/octocat/orgs
Content-Length: 0
Client-Date: Fri, 23 Aug 2019 11:08:57 GMT
Client-Peer: 140.82.118.6:80
Client-Response-Num: 1

GET https://api.github.com/users/octocat/orgs
200 OK
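
A quick way to check this for any site, sketched with Python's requests library:

import requests

# Check a site's HTTPS story: does plain HTTP redirect, and does the
# HTTPS response advertise an HSTS policy?
resp = requests.get("http://api.github.com/users/octocat/orgs", allow_redirects=False)
print(resp.status_code, resp.headers.get("Location"))    # 301, https://...

resp = requests.get("https://api.github.com/users/octocat/orgs")
print(resp.headers.get("Strict-Transport-Security"))     # the HSTS policy, if any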

So we've rolled back the change, and until we can find a way to differentiate zombies from actual live humans we're keeping it that way.

The bottom line

So, the next time you hear someone boldly stating that

"APIs leak personal data"

you would be wise to interpret that as

"abandoned zombie apps, written by unknown parties for a defunct API, who send unsolicited, not-very-personal data to non-existent services, can be intercepted by hackers who will learn nothing they don't already know, and there is nothing the API owner can do to prevent it".

Admittedly that won't make any headlines at a security conference. Instead, if there's anything to learn from this, it's that most traffic against APIs is actually noise, which you'll just have to learn to deal with. But then, that doesn't make for good soundbites on YouTube.