26 November 2013

NSA Boundless Informant and GM-Place


This message responds to:

http://cryptome.org/2013/11/nsa-boundless-informant-explicated.htm

From: tom <tom[at]cyber-dyne.com>
Date: Tue, 26 Nov 2013 04:51:37 -0800
Subject: update for Boundless Informant and GM-Place
To: cryptome[at]earthlink.net

Your correspondent above, jbond[at]MI5.mil.gov.uk, is not providing a valid email. It's about as plausible as The Terminator emailing you from a Cyber Dyne corporate account.

MI5 and MI6, despite MI for Military Intelligence, are not part of the UK's Ministry of Defence. MI5 is the Security Service, reporting to the Home Secretary, and is domestic; MI6, the Secret Intelligence Service, reports to the Foreign Secretary and handles foreign intelligence, so it may have employed a J. Bond at one time. His email there would have been jbond[at]sis.gov.uk. This may be an active dropbox today. However, the associated Twitter feed #007 is definitely invalid.

The description of Boundless Informant above is right on target at the granular level -- the rush to encrypt content won't provide effective privacy for email or web until countermeasures are taken on the metadata fields (database column headers) that NSA actually captures.

According to a remarkable unclassified, uncensored FAQ released back in June, Boundless Informant is the name of the reporting software associated with a metadata database called GM-Place (a place where general metadata is stored). Thus it is not a code name at all, in the sense that STORMBREW and FAIRVIEW are protective cover names for Verizon and AT&T (per WaPo), and should be written neither in all caps nor fused into a single word, should NSA adopt a consistent style manual.

http://www.theguardian.com/world/interactive/2013/jun/08/boundless-informant-nsa-full-text

Yes, Boundless Informant produces unattractive reports relative to, say, Excel, yet to do even that required a herculean effort in the realm of Big Data software development. Some 504 SIGAD collecting sites -- including US-984XN PRISM -- are feeding hundreds of billions of records into GM-Place, according to the Hindu Times. It's all non-FISA, supposedly.

Desktop software freezes up long before these file sizes; to work at this scale requires a wholly different approach.

Although developed in-house (or, more likely, as a lucrative defense contract), Boundless Informant is basically a wrapper for pre-existing open source software, primarily HDFS, MapReduce and Accumulo. Each of these cloud computing components has its own incomprehensible Wikipedia article, as does the over-arching, rack-aware Hadoop framework.

NSA did add a key component to Apache Accumulo, namely an element of the key called Column Visibility that stores a security logic string. If the analyst isn't authorized for that level of security, the data in that cell will not be supplied in response to a query. This allows data of varying security requirements to be stored in the same table.
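To make the Column Visibility idea concrete, here is a minimal Python sketch. It is not Accumulo's actual code. It assumes a label is a boolean expression over authorization tokens joined with & and |, that parentheses are required when the two operators are mixed, and that an empty label is readable by anyone. The function names (tokenize, is_visible, scan), the sample labels and the sample cells are all made up for the example.

```python
import re

# A token is either an authorization name or one of the operators & | ( )
TOKEN = re.compile(r"\s*([A-Za-z0-9_.:/-]+|[&|()])")


def tokenize(expr):
    """Split a visibility label such as 'TS&(SI|ECI)' into tokens."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"bad character at position {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expr, auths):
    """Return True if a reader holding `auths` may see a cell labelled `expr`."""
    tokens = tokenize(expr)
    if not tokens:  # assumption: an empty label means no restriction
        return True
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths  # a bare name is true if the reader holds it

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


# Hypothetical cells: (name, visibility label, value)
CELLS = [
    ("record_count", "TS&SI&NOFORN", 100),
    ("special_feed", "TS&SI&ECI", 7),
]


def scan(cells, auths):
    """Return only the cells the reader is cleared for; the rest are silently omitted."""
    return [(name, value) for name, label, value in cells if is_visible(label, auths)]
```

With these toy cells, scan(CELLS, {"TS", "SI", "NOFORN"}) returns only the record_count cell. The ECI-labelled cell sits in the same table but never appears in the result, which is the behavior described above.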

A run-of-the-mill analyst using Boundless Informant would not be able to see or count ECI or COI data feeds (roughly SNAP2 over at GCHQ) because GM-Place is restricted to TS//SI//NOFORN, no FISA. The FAQ above illustrates this in the Organization View section with GAO --> SSO --> RAM-A --> SPINNERET.

Here SPINNERET is the non-corporate intercept SIGAD US-3180 of unknown location, RAM-A is exceptionally controlled information about wiretapping a head of state, and Special Source Operations (which owns GM-Place) sits inside division S33, Global Access Operations of NSA's Signals Intelligence Directorate.

I'm going to say that, like the Obamacare web site, NSA uses NoSQL here so analysts aren't standing around waiting (latency) for the computer to respond to their queries (tasking selectors). Since NoSQL is usually read as 'not only SQL' (SQL being Structured Query Language) rather than 'no SQL', your correspondent is right about old-fashioned boolean logic still being supported.

Boundless Informant thus revolves around the colossal collection scale at NSA, all very similar to Google; indeed Google's BigTable software and ProtoBuf objects are highly intertwined with this stack (Accumulo is modeled on BigTable). All this hassle is just for metadata (a few KB per record), so you can see why NSA doesn't want content (or must whack it down mightily with selectors).
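A back-of-envelope calculation shows why desktop tools are out of the question. The figures below are assumptions picked only to match the rough wording here ("hundreds of billions of records", "a few k per record"); they are not numbers from the FAQ, and the function name is invented for the example.

```python
def store_size_tb(records, bytes_per_record):
    """Raw size in terabytes (10**12 bytes), before replication or indexes."""
    return records * bytes_per_record / 10**12


# Assumed inputs: 300 billion records at 2 KiB each.
ASSUMED_RECORDS = 300_000_000_000
ASSUMED_BYTES_PER_RECORD = 2 * 1024

size_tb = store_size_tb(ASSUMED_RECORDS, ASSUMED_BYTES_PER_RECORD)
print(f"about {size_tb:.1f} TB of raw metadata")
```

Under those assumptions the raw metadata alone runs to hundreds of terabytes, before HDFS keeps multiple copies of every block.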

The Boundless Informant reports that we see are only an after-thought; they would be trivial to generate except for GM-Place being so gigantic and dispersed. Prior to the heat maps of Boundless Informant, management had no overview of the collection enterprise.

NSA is moving the whole cloud to Bluffdale; that's the purpose of that facility (site security, authorization security, reduced latency). Boundless Informant will run a lot faster there, but it only runs once a month.

Boundless Informant is mostly a nothing-burger for management (performance metrics).

However, we haven't seen what it can do interactively: in Map View, an analyst can drill down on how much each SIGAD is collecting in each country and with what collection method. By now it can export the data behind any screen view, namely the PDDGs, SIGAD ids, intercept technology counts and sysid, and can separate survey intercepts from sustained ones. Section 8 of the FAQ is entitled "What is the technical architecture for the tool?" and offers a graphical view that Greenwald unfortunately did not provide.
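The drill-down is, at bottom, a count grouped by country, SIGAD and collection type. The sketch below shows that shape in plain Python, written in map and reduce steps to echo how MapReduce would do it across a cluster. It is an illustration only: the record layout, the field names (sigad, country, kind), the function names and the toy records are all hypothetical, not the GM-Place schema.

```python
from collections import defaultdict

# Toy records with invented identifiers; real metadata records are not public.
RECORDS = [
    {"sigad": "SIGAD-A", "country": "AA", "kind": "DNI"},
    {"sigad": "SIGAD-A", "country": "AA", "kind": "DNI"},
    {"sigad": "SIGAD-B", "country": "AA", "kind": "DNR"},
    {"sigad": "SIGAD-B", "country": "BB", "kind": "DNR"},
]


def map_record(rec):
    """Map step: emit one (key, 1) pair per metadata record."""
    yield (rec["country"], rec["sigad"], rec["kind"]), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each key."""
    totals = defaultdict(int)
    for key, n in pairs:
        totals[key] += n
    return dict(totals)


def drill_down(records, country):
    """Counts per (sigad, kind) for one country, as a Map View click might show."""
    pairs = (pair for rec in records for pair in map_record(rec))
    totals = reduce_counts(pairs)
    return {key[1:]: n for key, n in totals.items() if key[0] == country}


print(drill_down(RECORDS, "AA"))
```

On a laptop this is trivial. The work in the real system lies in running the same grouping over a table spread across many machines.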

It's all about GM-Place really: what's in those metadata fields, what can NSA learn about you from them, how can you block access to them.

Here we know that FALLOUT software acts early on to process DNI ingests (web metadata like URLs visited) deposited in GM-Place, while TUSKATTIRE does the same for DNR ingests (telephony metadata). After MapReduce has added stuff like the collecting organization (e.g. FORNSAT or SSO) and intercept technology type (DRTBOX, WHITEBOX, LOPERS, JUGGERNAUT), FASCIA delivers DNR metadata to Hadoop as lisp ASDF does for DNI.

Again, despite all caps, these are mostly not cover names for secrets but just the names of algorithms or physical intercept devices. How secret is DRTBOX, given that a cell phone metadata interceptor box is made by Digital Receiver Technology (now Boeing)?

There are non-trivial data management issues here around reducing duplication, especially for address book metadata (SCISSORS), around screening by collection legality cover, and around 'minimizing' metadata 'inadvertently' collected on Americans. Nothing's discarded of course, but it might end up in PINWALE, MAINWAY or MARINA instead of GM-Place.