26 November 2013
NSA Boundless Informant and GM-Place
This message responds to:
From: tom <tom[at]cyber-dyne.com>
Date: Tue, 26 Nov 2013 04:51:37 -0800
Subject: update for Boundless Informant and GM-Place
Your correspondent above, jbond[at]MI5.mil.gov.uk, is not providing a valid
email address. It's about as plausible as The Terminator emailing you from a
Cyberdyne corporate account.
MI5 and MI6, despite MI for Military Intelligence, are not part of the UK's
Ministry of Defence. MI5 is the Security Service, reporting to the Home
Secretary, and is domestic, whereas MI6 is foreign, reports to the Foreign
Secretary, and may have employed a J. Bond at one time.
His email there would have been jbond[at]sis.gov.uk (as MI6 is now called
the Secret Intelligence Service). This may be an active dropbox today. However,
the associated Twitter feed #007 is definitely invalid.
The description of Boundless Informant above is right on target at the granular
level -- the rush to encrypt content won't provide effective privacy for
email or web until countermeasures are taken on the metadata fields (database
column headers) that NSA actually captures.
According to a remarkable unclassified, uncensored FAQ released back in June,
Boundless Informant is the name of the reporting software associated with a
metadata database called GM-Place (a place where general metadata is stored).
Thus it is not a code name at all, in the sense STORMBREW and FAIRVIEW are
protective cover names for Verizon and AT&T (per WaPo), and should not
be written as all caps nor fused into a single word, should NSA adopt a
consistent style manual.
Yes, Boundless Informant produces unattractive reports relative to, say, Excel,
yet to do even that required a herculean effort in the realm of Big Data
software development. Some 504 SIGAD collecting sites -- including US-984XN
PRISM -- are feeding hundreds of billions of records into GM-Place, according
to the Hindu Times. It's all non-FISA supposedly.
Desktop software freezes up long before these file sizes; to work at this
scale requires a wholly different approach.
Although developed in-house (more likely as a lucrative defense contract),
Boundless Informant is basically a wrapper for pre-existing open source software,
primarily HDFS, MapReduce and Accumulo. Each of these cloud computing components
has its own incomprehensible Wikipedia article, as does the over-arching
rack-aware Hadoop framework.
NSA did add a key component to Apache Accumulo, namely a field of the key
called Column Visibility that stores a security logic string. If the analyst
isn't authorized for that level of security, the data in that cell will not
be supplied in response to a query. This allows data of varying security
requirements to be stored in the same table.
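To make the Column Visibility idea concrete, here is a minimal sketch in
Python of how a security logic string could be checked against an analyst's
authorizations. It is not Accumulo's code or API. The function names, the
sample table and the labels (TS, SI, ECI, COI) are assumptions for
illustration, and the grammar is simplified to labels combined with &, | and
parentheses.

```python
import re

def tokenize(expr):
    # Labels are runs of letters, digits, '_', '-', '.', ':' or '/';
    # operators are & | ( ). This is a simplified, assumed grammar.
    tokens = re.findall(r"[A-Za-z0-9_\-.:/]+|[&|()]", expr)
    if "".join(tokens) != expr.replace(" ", ""):
        raise ValueError(f"unexpected character in {expr!r}")
    return tokens

def visible(expr, auths):
    """Return True if the set of authorizations satisfies the expression."""
    if expr.strip() == "":
        return True  # assumption: an empty label is visible to everyone
    tokens = tokenize(expr)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in auths  # a bare label is true if the analyst holds it

    def parse_expr():
        nonlocal pos
        value = parse_term()
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is None:
                op = tokens[pos]
            elif tokens[pos] != op:
                raise ValueError("mixing & and | needs parentheses")
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result

# Hypothetical table: (row id, visibility string, value).
table = [
    ("row1", "TS&SI", "record A"),
    ("row2", "TS&SI&(ECI|COI)", "record B"),
    ("row3", "", "record C"),
]

def scan(table, auths):
    # Cells the analyst is not authorized for are silently omitted.
    return [value for _, vis, value in table if visible(vis, auths)]
```

With these made-up rows, scan(table, {"TS", "SI"}) returns records A and C
only, while adding "ECI" to the set also returns record B. The point is that
the filtering happens per cell at query time, which is what lets mixed
security levels share one table.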
A run-of-the-mill analyst using Boundless Informant would not be able to
see or count ECI or COI data feeds (roughly STRAP2 over at GCHQ) because GM-Place
is restricted to TS//SI//NOFORN, no FISA. The FAQ above illustrates this
in the Organization View section with GAO --> SSO --> RAM-A -->
Here SPINNERET is the non-corporate intercept SIGAD US-3180 of unknown location,
RAM-A is exceptionally controlled information about wiretapping a head of
state, and Special Source Operations (which owns GM-Place) sits inside division
S33, Global Access Operations of NSA's Signals Intelligence Directorate.
I'm going to say, like the Obamacare web site, NSA uses NoSQL here so analysts
aren't standing around (latency), waiting for the computer to respond to
their queries (tasking selectors). Since NoSQL is usually read as 'not only
SQL' (Structured Query Language) rather than 'no SQL', your correspondent is
right about old-fashioned boolean logic still being supported.
Boundless Informant thus revolves around the colossal collection scale at
NSA, all very similar to Google; indeed Google's BigTable design and ProtoBuf
objects are highly intertwined with NSA's stack. All this hassle is just for
metadata (a few kilobytes per record), so you can see why NSA doesn't want
content (or must whack it down mightily with selectors).
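A back-of-envelope calculation shows why even metadata at this scale rules
out desktop tools. The figures below are assumptions chosen only to match the
loose wording above: "hundreds of billions" is taken as 300 billion records
and "a few k" as 2,000 bytes. The replication factor of 3 is the HDFS default.

```python
def storage_estimate(records, bytes_per_record, replication=3):
    """Return (raw_bytes, replicated_bytes) for a store of fixed-size records."""
    raw = records * bytes_per_record
    return raw, raw * replication

# Assumed inputs, not reported figures.
RECORDS = 300_000_000_000   # "hundreds of billions" taken as 300 billion
BYTES_PER_RECORD = 2_000    # "a few k" taken as 2 kB

raw, stored = storage_estimate(RECORDS, BYTES_PER_RECORD)
print(f"raw: {raw / 1e12:.0f} TB, replicated: {stored / 1e15:.1f} PB")
```

Under those assumptions the metadata alone comes to roughly 600 TB raw and
1.8 PB once replicated, before any content is stored at all.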
The Boundless Informant reports that we see are only an after-thought; they
would be trivial to generate except for GM-Place being so gigantic and dispersed.
Prior to the heat maps of Boundless Informant, management had no overview
of the collection enterprise.
NSA is moving the whole cloud to Bluffdale; that's the purpose of that facility
(site security, authorization security, reduced latency). Boundless Informant
will run a lot faster there. But it only runs once a month.
Boundless Informant is mostly a nothing-burger for management (performance
However, we haven't seen what it can do interactively: in Map View, drill
down on how much each SIGAD is collecting in each country with what collection
method. By now it can export data behind any screen view, namely the PDDGs,
SIGAD ids, intercept technology counts, sysid and separate survey intercepts
from sustained. Section 8 of the FAQ is entitled "What is the technical
architecture for the tool?" and offers a graphical view that Greenwald
unfortunately did not provide.
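The Map View drill-down described above amounts to counting records by a
composite key. The sketch below does this in a MapReduce style in plain
Python. The record layout, the field names, the SIGAD ids (XX-0001, XX-0002)
and the country codes are all made up for illustration. Nothing here is taken
from the actual tool.

```python
from collections import Counter

# Hypothetical metadata records; ids and country codes are placeholders.
records = [
    {"sigad": "XX-0001", "country": "AA", "method": "DNI"},
    {"sigad": "XX-0001", "country": "AA", "method": "DNI"},
    {"sigad": "XX-0002", "country": "AA", "method": "DNR"},
    {"sigad": "XX-0002", "country": "BB", "method": "DNR"},
]

def map_record(rec):
    # Map step: emit a (country, sigad, method) key with a count of one.
    return (rec["country"], rec["sigad"], rec["method"]), 1

def reduce_counts(pairs):
    # Reduce step: sum the counts for each distinct key.
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return totals

def drill_down(totals, country):
    # Select one country and report counts per (sigad, method).
    return {(s, m): n for (c, s, m), n in totals.items() if c == country}

totals = reduce_counts(map(map_record, records))
print(drill_down(totals, "AA"))
```

On a real cluster the map and reduce steps would run in parallel across many
machines, but the aggregation logic is the same. That is why the reports are
trivial in principle and hard only because of the size of the store.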
It's all about GM-Place really: what's in those metadata fields, what can
NSA learn about you from them, how can you block access to them.
Here we know that FALLOUT software acts early on to process DNI ingests (web
metadata such as URLs visited) deposited in GM-Place, while TUSKATTIRE does the
same for DNR ingests (telephony metadata). After MapReduce has added stuff
like the collecting organization (e.g. FORNSAT or SSO) and intercept technology
type (DRTBOX, WHITEBOX, LOPERS, JUGGERNAUT), FASCIA delivers DNR metadata
to Hadoop as lisp ASDF does for DNI.
Again, despite all caps, these are mostly not cover names for secrets but
just the names of algorithms or physical intercept devices. How secret is
DRTBOX given that a cell phone metadata interceptor box is made by Digital
Receiver Technology (now Boeing)?
There are non-trivial data management issues here around reducing duplication,
especially for address book metadata (SCISSORS), around screening by collection
legality cover, and 'minimizing' metadata 'inadvertently' collected on Americans.
Nothing's discarded of course but it might end up in PINWALE, MAINWAY or
MARINA instead of GM-Place.