26 November 2013. See response, NSA Boundless Informant and GM-Place:
25 November 2013
NSA BOUNDLESS INFORMANT Explicated
Date: Mon, 25 Nov 2013 15:37:33 -0800 (PST)
Subject: A very interesting forum post on electrospaces
This was written by a person who purports to actually use the Boundless
Informant tool. The email address is fake of course, but it sounds both
knowledgeable and credible.
If the source is genuine, it provides considerable insight into the use and
capabilities of the tool. It seems to do a lot more than we've seen so far,
including the ability to see individual call detail records.
It also gives us clues to how mobile interception is accomplished.
Anonymous jbond@MI5.mil.gov.uk said...
I'm seeing a great deal of confusion out there about NSA databases and how
reports are generated from their architecture. Here is how it works:
Let's begin with rows and columns making up a matrix, variously called a
table, array, grid, flatfile database, or spreadsheet. In the database world,
rows are called records, columns are called fields, and the individual boxes
specified by row and column coordinates -- which hold the actual data --
are called cells.
For cell phone metadata, each call generates one record. NSA currently collects
13 fields for that call, such as To, From, IMEI, IMSI, Time, Location,
CountryOrigin, Packet etc etc, primarily from small Boeing DRTBOXs placed
on or near cell towers.
Because metadata from a single call can be intercepted multiple times along
its path, generating duplicative records, NSA runs an ingest filtering tool
to reduce redundancy, which is possible but not trivial because metadata
acquisitions may not be entirely identical (eg timing). After this refinement,
one call = one metadata record = one row x 13 columns in the BOUNDLESS
INFORMANT database.
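
To make the ingest filtering step concrete, here is a minimal Python sketch of near-duplicate removal. Everything in it is an assumption for illustration: the field names, the two-second tolerance, and the rule that records agreeing on caller, callee and IMSI within that tolerance are the same call. Nothing is known here about how the real filter decides.

```python
from typing import NamedTuple


class CallRecord(NamedTuple):
    # Hypothetical subset of the 13 fields, named for illustration only.
    caller: str
    callee: str
    imsi: str
    time: int  # seconds


def dedupe(records, tolerance=2):
    """Keep one record per call.

    Records with the same caller, callee and IMSI whose timestamps lie within
    `tolerance` seconds of the last kept record are treated as repeat
    intercepts of the same call.
    """
    kept = []
    last_kept_time = {}  # (caller, callee, imsi) -> time of last kept record
    for rec in sorted(records, key=lambda r: r.time):
        key = (rec.caller, rec.callee, rec.imsi)
        previous = last_kept_time.get(key)
        if previous is None or rec.time - previous > tolerance:
            kept.append(rec)
            last_kept_time[key] = rec.time
    return kept


records = [
    CallRecord("111", "222", "imsi-1", 1000),
    CallRecord("111", "222", "imsi-1", 1001),  # same call, seen again 1 s later
    CallRecord("111", "222", "imsi-1", 1600),  # a later, separate call
    CallRecord("333", "444", "imsi-2", 1000),
]
unique = dedupe(records)
```

The tolerance is the non-trivial part: too tight and duplicates survive, too loose and two quick calls between the same parties collapse into one.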
Cell phone metadata is structured, unlike content (he said she said). However,
as collected from various provider SIGADs, it is not cleanly or consistently
structured -- see the messy example at wikipedia IMSI. So another refinement
is needed: NSA programmers write many small extractors to get the metadata
out of its various native protocols into the uniformly formatted taut database
fields that it wants.
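
The "many small extractors" idea can be sketched as one tiny function per native format, each returning the same uniform set of fields. The two input formats and all field names below are invented for illustration; they are not real provider protocols.

```python
# Hypothetical uniform schema (a small subset of the 13 fields).
FIELDS = ("to", "from", "imsi", "time")


def extract_pipe(raw):
    """Made-up native format: 'from|to|imsi|time'."""
    frm, to, imsi, time = raw.split("|")
    return {"to": to, "from": frm, "imsi": imsi, "time": int(time)}


def extract_keyvalue(raw):
    """Made-up native format: 'IMSI=..;DST=..;SRC=..;TS=..'."""
    pairs = dict(item.split("=", 1) for item in raw.split(";"))
    return {
        "to": pairs["DST"],
        "from": pairs["SRC"],
        "imsi": pairs["IMSI"],
        "time": int(pairs["TS"]),
    }


# One extractor per source format; adding a format means adding one function.
EXTRACTORS = {"pipe": extract_pipe, "kv": extract_keyvalue}


def normalize(source_format, raw):
    """Turn a raw record into the uniform schema."""
    return EXTRACTORS[source_format](raw)


a = normalize("pipe", "111|222|imsi-1|1000")
b = normalize("kv", "IMSI=imsi-1;DST=222;SRC=111;TS=1000")
```

Both calls describe the same call in different native layouts and come out as identical uniform records.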
After all this, for a hundred calls, a metadata database such as BOUNDLESS
INFORMANT consists of 100 records and 13 fields so 100 x 13 = 1300 cells.
A counting field (all 1's) and consecutive serial numbers (indexing field)
for each record may be added to facilitate report generation and linkage
to other databases, see below.
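
As a worked version of that arithmetic, the following Python sketch builds an empty 100-record table with 13 placeholder fields and then adds the counting and serial fields. The field names are placeholders, not the real ones.

```python
# 13 placeholder field names standing in for To, From, IMEI, IMSI, etc.
FIELD_NAMES = [f"field_{i}" for i in range(1, 14)]

# 100 calls -> 100 records, each with 13 (here empty) cells.
records = [{name: None for name in FIELD_NAMES} for _ in range(100)]
cells = len(records) * len(FIELD_NAMES)

# Add the two helper fields: a serial number for indexing and linkage,
# and a counting field that is always 1.
for serial, rec in enumerate(records, start=1):
    rec["serial"] = serial
    rec["count"] = 1

# Summing the counting field gives the number of calls in any subset.
total_calls = sum(rec["count"] for rec in records)
```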
-1- The first point of confusion is between BOUNDLESS INFORMANT as a flatfile
database (we've never seen a single row, column or cell of it) and the one-page
summary reports that can be generated using BOUNDLESS INFORMANT as the driving
database (eg, the Norway slide).
These BOUNDLESS INFORMANT reports give the number of records (rows) in the
table after various filters have been applied (eg country, 1EF = one end
foreign, specified month, DNR type, intercept technology used, legal authority
cited FISA vs FAA vs EO 12333).
BOUNDLESS INFORMANT does NOT report the number of cells nor gigabytes of
storage taken up. It easily could, but it doesn't. Instead, it reports the
main object of interest: the number of calls, after some filtering scheme
has been applied.
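
In code terms, such a report is just a row count after filtering. A minimal Python sketch, with invented field names and toy data (none of it taken from a real slide):

```python
# Toy records; field names and values are made up for illustration.
records = [
    {"country": "NO", "one_end_foreign": True,  "month": "2013-01", "kind": "DNR"},
    {"country": "NO", "one_end_foreign": False, "month": "2013-01", "kind": "DNR"},
    {"country": "NO", "one_end_foreign": True,  "month": "2012-12", "kind": "DNR"},
    {"country": "DE", "one_end_foreign": True,  "month": "2013-01", "kind": "DNI"},
]


def count_records(rows, **filters):
    """Count the rows whose fields equal every given filter value."""
    return sum(
        1
        for row in rows
        if all(row.get(field) == value for field, value in filters.items())
    )


norway_1ef = count_records(
    records, country="NO", one_end_foreign=True, month="2013-01"
)
everything = count_records(records)  # no filters: every row counts
```

Note what the function does not return: cell counts or storage size. It returns the number of records that survive the filter, which is the figure the summary reports show.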
-2- The second point of confusion arises over database viewing options. Myself,
I like scrolling down row after row, page after page, plain black text in
8 pt courier font, lots of records per screen, thin lines separating cells,
no html tables. A lot of people don't.
So a cottage industry has evolved around generating pretty monitor displays,
web pages, and ppts from databases; these typically display one record per
screen. All database views are equivalent: given a presentation, you can
recover the database; given the database, you can make the pretty user interface.
Views are dressed up by injecting the data fields into a fixed but fancy template
(eg dept of motor vehicles putting your picture field into an antique wood
frame and your name field into drop-shadow text). Nothing but a warmed-over
version of spewing out form letters by mail-merging an address database into
a letter template.
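
The mail-merge comparison can be shown in a few lines of Python using the standard library's string.Template. The template text and field names are hypothetical.

```python
from string import Template

# A fixed "fancy" template; the $placeholders are filled from each record.
CARD = Template("Record $serial\n  From: $caller\n  To:   $callee\n")

records = [
    {"serial": 1, "caller": "111", "callee": "222"},
    {"serial": 2, "caller": "111", "callee": "333"},
]

# One rendered view per record: same data, different presentation.
cards = [CARD.substitute(rec) for rec in records]
```

Swapping CARD for another template changes the look of every view without touching the data.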
We've not seen *any* view of BOUNDLESS INFORMANT records to date, only summary
reports it has generated. You cannot recover the underlying database from
a few summary reports, only information about the number of records and a
few of the 13 fields.
November 25, 2013 at 2:34 PM
Anonymous jbond@MI5.mil.gov.uk said...
-3- The third point of confusion: a given database like BOUNDLESS INFORMANT
is capable of self-generating many summary reports about itself. Summary
reports can have views too -- injections into templates. We've seen 3 of
them for BOUNDLESS INFORMANT, Aggregate, DNI and DNR.
Databases can be sorted, according to the values in any column. For example,
if NSA sorted by IMSI, that would pull together all the call records made
from a particular cell phone with that id; using the counting field, the
activity of each phone could then be tallied. Or they could sort to pull up the
least active phones -- to identify the user who tosses her 'burner' phones
in the trash after one use.
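
A small Python sketch of sorting and tallying, on made-up records with a hypothetical "imsi" field and the counting field described earlier:

```python
from collections import Counter

# Toy records; each carries the all-ones counting field.
records = [
    {"imsi": "imsi-2", "callee": "555", "count": 1},
    {"imsi": "imsi-1", "callee": "222", "count": 1},
    {"imsi": "imsi-1", "callee": "333", "count": 1},
    {"imsi": "imsi-3", "callee": "777", "count": 1},
    {"imsi": "imsi-1", "callee": "444", "count": 1},
]

# Sorting by IMSI brings each phone's records together.
by_imsi = sorted(records, key=lambda rec: rec["imsi"])

# Summing the counting field per IMSI gives each phone's activity.
tally = Counter()
for rec in records:
    tally[rec["imsi"]] += rec["count"]

# Phones seen exactly once: the least active end of the sort.
single_use = sorted(imsi for imsi, calls in tally.items() if calls == 1)
```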
Databases can be restricted. If NSA wanted to count the number of distinct
cell phone calls during a given month that originated in Norway and terminated
abroad (1EF one end foreign), it can restrict the records to the relevant
time and location fields, masking out the others. They could compress each
cell phone to a single line and count rows to get summary data on the number
of phones doing 1EF. That summary data could be injected into a template
for a BOUNDLESS INFORMANT slide.
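
Restriction followed by compression to one line per phone might look like this in Python. The records, field names, and the test for "terminated abroad" (destination country differs from Norway) are all assumptions for the sake of the example.

```python
# Toy records with more fields than the question needs.
records = [
    {"imsi": "imsi-1", "origin": "NO", "destination": "SE", "month": "2013-01", "callee": "222"},
    {"imsi": "imsi-1", "origin": "NO", "destination": "DE", "month": "2013-01", "callee": "333"},
    {"imsi": "imsi-2", "origin": "NO", "destination": "NO", "month": "2013-01", "callee": "444"},
    {"imsi": "imsi-3", "origin": "NO", "destination": "US", "month": "2012-12", "callee": "555"},
    {"imsi": "imsi-4", "origin": "NO", "destination": "US", "month": "2013-01", "callee": "666"},
]

# Restrict: keep only the relevant columns, masking out the rest.
KEEP = ("imsi", "origin", "destination", "month")
restricted = [{field: rec[field] for field in KEEP} for rec in records]

# Filter: the given month, originating in Norway, terminating abroad.
matching = [
    rec
    for rec in restricted
    if rec["month"] == "2013-01"
    and rec["origin"] == "NO"
    and rec["destination"] != "NO"
]

# Compress each phone to a single line by collecting distinct IMSIs.
summary = {
    "calls": len(matching),
    "phones": len({rec["imsi"] for rec in matching}),
}
```

The resulting summary dictionary is the kind of small result that could then be injected into a report template.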
Databases can be queried (tasked) to pull out only those records satisfying
some string of selector logic. For example, you could submit a FOIA request
to NSA in the form of a query that consisted of your selectors and a database
like BOUNDLESS INFORMANT to see what call metadata they have on you in storage.
Here you would be wise to request simple output (rows of plain text with
column values separated by commas, CSV format), to keep file size down. Then
you could make your own mail-merge templates and spew out colorful BOUNDLESS
INFORMANT graphs and reports about yourself, or just use the default templates
provided by Excel.
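
For reference, simple CSV output of query results takes only a few lines with Python's standard csv module. The field names and rows are invented.

```python
import csv
import io

# Hypothetical field names and result rows.
FIELDS = ["serial", "caller", "callee", "time"]
rows = [
    {"serial": 1, "caller": "111", "callee": "222", "time": 1000},
    {"serial": 2, "caller": "111", "callee": "333", "time": 1060},
]


def to_csv(rows, fields):
    """Render records as plain comma-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


text = to_csv(rows, FIELDS)
```

The output opens directly in a spreadsheet, which is where the default chart templates come in.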
November 25, 2013 at 2:36 PM
Anonymous jbond@MI5.gov.uk said...
-4- Next up on confusion, relational databases. NSA maintains hundreds of
separate flatfile databases that might however share a field or two in common,
for example someone texting, google searching, or shopping as well as making
phone calls with a given phone, the number or IMSI being the common field.
Those other activities involve different fields from those already in BOUNDLESS
INFORMANT, such as your login to eBay or search term text instead of email
It could all be put into BOUNDLESS INFORMANT by expanding the number of fields.
However this doesn't scale very well: it results in the voice call fields
being massively blank for an IMSI making lots of google searches, creating
a huge sparse table that is very slow to process, wasting analysts' time (called
high latency by NSA).
Instead, BOUNDLESS INFORMANT will just link to all the other databases which
share a field. And those in turn could link to other simple databases sharing
some other field that BOUNDLESS INFORMANT might lack. And so on -- it's how
all the little constituent databases can be seamlessly integrated.
A query now calls through to this whole federation of linked databases, which
can reside geographically anywhere on the Five Eyes network (though NSA is
moving to one stop shopping from their Bluffdale cloud to improve security
and reduce latency).
The primary provider of relational database software of this complexity is
Oracle. However you can do about all of it free and friendly with open source
MySQL. The Q is for querying -- what NSA calls tasking -- sending off some
long-winded boolean logic string of field selector values and constituent
databases that does the filtering you want.
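
Linking two small tables on a shared field is a standard SQL join. The sketch below uses SQLite through Python's built-in sqlite3 module as a stand-in, not Oracle or MySQL, and the table names, column names and data are all hypothetical.

```python
import sqlite3

# In-memory database holding two small tables that share only the imsi field.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE calls    (serial INTEGER PRIMARY KEY, imsi TEXT, callee TEXT);
    CREATE TABLE searches (serial INTEGER PRIMARY KEY, imsi TEXT, term TEXT);
    INSERT INTO calls (imsi, callee) VALUES ('imsi-1', '222'), ('imsi-2', '333');
    INSERT INTO searches (imsi, term) VALUES ('imsi-1', 'weather'), ('imsi-1', 'news');
    """
)

# The query (the "tasking"): one selector value, two linked tables.
rows = conn.execute(
    """
    SELECT c.imsi, c.callee, s.term
    FROM calls AS c
    JOIN searches AS s ON s.imsi = c.imsi
    WHERE c.imsi = ?
    ORDER BY s.term
    """,
    ("imsi-1",),
).fetchall()
conn.close()
```

Neither table needs blank columns for the other's data; the shared field does the linking at query time.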
The result of the query is a new little database, usually temporary, that
you can use to generate fancy views and summary reports. The databases being
updated continuously and storage retention varying, the same query tomorrow
will give a slightly different outcome.
Your all-about-me FOIA request could be formulated in MySQL (you would first
need to know the names of the linked databases) and, surprisingly, the query
string would largely be recognized and fulfilled by Oracle or whatever big
relational database NSA ended up using/developing; SQL is that standardized.
If you're online or call a lot, that could still be a big file given 12 agencies
keeping tabs, notably NSA, Homeland Security, and FBI's DITU. But if you
wrote the query right, it would only take a small data center in the garage
to host the response.
November 25, 2013 at 2:37 PM