Sideways Add / Population Count

31 January 1999: Add Axel Horns message.
30 January 1999: Add messages by Jitze Couperus and Steve Bellovin
28 January 1999
To: cryptography@c2.net
Date: Thu, 28 Jan 1999 11:06:34 -0500
From: MCKAY john <mckay@cs.concordia.ca>

This I got from computer historian, Simon Lavington.

The (Manchester) Ferranti Mark I had a hardware random number 
generator. This was specified by Alan Turing - (A copy of his 
original  Internal Report, dated 1949 I believe, still exists.)  
The random number instruction was based on electron noise in a 
thermionic diode.  Turing's report even gives a possible circuit 
diagram. The same strategy is used today in the UK Dept. of 
National Savings' ERNIE Premium Bond machine.

Another curiosity of the Mark I's instruction set was a sideways
add ('population count'), also specified by Turing.  I've always
assumed that the two instructions could be useful for cryptography -
eg for the generation of one-time coding pads and the testing of
decryption procedures.

[See S.H. Lavington's 1978 Comm ACM paper on the Manchester Mark 
I and Atlas: http://www.computer50.org/kgill/mark1/lav.html ]



From: "Jitze Couperus" <Jitze.Couperus@cdc.com>
To: <cryptography@c2.net>
Subject: Re: Pop Count Instruction and cryptanalysis
Date: Thu, 28 Jan 1999 11:32:16 -0800

John Mckay wrote:

[Snip]

About the "sideways add" or pop-count instruction - indeed
Seymour Cray's first supercomputer (the Control Data 6600)
sported such an instruction, as did all subsequent Control
Data machines until the advent of the 180 series in the mid 
'80s. 

It was almost a tradition that one of the first of any new
faster CDC machine was delivered to a "good customer" - picked
up at the factory by an anonymous truck, and never heard
from again.

We always wondered what such an instruction might be useful for - 
until one of the first of the 180 series (n'th generation successor
to the 6600) was delivered to such a customer, and cries of 
anguish erupted that this machine didn't have such an instruction. 
We scrambled and had to create a very tight code sequence within the 
instruction stack that could be generated via a Fortran intrinsic 
function.

Simon Lavington also mentions in his book that the venerable 6600 
inherited some ideas from the Manchester Atlas. I never found out 
what specifically - did the Atlas sport such an instruction? - 
perhaps also inherited from Mark I?

I'm not sure if Cray's other machines (built after he left Control 
Data) had such an instruction, but I'm told they did.

Jitze Couperus
Control Data Systems



To: cryptography@c2.net
Subject: Re: Pop Count Instruction and cryptanalysis 
Date: Thu, 28 Jan 1999 15:35:04 -0500
From: Steve Bellovin <smb@research.att.com>

Jitze Couperus writes:

[Snip]

For years, I had heard the story about NSA liking that instruction.
But I never understood why, until I started working on plaintext 
recognizers, and independently derived the need for it.  See, for 
example,

http://www.research.att.com/~smb/papers/probtxt.ps.

There are other instruction types that are useful for cryptanalysts.
The CDC Star had a lovely set of vector operations under masks.  And
the Harvest add-on to the IBM 7030 (Stretch), described in a book by
Buchholz ("Planning a Computer System", McGraw-Hill, 1962) was intended
for NSA as well.



From: "Jitze Couperus" <Jitze.Couperus@cdc.com>
To: <cryptography@c2.net>
Subject: Re: Pop Count Instruction and crytanalysis 
Date: Fri, 29 Jan 1999 13:15:23 -0800

A correspondent asked me if I could expound some more
on what I know about the use of the pop-count instruction.

The paper cited earlier in this thread by Steve Bellovin
tells you far more than I can, as well as a paper that
it cites in turn about "A Programmable Plaintext Recognizer".

However, and without boring through detail, I can perhaps 
give a flavor of how it was used  a long time ago and in a 
manner that is interesting from a cryptanalysis perspective
- and given that it jibes with similar discussion in the
papers mentioned above.

In one of the first "interactive" environments supported
on the 6600 - the Intercom time-sharing system, we had
a utility that allowed one to search a text file for
some arbitrary character string - much as any text editor
allows today.

This was implemented approx. as follows. Each line of text
in the file had appended to it a 60-bit word, the contents
of which were a bit vector. If there were one or more
of the letter "A" in the line, bit 1 was set, if there
were one or more of the letter "B" in the line, then
bit 2 was set, etc... thus setting some sub-set of 36
bits coresponding to the characters A..Z and 0..9. Let's
call this word appended to each line its "thumbprint".

There are 24 bits left over - more about that later.

Now to search for any line containing some search
string, all you have to do is create a similar thumbprint
for the search string, and loop through the thumbrints
of the lines in the text file, checking each against your
search thumbrint. "Checking" in this sense means doing a 
logical "AND" between the two to extract matching bits, 
and then counting how many bits matched - hence the pop-count 
instruction.

A "hit" would of course only be tentative. It does not
tell you that the line in question contains the actual string
you are looking for - only that the line contains somewhere some 
(count of) the characters contained in your search string.

This little loop can however be coded in an extremely tight 
manner to fit in the instruction stack (equivalent to instruction 
cache in today's technology) thus allowing a very fast winnowing
of interesting text lines that can then be examined more closely.

The actual instructions used within the loop also made good use
of the parallelism offered by the separate functional units
of the 6600 cpu - and in later machines - pipelining.

The extra 24 bits were used in other applications (of similar
ilk) to reflect not single characters, but the occurrence of
certain digraphs in the text line. Thus bit 40 in the the 
thumbprint might indicate whether a given "interesting" digraph 
did or did not occur in a given line.

Later the border between the 36 bits and 24 bits within the
60-bit word was moved and successively tuned. Certain bits 
of the original 36 were commoned up - i.e. a particular bit
might indicate the presence of any of  Z, X, or Q because
maybe they are relatively uncommon. This would free up two 
more bits for use as digraph flags - and so on.

Then, in another place, the whole mechanism was turned on
its head. It wasn't used to *look for* a certain character
string, but rather to *reject* certain lines of text as 
being uninteresting. (Others might use the word "implausible"
here.) Furthermore, the concept of a thumbprint per "line of 
text" was refined to support differing granularity of the
thumbprinting, and of course the size of the thumbprint
itself, as 64 bits became a more common word-size.

Some 30 years later, I find the paper cited by Steve Bellovin
on "Probable Plaintext Cryptanalysis" to be extrememely
interesting - in particular it cites another paper
about "A Programmable Plaintext Recognizer". This is
the only open documentation I have ever come across that
discusses this kind of mechanism in any detail. 

Jitze



To: cryptography@c2.net
Subject: Re: Pop Count Instruction and crytanalysis 
Date: Fri, 29 Jan 1999 22:22:52 -0500
From: Steve Bellovin <smb@research.att.com>

Jitze Couperus writes:
> 
> Some 30 years later, I find the paper cited by Steve Bellovin
> on "Probable Plaintext Cryptanalysis" to be extrememely
> interesting - in particular it cites another paper
> about "A Programmable Plaintext Recognizer"*. This is
> the only open documentation I have ever come across that
> discusses this kind of mechanism in any detail. 

Thanks for a very interesting article.  The other paper you cite
is by David Wagner and myself, and can be found at

* http://www.research.att.com/~smb/papers/recog.ps (or .pdf).



To: cryptography@c2.net
Date: Sat, 30 Jan 1999 14:39:10 +0100
Subject: Re: Pop Count Instruction and cryptoanalysis 
From: Horns@t-online.de (Axel H. Horns)

Inspired by this thread I remember that in the late 70ies I attended
a lecture on "COMPASS", the assembler language for CDC Computers, at
the University of Hannover (Germany), the computing center of which
being equipped with two CDC 6600 and one CYBER 76 mainframes in 
those days. The professor had explicitely noticed that there was no 
obvious use for a hardware-accelerated poupulation count opcode but 
he had not suggested to look on utilization for cryptography.

From this lecture I own a copy of "CONTROL DATA CYBER 70 SERIES -
6000 SERIES - 7600 COMPUTER SERIES" pocket manual "COMPASS VERSION
3" printed in 1973. 

This document clearly states that in the CYBER 70 Models 74 and 6600 
Computers, the opcode "47" for "population count" was executed by the 
DIVIDE UNIT. Contrary, the CYBER 70 Models 76 and 7600 Computers had 
a separete POPULATION COUNT UNIT.

If I understood the opcode table correctly  the respective opcode was 
executed in one or two clock cycles (very fast; the same as shift 
opcodes).

Hence, it looks like as if there has been some kind of evolution.
The NSA first had demanded the implementation of the "population
count" opcode and they got it in a more economical way by using the
hardware of the divide unit. The NSA software using this opcode must
have been very successfull; otherwise they would not have been able
to require to be provided with a separate hardware unit which I
think was *very* expensive in those days (The core CPU of said CYBER
model was made up of modularized and encapsulated printed circuit
boards equipped with discrete transistors, AFAIK).

Axel H Horns