Information Leaks
Prologue
In early 1943 the Economic Warfare Division of the American Embassy in London started to analyze markings and serial numbers obtained from captured German equipment in order to obtain estimates of German war production and strength. This report is the story of the development of this technique in terms of the problems which arose and the ways in which they were solved.
Various kinds of captured enemy equipment were studied by this technique. The first product to be so analyzed was tires, and after this tanks, trucks, guns, flying bombs, and rockets were studied. …
Thus begins a 1947 paper by Richard Ruggles (Harvard University) and Henry Brodie (Department of State). The paper was titled “An Empirical Approach to Economic Intelligence in World War II” and appeared in the Journal of the American Statistical Association. It now has an exalted status in statistical literature for establishing the utility of serial number analysis. But I regard it more broadly as an enduring example of how information leaks despite the best efforts of individuals and institutions to keep everything secret. It also tells the sort of unsexy story which diverts the forever winding course of history. Obviously you’ll never learn about such intriguing documents from movies and media. Hence I shall highlight this paper in the hope that we may all realise its relevance to our time and our lives.
Casablanca Conference
During January of 1943, Franklin Roosevelt and Winston Churchill met along with their respective American and British entourages in the Moroccan city of Casablanca. Joseph Stalin and the Soviets were absent, as were the Chinese and Chiang Kai-Shek. But some notable Frenchmen turned up, mainly Charles de Gaulle. This was just one among dozens of conferences that brought together various Allied leaders throughout the war. The main outcome of this particular gathering was the decision to invade Sicily six months later. If you want an entertaining backstory of what else happened prior to Sicily, then watch the 2021 film Operation Mincemeat. As for the paper by Ruggles & Brodie, the following passage provides an overarching context:
Economic intelligence in World War II played an important and varied role during the conflict with Germany. Information as to Germany’s war potential was the frame of reference which shaped the pattern of allied mobilization and strategy. Knowledge of the quantities and types of war materiel possessed by the enemy was needed to fix the timing of the invasions and to plan the kind of warfare which was to be waged. In addition, both aggregative data about German industry and highly detailed facts about individual plants and products were very necessary to carry out the allied strategic bombing program as conceived at the Casablanca Conference. Behind each attack of the Eighth Air Force over Europe lay extensive research, involving such considerations as the essentiality of various German war products; the exact location, relative importance, and output rates of various producers; the length of time elapsing between the separate production processes and consumption of the finished article by the army; substitutability of various products; the availability of alternative production facilities; and finally recuperation rates of industries suffering from direct bomb damage.
The paper further explains that analysts made wildly erroneous estimates due to a range of reasons, though mostly thanks to fear and lack of knowledge. A myth of “German Invincibility” prevailed in the minds of Allied personnel. But the tide began to turn when statisticians started analysing data. Ruggles & Brodie provide an obscure footnote hinting at who these people were: “The Economic Warfare Division of the American Embassy (EWD) was a centralized intelligence agency in direct contact with British agencies. Its personnel consisted of analysts loaned by the Office of Strategic Services, the Foreign Economic Administration and the State department.”
Markings on Enemy Equipment
Did you ever pay attention to the serial number on the back of your favourite gadget? I didn’t, until I read this paper roughly fifteen years ago. Even now I take mainly a passing interest in such markings, rather than a scholarly obsession that would require a systematic study of serial numbers. But it amazes me, nonetheless, just how much information is embedded in something as innocent as a serial number on a phone or a watch. A complicated supply network that wraps the globe several times is symbolised by a serial number. Such a unique identifier may not encode everything about the supply network, but it’s the key that unlocks a treasure trove which holds valuable information. Ruggles & Brodie summarise their treasure trove from the war:
Each piece of enemy equipment, whether main assemblies or component parts, was liberally labelled with markings inscribed either on the equipment itself or on attached nameplates. Such markings varied as to completeness and included all or some portion of the following information: (a) the name and location of the maker, (b) the date of manufacture, (c) a production serial number, and (d) miscellaneous markings such as trade marks, mold numbers, casting numbers, etc. The purpose of these markings was twofold. First, they furnished information necessary to maintain an effective check on production standards. Faulty performance of equipment in the field due to defects of manufacture could readily be traced back to the original source and corrected. Second, some of the markings were essential for proper spare parts control. However, these same markings, if subjected to proper analysis, offered Allied intelligence officers a wealth of information about German industry.
This is an early example of data mining, which is now ubiquitous. The paper doesn’t mention the use of electrical or mechanical computers, although the hardware inventions by Alan Turing and others at Bletchley Park from that period are well known. We also need to remember that the generic noun computer referred to a human back then. So the unseen and unsung heroes of this data mining endeavour were all the people who carried out tedious calculations by hand using elementary tools which are utterly alien to us today. And it’s a double dose of injustice that these uncredited computers were mostly women. The irony of Ruggles & Brodie not mentioning their colleagues is redeemable though, for this piece of implicit information leaks in the same way that markings on enemy equipment do. We need merely to read between the lines.
Studying Tyres
There’s a note in the paper regarding how about two thousand German tyres came to be possessed by Allied personnel: “British experts working on the German rubber industry had accumulated a sample of markings from about 2000 enemy tires. These had been taken from German aircraft shot down over Britain and from supply dumps of aero and motor vehicle tires captured in North Africa.” It reminds me of a scene from the 2012 film Zero Dark Thirty where the Navy SEALs blow up one of their stealth helicopters that had crashed on arrival. Even before that film was made, I remember various newspaper photos showing distinctly marked panels from the real thing, which the Pakistanis had gathered up and put on display to taunt the Americans. I’m guessing that during World War II the Germans did their own counter-intelligence work on captured Allied tyres, which may not have been translated into English, or at least I have yet to read those books. In any case, Ruggles & Brodie describe how they analysed markings on German tyres:
In the main, the markings had been used by the British to identify German tire manufacturers, since the maker’s name was always inscribed clearly on each tire. In addition to the maker’s name, however, every tire bore a serial number and a two letter code for the date of manufacture. An array of the data for each manufacturer indicated that the tires were numbered systematically. … The first step in analyzing tire markings involved breaking the two letter date codes. These codes were not used by the Germans as a wartime security measure; their purpose was to indicate to the manufacturer and dealer the date when a tire was made without revealing it to the purchaser. Since there were two letters, it was assumed that one represented the month and the other the year of manufacture. The further assumption that there should be 12 letter variations for the month code and probably three to six for the year code also seemed reasonable if simple substitution codes were used. On this basis, the month code was distinguished from the year code by reason of its greater variation (12 variations if the sample were large enough). The fact that the tires were numbered serially helped break some of the month codes. Where the tires were numbered in a continuous increasing series, a simple array of the cases by number would reveal the order of month letters with any given year letter. If sufficient cases existed the code would be solved. Where the sample of a particular make of tire was too small for that purpose, other code solutions suggested themselves. Thus, it was apparent that some manufacturers based their month codes on words or simple arrangments [sic] of letters.
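The decoding steps in that passage are simple enough to sketch in a few lines of Python. This is my own toy illustration and not anything taken from the paper: the records are invented, the function names split_month_year and month_order_by_year are hypothetical, and I assume a single maker whose serial numbers increase continuously over time.

```python
from collections import defaultdict

def split_month_year(codes):
    """Guess which letter position holds the month.

    Assumption: the month letter shows more variety (up to 12 values)
    than the year letter. A tie falls to the second position arbitrarily.
    """
    first = {code[0] for code in codes}
    second = {code[1] for code in codes}
    return (0, 1) if len(first) > len(second) else (1, 0)  # (month_pos, year_pos)

def month_order_by_year(records):
    """For each year letter, list month letters in order of rising serial number."""
    month_pos, year_pos = split_month_year([code for _, code in records])
    order = defaultdict(list)
    for serial, code in sorted(records):  # array the cases by serial number
        month, year = code[month_pos], code[year_pos]
        if month not in order[year]:
            order[year].append(month)
    return dict(order)

# Invented sample of (serial number, two-letter date code) pairs.
records = [
    (1040, "AK"), (1310, "BK"), (1655, "CK"), (1102, "AK"),
    (2950, "AL"), (3400, "BL"), (1990, "DK"), (3720, "CL"),
]

print(month_order_by_year(records))
# {'K': ['A', 'B', 'C', 'D'], 'L': ['A', 'B', 'C']}
```

With a real sample the recovered ordering would only be as good as the coverage of each month, which is why the authors fell back on guessing code words when a maker was thinly represented.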
Ruggles & Brodie elaborate further regarding the importance of intelligence derived from tyre markings:
The chief significance of the tire markings analysis was that it afforded for the first time a reasonably accurate picture of German tire production by individual makers. This information provided a firm basis for assessing the importance of the German tire industry as an air target system. Although the initial tire study was based on a sample representing only about 3/10 of one per cent of the universe, a number of factors supported the accuracy of the findings. First, representation of individual producers in the total sample was proportionate to their production as estimated by serial number analysis. Second, the deviations in monthly output for individual producers were of reasonable relative magnitudes. They showed the period of conversion, the way in which production fell off sharply, and then built up to a level considerably below the peacetime peak. This was to be expected since military tires are larger and more difficult to make than civilian tires. Third, production estimates for any one month well represented in the sample showed a high degree of stability even when based on only a fraction of the cases for that month selected at random.
In addition to bringing about a major revision in the aggregative tire output estimates, serial number analysis also furnished intelligence officers working on enemy target systems with a great deal of relevant material. The location and importance of each producer was now known, as well as the length of time elapsing between the manufacture of tires and their use by the army. While the first tire report was in preparation one plant was bombed. The study showed the monthly output figures for this plant before and after the attack, and thus furnished a valuable check on the results of the bomb damage.
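To make the before-and-after comparison concrete, here is a rough sketch of how monthly output might be read off dated serial numbers. It is an illustration under loud assumptions and not the authors’ procedure: the data are invented, the name monthly_output is hypothetical, the date codes are taken as already decoded into 'YYYY-MM' strings, the numbering is assumed to be one continuous series, and every month is assumed to appear in the sample.

```python
def monthly_output(records):
    """Crude monthly output: the rise in the highest serial seen from month to month.

    records: iterable of (serial, month) pairs, with month as a 'YYYY-MM' string.
    The highest serial observed in a month understates the true end-of-month
    serial, so small samples give noisy figures.
    """
    highest = {}
    for serial, month in records:
        highest[month] = max(serial, highest.get(month, serial))
    months = sorted(highest)  # 'YYYY-MM' strings sort chronologically
    return {cur: highest[cur] - highest[prev]
            for prev, cur in zip(months, months[1:])}

# Invented sample for one plant; imagine a raid early in March.
sample = [
    (10200, "1942-01"), (10950, "1942-01"),
    (11800, "1942-02"), (12040, "1942-02"),
    (12300, "1942-03"), (12410, "1942-03"),
]

print(monthly_output(sample))
# {'1942-02': 1090, '1942-03': 370}
```

A sharp drop like the one in this made-up March figure is the kind of signal that would serve as a check on bomb damage.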
It’s easy to get overconfident from such intelligence gathering, especially when the feedback loop is closed by actively bombing a target based on said analysis. The reality is that an irreducible element of luck plays a crucial role in any statistical inference. Gathering the tyre samples is a case in point, which came about through accident and not via deliberate espionage. Nevertheless, it’s quite remarkable that the Germans revealed so much of themselves in their utterly unremarkable tyres. For instance, Ruggles & Brodie say in a footnote that “many German units went to Africa from Russia and took their equipment with them.” Think about the coincidence of being able to track movement by looking at markings on tyres. Today we see its creepiness when infamous spy agencies track all of us from the phones in our pockets—even when powered off!
Studying Tanks
If studying tyres can lead to actionable intelligence, then studying tanks ought to be a pot of gold at the end of a rainbow. But how do you get hold of German tanks? It turns out that you don’t need physical tanks per se, because the Germans maintained detailed logs which were easier to obtain. Consider what Ruggles & Brodie say:
Tank markings were available from a number of sources. Documents captured in North Africa included German tank log books. These books contained the chassis and engine serial numbers of the tanks to which they belonged, along with the date of manufacture and the name or code of the assembler. Papers captured at divisional headquarters sometimes included lists of the tank holdings of specific armored units, enumerating types and chassis serial numbers of individual tanks. Captured records of German tank repair depots reported the chassis and engine serial numbers of every tank repaired. Also, spare parts order books and other technical publications issued by the Wehrmacht listed tank chassis serial number bands to indicate exactly the various models of spare parts required for different tanks. Finally, some tank markings had been recorded in North Africa by technical intelligence field personnel inspecting captured equipment, and a few German tanks were available for more detailed examination in both England and United States. From all of these sources, about 1200 tank chassis serial numbers were obtained along with more detailed markings for a small number of vehicles.
They also explain the following simple source of information leaks: “Every tank had to be tested by a Wehrmacht inspector before it was accepted. One or more inspectors were located at each tank assembly plant. These signified their acceptance of a vehicle by stamping their individual inspector’s number on it. When the tank assembler’s names were coded these acceptance stamp numbers were not changed, and thus provided a basis for identification of the codes.” Then they point out that the significance of intelligence derived from tank markings was profound:
The first tank report yielded less detailed information than the tire report, but it was no less comprehensive. Annual tank production by type was obtained for the years 1939-42. The number and relative importance of the various assemblers also was determined. Analysis of tank engine markings indicated that two manufacturers were responsible for 100% of Germany’s engine production. The significance of this early tank study, just as in the case of the tire report, lay in the fact that its findings differed radically from accepted intelligence. At this time the accepted estimate of cumulative total German tank production was about 40,000; serial number analysis revealed that this was a gross overestimation; not more than 14,000 tanks had been produced. The 1942 production rate, originally accepted as being about 18,000, was estimated by the serial number technique as being only 3,400. In other words, Allied intelligence still suffered from the myth of German invincibility created by Nazi propagandists out of the successful blitzkrieg tactics in Poland and France, and it had grossly overestimated the enemy’s position; the serial number technique revealed this fact and introduced realism in our picture of the strength of the German war machine. Furthermore the number and importance of the various producers, hitherto unknown, was now revealed.
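For readers who want to see how a handful of serial numbers can yield a production total, here is the textbook estimator usually taught under the name “German tank problem”. I should stress that this is the idealised classroom version and not the procedure Ruggles & Brodie describe: it assumes serials run from 1 to N without gaps and that the captured sample is drawn at random without replacement. The function name estimate_total and the sample are my own inventions.

```python
def estimate_total(serials):
    """Estimate the total N from observed serials, assumed drawn from 1..N.

    Uses the standard formula N ≈ m + m/k - 1, where m is the largest
    serial observed and k is the sample size. The idea: the gap above the
    maximum should be about as large as the average gap between samples.
    """
    k = len(serials)
    m = max(serials)
    return m + m / k - 1

# Invented sample of four chassis numbers.
print(estimate_total([19, 40, 42, 60]))
# 74.0
```

Real equipment was numbered in bands per assembler and per model, so in practice any such formula would have to be applied band by band after the markings had been sorted out.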
Circling back to the Allied invasion of Sicily, there’s a curious anecdote in the paper: “Just before D-day [i.e. the Normandy landings on 6th of June, 1944], army intelligence became vitally concerned with the rumors of large Mark V tank production. At the time, only one Mark V tank had been encountered by English and American troops. This tank had been captured in Sicily and was shipped to England. A second Mark V taken by the Russians was also turned over to the British. Careful examination of the markings on these tanks revealed that the probable assembly date of the one from Russia was March 1943, and of the one captured in Sicily, February 1944.” Ruggles & Brodie also note: “Mark V tank assembly was [estimated] at a rate of 270 per month by February 1944 if tank assembly was occurring at the capacity of bogie wheel production. After the war it was found that actual production in this month was 276. On the basis of the bogie wheel analysis, probable Mark V strength was calculated, and thus the Western Allies were forewarned that they would encounter this tank in larger quantities than originally anticipated.”
Indeed, after the war it was evident that the German ministry for war production had kept complete records which corroborated the Allied intelligence based on serial number analysis. The paper aptly concludes: “The relative accuracy of the serial number estimates indicates that this method of analysis was a valid and valuable source of economic intelligence. Within the limits of its capabilities, the technique of analyzing markings on enemy equipment was superior to the more abstract methods of intelligence such as reconciling widely divergent prisoner of war reports, basing production estimates on pre-war capabilities or projecting production trends based on estimates of the degree of utilization of resources in the enemy country.” Next time you look at a serial number, imagine the hidden information that’s leaking and try to visualise the invisible supply network.
Epilogue
Ruggles & Brodie left out equations from the paper, which made it easier to read. But if you need equations for serial number analysis then search for Leo Goodman’s articles from the early 1950s. The basic estimators are short enough to code up yourself in a few lines of Python. The reason I wanted to spotlight this paper was to emphasise how subtle inferences become feasible when there’s concealed structure in our world. But there are major limits here, of course. Unfortunately, we denizens of the 21st century fool ourselves into thinking that many things are foreseeable when they are fundamentally unknowable. Thankfully, however, we have the compass of poetry to guide us for intuiting that critical difference. So I shall finish with a clever poem crafted by W. H. Auden. It’s called The Unknown Citizen (1940). It chronicles the act of mining data on an ordinary person long before the era of social networks, which seems all too trivial to us today. But read the entire poem below, then reflect on the fact that absurdly vital questions about us cannot be known by mining more data on us.
(To JS/07 M 378
This Marble Monument
Is Erected by the State)
He was found by the Bureau of Statistics to be
One against whom there was no official complaint,
And all the reports on his conduct agree
That, in the modern sense of an old-fashioned word, he was a saint,
For in everything he did he served the Greater Community.
Except for the War till the day he retired
He worked in a factory and never got fired,
But satisfied his employers, Fudge Motors Inc.
Yet he wasn’t a scab or odd in his views,
For his Union reports that he paid his dues,
(Our report on his Union shows it was sound)
And our Social Psychology workers found
That he was popular with his mates and liked a drink.
The Press are convinced that he bought a paper every day
And that his reactions to advertisements were normal in every way.
Policies taken out in his name prove that he was fully insured,
And his Health-card shows he was once in hospital but left it cured.
Both Producers Research and High-Grade Living declare
He was fully sensible to the advantages of the Instalment Plan
And had everything necessary to the Modern Man,
A phonograph, a radio, a car and a frigidaire.
Our researchers into Public Opinion are content
That he held the proper opinions for the time of year;
When there was peace, he was for peace: when there was war, he went.
He was married and added five children to the population,
Which our Eugenist says was the right number for a parent of his generation.
And our teachers report that he never interfered with their education.
Was he free? Was he happy? The question is absurd:
Had anything been wrong, we should certainly have heard.
