Digital History - Hacking the ПАМЯТЬ НАРОДА website

On the nature of archives

One of the key points to remember when searching Pamyat Naroda is that it is an archive, not a library. (If you do not know the difference and can get to London, I suggest you attend the excellent Methods and Sources for Historical Research course at the Institute of Historical Research, University of London.) Basically, libraries arrange information hierarchically by subject, for example with the Dewey Decimal System, so it is easy to find. Archives have a very basic file structure and, within it, simply order documents by the date they were filed. So in the case of Pamyat Naroda, each individual document sits in a basic file structure of FOND (a folder with information about the same formation), OPUS (a sub-folder, sometimes with basic subject classifications), DELO (a group of documents) and LIST (which identifies the actual document). On the website each document also has a Document ID number, which is unique to that document. A document may be a single page, map sheet, scheme or table, or a complete book of several hundred pages. For instance, I found a copy of Разгром немцев под Москвой (оперативный обзор) ("Rout of the Germans before Moscow (Operational review)"), which runs to 236 pages and can be found in Fond 208 Opus 2511 Delo 225 List 1a to 170.
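The FOND / OPUS / DELO / LIST hierarchy maps naturally onto a small record type, which is handy once you start collecting references in your own notes. This is a minimal sketch assuming references are written in the abbreviated form used later in this post (f.208 op.2511 d.225 l.105); ArchiveRef and parse_ref are hypothetical names, not anything provided by the website.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches references written like "f.208 op.2511 d.225 l.105".
# The list part is optional and may carry a letter suffix such as "1a".
REF_PATTERN = re.compile(
    r"f\.\s*(?P<fond>\d+)\s+op\.\s*(?P<opus>\d+)\s+d\.\s*(?P<delo>\d+)"
    r"(?:\s+l\.\s*(?P<list_no>\d+[a-zа-я]?))?",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class ArchiveRef:
    fond: int
    opus: int
    delo: int
    list_no: Optional[str] = None  # kept as text because of suffixes like "1a"

    def __str__(self) -> str:
        base = f"f.{self.fond} op.{self.opus} d.{self.delo}"
        return f"{base} l.{self.list_no}" if self.list_no else base


def parse_ref(text: str) -> ArchiveRef:
    """Extract the first fond/opus/delo(/list) reference found in `text`."""
    m = REF_PATTERN.search(text)
    if m is None:
        raise ValueError(f"not an archive reference: {text!r}")
    return ArchiveRef(int(m["fond"]), int(m["opus"]), int(m["delo"]), m["list_no"])
```

For example, parse_ref("f.208 op.2511 d.1042 l.105") gives a reference with delo 1042 and list "105", and printing it reproduces the original string.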

Subject coverage

Behind the Pamyat Naroda website lies the Central Archive of the Ministry of Defence (TsAMO) and a long history of limited access for researchers. So there is bound to be an element of careful selection in what is released, both in the subject areas covered and in documents relating to sensitive events. David Glantz has already explored many of these "Forgotten Battles" and shone light on areas that had previously remained hidden, but no doubt much remains to be revealed. Whether Pamyat Naroda is a suitable vehicle for this remains to be seen. Yet there are surprises: for instance, Andrey Andreyevich Vlasov appears on the Heroes page, and there are some documents for the 2nd Shock Army, such as the "Map of the position of troops 2 Shock Army from May 29 to June 26", showing the last positions of the doomed army.

In part, however, the collection reflects the state of the archive, which holds less material for 1941 and 1942, owing to the confusion and the loss of territory and formations after 22 June 1941, than for the organised and successful years of 1944 and 1945. It would be interesting to run a study of the collection's coverage over time, geographically and on specific sensitive subjects. A few simple searches give a taste of what such a study might look like, for instance the number of records by year:

1941   -    309,183
1942   -    908,236
1943   -  1,000,816
1944   -  1,685,953
1945   -    988,402
total  -  4,990,695

So the cataclysmic events of 1941 represent 6 months out of 47 (13%) yet account for only 6% of the documents, while 1944, representing 25% of the period, has 34% of the documents.
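The percentages are easy to reproduce, and doing so shows one wrinkle: the five yearly counts add up to 4,892,590 rather than the quoted total of 4,990,695. The sketch below uses the quoted total; using the smaller sum instead moves 1941 to 6.3% and 1944 to 34.5%, so the comparison holds either way. The month counts are the approximation used in the text (about six months in 1941 from 22 June, five in 1945 to May).

```python
# Record counts by year from the Pamyat Naroda searches quoted above.
RECORDS = {1941: 309_183, 1942: 908_236, 1943: 1_000_816, 1944: 1_685_953, 1945: 988_402}
QUOTED_TOTAL = 4_990_695

# Approximate months of war in each year: about six in 1941 (from 22 June),
# five in 1945 (to May), 47 in all.
MONTHS = {1941: 6, 1942: 12, 1943: 12, 1944: 12, 1945: 5}


def shares(total: int) -> dict[int, tuple[float, float]]:
    """Return {year: (% of war months, % of documents)}."""
    war_months = sum(MONTHS.values())
    return {
        year: (100 * MONTHS[year] / war_months, 100 * count / total)
        for year, count in RECORDS.items()
    }


for year, (time_pct, doc_pct) in shares(QUOTED_TOTAL).items():
    print(f"{year}: {time_pct:4.1f}% of the war, {doc_pct:4.1f}% of the documents")
```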

Similarly, I did a quick survey of documents connected with the Tank Armies:

Fond  Unit                     All results   1942   1943   1944   1945
293   1 Tank Army (1GTA)                 0      0      0      0      0
299   1 Tank Army                   13,088     15  4,292  4,493  4,160
300   1 Tank Army                       26     26      0      0      0
307   2 Tank Army (2GTA)            13,038      2    901  7,940  3,923
316   3 Tank Army                    1,695    566  1,051     23      2
315   3 Guards Tank Army             8,616      4  1,327  3,451  3,690
422   4 Tank Army                    9,453  2,797  3,756  1,610  1,214
323   4 Guards Tank Army             4,010      1      4  1,790  2,130
331   5 Tank Army                    1,738  1,329    334      0      0
469   5 Tank Army                    1,643      5  1,626      1      1
332   5 Guards Tank Army             9,795      8  3,367  3,636  2,651
339   6 Tank Army (6GTA)             2,578      1  1,327  1,342  1,171

As you can see, in some cases one Fond covers several formations, such as f.307 for 2nd Tank Army and 2nd Guards Tank Army, while in other cases several Fonds cover just one formation, such as f.331 and f.469 for 5th Tank Army. There are also large gaps in the coverage. 2nd Tank Army was formed in January 1943; the two documents shown for 1942 are mis-dated and actually belong in 1944, while barely 900 documents cover 1943 compared with almost 8,000 for 1944. Similarly, for the important Stalingrad counter-offensive, 5th Tank Army shows just 568 documents for November-December 1942.
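Because one formation can be spread over several Fonds, counts need to be combined before formations are compared. A minimal sketch using a few rows from the survey above; totals_by_unit is a hypothetical helper that simply sums the results column for each unit label and records which Fonds contributed.

```python
from collections import defaultdict

# (fond, unit, total results) for a few rows of the Tank Army survey above.
ROWS = [
    (299, "1 Tank Army", 13_088),
    (300, "1 Tank Army", 26),
    (307, "2 Tank Army (2GTA)", 13_038),
    (331, "5 Tank Army", 1_738),
    (469, "5 Tank Army", 1_643),
]


def totals_by_unit(rows):
    """Sum result counts per unit label, remembering which fonds contributed."""
    totals = defaultdict(int)
    fonds = defaultdict(list)
    for fond, unit, count in rows:
        totals[unit] += count
        fonds[unit].append(fond)
    return {unit: (totals[unit], fonds[unit]) for unit in totals}
```

With these rows, 5 Tank Army comes to 3,381 results drawn from f.331 and f.469.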

By Mil.ru, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=65494630

Search Results

As I demonstrated in the last post, the results of searches on the Pamyat Naroda website have to be considered carefully, because searching in different sections and in different ways brings up overlapping results. As the Tank Army results show, searches have to be conducted over one or several Fonds, there are gaps in the coverage and the density of coverage is uneven. Documents are spread over spans of files, so Fronts have to be searched just as much as Armies or Divisions, bearing in mind that duplication occurs throughout, as one would expect when consulting an archive. To go back to the volume I found earlier, Разгром немцев под Москвой (оперативный обзор) ("Rout of the Germans before Moscow (Operational review)"): this is of interest because a book of a similar name was later published, and its author was none other than Boris Mikhailovich Shaposhnikov, Chief of the General Staff during the battle. Let's use it as an example of some of the problems of searching Pamyat Naroda with the website's own search engine by putting that title into the search box of Documents of Units. The results give us no fewer than nine versions of the same brochure, and the one I quoted earlier is only 29 pages long. That is because the rest of the document is spread over another 20 Delo references, each of which has its own listing and title. So the only way to find them is to put the reference f.208 op.2511 d.225 into the Advanced Search engine, which returns 21 results covering the whole brochure. The next brochure is only 44 pages long, and trying the same advanced search trick with f.208 op.2511 d.1043 does not bring up any connected documents, so the trick does not always work. The next result is a brochure from the Military Historical Department of the General Staff, with three authors, which runs to 158 pages, while the one after that is the "3rd Volume" from the Staff of the Western Front and is 244 pages long.

Since my original document was annotated "1st Volume" and was also by the Staff of the Western Front, this looks like another part of it. Moving swiftly on, the next result is another copy of the 1st Volume with 151 pages, but a closer examination reveals that it is missing the last few sections and parts of it are quite different. The next one is only 44 pages; however, a quick search of reference f.208 op.2511 d.1042 l.105 reveals an interesting document on the drafting of the "Rout of the Germans...". As you can see, this is searching through a proper archive, with documents broken up over several files, duplicates of the same thing, different versions of similar documents and supporting papers. Navigating this maze is not made any easier by the limited number of search results visible and their fairly basic descriptions, which force you to open every likely document.
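The advanced search trick amounts to grouping results by their file reference, so that pieces of one brochure split across many entries come back together. Given a list of results saved with their references, the same grouping can be done offline. A minimal sketch: the document IDs in the example are invented for illustration, and group_by_file is a hypothetical helper.

```python
import re
from collections import defaultdict

# Captures the fond/opus/delo part of a reference such as "f.208 op.2511 d.225 l.105".
FILE_KEY = re.compile(r"f\.(\d+)\s+op\.(\d+)\s+d\.(\d+)")


def group_by_file(results):
    """Group (document_id, reference) pairs by their (fond, opus, delo) triple."""
    groups = defaultdict(list)
    for doc_id, ref in results:
        m = FILE_KEY.search(ref)
        if m:
            groups[tuple(int(x) for x in m.groups())].append(doc_id)
    return dict(groups)


# Invented document IDs, real-looking references from the text.
EXAMPLE = [
    (101, "f.208 op.2511 d.225 l.1a"),
    (102, "f.208 op.2511 d.225 l.30"),
    (103, "f.208 op.2511 d.1043 l.1"),
]
```

Here the two d.225 entries fall into one group and the d.1043 entry stands alone, mirroring what the Advanced Search returned.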

Search Engineering

If you type a search phrase into any of the Pamyat Naroda search boxes (remembering to use your Cyrillic keyboard), it will return a result; for instance, typing the word Авто ("auto") produces 5,412 results. Wildcard characters make no difference: adding *, ? or quotation marks returns the same count. Unsurprisingly, the Advanced Search Settings do change it, although they are limited to a few options such as dates, unit name, combat operation, author and file reference. So they are not very helpful if you want to distinguish between автотранспорт ("motor transport", 3,075 results), автотранспорта (the genitive form, 3,083 results) and автомашина ("motor vehicle", 3,030 results). Results are sorted by relevance, with the exact search term at the top and variants lower down. In all cases 10 results are shown, with the option to move on to further pages or to increase the number of results shown on the page by pressing "More".

Effective searches

In order to devise effective search strategies, we need to determine what is happening here, bearing in mind that it is likely to be a complex algorithm. So taking a search term and reducing it by one letter each time gives us the following result:

автотранспорта - 3,083
автотранспорт - 3,075
автотранспо - 2,590
автотрансп - 292 (mainly автотрансп.)
автотранс - 1
автотран - 10
автотра - 41 (now finding abbreviations)
автотр - 195
автот - 1,495 (mainly автоп or атбр !)
авто - 5,412 (mainly авто or авто- or Автор)

From these results it seems plain that the engine treats the search term as "авто??", that is, the term followed by up to two further characters of any kind, sorted by relevance with the exact search term first.
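One way to make this hypothesis testable is to write it down as a regular expression and check it against the word forms above. This is only a model of the behaviour inferred from the counts, not the site's actual algorithm; fuzzy_pattern and matches are hypothetical names.

```python
import re


def fuzzy_pattern(term: str) -> re.Pattern:
    """Hypothesised behaviour: a whole word that starts with `term` and has
    at most two further letters or digits. Case-insensitive."""
    return re.compile(r"(?<!\w)" + re.escape(term) + r"\w{0,2}(?!\w)", re.IGNORECASE)


def matches(term: str, text: str) -> bool:
    """True if any word in `text` fits the hypothesised pattern for `term`."""
    return fuzzy_pattern(term).search(text) is not None
```

Under this model "автотранспо" catches автотранспорт but not автотранспорта, and "авто" catches Автор and авто-, which fits the drop in counts as letters are removed.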

Tags

Yet the search boxes also work in a different way, using tags. Type "2 ТА" into the Military Units search box and it will come up with all the formations of the 2nd Tank Army. It does this by finding a tag attached to every document that identifies its unit, front, commander and combat operations, and it is this that powers the search boxes in the Heroes and Combat Operations sections as well. Doing the same search in Documents of Units brings up results similar to those shown above, i.e. matching the keyword, but it also brings up an additional set of documents that carry the tag. Many of these have no reference to the keyword in their title, author or other fields, so this is a powerful way of finding otherwise hidden documents. It works particularly well with higher formations such as Fronts and Armies and, interestingly, Mechanised Corps.
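The effect of combining keyword and tag matching can be sketched with a toy data model: a search returns the union of documents whose title contains the query and documents tagged with it. The Document class, the search function and the sample titles are all hypothetical; the site's internal data structures are not public.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: int
    title: str
    # Hypothetical tags identifying unit, front, commander, combat operation...
    tags: set[str] = field(default_factory=set)


def search(docs: list[Document], query: str, use_tags: bool = True) -> list[int]:
    """Keyword matches on the title, plus (optionally) every document tagged with the query."""
    q = query.lower()
    hits = {d.doc_id for d in docs if q in d.title.lower()}
    if use_tags:
        # Tag matches are found even when the title never mentions the query.
        hits |= {d.doc_id for d in docs if q in {t.lower() for t in d.tags}}
    return sorted(hits)


docs = [
    Document(1, "Боевое донесение штаба 2 ТА", {"2 ТА"}),  # combat report, 2 TA HQ
    Document(2, "Оперативная сводка", {"2 ТА"}),            # operational summary: tag only
    Document(3, "Приказ по 3 ТА", {"3 ТА"}),                # order for 3 TA
]
```

Searching for "2 ТА" returns documents 1 and 2; without tags, document 2 stays hidden.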


Alternative Search Resources

We are fortunate that in modern-day Russia there is widespread interest in the period of the Soviet-German War, from large numbers of enthusiasts who search the forests for the bodies of fallen soldiers, recover drowned tanks from rivers or search the archives for the fate of relatives. Their efforts aid the historian because they provide a number of 'tools' that make searching Pamyat Naroda much easier. One of the most useful is this listing of the Fond numbers for the majority of the TsAMO archive, which can be found at: http://www.teatrskazka.com/Raznoe/Fondy_TsAMO/Fonds_PamyatNaroda.html. The Fond number can be entered in the Documents of Units advanced search box and is a useful way of narrowing a large number of search results, say for a particular operation, down to a specific unit. There are other tables and lists on this website which may prove less useful; for instance, a list of Fronts and their constituent Armies is a quick way of identifying connections, but its links were not changed when Pamyat Naroda was updated and so no longer work. On the other hand, this list of units' casualty reports still works well with the "Memorial" site.

An alternative search engine

The main search engine of Pamyat Naroda has its uses, yet it has a major limitation: it displays only ten search results unless you go through a lot of button clicking and navigation. What if you could see thousands of results at the same time and then download them into a spreadsheet for further sorting and grading? What if those results were exact matches that could be refined with standard Boolean search terms and combined with all those advanced search fields too? Well, you can, by using the search engine available through GitHub. I have included a screenshot of the engine with its labels translated into English below, and underneath that the instructions given on the "i" button. Do not be fooled by its simple looks: it will transform your searches from vague, overlapping lists to precise, focused ones, although you will still have to enter the searches in Cyrillic. The results can be translated into English in Chrome by right-clicking to bring up the context menu and selecting Translate to English; however, the downloaded lists will always be in Russian. To give an illustration, a straightforward search for the term "автотранспорта" produces 2,169 results, 914 fewer than the 3,083 returned by the Pamyat Naroda search engine, which gives an indication of the amount of 'bloat' resulting from the fuzzy ending and the lack of Boolean terms. The results are shown 50 per page, but changing the Number of Results can show every result on one page, although for large numbers of results this can slow your computer down quite a bit. Each column can be sorted, and the results can be ordered by Document ID, Date or Relevance, while the drop-down filters the results by document type. The results show the Document ID, full document reference, Title, Author and Date, while the downloaded results include further fields such as the web addresses of the images.
Furthermore, you can open each document in a new tab simply by clicking its Document ID. Geo-referenced maps are marked with the letter 'M' and open on top of a Yandex map showing precisely their location, and tools on the map allow distances to be measured from selected points. Selecting the ДОУ button shows all results, while the ЖБД button shows only the war diaries (журналы боевых действий).
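Once downloaded, the lists lend themselves to further sorting and grading in a script as well as in a spreadsheet. A minimal sketch, assuming the sheet has been saved as CSV with columns named id, title and date (ISO format); these column names and the sample rows are hypothetical, since the real headers will be in Russian. It applies simple Boolean AND / NOT filtering to the titles and then sorts by date.

```python
import csv
import io


def filter_results(csv_text, must=(), must_not=()):
    """Keep rows whose title contains every term in `must` and none in `must_not`,
    sorted by date and then by document ID."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        title = row["title"].lower()
        if all(t.lower() in title for t in must) and not any(
            t.lower() in title for t in must_not
        ):
            rows.append(row)
    return sorted(rows, key=lambda r: (r["date"], int(r["id"])))


# Invented sample rows in the assumed CSV layout.
SAMPLE = """id,title,date
3,Доклад о состоянии автотранспорта,1942-03-01
1,Сводка автотранспорта армии,1941-12-10
2,Ремонт танков,1942-01-05
"""
```

For example, filtering SAMPLE for "автотранспорта" while excluding "доклад" leaves only document 1.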

Instructions for using the search engine (from the "i" button):

When searching the "Document title" and "Document author" fields, use the wildcards ? and *, for example "weather?", "guerrilla *", "captivity *". To exclude a word or phrase, put a minus sign (-) before it, for example "Govorov - (Lieutenant Colonel Govorov)". To find an exact phrase, enclose it in quotation marks, for example "5 a".

If you enter latitude and longitude in the Coordinates field in the format 55.5 37.5, the engine will find maps that cover that geographic point. Unfortunately, there are very few geographical maps in the database.

Search of the files of the "OMD Memorial of the People" website is available at https://vnr.github.io/obd-search/

They can also be downloaded as Excel spreadsheets.

If the search stops working, please write to venireman@yandex.ru
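The instructions above do not spell out exactly what ? and * stand for. Assuming the usual convention (? is one letter, * is any run of letters) and that the minus sign simply drops titles containing the excluded phrase, the title-search rules can be modelled as follows. This is a sketch under those assumptions, not the tool's own code; wildcard_to_regex and title_matches are hypothetical names.

```python
import re
from typing import Optional


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern, assuming ? = one letter and * = any run of letters."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(r"\w")
        elif ch == "*":
            parts.append(r"\w*")
        else:
            parts.append(re.escape(ch))
    # Anchor at word boundaries so "погод?" does not match inside longer words.
    return re.compile(r"(?<!\w)" + "".join(parts) + r"(?!\w)", re.IGNORECASE)


def title_matches(title: str, pattern: str, exclude: Optional[str] = None) -> bool:
    """True if the title matches the pattern and does not contain the excluded phrase."""
    if exclude is not None and exclude.lower() in title.lower():
        return False
    return wildcard_to_regex(pattern).search(title) is not None
```

Under these assumptions "погод?" finds "Сводка погоды" but not "Погодные условия", and excluding "подполковника Говорова" removes the lieutenant colonel from a search for "Говоров*".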