Kaikki joukolla vanhoja käsialoja tulkitsemaan

Kirj. Petteri Impola | 6.6.2024

Kyky lukea vanhoja käsialoja on historiantutkijan ja lähitieteiden edustajien perustaito, mikäli tutkimuskohteena on 1900-luvun alku tai sitä vanhemmat vuosisadat. Paleografia on tekstuaalitieteiden tutkimusala, joka tutkii kirjoitusmerkkien, kirjoitusvälineiden ja ennen muuta kirjoittamisen kehitystä. Ymmärrys kirjainten ja käsialojen kehitysvaiheista auttaa myös niiden ajoittamisessa ja tulkitsemisessa. Esimerkiksi varhaismodernin ajan Ruotsissa vallitseva käsialatyyli oli saksalainen eli uusgoottilainen tyyli, jonka yleisilme vaihteli vuosisadoittain. Se poikkesi läntisen ja eteläisen Euroopan latinalaisesta kirjoitustyylistä, joka tuli Suomessa vallitsevaksi, ja sittemmin nykylukijalle tutuksi, tyyliksi vasta 1800-luvun aikana.

Perusopintojen aikaan tutustuin vanhojen käsialojen tulkinnan perusteisiin muun muassa 1600-luvun tuomiokirjoja sekä 1800-luvun kirkonkirjoja sisältäneiden harjoitusten myötä. Tehtävät olivat varsin mekaanisia, mutta viime kädessä vanhojen aineistojen tulkinnan opetteleminen vaatii kärsivällisyyttä ja istumalihaksia. Kandidaatintutkielmaa varten sain tulkittua muutamia oikeudenkäyntipöytäkirjojen tapauksia, graduun useita kymmeniä. Tuolloin oma työskentely tuntui kovin hitaalta – mitä se varmasti olikin – mutta toisaalta taidot olivat muutaman vuoden tasaisen uurastamisen myötä huomattavat verrattuna alkupisteeseen.

Sittemmin olen ollut kehittämässä vanhojen käsialojen opetusta niin Jyväskylän yliopiston, avoimen yliopiston kuin kansalaisopistojen tunneilla. Lisäksi olen muutamana vuotena pyörittänyt Agricola – Suomen humanistiverkon kautta valtakunnallista paleografia-verkkokurssia. Vanhojen käsialojen tulkitsemisen opetuksessa olen korostanut erityisesti sitä, että perustaidot ja ymmärrys niiden itsenäisen kehittämisen potentiaalista on riittävää. Osa itsekriittisistä opiskelijoista on kurssien päätyttyä hieman harmissaan, kun eivät saa laajojen harjoitustehtävien tulkintojaan sataprosenttisella kirjaintarkkuudella oikein. Yritänkin painottaa, että aihepiiri on niin haastava, että jokainen voi olla tyytyväinen, kunhan saa aineistoista tulkittua suurimman osan ja eritoten ymmärrettyä olennaisen. Yhden kurssin jälkeen kukaan ei ole täydellisen valmis uuden ajan alun lähteiden tulkitsija, mutta ilokseni useimmat opiskelijat ovat siinä hämmästyttävän taitavia, taidollisen loppusilauksen syntyessä vuosien harjaantumisen myötä.

Kursseilla aloitan tehtävät helpommista 1800- ja 1900-luvun taitteen aineistoista. Tehtävien edetessä varhaisempiin vuosisatoihin olen lopulta viimeisinä opetuskertoina luetuttanut 1500-luvun käsialaa. Siihen voi opettajan kanssa sukeltaa yhdessä turvallisesti ja vaikka kaikki Kustaa Vaasan ajan aineistot eivät parin kokeilukerran jälkeen selkeytyisi täydellisesti, ovat opiskelijat syväsukelluksen jälkeen huomanneet, kuinka helppoa on palata kurssin alussa mahdottomilta tuntuneisiin 1700- ja 1800-luvun aineistoihin. Koen tärkeänä, että jokainen opiskelija saisi onnistumisen kokemuksia. Tässä on kuitenkin suuria eroja: osalle 1600-luvun tuomiokirja voi olla luontaisesti helpompaa luettavaa kuin 1800-luvun kirkonkirjat. Siksi harjoituksia on hyvä olla eri vuosisadoilta eri käsien kirjoittamina erilaisista aineistotyypeistä.

Kuten muillakin kursseilla, on paleografian opetuksessa kyse teoreettisen perustan sisäistämisestä ja rajatusta määrästä harjoitteita. Todelliseksi ammattilaiseksi tullaan vasta toistojen myötä, esimerkiksi pro gradu -tutkielmaa tai väitöskirjaa työstettäessä. Kohtalaisen kokeneena vanhojen aineistojen lukijana opin itsekin yhä uutta. Lähes jokaisella tuomiokirja-aukeamalla on ilmaus tai pari, jotka aiheuttavat tulkintavaikeuksia joko käsialan epäselvyyden takia tai jos sanat eivät ole merkitykseltään entuudestaan tuttuja. Oikeastaan tutkijan arjessa näiden pähkinöiden selvittäminen on virkistävää, vaikka välillä tutkimuksen kokonaisuuden kannalta toissijaisten sanojen tulkinta imee turhankin aikaa vievästi mukaansa. Mutta kyse on nimenomaan kumuloituvasta oppimisesta, tietotaitoa ja samalla ymmärrystä tutkimuskohteesta karttuu huomaamatta lisää, kun ottaa pähkinöitä purtavakseen.

Tulevina vuosina opiskelijoiden valmiuksia lukea uuden ajan alun ja keskiajan käsinkirjoitettuja lähteitä tulee haastamaan sekin, että kaunokirjoituksen pakollinen opetus loppui peruskouluissa vuonna 2016 – vaikka sitä yhä voi osassa kouluja opetella valinnaisaineena. Käsialakiemuroiden lisäksi varhaismodernin Suomen ja Ruotsin lähteissä haasteen asettaa ruotsin kieli, vanhahtava sellainen. Osa opiskelijoista lyö jo pelkän ruotsin takia neliraajajarrutuksen päälle: ”ei pysty”. Tätä kovin reaktiivista hidastetta olen opetuksessa purkanut siten, että ensiksi harjoitellaan tulkitsemaan teksteistä erisnimiä, jotka eivät vaadi kielipäätä. Tämän jälkeen on edetty etsimään teksteistä ruotsin kielen yleisimpiä sanoja, kuten är, efter, det, men, april, han/hon, som, jotka varmasti jokainen tuntee. Tässä vaiheessa yleensä jo huomataan, että erisnimien ja perussanaston auettua puuttuu virkkeistä enää vain muutamia erityisempiä sanoja, jotka nekin lopulta avautuvat helpommista sanoista löytyviin kirjaimiin vertaamalla. Ylipäätään vanhojen asiakirjojen sanasto on suppeaa: kuten kirjaimet koukeroineen, alkaa sanastokin hyvin pian toistaa itseään. Tavoitteena onkin siirtyä heti kun mahdollista tarkastelemaan yksittäisten kirjainten sijaan sanoja ja niiden muodostamia loogisia kokonaisuuksia.

Olen yksi Vanhojen käsialojen lukuoppaan (Gaudeamus 2021/2023) toimittajista. Teos sisältää vinkkejä vanhojen käsialojen tulkintaan harjoitteineen ja painottaa erilaisten aineistojen rakennetta ja sisältöä lähdekriittisesti: vanhan aineiston tulkinta on helpompaa, kun jo etukäteen ymmärtää, mitä lähde todennäköisimmin sisältää. Olenkin huomannut, että mitä enemmän jaksaa nähdä vaivaa sen eteen, että harjoitusaineistot sisältäisivät kontekstiltaan mielenkiintoista juttuja, sitä innostuneempia ja motivoituneempia opiskelijat ovat syventymään ja käyttämään aikaresurssejaan vaikean taidon opettelemiseen. Esimerkiksi tuomiokirjan äärellä jaksaa istua omaa tulkintaa nysväämässä, jos lähtötietona tietää sen sisältävän 1600-luvun ihmiskohtaloita esimerkiksi salavuoteusoikeudenkäynnistä. Omana opiskeluaikanani kirjaimia tulkittiin välillä varsin mekaanisesti kirjainten ja yksittäisten sanojen itsensä takia, ei niinkään mielekkään tapauksen ja sen kontekstin avaamiseksi. Omien kurssieni päätteeksi olenkin edennyt tehtäviin, joissa en enää toivo kirjaintarkan transkription kirjoittamista, vaan pyydän vastauksia sisältökysymyksiin: tekstiä pitää osata lukea, mutta nimenomaan löytäen olennaisimmat asiat. Tällöin yksittäiset sanat eivät enää jumiuta tulkintaa, mikäli aineistoa pystyy muutoin lukemaan riittävällä tarkkuudella.

Väitän, että vanhojen käsialojen tulkitsemisen ja laajemmin paleografian taitoja tullaan tarvitsemaan jatkossakin. Mutta miksi, kun hiljalleen esimerkiksi Kansallisarkistonkin aineistoihin laajenee automaattinen tekstintunnistus ja koneluenta Transkribus-yhteistyön myötä? Olenko opettajana ja tutkijana neliraajajarrutuksessa vastustamassa vääjäämätöntä kehitystä? Mielestäni en, koska pidän vanhojen tekstien tulkinnan taitoa historia-alan ammattilaisen perustaitona, josta tulee tuntea ammattiylpeyttä. Kaikkien miljoonien ihmisten, jotka hyödyntävät tekoälyä ei tarvitse tuntea syvällisesti sen toimintaperiaatteita algoritmeineen ja laskentatehoineen. Samoin kattolamppua käyttääkseen ei tarvitse olla sähköasentaja. Sen sijaan tekoälyohjelmoijan olisi syytä osata koodata, samoin sähköasentajan olla ammattiinsa kouluttautunut. Vastaavasti koen, että vaikka suuren historiasta kiinnostuneen massan kannattaa lähestyä menneisyyttä kustannustehokkaasti ja saavutettavasti automaattisen tekstintunnistuksen avulla tuntematta käsialoja sen tarkemmin, on historian ammattilaisten kuitenkin syytä kouluttautua paleografian perustaitoihin ja harjoittaa itseään vanhojen aineistojen itsenäisessä tulkinnassa.

Tekoälyn hyödyistä mutta myös sen virheistä ja vaaroista käytävä keskustelu on vasta alullaan. Historia-alalla juuri akateemiset tutkijat ja arkistoalan ammattilaiset toimivat portinvartijoina, mikäli vanhojen aineistojen koneellisessa tulkinnassa syntyy ongelmia. Jos ammatti-ihminen ei osaa tarvittaessa korjata koneen virheitä, niin kuka sitten. Vain inhimillisen pääoman myötä valta pysyy viime kädessä ihmisellä, ei koneella. On ilmiselvää, että tutkijoiden kannattaa hyödyntää koneluentaa, koska niin voidaan käydä tehokkaasti läpi isoja aineistomassoja. Silti vähänkään laadullisemmat tutkimusaiheet vaativat myös aineistoyksityiskohtien lähempää tarkastelua, jolloin on syytä pystyä silmäilemään myös alkuperäisaineistoja. Sanatarkat ilmaisut on tulisi aina tarkastaa alkuperäislähteestä koneluennan tuottamien mahdollisten virhetulkintojen välttämiseksi.

Olen parhaillaan tutkijana Ruotsin 1600-luvun kaupunginkirjureita ja heidän laatimiaan tuomiokirjoja analysoivassa hankkeessa. Tutkimuksessa hyödynnetään myös koneluentaa, mutta koska hankkeemme on kiinnostunut nimenomaan kaupunginkirjurien erilaisista käsialoista ja persoonallisesta kielenkäytöstä luemme isojakin aineistomassoja lävitse omin silmin. Koneen tuottama transkriptio ei nimittäin paljasta esimerkiksi niitä kohtia, joissa käsiala vaihtuu yllättäen kesken tuomiokirjaniteen. Käytännössä lähes kaikissa Ruotsin kaupungeissa on tuomiokirjoja laatineita (apu)käsiä ollut enemmän kuin virassa olleita kaupunginkirjureita. Koneluento ei myöskään osoita kirjurin tekemiä pieniä mutta olennaisia merkintöjä, esimerkiksi hänen korostettuaan tiettyä sanaa muuta teksti paksummilla kirjaimilla, alleviivauksin tai koristeellisemmin sulkakynän vedoin. Sen sijaan tällaiset tehokeinot on helppo havaita ihmissilmällä.

Hovioikeus on merkinnyt alleviivauksin huomioita Kajaanin raastuvanoikeuden tuomiokirjaan vuodelta 1686. Esimerkiksi kaupunginkirjurin toisteinen kirjoitusvirhe ”pitäjänpitäjänkirjuri” (*Sochne Sochne Skrifwaren*) on alleviivattu. Samoin alempana kirkkoherra Jakob Teudschoviuksen sukunimen omituinen kirjoitusasu (*Teuscovi*) on huomioitu. Kuva: Kansallisarkisto.

Kokemukseni perusteella alkuperäisaineistojen lukeminen omin silmin ja tulkinnoin on ylipäätään hyvä tapa syventyä tutkittavaan aikakauteen, käsinkosketeltavalla ja inhimillisellä tasolla. Koska historiatiede tutkii menneisyyttä viime kädessä tekstien avulla, on ainakin joskus hyvä vilkaista alkuperäisaineistojen ilmettä ja miettiä niitä ihmisiä, jotka ovat laatineet aineiston satoja vuosia aiemmin persoonallisella tyylillään. Työergonomisesti järkevän etä- ja koneluennan vaarana on se, että tutkija etääntyy menneisyyden artefakteista, joten ainakin välillä on hyvä harjoittaa myös aineiston itsenäistä lähiluentaa. Siksi vanhojen käsialojen tulkinta on perustaito, joka on hyvä löytyä takataskusta vähintään tarvittaessa. Kannustan siis jokaista varhaismodernista ajasta kiinnostunutta uusien teknologioiden suomien mahdollisuuksien ohessa opettelemaan ja aika ajoin ylläpitämään henkilökohtaisia paleografian taitoja. Se tekee tutkimuksesta myös inhimillisempää!

FM Petteri Impola on väitöskirja- ja projektitutkija Jyväskylän yliopiston historian ja etnologian laitoksella. Hän tutkii parhaillaan kaupunginkirjureita 1600-luvun Ruotsissa Koneen Säätiön rahoittamassa hankkeessa Kielen asiantuntijat, menneisyyden tallentajat: Kaupunginkirjureiden ammattikunta, toimijuus ja kielenkäyttö Ruotsin valtakunnassa varhaismodernina aikana 1614–1714 (Kaski).

New 17th-Century Swedish-Recognizing HTR Model Published

Kirj. Ville-Pekka Kääriäinen | 12.6.2023

Handwritten Text Recognition (HTR) is an AI-based technology that allows historical handwritten sources to be converted into machine-readable and digital format. This technology is fairly new, with most development occurring during the EU-funded READ (Recognition and Enrichment of Archival Documents) project from 2016 to 2019. Unlike Optical Character Recognition (OCR) for printed text, HTR requires the user to first teach the machine to read old handwriting before it can make interpretations. The work is done through a program called Transkribus, which generally does not read all handwriting; separate models must be created for different handwriting types. Even though models can be shared with the public, one of the biggest challenges is getting users to work together and share their models with each other.

It gives me great pleasure to share that I have now published my own HTR model that is adept at recognizing 17th-century Swedish, which is now available for use in the Transkribus program. This is a significant contribution to early modern historians, as the model is specifically designed to read the so-called ’New Gothic’ script of the 17th century, also known as the ’German script’. The model, named ’Swedish 17th Century (Savo, Eastern Finland)’, can be accessed by all interested researchers in the public models section of the program. At the moment, Transkribus provides 500 free credits to new users of the program, in addition to which, researchers working on their master’s or doctoral thesis can apply for free processing through the developers.

In HTR technology, the model’s ability to understand handwriting is dependent upon the training it has undergone. My model includes training data from various document collections held in the National Archives of both Finland (Kansallisarkisto) and Sweden (Riksarkivet). The majority of the training material consists of court records. As a whole, the model incorporates:

• District court (Finnish: kihlakunnanoikeus, Swedish: häradsrätten) records related to the Iisalmi parish from 1639–1699 (jurisdictions of Savolax 1639–1650, Kajana friherreskap 1651–1680, Lilla Savolax 1681–1699)

• Mentions of Iisalmi parish residents in the lagmansrätt (Finnish: laamanninoikeus) records from 1643–1699 (Legal district of Karjala, Karelska lagsagan)

• Letters sent to Count Per Brahe by local officials, clergy, and townspeople (Skokolstersamlingen, Rydboholmssamlingen)

• The complaints of the common people (Finnish: rahvaanvalitukset, Swedish: allmogens besvär) from the jurisdiction of Lilla-Savolax (Finnish: Pien-Savo)

Example image of court records. Iisalmi parish district court 1688. National Archives, jurisdiction of Lilla Savolax.

How and Why Was the Model Created?

The idea of developing my own HTR model was born when I was working on the READ project at the National Archives in 2019, where automatic recognition of handwritten text was being used on 19th century court records. I joined the project at a point when significant results were already being achieved with the launch of the HTR+ engine. I was naturally impressed with how well it could read 19th-century handwriting, and I wanted to test it on my own material, although I knew that the 17th-century handwriting would likely be more challenging for the AI to interpret and the quality of the existing digitized images was during that time substantially poorer.

Simultaneously, as I began working at the National Archives, I also embarked on postgraduate studies at the University of Helsinki. In my dissertation, I’m examining the progression of state formation in the parish of Iisalmi from 1639–1699. My primary source material is the 17th-century district court records. I had already used court records in my master’s thesis, during which I faced challenges interpreting the handwriting. Given the gap between my thesis and the start of my doctoral work, I became concerned about how I would interpret my source material at the required level. For these reasons, I thought that technology could aid in understanding old handwriting and perhaps even facilitate the task. As everyone who uses digital research methods knows, one can hardly speak of ”saving time”. However, I do not regret the hours spent with Transkribus, as painstaking transcription of each word forced me to learn 17th-century handwriting. I can recommend the use of Transkribus to other researchers for learning to read any old handwriting.

Developing my now-published HTR model required a significant personal commitment. While working at the National Archives, I started to create the model as an ”evening job” and could devote myself to it full-time in the fall of 2020 when I received funding to become a full-time doctoral researcher. The majority of the material in the model was created between 2020–2021, and I have not significantly updated it since my main work with the source material has ended. Originally, I had thought to publish the model only in conjunction with my dissertation, but I decided to publish it now when the final funding for my dissertation was secured from the SLS Foundation.

How Has the Model Developed and What Are Its Limitations?

I started developing the model from 1680s court records, as the handwriting of the scribe during this period was familiar to me. At that time, there were no other models available (currently, the public models also include a model specialized for the 17th-century court records of Jämtland province, Sweden), necessitating manual transcription of the records. The first model was based on 12 double-pages of court records and yielded a promising Character Error Rate (CER) of 13 %, meaning it could correctly recognize 87 out of 100 characters. Initial test were encouraging, especially when the model was tested on material from the same scribe. Despite its shortcomings, it was quicker to correct the model’s errors than to transcribe everything manually. However, the model’s performance dropped noticeably when confronted with another scribe’s slightly different handwriting style.

Today, my model demonstrates a competent ability to read random 17th-century Swedish ’New Gothic’ script. The CER for the validation set stands at 3.8 %, which is concerned to be a good result. However, the more I have worked with Transkribus, the less I believe that CER alone is a reliable indicator of a model’s quality. This is because the validation set would need to be more extensive and include unfamiliar handwriting styles. Only through testing can one truly determine how well the model is able to interpret different handwriting styles. From a model development perspective, I would appreciate it if the individuals who have tested the model could also share their experiences regarding the types of data it performs well with, as well as the types of data that present challenges.

Given my geographical focus, which is primarily the parish of Iisalmi with some inclusions from other areas, the model’s ability to read proper names (both personal and place names) is somewhat limited. HTR models that utilize machine learning risk overfitting if the training material does not adequately represent the material under study. If the primary goal of my doctoral project had been to create a model that could read 17th-century handwriting as effectively as possible, I would have selected material from different scribes and geographical areas. Now the artificial intelligence has mainly been taught a very limited number of different surnames and place names. Because these occur so frequently in the training material, it is very likely that a researcher whose material reads Leppänen and Perttunen will get recognition results where these names have been interpreted as Lappalainen and Partanen, as these are the most common surnames in Upper Savo (Ylä-Savo).

Additionally, the distribution of training material over time is uneven, since there is more training material from the end of the 17th century. This bias is due to the increased frequency and duration of court sessions during this period, with the length of court records surging dramatically from a single double-page record in 1639 to as many as 38 double-pages in 1693. Similar biases are present in other material, mostly originating from the end of the 17th century. For instance, more complaints from the common people have survived from the latter half of the century. Conversely, the Count Per Brahe letter collection from 1650–1680 offers a slight counterbalance.

When preparing training material, the researcher should be very consistent in how they transcribe historical sources. For machine learning, it would be best to operate systematically and in the same way in every situation. This is somewhat more achievable for individual researchers, while group-based researchers must communicate extensively about transcription rules. However, it is crucial even for individual researchers to document their process, a step unfortunately overlooked in this model’s development. This has likely led to some inconsistencies in the model that would require work to correct. Some of them relate to my personal learning process, I myself did not know the material well enough at first to define perfect practices, although fortunately working in the National Archives project gave me some examples to follow. One example of inconsistency is marking the letter ”k” in many places with a capital and sometimes with a lowercase, as I could not perceive the difference between a capital and a lowercase ”k”.

Similar mistakes can be also found. For several scribes, certain letters also look the same, and I probably did not always know to systematically decide whether to mark a letter as ”a” or ”e”. The genitives of the Swedish language, which are marked as underlines, are sometimes ”s”, sometimes ”z”, and sometimes left unmarked. In general, I have favored efficiency over precision in transcribing, so there would probably be a lot to correct in the training material, especially now that I can read the handwriting better.

In general, I have followed the principle that the transcription should remain faithful to the original source. Consequently, I have not expanded abbreviations. For commonly occurring abbreviations, I have maintained a specific way of noting them down. For instance, the currency thaler (Swedish: daler) is transcribed as either ”D” or ”D:r”, and the mark as either ”m” or ”m:r”, depending on the scribe. My goal has been to convey the scribe’s intent, while respecting the variations in spelling for the same words. In cases of uncertainty, I have used the unclear tag provided in Transkribus, and these lines were excluded from the training material when creating the model.

Although I cannot say with certainty, I speculate that the diversity in image quality of the training material has influenced my model’s recognition capabilities. The model’s development primarily took place before the National Archives of Finland initiated the redigitization of the 17th-century court records. Consequently, I had to rely on digitizations made from microfilms created in the 1950s. The quality of these black-and-white images ranged from excellent to extremely poor, with many images requiring processing with photo editing software before being input into the program. I also photographed parts of the district court records in the National Archives of Finland myself, with special permission. The material sourced from the Swedish Riksarkivet also varied in quality. Some pages were captured with a mobile phone camera, others digitized from microfilm, and yet others were high-quality digitizations from the original source. As a result, the model’s performance might improve if the task was carried out on a more uniform and higher-quality set of images.

Example image showing how the quality of microfilm digitization can sometimes be poor. Unedited and edited images side by side. With image editing, the quality and recognition accuracy improves significantly.

What are the Advantages of the HTR Model and the Future Prospects?

The HTR processing and model creation constituted a relatively small segment of my ongoing doctoral research. After the model’s construction, I have established a research database from the material, wherein each entry represents an individual court case. The creation of this research database might have been possible without transcriptions, but it might have remained unfinished. In the database, I include, for instance, legal history data of all individuals cited in court, which, to my knowledge, is a fairly unique approach to 17th-century court records.

During my master’s studies, I did not participate in any specific methodological courses, and Transkribus served as a springboard for my exploration of other digital methods applicable to 17th-century court records. Having sources in a digital and structured format enables the use of different methodologies. Besides traditional descriptive tables and charts, I have, for example, incorporated network analysis into my data. I have also developed an interest in mapping data using the QGIS program and experimenting with spatial analysis of land disputes reflected in court records.

Map. The neglected challenges (failure to appear in the court).

It is worth emphasizing that HTR introduces many new opportunities for a researcher primarily focused on qualitative research. Although I have reviewed the court records twice, identifying individual cases and case histories pertinent to my research themes without the database and transcriptions would have been a challenge.

After completing work on Iisalmi’s court records, I have sporadically transcribed new teaching material. Typically, I use the model to speed up information retrieval, thus I run my model and use the transcription for data extraction rather than correcting mistakes made by the model, which does not generate new teaching material. Nevertheless, my future intentions still include model development. At this point, it also seems logical to evolve the model in collaboration with other researchers.

Being a board member of the newly established association, ‘Network for Early Modern (ca 16th–17th century) Research in Finland’ (Uuden ajan alun tutkimuksen verkosto ry), I share a mutual interest with other board members in developing HTR models for earlier periods. Specifically, our goal is to develop a better model focused on the recognition of 17th-century Swedish language. We have initiated preliminary cooperation with the National Archives of Finland towards this end, and releasing my model publicly is the first step in this collaboration. Our objective is to create a temporally and geographically diverse model capable of interpreting 17th-century handwriting from a variety of source material as accurately as possible. One of the substantial challenges of HTR processing is generating training material. If you are interested in jointly developing an improved model, sharing your own data, or participating in source transcription, please contact:
ville-pekka.kaariainen@helsinki.fi or the association’s email address directly at toimikunta@uudenajanalku.fi.

Acknowledgements

Firstly, my gratitude goes to all the developers and maintainers of Transkribus. I have been operating during a period when the program was still free, and I also received complimentary credits from the program developers as part of the Scholarship Programme to process my material. I extend my thanks to the National Archives of Finland, especially to Maria Kallio who led the READ project, as well as other people who worked in the team. Secondly, I express my gratitude to my supervisor Anu Lahtinen, who introduced me to the world of old handwriting. Lastly, I would like to acknowledge all the funders of my doctoral dissertation, including the OLVI Foundation, the Finnish Cultural Foundation, the Emil Aaltonen Foundation, and the Svenska litteratursällskapet i Finland (SLS Foundation).

Ville-Pekka Kääriäinen

The author is an MA and currently a PhD researcher at the University of Helsinki. The doctoral research focuses on the progress of state formation in the parish of Iisalmi (Upper Savo) during the years 1639–1699.

Cover Image:

Screenshot from the Transkribus Lite. Winter court of Iisalmi parish 1652. National Archives, jurisdiction of Kajana Friherreskap. Person names are tagged in green and place names in violet. Errors in the transcription have been corrected.

Links:

The description details of the model in the Transkript’s public collection of models. Retrieved June 6, 2023, from here

Sources:

Kääriäinen, Ville-Pekka (2021). Käsin kirjoitetun tekstin automaattinen tunnistus (Automatic Handwritten Text Recognition) in Vanhojen käsialojen lukuopas (Old Handwriting Manual), edited by Petteri Impola, Pirita Frigren, Petri Karonen, Matti Roitto, Antti Räihä, & Jari Eilola. Helsinki: Gaudeamus.