Catholic alphabet

What follows are the notes taken while writing the Unicode proposal L2/26-117. The proposed letter will hopefully be included in Unicode 19, in 2027. The proposal was ultimately trimmed down for convenience; the full details about the alphabet as a whole remain here.

Historical background

What we proceed to call the “Albanian Catholic alphabet” is the Latin-based alphabet used in Albanian literature from the earliest work of literature in the language (1555) till the onset of the 20th century. It does not constitute a standardised and regular orthographic scheme, but rather a family of orthographies sharing a varyingly Italianate choice of digraphs and four special characters for the sounds dh ~ th /ð/ ~ /θ/, ll /ɫ/, y /y/ and z /z/.

All works written with these symbols were written by Catholics, and almost all of them were written for religious purposes, hence the label we give (Skendi 1960: 264f.). All of the books written in the alphabet and published in the 17th–18th centuries were printed in Italy, most commonly by Propaganda Fide in Rome, while in the second half of the 19th century a good number of works using these symbols began to be printed by the Italian missionaries in Shkodër and by Catholic Albanians of the city, which is why, within that context, the alphabet is also referred to as “Old Shkodran” (shkodranishte e vjetër, see Rrota 1936: 92, Çabej 1968: 50). The alphabet fell out of use gradually after the creation of the Bashkimi alphabet in 1895, which got rid of these symbols in favour of a purely Latin orthography that was much cheaper to print.

Extant Albanian literature, putting aside small Albanian-text portions in foreign-language texts of the 15th century, begins with Gjon Buzuku, a Catholic priest, who in 1555 printed the so-called “Missal” (Meshari), most likely in Venice. The only known original copy of the work is held in the Vatican Library. The book is an outlier both historically and linguistically, making it the most heavily researched work of early Albanian literature. Its writing system is Latin with five extra letters borrowed from Old Bosnian Cyrillic, also known as Bosančica, and a limited number of Italian digraphs. There is no scholarly consensus on its origin, although the most commonly endorsed view is that this script was not an invention of Buzuku, but an earlier one.

In the 17th century the authors Pjetër Budi, Frang Bardhi and Pjetër Bogdani, all reportedly unaware of the existence of Buzuku’s work, which was rediscovered only later, inherit three of the five Cyrillic-derived characters that we find in Buzuku. They use increasingly Italianate spelling rules, and in Bogdani we find the earliest attestation of another letter, taken from Greek. From this period we also have an interesting letter addressed to the Pope by the Elders of Gashi, handwritten in this orthography.

In this polished form, the orthography remains the most widely used form of written Albanian for the following two centuries. In the second half of the 19th century there is a boom of Albanian-language books written by the Jesuits and Franciscans in Shkodër. Over time the spelling rules gradually lose their Italian appearance and get rationalised, e.g. the sequence sha /ʃa/, previously ⟨scia⟩ according to Italian orthography, starts being written as ⟨sca⟩. Eventually the trend of rationalisation of the script culminated in the creation of the Bashkimi alphabet in 1895, which got rid of all non-Latin characters and which, after more innovations in the following decade, became at the Congress of Manastir in 1908 the co-official script of Albania alongside the Frashëri alphabet, which it managed to outclass, eventually becoming the standard Albanian orthography used today.

One noteworthy outlier still written with the Catholic symbols, both for its late date and for its content, is the T’ nnollunat e Shqypnis (1898), written by the Austro-Hungarian Albanologist Ludwig von Thallóczy and translated by Stefë Curani (neither of whom is explicitly credited), which unlike all the other major works in the script deals with history rather than religion.

A new codepoint for ethe

First attested in Buzuku (1555), this letter originally stood ambiguously for both dh and th. There it is rarely found written double, and in those cases too it can stand for either dh or th. From Budi onwards the letter regularly stands for dh when single and th when double, and so it remains until the end.
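The single/double convention found from Budi onwards can be sketched as a small conversion routine. This is only an illustration under stated assumptions: the ethe has no codepoint of its own yet, so the Greek xi is used as a stand-in, and the function name `ethe_to_modern` is hypothetical. It does not handle Buzuku, where the letter is ambiguous.

```python
import re

# ASSUMPTION: Greek xi (U+03BE / U+039E) stands in for the ethe, which has
# no codepoint of its own at the time of writing.
ETHE_LOWER = "\u03be"
ETHE_UPPER = "\u039e"

# A run of one or two ethe letters, in either case.
_ETHE_RUN = re.compile(f"[{ETHE_LOWER}{ETHE_UPPER}]{{1,2}}")


def _replace(match):
    run = match.group(0)
    # Double ethe is th, single ethe is dh (convention from Budi onwards).
    value = "th" if len(run) == 2 else "dh"
    # Capitalise the output when the run starts with a capital ethe.
    return value.capitalize() if run[0] == ETHE_UPPER else value


def ethe_to_modern(text):
    """Replace single ethe with dh and double ethe with th."""
    return _ETHE_RUN.sub(_replace, text)
```

Because only the first letter of a run decides the capitalisation, both attested capital spellings of the double letter (one capital followed by one lowercase letter, or two capitals) come out as Th.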

Appearance of the ethe through the centuries. From left to right, 16th c. (Buzuku, no casing), 17th c. (Budi, Bardhi, Bogdani, no casing), 18th c. (Da Lecce, the Kuvendi, Kazazi, with casing), 1845–1875 (with casing), 1870–1904 (with casing and a bold-face and italic variant)

The precise shape of the letter varied greatly over time: in Buzuku (1555) it reached both the descender and the ascender line, while near the end it fit entirely between the baseline and the x-height. The letter’s origin lies in the uncommonly used Old Cyrillic letter ksi ⟨ѯ⟩, with a shift in the phonetic value it represents.

Within the alphabetical order, the letter was placed between ⟨f⟩ and ⟨g⟩ in Bogdani (1685, see § Gallery), mimicking the position of ⟨θ⟩ in the Greek alphabet. The Albanian-Italian dictionary by Rossi da Montalto (1875) alphabetised ⟨ξ⟩ dh and ⟨ξξ⟩ th as separate letters, after ⟨z⟩ together with all the other non-Latin letters: his order is ⟨..., u, v, z, ξ, ξξ, ɛ, ȣ, λ⟩. The dictionary by Jungg (1895), on the other hand, placed ⟨ξ⟩ dh right after ⟨d⟩ and ⟨ξξ⟩ th right after ⟨t⟩, matching the order of the modern Albanian alphabet. Note that none of these positions tries to mimic the position of Greek ⟨ξ⟩, which we would have expected between ⟨n⟩ and ⟨o⟩.
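The ordering reported for Rossi da Montalto can be sketched as a sort key. Only the tail of the order (z, then the ethe, the double ethe, ɛ, ȣ, λ) comes from the source. The treatment of the Latin letters as plain a to z, the Greek xi stand-in for the ethe, and the names `tokenize` and `rossi_key` are assumptions made for this illustration.

```python
import string

XI = "\u03be"  # ASSUMPTION: Greek xi stands in for the ethe

# Tail of the order as reported for Rossi da Montalto (1875):
# single ethe, double ethe, open e, ou, lamda.
TAIL = [XI, XI + XI, "\u025b", "\u0223", "\u03bb"]

# ASSUMPTION: the Latin letters are ranked in plain a to z order; the
# source elides this part of the alphabet.
RANK = {ch: i for i, ch in enumerate(string.ascii_lowercase)}
RANK.update({unit: len(string.ascii_lowercase) + i for i, unit in enumerate(TAIL)})


def tokenize(word):
    """Split a word into collation units, treating a double ethe as one unit."""
    word = word.lower()
    units = []
    i = 0
    while i < len(word):
        if word.startswith(XI + XI, i):
            units.append(XI + XI)
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def rossi_key(word):
    """Sort key: unknown characters are ranked after every known unit."""
    return [RANK.get(unit, len(RANK)) for unit in tokenize(word)]
```

Passing `rossi_key` as the `key` argument of `sorted` orders a word list accordingly. Jungg's arrangement could be modelled the same way by moving the two ethe units next to d and t in the ranking.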

Until now, to digitise it, scholars have most often used the Greek codepoints ⟨Ξ⟩ ~ ⟨ξ⟩ U+039E ~ U+03BE GREEK CAPITAL ~ SMALL LETTER XI for lack of a better alternative; however, we consider this misleading both visually and semantically. The letter was perceived as distinct, and over time it acquired a stable and distinct shape, especially near the end, when its lowercase form no longer rose above the x-height and its uppercase form gained a circular bowl. Some works using the symbols also contain text in Greek, which would make the idea of simply entrusting the issue to the typeface non-trivial to implement (example from Bogdani, see § Gallery). We propose the addition of two new codepoints, for the lower-case and the upper-case forms.
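The ambiguity created by the Greek stand-in can be illustrated with a rough heuristic. The sketch below assumes a transcription in which the ethe is typed as a Greek xi, and guesses that a xi inside a word containing basic Latin letters is an ethe. The function name `xi_words` and the labels it returns are hypothetical, and the heuristic cannot decide words made only of shared characters, which is part of why a dedicated codepoint is preferable.

```python
import unicodedata

# Both case forms of the Greek xi currently used as a stand-in.
XI_CODEPOINTS = {"\u03be", "\u039e"}


def is_basic_latin_letter(ch):
    """True for the unaccented letters a to z and A to Z."""
    return ch.isascii() and ch.isalpha()


def xi_words(text):
    """Yield (word, guess) for each whitespace-separated word containing a xi."""
    for word in text.split():
        if not any(ch in XI_CODEPOINTS for ch in word):
            continue
        if any(is_basic_latin_letter(ch) for ch in word):
            guess = "albanian ethe"
        else:
            guess = "ambiguous or greek"
        yield word, guess


# Plain text alone carries no trace of the intended letter:
# the stand-in is, by name, a Greek letter.
STAND_IN_NAME = unicodedata.name("\u03be")
```

A typeface can only render one shape per codepoint, so in a page mixing both scripts the distinction has to be carried by markup or by a guess like the one above.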

Bogdani (1685), volume 2, calls the letter ethe (see § Gallery). In the Bashkimi primer we find that it was called ⟨eξξe⟩ ethe when double and ⟨eξe⟩ edhe when single (see § Gallery). We propose LATIN CAPITAL ~ SMALL LETTER ALBANIAN ETHE as the Unicode name. An acceptable alternative would be LATIN CAPITAL ~ SMALL LETTER ALBANIAN XI, or KSI.

How to encode the other letters

The letter ⟨ȣ⟩ appears in Buzuku (1555) standing for both /u/ and /y/, just as ⟨u⟩ could. In all the following works, ⟨u⟩ stood for /u/ and ⟨ȣ⟩ for /y/. It derives from Bosnian Cyrillic uk, which simply stood for /u/. In the 1618–1621 editions of Budi’s works the letter has the top closed, with a shape identical to the digit 8. From the 18th century onwards, the lower-case form stands below the x-height. It should be encoded as ⟨Ȣ⟩ ~ ⟨ȣ⟩ U+0222 ~ U+0223 LATIN CAPITAL ~ SMALL LETTER OU.

The letter ⟨ɛ⟩ is attested in all works, regularly standing for /z/. It derives from a mirrored Bosnian Cyrillic zemlja ⟨з⟩. Many works also employ it in the digraph ⟨ɛc⟩ or ⟨ɛg⟩ for zh /ʒ/, as a parallel to the Italian-based ⟨sc⟩ for sh /ʃ/. It should be encoded as ⟨Ɛ⟩ ~ ⟨ɛ⟩ U+0190 ~ U+025B LATIN CAPITAL ~ SMALL LETTER OPEN E, which matches the letter visually in both case forms. We propose to annotate the codepoint to document this usage, which is otherwise not well described by the Unicode name. The Cyrillic codepoint ⟨Ԑ⟩ ~ ⟨ԑ⟩ U+0510 ~ U+0511 CYRILLIC CAPITAL ~ SMALL LETTER REVERSED ZE is closer only in name: it is merely the Latin epsilon adopted into Cyrillic as a vowel letter, so we would be mixing scripts with no benefit.

The letter ⟨λ⟩ is first attested in Bogdani (1685), and is the only letter of the orthography to be borrowed directly from Greek, rather than through Bosnian Cyrillic. We encode it straightforwardly as ⟨Λ⟩ ~ ⟨λ⟩ U+039B ~ U+03BB GREEK CAPITAL ~ SMALL LETTER LAMDA, which is a perfect match both visually and semantically.

The letter ⟨ћ⟩, derived from Bosnian Cyrillic djerv, is only attested in Buzuku (1555). Like its Cyrillic equivalent, it could stand for both q /c/ and gj /ɟ/, but unlike in Cyrillic it is also found representing /ɡ/, alongside ⟨g⟩. We suggest the modern Serbian codepoint ⟨ћ⟩ U+045B CYRILLIC SMALL LETTER TSHE. Another good match is the Old Cyrillic codepoint ⟨ꙉ⟩ U+A649 CYRILLIC SMALL LETTER DJERV. The two, however, can be considered the same letter, and the former has greater font support.

There is another character exclusive to Buzuku, descending from Bosnian Cyrillic vedi ⟨в⟩ and resembling a white square (for the Bosnian Cyrillic letter, see § Gallery). It occurs 15 times in total, only in the word for “hunger” (whence modern standard Albanian uri), possibly standing for /w/ or similar (see Çabej 1968: 49, Genesin & Matzinger 2019: 109f., Demiraj 2025: 7). On two occasions the word is spelt without the character, as ⟨vu⟩ and ⟨ȣnii⟩. Whatever precise phoneme the letter stood for was already marginal in Buzuku’s time and presumably died out shortly after, and as such the letter is not found in any other author. Given that Buzuku makes no case distinction for any of the non-Latin letters, this letter only ever appears in lower case. Visually, the issue could be avoided by using ⟨□⟩ U+25A1 WHITE SQUARE, which however is a symbol rather than a letter and fails to recognise the Bosnian Cyrillic origin. The best option seems to be the Cyrillic codepoint ⟨в⟩ U+0432 CYRILLIC SMALL LETTER VE, as used in the transcription on the TITUS project by Wolfgang Hock, with the downside of being visually misleading.

Buzuku employs a tailed ⟨ʒ⟩ for x /d͡z/, which is to be encoded as ⟨Ʒ⟩ ~ ⟨ʒ⟩ U+01B7 ~ U+0292 LATIN CAPITAL ~ SMALL LETTER EZH. (See § Gallery for a discussion of its origin.)

He also adopts a ⟨cz⟩ symbol for c /t͡s/. Typographically, the glyph was achieved by rotating the ⟨ʒ⟩; semantically, however, it was supposed to stand for ⟨ç⟩, following the Venetian tradition of using ⟨ç⟩ for /t͡s/ (Fergusson 2007: 77). We suggest the codepoint ⟨ç⟩ U+00E7 LATIN SMALL LETTER C WITH CEDILLA.
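The existing codepoints recommended in this section can be checked against the Unicode Character Database with Python's standard `unicodedata` module. The sketch below is only a sanity check of the codepoint and name pairs quoted above. The names `RECOMMENDED`, `CASE_PAIRS`, `mismatched_names` and `broken_case_pairs` are hypothetical.

```python
import unicodedata

# Codepoints and names as quoted in the text above.
RECOMMENDED = {
    0x0222: "LATIN CAPITAL LETTER OU",
    0x0223: "LATIN SMALL LETTER OU",
    0x0190: "LATIN CAPITAL LETTER OPEN E",
    0x025B: "LATIN SMALL LETTER OPEN E",
    0x039B: "GREEK CAPITAL LETTER LAMDA",
    0x03BB: "GREEK SMALL LETTER LAMDA",
    0x045B: "CYRILLIC SMALL LETTER TSHE",
    0x0432: "CYRILLIC SMALL LETTER VE",
    0x01B7: "LATIN CAPITAL LETTER EZH",
    0x0292: "LATIN SMALL LETTER EZH",
    0x00E7: "LATIN SMALL LETTER C WITH CEDILLA",
}

# (capital, small) pairs that are expected to case-map onto each other.
CASE_PAIRS = [(0x0222, 0x0223), (0x0190, 0x025B), (0x039B, 0x03BB), (0x01B7, 0x0292)]


def mismatched_names(table):
    """Return the codepoints whose database name differs from the table."""
    return [cp for cp, name in table.items()
            if unicodedata.name(chr(cp), "") != name]


def broken_case_pairs(pairs):
    """Return the pairs that do not round-trip through lower() and upper()."""
    return [(up, low) for up, low in pairs
            if chr(up).lower() != chr(low) or chr(low).upper() != chr(up)]
```

Both functions return an empty list when every entry agrees with the database, which also confirms that ordinary case conversion works for the cased letters without any special handling.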

Gallery

Buzuku (1555), folio 31r, 44r and 100v (left to right). Highlighted only ⟨ξ⟩. (Image source: BKSH)
Bogdani (1685), volume 2, page 141 (left) and page 146 (right), showing how while ⟨λ⟩ looks identical in the Albanian and Greek portions, the Albanian ⟨ξ⟩ and Greek ⟨ξ⟩ look very different. (Image source: BKSH)
Left, one of the last pages of Bogdani (1685), volume 2, calling ⟨ξethe, ⟨λ⟩ λuλa, ⟨ɛ⟩ se and ⟨ȣ⟩ eu. Also note the position in the alphabetical order matches Greek ⟨θ⟩ rather than ⟨ξ⟩. Right, Abetari i qitun prej Shoqniet t’ Bashkimit t’ gjuhës shqype (1895), page 6, comparing the new proposed orthography with the old symbols. (Image source: BKSH)
Domenico Pasi (1896), Disa habere përmi apostollim t’ urats, Shkodër: Catholic College Press, page 3. A z with a hook is employed regularly in bold-face, both in this and other short pamphlets published in the same period. Non-bold text keeps the usual form. (Image source: BKSH)
The most common choice for capitalising the double letter was ⟨Ξξ⟩, here Da Lecce (1716: 156). However, ⟨ΞΞ⟩ is also attested, here Guagliata (1845: 35). (Image source: Google Books)
Marcozzi (1882: 34, 358, 375). [Fixed caption...], where the round-based variant is used for lowercase in the case of voiceless th, while the flat-based variant is employed for the voiced dh. Given that the main point of differentiation is still whether the letter is single or double, the round-based variant can be considered a contextual variant. In capital forms, both phonemes use the same glyph, capitalising both letters for Th. (Image source: BKSH)
Bashkimi primer (1895: 8). An instance of cursive ⟨ξ⟩, appearing like a struck-through z with a hook. (Image source: BKSH)
Title pages of Jungg (1895) and Pesmdhet biseda (1900), showing an alternative capital form, only found in headers. The rest of the book contains the usual round-bowled form. (Image source: Google Books, BKSH)
Elçija i Zemërs Jezu Krishtit, February 1909. The periodical began being printed in 1891 and kept using the Catholic orthography all the way until 1909, after which they began using the modern orthography. The other picture is from the bibliography by Legrand (1912: 196). [title in bibliography, sorry to find these now, they are not necessary to include]

Figures in the separate UTN

Excerpts of Bosnian literature. Left, the Charter of Ban Kulin (1189). Right, Matija Divković (1611), Nauk krstjanski. Occurrences of the characteristic Bosnian shape of ⟨в⟩ are highlighted. (Image source: Wikimedia Commons)
Budi (1618: 184a), on a single occasion, uses a particular five-pointed symbol to represent zh /ʒ/. On every other occasion, he uses ⟨X⟩ (here for example, p. 194a, in the same sentence ⟨giξξe Xȣem epegaam⟩ gjithë zhyem e pëgām). Due to the rarity of the glyph, we do not request a codepoint for it. The dingbat U+273C OPEN CENTRE TEARDROP-SPOKED ASTERISK ✼ or any other ad hoc encoding could be employed. (Image source: BKSH)
Francesco Rossi da Montalto, Gaspare Crasnich, ed. (1870), Il vangelo di S. Matteo, tradotto dalla volgata nel dialetto albanese ghego scutarino, London, uses a regularised, 1:1 phonetic alphabet to represent the dialect of Shkodër, taking the alphabet in use in the city as a starting ground. The italic vs. non-italic distinction used to differentiate e from ë is carried over from the other Gospel translations made in London, and the same concept is used to differentiate g from gj and o from œ. This practice is not found outside the London Gospels. (Image source: British Bible Society)
Buzuku (1555: 11v). Other glyphs used in the text, not dissimilarly to other texts of the time, are ⟨ſ⟩ (in orange), ⟨ꝛ⟩ (in yellow) and ⟨ꝺ⟩ (in blue), for which our recommendations are U+017F LATIN SMALL LETTER LONG S, U+A75B LATIN SMALL LETTER R ROTUNDA and U+A77A LATIN SMALL LETTER INSULAR D respectively. These are graphical variants which do not differ in value from the coexisting ⟨s⟩, ⟨r⟩ and ⟨d⟩. (Image source: BKSH)
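Since these three glyphs do not differ in value from ⟨s⟩, ⟨r⟩ and ⟨d⟩, a transcription that keeps them can still be made searchable by folding them onto the ordinary letters. The sketch below is an assumption about how one might do so for search purposes only, leaving the transcription itself untouched. The names `VARIANT_FOLD` and `fold_variants` are hypothetical.

```python
# Map each graphical variant to the ordinary letter it is equivalent to.
VARIANT_FOLD = str.maketrans({
    "\u017f": "s",  # LATIN SMALL LETTER LONG S
    "\ua75b": "r",  # LATIN SMALL LETTER R ROTUNDA
    "\ua77a": "d",  # LATIN SMALL LETTER INSULAR D
})


def fold_variants(text):
    """Return a search form of the text with the three variants folded."""
    return text.translate(VARIANT_FOLD)
```

Applying `fold_variants` to both the query and the indexed text lets a search for a plain s, r or d match either glyph.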

In Northern Italian Gothic, which Buzuku’s Missal is printed in, the ⟨ʒ⟩ was simply the usual shape of the letter ⟨z⟩, see for example various forms of scandalizo from the Sacerdotale Romanum (f. 3r), i.e. BSB, 4 Liturg. 592, published in Venice in 1554 (Image source: Bayerische Staatsbibliothek). The same can be said of the letter in Buzuku (f. 108r in top-right): ⟨tih meh en ʒoreh enſeh bdieri: emeh en ʒore ẽ motit seh cheћ⟩ ti më ënxore ën së bdjeri, e më ënxore ën motit së keq. However, we find two different glyphs for the uppercase form: ⟨Ʒ⟩, with 15 occurrences, and ⟨Z⟩, with 5 occurrences. These are graphical variants and do not differ in value; folios 97r and 31v are given here, with the same sentence ⟨e hini ende(h) ſtepii teh Zachariſſe⟩ e hini ëndë shtëpī të Xakarisë. Lowercase ⟨z⟩ without the loop is never attested, against the 217 attestations of ⟨ʒ⟩. To keep the graphical distinction of the uppercase forms, our recommendation is to use U+0292 LATIN SMALL LETTER EZH for the lowercase form as well rather than U+007A LATIN SMALL LETTER Z. (Image source: BKSH)
An instance of ⟨ç⟩ in Buzuku (f. 26v), ⟨ћi ћiξe ſpirt eççen: encheſo iete per tuu⟩ qi gjithë shpirt ecën ën këso jetë për tȳ (Image source: BKSH). The letter ⟨ç⟩ is also found in later works, here Kazazi (1743: 46), ⟨etænaçmoin Turt⟩ e të na çmojn t urt (spacing intentional), where it however stands for /t͡ʃ/, as it does in the modern alphabet. (Image source: Google Books)

Table with comparisons

These are some conveniently gathered images of the four main Catholic symbols throughout the centuries. The sources they are taken from are below. All are from BKSH, except for the letter of the elders of Gashi, from Wikimedia Commons.
year dh Dh ll Ll y Y z Z
1555 not used
1618
1635
1664
1685
1689 not used
1706
1856
1873
1882
1887
1900
1904

Additional characters used by Buzuku (1555).

image encoding
⟨ћ⟩ CYRILLIC SMALL LETTER TSHE
⟨ʒ⟩ LATIN SMALL LETTER EZH
⟨Ʒ⟩ LATIN CAPITAL LETTER EZH
⟨в⟩ CYRILLIC SMALL LETTER VE
⟨ç⟩ LATIN SMALL LETTER C WITH CEDILLA

Changes to Unicode

New codepoints

image possible name
LATIN SMALL LETTER ALBANIAN ETHE
LATIN CAPITAL LETTER ALBANIAN ETHE

Annotations

Annotate ⟨ɛ⟩ U+025B LATIN SMALL LETTER OPEN E to mention its use as /z/ in the Albanian Catholic alphabet.

Secondary sources

Notable works in Catholic Alphabet

For each, only the first edition is considered. Does not include works of lesser importance.
Pink text stands for misencoded text, which is to be substituted with the properly displayed character in the finalised proposal.
See also the Frashëri alphabet proposal.
By Catonif, 2026. catonif.dev@gmail.com