Applying a replacement map to characters in Emacs

The Problem

I have text in Cyrillic and need to replace individual characters with their transliteration. I have a small JSON file of the mappings:

{"в": "v","а": "a","ф": "f","ё": "yo","д": "d","ж": "zh","ы": "y","э": "e","л": "l","щ": "shch","я": "ya","й": "j","у": "u","н": "n","г": "g","с": "s","п": "p","ч": "ch","б": "b","х": "kh","е": "ye","ъ": "\"","з": "z","ю": "yu","ь": "'","ш": "sh","о": "o","к": "k","и": "i","ц": "ts","м": "m","т": "t","р": "r"}

And I have a number of files that contain lists with entries like

<p>основа</p>
<p>заставил</p>
<p>Лобзик</p>

Given those two inputs, how can I quickly add links to files named with the transliteration of these words? E.g., if I had a different JSON map, how could I change “abc” to “alphabravocharlie”?

Solution: JSON -> Hash Table -> Replacement Function

The process went pretty smoothly, with only a couple of pitfalls. Here was the winning strategy:

  1. Convert the JSON transliteration mappings into an Emacs hash table
    • I had to extend the JSON table to include both capitalizations of Cyrillic, and to replace spaces with underscores per my particular project needs
  2. Apply a function that calls gethash on that table to every char of an input string
  3. Make entry functions: a primary one that takes a string and outputs the transliteration, and a secondary one that takes a marked region and replaces it with the output of the first function. I only turned out to need the first because…
  4. Perform a regexp find-and-replace on the desired sections to change them completely
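
Step 1 and its sub-point can also be done programmatically rather than by editing the JSON by hand. The following is only a minimal sketch under a few assumptions: the function name my/extend-translit-table is hypothetical, every JSON value is a string, and capitalizing the value (so "zh" becomes "Zh") is the casing you want.

```elisp
(require 'json)

(defun my/extend-translit-table (json-string)
  "Parse JSON-STRING into a hash table and add project-specific entries.
Adds an uppercase key with a capitalized value for every key that has a
distinct uppercase form, plus a space -> underscore mapping."
  (let* ((json-object-type 'hash-table)
         (json-key-type 'string)
         (table (json-read-from-string json-string))
         (extra nil))
    ;; Collect the new pairs first; adding keys while `maphash' is
    ;; still walking the table is not safe.
    (maphash (lambda (key value)
               (unless (string= key (upcase key))
                 (push (cons (upcase key) (capitalize value)) extra)))
             table)
    (dolist (pair extra)
      (puthash (car pair) (cdr pair) table))
    ;; Hypothetical project rule taken from the list above: spaces
    ;; become underscores.
    (puthash " " "_" table)
    table))

;; (gethash "Ж" (my/extend-translit-table "{\"ж\": \"zh\"}")) ;; "Zh"
```

Keeping the original JSON lowercase-only and deriving the rest at load time means there is just one list to maintain if a mapping changes.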

Code

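(require 'json)
(require 'cl-lib) ; provides cl-flet*

(defun tsa/transliterate-cyrillic (in-string)
  "Return IN-STRING with each mapped Cyrillic character transliterated.
Characters without an entry in the mapping pass through unchanged."
  ;; Not interactive: it takes a string, so it is meant to be called from Lisp.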
  (let* ((json-object-type 'hash-table)
         (json-array-type 'list)
         (json-key-type 'string)
         (json (json-read-from-string "{\"в\": \"v\",\"а\": \"a\",\"ф\": \"f\",\"ё\": \"yo\",\"д\": \"d\",\"ж\": \"zh\",\"ы\": \"y\",\"э\": \"e\",\"л\": \"l\",\"щ\": \"shch\",\"я\": \"ya\",\"й\": \"j\",\"у\": \"u\",\"н\": \"n\",\"г\": \"g\",\"с\": \"s\",\"п\": \"p\",\"ч\": \"ch\",\"б\": \"b\",\"х\": \"kh\",\"е\": \"ye\",\"ъ\": \"\'\",\"з\": \"z\",\"ю\": \"yu\",\"ь\": \"'\",\"ш\": \"sh\",\"о\": \"o\",\"к\": \"k\",\"и\": \"i\",\"ц\": \"ts\",\"м\": \"m\",\"т\": \"t\",\"р\": \"r\", \"В\": \"V\",\"А\": \"A\",\"Ф\": \"F\",\"Ё\": \"Yo\",\"Д\": \"D\",\"Ж\": \"Zh\",\"Ы\": \"Y\",\"Э\": \"E\",\"Л\": \"L\",\"Щ\": \"Shch\",\"Я\": \"Ya\",\"Й\": \"J\",\"У\": \"U\",\"Н\": \"N\",\"Г\": \"G\",\"С\": \"S\",\"П\": \"P\",\"Ч\": \"Ch\",\"Б\": \"B\",\"Х\": \"Kh\",\"Е\": \"Ye\",\"Ъ\": \"\'\",\"З\": \"Z\",\"Ю\": \"Yu\",\"Ь\": \"'\",\"Ш\": \"Sh\",\"О\": \"O\",\"К\": \"K\",\"И\": \"I\",\"Ц\": \"Ts\",\" \"\: \"_\",\"М\": \"M\",\"Т\": \"T\",\"Р\": \"R\"}")))
    (cl-flet* ((replace-char (x) (gethash (char-to-string x) json (char-to-string x)))
               (replace-all (s) (apply #'concat (mapcar #'replace-char s))))
      (replace-all in-string)))) 

(defun tsa/cyrillic-area (beg end)
  "Replace the region from BEG to END with its Latin transliteration."
  (interactive "r")
  (let ((in-string (buffer-substring-no-properties beg end)))
    (save-excursion
      (delete-region beg end)
      (goto-char beg)
      (insert (tsa/transliterate-cyrillic in-string)))))
;; (tsa/transliterate-cyrillic "лЛЛЛобзик") ;; "lLLLobzik"

replace-regexp with elisp call

Mark this region:

<p>основа</p>
<p>заставил</p>
<p>Лобзик</p>

Replacement with elisp call:

M-x replace-regexp
<p>\(.*?\)</p>
# replace with:
<li data-audioname="\,(tsa/transliterate-cyrillic \1)">\1</li>

Result:

<li data-audioname="osnova">основа</li>
<li data-audioname="zastavil">заставил</li>
<li data-audioname="Lobzik">Лобзик</li>
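
Since I have a number of files, the same replacement could also be scripted instead of typed into M-x replace-regexp each time. This is a rough sketch, not what I actually ran: my/p-to-li-in-region is a hypothetical name, and the transliteration function is passed in as FN so the snippet stands on its own.

```elisp
(defun my/p-to-li-in-region (beg end fn)
  "Rewrite each <p>WORD</p> between BEG and END as an <li> element.
FN is called with WORD and its result becomes the data-audioname value."
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      ;; Same non-greedy pattern as the interactive replace-regexp call.
      (while (re-search-forward "<p>\\(.*?\\)</p>" nil t)
        (let ((word (match-string 1)))
          ;; FIXEDCASE and LITERAL are both t so the replacement text
          ;; is inserted exactly as built here.
          (replace-match
           (format "<li data-audioname=\"%s\">%s</li>"
                   (funcall fn word) word)
           t t))))))

;; With a region marked:
;; (my/p-to-li-in-region (region-beginning) (region-end)
;;                       #'tsa/transliterate-cyrillic)
```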

Gotchas

  • It turned out that, contrary to my given JSON, my project distinguishes between uppercase and lowercase Cyrillic (hint: M-x upcase-region makes a very quick fix to add an uppercase section to the JSON), and links to multiword audio files with underscores for spaces.
  • As it turns out, Emacs has a function standard-display-cyrillic-translit which appears to do almost what I need; however, it asks which version of transliteration to use, and it also just modifies the display without allowing me to save the non-Cyrillic text in certain places.
  • Emacs has a function replace-region-contents that sounds like exactly what I want, but it actually isn't. It wants to replace a section with a buffer of something else, which I wasn’t prepared to figure out.
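
For anyone who does want to try replace-region-contents, here is a tentative sketch of how it might be wired up. It assumes Emacs 27 through 30, where the third argument is a function that is called with the buffer narrowed to the region and returns the replacement string; later versions may prefer a different calling convention. The name my/transform-region is hypothetical, and I have not used this in my project.

```elisp
(require 'subr-x) ; where replace-region-contents lives in Emacs 27-29

(defun my/transform-region (beg end fn)
  "Replace the text between BEG and END with FN applied to that text.
FN takes one string and returns the replacement string."
  (replace-region-contents
   beg end
   ;; Called with the buffer narrowed to BEG..END, so `buffer-string'
   ;; here returns only the region's text.
   (lambda () (funcall fn (buffer-string)))))

;; (my/transform-region (region-beginning) (region-end)
;;                      #'tsa/transliterate-cyrillic)
```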

Tory Anderson
Full-time Web App Engineer, Digital Humanist, Researcher, Computer Psychologist