Romaji2Kana

Dedicated to an amazing university teacher, this website enables users of common keyboards with Latin characters to write Japanese syllables like おちゃ or パン.
Leander Christmann

Context

For a long time the Japanese did not have a writing system of their own.

In the 5th century, court officials therefore started using the Chinese characters called Hànzì (in Japanese: Kanji), but those turned out to be a poor match for the Japanese language. While Hànzì represent short, unchanging Chinese words, Japanese has many inflected words with numerous suffixes and particles.

Only over the centuries were methods found to record the Japanese language in writing. By the 10th century, two syllabic scripts, together called Kana, had been created for this purpose. Initially, court ladies used the Hiragana script for their diaries; later, Buddhist monks created a second syllabary, Katakana.

Today, Kanji as well as Hiragana and Katakana are used every day by the Japanese people:

  • Kanji characters are used to express basic words and ideas, usually with the same or a similar meaning to the one they originally had in Chinese, as well as place and personal names, e.g. 先生
  • Hiragana syllables are primarily used to form native Japanese words, grammatical elements, verb conjugations and particles, e.g. はじめまして. Any Kanji can also be written in Hiragana (if one doesn’t remember how to draw 橋, for example)
  • Katakana syllables are primarily used to form foreign names or loanwords, onomatopoeia, technical terms and scientific names, e.g. レアンダ

Now a problem occurs when you try to chat in Japanese: a dedicated keyboard would need 46 keys just for the basic characters of one script (that’s how many Hiragana and Katakana there are, respectively). This is hardly feasible, especially on mobile devices.

That’s why virtually everyone, even Japanese people, normally uses American keyboards to type Japanese with a so-called Input Method Editor (IME). An IME is an operating system component or installed program that lets users enter the letters available to them and converts them into a different writing system (here: Japanese).
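To illustrate the conversion step an IME performs, here is a minimal sketch in JavaScript. It is a toy, not the implementation of any real IME or of this project: the table `ROMAJI_TO_HIRAGANA` and the function `toyToHiragana` are hypothetical names, and the table covers only a handful of syllables. It simply tries the longest Romaji chunk first and falls back to shorter ones.

```javascript
// Toy lookup table (hypothetical and deliberately incomplete).
const ROMAJI_TO_HIRAGANA = new Map(Object.entries({
  a: "あ", i: "い", u: "う", e: "え", o: "お",
  ka: "か", ki: "き", ku: "く", ke: "け", ko: "こ",
  sa: "さ", shi: "し", su: "す", se: "せ", so: "そ",
  ta: "た", chi: "ち", tsu: "つ", te: "て", to: "と",
  na: "な", ni: "に", nu: "ぬ", ne: "ね", no: "の",
  ha: "は", hi: "ひ", fu: "ふ", he: "へ", ho: "ほ",
  ma: "ま", mi: "み", mu: "む", me: "め", mo: "も",
  cha: "ちゃ", chu: "ちゅ", cho: "ちょ",
  n: "ん",
}));

// Longest Romaji chunk in the table above ("shi", "tsu", "cha", ...).
const MAX_CHUNK_LENGTH = 3;

function toyToHiragana(input) {
  const text = input.toLowerCase();
  let output = "";
  let pos = 0;
  while (pos < text.length) {
    let consumed = 0;
    // Greedy longest match: try 3 letters, then 2, then 1.
    const longest = Math.min(MAX_CHUNK_LENGTH, text.length - pos);
    for (let len = longest; len >= 1; len--) {
      const kana = ROMAJI_TO_HIRAGANA.get(text.slice(pos, pos + len));
      if (kana !== undefined) {
        output += kana;
        consumed = len;
        break;
      }
    }
    if (consumed === 0) {
      // Unknown character: pass it through unchanged.
      output += text[pos];
      consumed = 1;
    }
    pos += consumed;
  }
  return output;
}

console.log(toyToHiragana("ocha"));   // おちゃ
console.log(toyToHiragana("sensei")); // せんせい
```

A real converter additionally has to deal with double consonants, long vowels, ambiguous "n" and punctuation, which this sketch leaves out.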

But some people don’t have an IME (e.g. Microsoft Japanese IME) at their fingertips to convert their Roman letters to Japanese characters. Maybe they don’t want to install one, maybe none is available to them, or maybe they find it cumbersome to use.

That’s where web-based IMEs come into play. They are readily available online and offer uncomplicated conversion services. Using them myself from time to time, I found, however, that most of them have extremely unpleasant user interfaces, resulting in a rather poor user experience.

Planning

Because of this I decided to build a beautiful, easy-to-use, blazingly fast, free and secure web-based IME: the Romaji2Kana Converter.
(Our letters stemming from the Latin/Roman alphabet are called Romaji in Japanese)

In addition, there shall be an API for developers who want to build on top of this software.

1. Website

The website shall consist of the following pages:

  • Home: offers Romaji to Hiragana and Katakana conversion, provides information about conversion standards and serves as a landing page
  • About: provides some context about me, my connection to the Japanese language and the purpose and values behind the Romaji2Kana project
  • Legal: specifies the details required for every website by §5 of the German Telemedia Act
  • Contact: lists an e-mail address for inquiries and links to my StackOverflow, GitHub and Discord as well as this portfolio website
  • API: documents all endpoints of the API described below, with usage examples

2. API

The REST API shall serve various endpoints regarding

  • conversion (from one or more writing systems to one or multiple others)
  • validation (whether a given input is Japanese, Kana, Hiragana,… or not)
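As a rough idea of what validation and script-to-script conversion involve, here is a minimal JavaScript sketch based on Unicode ranges. The function names are hypothetical and this is not how the API or the WanaKana library is implemented; it only relies on the facts that the Hiragana block spans U+3040 to U+309F, the Katakana block spans U+30A0 to U+30FF, and corresponding letters sit 0x60 code points apart.

```javascript
// Unicode blocks: Hiragana is U+3040–U+309F, Katakana is U+30A0–U+30FF.
const HIRAGANA_ONLY = /^[\u3040-\u309F]+$/;
const KATAKANA_ONLY = /^[\u30A0-\u30FF]+$/;
const KANA_ONLY = /^[\u3040-\u30FF]+$/;

// Validation: true only if every character belongs to the given script.
// An empty string is treated as invalid.
function isHiraganaOnly(text) {
  return HIRAGANA_ONLY.test(text);
}

function isKatakanaOnly(text) {
  return KATAKANA_ONLY.test(text);
}

function isKanaOnly(text) {
  return KANA_ONLY.test(text);
}

// Conversion: shift each Hiragana letter (U+3041–U+3096) by 0x60 to reach
// its Katakana counterpart; all other characters are left untouched.
function hiraganaToKatakana(text) {
  return text.replace(/[\u3041-\u3096]/g, (ch) =>
    String.fromCharCode(ch.charCodeAt(0) + 0x60)
  );
}

console.log(isHiraganaOnly("おちゃ"));    // true
console.log(isKatakanaOnly("パン"));      // true
console.log(isKanaOnly("ocha"));          // false
console.log(hiraganaToKatakana("ぱん"));  // パン
```

Checks for "Japanese" in general would also have to admit Kanji and Japanese punctuation, which this sketch does not cover.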

Design

Architecture

For maximum performance (the single most important factor when using a converter) the architecture should be kept to a minimum: a simple website and a standalone API.

Technology

The tech stack for the website will be:

  • HTML (of course)
  • Tailwind CSS for the beauty of the UI
  • JavaScript for simple, but crucial interactivity (the functionality of the conversion)
  • WanaKana, a JavaScript library for detecting and transforming between Hiragana, Katakana and Romaji
  • Alpine.js, a lightweight JavaScript framework for more advanced interactivity with animations (like modals and accordions)

The tech stack for the API will be:

  • WanaKana, a JavaScript library for detecting and transforming between Hiragana, Katakana and Romaji
  • JavaScript to use the WanaKana library again
  • Node.js as a runtime environment
  • Express.js only in local development as a server for the API

Infrastructure

The infrastructure supporting the software runs in the cloud on Amazon Web Services. So that I don’t pay for any capacity I don’t use, it must be 100% serverless.

Since the website is mostly static plus a bit of client-side interactivity (which is done via JavaScript by the client), I do not need a server for it anyway. A simple file storage with a static hosting option is enough.

For the API I do need a server though, so I’ll go with the serverless compute offering of AWS. Sounds wrong, doesn’t it? Well, serverless ultimately just means a server that you don’t have to manage. And, as is the case here, one that you don’t pay for if you don’t use it.

1. Website

Romaji2Kana Website Infrastructure

  • An Amazon S3 bucket stores the files of the website and serves them (having static hosting enabled and being publicly accessible by anyone).
  • An Amazon CloudFront CDN distribution serves cached copies of those files from edge locations.
    • While doing so, it encrypts the traffic from and to the website with an SSL certificate obtained from the AWS Certificate Manager.
  • On Route 53 I bought the domain romaji2kana.com. I’ve set a DNS A record that points requests to the CloudFront distribution delivering the site.

Deployment to this infrastructure

The deployment to this infrastructure is automated by a CI/CD Pipeline via GitHub Actions.

Romaji2Kana Website CI/CD Pipeline

  • The GitHub Actions runner checks out the repository’s code on every push and uses an access key to operate on my AWS account.
  • The “2. compile and minify CSS” step refers to the Tailwind CSS executable putting together the vanilla CSS required for my project and the subsequent minification.
  • I remove the .html extension from HTML files to produce clean URLs. Setting --content-type "text/html" ensures they are still served as HTML after all.
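The clean-URL step can be pictured with a small JavaScript sketch. It is an illustration under assumptions, not the pipeline's actual script: `toUploadTarget` and the `CONTENT_TYPES` table are hypothetical, and whether the real pipeline treats index.html differently is not stated here. The point is that once the extension is gone, the Content-Type has to be set explicitly because it can no longer be guessed from the file name.

```javascript
// Hypothetical mapping from file extension to Content-Type.
const CONTENT_TYPES = new Map([
  [".html", "text/html"],
  [".css", "text/css"],
  [".js", "text/javascript"],
]);

// Derive the object key and Content-Type for one built file.
// HTML files lose their extension; everything else keeps its name.
function toUploadTarget(filePath) {
  const slash = filePath.lastIndexOf("/");
  const dot = filePath.lastIndexOf(".");
  // Only count a dot inside the file name (and not a leading dot) as an extension.
  const hasExtension = dot > slash + 1;
  const extension = hasExtension ? filePath.slice(dot).toLowerCase() : "";
  const contentType = CONTENT_TYPES.get(extension) || "application/octet-stream";
  const key = extension === ".html" ? filePath.slice(0, dot) : filePath;
  return { key, contentType };
}

console.log(toUploadTarget("about.html"));
// { key: 'about', contentType: 'text/html' }
console.log(toUploadTarget("css/style.css"));
// { key: 'css/style.css', contentType: 'text/css' }
```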

2. API

Romaji2Kana API Infrastructure

  • The RESTful API with all its endpoints is created in the Amazon API Gateway console with the endpoint type “Edge-optimized”.
  • This choice deploys the API behind CloudFront, which serves it from edge locations.
  • In Route 53 I set up a DNS A record for the subdomain api.romaji2kana.com pointing to that CloudFront distribution’s domain name.
    • As with the website, CloudFront gets an SSL certificate from AWS Certificate Manager to encrypt its traffic
  • Functionality to fulfill the requests is implemented as an AWS Lambda function
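To make the last bullet more concrete, here is a minimal sketch of what a Lambda handler behind API Gateway could look like in JavaScript. It assumes the Lambda proxy integration (query parameters arrive in `event.queryStringParameters`, and the function returns `statusCode`, `headers` and `body`); whether the project uses that integration is an assumption. `buildResponse`, `placeholderConvert` and the shape of the error response are hypothetical, and the placeholder merely stands in for the real conversion routine.

```javascript
// Build an API Gateway proxy-style response for a conversion request.
// `convert` is any function that maps the input string to the converted string.
function buildResponse(event, convert) {
  // API Gateway passes null when the request has no query string.
  const params = event.queryStringParameters || {};
  const q = params.q;

  if (typeof q !== "string" || q.length === 0) {
    // Hypothetical error shape; the real API may answer differently.
    return {
      statusCode: 400,
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ error: "Missing query parameter q" }),
    };
  }

  return {
    statusCode: 200,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ a: convert(q) }),
  };
}

// Placeholder only: the real function would call the conversion library.
const placeholderConvert = (text) => text.toUpperCase();

// Lambda entry point (export it in whatever module style the project uses).
const handler = async (event) => buildResponse(event, placeholderConvert);

handler({ queryStringParameters: { q: "ohayou" } }).then((response) => {
  console.log(response.statusCode, response.body);
});
```

Keeping the response logic in a plain function separate from the handler makes it easy to run the same code from a local server during development.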

Implementation

You can find the code for both parts of the project publicly available on GitHub along with additional documentation in the respective READMEs.

Testing

The website is extensively tested by frequent personal usage.

The API on the other hand has a dedicated automated test suite of 180+ tests in my Postman account. It validates that exactly the expected responses are returned, even employing the Ajv schema validator to verify the JSON response bodies.

The below code snippet displays one of those tests. It makes 5 assertions, which build on top of each other to progressively ensure everything meets expectations.

GET https://api.romaji2kana.com/v1/to/hiragana?q=Ohayou gozaimasu.
pm.test("Status code is 200", () => {
    pm.response.to.have.status(200);
});

pm.test("Content-Type header is present", () => {
    pm.response.to.have.header("Content-Type");
});

pm.test("Content-Type header is application/json", () => {
    pm.expect(pm.response.headers.get("Content-Type")).to.eql("application/json");
});

// JSON Schema for the expected body: an object with a required string property "a"
const schema = {
    type: "object",
    properties: {
        a: { type: "string" }
    },
    required: ["a"]
};

pm.test("Response body is JSON and has a string property named \"a\"", () => {
    pm.response.to.have.jsonSchema(schema);
});

pm.test("The value of that \"a\" property is correct", () => {
    pm.expect(pm.response.json().a).to.eql("おはよう ございます。");
});


Read more about the project on the About page of the Romaji2Kana website.
