Understanding How to Work with Bidirectional (BiDi) Text

I’m a native Hebrew speaker. Hebrew, like Arabic and several other languages, is a Right-to-Left (RTL) language, meaning that it is written from right to left. Translators and other content creators working with bidirectional (BiDi) content, i.e. text containing both Left-to-Right (LTR) and Right-to-Left characters and segments, face some unique challenges when it comes to formatting the text correctly. The development of the Unicode Bidirectional Algorithm, and the growing support it has enjoyed over the last decade or so, have significantly improved how software handles BiDi content. Today all modern operating systems, both desktop and mobile, support BiDi virtually out of the box. There is usually still a setting or two to tweak, but long gone are the days of installing awkward language packs and/or workarounds to overcome or mask the technical limitations of BiDi support.

Despite these advances, some issues remain. The algorithm is not perfect. Although it makes life much easier, the difference in directionality is still something to keep in mind when working with BiDi content. The algorithm should be seen as a mechanism that lays the technical foundations on which BiDi content is built. It is largely the responsibility of content creators to understand how the BiDi algorithm works and to follow best practices when preparing the content.
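The first thing the Unicode Bidirectional Algorithm (Unicode Standard Annex #9) does is assign every character a bidirectional class. Examples are L for strong left-to-right letters, R for Hebrew letters, AL for Arabic letters, EN for European digits, and WS for whitespace.

The short Python sketch below uses the standard library's unicodedata.bidirectional() for three jobs:
- showing those classes;
- checking whether a string is BiDi content in the sense used above;
- guessing a paragraph's base direction from its first strong character.

It is a simplified illustration, not an implementation of the full algorithm. The helper names are hypothetical, and it ignores explicit embeddings, overrides, and isolates.

```python
from __future__ import annotations

import unicodedata

# Strong bidirectional classes from the Unicode Bidirectional Algorithm (UAX #9):
# "L"  = strong left-to-right (e.g. Latin letters),
# "R"  = strong right-to-left (e.g. Hebrew letters),
# "AL" = Arabic letter, also strong right-to-left.
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def is_bidi(text: str) -> bool:
    """True if the text mixes strong LTR and strong RTL characters."""
    classes = {cls for _, cls in bidi_classes(text)}
    return bool(classes & STRONG_LTR) and bool(classes & STRONG_RTL)


def first_strong_direction(text: str) -> str | None:
    """'rtl' or 'ltr' from the first strong character, or None if there is none.

    Simplified version of how UAX #9 picks a default paragraph direction
    (rules P2-P3); unlike the real rules, it does not skip isolated text.
    """
    for _, cls in bidi_classes(text):
        if cls in STRONG_RTL:
            return "rtl"
        if cls in STRONG_LTR:
            return "ltr"
    return None


sample = "שלום Studio 2014"
print(is_bidi(sample))                              # True
print(first_strong_direction(sample))               # rtl
print(first_strong_direction("Studio 2014 שלום"))   # ltr
```

The digits in the sample have the weak class EN, so they do not decide the direction. A string's base direction can change with nothing more than which word comes first. That is one reason the same mixed sentence may be laid out differently in two places.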

In this article I will attempt to explain how the BiDi algorithm parses and handles the directionality of text, point out its shortcomings, and describe some of the most common BiDi issues and how to solve them.

This article focuses on BiDi issues in word processors and Translation Environment Tools (TEnTs), but the principles and solutions described here apply more broadly. For details about the corresponding terminology in a plain-text or (X)HTML environment, please see the Directionality of Paragraphs and Documents section below.
If you are interested in the tl;dr version of this article, please jump to the Takeaway section.

Analyzing Files in SDL Studio

One common complaint from experienced SDL Trados users, and from some users coming from other tools, after switching to SDL Studio is that getting file analysis statistics is complicated, slow, and cumbersome.

With the upcoming release of Studio 2014, I thought I would share the method I use for analyzing files in SDL Studio.

I want to start by clearly stating that the purpose of this article is by no means to encourage the practice of those arbitrary so-called ‘CAT tool discounts’, as if translation work were about trading words in bulk. The purpose of this article is to give a short overview of the differences in translation resource management and workflow between SDL Trados (and other tools that use the same concept) and SDL Studio. It also suggests a relatively quick and efficient method for getting basic statistics for the professional’s own internal use in quoting and scheduling.

Removing Tags from a Translation Memory

Earlier this week a colleague contacted me with an issue. One of his agency clients had sent him a Translation Memory, but much to his dismay, the TM was cluttered with unnecessary formatting tags that rendered it pretty much useless. This clutter is usually referred to as “tag soup”, and it is primarily the result of converting a PDF or image file into formatted text. Tag soup can also result from style mismatches when copying and pasting formatted text between applications, for example copying an HTML email message and pasting it into a word processor. The conversion process or style mismatch introduces all kinds of unnecessary formatting tags. These must be removed in a post-processing stage before the converted document can be worked on. Emma Goldsmith wrote an excellent and exhaustive article about how to get rid of tag soup in documents before starting to work on them, but what if the tag soup is already in the Translation Memory?
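If the TM can be exported as TMX, one possible way to clean it is to strip the inline markup from each segment. In TMX 1.4, formatting is typically stored in inline elements inside <seg>:
- <bpt>/<ept> for paired codes;
- <ph> for standalone placeholders;
- <it> for isolated codes;
- <ut> for unknown codes.

The <hi> element, by contrast, wraps translatable text.

The Python sketch below is an illustration built on those assumptions, not necessarily the approach described in the full article. It drops the code elements, keeps the text that <hi> wraps, and collapses leftover whitespace. The function names are hypothetical. Because it removes every tag, including meaningful ones, it should only be run on a copy of the TM.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# TMX 1.4 inline elements that hold native formatting code, not translatable text.
CODE_ELEMENTS = {"bpt", "ept", "it", "ph", "ut"}


def _plain_text(elem: ET.Element) -> str:
    """Return elem's text with code elements dropped and other wrappers (e.g. <hi>) unwrapped."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_ELEMENTS:
            # <hi> (and anything else that is not a code element) wraps real text: keep it.
            parts.append(_plain_text(child))
        # The tail is text that follows the child inside elem, so it is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


def strip_tmx_tags(tmx_xml: str | bytes) -> str:
    """Return the TMX document with every <seg> reduced to plain text."""
    root = ET.fromstring(tmx_xml)
    # Materialize the list first, because the loop below modifies the tree.
    for seg in list(root.iter("seg")):
        text = _plain_text(seg)
        for child in list(seg):
            seg.remove(child)
        # Collapse (and trim) the extra spaces that removed tags often leave behind.
        seg.text = " ".join(text.split())
    return ET.tostring(root, encoding="unicode")
```

Reading and writing the actual .tmx file is left out to keep the sketch short. Such files may be UTF-16 encoded, and the returned string has no XML declaration.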

Counting Inserted and Deleted Words in Track Changes

While I was pondering which topic to write about first, along came a project, and with it an interesting challenge that I thought could turn into an article.

In some highly regulated industries, such as Legal and Pharmaceuticals/Healthcare, it is not uncommon to receive requests to update an existing translation where the edits in the new version of the source document are marked with the Track Changes feature.
This type of project presents a unique challenge for quoting, scheduling, and processing.
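In Word files (.docx), tracked changes are stored in the document's XML. Insertions are wrapped in <w:ins> elements whose text sits in <w:t>. Deletions are wrapped in <w:del> elements whose text sits in <w:delText>.

The Python code below is a rough sketch of one way to count them, not necessarily the method used in the full article. It reads word/document.xml and counts whitespace-separated words in each kind of change. It is an approximation with several limits:
- It only looks at the main body; headers, footers, footnotes, and comments live in other parts.
- It ignores moved text (<w:moveFrom>/<w:moveTo>).
- It will not match Word's own word count exactly.

The function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace, in ElementTree's "{uri}" form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _count_words(root: ET.Element, change_tag: str, text_tag: str) -> int:
    """Count whitespace-separated words inside every w:<change_tag> element."""
    chunks = []
    for change in root.iter(W + change_tag):
        # Join the runs of one change without spaces, so a word that Word
        # split across several runs is still counted once.
        chunks.append("".join(t.text or "" for t in change.iter(W + text_tag)))
    # Separate changes are treated as separate text; a partial-word edit
    # (e.g. an inserted plural "s") is counted as a word of its own.
    return len(" ".join(chunks).split())


def count_tracked_words(document_xml: str | bytes) -> dict[str, int]:
    """Count inserted and deleted words in the XML of word/document.xml."""
    root = ET.fromstring(document_xml)
    return {
        "inserted": _count_words(root, "ins", "t"),
        "deleted": _count_words(root, "del", "delText"),
    }


def count_tracked_words_in_docx(path: str) -> dict[str, int]:
    """Open a .docx (a ZIP package) and analyze its main document part only."""
    with zipfile.ZipFile(path) as package:
        return count_tracked_words(package.read("word/document.xml"))
```

Counting the two kinds of change separately keeps the figures useful for quoting. Deleted words usually take less effort to handle than newly inserted text.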

Introductory Post

Hi Everyone,
This is just a short introductory post to start things off. You can find more details about this blog on the About Translation Therapy page.

In summary, in this blog I will attempt to cover all sorts of content relevant to the translation profession and work. Topics will range from the business/commercial side of things, through technical issues and challenges, to any other idea, thought, or subject that is, or might be, relevant.
I also hope this blog will become a worthy supplement to the invaluable information and advice provided by other great translation-related blogs, which I follow, enjoy, and draw inspiration from. A representative list of them can be found on the About Translation Therapy page.

Although I would like to dive in and experiment with the design and with adding/removing elements and features, my main priority is to start adding some content. Little by little, though, I will certainly work towards tweaking and optimizing the blog in terms of design, functionality, and performance.

Please feel free to:
- subscribe to the RSS/Atom feed;
- follow me [@HiFiText] on Twitter;
- connect with me on LinkedIn (all links are also available in the footer of the webpage);
- contact me via this blog (the contact form will be added soon);
- leave a comment with feedback, suggestions, or anything else.

I hope that you will enjoy the read.