Step by Step Instructions for

BUILDING A TM

FROM AN ORIGINAL DOCUMENT

AND ITS TRANSLATION

by Koral Özgül

No liability accepted. Use at your own risk.


CONTENTS

A. PREPARATION OF THE SOURCE AND TARGET STRINGS

B. PREPARING THE INTERMEDIATE EXCEL TEMPLATE (one time task)

C. TRANSFERRING THE STRINGS TO EXCEL TEMPLATE

D. PREPARING THE IMPORT FILE

E. IMPORTING THE FILE TO TRADOS


 

Introductory Note: The real starting point for the TM compilation is a table with two columns (source column and target column). As we usually don't have that ready, we first edit the initial files to match our needs. If you already have two columns of data containing the source and target strings, jump to B. PREPARING THE INTERMEDIATE EXCEL TEMPLATE.

  

A. PREPARATION OF THE SOURCE AND TARGET STRINGS

As a starting point, we have two Word documents: an English file (Figure 1) and its Turkish translation (Figure 2). Their formatting is inconsistent (header, tabs, tables), so we first need to bring both files into a uniform shape.

Figure 1 - Original English Word document

Figure 2 - Translated Turkish Word document

 

Note: Apply the operations below to the English file and to the Turkish file separately.

A.1. The original document has a header. The header would otherwise be lost during this process, so we copy it and paste it at the top of the body text.

A.2. Some parts of the text are separated by TAB characters. We remove these by replacing them with paragraph marks (Figure 3):

Find what:         ^t

Replace with:     ^p

Replace All

Figure 3 - Replace TABs with Paragraph Marks

A.3. Now, we have to convert the table(s) in the document to normal text. Place your cursor into the table. Use the menu command Table > Select > Table to select the whole table.

A.4. With the table selected, use the menu command Table > Convert > Table to Text. Under Separate text with in the Convert Table To Text dialog box, select Paragraph marks and click OK (Figure 4). Repeat this with every table in the file.

Figure 4 - Convert table to text separated with Paragraph marks

A.5. The first stage is almost done, but the resulting text has a variable number of paragraph marks between the strings (Figure 5). This is partly due to the initial formatting. Another reason may be changes made in the Turkish document that deviate from the original. For example, the English version had only one tab between the entries at the beginning, while the second line in the Turkish file was aligned with two tabs because the Turkish phrase was shorter.

Figure 5 - The result, with variable number of paragraph marks

We'll have to straighten up these irregularities and remove the gaps by repeatedly replacing every double paragraph mark with a single paragraph mark, until there are no double paragraph marks remaining in the document:

Find what:      ^p^p

Replace with:  ^p

Replace All

Figure 6 - Replace successive paragraph marks with a single paragraph mark

Note the message that reports how many instances have been replaced. In this case, 19 instances were found and replaced.

Click OK.

Click Replace All again and watch the number of replacements (Figure 7).

Figure 7 - The number of replacements decreased from 19 to 5

Repeat the replacement until the number no longer decreases. Ideally it would reach zero, but for reasons unknown to me, many documents stop at 1. This one doesn't fall below 2 (Figure 8).

Figure 8 - The number of replacements doesn't decrease any more

Make sure that the number of replacements no longer decreases, then continue with the next step.

Now there is exactly one paragraph mark after every string (Figure 9), except at the very beginning and end, but that's not important.

Figure 9 - A single paragraph mark after each string
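For readers who prefer to see the clean-up of steps A.2 and A.5 outside Word, here is a minimal Python sketch. It assumes the document text is available as a plain string in which paragraph marks are newline characters. The function name normalize_strings and the sample text are hypothetical and not part of Word or Trados.

```python
import re

def normalize_strings(text: str) -> list[str]:
    """Return one string per entry, mirroring steps A.2 and A.5."""
    # Step A.2: turn every TAB into a paragraph break.
    text = text.replace("\t", "\n")
    # Step A.5: collapse any run of paragraph marks into a single one.
    # The regular expression does this in one pass, whereas replacing
    # "two marks with one" has to be repeated until nothing changes.
    text = re.sub(r"\n{2,}", "\n", text)
    # Drop empty or whitespace-only entries, such as the one left at the end.
    return [line for line in text.split("\n") if line.strip()]

# Made-up sample: one tab in the first line, two tabs in the second.
sample = "Name:\tJohn Smith\n\n\nAddress:\t\tIstanbul\n"
print(normalize_strings(sample))
```

The double tab in the second line produces an extra empty paragraph, which is exactly the kind of irregularity described above; the collapse step removes it.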

A.6. Our next task is to convert the whole text to a table with a single column.

To do this, place the pointer at the very beginning of the text (Figure 10).

Figure 10 - Marking text (only), starting point

Leave the pointer there and scroll to the very end of the file. Press and hold the Shift key and click just behind the last character of the text in the document (Figure 11). Release the Shift key.

Figure 11 - Marking text (only), ending point

Alternatively, after placing the pointer at the beginning of the text, press Ctrl+Shift+End. Then release Ctrl but keep Shift pressed, move to just behind the last character with PageUp/PageDown and/or the arrow keys, and release Shift once all the text is selected. Don't include empty lines at the end.

A.7. Use the menu command Table > Convert > Text to Table to convert the selected text into a table. The Number of columns value should be 1 (Figure 12).

Figure 12 - Convert text to table with one column

Now we have an orderly table containing the source English strings in each cell (Figure 13).

Figure 13 - Table containing the English strings

A.8. Apply all of the above steps to the Turkish file. At the end, we'll have two tables (in separate files), one with English strings and the other with their Turkish translations.

Figure 14 - Table containing the translated Turkish strings

 

A.9. Save both files with different names in a temporary place.

 

 

B. PREPARING THE INTERMEDIATE EXCEL TEMPLATE

(This needs to be done only once for each language pair)

 

B.1. Prepare an Excel sheet with the following entries in a single (first) row. Enter only the text listed as Value for each cell; the remaining text is for clarification.

Cell | Value | Explanation
--- | --- | ---
A1 | <TrU> | Translation Unit: This code will start a TM entry.
B1 | <CrD>22082008, 04:30:00 | Creation Date: The first number signifies the current date in ddmmyyyy format and the second is the current time. These will be used in the TM entries as the Translation Unit entry date and time.
C1 | <CrU>USER | Creating User: This will be used as the name of the user who has entered the Translation Units. You can replace "USER" with anything you want. Examples: JOHN, TMIMPORT, SOME_COMPANY_NAME
D1 | <Seg L=EN_US> | Segment Language: Use the appropriate code for the source language. The code for German is "DE-DE".
E1 | (empty) | Empty: This cell should remain empty. It will be used for entering the source strings later on.
F1 | <Seg L=TR_01> | Segment Language: Use the appropriate code for the target language.
G1 | (empty) | Empty: This cell should remain empty. It will be used for entering the target strings later on.
H1 | </TrU> | End of Translation Unit: This code will end a TM entry.

The Excel sheet should look like this:

Figure 15 - TM import template example
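To make the row layout concrete, here is a minimal Python sketch that builds the eight cells (columns A to H) of one template row, including the ddmmyyyy date stamp. The function name template_row and its parameters are hypothetical; the codes and language labels are simply the ones used in this guide, so adjust them to your own language pair.

```python
from datetime import datetime
from typing import Optional

def template_row(source: str, target: str, user: str = "USER",
                 src_lang: str = "EN_US", tgt_lang: str = "TR_01",
                 now: Optional[datetime] = None) -> list[str]:
    """Return the eight cells (columns A to H) of one template row."""
    now = now or datetime.now()
    # ddmmyyyy date followed by the time, as described for cell B1.
    stamp = now.strftime("%d%m%Y, %H:%M:%S")
    return [
        "<TrU>",                # A: start of the translation unit
        f"<CrD>{stamp}",        # B: creation date and time
        f"<CrU>{user}",         # C: creating user
        f"<Seg L={src_lang}>",  # D: source language code
        source,                 # E: source string (empty in the template)
        f"<Seg L={tgt_lang}>",  # F: target language code
        target,                 # G: target string (empty in the template)
        "</TrU>",               # H: end of the translation unit
    ]

# The fixed date reproduces the example value shown for cell B1.
row = template_row("", "", now=datetime(2008, 8, 22, 4, 30, 0))
print(row)
```

Passing empty strings for source and target gives the bare template row; the strings themselves are filled in during stage C.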

 

B.2. Save the Excel file with a descriptive name (for example "TMImportTemplate_EN-TR.xls") in a suitable folder. You can use it for other conversions for the same language pair later.

 



C. TRANSFERRING THE STRINGS TO EXCEL TEMPLATE

 

C.1. Go to the Word document that contains the English string table. Place your mouse pointer somewhere inside the table. Select the whole table using the menu command Table > Select > Table.

Press Ctrl+Insert keys to copy the whole table to the Windows clipboard.

 

C.2. Go back to the Excel template. Select the cell E1 (the cell between <Seg L=EN_US> and <Seg L=TR_01>).

Press Shift+Insert keys to paste the Source strings into that column (column E), starting from the topmost cell (Figure 16).

Figure 16 - Source strings pasted into the import template

 

C.3. Go to the Word document that contains the Turkish string table. Place your mouse pointer somewhere inside the table. Select the whole table using the menu command Table > Select > Table.

Press Ctrl+Insert keys (Copy command) to copy the whole table to the Windows clipboard.

 

C.4. Go back to the Excel template. Select the cell G1 (the cell between <Seg L=TR_01> and </TrU>).

Press Shift+Insert (Paste command) to paste the target strings into that column (column G), starting from the topmost cell (Figure 17).

Figure 17 - Target strings pasted into the import template

 

Caution: This is a good time to check the alignment. The resulting source and target columns should have the same number of rows. If there is a misalignment, you have to correct it before continuing.
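The row-count check can be sketched in a few lines of Python. This is only an illustration, assuming the two columns are available as lists of strings; the function name pair_rows and the sample entries are hypothetical.

```python
def pair_rows(source: list[str], target: list[str]) -> list[tuple[str, str]]:
    """Pair source and target strings row by row, or fail loudly."""
    if len(source) != len(target):
        raise ValueError(
            f"Misaligned columns: {len(source)} source rows "
            f"but {len(target)} target rows"
        )
    return list(zip(source, target))

# Made-up sample columns for illustration only.
english = ["File", "Open", "Save As"]
turkish = ["Dosya", "Aç", "Farklı Kaydet"]
for src, tgt in pair_rows(english, turkish):
    print(f"{src} -> {tgt}")
```

Equal row counts are a necessary condition, not a guarantee: two errors can cancel each other out, so it is still worth reading a few pairs from the top, middle and bottom of the columns.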

 

C.5. Select the cells A1 through D1 (from <TrU> to <Seg L=EN_US>) (Figure 18).

Figure 18 - Select the cells A1-D1

Press Ctrl+Insert keys to copy the contents to the Windows clipboard.

 

C.6. Select the cell A2 (just below <TrU>). Scroll down to the last entry in the source and target columns. Press and hold the Shift key and click the cell in column D on the row of the last entry to select all cells from A2 to Dn (where n is the last row containing string entries) (Figure 19).

It's the 33rd row in our example:

Figure 19 - Select the cells A2-D33

Press Shift+Insert keys to fill in the selected cells with the copied contents (Figure 20).

Figure 20 - Paste the copied contents into the empty cells A2-D33

 

C.7. Select the cell F1 (<Seg L=TR_01>).

Press Ctrl+Insert keys to copy the contents to the Windows clipboard.

 

C.8. Select the cell F2 (just below <Seg L=TR_01>). Scroll down to the last entry in the source and target columns. Press and hold the Shift key and click the cell in column F on the row of the last entry to select all cells from F2 to Fn (where n is the last row containing string entries).

Press Shift+Insert keys to fill in the selected cells with the copied contents.

 

C.9. Finally, select the cell H1 (</TrU>).

Press Ctrl+Insert keys to copy the contents to the Windows clipboard.

 

C.10. Select the cell H2 (just below </TrU>). Scroll down to the last entry in the source and target columns. Press and hold the Shift key and click the cell in column H on the row of the last entry to select all cells from H2 to Hn (where n is the last row containing string entries).

Press Shift+Insert keys to fill in the selected cells with the copied contents.

You're done with this stage (Figures 21 and 22).

Figure 21 - The top...

Figure 22 - ... and bottom of the filled in template



 

D. PREPARING THE IMPORT FILE

 

D.1. Select the top left cell (A1).

Scroll down to the bottom right cell (Hn). Press and hold the Shift key and click the bottom right cell (Hn) to select all the populated cells (Figure 23).

Figure 23 - Select all the non-empty cells

Press Ctrl+Insert keys to copy it into the clipboard.

Warning: With very large files, this may take some time, depending on your system and hardware configuration (especially RAM). In addition, Excel limits the number of rows per sheet (65,536 in the .xls format). Thus, you may have to split very large files first and process them as several partial files.
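If the strings are available outside Excel, the splitting can be sketched as follows. This is a minimal illustration, assuming the rows are held in a Python list; the names XLS_MAX_ROWS and split_rows are hypothetical, and 65,536 is the row limit of the .xls worksheet format.

```python
XLS_MAX_ROWS = 65_536  # row limit of a worksheet in the .xls format

def split_rows(rows: list, max_rows: int = XLS_MAX_ROWS):
    """Yield consecutive slices of rows, each at most max_rows long."""
    for start in range(0, len(rows), max_rows):
        yield rows[start:start + max_rows]

# Small made-up example with a limit of 4 rows per part.
parts = list(split_rows(list(range(10)), max_rows=4))
print([len(part) for part in parts])
```

Each part can then go through stages C and D on its own, and the resulting text files can be imported one after another.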

 

D.2. Open an empty new Word document in MS Word.

Figure 24 - New Word document for TM import file preparation

 

D.3. Press Shift+Insert to paste the whole table into the Word document (Figure 25).

Figure 25 - Excel data pasted into the new Word document

It's not a problem if the table is not fully visible at the right side of the page - we won't do any manual editing in this file.

Warning: Word is even choosier than Excel in accepting large tables. But if you have prepared the initial string tables in Word (that is, if you haven't acquired the entries directly into Excel from another source), you should usually be able to paste the table into Word. It can take a long time. Be patient and wait. Go and get a cup of coffee!

If Word stops responding or your system hangs, reboot and try to do the procedure with smaller parts of the table.

 

D.4. Place the mouse pointer anywhere within the table.

Use the menu command Table > Select > Table to select the whole table (Figure 26).

Figure 26 - Select the table

 

D.5. Use the menu command Table > Convert > Table to Text to convert the table to text.

Under Separate text with in the Convert Table To Text dialog box, select Paragraph marks and click OK.

You'll get something like this:

Figure 27 - Table converted to text (paragraph marks shown)

 

D.6. Now, open the Replace dialog box (Edit > Replace). Apply the following replacement:

Find what:         <Seg L=EN_US>^p

Replace with:     <Seg L=EN_US>

Replace All

This will remove the paragraph marks at the end of the "<Seg L=EN_US>" codes, so that each English string follows its code immediately. That's the proper format for Trados export/import files.

 

D.7. Do the same formatting change for the target strings:

Find what:         <Seg L=TR_01>^p

Replace with:     <Seg L=TR_01>

Replace All

The paragraph marks at the end of "<Seg L=TR_01>" codes will be removed and the Turkish strings will be appended immediately to the code.

The result will look like this:

Figure 28 - The final result in Word format
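The effect of steps D.5 to D.7 can be summarised in a short Python sketch. It assumes each template row is a list of its eight cell values, and it generalises the two replacements to any language code with one regular expression. The function name rows_to_import_text and the sample row are hypothetical.

```python
import re

def rows_to_import_text(rows: list[list[str]]) -> str:
    """Flatten template rows into the text layout reached after step D.7."""
    # Step D.5: table to text, one cell per paragraph.
    text = "\n".join(cell for row in rows for cell in row) + "\n"
    # Steps D.6 and D.7: remove the paragraph mark after each
    # <Seg L=...> code so the string follows the code directly.
    return re.sub(r"(<Seg L=[^>]+>)\n", r"\1", text)

# Made-up single entry using the codes from this guide.
rows = [[
    "<TrU>", "<CrD>22082008, 04:30:00", "<CrU>USER",
    "<Seg L=EN_US>", "Hello", "<Seg L=TR_01>", "Merhaba", "</TrU>",
]]
print(rows_to_import_text(rows))
```

For the sample row this prints six lines: the opening code, the date, the user, the two segment lines with their strings attached, and the closing code.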

 

D.8. The only remaining step is to convert the file to plain text format:

  • Use the menu command File > Save As,

  • enter a descriptive name in the File name field

  • and browse to a suitable folder,

  • select Plain text (*.txt) in the Save as type field,

  • click the Save button.

The File Conversion dialog box will appear. Check whether the special characters (if any) of the target language appear correctly in the Preview pane. If not, try changing the encoding from the default to Unicode (UTF-8) or another suitable code page.

Figure 29 - File Conversion dialog box, showing correct TR special characters

Click the OK button.
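Why the encoding matters can be shown with a small Python sketch. It checks whether a sample of Turkish characters can be stored in a given code page; the function name can_encode and the sample string are hypothetical, while cp1252 (Western European), cp1254 (Turkish) and utf-8 are standard Python codec names.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# Made-up sample containing Turkish special characters.
sample = "İstanbul ğüşöç"
for encoding in ("cp1252", "cp1254", "utf-8"):
    print(encoding, can_encode(sample, encoding))
```

The Western European code page fails on letters such as İ, ğ and ş, while the Turkish code page and UTF-8 accept them. This is the same kind of mismatch that shows up as wrong characters in the Preview pane.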

You're done. The resulting file is a plain text file in the correct format that Trados accepts for imports:

Figure 30 - Final import file ready

Now proceed with importing.

 



E. IMPORTING THE FILE TO TRADOS

 

You may want either to add the entries to an existing TM (jump to Step E.2) or to create a new TM with these entries.

In the latter case:

 

E.1. Open Trados Translator's Workbench and create a new translation memory with exactly the same source and target languages as you used in the compiled import file. (For detailed information about how to create a new TM, refer to Trados Workbench Help.)

 

E.2. Use the menu command File > Import to open the Import dialog box (Figure 31).

Figure 31 - Trados Translator's Workbench Import dialog

Click the OK button.

The Open Import File dialog box opens. Browse to the text file you have prepared and select it. Click the Open button. Trados will import the entries into the current TM.

Mission accomplished.

 

Note: Check the result shown on the Status line of Trados Workbench. If 0 (zero) translation units are imported, it means that you've done something wrong. Now you're on your own to check and find out the error. Good luck!
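As a first aid for troubleshooting, the import file itself can be checked before looking elsewhere. The following Python sketch counts the opening and closing codes in the import text so the number can be compared with the count Trados reports. It is only an assumption-laden illustration: the function name count_translation_units is hypothetical, and the check covers only the codes described in this guide, not everything Trados validates.

```python
def count_translation_units(text: str) -> int:
    """Count <TrU> entries and verify that each one is closed."""
    lines = [line.strip() for line in text.splitlines()]
    opened = lines.count("<TrU>")
    closed = lines.count("</TrU>")
    if opened != closed:
        raise ValueError(f"{opened} <TrU> codes but {closed} </TrU> codes")
    return opened

# Made-up import text with a single entry.
sample = (
    "<TrU>\n"
    "<CrD>22082008, 04:30:00\n"
    "<CrU>USER\n"
    "<Seg L=EN_US>Hello\n"
    "<Seg L=TR_01>Merhaba\n"
    "</TrU>\n"
)
print(count_translation_units(sample))
```

If the count matches the number of rows in the Excel template but Trados still imports nothing, the language codes or the file encoding are the next things to check.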

 

 

Koral Özgül, Istanbul, September 2008

 



