Oryx Translation · Document Studio
A precise, RTL-aware converter built by translators. We honour your typography, your script, and your layout — whether the source is digital text or a scanned page.
Every PDF is read first by PyMuPDF. Pages with embedded text route to the digital pipeline; image-only pages route to OCR.
Digital pages keep their layout via pdf2docx. Scans pass through Tesseract 5 trained on Arabic, with paragraph-level RTL preserved.
You receive a Word file ready for editing — with Noto Naskh Arabic embedded, right-to-left flow, and clean paragraph structure.
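The routing step described above can be sketched roughly as follows. This is a minimal illustration, not the studio's actual code: `classify_page`, `route_pdf`, and the `min_chars` threshold are hypothetical names, and the sketch assumes a page counts as digital whenever PyMuPDF's `get_text()` returns any non-whitespace text.

```python
def classify_page(text: str, min_chars: int = 1) -> str:
    """Return "digital" if the page carries embedded text, else "ocr"."""
    # Ignore whitespace so a page holding only blanks or newlines goes to OCR.
    visible = "".join(text.split())
    return "digital" if len(visible) >= min_chars else "ocr"


def route_pdf(path: str) -> list[str]:
    """Classify every page of a PDF (requires PyMuPDF: pip install pymupdf)."""
    import fitz  # PyMuPDF; imported lazily so classify_page works without it

    with fitz.open(path) as doc:
        # get_text() returns the page's embedded text, or "" for image-only pages.
        return [classify_page(page.get_text()) for page in doc]
```

A scan that already carries a hidden text layer would be classified as digital under this rule, so a real pipeline may want a higher `min_chars` or an extra check on the page's images.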
Right-to-left flow, Arabic ligatures, and proper paragraph direction set on every line of output.
On digital PDFs, pdf2docx preserves columns, tables, and embedded fonts rather than dumping plain text.
OCR runs on Tesseract 5 with the Arabic and English language models. Clean modern print yields 90%+ character accuracy.
Files are removed from disk within 60 minutes. No accounts, no logs of file content, no third parties.
No signup. Drop a file, get the .docx. Bring your own API key for premium quality and unlimited pages.
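The 60-minute retention window mentioned above could be enforced with a periodic sweep along these lines. This is a sketch under stated assumptions, not a description of the studio's implementation: `purge_expired`, `upload_dir`, and `MAX_AGE_SECONDS` are hypothetical names, and file age is assumed to be measured from the last modification time.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 60 * 60  # the 60-minute window stated above


def purge_expired(upload_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Delete regular files older than max_age seconds; return their names."""
    now = time.time() if now is None else now
    removed = []
    for path in Path(upload_dir).iterdir():
        # Age is measured from the last modification time (an assumption).
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Because a sweep only deletes files when it runs, keeping files on disk for strictly less than an hour would mean scheduling it every few minutes with a `max_age` somewhat below 60 minutes.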
Crafted by Oryx Translation — a studio specialising in Arabic content for technical, legal, and editorial work.