
Oryx Translation · Document Studio

Arabic PDFs, dropped into .docx as a starting draft.

A free utility that pulls Arabic text out of a PDF and into an editable Word file. Expect rough edges: broken layouts, ligature errors, and OCR mistakes are unavoidable, so plan to polish the result in Word. For higher accuracy, use the Mistral OCR option below, which is free to set up.

Files auto-deleted within 60 minutes · No account · No tracking · Free up to 5 pages
01
Upload

Drop your PDF

Recommended for Arabic

For better results, use Mistral OCR — free to set up

Tesseract (the free engine) gets Arabic text into Word, but it mangles ligatures, breaks tables, and misreads many words. Mistral OCR is far more accurate on Arabic. The free Experiment tier covers typical use: no credit card required, just an email address and a phone number.

1. Go to console.mistral.ai and sign up (Google login is fastest).
2. Verify your email via Mistral's link.
3. Verify your phone with the SMS code.
4. In the sidebar, open API Keys → Create new key.
5. Copy the key (it is shown only once) and paste it below.

Already have a key? Click the button — your saved key auto-fills.
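For readers curious what happens to the key, here is a minimal sketch of building a Mistral OCR request for a local PDF. It assumes Mistral's public `/v1/ocr` endpoint, the `mistral-ocr-latest` model name, and a base64 data-URL document as described in Mistral's docs at the time of writing; this page does not say how the tool itself calls Mistral, and `build_ocr_request` and `page_markdown` are hypothetical helpers. Check the current API reference before relying on any of it.

```python
"""Sketch: build (but do not send) a Mistral OCR request for a local PDF."""
import base64
import json
import urllib.request

# Assumed endpoint and model name; verify against Mistral's current API docs.
MISTRAL_OCR_URL = "https://api.mistral.ai/v1/ocr"
MISTRAL_OCR_MODEL = "mistral-ocr-latest"


def build_ocr_request(pdf_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Return a POST request that carries the PDF as a base64 data URL."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    payload = {
        "model": MISTRAL_OCR_MODEL,
        "document": {
            "type": "document_url",
            "document_url": f"data:application/pdf;base64,{encoded}",
        },
    }
    return urllib.request.Request(
        MISTRAL_OCR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The key travels only in this header; nothing here writes it to disk.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def page_markdown(response_json: dict) -> list[str]:
    """Pull each page's Markdown text out of a parsed OCR response."""
    return [page.get("markdown", "") for page in response_json.get("pages", [])]
```

Sending it would be `with urllib.request.urlopen(req) as resp: data = json.load(resp)`, after which `page_markdown(data)` yields one string per page.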

II
Method

What the tool actually does

01 · أ

Read the PDF

PyMuPDF parses every page. Pages with embedded text use the digital extraction path; image-only pages run through OCR. You don't pick — the tool decides per page.
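A minimal sketch of that per-page decision, assuming a page counts as "digital" when PyMuPDF's `page.get_text("text")` returns at least a handful of visible characters. The threshold `MIN_TEXT_CHARS` and the function names are hypothetical; the tool's actual heuristic isn't documented here.

```python
"""Per-page routing sketch: embedded text layer vs. OCR (hypothetical heuristic)."""

# Hypothetical threshold: fewer visible characters than this means "treat as scanned".
MIN_TEXT_CHARS = 20


def needs_ocr(embedded_text: str, min_chars: int = MIN_TEXT_CHARS) -> bool:
    """Return True when a page's embedded text is too sparse to trust."""
    # Drop all whitespace so a page holding only line breaks counts as empty.
    visible = "".join(embedded_text.split())
    return len(visible) < min_chars


def classify_pages(pdf_path: str) -> list[str]:
    """Label each page 'digital' or 'ocr' using PyMuPDF's text extraction."""
    import fitz  # PyMuPDF; imported lazily so needs_ocr() works without it

    labels = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            text = page.get_text("text")
            labels.append("ocr" if needs_ocr(text) else "digital")
    return labels
```

A character threshold rather than a plain emptiness check guards against scanned pages that carry a stray invisible text layer, such as a page number or watermark.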

02 · ب

Extract or OCR

Digital pages go through pdf2docx — simple layouts usually survive, complex ones (multi-column, footnotes, nested tables) often break. Scanned pages run through Tesseract 5 (Arabic + English); accuracy depends heavily on scan quality.
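One way to wire the two paths together, sketched under assumptions: contiguous runs of digital pages go to pdf2docx's `Converter.convert(..., pages=[...])`, and each scanned page is rendered with PyMuPDF and read through the `pytesseract` binding with `lang="ara+eng"`. The page doesn't name the Tesseract binding, and the 300 DPI default and all function names here are hypothetical.

```python
"""Sketch: send digital runs to pdf2docx and scanned pages to Tesseract."""
from itertools import groupby


def group_runs(labels: list[str]) -> list[tuple[str, list[int]]]:
    """Collapse per-page labels into contiguous runs of (label, page indexes)."""
    runs = []
    for label, items in groupby(enumerate(labels), key=lambda pair: pair[1]):
        runs.append((label, [index for index, _ in items]))
    return runs


def convert_digital(pdf_path: str, docx_path: str, pages: list[int]) -> None:
    """Convert the given zero-based page indexes with pdf2docx."""
    from pdf2docx import Converter  # third-party; imported lazily

    cv = Converter(pdf_path)
    try:
        cv.convert(docx_path, pages=pages)
    finally:
        cv.close()


def ocr_page(pdf_path: str, page_index: int, dpi: int = 300) -> str:
    """Render one page with PyMuPDF and OCR it with Tesseract (Arabic + English)."""
    import fitz  # PyMuPDF
    import pytesseract
    from PIL import Image

    with fitz.open(pdf_path) as doc:
        pix = doc[page_index].get_pixmap(dpi=dpi)  # RGB, no alpha by default
        image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    # "ara+eng" asks Tesseract to load both trained language models.
    return pytesseract.image_to_string(image, lang="ara+eng")
```

Grouping into runs keeps multi-page digital stretches together, which gives pdf2docx a better chance of preserving tables that cross page breaks. Merging the OCR text and the pdf2docx output back into a single .docx is left out of this sketch.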

03 · ج

Clean & deliver

RTL direction is tagged at section, paragraph, and run level. Common Arabic extraction errors (lam-alef ligature flips, spurious hamzas, justification tatweels) are corrected in post-processing. You receive a Word file you can edit further.
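The three levels of RTL tagging correspond to WordprocessingML elements: `<w:bidi/>` inside a section's `<w:sectPr>`, `<w:bidi/>` inside a paragraph's `<w:pPr>`, and `<w:rtl/>` inside a run's `<w:rPr>`. The sketch below builds those elements with the standard library's ElementTree purely to show the markup; how the tool itself writes them (for example through python-docx) isn't stated, and the helper names are hypothetical.

```python
"""Sketch: the WordprocessingML flags that mark sections, paragraphs, and runs RTL."""
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def w(tag: str) -> str:
    """Qualify a tag name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def rtl_paragraph(text: str) -> ET.Element:
    """Build <w:p> with paragraph-level <w:bidi/> and run-level <w:rtl/>."""
    p = ET.Element(w("p"))
    p_pr = ET.SubElement(p, w("pPr"))   # paragraph properties come before runs
    ET.SubElement(p_pr, w("bidi"))      # paragraph reads right-to-left
    run = ET.SubElement(p, w("r"))
    r_pr = ET.SubElement(run, w("rPr"))
    ET.SubElement(r_pr, w("rtl"))       # run text is right-to-left
    t = ET.SubElement(run, w("t"))
    t.text = text
    return p


def rtl_section() -> ET.Element:
    """Build <w:sectPr> with <w:bidi/> so the section layout is RTL."""
    sect_pr = ET.Element(w("sectPr"))
    ET.SubElement(sect_pr, w("bidi"))
    return sect_pr
```

Tagging all three levels matters because Word decides alignment from the paragraph flag, glyph order from the run flag, and page features such as column order from the section flag; missing any one can produce the mixed-direction glitches mentioned below.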

III
What you actually get

Honest about what works & what doesn't

RTL set, not perfect

Direction is tagged at section, paragraph, and run level. Word usually renders Arabic correctly, but mixed-language paragraphs and unusual fonts can still misbehave — check before sending.

Layout: best effort

Single-column body text usually survives. Multi-column layouts, footnotes, and complex tables often break or merge. Plan to redo non-trivial layout in Word.

OCR is rough on the free engine

Tesseract 5 reads Arabic + English. Quality varies widely with scan quality, font, and justification. We post-process common Arabic OCR bugs (ligature flips, spurious hamzas), but expect mistakes. Use Mistral OCR for materially better results.
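A minimal sketch of two clean-ups in that spirit, not the tool's actual post-processor: it strips tatweel (U+0640) and normalizes Arabic Presentation Forms-B letters and lam-alef ligatures (U+FE80 to U+FEFC) to their base letters with NFKC. It does not repair flipped lam-alef order or spurious hamzas, which need context-aware rules; `normalize_arabic` is a hypothetical name.

```python
"""Sketch: two Arabic clean-ups (tatweel removal, presentation-form normalization)."""
import unicodedata

TATWEEL = "\u0640"  # ARABIC TATWEEL, used to stretch justified lines


def _is_presentation_letter(ch: str) -> bool:
    """True for Presentation Forms-B letters and lam-alef ligatures (U+FE80-U+FEFC)."""
    return 0xFE80 <= ord(ch) <= 0xFEFC


def normalize_arabic(text: str) -> str:
    """Strip tatweels and expand presentation-form glyphs into base letters."""
    out = []
    for ch in text:
        if ch == TATWEEL:
            continue  # justification filler carries no meaning
        if _is_presentation_letter(ch):
            # NFKC maps e.g. U+FEFB (lam-alef ligature) to U+0644 U+0627.
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)
```

Normalizing character by character, rather than running NFKC over the whole string, leaves everything outside the Arabic presentation block (Latin ligatures, full-width digits) exactly as extracted.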

Privacy by default

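Files are removed from disk within 60 minutes. No accounts and no logs of file content. On the free path no third party sees your file; if you choose Mistral OCR, the file is sent to Mistral for processing. Your API key is used only for the request it accompanies and is never stored on our server.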
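A retention promise like "within 60 minutes" is usually kept by a periodic sweep. Below is a minimal sketch, assuming uploads live in a single directory and a scheduler runs the sweep every 10 minutes; the directory layout, interval, and names are hypothetical, not a description of the tool's server.

```python
"""Sketch: delete uploaded files older than a retention window."""
import time
from pathlib import Path
from typing import Optional

SWEEP_INTERVAL_SECONDS = 10 * 60  # hypothetical: how often a scheduler calls sweep()
# Retention plus sweep interval must not exceed 60 minutes to keep the promise.
RETENTION_SECONDS = 60 * 60 - SWEEP_INTERVAL_SECONDS


def sweep(upload_dir: Path, now: Optional[float] = None,
          retention: float = RETENTION_SECONDS) -> list[Path]:
    """Remove regular files whose mtime is older than `retention`; return them."""
    now = time.time() if now is None else now
    removed = []
    for path in upload_dir.iterdir():
        if path.is_file() and now - path.stat().st_mtime > retention:
            path.unlink(missing_ok=True)  # tolerate a concurrent delete
            removed.append(path)
    return removed
```

The worst case is a file that turns 50 minutes old just after a sweep; it survives until the next sweep 10 minutes later, which is why the retention is set below the advertised 60.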

Free up to 5 pages

No signup, no email. Drop a file, get the .docx. For more pages or better Arabic accuracy, bring your own Mistral key — free to set up, takes 3 minutes.
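The page-limit rule above can be read as a small routing decision. This sketch assumes that supplying a Mistral key both switches the engine to Mistral and lifts the 5-page cap, which is how the copy reads but is not confirmed; `choose_engine` and the return labels are hypothetical.

```python
"""Sketch of the free-tier page-limit rule (names and behavior are assumptions)."""
from typing import Optional

FREE_PAGE_LIMIT = 5


def choose_engine(page_count: int, mistral_key: Optional[str]) -> str:
    """Return 'mistral' or 'tesseract', or raise if the free limit is exceeded."""
    if mistral_key:
        return "mistral"      # own key: assumed to lift the free-tier cap
    if page_count <= FREE_PAGE_LIMIT:
        return "tesseract"    # free path, no key needed
    raise ValueError(
        f"{page_count} pages exceeds the free limit of {FREE_PAGE_LIMIT}; "
        "add a Mistral API key to continue"
    )
```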

Built by translators

Made by Oryx Translation — a studio that does this work daily. We know which corners can be cut and which can't. This tool removes the first 5% of any Arabic translation job. The rest is yours.