Content Areas & Crop Regions with PDFs
How do I set content areas on PDFs with IronOCR?
ContentAreas and PDFs
The OcrInput.AddPdf() and AddPdfPage() methods each accept an optional ContentArea.
The question: how do I know how large my content area is, given that PDFs are not sized in pixels but content areas are generally measured in them?
Option 1
OcrInput.TargetDPI (default 225) dictates the size of the rendered PDF image in pixels. IronOCR reads this rendered image.
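As a rough illustration of Option 1, the pixel size of a page can be estimated from its physical size and the DPI. The sketch below is plain arithmetic and calls no IronOCR APIs. It assumes the page size is known in PDF points (1 point = 1/72 inch) and that the page is rendered at exactly TargetDPI. PointsToPixels is a hypothetical helper name, and the renderer may round the result, so treat the numbers as an estimate and use Option 2 for exact values.

```csharp
using System;

// Hypothetical helper: convert a length in PDF points (1 pt = 1/72 inch)
// to pixels at a given render DPI. Not part of IronOCR.
double PointsToPixels(double points, int dpi) => points / 72.0 * dpi;

// Example: a US Letter page (612 x 792 pt) at the default TargetDPI of 225.
int dpi = 225;
double widthPx = PointsToPixels(612, dpi);   // 8.5 in * 225 = 1912.5
double heightPx = PointsToPixels(792, dpi);  // 11 in * 225 = 2475

Console.WriteLine($"Estimated page size: {widthPx} x {heightPx} px");
```

The same conversion applies to a content area measured in points or inches on the original page: convert each of X, Y, Width and Height to pixels before building the rectangle.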
Option 2 (ideal use case)
- Use OcrInput.AddPdf() with your PDF template
- Use OcrInput.Pages[0].Width and Height to get the size of the page image
- Use OcrInput.Pages[0].ToBitmap() to get the exact image the OCR engine will read
- You can now measure ContentAreas in pixels
To get your info:
using IronOcr;

var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    Input.AddPdf("example.pdf");
    // Save the exact image the OCR engine will read, so it can be measured
    Input.Pages[0].ToBitmap().Save("measure-me.bmp");
    // Size of that page image
    var width = Input.Pages[0].Width;
    var height = Input.Pages[0].Height;
}
Final Result:
using IronOcr;

var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // The area you want, in px
    var ContentArea = new System.Drawing.Rectangle() { X = 215, Y = 1250, Height = 280, Width = 1335 };
    Input.AddPdf("example.pdf", ContentArea);
    var Result = Ocr.Read(Input);
}
Documentation:
https://ironsoftware.com/csharp/ocr/object-reference/api/IronOcr.OcrInput.html
https://ironsoftware.com/csharp/ocr/object-reference/api/IronOcr.OcrInput.Page.html