Content Areas & Crop Regions with PDFs

How do I set content areas on PDFs with IronOCR?

ContentAreas and PDFs
The OcrInput.AddPdf() and AddPdfPage() methods each accept an optional ContentArea.

The question: how do I know how big my content area is, given that PDFs are not sized in pixels but content areas are generally measured in them?

Option 1
OcrInput.TargetDPI (default 225) dictates the size, in pixels, of the image rendered from each PDF page. That rendered image is what IronOCR reads.
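
As a rough guide to how a DPI setting translates into pixel dimensions, the sketch below converts PDF points to pixels. It assumes the page size is known in points (the default PDF unit, 1/72 inch). PdfUnits and PointsToPixels are hypothetical names for illustration and are not part of IronOCR, and the rounding IronOCR applies internally is not confirmed here, so treat the result as an estimate and use Option 2 for exact measurements.

```csharp
using System;

// Hypothetical helper (not part of IronOCR): estimates the rendered pixel size
// of a PDF dimension given in points, for a chosen DPI.
public static class PdfUnits
{
    // The default PDF user space unit is 1/72 inch.
    public const double PointsPerInch = 72.0;

    // pixels = points / 72 * dpi
    // Rounding to the nearest pixel is an assumption; the OCR engine may round differently.
    public static int PointsToPixels(double points, int dpi)
    {
        return (int)Math.Round(points / PointsPerInch * dpi, MidpointRounding.AwayFromZero);
    }
}
```

For example, a US Letter page is 612 x 792 points, so PdfUnits.PointsToPixels(792, 225) estimates a height of 2475 pixels at the default TargetDPI.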

Option 2 (ideal use case)

  1. Use OcrInput.AddPdf() with your PDF template
  2. Use OcrInput.Pages[0].Width and Height to get the page size in pixels
  3. Use OcrInput.Pages[0].ToBitmap() to get the exact image the OCR engine will read
  4. You can now measure ContentAreas in pixels

To get your info:

using IronOcr;

var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    Input.AddPdf("example.pdf");
    // Save the exact image the OCR engine will read, so it can be measured in an image editor
    Input.Pages[0].ToBitmap().Save("measure-me.bmp");
    // Page dimensions in pixels
    var width = Input.Pages[0].Width;
    var height = Input.Pages[0].Height;
}

Final Result:

using IronOcr;

var Ocr = new IronTesseract();
using (var Input = new OcrInput())
{
    // The area you want, in px
    var ContentArea = new System.Drawing.Rectangle()
        { X = 215, Y = 1250, Height = 280, Width = 1335 };
    Input.AddPdf("example.pdf", ContentArea);
    var Result = Ocr.Read(Input);
}

Documentation:
https://ironsoftware.com/csharp/ocr/object-reference/api/IronOcr.OcrInput.html
https://ironsoftware.com/csharp/ocr/object-reference/api/IronOcr.OcrInput.Page.html