Document Image Recognition (PDF Recognition)

SimpleTex document image recognition currently supports Chinese and English. It is the OCR interface used for PDF recognition in the formula- and chart-enhanced mode. [Note: This interface may change at any time and is currently intended only for testing and reference.]

API Call Method

Lightweight model API address: https://server.simpletex.net/api/doc_ocr
Model version: SimpleTex Doc OCR V1
Interface method: POST
Request parameters:
- Header: Authentication parameters (UAT or APP information)
- Body: Multipart/form-data
Other notes: Currently, this interface only supports uploading one file at a time and does not support batch calls.

Parameter details

Parameter Name	Parameter Type	Required	Description	Example
file	File	Yes	Binary image of a PDF page in PNG or JPG format. Batch processing is not supported; only one image can be uploaded per request	/
inline_formula_wrapper	String (JSON array)	No	Wrapper symbols for inline formulas in the Markdown output, given as a JSON array. If the value is malformed, the default wrapper symbols are used	["$","$"]
isolated_formula_wrapper	String (JSON array)	No	Wrapper symbols for isolated (display) formulas in the Markdown output, given as a JSON array. If the value is malformed, the default wrapper symbols are used	["$$","$$"]
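
The two wrapper parameters are form fields whose values are JSON-encoded arrays. A minimal sketch of building those fields, assuming each array holds an opening and a closing delimiter as in the examples above. The helper name `build_wrapper_fields` and the `\(`, `\)`, `\[`, `\]` delimiters are illustrative and not part of the API; whether the server accepts delimiters other than the defaults has not been verified here.

```python
import json


def build_wrapper_fields(inline=("$", "$"), isolated=("$$", "$$")):
    """Return form fields for the optional wrapper parameters.

    Hypothetical helper: each value is serialized to a JSON array string,
    which is the format the parameter table asks for.
    """
    return {
        "inline_formula_wrapper": json.dumps(list(inline)),
        "isolated_formula_wrapper": json.dumps(list(isolated)),
    }


# Example (assumed delimiters): \( ... \) for inline, \[ ... \] for isolated formulas.
fields = build_wrapper_fields(inline=("\\(", "\\)"), isolated=("\\[", "\\]"))
print(fields)
```

The resulting dictionary can be passed as the `data` argument of `requests.post` alongside the `file` upload.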

Pricing

Monthly API Calls (times)	Price (CNY/page)
<1000	Free
1000+	0.02

Concurrency Limits	Default Free Quota
Request Processing Concurrency	1
Normal Request QPS	1
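
Under the default free quota, requests are processed one at a time at about one request per second. A minimal client-side sketch that spaces sequential calls at least one second apart follows. The `Throttle` class is a hypothetical helper, not part of the SimpleTex API, and the one-second interval is an assumption derived from the QPS value in the table.

```python
import time


class Throttle:
    """Block so that consecutive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last = None      # time of the previous call; None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=1.0)
throttle.wait()  # the first call returns immediately
```

Calling `throttle.wait()` before each upload in a sequential loop keeps a single-threaded client within both limits.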

Response Example

Single file upload

{
  "status": true,  // Whether the API call was successful
  "res": {  // Call result
    "content": "..."  // Recognized content in Markdown
  },
  "request_id": "tr_16755479007123063412063155819"  // Request ID
}
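
A minimal sketch of reading this response on the client side, assuming a failed call reports a false `status` and may omit `res`; the error format is not documented here. `DocOcrError` and `extract_markdown` are hypothetical names introduced for illustration.

```python
import json


class DocOcrError(RuntimeError):
    """Raised when the response does not report a successful call."""


def extract_markdown(payload):
    """Return res.content from a decoded response, or raise DocOcrError."""
    if not isinstance(payload, dict) or not payload.get("status"):
        request_id = payload.get("request_id") if isinstance(payload, dict) else None
        raise DocOcrError(f"recognition failed (request_id={request_id})")
    return payload["res"]["content"]


# The inline comments in the example above are annotations; a real response is plain JSON.
sample = json.loads(
    '{"status": true, "res": {"content": "# Title"}, "request_id": "tr_0"}'
)
print(extract_markdown(sample))
```

Including the `request_id` in the error message makes it easier to quote when reporting a failed call.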

Sample Code

The following code can be used to convert PDF files to Markdown files. It uses the PyMuPDF library for reading PDF files, the PIL library for image processing, the requests library for file uploading, and the tqdm library for progress bar display.

First install the required libraries (PyMuPDF, requests, Pillow, tqdm) with pip:

pip install PyMuPDF requests Pillow tqdm

Detailed code

  import io
  import fitz
  from PIL import Image
  import requests
  from tqdm import tqdm
  
  UAT = "xxxxx"  # User Authorization Token
  
  def pillow_image_to_file_binary(image):
      """Encode a PIL image as PNG and return the raw bytes."""
      bytes_io = io.BytesIO()
      image.save(bytes_io, format='PNG')
      return bytes_io.getvalue()
  
  
  def convert_pdf_to_images(pdf_binary, dpi=100):
      """Render every page of a PDF (given as bytes) to a PIL image."""
      doc = fitz.open("pdf", pdf_binary)
      images = []
      for i in range(doc.page_count):
          page = doc[i]
          pixmap = page.get_pixmap(dpi=dpi)  # RGB without alpha by default
          image = Image.frombytes("RGB", (pixmap.width, pixmap.height), pixmap.samples)
          images.append(image)
      return images
  
  
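  def pdf_ocr(image):
      """Send one page image to the Doc OCR API and return its Markdown."""
      api_url = "https://server.simpletex.net/api/doc_ocr/"
      header = {"token": UAT}  # Authentication information, using the UAT method here
      img_file = {"file": pillow_image_to_file_binary(image)}
      # Upload the image as multipart/form-data with the requests library
      res = requests.post(api_url, files=img_file, data={}, headers=header).json()
      if not res.get("status"):  # "status" reports whether the call succeeded
          raise RuntimeError(f"Recognition failed: {res}")
      return res["res"]["content"]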
  
  
  if __name__ == '__main__':
      pdf_path = 'test.pdf'  # Input PDF file

      with open(pdf_path, 'rb') as f:
          file_binary = f.read()
      images = convert_pdf_to_images(file_binary)
      final_markdown_content = ""
      for image in tqdm(images):
          final_markdown_content += pdf_ocr(image) + "\n"

      # Save and output the final Markdown file
      with open("test.md", "w", encoding="utf-8") as f:
          f.write(final_markdown_content)
      print(final_markdown_content)

An asynchronous service for uploading PDF files directly is planned. During the current testing period, each user automatically receives 1000 free recognitions per day.