SimpleTex Documentation

This document explains how to use the SimpleTex Open Platform services and provides important information. If you have any questions, please feel free to contact the SimpleTex team.

Open Platform API Documentation

I. Open Platform Authentication

After registering a regular SimpleTex account and enabling the open platform in the User Center, you can access the open platform either with a User Authorization Token (UAT) or by signing each request with the APP ID and APP Secret of an open platform application.
The UAT method is for development only and should not be used in production environments. In addition, if your application is deployed on the client side rather than on your own server, do not call the core authenticated API endpoints directly from within the application (this would expose your credentials); instead, make requests through temporary application authorization tokens.

1. UAT Authentication

Access via a User Authorization Token (the simplest and quickest method).
How to obtain: Go to the User Center (https://simpletex.net/user/center) and create one in the "User Authorization Token" menu.
Request method: Place the user authorization token in the request header under the field name token, e.g. header={"token":"XXXXX"}
Note: To ensure the security of your application and account, please do not use this method for requests in production environments.
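The UAT request method above can be sketched in Python with the requests library, which the sample code in this document also uses. This is a minimal sketch, not an official client: the helper names uat_headers and post_image_with_uat are hypothetical, "XXXXX" is a placeholder token, and the endpoint in the usage comment is the standard formula recognition API documented later in this document. As noted above, use UAT only during development.

```python
# Hypothetical helpers (not part of any SimpleTex SDK) illustrating UAT authentication.


def uat_headers(token):
    """Return the request headers for UAT authentication (development use only)."""
    return {"token": token}


def post_image_with_uat(api_url, image_path, token):
    """POST one image as multipart/form-data, authenticating with a UAT."""
    import requests  # third-party (pip install requests); imported here so uat_headers works without it

    with open(image_path, "rb") as f:
        res = requests.post(api_url, files={"file": f}, headers=uat_headers(token))
    return res.json()


# Usage (placeholder token; performs a real network call):
# result = post_image_with_uat("https://server.simpletex.net/api/latex_ocr", "./image/1.png", "XXXXX")
```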

2. APP Authentication

Access through the Open Platform application method (APP method, a more secure approach)
How to obtain: Go to the User Center (https://simpletex.net/user/center), enable the open platform function, and create an application in the "Application List" menu. On creation, you will receive the APP ID and APP Secret of the new application. The APP Secret is sensitive information and is displayed only once; if you lose it, create a new application.
Request method:
The APP authentication method requires signing the POST form data (i.e., the form key-value pairs; the binary file parts are not signed). The signature is computed as follows:
(1) Generate a 16-character random string (digits and uppercase/lowercase letters) and place it in the random-str header field
(2) Get the current Unix timestamp in seconds and place it in the timestamp header field
(3) Place the APP ID in the app-id header field
(4) Take the keys of the form data together with the random-str, timestamp, and app-id fields from steps 1-3, sort all keys in ascending string order (a-z), and join each key and its value with &, such as key1=xxx&key2=xxx&...&keyn=xxx
(5) Append the APP Secret to the end of the string from step 4 as &secret=xxx (the secret is not part of the sort), giving key1=xxx&key2=xxx&...&keyn=xxx&secret=xxx
(6) Compute the MD5 hash of the string from step 5 and take its hexadecimal digest (32 characters long)
(7) Place this signature string in the sign header field
(8) The final header format for authentication should be:
```
header={
    "app-id":xxx,
    "random-str":xxx,
    "timestamp":xxx,
    "sign":xxx
}
```
(Note: Do not include the APP Secret anywhere else in the business request. It is used only to generate the signature that proves your identity and must not be sent in the request body.)
(9) The information required for APP authentication is now complete.
(10) Example
- Original form data: use_batch=True, written as a dictionary:
```
{
  "use_batch": True
}
```
- Obtain other required information:
```
{
   'timestamp': '1675550577',
   'random-str': 'mSkYSY28N4WkvidB',
   'app-id': '19X4f10YM1Va894nvFl89ikY',  // For testing purposes only
}
```
- For this example, the APP Secret is fu4Wfmna4153DFN12ctBsPqgVI3vvGGK. Sorting the keys gives app-id, random-str, timestamp, use_batch, and the secret is appended last, so the string to be signed is: app-id=19X4f10YM1Va894nvFl89ikY&random-str=mSkYSY28N4WkvidB&timestamp=1675550577&use_batch=True&secret=fu4Wfmna4153DFN12ctBsPqgVI3vvGGK
- Computing the MD5 digest of this string gives the signature 5f271e1deccd95d467c7dd430ca2c8b1 (you can check it with an online MD5 tool, e.g. http://tool.pfan.cn/md5)
- The final header information is:
```
{
  'timestamp': '1675550577',
  'random-str': 'mSkYSY28N4WkvidB',
  'app-id': '19X4f10YM1Va894nvFl89ikY',
  'sign': '5f271e1deccd95d467c7dd430ca2c8b1'
}
```
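Steps (4) to (6) can be checked offline against the worked example with a few lines of Python. This is a minimal sketch; build_sign is a hypothetical helper name. Values are converted to text with Python's default formatting, so True becomes "True", which matches the string to be signed in the example.

```python
import hashlib


def build_sign(params, secret):
    """Hypothetical helper: return (pre_sign_string, sign) following steps (4)-(6)."""
    # Step 4: sort all keys ascending and join key=value pairs with &
    pre_sign_string = "&".join(f"{key}={params[key]}" for key in sorted(params))
    # Step 5: append the secret last, outside the sort order
    pre_sign_string += "&secret=" + secret
    # Step 6: MD5 hexadecimal digest (32 characters)
    sign = hashlib.md5(pre_sign_string.encode("utf-8")).hexdigest()
    return pre_sign_string, sign


# Form data plus the three header fields from the worked example (test values only)
example_params = {
    "use_batch": True,
    "timestamp": "1675550577",
    "random-str": "mSkYSY28N4WkvidB",
    "app-id": "19X4f10YM1Va894nvFl89ikY",
}
pre_sign_string, sign = build_sign(example_params, "fu4Wfmna4153DFN12ctBsPqgVI3vvGGK")
print(pre_sign_string)
print(sign)  # compare with the sign value in the final header of the example
```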

II. API Response Information

1. Response Structure

In every API response, status indicates whether the request succeeded, res holds the result information, and request_id contains the ID of the request.

The standard response format is:

```
{
 "status": true/false, // Whether the interface was successfully called
 "res": { // Call result
     ...
 },
 "request_id": "tr_xxxxxxxxxx" // Request ID
}
```
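A client can check status before reading res. The sketch below assumes the response body has already been parsed into a dict (for example with requests' res.json()); the names SimpleTexError and unwrap_response are hypothetical. Because this document does not specify the layout of error details on failure, the whole payload is kept in the error message.

```python
class SimpleTexError(RuntimeError):
    """Hypothetical exception raised when a SimpleTex response has status false."""


def unwrap_response(payload):
    """Return the res field of a parsed SimpleTex response, or raise SimpleTexError."""
    if not payload.get("status"):
        # Keep request_id: it identifies the call when contacting the SimpleTex team
        raise SimpleTexError(f"request {payload.get('request_id')} failed: {payload}")
    return payload["res"]
```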

2. Error Codes

errType	HTTP Status Code	Description
api_not_find	404	API or corresponding version not found
req_method_error	405	Incorrect request method (e.g., GET instead of POST)
req_unauthorized	401	Authentication failed (every authentication error returns this same response, so check all authentication fields carefully)
resource_no_valid	402	No resources available to call the interface, such as no resource package or insufficient account balance
image_missing	413	No image file uploaded
image_oversize	413	Image file too large
sever_closed	503	Server not started/under maintenance
server_error	500	Internal server error
exceed_max_qps	429	Exceeded maximum QPS, please try again later
exceed_max_ccy	429	Exceeded maximum concurrent requests, please try again later
server_inference_error	500	Server inference error
image_proc_error	500	Error processing uploaded image
invalid_param	500	Server error caused by invalid parameters
too_many_file	500	Server error caused by too many files
no_file_error	500	Server error caused by no files
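Both exceed_max_qps and exceed_max_ccy return HTTP 429 and ask the caller to try again later. A common client-side response is retrying with exponential backoff; the sketch below is one such approach, not a SimpleTex-prescribed policy, and post_with_retry is a hypothetical name. It takes a callable so that, with APP authentication, headers can be rebuilt on every attempt (the document does not say whether a signature may be reused, so re-signing is the cautious assumption). Read the image into bytes first: an open file object would already be consumed after the first attempt.

```python
import time

RETRYABLE_STATUS_CODES = {429}  # exceed_max_qps and exceed_max_ccy ("please try again later")


def post_with_retry(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Hypothetical helper: call send() until it returns a non-429 response or attempts run out.

    send is a zero-argument callable returning a response object with a
    status_code attribute. The wait doubles after each 429 (exponential backoff).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE_STATUS_CODES or attempt == max_attempts - 1:
            return response
        sleep(base_delay * 2 ** attempt)


# Usage sketch (network call; url and headers as in the sample code):
# with open("./image/1.png", "rb") as f:
#     image_bytes = f.read()  # read once so every retry uploads the full file
# res = post_with_retry(lambda: requests.post(url, files={"file": ("1.png", image_bytes)}, headers=headers))
```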

III. Sample Code

Python sample code

2. APP Authentication Method

```python
import hashlib
import secrets
import string
import time

import requests

SIMPLETEX_APP_ID = "xxxxx"
SIMPLETEX_APP_SECRET = "xxxxxxxxxxxxxxx"


def random_str(length=16):
    """Return a random string of letters and digits for the random-str header."""
    chars = string.ascii_letters + string.digits
    return "".join(secrets.choice(chars) for _ in range(length))


def get_req_data(req_data, appid, secret):
    """Build the signed APP authentication headers for the non-file form data."""
    header = {
        "timestamp": str(int(time.time())),  # Unix timestamp in seconds
        "random-str": random_str(16),
        "app-id": appid,
    }
    # Merge form fields and auth header fields, then sort all keys ascending
    params = {**req_data, **header}
    pre_sign_string = "&".join(f"{key}={params[key]}" for key in sorted(params))
    # Append the secret last (not part of the sort) and take the MD5 hex digest
    pre_sign_string += "&secret=" + secret
    header["sign"] = hashlib.md5(pre_sign_string.encode()).hexdigest()
    return header, req_data


with open("./image/1.png", "rb") as f:
    img_file = {"file": f}
    data = {}  # Request parameter data (non-file parameters); fill in as needed, see each interface's parameter description
    header, data = get_req_data(data, SIMPLETEX_APP_ID, SIMPLETEX_APP_SECRET)
    res = requests.post("https://server.simpletex.net/xxxx", files=img_file, data=data, headers=header)

print(res.json())
```

Open Capabilities

I. SimpleTex Formula Recognition

The formula recognition model comes in two versions: lightweight and standard. The lightweight model is faster, while the standard model performs slightly better. Test both on your own images to choose the one that fits your scenario.

The dedicated formula recognition service currently supports text in more than 80 languages, as well as LaTeX symbols, matrices, chemical structure formulas, and complex equations, in both handwritten and printed form. To recognize document-type images, use the SimpleTex General Image Recognition API instead. Online demo: https://simpletex.net/ai/latex_ocr

Pricing Models: Pay-as-you-go and Prepaid Resource Packages

There are currently two billing methods: pay-as-you-go and deduction from prepaid resource packages. Calls are deducted in this order: Free Resource Package -> Paid Resource Package (sorted by expiration time, earliest first) -> Pay-as-you-go.
Only successful calls are billed. You can check your usage and related orders on the open platform. Note: In a Batch call, each successfully processed file/image counts as a separate call.
Pay-as-you-go API pricing (Note: the prices listed below are for reference only and are not the official prices that will apply once SimpleTex formally launches the service. If you have any questions, please contact us.)

1. Lightweight Formula Recognition Model API

Service Pricing (Lightweight)

Monthly Usage (Calls)	Price (CNY/Call)
<1000	Free
1000-5000	0.04
>5000	0.01

API Speed Limits

Limit Type	Default Free Quota
Request Processing Concurrency	5
Regular Request QPS	5
Batch Request QPS/Concurrency	25

API Usage Method

API Endpoint: https://server.simpletex.net/api/latex_ocr_turbo

Model Version: SimpleTex V2.5
Request Method: POST
Request Parameters:
- Header: Authentication parameters (UAT or APP information)
- Body: Multipart/form-data
Parameter Details

Parameter Name	Parameter Type	Required	Description	Example
file	File	Yes	A valid image file in png or jpg format. If batch requests are enabled, file names must be unique; otherwise, results for files with the same name will conflict and overwrite each other	/

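A single-image call to a formula recognition endpoint can be sketched as follows. This is a minimal sketch assuming the requests library; FORMULA_ENDPOINTS and recognize_formula are hypothetical names, and the two URLs are the lightweight endpoint above and the standard endpoint documented in the next section. With APP authentication and no extra form fields, the headers would come from signing an empty form, e.g. get_req_data({}, ...) from the sample code.

```python
# Hypothetical helper names; endpoints are the ones documented in this section and the next.
FORMULA_ENDPOINTS = {
    "turbo": "https://server.simpletex.net/api/latex_ocr_turbo",  # lightweight model
    "standard": "https://server.simpletex.net/api/latex_ocr",  # standard model
}


def recognize_formula(image_path, headers, model="turbo"):
    """Upload one image to a formula recognition endpoint and return the parsed JSON."""
    import requests  # third-party (pip install requests)

    with open(image_path, "rb") as f:
        # multipart/form-data body with the image under the "file" field
        res = requests.post(FORMULA_ENDPOINTS[model], files={"file": f}, headers=headers)
    return res.json()


# Usage with UAT authentication (development only; performs a network call):
# result = recognize_formula("./image/1.png", {"token": "XXXXX"})
# print(result["res"]["latex"], result["res"]["conf"])
```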
2. Standard Formula Recognition Model API

Service Pricing (Standard)

Monthly Usage (Calls)	Price (CNY/Call)
<1000	Free
1000-5000	0.05
>5000	0.02

Usage counts are reset at 00:00 on the 1st of each month, and billing follows the usage tiers above. The lightweight model provides 2000 free calls per day, and the standard model provides 500 free calls per day.

Prepaid Resource Packages (Due to business adjustments, please contact us for pricing)
If you need resource packages of other specifications, please contact us. Resource packages are non-refundable, so please estimate a reasonable number of uses before purchasing. If you need to upgrade the number of calls, please purchase a new resource package or use the pay-as-you-go billing method.

API Speed Limits

Limit Type	Default Free Quota
Request Processing Concurrency	2
Regular Request QPS	2
Batch Request QPS/Concurrency	10

QPS (queries per second) is the number of requests allowed per second, and request processing concurrency is the number of user requests the server processes simultaneously. QPS can be expanded by purchasing QPS add-on packages. In a Batch request, each object processed within the Batch occupies one unit of the Batch request QPS quota and one unit of request processing concurrency.

Example: Suppose an interface has both QPS and concurrency set to 1. If the server takes 0.3 s to process and respond to a request, the maximum request rate is limited by QPS to 1 request/s. If it takes 3 s, the maximum rate is limited by concurrency to 1 request every 3 s. (By Little's Law, concurrency = QPS × average processing time.)
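The example can be reproduced with a one-line calculation: a concurrency limit C caps the rate at C / latency, and the effective limit is the smaller of that and QPS. This is a rough upper bound that ignores network overhead; max_request_rate is a hypothetical helper name.

```python
def max_request_rate(qps_limit, concurrency_limit, avg_latency_s):
    """Hypothetical helper: upper bound on sustained requests per second under both limits.

    Little's Law: concurrency = rate * latency, so a concurrency limit C
    allows at most C / latency requests per second; QPS caps the rate directly.
    """
    return min(qps_limit, concurrency_limit / avg_latency_s)


print(max_request_rate(1, 1, 0.3))  # -> 1 (QPS-bound)
print(max_request_rate(1, 1, 3.0))  # -> 0.333... (concurrency-bound, 1 request every 3 s)
```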

For businesses with high request-rate requirements, please contact us to select and add QPS resource packages; for other special requirements, please also contact us.

API Usage Method

API Endpoint: https://server.simpletex.net/api/latex_ocr

Model Version: SimpleTex V2.5
Request Method: POST
Request Parameters:
- Header: Authentication parameters (UAT or APP information)
- Body: Multipart/form-data
Parameter Details

Parameter Name	Parameter Type	Required	Description	Example
file	File	Yes	A valid image file in png or jpg format. If batch requests are enabled, file names must be unique; otherwise, results for files with the same name will conflict and overwrite each other	/

3. API Response Examples

Single File Upload

```
{
  "status": true,  // Whether the API call was successful
  "res": { // Call result
      "latex": "a^{2}-b^{2}", // Recognized LaTeX; more fields may be added here in the future
      "conf":0.95 // Confidence level
  },
  "request_id": "tr_16755479007123063412063155819" // Request ID
}
```

Multiple Files Upload

```
{
  "status": true, // Whether the API call was successful
  "res": { // Call result
      "stats": { // Success and failure call statistics
          "fail": 0,
          "success": 2
      },
      "fail_res": {}, // Error information for failed image calls
      "success_res": { // Result information for successfully recognized images
          "test_1.png": {
              "latex": "a^{2}-b^{2}",
              "conf":0.95 // Confidence level
          },
          "test_2.png": {
              "latex": "a^{3}+b^{3}",
              "conf":0.90
          }
      }
  },
  "request_id": "tr_16755477466238226695895375638" // Request ID
}
```

Special Return Values

[EMPTY]: The image is empty.

[DOCIMG]: The image is a document. The formula model cannot output results for it, so use the General Image Recognition API instead.
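Code that handles both response shapes above can branch on whether success_res is present. This is a sketch under two assumptions: the special values arrive in the latex field, and entries in fail_res are simply skipped. latex_by_file and is_usable are hypothetical names, and the single-file result is returned under the empty-string key.

```python
SPECIAL_VALUES = {"[EMPTY]", "[DOCIMG]"}  # assumed to appear in the latex field


def latex_by_file(res):
    """Hypothetical helper: map file name -> LaTeX from the res field of a formula response.

    A batch response has success_res keyed by file name (fail_res is ignored);
    a single-file response has latex at the top level, returned under the key "".
    """
    if "success_res" in res:
        return {name: item["latex"] for name, item in res["success_res"].items()}
    return {"": res["latex"]}


def is_usable(latex):
    """False for the special return values [EMPTY] and [DOCIMG]."""
    return latex not in SPECIAL_VALUES
```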

2. SimpleTex General Image Recognition

SimpleTex General Image Recognition currently supports recognition of various text in over 80 languages as well as LaTeX symbols, matrices, and complex equations. It supports tables, mixed text layouts, document pages, double-column papers, and common handwritten/printed text recognition.

1. API Usage Method

Lightweight Model API Endpoint: https://server.simpletex.net/api/simpletex_ocr
Model Version: SimpleTex General OCR V1
Request Method: POST
Request Parameters:
- Header: Authentication parameters (UAT or APP information)
- Body: Multipart/form-data
Parameter Details

Parameter Name	Parameter Type	Required	Description	Example
file	File	Yes	A valid image file in png or jpg format. Batch upload is not supported; only one image can be uploaded at a time	/
rec_mode	String	No	Can be set to "auto", "document", or "formula" to specify the type of image recognition. "auto" will automatically detect the type, "document" will return markdown document results, and "formula" will return LaTeX results	"auto"
enable_img_rot	Boolean	No	When enabled, the model will automatically correct the orientation of the uploaded image based on 0°, 90°, 180°, 270°. Disabled by default	"false"
inline_formula_wrapper	JSON String Array	No	Used to modify the wrapper symbols for inline formulas in markdown. Input in JSON format. Default wrapper symbols will be used if format is incorrect	["$","$"]
isolated_formula_wrapper	JSON String Array	No	Used to modify the wrapper symbols for isolated formulas in markdown. Input in JSON format. Default wrapper symbols will be used if format is incorrect	["$$","$$"]
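Because the wrapper parameters are JSON string arrays sent as form fields, they need to be serialized before upload. The sketch below assumes the requests library; build_general_ocr_form and general_ocr are hypothetical names, and sending enable_img_rot as the lowercase string "true"/"false" follows the table's example rather than a confirmed rule. Headers are built from the final form data because, with APP authentication, the signature must cover these same form fields (step 4 of APP Authentication).

```python
import json


def build_general_ocr_form(rec_mode="auto", enable_img_rot=False,
                           inline_wrapper=("$", "$"), isolated_wrapper=("$$", "$$")):
    """Hypothetical helper: build the non-file form fields for the general image recognition API."""
    return {
        "rec_mode": rec_mode,  # "auto", "document", or "formula"
        "enable_img_rot": "true" if enable_img_rot else "false",
        # Wrapper parameters are JSON string arrays, so serialize them with json.dumps
        "inline_formula_wrapper": json.dumps(list(inline_wrapper)),
        "isolated_formula_wrapper": json.dumps(list(isolated_wrapper)),
    }


def general_ocr(image_path, make_headers, **form_options):
    """Upload one image to simpletex_ocr; make_headers(data) returns the auth headers."""
    import requests  # third-party (pip install requests)

    data = build_general_ocr_form(**form_options)
    headers = make_headers(data)  # APP auth must sign these same form fields
    with open(image_path, "rb") as f:
        res = requests.post("https://server.simpletex.net/api/simpletex_ocr",
                            files={"file": f}, data=data, headers=headers)
    return res.json()


# Usage (network calls):
# UAT: general_ocr("page.png", lambda data: {"token": "XXXXX"}, rec_mode="document")
# APP: general_ocr("page.png", lambda data: get_req_data(data, SIMPLETEX_APP_ID, SIMPLETEX_APP_SECRET)[0])
```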

2. API Pricing

Monthly Usage (Calls)	Price (USD/Page)
<1000	Free
1000-5000	0.1
>5000	0.04

Concurrency Limits	Default Free Quota
Request Processing Concurrency	1
Regular Request QPS	1

During the current testing period, 50 free recognitions are automatically provided daily.

3. SimpleTex Document Image Recognition (PDF Recognition)

SimpleTex Document Image Recognition currently supports recognition in both Chinese and English languages. It is the OCR interface used for PDF file recognition in the formula and chart enhancement mode. [Note: This interface may change at any time and is currently only provided for testing and reference purposes]

1. API Usage Method

Lightweight Model API Endpoint: https://server.simpletex.net/api/doc_ocr
Model Version: SimpleTex Doc OCR V1
Request Method: POST
Request Parameters:
- Header: Authentication parameters (UAT or APP information)
- Body: Multipart/form-data
Additional Notes: This API currently accepts only one file per request; batch processing is not supported.
Parameter Details

Parameter Name	Parameter Type	Required	Description	Example
file	File	Yes	A valid image file of a PDF page in png or jpg format. Batch upload is not supported; only one image can be uploaded at a time	/
inline_formula_wrapper	JSON String Array	No	Used to modify the wrapper symbols for inline formulas in markdown. Input in JSON format. Default wrapper symbols will be used if format is incorrect	["$","$"]
isolated_formula_wrapper	JSON String Array	No	Used to modify the wrapper symbols for isolated formulas in markdown. Input in JSON format. Default wrapper symbols will be used if format is incorrect	["$$","$$"]

2. API Pricing (New Lower Prices for 2025!)

Monthly Usage (Calls)	Price (USD/Page)
<1000	Free
1000+	0.003

Concurrency Limits	Default Free Quota
Request Processing Concurrency	1
Regular Request QPS	1

3. API Response Example

Single File Upload

```
{
  "status": true,  // Whether the API call was successful
  "res": { // Response result
      "content": "...", // Markdown content
  },
  "request_id": "tr_16755479007123063412063155819" // Request ID
}
```

4. PDF Recognition Example Code

The following code can be used to convert PDF files to Markdown files. It uses the PyMuPDF library for reading PDF files, the PIL library for image processing, the requests library for file uploading, and the tqdm library for progress bar display.

First, install the required libraries (PyMuPDF, requests, Pillow, and tqdm) with pip:

```
pip install PyMuPDF requests Pillow tqdm
```

Detailed Code

```python
import io

import fitz  # PyMuPDF
import requests
from PIL import Image
from tqdm import tqdm

UAT = "xxxxx"  # User Authorization Token


def pillow_image_to_file_binary(image):
    """Encode a Pillow image as PNG bytes for upload."""
    bytes_io = io.BytesIO()
    image.save(bytes_io, format="PNG")
    return bytes_io.getvalue()


def convert_pdf_to_images(pdf_binary, dpi=100):
    """Render every page of a PDF (given as bytes) to a Pillow RGB image."""
    doc = fitz.open(stream=pdf_binary, filetype="pdf")
    images = []
    for page in doc:
        pix = page.get_pixmap(dpi=dpi)
        images.append(Image.frombytes("RGB", [pix.width, pix.height], pix.samples))
    return images


def pdf_ocr(image):
    """Send one page image to the document OCR API and return its Markdown."""
    api_url = "https://server.simpletex.net/api/doc_ocr/"
    header = {"token": UAT}  # Authentication info, using the UAT method here
    img_file = {"file": pillow_image_to_file_binary(image)}
    res = requests.post(api_url, files=img_file, data={}, headers=header).json()  # Upload the page image
    print(res)
    return res["res"]["content"]


if __name__ == "__main__":
    pdf_path = "test.pdf"  # Input PDF file

    with open(pdf_path, "rb") as f:
        file_binary = f.read()
    images = convert_pdf_to_images(file_binary)
    final_markdown_content = ""
    for image in tqdm(images):
        final_markdown_content += pdf_ocr(image) + "\n"

    # Save and print the final Markdown
    with open("test.md", "w", encoding="utf-8") as f:
        f.write(final_markdown_content)
    print(final_markdown_content)
```

Direct asynchronous upload of whole PDF files will be supported in the future. During the current testing period, 1000 free recognitions are granted automatically each day.

Other open capabilities will be gradually released (General OCR, Word Embedding, Chatbot, Chinese-English Translation, Layout Analysis)
