Markdown formatting issue #2565

Open
3 tasks done
Richard-Wth opened this issue Jun 3, 2025 · 1 comment
Labels
bug Something isn't working

Comments

🔎 Search before asking

  • I have searched the MinerU Readme and found no similar bug report.
  • I have searched the MinerU Issues and found no similar bug report.
  • I have searched the MinerU Discussions and found no similar bug report.

Description of the bug

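[Image] In the Markdown file produced by parsing with magic-pdf, every heading is a level-1 heading. For example, both section 3 and subsection 3.1 in the image are output as level-1 headings.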

How to reproduce the bug

3 Arithmetic Reasoning

We begin by considering math word problems of the form in Figure 1, which measure the arithmetic reasoning ability of language models. Though simple for humans, arithmetic reasoning is a task where language models often struggle (Hendrycks et al., 2021; Patel et al., 2021, inter alia). Strikingly, chain-of-thought prompting when used with the 540B parameter language model performs comparably with task-specific finetuned models on several tasks, even achieving new state of the art on the challenging GSM8K benchmark (Cobbe et al., 2021).

3.1 Experimental Setup

We explore chain-of-thought prompting for various language models on multiple benchmarks.

Benchmarks. We consider the following five math word problem benchmarks: (1) the GSM8K benchmark of math word problems (Cobbe et al., 2021), (2) the SVAMP dataset of math word problems with varying structures (Patel et al., 2021), (3) the ASDiv dataset of diverse math word problems (Miao et al., 2020), (4) the AQuA dataset of algebraic word problems, and (5) the MAWPS benchmark (Koncel-Kedziorski et al., 2016). Example problems are given in Appendix Table 12.

Operating System Mode

Linux

Operating System Version

Ubuntu 22.04

Python version

3.12

Software version (magic-pdf --version)

1.3.x

Device mode

cuda

Richard-Wth added the bug label Jun 3, 2025

myym0 commented Jun 5, 2025

I have run into the same problem. In this situation, how can the Markdown be split into sections? Is there a more efficient way to do it?
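
One possible workaround, sketched here under assumptions and not taken from MinerU itself: when section titles carry dotted numbers such as 3 and 3.1, the heading depth can be inferred from the number of dots and the level-1 headings rewritten in a post-processing pass. The function name `relevel_headings` and the regular expression are hypothetical illustrations, not part of magic-pdf. Headings without a leading section number are left unchanged.

```python
import re

# Matches a level-1 Markdown heading whose text starts with a dotted section
# number, e.g. "# 3 Arithmetic Reasoning" or "# 3.1 Experimental Setup".
# (Hypothetical pattern for illustration; adjust it to your documents.)
NUMBERED_HEADING = re.compile(r"^#\s+(\d+(?:\.\d+)*)\.?\s+(.*\S)\s*$")


def relevel_headings(markdown: str, max_level: int = 6) -> str:
    """Rewrite numbered level-1 headings so depth follows the section number."""
    out = []
    in_fence = False
    for line in markdown.splitlines():
        # Do not touch lines inside fenced code blocks.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else NUMBERED_HEADING.match(line)
        if match:
            number, title = match.groups()
            # "3" -> level 1, "3.1" -> level 2, "3.1.2" -> level 3, ...
            level = min(number.count(".") + 1, max_level)
            line = f"{'#' * level} {number} {title}"
        out.append(line)
    return "\n".join(out)


sample = "# 3 Arithmetic Reasoning\n\n# 3.1 Experimental Setup"
print(relevel_headings(sample))
# # 3 Arithmetic Reasoning
#
# ## 3.1 Experimental Setup
```

Once the levels are restored, the file can be split into sections on the heading lines. This heuristic only helps for numbered headings; a numbered title such as "# 2021 Results" would also match and stay at level 1.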
