Automatically annotate keywords in the translated file #954

Open
whubao opened this issue May 26, 2025 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed Planned

Comments

@whubao

whubao commented May 26, 2025

Is your feature request related to a problem?

Add the following to pdfzh.py:

text_zh=["可是","然而","但","却","不过","尽管","虽然","即使","而","相反","反之","相比之下","尽管如此","贡献","创新性","挑战","不足","动机","限制","缺陷"]

and modify:
doc_zh = pymupdf.open('output-zh.pdf')
for page in doc_zh:
    for text in text_zh:
        rects = page.search_for(text)
        if rects:  # skip keywords with no match on this page
            page.add_highlight_annot(rects)
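
The snippet above could be wrapped in a small helper that also reports how many hits each keyword produced. This is only a sketch: `highlight_keywords`, `annotate_pdf`, and `KEYWORDS_ZH` are hypothetical names that do not exist in pdf2zh, and the keyword list is shortened for brevity. The helper relies only on the `search_for` and `add_highlight_annot` page methods already used above.

```python
from collections import Counter

# Shortened sample of the keyword list proposed in this issue.
KEYWORDS_ZH = ["然而", "但", "贡献", "挑战", "限制"]


def highlight_keywords(pages, keywords):
    """Highlight every hit of each keyword and return hit counts per keyword.

    `pages` is any iterable of objects that offer PyMuPDF's page methods
    `search_for` and `add_highlight_annot` (an opened document works).
    """
    counts = Counter()
    for page in pages:
        for word in keywords:
            rects = page.search_for(word)
            if not rects:  # keyword absent on this page: nothing to mark
                continue
            page.add_highlight_annot(rects)
            counts[word] += len(rects)
    return counts


def annotate_pdf(src, dst, keywords=KEYWORDS_ZH):
    """Hypothetical wrapper: open `src`, highlight keywords, write `dst`."""
    import pymupdf  # lazy import: only needed when touching real PDFs

    doc = pymupdf.open(src)
    try:
        counts = highlight_keywords(doc, keywords)
        doc.save(dst)  # a plain save; no timeout protection here
    finally:
        doc.close()
    return counts
```

Calling `annotate_pdf('output-zh.pdf', 'output-zh-annotated.pdf')` would write a highlighted copy, where the second file name is an assumption made for illustration. Short keywords such as "但" also match inside longer words, so the counts are probably best read as a rough indicator.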

Describe the solution you'd like

No response

Additional context

No response

@whubao whubao added the enhancement New feature or request label May 26, 2025
@awwaawwa awwaawwa added Planned help wanted Extra attention is needed good first issue Good for newcomers labels May 26, 2025
@awwaawwa
Collaborator

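This can be done on the pdf2zh side; BabelDOC has no plans to implement this feature for now. Waiting for a kind contributor to pick it up.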

@awwaawwa
Collaborator

https://github.com/funstory-ai/BabelDOC/blob/604e25ead71da561e18873394906f4fa737ee9db/babeldoc/high_level.py#L907

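When implementing this feature, please refer to the code above and use the PDF save function that BabelDOC already wraps in a subprocess with a timeout. In our experience, some PDFs cause pymupdf to hang outright.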
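
For readers unfamiliar with the pattern, here is a generic, standard-library sketch of "do the save in a child process and give up after a timeout". It is not BabelDOC's actual helper, which is the function at the link above; `run_snippet_with_timeout`, `save_pdf_with_timeout`, and `_SAVE_SNIPPET` are hypothetical names chosen for illustration.

```python
import subprocess
import sys

# Hypothetical child script: re-open a PDF and save it to a new path.
# It runs in a separate interpreter, so a hang cannot block the caller.
_SAVE_SNIPPET = """
import sys
import pymupdf
doc = pymupdf.open(sys.argv[1])
doc.save(sys.argv[2])
"""


def run_snippet_with_timeout(snippet, args, timeout_s):
    """Run `snippet` in a fresh Python process.

    Returns True on a clean exit, False on timeout or non-zero exit code.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet, *args],
            timeout=timeout_s,  # the child is killed once this expires
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def save_pdf_with_timeout(src, dst, timeout_s=60.0):
    """Save `src` to `dst` in a child process, giving up after `timeout_s`."""
    return run_snippet_with_timeout(_SAVE_SNIPPET, [src, dst], timeout_s)
```

When this returns `False`, the caller can skip the annotation step or fall back to the unannotated file instead of freezing. The 60-second default is an arbitrary assumption.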
