diff --git a/OSPP__toWord/document_sample.md b/OSPP__toWord/document_sample.md
new file mode 100644
index 0000000000..f9996220ee
--- /dev/null
+++ b/OSPP__toWord/document_sample.md
@@ -0,0 +1,248 @@
+
+
+Original Text: 新媒体与社会
+
+New Media and Society
+
+ISSN, CN
+
+# Online First Paper of New Media and Society
+
+Title: Formal Trust and Progressive Reductio ad Absurdum: Research on the Credibility Evaluation of Artificial Intelligence-Generated Content
+
+Authors: Yang Yanni, Wu Leilong, Xiang Anling, Zhang Jiacheng
+
+Online First Date: 2025-09-11
+
+Citation Format: Yang Yanni, Wu Leilong, Xiang Anling, Zhang Jiacheng. Formal Trust and Progressive Reductio ad Absurdum: Research on the Credibility Evaluation of Artificial Intelligence-Generated Content [J/OL]. New Media and Society.
+
+https://link.cnki.net/urlid/CN.20250910.1710.008
+
+
+
+
+
+# Formal Trust and Progressive Reductio ad Absurdum: Research on the Credibility Evaluation of Artificial Intelligence-Generated Content\*
+
+Yanni Yang, Leilong Wu, Anling Xiang, Jiacheng Zhang\*\*
+
+Abstract Establishing a scientific and effective credibility evaluation system for Artificial Intelligence-Generated Content (AIGC) is an important topic in current human-machine trust research. This study integrates an examination of AIGC's media attributes from the perspective of intelligent communication and constructs a comprehensive, systematic AIGC credibility evaluation system from the bottom up, ranging from the underlying technical architecture and intermediary media channels to user-interactive information content. Based on the analytic hierarchy process and thorough expert deliberation, the main factors influencing AIGC credibility are clarified. The research shows that content credibility is the primary factor affecting users' perception of AIGC credibility; output results and training data have a significant impact on system credibility, neutrality and social influence affect media credibility, and substantive and formal credibility influence content credibility. Optimization strategies are proposed to enhance AIGC credibility in terms of system technology, media platforms, content cues, and user reverse verification, providing references for the optimization design of AIGC products, user trust building, and information ecosystem governance.
+
+Keywords: Artificial Intelligence-Generated Content (AIGC); credibility evaluation framework; formal trust; progressive reductio ad absurdum
+
+## I. Research Background: The Challenge of Trustworthy Communication in the Context of AIGC
+
+Currently, the production of global information content is undergoing a new generational revolution. With the support of technologies such as large language models, Artificial Intelligence Generated Content (AIGC) has surpassed traditional Professional Generated Content (PGC) and User Generated Content (UGC) in terms of production scale and output efficiency, gradually becoming the new engine of media content production in the Web3.0 era. As AIGC accelerates its penetration into industrial applications and mass communication, the human-centered paradigm of content production and dissemination has begun to shift towards an ecosystem of human-machine collaboration and symbiosis. The quasi-subjective status of AI in agenda-setting, content gatekeeping, and information distribution has become prominent, leading to imbalances and reconfigurations in information production modes and information power. The resulting human-machine trust issues have become a bottleneck for the development of AIGC.
+
+Particularly in the supply of trustworthy content, current AIGC products have faced significant criticism. On the one hand, limited by the openness of training corpus data for large language models and their probabilistic content generation logic, errors such as conceptual confusion, factual splicing, temporal and spatial dislocation, and false causation frequently occur in the content generated by some AIGC products, making fact-checking a "hard injury" for most AIGC products. On the other hand, although AIGC-related algorithms themselves have no inherent moral attributes, this technology has been widely used for negative purposes such as political manipulation and unfair business competition [1], such as AI-generated fake news and mass-generated false information [2]. Additionally, multimodal AIGC, including images, videos, and audio, possesses strong visual ambiguity, further exacerbating the spread of false and misleading information, posing new challenges to international politics, society, and human rights development [3].
+
+In response to issues such as content inaccuracy and human-machine trust arising from AIGC, multiple countries and regions have strengthened institutional constraints and legal protections at the top-level design level. China has also introduced policy documents such as the "Ethical Guidelines for the New Generation of Artificial Intelligence," "Provisions on the Administration of Deep Synthesis in Internet Information Services," and the "Administrative Measures for Generative Artificial Intelligence Services (Draft for Comments)," explicitly stating that "content generated using generative artificial intelligence should be true and accurate." Both domestic and international AIGC platforms have also improved content quality through multiple constraints such as algorithmic iteration, model correction, and manual review, but these efforts are mostly limited to political information and sensitive content. Trustworthy communication in general domains remains a pain point for industry development. From the user perspective, compared to the critical attitude of users in European and American countries towards AIGC, Chinese users exhibit a more positive attitude towards AIGC applications [4], with their perceived credibility of algorithm-generated news even surpassing their trust in human journalists [5]. If this high perceived trust and optimistic orientation towards AIGC lack rational guidance, it may facilitate the spread of false information, posing new risks to the public opinion environment and information security.
+
+With the accelerated penetration of AIGC products into the mass application market, questions arise regarding users' perceived trust in these products, the factors influencing this trust, and how to effectively evaluate the credibility of AIGC. These issues require further exploration. Against this backdrop, this study takes current typical text-based AI-generated content domestically and internationally as the research object and constructs an AIGC credibility evaluation system from three dimensions—system credibility, media credibility, and content credibility—based on the Trustworthy AI Framework, Media Trust, and Dual-Process Theory. This aims to uncover subjective perceptual factors and objective technological factors influencing credibility, providing theoretical references and practical guidance for intelligent communication and information ecosystem governance.
+
+## II. Literature Review: Trustworthy Systems, Trustworthy Media, and Trustworthy Content
+
+The research theme of credibility exhibits significant interdisciplinary characteristics, with disciplines such as communication studies, management, computer science, and psychology conducting relevant explorations. Scholars' interpretations of credibility vary. Overall, the analysis of AIGC credibility mainly unfolds from the following perspectives. First, credibility is viewed as an entity composed of multiple facets or dimensions. Wu Dan et al. analyzed the impact of content integrity, question-answering interactivity, and tool anthropomorphism on the credibility of generative intelligent search from three dimensions [6]. Jiang Zhongbo et al. measured credibility using five items: accuracy, trustworthiness, bias, authenticity, and authority [5]. Second, credibility is explained through its sources. Fogg et al. proposed interpreting credibility from the perspective of information sources, such as presumed credibility derived from stereotypes, reputational credibility generated by third-party endorsements, surface credibility obtained from external characteristics, and experiential credibility judged from experience [7]. Third, the subjects of credibility are categorized for analysis. Song Shijie et al. proposed constructing an AIGC credibility research framework from four perspectives: information sources, interaction modes, social actors, and algorithmic metaphors [8]. Liu Haiming and Li Jiaguai argued that in human-machine interaction scenarios, trust originates on the one hand from the reliability, credibility, and responsible content generation of algorithmic systems, and on the other hand is reflected in the anthropomorphic characteristics of chatbots that bring positive responses to user interaction processes [9]. Additionally, some scholars name credibility based on the research objects involved, such as review credibility and advertisement credibility.
+
+Building on credibility research across multiple traditional disciplines, the credibility of AIGC has emerged as a new research topic. With the widespread application of artificial intelligence technology, besides evaluating AI's credibility from a system technology perspective by treating it as a software system and analyzing content credibility from an information content perspective as in traditional research, scholars have recognized the value and importance of AI as a medium. Therefore, in the interpretation of AIGC credibility, it is also necessary to examine its credibility from a media dimension. A comprehensive review of typical domestic and international research findings on credibility from the three dimensions of system credibility, media credibility, and content credibility is presented in Table 1, with only a few examples listed due to space constraints.
+
+Table 1 Research Findings Related to Credibility Evaluation at Home and Abroad
+
+| Evaluation Perspective | Evaluation Dimension | Researcher | Evaluation Approach |
+| --- | --- | --- | --- |
+| Source | System Credibility | EU [10] | Accountability, Inclusiveness, Autonomy, Fairness, Privacy, Robustness, Security, Transparency |
+| | | OECD [11] | Sustainable Development, Values, Fairness, Transparency, Explainability, Robustness, Security, Safety |
+| | | Singh [12] | Fairness, Explainability, Robustness, Privacy, Security, Appropriateness |
+| | | Fujii [13] | Integrity, Robustness, System Quality, Agility |
+| | | He Jifeng [14] | Robustness, Self-Reflection, Adaptability, Fairness |
+| Channel | Media Credibility | Schweiger [15] | Media Type, Media Subcategory, Media Product, Editorial Unit, Information Creator, Information Presenter |
+| | | Metzger [16] | Source, Information, and Channel Transmitting Information |
+| | | Li Xiaojing [17] | Credibility of Media Organizations/Journalists (Source), Credibility of News Reports (Information), etc. |
+| | | Zhang Hongzhong and Ren Wujiong [18] | Public Trust in Mass Media, Trust in Social Media, Human-Machine Trust Based on Machine Identity, Human-Machine Trust Based on Language Dialogue |
+| Content | Content Credibility | Sundar [19] | MAIN Model (From Technological Affordance to Credibility Judgment) |
+| | | Flanagin and Metzger [20] | Dual-Process Model (Heuristic, Systematic) |
+| | | Hilligoss and Rieh [21] | Integrated Framework for Credibility Assessment (Construction Layer, Exploration Layer, Interaction Layer) |
+
+From the perspective of system credibility, credible evaluation is an indispensable part of ensuring system credibility. Currently, there are many relevant achievements in the evaluation of artificial intelligence system credibility, which are applied to the credibility evaluation of various artificial intelligence systems. The European Commission proposed a draft ethical guideline for trustworthy artificial intelligence in 2019 [10], outlining 10 basic requirements for trustworthy AI. Building on this, the Organization for Economic Cooperation and Development (OECD) added long-term indicators such as inclusive growth, sustainable development, and well-being [11]. The China Academy of Information and Communications Technology has gradually constructed and improved the "Trustworthy AI" evaluation system, focusing on evaluating the service capabilities of AI products, the maturity of application and management, and credible risks [22]. Singh et al. proposed six credible attributes for artificial intelligence: fairness, interpretability, robustness, privacy, security, and propriety [12]. Fujii et al. placed greater emphasis on integrity, robustness, system quality, and agility [13]. He Jifeng also proposed that artificial intelligence should possess robustness, self-reflection, adaptability, and fairness [14].
+
+From the perspective of media credibility, with the emergence of numerous new media technologies, there have been significant shifts in the types of media that audiences are exposed to, their media usage habits, and their media cognition and judgment. Schweiger summarized six dimensions of media credibility evaluation by Western scholars, namely media type, media subcategory, media product, editorial unit, information creator, and information presenter [15]. Metzger et al. suggested measuring the source, information, and the channels through which information is transmitted separately [16]. Li Xiaojing, based on China's actual media environment and audience characteristics, focused her research on the credibility of media institutions/journalists (sources) and news reports (information) [17]. Zhang Hongzhong et al. classified media trust into mass media credibility based on content and carrier, social media trust based on value identification, human-machine trust based on machine identity, and human-machine trust based on language dialogue [18].
+
+From the perspective of content credibility, the current scale of information content production by humans and machines has reached unprecedented levels, making the evaluation of information content credibility an important issue. The evaluation of content credibility involves a wide range of fields, from traditional content resources to UGC, PGC, AIGC, etc. Many scholars have conducted relevant research. Sundar proposed the MAIN model to evaluate information credibility in the new media environment [19]. Flanagin and Metzger proposed a dual-processing model, suggesting that there are two paths for users to evaluate information credibility: heuristic processing and systematic processing [20]. Hilligoss and Rieh proposed an integrated framework for credibility evaluation, arguing that the evaluation process involves three levels: the construction level, the exploration level, and the interaction level, corresponding to the user's definition of credibility, heuristic thinking, and judgment based on information cues, respectively [21].
+
+In summary, due to the unique nature of AIGC systems in human-machine dialogue, the evaluation of AIGC credibility needs to be conducted from three dimensions: system credibility, media credibility, and content credibility. On the one hand, it involves drawing on previous research results; on the other hand, it focuses on the credibility research of AIGC systems themselves to explore methods for measuring their credibility. This not only requires expanding research perspectives but also necessitates the construction of a comprehensive and systematic credibility evaluation system from the bottom up, addressing the underlying technological architecture, intermediate media channels, and user-interactive information content, in line with the development characteristics of AIGC.
+
+## III. Evaluation Framework: Construction and Evaluation Methods of the AIGC Credibility Evaluation System
+
+To further explore the credible cues of AIGC and the credibility differences among different products, this study constructed an AIGC credibility evaluation system through preliminary literature research and analyzed the indicators in combination with expert research scores to build an AIGC credibility evaluation framework. Based on this, the weights of various indicators were determined through the analytic hierarchy process to clarify the importance of relevant factors.
+
+## (I) Construction of the AIGC Credibility Evaluation System
+
+Based on widely adopted evaluation approaches in relevant domestic and international research, namely, source (source credibility), channel (media credibility), and content (content credibility) as the three major theoretical research directions [17], this study evaluated AIGC from three dimensions: system credibility, media credibility, and content credibility, as shown in Figure 1. Based on preliminary literature research and considering the technological and product application characteristics of AIGC, existing indicators of system, media, and content credibility were summarized. This led to the preliminary formulation of AIGC credibility evaluation indicators, thereby determining the evaluation indicator framework. Among them, the analysis of system credibility is a core dimension directly related to the AIGC technological architecture. The examination of content credibility represents a continuation and inheritance of traditional credibility research, while the focus on media credibility represents an expansion of the intelligent communication perspective in the field of credibility research.
+
+
+Figure 1 AIGC Credibility Evaluation System
+
+### 1. System Credibility
+
+From the perspective of system technology, "trustworthiness" is a system metric developed based on concepts such as "reliability" and "security." It represents an overall evaluation by humans of various trustworthy attributes during the research, development, and application of systems [23]. It reflects not only the objective performance of the system itself but also the subjective perception of users towards the system. Compared to early trustworthy systems that focused on hardware devices, trustworthy artificial intelligence (trustworthy AI) places more emphasis on trustworthy attributes at the software level. It refers to AI systems that, during the design, development, deployment, and use processes, aim to gain the trust and acceptance of users and society based on attributes such as product performance and risk assurance [24].
+
+The trustworthiness of AI systems relies on a substantial amount of high-quality, reliable data support, appropriate algorithm model applications, and expected output results. Considering the characteristics of AIGC technology and existing relevant research, for AIGC products, the evaluation of system credibility should comprehensively consider three aspects: underlying training data, algorithm models, and output results. The noise, biases, and privacy infringement risks hidden in the underlying data significantly impact the credibility of the generated content. At the algorithm model level, generative AI, through reinforcement learning based on human feedback, aligns to some extent with human common sense, cognition, needs, and even values. It also introduces filtering mechanisms to block sensitive issues (such as violence, crime, discrimination, etc.), reducing dissemination risks by refusing to answer or providing relatively neutral and safe responses. This ethical constraint and optimization mechanism at the algorithmic level ensure content security to a certain extent. However, the "malleable" nature of algorithms also increases the "discipline" risks of AI itself, and the risk of content untrustworthiness caused by malicious training cannot be ignored. Additionally, limited by the underlying computational logic of statistical language models and the effectiveness of interactive instructions (prompts), AI output results also exhibit instability.
+
+Based on relevant research, this study focuses on analyzing the training data layer from perspectives such as dataset transparency, source reliability, data robustness, data coverage, data scalability, noise removal, and source security. The algorithm model layer is evaluated based on model interpretability, robustness, stability, adaptability, privacy protection, and malleability. For output results, indicators such as the accuracy, error rate, precision, recall, and F-score of the generated content need to be analyzed.
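The output-result indicators named above (accuracy, error rate, precision, recall, F-score) can be illustrated with a minimal sketch. The labels below are hypothetical placeholders for a binary credible/not-credible judgment of generated statements, not data from this study:

```python
# Illustrative sketch: standard output-quality metrics for a binary
# credible (1) / not-credible (0) labelling of generated statements.
# The example data is hypothetical, not from the study.

def output_metrics(y_true, y_pred):
    """Accuracy, error rate, precision, recall, and F1 from parallel
    lists of gold labels and predicted labels (1 = credible)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "error_rate": 1 - accuracy,
            "precision": precision, "recall": recall, "f1": f1}

# Six generated statements judged against a hypothetical gold standard
m = output_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```

Note that precision and recall trade off against each other, which is why the F-score (their harmonic mean) is listed alongside them as a combined indicator.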
+
+### 2. Media Credibility
+
+In the context of AIGC, the boundaries between media organizations, algorithm platforms, and content producers tend to blur. AI algorithms, as a technological means, embed capabilities for content production, distribution, and agenda-setting, gradually demonstrating a trend of deep mediatization. As an emerging media form, if technological credibility considers AIGC from its underlying operational logic, media credibility evaluates AIGC more as a media carrier and communication entity in terms of its credibility.
+
+Media credibility refers to whether technological channels and specific information dissemination organizations are trustworthy [25]. In early research on media credibility, the competency (i.e., the degree of professionalization) of communicators was a core factor influencing credibility [26]. Competency includes not only the depth and authority of media in professional fields but also their professionalism in information gatekeeping, packaging, and distribution. It directly determines whether the media can output correct and trustworthy information and also bears on the media's brand endorsement, i.e., whether the media can take responsibility for the authenticity of the information content. Subsequent research has added factors such as unselfishness and consistency [27,28] to media competency. "Unselfishness" concerns whether the media's communication motives are legitimate and pure, i.e., whether the media itself has any interest-based associations with the subjects it reports on or the issues it covers. For content output platforms, consistency is reflected more in whether value biases and contradictions exist. In addition to ensuring professionalism, unselfishness, and consistency in content output, external means such as effective interaction with audiences, providing feedback loops, establishing transparency and ethical standards, and certification by independent third-party organizations can also help media enhance their credibility [29,30].
+
+Based on relevant literature reviews and considering the technological characteristics of AIGC, this study evaluates its media credibility from four aspects: professional service capabilities, social influence, neutrality of stance, and the establishment of safeguard mechanisms. Professional service capabilities mainly focus on the content accumulation of AIGC platforms in professional fields, collaborations with professional media, and standardized process mechanisms. Social influence includes the platform's social visibility, social evaluation, third-party certifications, and endorsements. Neutrality of stance encompasses the platform's unselfishness, consistency, independence, and profitability. The establishment of safeguard mechanisms involves the platform's user feedback mechanisms, accountability mechanisms, privacy mechanisms/data security mechanisms, etc., and relates to whether the platform can ensure information quality and security.
+
+### 3. Content Credibility
+
+According to the dual-process theory, humans often activate dual cognitive systems when processing and reasoning about information: one is a fast, intuitive, and emotional processing mode, and the other is a slow, controlled, and rational thinking mode [3]. The former can process a large amount of information in a short time, quickly forming judgments and generating answers, while the latter relies on logical reasoning and abstract thinking to form judgments and verify answers. In terms of trust mechanisms, intuitive information processing tends to elicit "form-based trust," while controlled information processing forms "substantive trust." Form-based trust rests more on the sense of identification generated by the surface features and credible cues of the content. Different credible cues form cognitive shortcuts by activating users' existing cognitive structures and attitude tendencies, further helping users judge the credibility of the content. From the perspective of content attributes, these cues include the degree of emotionalization of information, the use of multimedia forms in the content, whether other users are mentioned, the reputation of external links/sources included in the information, and the use of specific vocabulary [32]. From the perspective of external cues, they include social endorsement cues (such as data on reposts, likes, and comments), temporal cues (such as update frequency and timeliness), and reputation cues (such as historical performance, third-party certifications, and integrity records). Form-based trust often determines users' initial assessment of content credibility, and this initial credibility plays a significant mediating role in users' subsequent belief changes [33].
+
+Compared to form-based trust, substantive trust mainly explores the credibility of the essence of the content. This includes the objectivity and accuracy of the content, whether there are obvious biases or stance tendencies, and whether there are obvious errors or false information [34]. Elements such as the completeness, consistency, and timeliness of the content also determine its essential credibility, while users' own knowledge levels, experiences, and judgment abilities directly influence their differences in perception [35]. The formation of substantive trust often requires users to engage in thoughtful analysis and judgment of the content and progressive source-tracing reasoning. In terms of both depth and duration, substantive trust is stronger than form-based trust. Faced with issues such as widespread errors, factual inconsistencies, political biases, information security, and ethical risks in current AIGC products, analyzing credible cues at both the formal and substantive levels and engaging in progressive falsification are considered necessary.
+
+Based on the above discussion, this study divides the content credibility of AIGC into two aspects: formal credibility and substantive credibility. Formal credibility is primarily analyzed from the structural and linguistic characteristics of AIGC, encompassing aspects such as citation norms, response consistency, expression precision, source professionalism, authoritative endorsement, and information noise. Substantive credibility, on the other hand, is mainly evaluated based on the essential attributes of the content, including content authenticity, content accuracy, content objectivity, information timeliness, information completeness, information consistency, quality robustness, content security, and content fairness.
+
+## (II) AIGC Credibility Evaluation Methods
+
+Scholars both domestically and internationally have proposed various subjective or objective evaluation methods for decision analysis targeting specific objectives. The Analytic Hierarchy Process (AHP), introduced by American operations researcher Professor T.L. Saaty in the 1970s, quickly gained widespread application across various fields for its ability to quantitatively analyze qualitative problems through a simple and flexible multi-criteria decision-making approach. Its core idea is to decompose complex decision-making problems into multiple levels and determine the weight coefficients of various factors by constructing a judgment matrix and conducting consistency checks. In this study, we first assembled an expert team to ensure the consistency and effectiveness of team evaluations. Subsequently, based on the Analytic Hierarchy Process, we established a hierarchical structure model. The expert team evaluated the indicators within the aforementioned credibility evaluation framework, constructed a judgment matrix, and conducted consistency checks to calculate and determine the weights of indicators at all levels.
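The AHP procedure described above can be sketched in a few lines: compute priority weights from a pairwise judgment matrix and run the consistency check against Saaty's random index. The 3x3 matrix below is a hypothetical placeholder for the criterion layer (system, media, content credibility), not the expert panel's actual judgments; the geometric-mean method is used as a standard approximation of the principal eigenvector:

```python
# Minimal AHP sketch (geometric-mean approximation), assuming a
# hypothetical 3x3 judgment matrix for the criterion layer.
from math import prod

def ahp_weights(matrix):
    """Return (weights, consistency_ratio) for an n x n judgment matrix.
    Weights come from normalized row geometric means; the consistency
    ratio CR = CI / RI should be below 0.1 to accept the judgments."""
    n = len(matrix)
    gm = [prod(row) ** (1.0 / n) for row in matrix]   # row geometric means
    total = sum(gm)
    w = [g / total for g in gm]                       # normalized weights
    # Estimate the principal eigenvalue lambda_max as mean of (A w)_i / w_i
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1)                          # consistency index
    ri = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}[n]  # Saaty's random index
    cr = ci / ri if ri else 0.0
    return w, cr

# Hypothetical judgments: content credibility most important,
# then system credibility, then media credibility.
A = [[1,   2,   1/2],
     [1/2, 1,   1/3],
     [2,   3,   1  ]]
weights, cr = ahp_weights(A)
```

With these placeholder judgments, the content-credibility row receives the largest weight and the consistency ratio falls well under the 0.1 acceptance threshold, mirroring the check the study performs on the experts' matrices.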
+
+### 1. Expert Team Assembly and Training
+
+The expertise and experience of evaluation experts directly influence the accuracy and reliability of annotation results. To this end, this study selected five experts with extensive experience in the fields of artificial intelligence and natural language processing, including two algorithm engineers, two research scholars in related disciplines, and one senior media practitioner. These experts possess in-depth experience with AIGC products and can effectively understand various instructional documents. Prior to conducting evaluation annotations, staff members introduced the purpose and background of this study, explained the definitions of each indicator, and provided examples and simulated scenarios to ensure that all experts had a clear understanding of the evaluation methods. To ensure the stability and reliability of the evaluation results, this study employed Cohen's Kappa standard measurement method to conduct consistency checks on expert scores. The results showed a Kappa value of 0.647, with P < 0.001, indicating that the experts' understanding and evaluation of each indicator were generally consistent, allowing for subsequent credibility evaluations. Although the scoring process involves subjective judgment, this study ensured the accuracy and reliability of the evaluation results as much as possible by strictly adhering to the diversity of the expert team and the standardization of the scoring methods.
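The Cohen's kappa agreement check used above can be sketched as follows for two raters; the ratings are illustrative placeholders, not the study's expert scores:

```python
# Sketch of a Cohen's kappa inter-rater agreement check between two
# raters' scores on the same items (hypothetical data, not the study's).
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two equal-length label sequences:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater1)
    po = sum(1 for a, b in zip(rater1, rater2) if a == b) / n  # observed
    c1, c2 = Counter(rater1), Counter(rater2)
    labels = set(rater1) | set(rater2)
    pe = sum(c1[l] * c2[l] for l in labels) / (n * n)          # by chance
    return (po - pe) / (1 - pe)

# Hypothetical 5-point ratings from two experts on ten items
a = [5, 4, 4, 3, 5, 2, 4, 3, 5, 4]
b = [5, 4, 3, 3, 5, 2, 4, 4, 5, 4]
kappa = cohens_kappa(a, b)
```

A kappa in the 0.6-0.8 band, like the 0.647 reported in the study, is conventionally read as substantial agreement, which is why the panel's scores were deemed consistent enough for the subsequent evaluation.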
+
+### 2. Indicator Evaluation and Weight Determination
+
+## (1) Establishing a Hierarchical Structure Model
+
+Based on the previously constructed AIGC credibility evaluation framework, the Analytic Hierarchy Process was employed to determine the weights of primary and secondary indicators. First, a hierarchical structure was established, divided into the objective layer, criterion layer, and sub-criterion layer. The objective layer is the AIGC credibility evaluation (U), and the criterion layer includes three primary indicators: system credibility (U1), media credibility (U2), and content credibility (U3). These three main factors correspond to several sub-factors, forming the sub-criterion layer, which includes nine sub-factors such as training data credibility ($\mathrm{U}_{11}$), algorithm model credibility ($\mathrm{U}_{12}$), and output result credibility ($\mathrm{U}_{13}$). The hierarchical structure results are shown in Table 2.
+
+Table 2 Hierarchical Structure of AIGC Credibility Evaluation System
+
+| Objective Layer | Criterion Layer | Sub-criterion Layer | Description of Sub-criterion Layer Indicators |
+| --- | --- | --- | --- |
+| AIGC Credibility Evaluation (U) | System Credibility ($\mathrm{U}_{1}$) | Training Data Credibility ($\mathrm{U_{11}}$) | Dataset Transparency, Source Reliability, Data Robustness, Data Coverage, Data Scalability, Noise Removal, and Source Security |
+| | | Algorithm Model Credibility ($\mathrm{U}_{12}$) | Model Interpretability, Robustness, Stability, Adaptability, Privacy Protection, and Plasticity |
+| | | Output Result Credibility ($\mathrm{U}_{13}$) | Accuracy, Error Rate, Precision, Recall, and F-measure of Generated Content |
+| | Media Credibility ($\mathrm{U}_{2}$) | Professional Service Capability ($\mathrm{U}_{21}$) | Content Accumulation in Professional Fields, Professional Media Cooperation, Standardized Process Mechanisms |
+| | | Social Influence ($\mathrm{U}_{22}$) | Platform's Social Awareness, Social Evaluation, Third-party Certification and Endorsement |
+| | | Neutrality of Stance ($\mathrm{U}_{23}$) | Platform's Selflessness, Consistency, Independence, Profitability |
+| | | Guarantee Mechanism Construction ($\mathrm{U}_{24}$) | Platform's User Feedback Mechanism, Accountability Mechanism, Privacy Mechanism/Data Security Mechanism |
+| | Content Credibility ($\mathrm{U}_{3}$) | Formal Credibility ($\mathrm{U}_{31}$) | Citation Normativity, Response Consistency, Expression Precision, Source Professionalism, Authoritative Endorsement, Information Noise |
+| | | Substantive Credibility ($\mathrm{U}_{32}$) | Content Authenticity, Content Correctness, Content Objectivity, Information Timeliness, Information Completeness, Information Consistency, Quality Robustness, Content Security, Content Fairness |
+
+## (2) Constructing the Judgment Matrix
+
+Based on the comprehensive opinions from the expert discussions, the indicators at each level are compared pairwise and described quantitatively according to their relative importance to construct a comparison matrix, proceeding down to the lowest level of the hierarchy. In the judgment matrix $M$, the element $m_{ij}$ represents the relative importance of element $i$ to element $j$ and satisfies the following relationships:
+
+$$M=\left(m_{ij}\right)_{n\times n},\quad m_{ij}>0,\quad m_{ji}=\frac{1}{m_{ij}},\quad m_{ii}=1,\quad i,j=1,2,\cdots,n$$
+
+The larger the value of $m_{ij}$, the more important element $i$ is relative to element $j$. The value of $m_{ij}$ is determined according to the nine-level scale method proposed by Saaty, as shown in Table 3.
+
+Table 3 Nine-level Scaling Method
+
+| Relative Importance Level | Definition | Description |
+| --- | --- | --- |
+| 1 | Equally Important | The importance of the two indicators is the same |
+| 3 | Slightly Important | Based on experience or judgment, one indicator is slightly more important |
+| 5 | Quite Important | Based on experience or judgment, one indicator is quite a bit more important |
+| 7 | Extremely Important | In practice, one indicator is extremely important |
+| 9 | Absolutely Important | There is sufficient evidence that one indicator is absolutely more important |
+| 2, 4, 6, 8 | Intermediate Value of Adjacent Judgments | Used when a compromise between adjacent judgments is required |
+
+## (3) Weight Calculation and Consistency Check
+
+Based on the judgment matrix constructed above, the relative importance of each factor with respect to its upper-level criterion, that is, its weight, is calculated. This study uses the sum-product method to normalize the judgment matrix. The resulting vector $W$ approximates the principal eigenvector of matrix $M$ and gives the ranking weights of the factors; the corresponding maximum eigenvalue is calculated as follows:
+
+$$\lambda_{\max}=\sum_{i=1}^{n}\frac{(MW)_{i}}{nW_{i}}$$
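The sum-product method and the $\lambda_{\max}$ formula above can be sketched in pure Python. The matrix below is the criterion-level judgment matrix reported later in Table 5; the printed values match the paper's $W=(0.277, 0.128, 0.595)^{\mathrm{T}}$ and $\lambda_{\max}=3.006$ up to rounding.

```python
# Sum-product (normalized-column) method for AHP: normalize each column
# of the judgment matrix so it sums to 1, average across rows to get the
# weight vector W, then estimate lambda_max = sum_i (MW)_i / (n * W_i).

def ahp_weights(M):
    n = len(M)
    col_sums = [sum(row[j] for row in M) for j in range(n)]
    # Row averages of the column-normalized matrix give the weights.
    W = [sum(M[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]
    # Maximum eigenvalue estimate from the formula above.
    MW = [sum(M[i][j] * W[j] for j in range(n)) for i in range(n)]
    lam = sum(MW[i] / (n * W[i]) for i in range(n))
    return W, lam

# Criterion-level judgment matrix (Table 5): U1, U2, U3.
M = [[1, 2, 1 / 2],
     [1 / 2, 1, 1 / 5],
     [2, 5, 1]]
W, lam = ahp_weights(M)
print([round(w, 3) for w in W], round(lam, 3))
```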
+
+Although a preliminary assessment of the consistency of the experts' judgments has been conducted as mentioned earlier, to avoid contradictory or inconsistent conclusions when experts compare indicators in pairs, a consistency check of the judgment matrix is necessary to ensure the rationality of the indicator weights. Generally, the calculated CR value is used as the basis for judging the consistency of the judgment matrix. When the CR value is less than 0.1, the consistency of the judgment matrix is considered acceptable. Otherwise, the judgment matrix should be appropriately revised to avoid excessive consistency deviations in the calculation results, which could affect the accuracy of the evaluation results. The calculation method for the CR value is as follows:
+
+$$CI=\frac{\lambda_{max}-n}{n-1}$$
+
+$$CR=\frac{CI}{RI}$$
+
+The aforementioned CI is the consistency index, and RI is the average random consistency index of the judgment matrix, which depends on the order of the judgment matrix. The specific correspondence is shown in Table 4.
+
+Table 4 Average Random Consistency Index (RI) Values
+
+| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| RI | 0 | 0 | 0.58 | 0.90 | 1.12 | 1.24 | 1.32 | 1.41 | 1.46 |
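The CI/CR computation, with RI looked up from Table 4, can be sketched as follows. The example values are the criterion-level results reported later in the text ($n=3$, $\lambda_{\max}=3.006$).

```python
# Consistency check: CI = (lambda_max - n) / (n - 1), CR = CI / RI.
# CR < 0.1 means the judgment matrix is acceptably consistent.

# Average random consistency index by matrix order (Table 4).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.46}

def consistency_ratio(lambda_max, n):
    ci = (lambda_max - n) / (n - 1)
    # Second-order (and trivially first-order) matrices are always
    # consistent, so RI = 0 and CR is taken as 0.
    cr = ci / RI[n] if RI[n] > 0 else 0.0
    return ci, cr

ci, cr = consistency_ratio(3.006, 3)
print(round(ci, 3), round(cr, 3), cr < 0.1)
```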
+
+In terms of specific operations, the criterion-level indicator set is defined as $\mathrm{U}=\left\{\mathrm{U}_{1},\mathrm{U}_{2},\mathrm{U}_{3}\right\}$, and the corresponding weight set obtained through the Analytic Hierarchy Process is $\mathrm{A}=(\mathrm{a_{1}},\mathrm{a_{2}},\mathrm{a_{3}})$. Here, $\mathrm{a_{i}}$ represents the proportion of $\mathrm{U_{i}}$ relative to the objective level. The sub-criterion-level indicator set is defined as $\mathrm{U_{k}}=\left\{\mathrm{U_{k1}},\mathrm{U_{k2}},\cdots,\mathrm{U_{kn}}\right\}$, and similarly, the corresponding weight set is $\mathrm{A_{k}}=(\mathrm{a_{k1}},\mathrm{a_{k2}},\cdots,\mathrm{a_{kn}})$, where $\mathrm{a_{kn}}$ represents the proportion of $\mathrm{U_{kn}}$ relative to $\mathrm{U_{k}}$. The variable $k$ indicates the criterion level corresponding to the sub-criterion level, and $n$ represents the number of indicators at that level. In this study, based on the discussions and opinions of five experts, judgments were made on the indicators at each level, resulting in the criterion-level judgment matrix, as shown in Table 5.
+
+Table 5 $\mathrm{U{-}U_{k}}$ Judgment Matrix
+
+| $\mathrm{U{-}U_{k}}$ | $\mathrm{U}_{1}$ | $\mathrm{U}_{2}$ | $\mathrm{U}_{3}$ |
+| --- | --- | --- | --- |
+| $\mathrm{U}_{1}$ | 1 | 2 | 1/2 |
+| $\mathrm{U}_{2}$ | 1/2 | 1 | 1/5 |
+| $\mathrm{U}_{3}$ | 2 | 5 | 1 |
+
+The judgment matrix is normalized using the sum-product method, and its eigenvector $W$ is solved as $\left(0.277, 0.128, 0.595\right)^{\mathrm{T}}$, with the maximum eigenvalue being $\lambda_{\max}=3.006$. The consistency check result for the $U-U_{k}$ judgment matrix shows that $\mathrm{CI}=0.003$, $\mathrm{CR}=0.005<0.1$. Therefore, the consistency of this judgment matrix is acceptable. $A=(0.277, 0.128, 0.595)$ corresponds to the weights of $U_1$, $\mathrm{U_{2}}$, and $\mathrm{U_{3}}$ relative to $U$, respectively.
+
+Subsequently, the judgment matrix of the sub-criterion indicators $\mathrm{U_{11}}, \mathrm{U_{12}}, \mathrm{U_{13}}$ relative to criterion $\mathrm{U}_{1}$ is obtained, as shown in Table 6; the judgment matrix of $\mathrm{U_{21}}, \mathrm{U_{22}}, \mathrm{U_{23}}, \mathrm{U_{24}}$ relative to $\mathrm{U}_{2}$ is shown in Table 7; and the judgment matrix of $\mathrm{U}_{31}$ and $\mathrm{U}_{32}$ relative to $\mathrm{U}_{3}$ is shown in Table 8.
+
+Table 6 $\mathrm{U_{1}-U_{1i}}$ Judgment Matrix
+
+| $\mathrm{U_{1}\mathrm{-U_{1i}}}$ | $\mathrm{U_{11}}$ | $\mathrm{U}_{12}$ | $\mathrm{U}_{13}$ |
+| --- | --- | --- | --- |
+| $\mathrm{U_{11}}$ | 1 | 2 | 1/2 |
+| $\mathrm{U_{12}}$ | 1/2 | 1 | 1/4 |
+| $\mathrm{U_{13}}$ | 2 | 4 | 1 |
+
+For the $\mathrm{U_{1}\mathrm{-U_{1i}}}$ judgment matrix, $A_{1}=(0.286, 0.143, 0.571)$, $\lambda_{max}=3.001$, $CI=0.001$, $CR=0.002<0.1$. Therefore, the consistency of this judgment matrix is acceptable.
+
+Table 7 $\mathrm{U_{2}-U_{2i}}$ Judgment Matrix
+
+| $\mathrm{U_{2}\mathrm{-U_{2i}}}$ | $\mathrm{U}_{21}$ | $\mathrm{U}_{22}$ | $\mathrm{U}_{23}$ | $\mathrm{U}_{24}$ |
+| --- | --- | --- | --- | --- |
+| $\mathrm{U}_{21}$ | 1 | 1/5 | 1/6 | 1/3 |
+| $\mathrm{U}_{22}$ | 5 | 1 | 1 | 3 |
+| $\mathrm{U}_{23}$ | 6 | 1 | 1 | 3 |
+| $\mathrm{U}_{24}$ | 3 | 1/3 | 1/3 | 1 |
+
+For the $\mathrm{U_{2}\mathrm{-U_{2i}}}$ judgment matrix, $A_{2}=(0.065, 0.384, 0.401, 0.150)$, $\lambda_{\max}=4.037$, $CI=0.012$, $CR=0.013<0.1$. Therefore, the consistency of this judgment matrix is acceptable.
+
+Table 8 $\mathrm{U_{3}-U_{3i}}$ Judgment Matrix
+
+| $\mathrm{U_{3}\mathrm{-U_{3i}}}$ | $\mathrm{U}_{31}$ | $\mathrm{U}_{32}$ |
+| --- | --- | --- |
+| $\mathrm{U}_{31}$ | 1 | 1/7 |
+| $\mathrm{U}_{32}$ | 7 | 1 |
+
+For the $\mathrm{U_{3}\mathrm{-U_{3i}}}$ judgment matrix, which is of second order and therefore inherently consistent, $A_{3}=(0.125, 0.875)$.
+
+Based on the above results, the overall hierarchical ranking is shown in Table 9 below, thereby identifying the proportions of the sub-criterion layer relative to the criterion layer and relative to the objective layer.
+
+Table 9 Hierarchical Weights of AIGC Credibility Factors
+
+| U | System Credibility $\mathrm{U}_{1}$ | Media Credibility $\mathrm{U}_{2}$ | Content Credibility $\mathrm{U}_{3}$ | Overall Ranking Weight |
+| --- | --- | --- | --- | --- |
+| Criterion weight | 0.277 | 0.128 | 0.595 | |
+| Training Data Credibility $\mathrm{U_{11}}$ | 0.286 | | | 0.079 |
+| Algorithm Model Credibility $\mathrm{U}_{12}$ | 0.143 | | | 0.040 |
+| Output Result Credibility $\mathrm{U_{13}}$ | 0.571 | | | 0.158 |
+| Professional Service Capability $\mathrm{U}_{21}$ | | 0.065 | | 0.008 |
+| Social Influence $\mathrm{U}_{22}$ | | 0.384 | | 0.049 |
+| Neutrality of Stance $\mathrm{U}_{23}$ | | 0.401 | | 0.051 |
+| Guarantee Mechanism Construction $\mathrm{U}_{24}$ | | 0.150 | | 0.019 |
+| Formal Credibility $\mathrm{U}_{31}$ | | | 0.125 | 0.074 |
+| Substantive Credibility $\mathrm{U}_{32}$ | | | 0.875 | 0.521 |
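The overall ranking weights in Table 9 follow by multiplying each sub-criterion's local weight by the weight of its parent criterion. A minimal sketch using the weight vectors reported in the text:

```python
# Overall weight of each sub-criterion = local weight * parent
# criterion weight, rounded to three decimals as in Table 9.

criterion = {"U1": 0.277, "U2": 0.128, "U3": 0.595}
local = {
    "U11": ("U1", 0.286), "U12": ("U1", 0.143), "U13": ("U1", 0.571),
    "U21": ("U2", 0.065), "U22": ("U2", 0.384), "U23": ("U2", 0.401),
    "U24": ("U2", 0.150),
    "U31": ("U3", 0.125), "U32": ("U3", 0.875),
}

overall = {k: round(criterion[parent] * w, 3)
           for k, (parent, w) in local.items()}

# Print the global ranking, strongest factor first.
for k, w in sorted(overall.items(), key=lambda kv: -kv[1]):
    print(k, w)
```

The ranking confirms the text's reading: substantive credibility (U32) dominates, followed by output result credibility (U13), training data credibility (U11), and formal credibility (U31).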
+
+According to the results of the weight analysis, for the credibility evaluation of AIGC, the degree of influence, from strongest to weakest, is content factors $\mathrm{U}_{3}$ (weight 0.595) > system factors $\mathrm{U}_{1}$ (weight 0.277) > media factors $\mathrm{U}_{2}$ (weight 0.128). At the sub-criterion level, the factors with relatively strong influence include substantive credibility $\mathrm{U}_{32}$ (weight 0.521), output result credibility $\mathrm{U}_{13}$ (weight 0.158), training data credibility $\mathrm{U}_{11}$ (weight 0.079), and formal credibility $\mathrm{U}_{31}$ (weight 0.074). Admittedly, owing to differences in media literacy, professional knowledge, operational skills, and other respects, the importance weighting from the expert perspective deviates somewhat from the factors influencing user-perceived credibility. Nonetheless, multi-level exploration of AIGC credibility factors plays a crucial role in preventing communication risks and ensuring the security of the content ecosystem.
+
+## IV Results and Discussion: Multi-Layered Credibility Cues and Progressive Reductio ad Absurdum
+
+## (I) Technical Challenges: From Knowledge Blind Spots to Nonlinear Emergence
+
+The maturity and plasticity of the technical system are the underlying reasons affecting the credibility of AIGC. The study found that output result credibility (weight 0.571) and training data credibility (weight 0.286) are important factors influencing system credibility and, consequently, the credibility of AIGC. For current general-purpose AIGC products, although corpus training in general knowledge areas enhances the model's universality, it also, to a certain extent, limits the depth of training in specific fields or professional knowledge domains. Moreover, most AIGC products currently have limited capabilities in invoking professional knowledge content, thus creating "knowledge blind spots" in some fields. Generative AI, based on large language model training and model optimization, demonstrates "nonlinear emergence" capabilities, generating creative outputs beyond preset rules, causing instability in the logic and rationality of AIGC, and even leading to model "hallucinations."
+
+Therefore, to enhance the system credibility of AIGC, continuous optimization is required at multiple levels, including output results, training data, and algorithms. At the training data level, ensuring the quality, diversity, and representativeness of the data is fundamental. High-quality training data can improve the accuracy and reliability of generated content and reduce the likelihood of model bias and erroneous outputs. Therefore, in AIGC product design, it is necessary to ensure the quality of training data, emphasize the transparency of data sources, ensure data diversity and coverage, and avoid distortion of generated content due to data bias. To optimize the quality of AIGC product output results, errors can be reduced through real-time verification mechanisms and multi-round iterations. At the same time, displaying the sources, reasoning processes, and relevant contextual information of generated content to users can help them understand the generation logic of the content, reduce the probability of model "hallucinations," and thereby enhance user trust. Additionally, although the weight of algorithm model credibility is relatively low, improving the transparency and interpretability of algorithms is equally important for building user trust, as it helps users understand the credibility boundaries of generated content.
+
+## (II) Media Differences: From Political Embedding to Capital Manipulation
+
+As a media carrier, the media credibility (weight 0.128) of AIGC platforms significantly influences users' perception of AIGC credibility. The platform's stance neutrality (weight 0.401) and social influence (weight 0.384) have important impacts on media credibility. The political biases and interest orientations of the entities operating AIGC platforms can lead to selective biases in data selection and model parameter tuning, thereby affecting the neutrality and comprehensiveness of generated content. This political embedding not only affects the objectivity of information but also subtly reinforces specific political stances through algorithms, undermining the foundation for rational dialogue in the public sphere. Furthermore, the training data for generative AI often originates from large-scale datasets on the internet, and open-source datasets often reflect the values and ideological tendencies of various social organizations and individuals. Capital market "interference" in data sources and algorithm design indirectly affects the value orientations of generated content.
+
+Overall, as a communication platform, media differences are deeply rooted in the complex interplay between politics and capital. The neutrality of AIGC technology does not inherently guarantee its credibility as a communication platform. Only by strengthening the platform's stance neutrality and social influence can its credibility be effectively enhanced, providing a practical path for information ecosystem governance. Strengthening platform neutrality requires approaches such as technical transparency and regulatory mechanisms to ensure regular reviews of platform content generation. Additionally, attention must be paid to the platform's social role. The platform's social influence is not only a reflection of its communication capabilities but also an important tool for shaping public perception, guiding public opinion, and building social trust. Therefore, it is necessary to emphasize the platform's social responsibility, which can be achieved by establishing a social responsibility reporting system where platforms regularly disclose their governance effectiveness to the public to enhance public trust.
+
+## (III) Content Cues: From Cognitive Shortcuts to Formal Trust
+
+Compared to technical and media factors, content factors are the primary element influencing users' perceived credibility. The study found that content credibility (weight coefficient 0.595) has a greater impact on AIGC credibility than system credibility (weight coefficient 0.277) and media credibility (weight coefficient 0.128). Among credible content cues, from an expert perspective, substantive credibility (weight coefficient 0.875) is more important than formal credibility (weight coefficient 0.125). For ordinary users, reliance on explicit cues such as content form is often stronger than on implicit cues requiring reasoning and judgment. Generated content with standardized citation sources and authoritative endorsements is perceived as more credible.
+
+During real-time human-computer interaction, users can access a large volume of generative content in a short period. Since rational judgments formed through logical reasoning require more time and thought, intuitive and experience-based perceptual judgments often dominate. Credible content cues at the content level trigger users' existing cognitive frameworks, enhancing their trust perception and forming a cognitive shortcut. Overall, surface-level heuristic cues, as a fast and empiricist way of thinking, are an important basis for users' credibility judgments. The initial trust formed by these cues further influences the frequency and depth of subsequent human-computer interactions. This trust model based on surface features often arises in situations where information is incomplete or time is limited, where users form vague perceptions based on simplified rules or heuristic strategies. Although this can reduce cognitive load and decision-making time, it often leads to inaccurate judgments and reinforced biases.
+
+From the perspective of specific credible cues, current common AIGC products need improvement in terms of source professionalism and authoritative endorsement. At present, AIGC products pay insufficient attention to formal credible cues. In AIGC product design, by displaying aspects such as source professionalism, citation standardization, and response consistency, multi-layered credible cues can be strengthened at the system level, thereby enhancing user trust in AIGC. Combining formal trust guided by surface-level cues with substantive trust formed through in-depth reasoning and information tracing can further strengthen user stickiness and deepen human-computer trust.
+
+## (IV) Countermeasures: From Reverse Instructions to Progressive Reductio ad Absurdum
+
+Overall, current AIGC products have achieved basic usability, but in terms of trustworthiness they still require multi-dimensional iteration at the levels of the technical system, media dissemination, and content regulation. In the long run, beyond the aspects above, the trust mechanism of AIGC must be continuously improved with respect to data quality, algorithm design, platform stance, social influence, content form, and substantive trustworthiness. In the short term, during human-computer interaction, reverse instructions can be used to progressively uncover cues about content trustworthiness and form a layered reductio ad absurdum path, thereby limiting, to a certain extent, the dissemination of false information and misleading perceptions derived from AIGC.
+
+Specifically, reverse instructions include methods for uncovering trustworthy clues such as reverse information tracing, reverse fact verification, reverse logical verification, and reverse model comparison. Reverse information tracing primarily involves layered and progressive questioning regarding the information sources and references of AIGC, combined with source comparison to assess its credibility. Reverse fact verification requires AI to provide verifiable factual information or evidence materials based on the conclusions it provides, and then verifies its credibility by checking factual elements. Reverse logical verification involves judging the logical coherence and consistency of AIGC through multi-layered progressive questioning, and testing its credibility through the logical self-consistency of the context. Reverse model comparison involves comparing and verifying the output content of multiple generative AI models, and identifying reductio ad absurdum through the degree of content consistency and the extent of conflicts and contradictions.
+
+## V Conclusion
+
+The trustworthiness of AIGC is not only a technical issue but also a communication issue and a social issue. This study explores the question of how to evaluate the trustworthiness of AIGC, establishes a framework for evaluating it, clarifies the main influencing factors of AIGC trustworthiness based on the Analytic Hierarchy Process and thorough expert deliberation, and proposes optimization strategies to enhance AIGC trustworthiness in terms of system technology, media platforms, content cues, and user reverse verification. At the theoretical level, this study emphasizes a multi-dimensional perspective on trustworthiness evaluation encompassing technology, media, and content, providing theoretical references and analytical tools for subsequent research and expanding the application scope of media credibility theory. At the practical level, it offers insights for optimizing the design of AIGC products, building user trust, and governing the information ecosystem. However, owing to limitations such as sample scope and biases introduced by subjective judgment, future research needs to conduct further empirical analyses with larger groups of experts or users to validate and refine the evaluation system, thereby promoting the healthy development of AIGC technology and providing more scientific and comprehensive support for information ecosystem governance.
+
+## References
+
+[1] YU P, XIA Z, FEI J, et al. A survey on deepfake video detection[J]. Iet Biometrics, 2021, 10(6): 607-624.
+[2] WHYTE C. Deepfake news: AI-enabled disinformation as a multi-level public policy challenge[J]. Journal of Cyber Policy, 2020, 5(2): 199-217.
+[3] ILLIA L, COLLEONI E, ZYGLIDOPOULOS J. Ethical implications of text generation in the age of artificial intelligence[J]. Business Ethics, the Environment & Responsibility, 2023, 32(1): 201-210.
+[4] WU Y, MOU Y, LI Z, et al. Investigating American and Chinese subjects' explicit and implicit perceptions of AI-generated artistic work[J]. Computers in Human Behavior, 2020, 104: 106186.
+[5] Jiang Zhongbo, Shi Xuemei, Zhang Hongbo. A Study on the Perception of Algorithm News Credibility from the Perspective of Human-Machine Communication: Based on an Analysis of a Controlled Experiment on College Students[J]. Chinese Journal of Journalism & Communication, 2022, 44(3): 34-52.
+[6] Wu Dan, Sun Guoye. Research on the Credibility of Generative Intelligent Search Results[J]. Journal of Library Science in China, 2023, 49(6): 51-67.
+[7] FOGG B J, TSENG H. The elements of computer credibility[C]//Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1999: 80-87.
+[8] Song Shijie, Zhao Yuxiang, Zhu Qinghua. From ELIZA to ChatGPT: Evaluating the Credibility of AI-Generated Content in Human-AI Interaction Experiences[J]. Information and Documentation Services, 2023, 44(04): 35-42.
+[9] Liu Haiming, Li Jiayi. "Believing in a Piece of Code": Cognitive Pathways and Algorithm Trust Construction of ChatGPT-Generated Content[J]. Media Observation, 2024(05): 71-79.
+[10] European Commission. Ethics Guidelines for Trustworthy AI [EB/OL]. https://www.i-programmer.info/programming/artificial-intelligence/12702-ethics-guidelines-for-trustworthy-ai-.html
+[11] OECD Legal Instruments. Recommendation of the Council on Artificial Intelligence [EB/OL]. https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449
+[12] SINGH R, VATSA M, RATHA N. Trustworthy AI[C]//Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data. 2021: 449-453.
+[13] FUJII G, HAMADA K, ISHIKAWA F, et al. Guidelines for Quality Assurance of Machine Learning-Based Artificial Intelligence[J]. International Journal of Software Engineering and Knowledge Engineering, 2020, 30(11-12): 1589-1606.
+[14] He Jifeng. Safe and Trustworthy Artificial Intelligence[J]. Information Security and Communications Privacy, 2019(10): 5-8.
+
+[15] SCHWEIGER W. Media credibility—experience or image? A survey on the credibility of the World Wide Web in Germany in comparison to other media[J]. European Journal of Communication, 2000, 15(1): 37-59.
+[16] METZGER M J, FLANAGIN A J, EYAL K, et al. Credibility for the 21st century: Integrating perspectives on source, message, and media credibility in the contemporary media environment[J]. Annals of the International Communication Association, 2003, 27(1): 293-335.
+[17] Li Xiaojing. Research on Media Credibility in the Chinese Social Context[M]. Shanghai: Shanghai Jiao Tong University Press, 2019: 148.
+[18] Zhang Hongzhong, Ren Wujiong. Human-machine dialogue beyond the "second self": Exploring trust relationships based on AI large model applications[J]. Journalism Bimonthly, 2024(3): 47-60.
+[19] SUNDAR S S. The MAIN model: A heuristic approach to understanding technology effects on credibility[M]. Cambridge, MA: MacArthur Foundation Digital Media and Learning Initiative, 2008: 73-100.
+[20] FLANAGIN A J, METZGER M J. The role of site features, user attributes, and information verification behaviors on the perceived credibility of web-based information[J]. New Media & Society, 2007, 9(2): 319-342.
+[21] HILLIGOSS B, RIEH S Y. Developing a unifying framework of credibility assessment: Construct, heuristics, and interaction in context[J]. Information Processing & Management, 2008, 44(4): 1467-1484.
+[22] China Quality Daily. China Academy of Information and Communications Technology Releases the "Trustworthy AI" Quality Assessment System for Artificial Intelligence Datasets[EB/OL]. https://www.cqn.com.cn/zgzlb/content/2024-12/31/content_9085042.htm
+[23] Liu Han, Li Kaixuan, Chen Yixiang. A Review of Research on Trustworthiness Measurement and Evaluation of Artificial Intelligence Systems[J]. Journal of Software, 2023, 34(8): 3774-3792.
+[24] KAUR D, USLU S, RITTICHIER K J, et al. Trustworthy artificial intelligence: a review[J]. ACM Computing Surveys, 2022, 55(2): 1-38.
+[25] Pan Ji. From Reflecting Reality to Symbolic Construction: Information Credibility Assessment in the Network Environment[J]. Modern Communication (Journal of Communication University of China), 2018, 40(11): 86-90.
+[26] HOVLAND C I, WEISS W. The influence of source credibility on communication effectiveness[J]. Public Opinion Quarterly, 1951, 15(4): 635-650.
+[27] WALSTER E, ARONSON V, ABRAHAMS D, et al. Importance of physical attractiveness in dating behavior[J]. Journal of Personality and Social Psychology, 1966, 4(5): 508.
+
+[28] Zhu Jianhua. Theorization and Localization of Chinese Communication Research: Taking the Integrated Theory of Audience and Media Effects as an Example [J]. Journalism Research, 2001, 68: 1-22.
+[29] GOLAN G J. New perspectives on media credibility research [J]. American Behavioral Scientist, 2010, 54(1): 3-7.
+[30] Li Xiaojing, Liu Yining. How Does Converged Media Promote Political Trust Among Chinese Youth? An Examination Based on a Chain Dual Mediation Model [J]. Journalism Bimonthly, 2023, (09): 46-57.
+[31] POWELL T E, Boomgaarden H G, DE SWERT K, et al. Framing fast and slow: A dual processing account of multimodal framing effects [J]. Media Psychology, 2019, 22(4): 572-600.
+[32] GUPTA A, KUMARAGURU P, CASTILLO C, et al. Tweetcred: Real-time credibility assessment of content on twitter [C]// International Conference on Social Informatics. 2014: 228-243.
+[33] SLATER M D, ROUNER D. How message evaluation and source attributes may influence credibility assessment and belief change [J]. Journalism & Mass Communication Quarterly, 1996, 73(4): 974-991.
+[34] FAIRBANKS J, FITCH N, KNAUF N, et al. Credibility assessment in the news: do we need to read [C]// Proceeding of the MIS2 Workshop Held in Conjunction with 11th International Conference on Web Search and Data Mining. 2018: 799-800.
+[35] LEDERMAN R, FAN H, SMITH S, et al. Who can you trust? Credibility assessment in online health forums [J]. Health Policy and Technology, 2014, 3(1): 13-25.
diff --git a/OSPP__toWord/document_sample.pdf b/OSPP__toWord/document_sample.pdf
new file mode 100644
index 0000000000..1308da0d2a
Binary files /dev/null and b/OSPP__toWord/document_sample.pdf differ
diff --git a/OSPP__toWord/document_sample_with_headfoot.md b/OSPP__toWord/document_sample_with_headfoot.md
new file mode 100644
index 0000000000..b5ac2e517c
--- /dev/null
+++ b/OSPP__toWord/document_sample_with_headfoot.md
@@ -0,0 +1,305 @@
+1
+
+
+Original Text: 新媒体与社会
+
+New Media and Society
+
+ISSN, CN
+
+# Online First Paper of New Media and Society
+
+Title: Formal Trust and Progressive Reductio ad Absurdum: Research on the Credibility Evaluation of Artificial Intelligence-Generated Content
+
+Authors: Yang Yanni, Wu Leilong, Xiang Anling, Zhang Jiacheng
+
+Online First Date: 2025-09-11
+
+Citation Format: Yang Yanni, Wu Leilong, Xiang Anling, Zhang Jiacheng. Formal Trust and Progressive Reductio ad Absurdum: Research on the Credibility Evaluation of Artificial Intelligence-Generated Content [J/OL]. New Media and Society.
+
+https://link.cnki.net/urlid/CN.20250910.1710.008
+
+
+
+
+Online First Publication: In the workflow of the editorial department, a manuscript undergoes several stages from acceptance to publication, including final acceptance draft, typeset draft, and final assembled draft for the entire issue. The final acceptance draft refers to a manuscript whose content has been finalized and has been approved for publication after peer review and final approval by the editor-in-chief. The typeset draft refers to the manuscript that has been typeset according to the specific layout of the journal (including the online presentation format) based on the final acceptance draft, with the publication year, volume, issue, and page numbers temporarily undetermined. The final assembled draft for the entire issue refers to the assembled manuscript for the entire issue, either in print or digital format, with the publication year, volume, issue, and page numbers all determined. The content of manuscripts published online first as final acceptance drafts must comply with the relevant provisions of the "Regulations on Publication Administration" and the "Regulations on the Administration of Journal Publication": academic research achievements should be innovative, scientific, and advanced, meeting the editorial department's acceptance requirements for published articles, and free from academic misconduct and other infringing acts. The manuscript content should generally conform to the national technical standards for book and journal editing and publication, with correct usage and unified standardization of language, symbols, numbers, foreign letters, legal units of measurement, and map annotations. To ensure the seriousness of the online first publication of final acceptance drafts, once published, the title, author(s), institutional affiliation(s), and academic content of the paper cannot be modified; only minor textual revisions based on editorial norms are allowed.
+
+Publication Confirmation: The editorial department of the print journal, by signing an agreement with the Electronic Magazine Co., Ltd. of "China Academic Journal (CD Edition)," establishes an online edition on the "China Academic Journal (Network Edition)" publication and dissemination platform that is consistent in content with the print journal. The final acceptance drafts, typeset drafts, and final assembled drafts for the entire issue of papers are published in the form of individual articles or entire issues on this platform prior to print publication. Since the "China Academic Journal (Network Edition)" is a network continuous publication approved by the State Administration of Press, Publication, Radio, Film and Television (ISSN 2096-4188, CN 11-6037/Z), papers published online first in the network edition of the contracted journals are considered as formally published.
+
+
+# Formal Trust and Progressive Reductio ad Absurdum: Research on the Credibility Evaluation of Artificial Intelligence-Generated Content*
+
+Yanni Yang, Leilong Wu, Anling Xiang, Jiacheng Zhang**
+
+Abstract: Constructing a scientific and effective credibility evaluation system for Artificial Intelligence Generated Content (AIGC) is a significant topic in current human-machine trust research. This study incorporates an examination of AIGC's media attributes from the perspective of intelligent communication, establishing a comprehensive and systematic AIGC credibility evaluation system from the bottom up, spanning from the underlying technological architecture and intermediate media channels to the user-interactive information content. Based on the Analytic Hierarchy Process and thorough expert deliberation, the primary influencing factors of AIGC credibility are clarified. The research indicates that content credibility is the foremost factor influencing users' perception of AIGC credibility. Output results and training data significantly impact system credibility; stance neutrality and social influence affect media credibility; while substantive credibility and formal credibility influence content credibility. Furthermore, optimization strategies are proposed to enhance AIGC credibility from aspects such as system technology, media platforms, content cues, and user reverse verification, providing references for the optimization design of AIGC products, user trust construction, and information ecosystem governance.
+
+Keywords: Artificial Intelligence Generated Content, credibility evaluation framework, formal trust, progressive reductio ad absurdum
+
+# Formal Trust and Progressive Reductio ad Absurdum: Research on the Credibility Evaluation of Artificial Intelligence-Generated Content
+
+Yanni Yang, Leilong Wu, Anling Xiang, Jiacheng Zhang
+
+Abstract Establishing a scientific and effective credibility evaluation system for AIGC has become a crucial topic in human-machine trust research. This study integrates an analysis of AIGC's media attributes from the perspective of intelligent communication and constructs a comprehensive and systematic AIGC credibility evaluation framework from the bottom up, spanning the underlying technical architecture and intermediary media channels to user-interactive information content. Utilizing hierarchical analysis and based on extensive expert discussion and argumentation, the study identifies the primary factors influencing AIGC credibility. Research findings indicate that content credibility is the foremost factor affecting users' perception of AIGC trustworthiness. Output results and training data significantly impact system credibility, while neutrality and social influence affect media credibility, and substantive and formal credibility influence content credibility. The study proposes optimization strategies to enhance AIGC credibility, focusing on system technology, media platforms, content cues, and user reverse verification, offering insights for AIGC product design optimization, user trust building, and information ecosystem governance.
+
+Keywords: AIGC; Credibility Evaluation Framework; Formal Trust; Progressive Reductio ad Absurdum
+
+## I Research Background: The Challenge of Trustworthy Communication in the AIGC Era
+
+The global production of information content is currently undergoing a new generational revolution. Empowered by technologies such as large language models, Artificial Intelligence Generated Content (AIGC) has surpassed traditional Professional Generated Content (PGC) and User Generated Content (UGC) in terms of production scale and output efficiency, gradually becoming the new engine of media content production in the Web3.0 era. As AIGC accelerates its penetration into industrial applications and mass communication, the human-centered paradigm of content production and dissemination has begun to shift towards an ecosystem of human-machine collaboration and symbiosis. The quasi-subjectivity of AI in agenda-setting, content curation, and information distribution has become prominent, leading to imbalances and reconfigurations in information production models and information power. Consequently, the resulting human-machine trust issues have become a bottleneck for AIGC development.
+
+In particular, AIGC products are currently facing significant criticism regarding the supply of trustworthy content. On the one hand, constrained by the openness of training corpus data for large language models and their probabilistic content generation logic, errors such as conceptual confusion, factual splicing, temporal-spatial dislocation, and false causation frequently occur in the content generated by some AIGC products, and fact-checking remains a persistent weakness for most of them. On the other hand, although AIGC-related algorithms themselves have no inherent moral attributes, the technology has been widely used for negative purposes such as political manipulation and unfair business competition [1], including AI-generated fake news and mass-produced false information [2]. Additionally,
+
+
+Multimodal AIGC, encompassing images, videos, audio, and other forms, exhibits strong visual ambiguity, further exacerbating the spread of false and misleading information, and posing new challenges to international politics, society, and human rights development [3].
+
+In response to issues such as content inaccuracy and human-machine trust arising from AIGC, multiple countries and regions have strengthened institutional constraints and legal safeguards at the top-level design level. China has also successively introduced policy documents such as the "Ethical Guidelines for New-Generation Artificial Intelligence," "Provisions on the Administration of Deep Synthesis in Internet Information Services," and the "Administrative Measures for Generative Artificial Intelligence Services (Draft for Comments)," explicitly stating that "content generated using generative artificial intelligence should be true and accurate." AIGC platforms both domestically and internationally have also enhanced content quality through multiple constraints such as algorithmic iteration, model correction, and manual review. However, these efforts are more limited to political information and sensitive content, with credible communication in general fields remaining a pain point for industry development. From the user perspective, compared to the critical attitude of users in European and American countries towards AIGC, Chinese users demonstrate a more pronounced positive attitude towards its application [4], with their perceived credibility of algorithm-generated news even surpassing their trust in human journalists [5]. If this high perceived trust and optimistic orientation towards AIGC lack rational guidance, it may fuel the spread of false information, posing new risks to the public opinion environment and information security.
+
+At a time when AIGC products are rapidly penetrating the mass-market application sector, what are users' perceptions of trust towards them? What factors influence these perceptions? How can the credibility of AIGC be effectively evaluated? A series of questions remains to be explored. Against this backdrop, this study takes representative text-based AI-generated content at home and abroad as its research object and constructs an AIGC credibility evaluation system from three dimensions—system credibility, media credibility, and content credibility—drawing on the Trustworthy AI framework, media trust research, and dual-process theory. This aims to uncover the subjective perceptual and objective technological factors influencing credibility, with the expectation of providing theoretical references and practical guidance for intelligent communication and information ecosystem governance.
+
+## II Literature Review: Trustworthy Systems, Trustworthy Media, and Trustworthy Content
+
+The research theme of credibility exhibits significant interdisciplinary characteristics, with disciplines such as communication studies, management, computer science, and psychology all conducting relevant explorations. Scholars' interpretations of credibility also vary. Overall, the analysis of AIGC credibility primarily unfolds from the following perspectives. First, credibility is viewed as an entity composed of multiple facets or dimensions. Wu Dan et al. analyzed the impact of content integration, question-answering interactivity, and tool anthropomorphism on the credibility of generative intelligent search from these three dimensions [6]. Jiang Zhongbo et al. set up five items—accuracy, credibility, bias, authenticity, and authority—to measure credibility [5]. Second, credibility is explained through its sources. Fogg et al. proposed interpreting credibility from the perspective of information sources, such as presumed credibility derived from stereotypes, reputational credibility generated by third-party endorsements, surface credibility obtained from external characteristics, and experiential credibility judged from experience [7]. Third, the subjects of credibility are categorized into several types for analysis. Song Shijie et al. proposed constructing an AIGC credibility research framework from four perspectives in the future: information sources, interaction modes, social actors, and algorithmic metaphors [8].
+
+
+Liu Haiming and Li Jiaguai believe that in the context of human-computer interaction, trust, on the one hand, stems from the reliability, credibility, and responsible content generation of algorithmic systems, and on the other hand, is reflected in the anthropomorphic characteristics of chatbots that elicit positive responses from users during interactions [9]. Additionally, some scholars have named credibility based on the research subjects involved in their studies, such as review credibility, advertisement credibility, etc.
+
+Building upon credibility research in multiple traditional disciplines, the credibility of AIGC (Artificial Intelligence Generated Content) has emerged as a new research topic. With the widespread application of artificial intelligence technology, in addition to evaluating the credibility of AI as a software system from a system technology perspective and analyzing content credibility from an information content perspective in traditional research, scholars have recognized the value and importance of AI as a medium. Therefore, in the interpretation of AIGC credibility, it is also necessary to examine its credibility from a media dimension. A comprehensive review of typical domestic and international research findings on AIGC credibility from the three dimensions of system credibility, media credibility, and content credibility is presented in Table 1, with only a few examples listed due to space constraints.
+
+Table 1 Research Findings Related to Credibility Evaluation at Home and Abroad
+
+| Evaluation Perspective | Evaluation Dimension | Researcher | Evaluation Approach |
+| --- | --- | --- | --- |
+| Source | System Credibility | EU [10] | Accountability, Inclusiveness, Autonomy, Fairness, Privacy, Robustness, Security, Transparency |
+| | | OECD [11] | Sustainable Development, Values, Fairness, Transparency, Explainability, Robustness, Security, Safety |
+| | | Singh [12] | Fairness, Explainability, Robustness, Privacy, Security, Appropriateness |
+| | | Fujii [13] | Integrity, Robustness, System Quality, Agility |
+| | | He Jifeng [14] | Robustness, Self-Reflection, Adaptability, and Fairness |
+| Channel | Media Credibility | Schweiger [15] | Media Type, Media Subcategory, Media Product, Editorial Unit, Information Creator, Information Presenter |
+| | | Metzger [16] | Source, Information, and Channel Transmitting Information |
+| | | Li Xiaojing [17] | Credibility of Media Organizations/Journalists (Source), Credibility of News Reports (Information), etc. |
+| | | Zhang Hongzhong and Ren Wujiong [18] | Public Trust in Mass Media, Trust in Social Media, Human-Machine Trust Based on Machine Identity, Human-Machine Trust Based on Language Dialogue |
+| Content | Content Credibility | Sundar [19] | MAIN Model (From Technological Affordance to Credibility Judgment) |
+| | | Flanagin and Metzger [20] | Dual-Process Model (Heuristic, Systematic) |
+| | | Hilligoss and Rieh [21] | Integrated Framework for Credibility Assessment (Construction Layer, Exploration Layer, Interaction Layer) |
+
+From the perspective of system credibility, credibility evaluation is an indispensable part of ensuring system credibility. Currently, there are many relevant achievements in the evaluation of artificial intelligence system credibility, which have been applied to the credibility evaluation of various types of artificial intelligence systems. The European Commission proposed a draft ethical guideline for trustworthy artificial intelligence in 2019 [10], outlining 10 basic requirements for trustworthy AI. Building on this, the Organization for Economic Cooperation and Development (OECD) added long-term indicators such as inclusive growth, sustainable development, and well-being [11]. The China Academy of Information and Communications Technology has gradually constructed and improved a "trustworthy AI" evaluation system, focusing on evaluating the service capabilities of AI products, the maturity of application and management, and trustworthy risks [22]. Singh et al. proposed six trustworthy attributes for artificial intelligence: fairness, interpretability, robustness, privacy, security, and propriety [12]. Fujii et al. placed greater emphasis on integrity, robustness, system quality, and agility [13]. He Jifeng also proposed that artificial intelligence should possess robustness, self-reflection, adaptability, and fairness [14].
+
+From the perspective of media credibility, with the emergence of numerous new media technologies, there have been significant shifts in the types of media that audiences are exposed to, their media usage habits, and their media cognition and judgment. Schweiger summarized six dimensions used by Western scholars to evaluate media credibility, namely media type, media subcategory, media product, editorial unit, information creator, and information presenter [15]. Metzger et al. suggested measuring the source, information, and the channels through which information is transmitted separately [16]. Li Xiaojing, based on China's actual media environment and audience characteristics, focused her research on the credibility of media institutions/journalists (sources) and news reports (information) [17]. Zhang Hongzhong et al. classified media trust into four categories: the credibility of mass media based on content and carrier, trust in social media based on value identification, human-machine trust based on machine identity, and human-machine trust based on language dialogue [18].
+
+From the perspective of content credibility, the current scale of information content production by humans and machines has reached unprecedented levels, making the evaluation of information content credibility an important issue. The evaluation of content credibility covers a wide range of fields, from traditional content resources to UGC, PGC, AIGC, etc. Many scholars have conducted relevant research. Sundar proposed the MAIN model to evaluate information credibility in the new media environment [19]. Flanagin and Metzger proposed a dual-processing model, suggesting that there are two paths for users to evaluate information credibility: heuristic processing and systematic processing [20]. Hilligoss and Rieh proposed an integrated framework for credibility evaluation, arguing that the evaluation process involves three levels: the construction level, the exploration level, and the interaction level, corresponding to users' definitions of credibility, heuristic thinking, and judgments based on information cues, respectively [21].
+
+In summary, due to the unique nature of AIGC systems in human-machine dialogue, the evaluation of AIGC credibility needs to be conducted from three dimensions: system credibility, media credibility, and content credibility. On the one hand, it involves drawing on previous research results; on the other hand, it focuses on studying credibility measurement methods specific to AIGC systems themselves. This not only requires expanding research perspectives but also necessitates the construction of a comprehensive and systematic credibility evaluation system from the bottom up, addressing the development characteristics of AIGC, ranging from the underlying technical architecture and intermediate media channels to the information content for user interaction.
+
+
+## III Evaluation Framework: Construction and Evaluation Methods of the AIGC Credibility Evaluation System
+
+To further explore the credible clues of AIGC and the credibility differences among different products, this study constructs an AIGC credibility evaluation system through preliminary literature research and analyzes the indicators based on expert research scores to build an AIGC credibility evaluation framework. On this basis, the weights of various indicators are determined through the Analytic Hierarchy Process to clarify the importance of each relevant factor.
+
+## (I) Construction of the AIGC Credibility Evaluation System
+
+Based on the evaluation approaches widely adopted in relevant domestic and international research, namely the three major theoretical research directions of source (source credibility), channel (media credibility), and content (content credibility) [17], and in accordance with the specific scenario of this study, AIGC is evaluated from three perspectives: system credibility, media credibility, and content credibility, as shown in Figure 1. Based on preliminary literature research and combined with the characteristics of AIGC-related technologies and product applications, existing indicators of system, media, and content credibility are summarized. This leads to the preliminary formulation of AIGC credibility evaluation indicators, thereby determining the evaluation indicator framework. Among them, the analysis of system credibility is a core dimension directly related to the AIGC technical architecture. The examination of content credibility is a continuation and inheritance of traditional credibility research. The focus on media credibility represents an expansion of the intelligent communication perspective within the field of credibility research themes.
+
+
+Figure 1 AIGC Credibility Evaluation System
+
+### 1. System Trustworthiness
+
+From the perspective of system technology, "trustworthiness" is a system metric developed on the basis of concepts such as "reliability" and "security." It represents an overall evaluation by humans of various trustworthy attributes in the process of system research, development, and application [23]. It reflects not only the objective performance of the system itself but also the subjective perception of users towards the system. Compared to early trustworthy systems that focused on hardware devices, trustworthy artificial intelligence (trustworthy AI) places more emphasis on trustworthy attributes at the software level. It refers to AI systems that, during the processes of design, development, deployment, and use, are based on attributes such as product performance and risk assurance to gain the trust and acceptance of users and society [24].
+
+The trustworthiness of AI systems relies on the support of a large amount of high-quality, reliable data, appropriate algorithm model applications, and output results that meet expectations. Considering the characteristics of AIGC technology and existing relevant research, the evaluation of system credibility for AIGC products requires a comprehensive assessment from three aspects: underlying training data, algorithmic models, and output results. The noise, biases, and privacy infringement risks hidden in the underlying data significantly impact the credibility of the generated content. At the algorithmic model level, generative artificial intelligence, through reinforcement learning based on human feedback, aligns itself to a certain extent with human common sense, cognition, needs, and even values. Simultaneously, filtering mechanisms are introduced to block sensitive issues (such as violence, crime, discrimination, etc.), reducing dissemination risks by refusing to answer or providing relatively neutral and safe responses. These ethical constraints and optimization mechanisms at the algorithmic level ensure content security to a certain extent. However, the "malleable" nature of algorithms also increases the risk of "disciplining" AI itself, and the risk of content untrustworthiness due to malicious training cannot be ignored. Additionally, limited by the underlying computational logic of statistical language models and the effectiveness of interactive instructions (prompts), AI output results also exhibit instability.
+
+Based on relevant research, this study focuses on analyzing the training data layer from perspectives such as dataset transparency, source reliability, data robustness, data coverage, data scalability, noise removal, and source security. The algorithmic model layer is evaluated based on model interpretability, robustness, stability, adaptability, privacy protection, and malleability. For output results, indicators such as the accuracy, error rate, precision, recall, and F-score of the generated content need to be analyzed.
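The output-result indicators named above are standard classification metrics. As a minimal illustrative sketch (the confusion-matrix counts below are hypothetical, not figures from this study), they can be computed as follows:

```python
def output_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, error rate, precision, recall, and F-score from
    confusion-matrix counts. The counts are hypothetical examples,
    not data from the study itself."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)   # share of outputs judged correct that truly are
    recall = tp / (tp + fn)      # share of truly correct outputs that are recovered
    f_score = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,  # error rate is the complement of accuracy
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }

# Example with hypothetical counts: 8 true positives, 2 false positives,
# 1 false negative, 9 true negatives.
m = output_metrics(8, 2, 1, 9)
```

The F-score used here is the balanced F1 (harmonic mean of precision and recall); weighted variants follow the same pattern.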
+
+### 2. Media Credibility
+
+In the context of AIGC, the boundaries between media organizations, algorithmic platforms, and content producers tend to blur. AI algorithms, as a technological means, embed capabilities for content production, distribution, and agenda-setting, gradually exhibiting a trend of deep mediatization. Whereas system credibility considers AIGC from its underlying operational logic, media credibility assesses AIGC, as an emerging media form, more as a media carrier and communication subject.
+
+Media credibility refers to whether technological channels and specific information dissemination organizations are trustworthy [25]. In early research on media credibility, the competence of communicators, i.e., their level of professionalization, was a core factor influencing credibility [26]. Competence includes not only the depth and authority of media in their professional fields but also their level of professionalization in communication processes such as information gatekeeping, packaging, and distribution. It directly determines whether the media can output correct and credible information and is also related to the media's own brand endorsement, i.e., its ability to take responsibility for the authenticity of the information content. Subsequent related research added factors such as unselfishness and consistency [27,28] on top of media competence. Unselfishness relates to whether the media's communication motives are legitimate and pure, i.e., whether the media itself has any vested-interest connections with the subjects it reports on or the issues it involves. For content output platforms, consistency is more reflected in whether there are value biases and inconsistencies over time. In addition to ensuring professionalism, unselfishness, and consistency at the content output level, external means such as effective interaction with audiences, providing feedback loops, establishing transparency and ethical standards, and certification by third-party independent organizations can also help media enhance their credibility [29,30].
+
+Based on relevant literature reviews and considering the technological characteristics of AIGC, this study evaluates media credibility from four dimensions: professional service capability, social influence, neutrality of stance, and the establishment of a safeguard mechanism. Professional service capability primarily focuses on the content accumulation of AIGC platforms in professional fields, collaborations with professional media, and standardized procedural mechanisms. Social influence encompasses the platform's social recognition, public evaluation, third-party certifications, and endorsements. Neutrality of stance includes the platform's selflessness, consistency, independence, and profitability. The establishment of a safeguard mechanism involves aspects such as the platform's user feedback mechanisms, accountability mechanisms, and privacy/data security mechanisms, all of which are crucial for ensuring information quality and security on the platform.
+
+### 3. Content Credibility
+
+According to the dual-process theory, humans often activate dual cognitive systems when processing and reasoning information: one is a fast, intuitive, and emotional processing mode, while the other is a slow, controlled, and rational thinking mode [31]. The former can process a large amount of information in a short time and quickly make judgments and generate answers, while the latter requires logical reasoning and abstract thinking to form judgments and verify answers. Corresponding to trust mechanisms, intuitive information processing often triggers "formal trust," while controlled information processing forms "substantive trust." Formal trust is more based on the identification generated by the surface characteristics and credible cues of the content. Different credible cues form cognitive shortcuts by activating users' existing cognitive structures and attitude tendencies, further helping users judge the credibility of the content. From the perspective of content attributes, this includes the emotional intensity of the information, the use of multimedia forms in the content, whether other users are mentioned, the reputation of external links/sources included in the information, and the use of specific vocabulary [32]. From the perspective of external cues, this includes social recommendation cues (such as data on reposts, likes, and comments), temporal cues (such as update frequency and timeliness), and reputation cues (such as historical performance, third-party certifications, and integrity records). Formal trust often determines the initial credibility of the content for users, and this initial credibility plays a significant mediating role in the subsequent changes in users' beliefs [33].
+
+Compared to formal trust, substantive trust primarily explores the credibility of the essence of the content. This includes the objectivity and accuracy of the content, the presence or absence of obvious biases or stance tendencies, and the presence or absence of obvious errors or false information [34]. Elements such as the completeness, consistency, and timeliness of the content also determine its essential credibility, while users' own knowledge levels, experiences, and judgment abilities directly influence their perception differences [35]. The formation of substantive trust often requires users to engage in thoughtful analysis and judgment of the content, as well as progressive source-tracing reasoning, and it is stronger than formal trust in terms of both the depth and duration of trust. Faced with the prevalent issues of errors and confusion, factual inconsistencies, political biases, information security, and ethical risks in AIGC products, analyzing credible cues at both the formal and substantive levels and engaging in progressive reductio ad absurdum are deemed necessary.
+
+Based on the above discussion, this study divides the content credibility of AIGC into two aspects: formal credibility and substantive credibility. Formal credibility is primarily analyzed from the structural and linguistic characteristics of AIGC, including citation norms, response consistency, expression precision, source professionalism, authoritative endorsements, and information noise. Substantive credibility primarily evaluates the essential attributes of the content, including content authenticity, content correctness, content objectivity, information timeliness, information completeness, information consistency, quality robustness, content security, and content fairness.
+
+
+## (II) AIGC Credibility Evaluation Method
+
+For decision-making analysis targeting specific objectives, scholars both domestically and internationally have proposed various subjective or objective evaluation methods. The Analytic Hierarchy Process (AHP), introduced by American operations researcher Professor T.L. Saaty in the 1970s, quickly gained widespread application across various fields for its ability to conduct quantitative analysis of qualitative problems through a simple and flexible multi-criteria decision-making approach. The core idea is to decompose complex decision-making problems into multiple levels and determine the weight coefficients of various factors by constructing a judgment matrix and conducting consistency checks. In this study, an expert team was first established to ensure the consistency and effectiveness of team evaluations. Subsequently, based on the Analytic Hierarchy Process, a hierarchical structure model was constructed. The expert team evaluated the indicators within the credibility evaluation framework established above, constructed a judgment matrix, and conducted consistency checks to calculate and determine the weights of indicators at all levels.
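The AHP steps described above (decomposing the problem into levels, constructing a judgment matrix, checking consistency, deriving weights) can be sketched as follows. The pairwise comparisons below are illustrative placeholder values on Saaty's 1–9 scale, not the expert team's actual scores, and the weights use the common geometric-mean approximation of the principal eigenvector:

```python
import math

# Hypothetical judgment matrix for the criterion level (system U1, media U2,
# content U3 credibility). A[i][j] states how much more important indicator i
# is than indicator j; values are illustrative only, not the study's data.
A = [
    [1.0, 2.0, 1 / 3],
    [1 / 2, 1.0, 1 / 4],
    [3.0, 4.0, 1.0],
]
n = len(A)

# 1. Approximate priority weights via the geometric-mean (row) method.
gm = [math.prod(row) ** (1 / n) for row in A]
w = [g / sum(gm) for g in gm]

# 2. Estimate the principal eigenvalue lambda_max from A @ w.
Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
lam_max = sum(Aw[i] / w[i] for i in range(n)) / n

# 3. Consistency check: CI and CR (Saaty's random index RI for n = 3 is 0.58).
CI = (lam_max - n) / (n - 1)
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}[n]
CR = CI / RI  # the judgment matrix is acceptably consistent if CR < 0.1
```

With these placeholder comparisons, content credibility receives the largest weight and CR falls well below the conventional 0.1 threshold; real applications would repeat this for every sub-criterion matrix and aggregate weights down the hierarchy.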
+
+### 1. Expert Team Formation and Training
+
+The expertise and experience of evaluation experts directly influence the accuracy and reliability of annotation results. To address this, this study selected five experts with extensive experience in the fields of artificial intelligence and natural language processing, including two algorithm engineers, two research scholars in related disciplines, and one senior media practitioner. These experts had in-depth experience using AIGC products and were well-versed in understanding various instructional documents. Prior to conducting evaluation annotations, staff members introduced the purpose and background of this study, explained the definitions of each indicator, and provided examples and simulated scenarios to ensure that all experts had a clear understanding of the evaluation methods. To ensure the stability and reliability of the evaluation results, this study employed Cohen's kappa to conduct consistency checks on expert scores. The results showed a kappa value of 0.647, with P < 0.001, indicating that the experts' understanding and evaluation of each indicator were generally consistent, allowing for subsequent credibility evaluations. Although the scoring process involved subjective judgments, this study ensured the accuracy and reliability of the evaluation results as much as possible by ensuring the diversity of the expert team and the standardization of the scoring method.
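For reference, Cohen's kappa compares observed rater agreement with the agreement expected by chance. A minimal sketch with toy ratings follows (the study's kappa of 0.647 was computed on its own expert scores, which are not reproduced here):

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: chance-corrected agreement between two raters
    who labeled the same items."""
    assert len(rater1) == len(rater2) and rater1
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n  # observed agreement
    c1, c2 = Counter(rater1), Counter(rater2)
    # Chance agreement from each rater's marginal label frequencies.
    p_e = sum(c1[lab] * c2[lab] for lab in set(c1) | set(c2)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Toy 3-point ratings from two hypothetical raters (not the study's data):
kappa = cohens_kappa([1, 2, 3, 1, 2, 3, 1, 2, 3, 1],
                     [1, 2, 3, 1, 2, 1, 1, 2, 2, 3])
```

By common rules of thumb, values around 0.6–0.8 (such as the study's 0.647) indicate substantial agreement; for more than two raters, Fleiss' kappa generalizes the same idea.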
+
+### 2. Indicator Evaluation and Weight Determination
+
+## (1) Establishment of a Hierarchical Structure Model
+
+Based on the AIGC credibility evaluation framework constructed earlier, the Analytic Hierarchy Process was employed to determine the weights of the primary and secondary indicators. First, a hierarchical structure was established, divided into the objective level, criterion level, and sub-criterion level. The objective level is the AIGC credibility evaluation (U), and the criterion level includes three primary indicators: system credibility ($\mathrm{U}_{1}$), media credibility ($\mathrm{U}_{2}$), and content credibility ($\mathrm{U}_{3}$). These three main factors correspond to several sub-factors forming the sub-criterion level, comprising nine sub-factors such as training data credibility ($\mathrm{U}_{11}$), algorithm model credibility ($\mathrm{U}_{12}$), and output result credibility ($\mathrm{U}_{13}$). The hierarchical structure is shown in Table 2.
+
+Table 2 Hierarchical Structure of the AIGC Credibility Evaluation System
+
+| Objective Layer | Criterion Layer | Sub-criterion Layer | Indicator Description of Sub-criterion Layer |
+| --- | --- | --- | --- |
+| AIGC Credibility Assessment (U) | System Credibility (U1) | Training Data Credibility (U11) | Dataset transparency, source reliability, data robustness, data coverage, data scalability, noise removal, and source security |
+| | | Algorithm Model Credibility (U12) | Model interpretability, robustness, stability, adaptability, privacy protection, and plasticity |
+| | | Output Result Credibility (U13) | Accuracy, error rate, precision, recall, and completeness of generated content |
+| | Media Credibility (U2) | Professional Service Capability (U21) | Content accumulation in professional fields, professional media cooperation, standardized process mechanisms |
+| | | Social Influence (U22) | Platform's social awareness, social evaluation, third-party certification, and endorsement |
+| | | Neutrality of Stance (U23) | Platform's selflessness, consistency, independence, and profitability |
+| | | Construction of Guarantee Mechanisms (U24) | Platform's user feedback mechanism, accountability mechanism, privacy/data security mechanism |
+| | Content Credibility (U3) | Formal Credibility (U31) | Citation normativity, response consistency, expression precision, source professionalism, authoritative endorsement, information noise |
+| | | Substantive Credibility (U32) | Content authenticity, correctness, objectivity, timeliness, completeness, consistency, quality robustness, security, and fairness |
+
+## (2) Construct the judgment matrix
+
+Based on the comprehensive opinions from expert discussions, the indicators and factors at each level are compared pairwise, and a quantitative description is given according to their relative importance to construct a comparison matrix, extending down to the lowest level of the hierarchical structure. In the judgment matrix M, the element $\mathrm{m}_{ij}$ represents the relative importance of element i to element j and satisfies the following relationships:
+
+$$\mathrm{M}=\left(\mathrm{m}_{\mathrm{ij}}\right)_{\mathrm{n}\times\mathrm{n}},\quad\mathrm{m}_{\mathrm{ij}}>0,\quad\mathrm{m}_{\mathrm{ji}}=\frac{1}{\mathrm{m}_{\mathrm{ij}}},\quad\mathrm{m}_{\mathrm{ii}}=1,\quad\mathrm{i},\quad\mathrm{j}=1,2,\cdots,\mathrm{n}$$
+
+The larger the value of $\mathrm{m}_{ij}$, the higher the relative importance of i over j. The value of $\mathrm{m}_{ij}$ is determined according to the nine-level scale method proposed by Saaty, as shown in Table 3.
+
+Table 3 Nine-level Scaling Method
+
+| Relative Importance Level | Definition | Description |
+| --- | --- | --- |
+| 1 | Equally Important | The two indicators are of equal importance |
+| 3 | Slightly Important | Based on experience or judgment, one indicator is slightly more important |
+| 5 | Quite Important | Based on experience or judgment, one indicator is quite important |
+| 7 | Extremely Important | In practice, one indicator is extremely important |
+| 9 | Absolutely Important | There is sufficient evidence that one indicator is absolutely important |
+| 2, 4, 6, 8 | Intermediate Values of Adjacent Judgments | Used when a compromise between adjacent judgments is required |
+
+## (3) Weight Calculation and Consistency Check
+
+Based on the judgment matrix constructed above, the relative importance of each factor with respect to its upper-level criterion, i.e., its weight, is calculated. In this study, the sum-product method is used to normalize the judgment matrix. The resulting W is the eigenvector of matrix M corresponding to its maximum eigenvalue and represents the ranking weights of the factors; the maximum eigenvalue is calculated as follows:
+
+$$\lambda_{\max}=\sum_{i=1}^{n}\frac{(MW)_{i}}{nW_{i}}$$
+
+Although a preliminary assessment of the consistency of the experts' judgments has been conducted as mentioned earlier, to avoid contradictory or inconsistent conclusions when experts compare indicators in pairs, a consistency check of the judgment matrix is necessary to ensure the rationality of the indicator weights. Generally, the calculated CR value is used as the basis for judging the consistency of the judgment matrix. When the CR value is less than 0.1, the consistency of the judgment matrix is considered acceptable. Otherwise, the judgment matrix should be appropriately revised to avoid excessive consistency deviations in the calculation results, which could affect the accuracy of the evaluation results. The calculation method for the CR value is as follows:
+
+$$CI=\frac{\lambda_{max}-n}{n-1}$$
+
+$$CR=\frac{CI}{RI}$$
+
+The aforementioned CI is the consistency index, and RI is the average random consistency index of the judgment matrix, which depends on the order of the judgment matrix. The specific correspondence is shown in Table 4.
+
+Table 4 Average Random Consistency Index (RI) Values
+
+| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
+| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+| RI | 0 | 0 | 0.58 | 0.90 | 1.12 | 1.24 | 1.32 | 1.41 | 1.46 |
+
+In terms of specific operations, the criterion-level indicator set is defined as $\mathrm{U}=\left\{\mathrm{U}_{1},\mathrm{U}_{2},\mathrm{U}_{3}\right\}$, and the corresponding weight set obtained through the Analytic Hierarchy Process is $\mathrm{A}=(\mathrm{a_{1}},\mathrm{a_{2}},\mathrm{a_{3}})$, where $\mathrm{a_{i}}$ represents the proportion of $\mathrm{U_{i}}$ relative to the objective level. The sub-criterion-level indicator set is defined as $\mathrm{U_{k}}=\left\{\mathrm{U_{k1}},\mathrm{U_{k2}},\cdots,\mathrm{U_{kn}}\right\}$, and the corresponding weight set is $\mathrm{A_{k}}=(\mathrm{a_{k1}},\mathrm{a_{k2}},\cdots,\mathrm{a_{kn}})$, where $\mathrm{a_{kn}}$ represents the proportion of $\mathrm{U_{kn}}$ relative to $\mathrm{U_{k}}$. The variable $k$ indicates the criterion corresponding to the sub-criterion level, and $n$ is the number of indicators at that level. In this study, based on the discussions and opinions of the five experts, judgments were made on the indicators at each level, yielding the criterion-level judgment matrix shown in Table 5.
+
+Table 5 $\mathrm{U\text{-}U_{k}}$ Judgment Matrix
+
+| $\mathrm{U\text{-}U_{k}}$ | $\mathrm{U}_{1}$ | $\mathrm{U}_{2}$ | $\mathrm{U}_{3}$ |
+| --- | --- | --- | --- |
+| $\mathrm{U}_{1}$ | 1 | 2 | 1/2 |
+| $\mathrm{U}_{2}$ | 1/2 | 1 | 1/5 |
+| $\mathrm{U}_{3}$ | 2 | 5 | 1 |
+
+The judgment matrix is normalized using the sum-product method, and its eigenvector $W$ is solved as $\left(0.277, 0.128, 0.595\right)^{\mathrm{T}}$, with the maximum eigenvalue being $\lambda_{\max}=3.006$. The consistency check result for the $U-U_{k}$ judgment matrix shows that $\mathrm{CI}=0.003$, $\mathrm{CR}=0.005<0.1$. Therefore, the consistency of this judgment matrix is acceptable. $A=(0.277, 0.128, 0.595)$ corresponds to the weights of $U_1$, $\mathrm{U_{2}}$, and $\mathrm{U_{3}}$ relative to $U$, respectively.
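+The sum-product calculation and consistency check just described can be reproduced in a few lines of plain Python. The sketch below uses the Table 5 judgment matrix and recovers W, $\lambda_{\max}$, CI, and CR (an illustrative verification, not the study's original tooling):

```python
# Judgment matrix from Table 5 (rows/columns: U1, U2, U3)
M = [[1,   2, 1/2],
     [1/2, 1, 1/5],
     [2,   5, 1]]
n = len(M)
# Average random consistency index RI, by matrix order (Table 4)
RI = {1: 0, 2: 0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.46}

# Sum-product method: normalize each column, then average across each row
col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]
W = [sum(M[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]

# lambda_max = (1/n) * sum_i (MW)_i / W_i
MW = [sum(M[i][j] * W[j] for j in range(n)) for i in range(n)]
lam_max = sum(MW[i] / W[i] for i in range(n)) / n

CI = (lam_max - n) / (n - 1)
CR = CI / RI[n]
# W agrees with the reported (0.277, 0.128, 0.595) to within rounding;
# lambda_max ~ 3.006, CI ~ 0.003, CR ~ 0.005 < 0.1
print(W, lam_max, CI, CR)
```

+The same routine applies unchanged to the sub-criterion matrices in Tables 6–8; only M and the RI lookup differ.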
+
+Subsequently, the judgment matrices for the sub-criterion level indicators $\mathrm{U_{11}, U_{12}, U_{13}}$ relative to the criterion level $\mathrm{U}_{1}$ can be obtained, as shown in Table 6; the judgment matrices for the sub-criterion level indicators $\mathrm{U_{21}, U_{22}, U_{23}, U_{24}}$ relative to the criterion level $\mathrm{U}_{2}$ are shown in Table 7; and the judgment matrices for the sub-criterion level indicators $\mathrm{U}_{31}$ and $\mathrm{U}_{32}$ relative to the criterion level $\mathrm{U}_{3}$ are shown in Table 8.
+
+Table 6 $\mathrm{U_{1}\text{-}U_{1i}}$ Judgment Matrix
+
+| $\mathrm{U_{1}\text{-}U_{1i}}$ | $\mathrm{U}_{11}$ | $\mathrm{U}_{12}$ | $\mathrm{U}_{13}$ |
+| --- | --- | --- | --- |
+| $\mathrm{U}_{11}$ | 1 | 2 | 1/2 |
+| $\mathrm{U}_{12}$ | 1/2 | 1 | 1/4 |
+| $\mathrm{U}_{13}$ | 2 | 4 | 1 |
+
+For the $\mathrm{U_{1}\mathrm{-U_{1i}}}$ judgment matrix, $A_{1}=(0.286, 0.143, 0.571)$, $\lambda_{max}=3.001$, $CI=0.001$, $CR=0.002<0.1$. Therefore, the consistency of this judgment matrix is acceptable.
+
+Table 7 $\mathrm{U_{2}\text{-}U_{2i}}$ Judgment Matrix
+
+| $\mathrm{U_{2}\text{-}U_{2i}}$ | $\mathrm{U}_{21}$ | $\mathrm{U}_{22}$ | $\mathrm{U}_{23}$ | $\mathrm{U}_{24}$ |
+| --- | --- | --- | --- | --- |
+| $\mathrm{U}_{21}$ | 1 | 1/5 | 1/6 | 1/3 |
+| $\mathrm{U}_{22}$ | 5 | 1 | 1 | 3 |
+| $\mathrm{U}_{23}$ | 6 | 1 | 1 | 3 |
+| $\mathrm{U}_{24}$ | 3 | 1/3 | 1/3 | 1 |
+
+For the $\mathrm{U_{2}\text{-}U_{2i}}$ judgment matrix, $A_{2}=(0.065, 0.384, 0.401, 0.150)$, $\lambda_{\max}=4.037$, $CI=0.012$, $CR=0.013<0.1$. Therefore, the consistency of this judgment matrix is acceptable.
+
+Table 8 $\mathrm{U_{3}\text{-}U_{3i}}$ Judgment Matrix
+
+| $\mathrm{U_{3}\text{-}U_{3i}}$ | $\mathrm{U}_{31}$ | $\mathrm{U}_{32}$ |
+| --- | --- | --- |
+| $\mathrm{U}_{31}$ | 1 | 1/7 |
+| $\mathrm{U}_{32}$ | 7 | 1 |
+
+For the $\mathrm{U_{3}\text{-}U_{3i}}$ judgment matrix, the consistency of this second-order matrix is acceptable (any $2\times2$ reciprocal judgment matrix is perfectly consistent, with $\lambda_{\max}=n$), and $A_{3}=(0.125, 0.875)$.
+
+Based on the above results, the overall hierarchical ranking is shown in Table 9 below, thereby identifying the proportions of the sub-criterion layer relative to the criterion layer and relative to the objective layer.
+
+Table 9 Hierarchical Weights of Credibility Factors for AIGC
+
+| U | System Credibility $\mathrm{U}_{1}$ (0.277) | Media Credibility $\mathrm{U}_{2}$ (0.128) | Content Credibility $\mathrm{U}_{3}$ (0.595) | Overall Ranking Weight |
+| --- | --- | --- | --- | --- |
+| Training Data Credibility $\mathrm{U}_{11}$ | 0.286 | | | 0.079 |
+| Algorithm Model Credibility $\mathrm{U}_{12}$ | 0.143 | | | 0.040 |
+| Output Result Credibility $\mathrm{U}_{13}$ | 0.571 | | | 0.158 |
+| Professional Service Capability $\mathrm{U}_{21}$ | | 0.065 | | 0.008 |
+| Social Influence $\mathrm{U}_{22}$ | | 0.384 | | 0.049 |
+| Neutrality of Stance $\mathrm{U}_{23}$ | | 0.401 | | 0.051 |
+| Guarantee Mechanism Construction $\mathrm{U}_{24}$ | | 0.150 | | 0.019 |
+| Formal Credibility $\mathrm{U}_{31}$ | | | 0.125 | 0.074 |
+| Substantive Credibility $\mathrm{U}_{32}$ | | | 0.875 | 0.521 |
+
+According to the results of the weight analysis, for the credibility assessment of AIGC, the degree of influence, from strongest to weakest, is: content factors $\mathrm{U}_{3}$ (weight 0.595) > system factors $\mathrm{U}_{1}$ (weight 0.277) > media factors $\mathrm{U}_{2}$ (weight 0.128). At the sub-criterion level, the most influential factors include substantive credibility $\mathrm{U}_{32}$ (overall weight 0.521), output result credibility $\mathrm{U}_{13}$ (0.158), training data credibility $\mathrm{U}_{11}$ (0.079), and formal credibility $\mathrm{U}_{31}$ (0.074). Admittedly, owing to differences in media literacy, professional knowledge, operational skills, and other aspects, there is some deviation between the importance weighting from the expert perspective and the credibility factors perceived by users. Nevertheless, multi-level exploration of AIGC credibility factors plays a crucial role in preventing communication risks and ensuring the security of the content ecosystem.
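+The overall ranking weights are simply the product of each sub-criterion weight and its parent criterion weight. A minimal check of the hierarchical synthesis, using the rounded weights reported above:

```python
# Criterion weights (U1, U2, U3) from the U-Uk judgment matrix
A = {"U1": 0.277, "U2": 0.128, "U3": 0.595}
# Sub-criterion weights within each criterion, as (parent, weight) pairs
sub = {
    "U11": ("U1", 0.286), "U12": ("U1", 0.143), "U13": ("U1", 0.571),
    "U21": ("U2", 0.065), "U22": ("U2", 0.384), "U23": ("U2", 0.401),
    "U24": ("U2", 0.150), "U31": ("U3", 0.125), "U32": ("U3", 0.875),
}
# Overall weight = criterion weight x within-criterion weight
overall = {k: round(A[parent] * w, 3) for k, (parent, w) in sub.items()}
print(overall["U32"], overall["U13"])  # 0.521 0.158
```

+The nine overall weights sum to 1 up to rounding error, as the hierarchy requires.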
+
+
+## IV Results and Discussion: Multi-Layered Credible Cues and Progressive Reductio ad Absurdum
+
+## (I) Technical Pain Points: From Knowledge Blind Spots to Nonlinear Emergence
+
+The maturity and plasticity of technical systems are the underlying factors affecting the credibility of AIGC. Research has found that the credibility of output results (weight 0.571) and the credibility of training data (weight 0.286) are significant factors influencing system credibility and, consequently, the credibility of AIGC. For current general-purpose AIGC products, while corpus training in general knowledge domains enhances model universality, it also, to a certain extent, limits the depth of training in specific fields or professional knowledge domains. Moreover, most AIGC products currently have limited capabilities in invoking professional knowledge content, thus creating "knowledge blind spots" in certain fields. Generative AI, based on large language model training and model optimization, demonstrates "nonlinear emergence" capabilities, producing creative outputs that exceed preset rules, resulting in instability in the logic and rationality of AIGC, and even leading to model "hallucinations."
+
+Therefore, to enhance the system credibility of AIGC, continuous optimization is required at multiple levels, including output results, training data, and algorithms. At the training data level, ensuring data quality, diversity, and representativeness is fundamental. High-quality training data can improve the accuracy and reliability of generated content and reduce the likelihood of model bias and erroneous outputs. Therefore, in AIGC product design, it is essential to ensure the quality of training data, emphasize the transparency of data sources, ensure data diversity and coverage, and avoid distortion in generated content due to data bias. To optimize the quality of AIGC product output results, errors can be reduced through real-time verification mechanisms and multiple rounds of iteration, while displaying the sources, reasoning processes, and relevant contextual information of generated content to users. This helps users understand the generation logic of the content and reduces the probability of model "hallucinations," thereby enhancing user trust. Additionally, although the weight of algorithm model credibility is relatively low, improving the transparency and interpretability of algorithms is equally important for building user trust, as it helps users understand the credibility boundaries of generated content.
+
+## (II) Media Differences: From Political Embedding to Capital Manipulation
+
+As a media carrier, the media credibility of AIGC platforms (weight 0.128) significantly influences users' perception of AIGC credibility, and the platform's stance neutrality (weight 0.401) and social influence (weight 0.384) have a significant impact on media credibility. The political bias and interest orientation of the entities operating AIGC platforms can lead to selective bias in data selection and model parameter tuning, thereby affecting the neutrality and comprehensiveness of generated content. This political embedding not only affects the objectivity of information but also subtly reinforces specific political stances through algorithms, undermining the foundation of rational dialogue in the public sphere. Furthermore, the training data for generative AI often originates from large-scale internet datasets, and open-source datasets often reflect the values, viewpoints, and ideological tendencies of various social organizations and individuals.
+
+
+The "interference" of the capital market in data sources and algorithm design indirectly affects the value orientation of generated content.
+
+From a comprehensive perspective, as a communication platform, media differences are deeply rooted in the complex interplay between politics and capital. The neutrality of AIGC technology does not inherently guarantee its credibility as a communication platform. Only by strengthening the platform's neutrality and social influence can its credibility be effectively enhanced, providing a practical path for information ecosystem governance. Strengthening platform neutrality requires starting from aspects such as technological transparency and regulatory mechanisms to ensure regular reviews of platform-generated content. Additionally, attention must be paid to the platform's social role. The platform's social influence is not only a reflection of its communication capabilities but also an important tool for shaping public perception, guiding public opinion, and building social trust. Therefore, it is necessary to emphasize the platform's social responsibility, which can be achieved by establishing a social responsibility reporting system where platforms regularly disclose their governance outcomes to the public to enhance public trust.
+
+## (III) Content Cues: From Cognitive Shortcuts to Formal Trust
+
+Compared to technological and media factors, content factors are the primary element influencing users' perceived credibility. Research has found that content credibility (weight coefficient 0.595) has a greater impact on AIGC credibility than system credibility (weight coefficient 0.277) and media credibility (weight coefficient 0.128). Among credible content cues, from an expert perspective, substantive credibility (weight coefficient 0.875) is more important than formal credibility (weight coefficient 0.125). For ordinary users, reliance on explicit cues such as content form often outweighs implicit cues that require inferential judgment. Generated content with standardized citation sources and authoritative endorsements tends to have higher credibility.
+
+During real-time human-computer interaction, users can access a large volume of generated content in a short period. Since rational judgment formed through logical reasoning requires more time and thought, intuitive and experience-based perceptual judgment often dominates. Credible cues at the content level trigger users' existing cognitive frameworks, enhancing their trust perception and forming a cognitive shortcut. Overall, superficial heuristic cues, as a rapid and empiricist mode of thinking, serve as an important basis for users' credibility judgments. The preliminary trust formed by these cues further influences the frequency and depth of subsequent human-computer interactions. This trust model based on surface characteristics often arises in situations where information is incomplete or time is limited, where users form vague perceptions based on simplified rules or heuristic strategies. Although this can reduce cognitive load and decision-making time, it often leads to inaccurate judgments and reinforced biases.
+
+From the perspective of specific credible cues, currently common AIGC products need improvement in terms of source professionalism and authoritative endorsement. Current AIGC products also lack sufficient attention to formal credible cues. In AIGC product design, the presentation of source professionalism, citation norms, and response consistency can be used to improve multi-layered credible cue prompts at the system level, thereby enhancing user trust in AIGC. Combining formal trust guided by superficial cues, substantive trust formed through in-depth reasoning and information tracing can better strengthen user stickiness and enhance human-computer trust at a deeper level.
+
+
+## (IV) Countermeasure Strategies: From Reverse Instructions to Progressive Reductio ad Absurdum
+
+Overall, current AIGC products have achieved basic availability, but their credibility still requires multi-dimensional iteration at the levels of technical systems, media dissemination, and content regulation. In the long term, beyond the aspects already discussed, it is essential to continuously improve the credibility mechanisms of AIGC across data quality, algorithm design, platform stance, social influence, content form, and substantive credibility. In the short term, during human-machine interaction, credible clues can be progressively unearthed through reverse instructions, forming a progressive path of reductio ad absurdum and thereby limiting, to a certain extent, the dissemination of false information and misleading perceptions derived from AIGC.
+
+Specifically, reverse instructions include credible clue mining methods such as reverse information tracing, reverse fact verification, reverse logical verification, and reverse model comparison. Reverse information tracing primarily involves layer-by-layer inquiries into the information sources and references of AIGC, combined with source comparison to determine its credibility. Reverse fact verification is based on the conclusions provided by AI, requiring it to provide verifiable factual information or evidentiary materials, and verifying its credibility by examining factual elements. Reverse logical verification involves judging the logical coherence and consistency of AIGC through multi-layered progressive inquiries, testing its credibility through the logical self-consistency of the context. Reverse model comparison involves comparing and verifying the output content of multiple generative AI models, conducting content reduction to absurdity based on the degree of content consistency and the extent of conflicts and contradictions.
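+Of these strategies, reverse model comparison is the most mechanical and easy to prototype. The sketch below scores pairwise consistency between outputs of several models using a simple Jaccard similarity over word sets; the model names and answers are invented placeholders, and a production system would use semantic similarity rather than token overlap:

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two generated answers (0 = disjoint, 1 = identical sets)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def cross_model_consistency(outputs: dict) -> list:
    """Pairwise consistency scores, lowest first: likely contradictions surface at the top."""
    pairs = [(round(jaccard(outputs[m], outputs[n]), 2), m, n)
             for m, n in combinations(outputs, 2)]
    return sorted(pairs)

# Hypothetical answers to the same prompt from three models
outputs = {
    "model_a": "the treaty was signed in 1848 ending the war",
    "model_b": "the treaty was signed in 1848 ending the war",
    "model_c": "the agreement was concluded in 1851",
}
for score, m, n in cross_model_consistency(outputs):
    print(m, n, score)
```

+Low-scoring pairs flag candidate claims for the other reverse checks (information tracing, fact verification) rather than proving falsehood on their own.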
+
+## V. Conclusion
+
+The credibility of AIGC is not only a technical issue but also a communication and social issue. This study explores the question of how to evaluate the credibility of AIGC, establishes a framework for an AIGC credibility evaluation system, clarifies the main factors influencing AIGC credibility on the basis of the analytic hierarchy process and sufficient expert demonstration, and proposes optimization strategies to enhance AIGC credibility in terms of system technology, media platforms, content clues, and user reverse verification. At the theoretical level, this study emphasizes a multi-dimensional credibility evaluation perspective encompassing technology, media, and content, providing theoretical references and analytical tools for subsequent research and expanding the application scope of media credibility theory. At the practical level, it offers references for the optimal design of AIGC products, the construction of user trust, and the governance of the information ecosystem. However, given limitations such as sample scope and biases introduced by subjective judgment, future research should conduct further empirical analyses with larger groups of experts or users to validate and refine the evaluation system, promote the healthy development of AIGC technology, and provide more scientific and comprehensive support for the governance of the information ecosystem.
+
+## References
+
+
+[1] YU P, XIA Z, FEI J, et al. A survey on deepfake video detection[J]. Iet Biometrics, 2021, 10(6): 607-624.
+[2] WHYTE C. Deepfake news: AI-enabled disinformation as a multi-level public policy challenge[J]. Journal of Cyber Policy, 2020, 5(2): 199-217.
+[3] ILLIA L, COLLEONI E, ZYGLIDOPOULOS J. Ethical implications of text generation in the age of artificial intelligence[J]. Business Ethics, the Environment & Responsibility, 2023, 32(1): 201-210.
+[4] WU Y, MOU Y, LI Z, et al. Investigating American and Chinese subjects' explicit and implicit perceptions of AI-generated artistic work[J]. Computers in Human Behavior, 2020, 104: 106186.
+[5] Jiang Zhongbo, Shi Xuemei, Zhang Hongbo. A Study on the Perception of Algorithm News Credibility from the Perspective of Human-Machine Communication: Based on an Analysis of a Controlled Experiment on College Students[J]. Chinese Journal of Journalism & Communication, 2022, 44(3): 34-52.
+[6] Wu Dan, Sun Guoye. Research on the Credibility of Generative Intelligent Search Results[J]. Journal of Library Science in China, 2023, 49(6): 51-67.
+[7] FOGG B J, TSENG H. The elements of computer credibility[C]//Proceedings of the SIGCHI conference on human factors in computing systems. 1999: 80-87.
+[8] Song Shijie, Zhao Yuxiang, Zhu Qinghua. From ELIZA to ChatGPT: Evaluating the Credibility of AI-Generated Content in Human-AI Interaction Experiences[J]. Information and Documentation Services, 2023, 44(04): 35-42.
+[9] Liu Haiming, Li Jiayi. "Believing in a Piece of Code": Cognitive Pathways and Algorithm Trust Construction of ChatGPT-Generated Content[J]. Media Observation, 2024(05): 71-79.
+[10] European Commission. Ethics Guidelines for Trustworthy AI [EB/OL]. https://www.i-programmer.info/programming/artificial-intelligence/12702-ethics-guidelines-for-trustworthy-ai-.html
+[11] OECD Legal Instruments. Recommendation of the Council on Artificial Intelligence [EB/OL]. https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449
+[12] SINGH R, VATSA M, RATHA N. Trustworthy AI[C]//Proceedings of the 3rd ACM India Joint International Conference on Data Science & Management of Data. 2021: 449-453.
+[13] FUJII G, HAMADA K, ISHIKAWA F, et al. Guidelines for Quality Assurance of Machine Learning-Based Artificial Intelligence[J]. International Journal of Software Engineering and Knowledge Engineering, 2020, 30(11-12): 1589-1606.
+[14] He Jifeng. Safe and Trustworthy Artificial Intelligence[J]. Information Security and Communications Privacy, 2019(10): 5-8.
+
+
+[15] SCHWEIGER W. Media credibility—experience or image? A survey on the credibility of the World Wide Web in Germany in comparison to other media[J]. European Journal of Communication, 2000, 15(1): 37-59.
+[16] METZGER M J, FLANAGIN A J, EYAL K, et al. Credibility for the 21st century: Integrating perspectives on source, message, and media credibility in the contemporary media environment[J]. Annals of the International Communication Association, 2003, 27(1): 293-335.
+[17] Li Xiaojing. Research on Media Credibility in the Chinese Social Context[M]. Shanghai: Shanghai Jiao Tong University Press, 2019: 148.
+[18] Zhang Hongzhong, Ren Wujiong. Human-machine dialogue beyond the "second self": Exploring trust relationships based on AI large model applications[J]. Journalism Bimonthly, 2024(3): 47-60.
+[19] SUNDAR S S. The MAIN model: A heuristic approach to understanding technology effects on credibility[M]. Cambridge, MA: MacArthur Foundation Digital Media and Learning Initiative, 2008: 73-100.
+[20] FLANAGIN A J, METZGER M J. The role of site features, user attributes, and information verification behaviors on the perceived credibility of web-based information[J]. New Media & Society, 2007, 9(2): 319-342.
+[21] HILLIGOSS B, RIEH S Y. Developing a unifying framework of credibility assessment: Construct, heuristics, and interaction in context[J]. Information Processing & Management, 2008, 44(4): 1467-1484.
+[22] China Quality Daily. China Academy of Information and Communications Technology Releases the "Trustworthy AI" Quality Evaluation System for Artificial Intelligence Datasets[EB/OL]. https://www.cqn.com.cn/zgzlb/content/2024-12/31/content_9085042.htm
+[23] Liu Han, Li Kaixuan, Chen Yixiang. A review of research on trustworthiness measurement and evaluation of artificial intelligence systems[J]. Journal of Software, 2023, 34(8): 3774-3792.
+[24] KAUR D, USLU S, RITTICHIER K J, et al. Trustworthy artificial intelligence: a review[J]. ACM Computing Surveys, 2022, 55(2): 1-38.
+[25] Pan Ji. From reflecting reality to symbolic construction: Information credibility assessment in the network environment[J]. Modern Communication (Journal of Communication University of China), 2018, 40(11): 86-90.
+[26] HOVLAND C I, WEISS W. The influence of source credibility on communication effectiveness[J]. Public Opinion Quarterly, 1951, 15(4): 635-650.
+[27] WALSTER E, ARONSON V, ABRAHAMS D, et al. Importance of physical attractiveness in dating behavior[J]. Journal of Personality and Social Psychology, 1966, 4(5): 508.
+
+
+[28] Zhu Jianhua. Theorization and Localization of Chinese Communication Research: Taking the Integrated Theory of Audience and Media Effects as an Example [J]. Journalism Research, 2001, 68: 1-22.
+[29] GOLAN G J. New perspectives on media credibility research [J]. American Behavioral Scientist, 2010, 54(1): 3-7.
+[30] Li Xiaojing, Liu Yining. How Does Converged Media Promote Political Trust Among Chinese Youth? An Examination Based on a Chain Dual Mediation Model [J]. Journalism & Communication, 2023, (09): 46-57.
+[31] POWELL T E, BOOMGAARDEN H G, DE SWERT K, et al. Framing fast and slow: A dual processing account of multimodal framing effects [J]. Media Psychology, 2019, 22(4): 572-600.
+[32] GUPTA A, KUMARAGURU P, CASTILLO C, et al. Tweetcred: Real-time credibility assessment of content on twitter [C]// International Conference on Social Informatics. 2014: 228-243.
+[33] SLATER M D, ROUNER D. How message evaluation and source attributes may influence credibility assessment and belief change [J]. Journalism & Mass Communication Quarterly, 1996, 73(4): 974-991.
+[34] FAIRBANKS J, FITCH N, KNAUF N, et al. Credibility assessment in the news: do we need to read [C]// Proceedings of the MIS2 Workshop Held in Conjunction with the 11th International Conference on Web Search and Data Mining. 2018: 799-800.
+[35] LEDERMAN R, FAN H, SMITH S, et al. Who can you trust? Credibility assessment in online health forums [J]. Health Policy and Technology, 2014, 3(1): 13-25.
+
+*Funding Project: Youth Project of the National Natural Science Foundation of China, "Research on Risk Identification and Governance Strategies for AI-Generated Content" (Project Number: 72304290).
+
+**Yang Yanni, Associate Professor at the School of Literature and Media, China Three Gorges University; Wu Leilong, Master's Student at the School of Literature and Media, China Three Gorges University; Xiang Anling, Lecturer at the School of Journalism and Communication, Minzu University of China; Zhang Jiacheng, Postdoctoral Fellow at the School of Journalism and Communication, Tsinghua University.
diff --git a/OSPP__toWord/pdf_to_json_to_latex.py b/OSPP__toWord/pdf_to_json_to_latex.py
new file mode 100644
index 0000000000..f1d2251869
--- /dev/null
+++ b/OSPP__toWord/pdf_to_json_to_latex.py
@@ -0,0 +1,380 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json, os, re
+
+
+def escape_latex(s: str) -> str:
+    """
+    Escape LaTeX special characters (for plain text).
+    """
+    if not s:
+        return ""
+    # Swap backslashes for a placeholder first, so the braces in the
+    # "\textbackslash{}" replacement are not escaped again below.
+    # (Assumes the input contains no NUL characters.)
+    s = s.replace("\\", "\x00")
+    s = (
+        s.replace("&", "\\&")
+        .replace("%", "\\%")
+        .replace("$", "\\$")
+        .replace("#", "\\#")
+        .replace("_", "\\_")
+        .replace("{", "\\{")
+        .replace("}", "\\}")
+        .replace("~", "\\textasciitilde{}")
+        .replace("^", "\\textasciicircum{}")
+    )
+    return s.replace("\x00", "\\textbackslash{}")
+
+
+def get_image_width_from_md(
+    md_path: str, image_name: str, default_ratio: float = 0.8
+) -> float:
+    """
+    Extract the width of the matching <img> tag from a Markdown/HTML file
+    (percentages only, e.g. 10% -> 0.10).
+    Supports the width="10%", width=10% and style="width:10%;" forms.
+    Returns 0 < ratio <= 1.0, or default_ratio if nothing is found.
+    """
+    if not md_path or not os.path.exists(md_path):
+        return default_ratio
+
+    with open(md_path, "r", encoding="utf-8") as f:
+        content = f.read()
+
+    flags = re.I | re.DOTALL
+
+    # Prefer an explicit width attribute on the matching <img> tag
+    pat_width_attr = re.compile(
+        rf'<img[^>]*src\s*=\s*["\']?[^"\'>]*{re.escape(image_name)}[^"\'>]*["\']?[^>]*\bwidth\s*=\s*["\']?(\d+)\s*%?["\']?',
+        flags,
+    )
+    m = pat_width_attr.search(content)
+    if m:
+        try:
+            val = int(m.group(1))
+            if val > 0:
+                return min(max(val / 100.0, 0.01), 1.0)
+        except (TypeError, ValueError):
+            pass
+
+    # Fall back to a style="width:10%" declaration near the image name
+    m2 = re.search(
+        rf'{re.escape(image_name)}[^>]*width\s*:\s*(\d+)\s*%', content, flags
+    )
+    if m2:
+        val = int(m2.group(1))
+        if val > 0:
+            return min(max(val / 100.0, 0.01), 1.0)
+
+    return default_ratio
+
+
+def escaped_paragraph_text(s: str) -> str:
+    """
+    Process a text block:
+    - escape plain text
+    - keep formulas verbatim
+    """
+ paragraphs = re.split(r"\n\s*\n", s)
+ processed_paras = []
+
+ for p in paragraphs:
+ p = p.strip()
+ if not p:
+ continue
+
+        # Stash formulas behind temporary placeholders
+ placeholders = []
+
+ def placeholder_repl(m):
+ placeholders.append(m.group(0))
+ return f"@@FORMULA{len(placeholders)-1}@@"
+
+ formula_pattern = re.compile(r"(\$\$.*?\$\$|\$.*?\$|\\\[.*?\\\])", re.DOTALL)
+ temp_text = formula_pattern.sub(placeholder_repl, p)
+ temp_text = escape_latex(temp_text)
+
+        # Restore the formulas
+ for i, formula in enumerate(placeholders):
+ temp_text = temp_text.replace(f"@@FORMULA{i}@@", formula)
+
+ processed_paras.append("\\par " + temp_text)
+
+ return "\n\n".join(processed_paras) + "\n\n"
+
+
+def generate_image_latex(block, image_base_path, md_base_path) -> str:
+    """
+    Render an image/chart block as a LaTeX figure.
+    """
+    bbox = block.get("block_bbox", [0, 0, 0, 0])
+    try:
+        x1, y1, x2, y2 = map(int, bbox)
+    except (TypeError, ValueError):
+        x1, y1, x2, y2 = 0, 0, 0, 0
+
+ filenames = [
+ f"img_in_chart_box_{x1}_{y1}_{x2}_{y2}.jpg",
+ f"img_in_image_box_{x1}_{y1}_{x2}_{y2}.jpg",
+ block.get("file_name") or block.get("image") or "",
+ ]
+ image_path = None
+ for fname in filenames:
+ if fname:
+ candidate = os.path.join(image_base_path or "", fname)
+ if os.path.exists(candidate):
+ image_path = os.path.abspath(candidate)
+ break
+
+ if not image_path:
+ return f"% [Image not found: {filenames[0]} / {filenames[1]}]\n\n"
+
+ caption_text = escape_latex(block.get("caption", "").strip())
+
+    # Determine the display width
+ width_ratio = 0.8
+ if md_base_path:
+ page = int(block.get("page", 0) or 0)
+ md_candidates = [
+ os.path.join(md_base_path, f"page_{page}.md"),
+ os.path.join(md_base_path, f"document_sample_{page}.md"),
+ os.path.join(md_base_path, f"{page}.md"),
+ ]
+ for mdp in md_candidates:
+ if os.path.exists(mdp):
+ width_ratio = get_image_width_from_md(
+ mdp, os.path.basename(image_path), default_ratio=width_ratio
+ )
+ break
+ width_ratio = max(0.01, min(float(width_ratio), 1.0))
+
+ return (
+ f"\\begin{{figure}}[h]\n"
+ f"\\centering\n"
+ f"\\includegraphics[width={width_ratio:.2f}\\linewidth]{{{image_path}}}\n"
+ f"\\caption*{{{caption_text}}}\n"
+ f"\\end{{figure}}\n\n"
+ )
+
+
+def generate_table_latex(block) -> str:
+    """
+    Render an HTML table block as a LaTeX tabularx environment.
+    """
+    from bs4 import BeautifulSoup
+
+    content = block.get("block_content", "")
+    if "<table" not in content.lower():
+        return escaped_paragraph_text(content)
+
+    soup = BeautifulSoup(content, "html.parser")
+    rows = [
+        [escape_latex(cell.get_text(strip=True)) for cell in tr.find_all(["td", "th"])]
+        for tr in soup.find_all("tr")
+    ]
+    rows = [r for r in rows if r]
+    if not rows:
+        return ""
+
+    col_count = max(len(r) for r in rows)
+    norm_rows = [r + [""] * (col_count - len(r)) for r in rows]
+    col_format = "".join(
+        [">{\\raggedright\\arraybackslash}X" for _ in range(col_count)]
+    )
+
+ latex = "\\begin{center}\n\\renewcommand{\\arraystretch}{1.5}\n"
+ latex += f"\\begin{{tabularx}}{{\\textwidth}}{{{col_format}}}\n\\toprule\n"
+ for i, row in enumerate(norm_rows):
+ latex += " & ".join(row) + " \\\\\n"
+ if i == 0:
+ latex += "\\midrule\n"
+ latex += "\\bottomrule\n\\end{tabularx}\n\\end{center}\n\n"
+ return latex
+
+
+def block_to_latex(block: dict, image_base_path: str = None, md_base_path: str = None):
+    """
+    Convert a single block to LaTeX; returns (tex_fragment, bib_item).
+    """
+ label = block.get("block_label", "")
+ raw_content = block.get("block_content", "") or ""
+
+ if label == "doc_title":
+ content = escape_latex(raw_content.strip())
+ return f"\\begin{{center}}\n{{\\Huge {content}}}\\end{{center}}\n\n", None
+
+ if label in ["header", "footer"]:
+ return "", None
+
+ if label == "abstract":
+ content = raw_content.strip()
+ if not content:
+ return "", None
+ return (
+ f"\\begin{{abstract}}\n{escape_latex(content)}\n\\end{{abstract}}\n\n",
+ None,
+ )
+
+ if label == "paragraph_title":
+ content = escape_latex(raw_content.strip())
+ return f"\\section*{{{content}}}\n\n", None
+
+ if label == "reference":
+ lines = [line.strip() for line in raw_content.split("\n") if line.strip()]
+ bibitems = []
+ for line in lines:
+ content = escape_latex(re.sub(r"^\[\d+\]\s*", "", line))
+ key = f"ref{abs(hash(line)) % 100000}"
+ bibitems.append(f"\\bibitem{{{key}}} {content}")
+ return "\n".join(bibitems) + "\n", "\n".join(bibitems) + "\n"
+
+ if label == "text":
+ return escaped_paragraph_text(raw_content), None
+ if label == "content":
+ lines = [line.rstrip() for line in raw_content.splitlines()]
+ latex_lines_content = [
+ escape_latex(line) + " \\\\" for line in lines if line.strip()
+ ]
+ return "\n".join(latex_lines_content) + "\n\n", None
+
+ if label == "formula":
+ return f"\\[\n{raw_content.strip()}\n\\]\n\n", None
+
+ if label == "algorithm":
+ return "\\begin{verbatim}\n" + raw_content + "\n\\end{verbatim}\n\n", None
+
+ if label in ["image", "chart", "seal"]:
+ return generate_image_latex(block, image_base_path, md_base_path), None
+
+ if label == "table":
+ return generate_table_latex(block), None
+
+ if label in ["figure_title", "chart_title", "table_title"]:
+ content = escape_latex(raw_content.strip())
+ if not content:
+ return "", None
+ return f"\\begin{{center}}\n{{\\small {content}}}\\end{{center}}\n\n", None
+
+    return f"% Unknown label {label}: {escape_latex(raw_content)}\n\n", None
+
+
+def blocks_to_latex(
+ json_path: str, tex_output_path: str, image_base_path: str, md_base_path: str = None
+):
+    if not os.path.exists(json_path):
+        print(f"❌ JSON file not found: {json_path}")
+        return
+
+    with open(json_path, "r", encoding="utf-8") as f:
+        try:
+            blocks = json.load(f)
+        except Exception as e:
+            print("❌ Failed to read JSON:", e)
+            return
+
+    # Group blocks by page
+ pages = {}
+ for b in blocks:
+ p = int(b.get("page", 0) or 0)
+ pages.setdefault(p, []).append(b)
+
+    # LaTeX preamble
+ latex_lines = [
+ "\\documentclass[12pt]{article}",
+ "\\usepackage{xeCJK}",
+ "\\usepackage{fontspec}",
+ "\\usepackage{graphicx}",
+ "\\usepackage{amsmath}",
+ "\\usepackage{geometry}",
+ "\\usepackage{fancyhdr}",
+ "\\usepackage{indentfirst}",
+ "\\usepackage{caption}",
+ "\\usepackage{tabularx, booktabs}",
+ "\\usepackage{amssymb}",
+ "\\usepackage{amsfonts}",
+ "\\geometry{a4paper, margin=1in}",
+ "\\setCJKmainfont{Droid Sans Fallback}",
+ "\\setmainfont{DejaVu Serif}",
+ "\\setsansfont{Lato}",
+ "\\setmonofont{Latin Modern Mono}",
+ "\\pagestyle{fancy}",
+ "\\setlength{\\parindent}{2em}",
+ "\\begin{document}\n",
+ ]
+
+ in_bibliography = False
+ pending_references = []
+
+ for page_num in sorted(pages.keys()):
+ page_blocks = sorted(
+ pages[page_num],
+ key=lambda b: (
+ b.get("block_bbox", [0, 0, 0, 0])[1] if b.get("block_bbox") else 0
+ ),
+ )
+ header_blocks = [b for b in page_blocks if b.get("block_label") == "header"]
+ footer_blocks = [b for b in page_blocks if b.get("block_label") == "footer"]
+ page_header = " ".join(b.get("block_content", "") for b in header_blocks)
+ page_footer = " ".join(b.get("block_content", "") for b in footer_blocks)
+
+ latex_lines.append(f"% ==== page {page_num} header/footer ====")
+ latex_lines.append(f"\\fancyhead[L]{{{escape_latex(page_header)}}}")
+ latex_lines.append(f"\\fancyfoot[C]{{{escape_latex(page_footer)}}}\n")
+
+ for block in page_blocks:
+ lbl = block.get("block_label", "")
+ if lbl == "reference_title":
+ if not in_bibliography:
+ latex_lines.append("\\begin{thebibliography}{99}")
+ in_bibliography = True
+ continue
+
+ tex_fragment, bib_item = block_to_latex(
+ block, image_base_path=image_base_path, md_base_path=md_base_path
+ )
+ if lbl == "reference" and bib_item:
+ if in_bibliography:
+ latex_lines.append(bib_item)
+ else:
+ pending_references.append(bib_item)
+ continue
+ if tex_fragment:
+ latex_lines.append(tex_fragment)
+
+ latex_lines.append("\\clearpage\n")
+
+    # Flush references that appeared before the bibliography was opened,
+    # and always close the bibliography if it was opened
+    if pending_references:
+        if not in_bibliography:
+            latex_lines.append("\\begin{thebibliography}{99}")
+            in_bibliography = True
+        latex_lines.extend(pending_references)
+    if in_bibliography:
+        latex_lines.append("\\end{thebibliography}\n")
+
+ latex_lines.append("\\end{document}")
+ latex_text = "\n".join(latex_lines)
+
+ os.makedirs(os.path.dirname(tex_output_path), exist_ok=True)
+ with open(tex_output_path, "w", encoding="utf-8") as f:
+ f.write(latex_text)
+
+    print(f"✅ LaTeX file saved to: {tex_output_path}")
diff --git a/OSPP__toWord/pdf_to_json_to_latex_test.py b/OSPP__toWord/pdf_to_json_to_latex_test.py
new file mode 100644
index 0000000000..476b1b9534
--- /dev/null
+++ b/OSPP__toWord/pdf_to_json_to_latex_test.py
@@ -0,0 +1,57 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# -*- coding: utf-8 -*-
+import pdf_to_json_to_word as toword
+from paddlex import create_pipeline
+import pdf_to_json_to_latex as tolatex
+
+pipeline = create_pipeline(pipeline="PP-DocTranslation")
+
+input_path = "mypaddle/input/en.pdf"
+output_path = "mypaddle/upgit/output"
+
+# Translation is not needed here, so the translation step is omitted
+
+if input_path.lower().endswith(".md"):
+ ori_md_info_list = pipeline.load_from_markdown(input_path)
+else:
+ visual_predict_res = pipeline.visual_predict(
+ input_path,
+ use_doc_orientation_classify=False,
+ use_doc_unwarping=False,
+ use_common_ocr=True,
+ use_seal_recognition=False,
+ use_table_recognition=True,
+ )
+
+    ori_md_info_list = []
+    # Collect the parsed JSON for each page
+    json_list = []
+
+    for res in visual_predict_res:
+        layout_parsing_result = res["layout_parsing_result"]
+        # Per-page JSON
+        json_data = layout_parsing_result._to_json()
+        json_list.append(json_data)
+
+        layout_parsing_result.save_to_markdown(output_path)
+
+
+merged_json_path = toword.merge_block(json_list, output_path=output_path)
+
+tolatex.blocks_to_latex(
+ merged_json_path, f"{output_path}/output.tex", f"{output_path}/imgs", output_path
+)
diff --git a/OSPP__toWord/pdf_to_json_to_word.py b/OSPP__toWord/pdf_to_json_to_word.py
new file mode 100644
index 0000000000..a66ff6e238
--- /dev/null
+++ b/OSPP__toWord/pdf_to_json_to_word.py
@@ -0,0 +1,320 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from typing import List, Dict
+import json, os, re, copy
+
+TRANSLATABLE_LABELS = {"chart", "image", "seal", "number"}
+SPLIT_TOKEN = "¥$¥"
+
+
+# --- Style helpers ---
+# Set a paragraph's font, size, bold, alignment and first-line indent
+def set_paragraph_style(
+ para,
+ font_name="Times New Roman",
+ font_size_pt=12,
+ bold=False,
+ indent=False,
+ alignment=None,
+):
+ from docx.oxml.ns import qn
+ from docx.shared import Inches
+ from docx.shared import Pt
+ from docx.enum.text import WD_ALIGN_PARAGRAPH
+
+ run = para.runs[0] if para.runs else para.add_run()
+ run.font.name = font_name
+ run._element.rPr.rFonts.set(qn("w:eastAsia"), "宋体")
+ run.font.size = Pt(font_size_pt)
+ run.bold = bold
+ if alignment is None:
+ alignment = WD_ALIGN_PARAGRAPH.LEFT
+ para.alignment = alignment
+ if indent:
+ para.paragraph_format.first_line_indent = Inches(0.3)
+
+
+# Set a run's fonts (Western and East Asian) and size
+def set_run_font(
+ run,
+ font_name_en="Times New Roman",
+ font_name_cn="宋体",
+ font_size_pt=10.5,
+ bold=False,
+):
+ from docx.oxml.ns import qn
+ from docx.shared import Pt
+
+ run.font.name = font_name_en
+ run._element.rPr.rFonts.set(qn("w:eastAsia"), font_name_cn)
+ run.font.size = Pt(font_size_pt)
+ run.bold = bold
+
+
+# Clear a section's header or footer and set centered text
+def set_section_part_text(section_part, text):
+ from docx.enum.text import WD_ALIGN_PARAGRAPH
+
+ for _ in range(len(section_part.paragraphs)):
+ p = section_part.paragraphs[0]
+ p._element.getparent().remove(p._element)
+ para = section_part.add_paragraph()
+ para.alignment = WD_ALIGN_PARAGRAPH.CENTER
+ run = para.add_run(text)
+ set_run_font(run)
+
+
+# --- Content formatting ---
+# Add a paragraph or heading to the document, styled according to the block label
+def format_block_style(doc, label, content):
+ from docx.enum.text import WD_ALIGN_PARAGRAPH
+
+ style_map = {
+ "doc_title": {
+ "level": 0,
+ "size": 20,
+ "bold": True,
+ "align": WD_ALIGN_PARAGRAPH.CENTER,
+ },
+ "header": {"size": 16, "bold": True, "align": WD_ALIGN_PARAGRAPH.CENTER},
+ "abstract_title": {
+ "level": 1,
+ "size": 14,
+ "bold": True,
+ "align": WD_ALIGN_PARAGRAPH.CENTER,
+ },
+ "content_title": {
+ "level": 1,
+ "size": 14,
+ "bold": True,
+ "align": WD_ALIGN_PARAGRAPH.LEFT,
+ },
+ "reference_title": {
+ "level": 1,
+ "size": 14,
+ "bold": True,
+ "align": WD_ALIGN_PARAGRAPH.LEFT,
+ },
+ "paragraph_title": {
+ "level": 2,
+ "size": 14,
+ "bold": True,
+ "align": WD_ALIGN_PARAGRAPH.LEFT,
+ },
+ "abstract": {"size": 12, "align": WD_ALIGN_PARAGRAPH.JUSTIFY},
+ "text": {"size": 12, "align": WD_ALIGN_PARAGRAPH.JUSTIFY, "indent": True},
+ "figure_title": {"size": 10, "align": WD_ALIGN_PARAGRAPH.CENTER},
+ "table_title": {"size": 10, "align": WD_ALIGN_PARAGRAPH.CENTER},
+ "chart_title": {"size": 10, "align": WD_ALIGN_PARAGRAPH.CENTER},
+ "reference": {"size": 12, "align": WD_ALIGN_PARAGRAPH.JUSTIFY},
+ "algorithm": {
+ "font": "Courier New",
+ "size": 11,
+ "align": WD_ALIGN_PARAGRAPH.LEFT,
+ },
+ "formula": {"size": 12, "align": WD_ALIGN_PARAGRAPH.CENTER},
+ "vision_footnote": {"size": 9, "align": WD_ALIGN_PARAGRAPH.LEFT},
+ "number": {"size": 9, "align": WD_ALIGN_PARAGRAPH.CENTER},
+ "footer": {"size": 9, "align": WD_ALIGN_PARAGRAPH.CENTER},
+ }
+
+ config = style_map.get(label, {"size": 12, "align": WD_ALIGN_PARAGRAPH.LEFT})
+ para = (
+ doc.add_heading(content, level=config["level"])
+ if "level" in config
+ else doc.add_paragraph(content)
+ )
+ set_paragraph_style(
+ para,
+ font_name=config.get("font", "Times New Roman"),
+ font_size_pt=config["size"],
+ bold=config.get("bold", False),
+ indent=config.get("indent", False),
+ alignment=config["align"],
+ )
+
+
+# --- Table parsing ---
+# Parse an HTML table string into a 2-D list of cell texts
+def parse_html_table(html):
+ from bs4 import BeautifulSoup
+
+ soup = BeautifulSoup(html, "html.parser")
+ return [
+ [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
+ for tr in soup.find_all("tr")
+ ]
+
+
+# --- Sorting ---
+# Sort blocks by page number and vertical position
+def sort_blocks_by_position(blocks):
+ return sorted(blocks, key=lambda b: (b.get("page", 0), b["block_bbox"][1]))
+
+
+# Extract the width ratio of a given image from a markdown file; defaults to 1.0
+def get_image_width_from_md(md_path, image_name):
+    if not os.path.exists(md_path):
+        return 1.0
+    with open(md_path, "r", encoding="utf-8") as f:
+        content = f.read()
+    pattern = re.compile(
+        rf'<img[^>]*src=["\'].*?{re.escape(image_name)}.*?["\'][^>]*width=["\'](\d+)%["\']',
+        re.I,
+    )
+    match = pattern.search(content)
+    return int(match.group(1)) / 100 if match else 1.0
+
+
+# Insert an image into the document, scaled by width_ratio and centered
+def insert_image(doc, image_path, width_ratio):
+ from docx.enum.text import WD_ALIGN_PARAGRAPH
+ from docx.shared import Inches
+
+ para = doc.add_paragraph()
+ run = para.add_run()
+ run.add_picture(image_path, width=Inches(5.5 * width_ratio))
+ para.alignment = WD_ALIGN_PARAGRAPH.CENTER
+
+
+# --- Main pipeline ---
+# Read blocks from JSON and build a Word document page by page
+# (headers/footers, tables and images included)
+def blocks_to_word(
+ json_path, word_output_path, image_base_path, input_path, output_path
+):
+ from docx import Document
+ from docx.enum.section import WD_SECTION
+
+ with open(json_path, "r", encoding="utf-8") as f:
+ blocks = json.load(f)
+
+ doc = Document()
+ pages = {}
+ for block in blocks:
+ pages.setdefault(block.get("page", 0), []).append(block)
+
+ for page_num in sorted(pages.keys()):
+ page_blocks = sort_blocks_by_position(pages[page_num])
+ section = (
+ doc.add_section(WD_SECTION.NEW_PAGE) if page_num != 0 else doc.sections[0]
+ )
+
+        # Header
+ header_blocks = [b for b in page_blocks if b["block_label"] == "header"]
+ if header_blocks:
+ header_text = "\n".join(
+ b["block_content"].strip()
+ for b in header_blocks
+ if b.get("block_content")
+ )
+ section.header.is_linked_to_previous = False
+ set_section_part_text(section.header, header_text)
+
+        # Footer
+ footer_blocks = [b for b in page_blocks if b["block_label"] == "footer"]
+ if footer_blocks:
+ footer_text = "\n".join(
+ b["block_content"].strip()
+ for b in footer_blocks
+ if b.get("block_content")
+ )
+ section.footer.is_linked_to_previous = False
+ set_section_part_text(section.footer, footer_text)
+
+ for block in page_blocks:
+ label = block["block_label"]
+ content = block.get("block_content", "").strip()
+ if not content and label not in ["chart", "image", "table", "seal"]:
+ continue
+
+ if label in ["chart", "image", "seal"]:
+ bbox = block.get("block_bbox", [0, 0, 0, 0])
+ x1, y1, x2, y2 = map(int, bbox)
+ filename1 = f"img_in_chart_box_{x1}_{y1}_{x2}_{y2}.jpg"
+ filename2 = f"img_in_image_box_{x1}_{y1}_{x2}_{y2}.jpg"
+ image_filename = (
+ filename1
+ if os.path.exists(os.path.join(image_base_path, filename1))
+ else filename2
+ )
+ image_path = os.path.join(image_base_path, image_filename)
+
+ if os.path.exists(image_path):
+ base_name = os.path.splitext(os.path.basename(input_path))[0]
+ md_path = f"{output_path}/{base_name}_{block.get('page')}.md"
+ width = get_image_width_from_md(md_path, image_filename)
+ insert_image(doc, image_path, width)
+ else:
+ doc.add_paragraph(f"[Image {image_filename} not found]")
+ continue
+
+            elif label == "table":
+                rows = (
+                    parse_html_table(content)
+                    if "<table" in content.lower()
+                    else [[content]]
+                )
+                rows = [r for r in rows if r]
+                if rows:
+                    table = doc.add_table(
+                        rows=len(rows), cols=max(len(r) for r in rows)
+                    )
+                    table.style = "Table Grid"
+                    for r_idx, row in enumerate(rows):
+                        for c_idx, cell_text in enumerate(row):
+                            table.cell(r_idx, c_idx).text = cell_text
+
+            else:
+                format_block_style(doc, label, content)
+
+    doc.save(word_output_path)
+    print(f"✅ Word file saved to: {word_output_path}")
+
+
+# --- Block merging ---
+# Extract the blocks from every page's JSON and merge them into one JSON file
+def merge_block(json_list: List[Dict], output_path: str) -> str:
+ merged = []
+ for i, item in enumerate(json_list):
+ blocks = copy.deepcopy(item["res"]["parsing_res_list"])
+ for b in blocks:
+ b["page"] = i
+ merged.extend(blocks)
+
+ merged_path = os.path.join(output_path, "merged_translated.json")
+ with open(merged_path, "w", encoding="utf-8") as f:
+ json.dump(merged, f, ensure_ascii=False, indent=4)
+ return merged_path
+
+
+def md2cn_word(json_list: List[Dict], input_path, output_path):
+    # Merge the blocks from every page's JSON into one JSON file
+    merged_json_path = merge_block(json_list, output_path=output_path)
+
+    # Convert the merged JSON to Word
+    base_name = os.path.splitext(os.path.basename(input_path))[0]
+
+ blocks_to_word(
+ json_path=merged_json_path,
+ word_output_path=f"{output_path}/{base_name}_toword.docx",
+ image_base_path=f"{output_path}/imgs",
+ input_path=input_path,
+ output_path=output_path,
+ )
diff --git a/OSPP__toWord/pdf_to_json_to_word_test.py b/OSPP__toWord/pdf_to_json_to_word_test.py
new file mode 100644
index 0000000000..96f386be59
--- /dev/null
+++ b/OSPP__toWord/pdf_to_json_to_word_test.py
@@ -0,0 +1,64 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# -*- coding: utf-8 -*-
+import pdf_to_json_to_word as toword
+import os
+from paddlex import create_pipeline
+
+pipeline = create_pipeline(pipeline="PP-DocTranslation")
+
+input_path = "mypaddle/upgit/document_sample.pdf"
+output_path = "mypaddle/upgit/output"
+
+# Translation is not needed here, so the translation step is omitted
+
+if input_path.lower().endswith(".md"):
+ ori_md_info_list = pipeline.load_from_markdown(input_path)
+else:
+ visual_predict_res = pipeline.visual_predict(
+ input_path,
+ use_doc_orientation_classify=False,
+ use_doc_unwarping=False,
+ use_common_ocr=True,
+ use_seal_recognition=False,
+ use_table_recognition=True,
+ )
+
+    ori_md_info_list = []
+    # Collect the parsed JSON for each page
+    json_list = []
+
+    for res in visual_predict_res:
+        layout_parsing_result = res["layout_parsing_result"]
+        # Per-page JSON
+        json_data = layout_parsing_result._to_json()
+        json_list.append(json_data)
+
+        layout_parsing_result.save_to_markdown(output_path)
+
+# Extract the blocks from every page's JSON and merge them into one block-only JSON
+merged_json_path = toword.merge_block(json_list, output_path=output_path)
+
+# Convert the merged JSON to Word
+base_name = os.path.splitext(os.path.basename(input_path))[0]
+
+toword.blocks_to_word(
+ json_path=merged_json_path,
+ word_output_path=f"{output_path}/{base_name}_toword.docx",
+ image_base_path=f"{output_path}/imgs",
+ input_path=input_path,
+ output_path=output_path,
+)
diff --git a/OSPP__toWord/pdf_to_md_to_latex.py b/OSPP__toWord/pdf_to_md_to_latex.py
new file mode 100644
index 0000000000..971f503a5e
--- /dev/null
+++ b/OSPP__toWord/pdf_to_md_to_latex.py
@@ -0,0 +1,274 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os, re
+from typing import Dict
+
+
+def escape_latex_outside_formula(s: str) -> str:
+    """
+    Escape LaTeX special characters while keeping formulas verbatim.
+    """
+    if not s:
+        return ""
+
+    placeholders = []
+
+    def repl(m):
+        placeholders.append(m.group(0))
+        return f"@@FORMULA{len(placeholders)-1}@@"
+
+    # Extract the formulas
+    formula_pat = re.compile(
+        r"(\$\$.*?\$\$|\$.*?\$|\\\[.*?\\\]|\\\(.*?\\\))", re.DOTALL
+    )
+    tmp = formula_pat.sub(repl, s)
+
+    # Escape; the backslash goes through a sentinel so the braces of
+    # \textbackslash{} are not escaped again by the brace replacements below
+    tmp = tmp.replace("\\", "@@BACKSLASH@@")
+    tmp = (
+        tmp.replace("&", "\\&")
+        .replace("%", "\\%")
+        .replace("$", "\\$")
+        .replace("#", "\\#")
+        .replace("_", "\\_")
+        .replace("{", "\\{")
+        .replace("}", "\\}")
+        .replace("~", "\\textasciitilde{}")
+        .replace("^", "\\textasciicircum{}")
+    )
+    tmp = tmp.replace("@@BACKSLASH@@", "\\textbackslash{}")
+
+    # Restore the formulas
+    for i, f in enumerate(placeholders):
+        tmp = tmp.replace(f"@@FORMULA{i}@@", f)
+    return tmp
+
+
+def get_image_width_from_md_line(line, default_ratio=0.8):
+    """
+    Parse the image width attribute from a single line.
+    """
+ m = re.search(r'width\s*=\s*["\']?(\d+)%?["\']?', line)
+ if m:
+ val = int(m.group(1))
+ return max(0.01, min(val / 100.0, 1.0))
+ m2 = re.search(r"width\s*:\s*(\d+)%", line)
+ if m2:
+ val = int(m2.group(1))
+ return max(0.01, min(val / 100.0, 1.0))
+ return default_ratio
+
+
+def process_table_html(content) -> str:
+    """
+    Convert an HTML table into a LaTeX tabularx environment.
+    """
+    from bs4 import BeautifulSoup
+
+    if "<table" not in content.lower():
+        return escape_latex_outside_formula(content) + "\n\n"
+
+    soup = BeautifulSoup(content, "html.parser")
+    rows = [
+        [
+            escape_latex_outside_formula(cell.get_text(strip=True))
+            for cell in tr.find_all(["td", "th"])
+        ]
+        for tr in soup.find_all("tr")
+    ]
+    rows = [r for r in rows if r]
+    if not rows:
+        return ""
+
+    col_count = max(len(r) for r in rows)
+    norm_rows = [r + [""] * (col_count - len(r)) for r in rows]
+    col_format = "".join(
+        [">{\\raggedright\\arraybackslash}X" for _ in range(col_count)]
+    )
+
+ latex = "\\begin{center}\n\\renewcommand{\\arraystretch}{1.5}\n"
+ latex += f"\\begin{{tabularx}}{{\\textwidth}}{{{col_format}}}\n\\toprule\n"
+ for i, row in enumerate(norm_rows):
+ latex += " & ".join(row) + " \\\\\n"
+ if i == 0:
+ latex += "\\midrule\n"
+ latex += "\\bottomrule\n\\end{tabularx}\n\\end{center}\n\n"
+ return latex
+
+
+def process_paragraph(s: str) -> str:
+    """
+    Process a text paragraph, keeping formulas verbatim.
+    """
+ paragraphs = re.split(r"\n\s*\n", s)
+ processed_paras = []
+ for p in paragraphs:
+ p = p.strip()
+ if not p:
+ continue
+ processed_paras.append("\\par " + escape_latex_outside_formula(p))
+ return "\n\n".join(processed_paras) + "\n\n"
+
+
+def process_md_line(line: str, output_path) -> str:
+    """
+    Convert a single markdown line to LaTeX.
+    """
+    from bs4 import BeautifulSoup
+
+    line = line.strip()
+    if not line:
+        return ""
+
+    # Headings
+ if line.startswith("##### "):
+ return f"\\paragraph*{{{escape_latex_outside_formula(line[6:].strip())}}}\n\n"
+ if line.startswith("#### "):
+ return (
+ f"\\subsubsection*{{{escape_latex_outside_formula(line[5:].strip())}}}\n\n"
+ )
+ if line.startswith("### "):
+ return f"\\subsection*{{{escape_latex_outside_formula(line[4:].strip())}}}\n\n"
+ if line.startswith("## "):
+ return f"\\section*{{{escape_latex_outside_formula(line[3:].strip())}}}\n\n"
+ if line.startswith("# "):
+ return f"\\section*{{{escape_latex_outside_formula(line[2:].strip())}}}\n\n"
+
+    # Centered <div> lines: keep the inner text, centered
+    if "<div" in line.lower():
+        inner = BeautifulSoup(line, "html.parser").get_text(strip=True)
+        if not inner:
+            return ""
+        return (
+            "\\begin{center}\n"
+            f"{escape_latex_outside_formula(inner)}\n"
+            "\\end{center}\n\n"
+        )
+
+    # HTML tables
+    if "<table" in line.lower():
+        return process_table_html(line)
+
+    # Plain text
+    return process_paragraph(line)
+
+
+def md_to_latex(md_text: str, output_path) -> str:
+    """
+    Split the markdown on the per-page placeholder <div ...>1</div>
+    (pattern assumed) and convert each page to LaTeX.
+    """
+    pages = [p for p in re.split(r"<div[^>]*>1</div>", md_text) if p.strip()]
+    if not pages:
+        print("❌ No valid content")
+        return ""
+
+    # If the last page contains a table, treat it as header/footer metadata
+    meta_info = pages[-1] if "<table" in pages[-1].lower() else None
+    if meta_info is not None:
+        pages = pages[:-1]
+
+    return "".join(
+        process_md_line(line, output_path)
+        for page in pages
+        for line in page.splitlines()
+    )
+
+
+# Collect per-page header/footer information into an HTML table and wrap it
+# as a markdown result for the translation pipeline
+def collect_header_footer_info(json_list, input_path) -> Dict:
+
+    # HTML table header
+    html_lines = [
+        "<table>",
+        "<tr><th>page</th><th>header</th><th>footer</th>"
+        "<th>footnote</th><th>page_number</th></tr>",
+    ]
+
+ for page_idx, json_blocks in enumerate(json_list, start=1):
+
+ parsing_res_list = json_blocks.get("res", {}).get("parsing_res_list", [])
+
+ head_foot_dict = {"header": "", "footer": "", "footnote": "", "page_number": ""}
+
+ for block in parsing_res_list:
+ label = block.get("block_label", "").lower()
+ content = block.get("block_content", "").strip()
+ if not content:
+ continue
+
+ if label in {"header", "footer", "footnote", "number", "page_number"}:
+ if label == "number":
+ label = "page_number"
+ head_foot_dict[label] = content
+
+        # Build this page's table row
+        html_lines.append(
+            f"<tr><td>{page_idx}</td>"
+            f"<td>{head_foot_dict['header']}</td>"
+            f"<td>{head_foot_dict['footer']}</td>"
+            f"<td>{head_foot_dict['footnote']}</td>"
+            f"<td>{head_foot_dict['page_number']}</td></tr>"
+        )
+
+    html_lines.append("</table>")
+
+    # Join the HTML lines into a single string
+ html_string = "".join(html_lines)
+
+ result = {
+ "markdown_images": {},
+ "page_index": 0,
+ "input_path": input_path,
+        "markdown_texts": html_string,  # a single string
+        "page_continuation_flags": (True, True),  # adjust as needed
+ }
+ return result
+
+
+def process_md_page(document, md_text, output_path):
+ from bs4 import BeautifulSoup
+
+    """Process the content of a single page."""
+ lines = md_text.strip().split("\n")
+ for line in lines:
+ line = line.strip()
+ if not line:
+ continue
+
+ title_color = (0, 0, 255)
+ if line.startswith("##### "):
+ p = document.add_paragraph(line[6:])
+ set_paragraph_style(p, bold=True, font_size=10)
+ elif line.startswith("#### "):
+ p = document.add_paragraph(line[5:])
+ set_paragraph_style(p, bold=True, font_size=11)
+ elif line.startswith("### "):
+ p = document.add_paragraph(line[4:])
+ set_paragraph_style(p, bold=True, font_size=12)
+ elif line.startswith("## "):
+ p = document.add_paragraph(line[3:])
+ set_paragraph_style(p, bold=True, font_size=14)
+ elif line.startswith("# "):
+ p = document.add_paragraph(line[2:])
+ set_paragraph_style(p, bold=True, font_size=16)
+
+        # Centered <div> content and other markup: keep the plain text
+        elif line.startswith("<"):
+            text = BeautifulSoup(line, "html.parser").get_text(strip=True)
+            if text:
+                p = document.add_paragraph(text)
+                set_paragraph_style(p)
+        else:
+            p = document.add_paragraph(line)
+            set_paragraph_style(p)
+
+
+def md_to_word(document, md_text, output_path):
+    """
+    Split the markdown on the per-page placeholder <div ...>1</div>
+    (pattern assumed) and build the Word document page by page.
+    """
+    pages = [p for p in re.split(r"<div[^>]*>1</div>", md_text) if p.strip()]
+
+    if not pages:
+        print("❌ No valid content")
+        return
+
+    # If the last page contains a table, treat it as header/footer metadata
+    meta_info = pages[-1] if "<table" in pages[-1].lower() else None
+- Details: splits the markdown on the page separator and on header/footer info; if the separator is present, the Word file is generated page by page; if header/footer info is present, it is inserted on every page; otherwise the Word file is generated directly in the order of the blocks in the md.
+
+```
+See pdf_to_md_to_word_test.py for the invocation details
+```
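The page-by-page generation above relies on splitting the translated markdown at a per-page placeholder. A minimal sketch of that step (the exact `<div ...>1</div>` pattern and the name `split_pages` are assumptions, not this repo's API):

```python
import re

def split_pages(md_text: str):
    # Split on the per-page placeholder div (assumed form: <div ...>1</div>)
    return [p for p in re.split(r"<div[^>]*>1</div>", md_text) if p.strip()]
```

Each returned chunk corresponds to one page and can then be rendered independently, with headers/footers re-attached per page.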
+
+## pdf_to_json_to_latex.py
+
+- Purpose: convert the content parsed from a PDF into LaTeX
+- Details: largely the same logic as pdf_to_json_to_word; the key points are escaping formulas and special characters, and rendering each block type with valid LaTeX markup
+
+```
+See pdf_to_json_to_latex_test.py for the invocation details
+```
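The formula-preserving escaping mentioned above can be sketched standalone; `escape_keep_formulas` is an illustrative name, and the sentinel trick keeps the braces of `\textbackslash{}` from being escaped a second time:

```python
import re

def escape_keep_formulas(s: str) -> str:
    placeholders = []

    def stash(m):
        placeholders.append(m.group(0))
        return f"@@F{len(placeholders)-1}@@"

    # Stash $...$ / $$...$$ spans so they survive the escaping below
    tmp = re.sub(r"\$\$.*?\$\$|\$.*?\$", stash, s, flags=re.DOTALL)
    # Route the backslash through a sentinel so \textbackslash{}'s own
    # braces are not touched by the brace replacements
    tmp = tmp.replace("\\", "@@BS@@")
    for ch, rep in [("&", "\\&"), ("%", "\\%"), ("$", "\\$"), ("#", "\\#"),
                    ("_", "\\_"), ("{", "\\{"), ("}", "\\}")]:
        tmp = tmp.replace(ch, rep)
    tmp = tmp.replace("@@BS@@", "\\textbackslash{}")
    # Restore the formulas verbatim
    for i, f in enumerate(placeholders):
        tmp = tmp.replace(f"@@F{i}@@", f)
    return tmp
```

Escaping the backslash directly first and the braces afterwards would corrupt `\textbackslash{}` itself, which is why the sentinel ordering matters.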
+
+## pdf_to_md_to_latex.py
+
+- Purpose 1: after PDF layout parsing, supplement the headers/footers, have the model translate the result into a complete md, and convert it to LaTeX
+
+- Purpose 2: if an md file is supplied directly, it can likewise be translated and converted to LaTeX
+
+- Details: largely the same logic as pdf_to_md_to_word.
+
+```
+See pdf_to_md_to_latex_test.py for the invocation details
+```
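The cross-page splicing used by `concatenate_markdown_pages_with_flags` in the pipeline patch (finish the previous page's sentence before inserting the page placeholder, so the translator sees complete sentences) can be sketched as follows; `splice_pages` and the `<PAGE>` placeholder are illustrative names:

```python
import re

SENTENCE_ENDINGS = "。!?.!?"

def splice_pages(prev: str, nxt: str, placeholder: str = "<PAGE>") -> str:
    # Grab everything up to and including the first sentence-ending mark
    m = re.search(rf"^[^{SENTENCE_ENDINGS}]*[{SENTENCE_ENDINGS}]?", nxt)
    first, rest = (m.group(), nxt[m.end():]) if m else (nxt, "")
    # Finish the previous page's sentence, then mark the page break
    return prev + first + f"\n\n{placeholder}\n\n" + rest
```

The real method additionally checks whether a space is needed at the join (no space between two CJK characters, a space otherwise).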
diff --git a/paddlex/inference/pipelines/pp_doctranslation/pipeline.py b/paddlex/inference/pipelines/pp_doctranslation/pipeline.py
index d95b100821..ee502bd197 100644
--- a/paddlex/inference/pipelines/pp_doctranslation/pipeline.py
+++ b/paddlex/inference/pipelines/pp_doctranslation/pipeline.py
@@ -355,6 +355,7 @@ def translate(
glossary: Dict = None,
llm_request_interval: float = 0.0,
chat_bot_config: Dict = None,
+ use_flags: bool = False,
**kwargs,
):
"""
@@ -395,7 +396,12 @@ def translate(
and ori_md_info_list[0].get("page_index") is not None
):
# for multi page pdf
- ori_md_info_list = [self.concatenate_markdown_pages(ori_md_info_list)]
+ if use_flags:
+ ori_md_info_list = [
+ self.concatenate_markdown_pages_with_flags(ori_md_info_list)
+ ]
+ else:
+ ori_md_info_list = [self.concatenate_markdown_pages(ori_md_info_list)]
if not isinstance(llm_request_interval, float):
llm_request_interval = float(llm_request_interval)
@@ -477,6 +483,82 @@ def translate_func(text):
}
)
+    # Splice the first sentence of the next page onto the previous page so the
+    # translator sees complete sentences, and mark each page break with a
+    # placeholder
+
+    def concatenate_markdown_pages_with_flags(self, markdown_list: list) -> "MarkdownResult":
+
+ markdown_texts = ""
+ previous_page_last_element_paragraph_end_flag = True
+ PAGE_PLACEHOLDER = "1"
+ SENTENCE_ENDINGS = ",,。!?.!?"
+
+ if len(markdown_list) == 0:
+ raise ValueError("The length of markdown_list is zero.")
+
+ for res in markdown_list:
+ # Get the paragraph flags for the current page
+ page_first_element_paragraph_start_flag: bool = res[
+ "page_continuation_flags"
+ ][0]
+ page_last_element_paragraph_end_flag: bool = res["page_continuation_flags"][
+ 1
+ ]
+
+ # Determine whether to add a space or a newline
+ if (
+ not page_first_element_paragraph_start_flag
+ and not previous_page_last_element_paragraph_end_flag
+ ):
+                # Extract the first sentence of the next page
+ first_sentence_match = re.search(
+ rf"^[^{SENTENCE_ENDINGS}]*[{SENTENCE_ENDINGS}]?",
+ res["markdown_texts"],
+ )
+ if first_sentence_match:
+ front_sentence = first_sentence_match.group()
+ remaining_text = res["markdown_texts"][first_sentence_match.end() :]
+ else:
+ front_sentence = res["markdown_texts"]
+ remaining_text = ""
+
+ last_char_of_markdown = markdown_texts[-1] if markdown_texts else ""
+ first_char_of_handler = (
+ res["markdown_texts"][0] if res["markdown_texts"] else ""
+ )
+
+ # Check if the last character and the first character are Chinese characters
+ last_is_chinese_char = (
+ re.match(r"[\u4e00-\u9fff]", last_char_of_markdown)
+ if last_char_of_markdown
+ else False
+ )
+ first_is_chinese_char = (
+ re.match(r"[\u4e00-\u9fff]", first_char_of_handler)
+ if first_char_of_handler
+ else False
+ )
+
+ if not (last_is_chinese_char or first_is_chinese_char):
+ markdown_texts += " " + front_sentence
+ else:
+ markdown_texts += front_sentence
+ markdown_texts += f"\n\n{PAGE_PLACEHOLDER}\n\n" + remaining_text
+ else:
+ markdown_texts += f"\n\n{PAGE_PLACEHOLDER}\n\n" + res["markdown_texts"]
+ # markdown_texts += "\n\n" + res["markdown_texts"]
+ previous_page_last_element_paragraph_end_flag = (
+ page_last_element_paragraph_end_flag
+ )
+
+ concatenate_result = {
+ "input_path": markdown_list[0]["input_path"],
+ "page_index": None,
+ "page_continuation_flags": (True, True),
+ "markdown_texts": markdown_texts,
+ }
+
+ return MarkdownResult(concatenate_result)
+
def concatenate_markdown_pages(self, markdown_list: list) -> tuple:
"""
Concatenate Markdown content from multiple pages into a single document.