Commit 8b3a4f9

Merge branch 'release/0.2'
2 parents adcac78 + 963e99b commit 8b3a4f9

37 files changed, with 3793 additions and 556 deletions

.github/ISSUE_TEMPLATE/software_release.md

Lines changed: 12 additions & 5 deletions
@@ -13,26 +13,33 @@ assignees: ''

 ### prep release candidate for acceptance testing

-- [ ] Update the version number to the appropriate release candidate number (e.g., `0.5rc1`)
+- [ ] Update the version number in `src/remarx/__init__.py` to the appropriate release candidate number (e.g., `0.5rc1`)
 - [ ] Create a draft PR (since it should not be merged)
 - [ ] Review the changelog to make sure that all features, changes, bugfixes, etc included in the release are documented. You may want to review the git revision history to be sure you've captured everything.
 - [ ] Review the README to make sure that its contents are up to date
 - [ ] Check python requirements for any internal dependencies that should be released (or at least pinned to a specific git commit)
 - [ ] Confirm that all checks for the draft PR pass (e.g., unit tests, code coverage checks)
-- [ ] Review code documentation to make sure it is up to date.
+- [ ] Build documentation on the release branch and run the server to review and make sure it is up to date.

-### acceptance testing fails
+### BEFORE acceptance testing
+
+- [ ] Review issues included in the release to make sure they have testing instructions before marking them as ready for acceptance testing.
+- [ ] Give project team members instructions about how to install the release candidate version.
+
+### IF acceptance testing fails
+
+*These steps are only needed if acceptance testing fails and you need to update and retest the release candidate.*

 - [ ] Increment the version number to the next release candidate (e.g., `0.5rc1` → `0.5rc2`)
 - [ ] Address the changes raised in acceptance testing and repeat the previous section.

-### acceptance testing passes
+### WHEN acceptance testing passes

 - [ ] Set the final release version number (e.g., `0.5rc1` → `0.5`)
 - [ ] Use git flow to finish the release (merge release branch into both main and develop, create a tag, remove the release branch, etc.). (`git flow release finish 0.5`)

 ## after release

-- [ ] Increase the develop branch version so it is set to the next expected release (i.e., if you just released `0.5` then develop will probably be `0.6-dev` unless you are working on a major update, in which case it will be `1.0-dev`)
+- [ ] Increase the develop branch version so it is set to the next expected release (i.e., if you just released `0.5` then develop will probably be `0.6.dev0` unless you are working on a major update, in which case it will be `1.0.dev0`)
 - [ ] Push all updates to GitHub (main branch, develop branch, tags)
 - [ ] Create a GitHub release for the new tag, to trigger package publication on PyPI (and eventually Zenodo DOI)
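
The version strings in this checklist (`0.5rc1`, `0.5rc2`, `0.5`, `0.6.dev0`) use the normalized spellings of Python's PEP 440 version scheme, in which a development release sorts before a release candidate of the same number, and a release candidate sorts before the final release. The following is a minimal standard-library sketch of that ordering. The `sort_key` helper and its simplified pattern are hypothetical and not part of remarx; the pattern accepts only the normalized forms used in the checklist, whereas a full PEP 440 parser also accepts and normalizes alternative spellings.

```python
import re

# Simplified pattern (an assumption for illustration): a dotted release
# number, optionally followed by either an rcN or a .devN suffix.
_VERSION = re.compile(r"^(\d+(?:\.\d+)*)(?:rc(\d+)|\.dev(\d+))?$")


def sort_key(version: str) -> tuple:
    """Return a sortable key for a normalized version string."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    release, rc, dev = match.groups()
    numbers = [int(part) for part in release.split(".")]
    # Drop trailing zeros so that "0.5" and "0.5.0" compare as equal.
    while len(numbers) > 1 and numbers[-1] == 0:
        numbers.pop()
    # Phase rank: dev release < release candidate < final release.
    if dev is not None:
        phase = (0, int(dev))
    elif rc is not None:
        phase = (1, int(rc))
    else:
        phase = (2, 0)
    return (tuple(numbers), phase)


# Versions from the checklist, deliberately out of order.
ordered = sorted(["0.5", "0.5rc2", "0.6.dev0", "0.5rc1"], key=sort_key)
```

Sorting with this key places `0.5rc1` before `0.5rc2`, both before `0.5`, and `0.6.dev0` after the `0.5` release, which matches the sequence of version bumps the checklist walks through.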

.github/workflows/check.yml

Lines changed: 2 additions & 0 deletions
@@ -27,3 +27,5 @@ jobs:
       # Note: default deps currently include ruff, but probably should not
       - name: Check with ruff
         run: uv run ruff check
+      - name: Check formatting with ruff
+        run: uv run ruff format --check

.pre-commit-config.yaml

Lines changed: 7 additions & 0 deletions
@@ -25,6 +25,7 @@ repos:
         additional_dependencies:
           - mdformat-mkdocs
           - mdformat-frontmatter
+          - mdformat-pyproject # support configuration in pyproject.toml
         args: [--wrap, keep]
         # Temporarily exclude docs/api/index.md due because it messes up the formatting
         exclude: ^docs/api/index\.md$
@@ -34,6 +35,7 @@ repos:
     hooks:
       - id: codespell
         exclude_types: ["xml"] # ignore sample xml file
+        args: [--uri-ignore-words-list=archiv] # ignore German "archiv" in URLs
   # Common file checks (no dependencies)
   - repo: https://github.com/pre-commit/pre-commit-hooks
     rev: v4.5.0
@@ -51,3 +53,8 @@ repos:
     rev: 0.8.6
     hooks:
       - id: uv-lock
+  # validate GitHub Actions workflow files
+  - repo: https://github.com/mpalmer/action-validator
+    rev: v0.8.0
+    hooks:
+      - id: action-validator

CHANGELOG.md

Lines changed: 31 additions & 2 deletions
@@ -1,5 +1,35 @@
 # CHANGELOG

+## 0.2.0
+
+### Application
+
+- The app now consists of two notebooks (Sentence Corpus Builder & Quote Finder)
+- Logging is now automatically configured by the application, and the log file location is reported to the user
+- Quote Finder notebook now supports quotation detection between two sentence corpus files (original and reuse)
+
+### Documentation
+
+- Add technical design document to MkDocs documentation
+
+### Sentence corpus creation
+
+- Add sentence id field (`sent_id`) to generated sentence corpora
+- Processes TEI/XML documents to yield separate chunks for body text and footnotes, with each footnote yielded individually as a separate element
+
+### Quotation detection
+
+- Add a method for generating sentence embeddings from a list of sentences
+- Added method for identifying likely quote sentence pairs
+
+### Scripts
+
+- Add `parse_html` script for converting the manifesto html files to plain text for sentence corpus input (one-time use)
+
+### Misc
+
+- Add a utility method (`configure_logging`) to configure logging, supporting logging to a file or to stdout
+
 ## [0.1.0] - 2025-09-08

 _Initial release._
@@ -26,7 +56,6 @@ _Initial release._

 ### Misc

-- Add GitHub Actions workflow to build and publish python package on PyPI when a new GitHub release
-  created
+- Add GitHub Actions workflow to build and publish python package on PyPI when a new GitHub release created

 [0.1.0]: https://github.com/Princeton-CDH/remarx/tree/0.1
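
The 0.2.0 changelog entry above mentions a `configure_logging` utility that supports logging to a file or to stdout, but its implementation is not shown in this part of the diff. As a rough idea of what such a helper can look like, here is a minimal sketch built on the standard `logging` module. The function name `configure_logging_sketch`, its parameters, and the log format are all assumptions made for illustration and may differ from the actual remarx code.

```python
import logging
import sys
from pathlib import Path
from typing import Optional


def configure_logging_sketch(
    log_file: Optional[Path] = None, level: int = logging.INFO
) -> logging.Handler:
    """Attach one handler to the root logger: a file if given, else stdout."""
    if log_file is not None:
        # Hypothetical behavior: write UTF-8 log records to the given path.
        handler: logging.Handler = logging.FileHandler(log_file, encoding="utf-8")
    else:
        # No file requested, so send log records to standard output.
        handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    )
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    # Returning the handler lets the caller report where logs are going.
    return handler
```

Calling `configure_logging_sketch(Path("remarx.log"))` would log to that file (the filename here is only an example), while calling it with no arguments logs to stdout. Returning the handler is one simple way an application could report the log file location to the user, as the changelog describes.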

DEVELOPERNOTES.md

Lines changed: 2 additions & 2 deletions
@@ -53,9 +53,9 @@ changes should not be included in the changelog unless it is substantial enough
 - Via GitHub web interface: Go to your PR → Labels section in right sidebar → Click gear icon → Type "no changelog" and select it
 - Via GitHub CLI: Run `gh pr edit --add-label "no changelog"`

-1. The changelog check will automatically re-run and pass when the label is applied
+2. The changelog check will automatically re-run and pass when the label is applied

-1. Remove the label if you later decide the PR does need a changelog entry
+3. Remove the label if you later decide the PR does need a changelog entry

 ## Documentation

README.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ The app will not close automatically when you close the browser window or tab.
 To close the app:

 1. Type control+c within the terminal where the `remarx-app` command was run
-1. Then, when prompted, type `y` followed by enter.
+2. Then, when prompted, type `y` followed by enter.

 ## Documentation
docs/api/index.md

Lines changed: 9 additions & 1 deletion
@@ -5,9 +5,11 @@ hide: [navigation]

 # ::: remarx

+## ::: remarx.utils
+
 ## ::: remarx.app

-## ::: remarx.app_utils
+## ::: remarx.app.utils

 ## ::: remarx.sentence

@@ -26,3 +28,9 @@ hide: [navigation]
 #### ::: remarx.sentence.corpus.text_input
     options:
       members: false
+
+## ::: remarx.quotation
+
+### ::: remarx.quotation.embeddings
+
+### ::: remarx.quotation.pairs
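
The `remarx.quotation.embeddings` and `remarx.quotation.pairs` modules added to the API index are not shown in this part of the diff; the changelog only says they generate sentence embeddings and identify likely quote sentence pairs. Purely as an illustration of the general idea, here is a minimal sketch that pairs each reuse sentence with its most similar original sentence. The use of cosine similarity, the `0.8` threshold, and the names `cosine` and `likely_pairs` are all assumptions; the source does not confirm how remarx computes or filters pairs, and the toy two-dimensional vectors stand in for real embeddings.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def likely_pairs(original: dict, reuse: dict, threshold: float = 0.8) -> list:
    """Pair each reuse sentence with its closest original sentence.

    Both arguments map a sentence id to an embedding vector. A pair is
    kept only if its similarity reaches the (assumed) threshold.
    """
    if not original:
        return []
    pairs = []
    for reuse_id, reuse_vec in reuse.items():
        # Find the original sentence with the highest similarity score.
        best_id, best_score = max(
            ((orig_id, cosine(orig_vec, reuse_vec)) for orig_id, orig_vec in original.items()),
            key=lambda item: item[1],
        )
        if best_score >= threshold:
            pairs.append((reuse_id, best_id, best_score))
    return pairs


# Toy embeddings (hypothetical ids and values, for illustration only).
original = {"o1": [1.0, 0.0], "o2": [0.0, 1.0]}
reuse = {"r1": [0.9, 0.1], "r2": [0.5, 0.5]}
pairs = likely_pairs(original, reuse)
```

In this toy data, `r1` points almost exactly along `o1` and is kept as a pair, while `r2` sits midway between both originals and falls below the threshold.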
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+# System Architecture Diagram
+
+```mermaid
+
+%%{init: {
+"theme": "base",
+"themeVariables": { "fontFamily": "Inter, ui-sans-serif, system-ui" },
+"flowchart": { "curve": "basis", "nodeSpacing": 40, "rankSpacing": 60 }
+}}%%
+flowchart LR
+classDef source fill:#f3f4f6,stroke:#475569,stroke-width:1px,rx:6,ry:6;
+classDef builder fill:#00d5e0,stroke:#0891b2,stroke-width:2px,stroke-dasharray:4 3,rx:14,ry:14,color:#00323a;
+classDef process fill:#e6eefc,stroke:#6b8bd6,stroke-width:1.5px,rx:12,ry:12;
+classDef corpus fill:#ffffff,stroke:#64748b,stroke-width:1px,rx:8,ry:8;
+classDef output fill:#ffffff,stroke:#111827,stroke-width:1.5px,rx:8,ry:8;
+classDef invis fill:transparent,stroke:transparent;
+
+
+B["Communist<br/>Manifesto<br/>(HTML)"]:::source --> E
+C["DNZ<br/>Texts<br/>(XML)"]:::source --> E
+A["MEGAdigital<br/>Texts<br/>(XML)"]:::source --> E
+D["Other<br/>Plaintext<br/>(TXT)"]:::source -.-> E
+
+E["sentence<br/>corpus<br/>builder"]:::builder
+E --> F["Original<br/>Sentences<br/>(CSV)"]:::corpus
+E --> G["Reuse<br/>Sentences<br/>(CSV)"]:::corpus
+
+F -.-> K
+
+subgraph CP["core pipeline"]
+direction LR
+I["Sentence-Level<br/>Quote Detection"]:::process
+J["Quote<br/>Sentence Pairs<br/>(CSV)"]:::corpus
+K["Quote<br/>Compilation"]:::process
+L["Quotes<br/>Corpus<br/>(CSV)"]:::output
+I --> J --> K --> L
+end
+style CP fill:#c8f7ff,stroke:#0891b2,stroke-width:2px,rx:18,ry:18;
+
+F --> I
+G --> I
+
+G -.-> K
+
+```

docs/images/system-architecture.svg

Lines changed: 1 addition & 0 deletions