Support CoNLL-U Plus #26

@BramVanroy

As requested in #24.

It would be neat to support CoNLL-U Plus:

  • export only the requested fields (and mark the output with a # global.columns comment line that lists them)
  • allow reading in a CoNLL-U Plus file
  • the format also allows custom columns, but I am hesitant to support those. One option: if a custom field is registered as a spaCy custom extension attribute (under ._.), we could use that attribute as the destination. I will have to think about it some more.
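
To make the export idea concrete, here is a minimal sketch in plain Python. It is not this library's API: the function name to_conllu_plus and the token-as-dict input are assumptions made purely for illustration. It writes the # global.columns line first, then only the requested columns, filling missing values with an underscore.

```python
def to_conllu_plus(sentences, columns):
    """Serialize sentences to a CoNLL-U Plus string (hypothetical helper).

    sentences: list of sentences, each a list of token dicts keyed by
               column name (an assumed input shape, for illustration only).
    columns:   the column names to export, in output order.
    """
    # The first line of a CoNLL-U Plus file declares the columns, space-separated.
    lines = ["# global.columns = " + " ".join(columns)]
    for tokens in sentences:
        for token in tokens:
            # Missing values are written as "_"; fields are tab-separated.
            lines.append("\t".join(str(token.get(col, "_")) for col in columns))
        lines.append("")  # a blank line terminates each sentence
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sentence = [
        {"ID": 1, "FORM": "Slovenská", "LEMMA": "slovenský", "UPOS": "ADJ"},
        {"ID": 2, "FORM": "ústava", "LEMMA": "ústava", "UPOS": "NOUN"},
    ]
    print(to_conllu_plus([sentence], ["ID", "FORM", "UPOS"]), end="")
```

In a real implementation the token values would come from the spaCy Doc rather than from dicts; the sketch only shows the header and column selection.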

Here is an example of a CoNLL-U Plus file. Note how the first line, # global.columns, lists the columns that are present, separated by spaces.

# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
# newdoc id = mf920901-001
# newpar id = mf920901-001-p1
# sent_id = mf920901-001-p1s1A
# text = Slovenská ústava: pro i proti
# text_en = Slovak constitution: pros and cons
1   Slovenská   slovenský   ADJ     AAFS1----1A---- Case=Nom|Degree=Pos|Gender=Fem|Number=Sing|Polarity=Pos 2 amod _ _
2   ústava      ústava      NOUN    NNFS1-----A---- Case=Nom|Gender=Fem|Number=Sing|Polarity=Pos 0 root _ SpaceAfter=No
3   :           :           PUNCT   Z:------------- _          2       punct   _       _
4   pro         pro         ADP     RR--4---------- Case=Acc   2       appos   _       LId=pro-1
5   i           i           CCONJ   J^------------- _          6       cc      _       LId=i-1
6   proti       proti       ADP     RR--3---------- Case=Dat   4       conj    _       LId=proti-1
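
Reading such a file could look roughly like the sketch below. Again, this is only an illustration under stated assumptions: read_conllu_plus is a hypothetical name, token fields are assumed to be tab-separated (the example above is shown with aligned spaces for readability), and falling back to the ten standard CoNLL-U columns when no # global.columns line is found is my own choice.

```python
STANDARD_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                    "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]


def read_conllu_plus(text):
    """Parse CoNLL-U Plus text into (columns, sentences) (hypothetical helper).

    Each sentence is a list of dicts that map column names to string values.
    """
    columns = list(STANDARD_COLUMNS)  # assumed fallback for plain CoNLL-U
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append(tokens)
                tokens = []
        elif line.startswith("# global.columns"):
            # Column names follow the "=" and are separated by spaces.
            columns = line.split("=", 1)[1].split()
        elif line.startswith("#"):
            continue  # other comments (sent_id, text, ...) are skipped here
        else:
            values = line.split("\t")
            tokens.append(dict(zip(columns, values)))
    if tokens:  # flush a final sentence that lacks a trailing blank line
        sentences.append(tokens)
    return columns, sentences


if __name__ == "__main__":
    sample = (
        "# global.columns = ID FORM UPOS\n"
        "# sent_id = mf920901-001-p1s1A\n"
        "4\tpro\tADP\n"
        "5\ti\tCCONJ\n"
        "6\tproti\tADP\n"
        "\n"
    )
    cols, sents = read_conllu_plus(sample)
    print(cols)
    print(sents[0][0])
```

The sketch drops sentence-level comments for brevity; keeping them (for example as a dict per sentence) would be needed to round-trip a file.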

If you want to see this implemented, please give this post a thumbs up so that I know what to prioritize.

Labels: enhancement (New feature or request)
