find-fix-paths does not uniq result #686

@TobiasNx

inputFile
| open-file
| as-lines
| decode-pica
| find-fix-paths(".*DNB.*")
| print
;

This prints:

007G.a	|	DNB
007G.a	|	DNB
047I.d	|	DNB

The 007G.a line appears twice (see Playground). I would prefer the results to be unique.

The expected output would then be:

007G.a	|	DNB
047I.d	|	DNB

The Flux command find-fix-paths is based on:

| flatten
| stream-to-triples
| filter-triples(objectPattern=".*DNB.*")
| template("${p} | ${o}")

This can produce duplicate results if the pattern matches the same element, with the same fix path, in multiple records.
Therefore I suggest adding a sort-triples and a filter-duplicate-objects step:

| stream-to-triples
| filter-triples(objectPattern=".*DNB.*")
| sort-triples(by="Predicate")
| template("${p} | ${o}")
| filter-duplicate-objects

See Playground for the expected result.
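
To illustrate why the duplicates arise and what the proposed change would do, here is a rough Python sketch that imitates the two pipelines outside of Metafacture. It is not the Flux implementation: the sample records are made up, the helper names matching_triples and render are hypothetical, and treating objectPattern as a full-string regex match is an assumption.

```python
import re

# Made-up, already flattened records: each record is a list of (path, value) pairs.
records = [
    [("003@.0", "123"), ("007G.a", "DNB")],
    [("007G.a", "DNB"), ("047I.d", "DNB")],
]


def matching_triples(records, pattern):
    """Imitate filter-triples: keep (path, value) pairs whose value matches."""
    regex = re.compile(pattern)
    return [
        (path, value)
        for record in records
        for path, value in record
        if regex.fullmatch(value)  # assumption: the pattern must match the whole value
    ]


def render(triples, unique=False):
    """Imitate template("${p} | ${o}"), optionally sorting and dropping duplicates."""
    if unique:
        # Stand-in for sort-triples plus filter-duplicate-objects.
        triples = sorted(set(triples))
    return [f"{path} | {value}" for path, value in triples]


triples = matching_triples(records, ".*DNB.*")

# Current behaviour: one line per matching element per record, so 007G.a repeats.
print(render(triples))
# ['007G.a | DNB', '007G.a | DNB', '047I.d | DNB']

# Proposed behaviour: each path/value combination is reported once.
print(render(triples, unique=True))
# ['007G.a | DNB', '047I.d | DNB']
```

The sketch deduplicates on the (path, value) pair, so the same value found under two different paths is still reported once per path.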
