Make polis CSV exports easily extensible #81

Re-ticketed from https://github.com/polis-community/red-dwarf/pull/78#issuecomment-3036985621

Am I summarizing in a way that does this justice? (Feel free to edit my text if you have repo permissions! 🙏 🙂)

## Discussion Summary

<details>
<summary>@nicobao [original comment]</summary>

> Another thought, Agora provides AI summaries and AI labels generated from an LLM for each cluster. Meaning, instead of "A/B/C" the clusters have actual names, such as here: https://agoracitizen.network/feed/conversation/29LRvQ (click on Analysis and Opinion Groups)
> 
> Would it be possible to support an additional param to export these AI labels in the file that exports the cluster information? It should be done in a way that is backward compatible with the existing exports (so adding the fields at the end in a column/row that's not being used in an array for example)
>
> In order not to be too specific to Agora, it would be nice to be able to add any key:value that complements information related to the clusters.
</details>

> @nicobao [paraphrased]: in the export tables that contain cluster information (the numeric `group-id` column of the `participant-votes.csv` table), our implementation has a complementary AI-generated string label (`group-labels`) for each `group-id`. We'd like to be able to extend the exported CSVs with extra columns like this, where key:value pairs give a one-to-one mapping from each `group-id` to its complementary information.

---

<details>
<summary>@patcon [original comment]</summary>

> > Would it be possible to support an additional param to export these AI labels in the file that export the cluster information?
> 
> @nicobao do you mean like a 1:1 match of integer label to AI-assigned named labels? I wonder if this best fits in consumer code, as regular CSV manipulation? python ecosystem is very good at csv's even without extra packages, since `csv` package is part of standard lib.
> 
> Here's the `pandas` version: https://gist.github.com/patcon/baf04d76b688ae30b7532e2cda71822f
> 
> Or the `csv` version: https://gist.github.com/patcon/e244968be5782575c1a70ae8f71a8179
> 
> This doesn't feel to me like it belongs in codepaths of the library that exports the polis format 🤔 Do you feel differently?
> 
> EDIT: Or alternatively, what if we added helpful functions to support this? Or if we documented these sorts of post-processing steps in example notebooks?
</details>

> @patcon [summary]: this feels like regular CSV manipulation, i.e. consumer post-processing. Could we instead share example code, or include helper functions, to support manipulating the exported CSVs? [`pandas` example](https://gist.github.com/patcon/baf04d76b688ae30b7532e2cda71822f) or [`csv` example](https://gist.github.com/patcon/e244968be5782575c1a70ae8f71a8179)
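
To illustrate the consumer-side post-processing patcon describes (this is not the code from the linked gists), here is a minimal sketch using only the standard-library `csv` module. It assumes the `group-id` column name from `participant-votes.csv` mentioned above, string cluster keys like `"0"`, and that unclustered participants have an empty `group-id`; the function name `append_group_labels` and the `group-label` column name are hypothetical. The label column is appended last so every existing column keeps its position.

```python
import csv
import io
from typing import Mapping, TextIO


def append_group_labels(
    src: TextIO,
    dst: TextIO,
    labels: Mapping[str, str],
    group_col: str = "group-id",
    label_col: str = "group-label",  # hypothetical column name
) -> None:
    """Copy CSV rows from src to dst, appending a label looked up by group id."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or [])
    if group_col not in fieldnames:
        raise ValueError(f"input CSV has no {group_col!r} column")
    if label_col in fieldnames:
        raise ValueError(f"input CSV already has a {label_col!r} column")

    # Append the new column last so every existing column keeps its position.
    writer = csv.DictWriter(dst, fieldnames=fieldnames + [label_col], lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Assumed: unclustered participants have an empty group id, so they get an empty label.
        row[label_col] = labels.get(row[group_col] or "", "")
        writer.writerow(row)


if __name__ == "__main__":
    # Labels keyed by the string cluster ids used in the discussion ("0", "1", ...).
    labels = {"0": "Maximalist", "1": "Green Space Advocates", "2": "Practical Access Supporters"}
    src = io.StringIO("participant,group-id,n-votes\n0,1,5\n1,,2\n")
    dst = io.StringIO()
    append_group_labels(src, dst, labels)
    print(dst.getvalue(), end="")
```

One caveat: assuming the usual polis layout, the trailing columns of `participant-votes.csv` are per-comment vote columns, so a consumer that reads every column after the fixed ones as a comment ID could misread an appended label column. That is the kind of compatibility risk nicobao raises in the reply.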

---

<details>
<summary>@nicobao [original comment]</summary>

> Yes, it's a 1:1 match. There is also an AI summary for each cluster (key), and a global AI summary for the whole conversation:
> 
> ```json
> {
>   "metadata": {
>     "aiSummary": "People discussed city park plans—some wanted more green space, others more parking. Consensus grew around adding trees without cutting access."
>   },
>   "clusters": {
>     "0": {
>       "aiLabel": "Maximalist",
>       "aiSummary": "Maximalist embraces excess, complexity, and bold expression—favoring richness over simplicity in art, design, or lifestyle."
>     },
>     "1": {
>       "aiLabel": "Green Space Advocates",
>       "aiSummary": "Participants supported expanding green areas to enhance community well-being, biodiversity, and environmental sustainability."
>     },
>     "2": {
>       "aiLabel": "Practical Access Supporters",
>       "aiSummary": "This group prioritized parking and accessibility, emphasizing the need for practical infrastructure alongside beautification efforts."
>     }
>   }
> }
> ```
> 
> > I wonder if this best fits in consumer code, as regular CSV manipulation? python ecosystem is very good at csv's even without extra packages, since `csv` package is part of standard lib.
> > Here's the `pandas` version: https://gist.github.com/patcon/baf04d76b688ae30b7532e2cda71822f
> > Or the `csv` version: https://gist.github.com/patcon/e244968be5782575c1a70ae8f71a8179
> > This doesn't feel to me like it belongs in codepaths of the library that exports the polis format 🤔 Do you feel differently?
> > EDIT: Or alternatively, what if we added helpful functions to support this? Or if we documented these sorts of post-processing steps in example notebooks?
> 
> Generally I would have agreed that library consumers should do their post-processing later, but not here. Why? Because it's far from trivial to know exactly WHERE to store this data WITHOUT breaking backward compatibility for tools that consume pol.is reports, which is one of the core goals of this repo (reproducing pol.is but opening up for new explorative work).
> 
> That's why I suggest an easy way, built-in the library, for library consumers to add these custom data.
> 
> Like I said we don't need nor should we do anything Agora-specific.
> 
> The export function could take these optional params:
> 
> * any key:value associated uniquely with a given cluster key ("0", "1", "2"...etc) so it will write it in an appropriate section of the report _without breaking backward compatibility_ and in a way that will be _standard for everyone using reddwarf_.
> * any key:value associated more globally with the whole conversation _without breaking backward compatibility_ in a way that will be _standard for everyone using reddwarf_.
> 
> This suggestion seems perfectly in line with the following [Goal](https://github.com/polis-community/red-dwarf?tab=readme-ov-file#goals) of red-dwarf in the README: _"Re-usable. It should be easily used in contexts in which its original creators did not anticipate, nor perhaps even desire."_
> 
> as well as the following from `polis-community`'s [Vision](https://github.com/polis-community#vision): _"Plurality over uniformity: Welcome diverse visions and use cases rather than enforcing a one-size-fits-all approach."_
</details>

> @nicobao [paraphrased]: it's unclear to library consumers where in the export this information should be stored. Adding it manually makes it easy to break CSV compatibility for other consumers, which goes against a core goal of the project. I'd like the library itself to support adding custom data columns: a standard, easy way to add 1:1 mappings for any entity ID (`group-id`s, conversation IDs, etc.).
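
A minimal sketch of the two optional parameters nicobao proposes (per-cluster key:value and conversation-wide key:value), assuming they arrive as one object shaped like the JSON example above. `ExportExtras`, `from_json`, `cluster_columns` and `cluster_row` are hypothetical names, not part of red-dwarf. The sketch only normalizes the input so every cluster yields the same columns in the same order; where those columns land in the polis export is the open question of this issue and is not decided here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ExportExtras:
    """Hypothetical optional export parameters (not an existing red-dwarf API).

    cluster_fields: cluster key ("0", "1", ...) -> {field name: value}
    conversation_fields: {field name: value} for the whole conversation
    """

    cluster_fields: dict[str, dict[str, str]] = field(default_factory=dict)
    conversation_fields: dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_json(cls, text: str) -> "ExportExtras":
        # Parses the shape shown in the discussion: {"metadata": {...}, "clusters": {...}}.
        data = json.loads(text)
        return cls(
            cluster_fields={str(k): dict(v) for k, v in data.get("clusters", {}).items()},
            conversation_fields=dict(data.get("metadata", {})),
        )

    def cluster_columns(self) -> list[str]:
        # Union of field names across all clusters, in first-seen order.
        seen: dict[str, None] = {}
        for fields in self.cluster_fields.values():
            for name in fields:
                seen.setdefault(name, None)
        return list(seen)

    def cluster_row(self, cluster_key: str) -> list[str]:
        # One cluster's values in cluster_columns() order; missing fields become "".
        fields = self.cluster_fields.get(cluster_key, {})
        return [fields.get(name, "") for name in self.cluster_columns()]


if __name__ == "__main__":
    extras = ExportExtras.from_json(
        '{"metadata": {"aiSummary": "People discussed city park plans."},'
        ' "clusters": {"0": {"aiLabel": "Maximalist"},'
        ' "1": {"aiLabel": "Green Space Advocates", "aiSummary": "Supported expanding green areas."}}}'
    )
    print(extras.cluster_columns())
    for key in sorted(extras.cluster_fields, key=int):
        print(key, extras.cluster_row(key))
```

Taking the union of field names means a cluster that lacks, say, `aiSummary` still gets an empty cell instead of shifting later columns, which is one way to keep the layout identical for every consumer.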
