Make polis CSV exports easily extensible #81

@patcon

Re-ticketed from #78 (comment)

Am I summarizing in a way that does this justice? (feel free to edit my text if you have that repo permission! 🙏 🙂 )

Discussion Summary

@nicobao [original comment]

Another thought: Agora provides AI summaries and AI labels generated by an LLM for each cluster. This means that instead of "A/B/C", the clusters have actual names, as shown here: https://agoracitizen.network/feed/conversation/29LRvQ (click on Analysis and Opinion Groups)

Would it be possible to support an additional param to export these AI labels in the file that exports the cluster information? It should be done in a way that is backward compatible with the existing exports (for example, by adding the fields at the end, in a column/row that isn't already being used).

In order not to be too specific to Agora, it would be nice to be able to add any key:value that complements the information related to the clusters.

@nicobao [paraphrased]: In the export tables that contain cluster information (the numeric group-id column of the participant-votes.csv table), our implementation has complementary AI-generated string group-labels for each group-id. We'd like to be able to add more columns like this to the exported CSVs, where each key:value gives a one-to-one mapping from a group-id to a piece of complementary information.


@patcon [original comment]

> Would it be possible to support an additional param to export these AI labels in the file that export the cluster information?

@nicobao do you mean like a 1:1 match of integer label to AI-assigned named labels? I wonder if this best fits in consumer code, as regular CSV manipulation? python ecosystem is very good at csv's even without extra packages, since csv package is part of standard lib.

Here's the pandas version: https://gist.github.com/patcon/baf04d76b688ae30b7532e2cda71822f

Or the csv version: https://gist.github.com/patcon/e244968be5782575c1a70ae8f71a8179

This doesn't feel to me like it belongs in codepaths of the library that exports the polis format 🤔 Do you feel differently?

EDIT: Or alternatively, what if we added helpful functions to support this? Or if we documented these sorts of post-processing steps in example notebooks?

@patcon [summary]: This feels like regular CSV manipulation, which I'd consider consumer post-processing. Could we instead share example code or include helper functions to support manipulating the exported CSVs? See the pandas and csv examples linked above.
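
For readers who can't open the gists, here is a minimal sketch of the consumer-side approach using only the standard-library csv module. It is not the code from either gist. The `group-id` column comes from the discussion above; the `participant` column, the `group-label` column name, and the `append_group_labels` helper are assumptions made for illustration.

```python
import csv
import io


def append_group_labels(src, dst, labels, id_column="group-id", label_column="group-label"):
    """Copy a CSV from src to dst, appending one label column at the end.

    `labels` maps group-id strings to label strings (hypothetical input).
    Existing columns keep their order, so positional readers are unaffected.
    """
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames or []) + [label_column]
    writer = csv.DictWriter(dst, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Rows with no group (empty group-id) get an empty label.
        row[label_column] = labels.get(row[id_column], "")
        writer.writerow(row)


# Tiny in-memory stand-in for an exported table (column layout is assumed).
sample = "participant,group-id\n0,1\n1,2\n2,\n"
out = io.StringIO()
append_group_labels(
    io.StringIO(sample),
    out,
    {"1": "Green Space Advocates", "2": "Practical Access Supporters"},
)
result = out.getvalue()
```

Appending the new column last is what keeps the existing columns in place for tools that read the table by position.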


@nicobao [original comment]

Yes, it's a 1:1 match. There is also an AI summary for each cluster (key), and a global AI summary for the whole conversation:

{
  "metadata": {
    "aiSummary": "People discussed city park plans—some wanted more green space, others more parking. Consensus grew around adding trees without cutting access."
  },
  "clusters": {
    "0": {
      "aiLabel": "Maximalist",
      "aiSummary": "Maximalist embraces excess, complexity, and bold expression—favoring richness over simplicity in art, design, or lifestyle."
    },
    "1": {
      "aiLabel": "Green Space Advocates",
      "aiSummary": "Participants supported expanding green areas to enhance community well-being, biodiversity, and environmental sustainability."
    },
    "2": {
      "aiLabel": "Practical Access Supporters",
      "aiSummary": "This group prioritized parking and accessibility, emphasizing the need for practical infrastructure alongside beautification efforts."
    }
  }
}
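
As a rough illustration of how a payload shaped like this could be turned into a per-cluster lookup, here is a short sketch. The `cluster_lookup` helper is hypothetical, the summaries are omitted from the sample for brevity, and converting the string keys to integers is an assumption based on the group-id column being numeric.

```python
import json


def cluster_lookup(payload, field):
    """Return {cluster_id_as_int: value} for one per-cluster field.

    Clusters that lack the field are skipped rather than raising.
    """
    clusters = payload.get("clusters", {})
    return {int(cid): info[field] for cid, info in clusters.items() if field in info}


# Same shape as the payload above, with the summaries left out for brevity.
raw = """
{
  "clusters": {
    "0": {"aiLabel": "Maximalist"},
    "1": {"aiLabel": "Green Space Advocates"},
    "2": {"aiLabel": "Practical Access Supporters"}
  }
}
"""
payload = json.loads(raw)
labels = cluster_lookup(payload, "aiLabel")
```

The same helper works for `aiSummary` or any other per-cluster key, which matches the request for a mapping that is not specific to Agora.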

> I wonder if this best fits in consumer code, as regular CSV manipulation? python ecosystem is very good at csv's even without extra packages, since csv package is part of standard lib.
> Here's the pandas version: https://gist.github.com/patcon/baf04d76b688ae30b7532e2cda71822f
> Or the csv version: https://gist.github.com/patcon/e244968be5782575c1a70ae8f71a8179
> This doesn't feel to me like it belongs in codepaths of the library that exports the polis format 🤔 Do you feel differently?
> EDIT: Or alternatively, what if we added helpful functions to support this? Or if we documented these sorts of post-processing steps in example notebooks?

Generally I would agree that library consumers should do their post-processing later, but not here. Why? Because it's far from trivial to know exactly WHERE to store this data WITHOUT breaking backward compatibility for tools that consume pol.is reports, which is one of the core goals of this repo (reproducing pol.is while opening it up to new exploratory work).

That's why I suggest an easy way, built into the library, for library consumers to add this custom data.

Like I said, we don't need, nor should we do, anything Agora-specific.

The export function could take these optional params:

  • any key:value associated uniquely with a given cluster key ("0", "1", "2", etc.), so that it is written to an appropriate section of the report without breaking backward compatibility and in a way that is standard for everyone using reddwarf.
  • any key:value associated more globally with the whole conversation, written without breaking backward compatibility and in a way that is standard for everyone using reddwarf.
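
One possible shape for the per-cluster half of this proposal is sketched below. This is not reddwarf's API: `export_with_extras`, its parameters, and the `participant` column are all hypothetical, and the sketch assumes that "backward compatible" means existing columns keep their names and positions.

```python
import csv
import io


def export_with_extras(rows, base_columns, cluster_extras=None, id_column="group-id"):
    """Write rows as CSV text, appending arbitrary per-cluster keys as trailing columns.

    `cluster_extras` maps a cluster key ("0", "1", ...) to a dict of key:value pairs.
    Hypothetical helper for illustration only.
    """
    cluster_extras = cluster_extras or {}

    # Collect extra keys in first-seen order so the column order is deterministic.
    extra_columns = []
    for extras in cluster_extras.values():
        for key in extras:
            if key in base_columns:
                # Refuse to shadow an existing column; that would break current readers.
                raise ValueError(f"extra key {key!r} collides with an existing column")
            if key not in extra_columns:
                extra_columns.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(base_columns) + extra_columns, lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        extras = cluster_extras.get(row.get(id_column), {})
        writer.writerow({**row, **{k: extras.get(k, "") for k in extra_columns}})
    return out.getvalue()


rows = [
    {"participant": "0", "group-id": "0"},
    {"participant": "1", "group-id": "1"},
]
csv_text = export_with_extras(
    rows,
    ["participant", "group-id"],
    {"0": {"aiLabel": "Maximalist"}, "1": {"aiLabel": "Green Space Advocates"}},
)
```

Conversation-level key:values are deliberately left out of this sketch, because where they would live in the export is exactly the open question raised here.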

This suggestion seems perfectly in line with the following Goal of red-dwarf in the README: "Re-usable. It should be easily used in contexts in which its original creators did not anticipate, nor perhaps even desire."

as well as the following from polis-community's Vision: "Plurality over uniformity: Welcome diverse visions and use cases rather than enforcing a one-size-fits-all approach."

@nicobao [paraphrased]: I feel it's unclear to library consumers where this information should be stored in the export. Adding it manually makes it easy to break CSV compatibility for other consumers, which goes against a core goal of the project. I'd like the library itself to offer an easy, standard way to add custom data columns, as 1:1 mappings for any entity ID (group-ids, conversation-ids, etc.).
