feat: parsing directly to a protobuf serialised file #122
Conversation
Reviewer's Guide

This PR integrates a direct protobuf serialization pipeline by defining new ingestion/query protocols, adding the protobuf schema and generated code, implementing a ProtobufFileIngestor that buffers and writes graph nodes/relationships to a .proto binary file, and updating configurations and tests to support the change.

Class diagram for ProtobufFileIngestor and Protocols

classDiagram
class ProtobufFileIngestor {
+output_path: Path
+_nodes: dict[str, pb.Node]
+_relationships: dict[tuple[str, int, str], pb.Relationship]
+LABEL_TO_ONEOF_FIELD: dict[str, str]
+ONEOF_FIELD_TO_LABEL: dict[str, str]
+__init__(output_path: str)
+ensure_node_batch(label: str, properties: dict[str, Any])
+ensure_relationship_batch(from_spec, rel_type, to_spec, properties)
+flush_all()
-_get_node_id(label: str, properties: dict)
}
class IngestorProtocol {
<<Protocol>>
+ensure_node_batch(label: str, properties: dict[str, Any])
+ensure_relationship_batch(from_spec, rel_type, to_spec, properties)
+flush_all()
}
class QueryProtocol {
<<Protocol>>
+fetch_all(query: str, params: dict[str, Any] | None)
+execute_write(query: str, params: dict[str, Any] | None)
}
ProtobufFileIngestor ..|> IngestorProtocol : implements
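As a quick orientation, here is a minimal usage sketch of the ingestor as described by the diagram above. This is not code from the PR: the module path and the (label, key_field, key_value) shape of the relationship specs are inferred from the review comments further down and may differ from the final implementation.

```python
# Hedged sketch of the expected call pattern; module path and spec-tuple shape are assumptions.
from codebase_rag.services.protobuf_service import ProtobufFileIngestor

ingestor = ProtobufFileIngestor("out/graph.bin")

# Buffer nodes; each call stores a pb.Node keyed by its primary-key property.
ingestor.ensure_node_batch("Class", {"qualified_name": "test_project.UserService"})
ingestor.ensure_node_batch(
    "Method", {"qualified_name": "test_project.UserService.get_user"}
)

# Buffer a relationship; the specs appear to be (label, key_field, key_value) tuples.
ingestor.ensure_relationship_batch(
    ("Class", "qualified_name", "test_project.UserService"),
    "DEFINES_METHOD",
    ("Method", "qualified_name", "test_project.UserService.get_user"),
    None,
)

# Serialize the buffered GraphCodeIndex to the output file in one shot.
ingestor.flush_all()
```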
File-Level Changes
Summary of Changes
Hello @jkumz, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a significant new feature that enables the direct serialization of code graph data into a Protobuf binary file. This change enhances data handling efficiency by allowing graph nodes and relationships to be processed and stored in a highly optimized format, suitable for rapid extraction and transmission. It bypasses the traditional graph construction phase, streamlining the overall data pipeline. The update includes the definition of a new Protobuf schema, the implementation of a dedicated ingestor class, and the necessary adjustments to development tooling and dependencies to support this new serialization method.
Highlights
- New Protobuf Ingestor: This PR introduces a new `ProtobufFileIngestor` class, which allows for the direct parsing and serialization of graph nodes and relationships into a compact Protobuf binary file. This approach aims to provide a more efficient method for data extraction and transmission, bypassing the need to construct an in-memory graph first.
- Protobuf Schema Definition: A comprehensive Protobuf schema (`codec/schema.proto`) has been defined. This schema specifies the structure for `GraphCodeIndex` (the top-level container), generic `Node` and `Relationship` messages, and detailed messages for various code entities such as `Project`, `Package`, `Folder`, `File`, `Module`, `Class`, `Function`, `Method`, `ExternalPackage`, `ModuleImplementation`, and `ModuleInterface`.
- Pre-commit Hook Updates: The pre-commit configuration has been updated to accommodate the new Protobuf-generated files. Specifically, `ruff` and `ruff-format` now exclude `codec/schema_pb2.py` and `codec/schema_pb2.pyi` from linting, and `mypy` has been configured to include `types-protobuf` for proper type checking of Protobuf messages.
- Comprehensive Test Coverage: Two new test files (`test_protobuf_end_to_end.py` and `test_protobuf_service.py`) have been added. These tests validate the integrity of the Protobuf serialization pipeline, ensuring that the generated binary files are valid, non-empty, and can be correctly deserialized back into the expected data structures (a minimal read-back sketch follows this list).
- Dependency Management: The project's dependencies (`pyproject.toml` and `uv.lock`) have been updated to include `protobuf` for Protobuf serialization/deserialization and `types-protobuf` for type hinting support, ensuring all necessary libraries are available for the new functionality.
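For orientation, here is a minimal read-back sketch of the serialized output. It is not taken from the PR's tests: the import path `codec.schema_pb2` and the container field names (`nodes`, `relationships`) are assumptions based on the schema description above.

```python
# Hedged sketch: reading a serialized GraphCodeIndex back into memory.
# Assumes generated bindings at codec/schema_pb2.py and top-level repeated fields
# named `nodes` and `relationships` (adjust to the actual schema).
from codec import schema_pb2 as pb

index = pb.GraphCodeIndex()
with open("out/graph.bin", "rb") as f:
    index.ParseFromString(f.read())

print(f"nodes={len(index.nodes)} relationships={len(index.relationships)}")
```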
Hey there - I've reviewed your changes - here's some feedback:
- Rather than committing generated schema_pb2.py and .pyi stubs, consider automating protobuf compilation in CI to avoid merge conflicts and reduce noise.
- In ensure_node_batch, when node_id is empty you're silently skipping creation—consider logging a warning to surface missing primary key data.
- Add error handling around flush_all's file write to catch and log any I/O failures during serialization.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Rather than committing generated schema_pb2.py and .pyi stubs, consider automating protobuf compilation in CI to avoid merge conflicts and reduce noise.
- In ensure_node_batch, when node_id is empty you're silently skipping creation—consider logging a warning to surface missing primary key data.
- Add error handling around flush_all's file write to catch and log any I/O failures during serialization.
## Individual Comments
### Comment 1
<location> `codebase_rag/services/protobuf_service.py:74` </location>
<code_context>
+ if value is None:
+ continue
+ destination_attribute = getattr(payload_message, key)
+ if hasattr(destination_attribute, "extend") and isinstance(value, list):
+ # This is a repeated field. Use .extend() to populate it.
+ destination_attribute.extend(value)
</code_context>
<issue_to_address>
Check for tuple values in repeated fields.
Use `isinstance(value, (list, tuple))` to ensure tuples are also handled when extending repeated fields.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
if hasattr(destination_attribute, "extend") and isinstance(value, list):
    # This is a repeated field. Use .extend() to populate it.
    destination_attribute.extend(value)
=======
if hasattr(destination_attribute, "extend") and isinstance(value, (list, tuple)):
    # This is a repeated field. Use .extend() to populate it.
    destination_attribute.extend(value)
>>>>>>> REPLACE
</suggested_fix>
### Comment 2
<location> `codebase_rag/services/protobuf_service.py:113` </location>
<code_context>
+ )
+ rel.type = pb.Relationship.RelationshipType.RELATIONSHIP_TYPE_UNSPECIFIED
+
+ rel.source_id = str(from_spec[2])
+ rel.target_id = str(to_spec[2])
+
+ if rel.source_id.strip() == "" or rel.target_id.strip() == "":
</code_context>
<issue_to_address>
Consider validating the structure of from_spec and to_spec.
Currently, accessing from_spec[2] and to_spec[2] without validation may cause IndexError if the tuples are shorter than expected. Please add a check to ensure both have at least three elements before accessing.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
rel.source_id = str(from_spec[2])
rel.target_id = str(to_spec[2])
=======
if len(from_spec) < 3 or len(to_spec) < 3:
    logger.warning(
        f"Invalid relationship spec: from_spec={from_spec}, to_spec={to_spec}. Both must have at least 3 elements."
    )
    return
rel.source_id = str(from_spec[2])
rel.target_id = str(to_spec[2])
>>>>>>> REPLACE
</suggested_fix>
### Comment 3
<location> `codebase_rag/services/protobuf_service.py:122` </location>
<code_context>
+ )
+ return
+
+ if properties:
+ rel.properties.update(properties)
+
+ unique_key = (rel.source_id, rel.type, rel.target_id)
</code_context>
<issue_to_address>
Consider handling non-dict property values gracefully.
Validate that properties is a dict before calling update, or clarify the expected type in the documentation.
Suggested implementation:
```python
if properties:
    if isinstance(properties, dict):
        rel.properties.update(properties)
    else:
        logger.warning(
            f"Expected 'properties' to be a dict, got {type(properties).__name__}. Skipping properties update."
        )
```
```python
if properties:
    if isinstance(properties, dict):
        existing_rel.properties.update(properties)
    else:
        logger.warning(
            f"Expected 'properties' to be a dict, got {type(properties).__name__}. Skipping properties update."
        )
```
</issue_to_address>
### Comment 4
<location> `codebase_rag/services/protobuf_service.py:143` </location>
<code_context>
+
+ serialized_data = index.SerializeToString()
+
+ self.output_path.parent.mkdir(parents=True, exist_ok=True)
+ with open(self.output_path, "wb") as f:
+ f.write(serialized_data)
</code_context>
<issue_to_address>
Consider handling file write errors explicitly.
Wrap the file write operation in a try/except block to handle permission or disk errors, and provide a clear error message or log.
</issue_to_address>
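One possible shape for that guard is sketched below. This is an illustrative sketch only, not the PR's code: it reuses the `index`, `self.output_path`, and `logger` names from the quoted context, and the exact error message is an assumption.

```python
# Hedged sketch of an explicit error path around the final write in flush_all().
serialized_data = index.SerializeToString()
try:
    self.output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(self.output_path, "wb") as f:
        f.write(serialized_data)
except OSError as e:
    # Permission problems, missing volumes, and full disks all surface as OSError.
    logger.error(f"Failed to write protobuf index to {self.output_path}: {e}")
    raise
```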
### Comment 5
<location> `codebase_rag/tests/test_protobuf_service.py:47` </location>
<code_context>
+def test_protobuf_ingestor_serialization_and_deserialization(tmp_path: Path) -> None:
</code_context>
<issue_to_address>
Missing tests for error and edge cases in ProtobufFileIngestor.
Please add tests for scenarios like unknown labels, missing properties, duplicates, unknown relationship types, missing source/target IDs, and property merging to improve robustness and error handling.
Suggested implementation:
```python
def test_protobuf_ingestor_serialization_and_deserialization(tmp_path: Path) -> None:
    """
    Tests that the ProtobufFileIngestor correctly serializes data and that
    the output can be deserialized back to the original structure.
    """
    output_file = tmp_path / "test_graph.proto.bin"
    ingestor = ProtobufFileIngestor(str(output_file))
    for node_data in SAMPLE_NODES.values():
        ingestor.ensure_node_batch(
            str(node_data["label"]), cast(dict[str, Any], node_data["properties"])
        )


def test_protobuf_ingestor_error_and_edge_cases(tmp_path: Path) -> None:
    """
    Tests error and edge cases for ProtobufFileIngestor:
    - unknown labels
    - missing properties
    - duplicate nodes
    - unknown relationship types
    - missing source/target IDs
    - property merging
    """
    output_file = tmp_path / "edge_cases_graph.proto.bin"
    ingestor = ProtobufFileIngestor(str(output_file))

    # Unknown label
    with pytest.raises(ValueError):
        ingestor.ensure_node_batch("UnknownLabel", {"qualified_name": "foo"})

    # Missing properties
    with pytest.raises(KeyError):
        ingestor.ensure_node_batch("Class", {})  # Assuming 'qualified_name' is required

    # Duplicate nodes (should merge or raise, depending on implementation)
    node_props = {"qualified_name": "test_project.UserService"}
    ingestor.ensure_node_batch("Class", node_props)
    # Try to add duplicate
    ingestor.ensure_node_batch("Class", node_props)  # Should not raise, but should not duplicate

    # Unknown relationship type
    with pytest.raises(ValueError):
        ingestor.ensure_relationship_batch(
            ("Class", "qualified_name", "test_project.UserService"),
            "UNKNOWN_REL_TYPE",
            ("Method", "qualified_name", "test_project.UserService.get_user"),
            None,
        )

    # Missing source/target IDs
    with pytest.raises(KeyError):
        ingestor.ensure_relationship_batch(
            ("Class", "qualified_name", None),
            "DEFINES_METHOD",
            ("Method", "qualified_name", "test_project.UserService.get_user"),
            None,
        )
    with pytest.raises(KeyError):
        ingestor.ensure_relationship_batch(
            ("Class", "qualified_name", "test_project.UserService"),
            "DEFINES_METHOD",
            ("Method", "qualified_name", None),
            None,
        )

    # Property merging: add node with new property, ensure merge
    ingestor.ensure_node_batch(
        "Class", {"qualified_name": "test_project.UserService", "docstring": "A docstring"}
    )
    # Should merge 'docstring' into existing node
    node = ingestor._nodes["Class"]["test_project.UserService"]
    assert "docstring" in node
    assert node["docstring"] == "A docstring"
```
- Ensure that `pytest` is imported at the top of the file if not already present:
`import pytest`
- Adjust exception types (`ValueError`, `KeyError`) to match the actual implementation of `ProtobufFileIngestor` if they differ.
- If `ProtobufFileIngestor` does not raise on duplicates but merges, update the duplicate node test to assert correct merging instead of expecting an exception.
- If the internal structure (`_nodes`) is different, update the property merging assertion accordingly.
</issue_to_address>
if hasattr(destination_attribute, "extend") and isinstance(value, list):
    # This is a repeated field. Use .extend() to populate it.
    destination_attribute.extend(value)
suggestion: Check for tuple values in repeated fields. Use `isinstance(value, (list, tuple))` to ensure tuples are also handled when extending repeated fields.
Suggested change:
-    if hasattr(destination_attribute, "extend") and isinstance(value, list):
+    if hasattr(destination_attribute, "extend") and isinstance(value, (list, tuple)):
         # This is a repeated field. Use .extend() to populate it.
         destination_attribute.extend(value)
@jkumz I am not sure if this makes sense but please double check.
if properties:
    rel.properties.update(properties)
suggestion: Consider handling non-dict property values gracefully.
Validate that properties is a dict before calling update, or clarify the expected type in the documentation.
Suggested implementation:

if properties:
    if isinstance(properties, dict):
        rel.properties.update(properties)
    else:
        logger.warning(
            f"Expected 'properties' to be a dict, got {type(properties).__name__}. Skipping properties update."
        )

if properties:
    if isinstance(properties, dict):
        existing_rel.properties.update(properties)
    else:
        logger.warning(
            f"Expected 'properties' to be a dict, got {type(properties).__name__}. Skipping properties update."
        )
@jkumz address this as well, please.
Code Review
This pull request introduces a significant new feature for ingesting codebase analysis results into a Protobuf serialized file. The implementation is well-structured, with clear separation of concerns, new protocols for ingestors, and comprehensive tests. The Protobuf schema is well-designed.
My review focuses on improving the robustness and maintainability of the new `ProtobufFileIngestor`. I've identified a critical bug related to populating `Struct` properties that would cause a runtime error. I've also suggested refactoring the property population logic to use standard library functions for better robustness and simpler code. Additionally, there are a couple of smaller suggestions to improve type safety and schema clarity.
Overall, this is a great addition. Addressing these points will make the new ingestion pipeline more reliable and easier to maintain.
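For context on the Struct issue flagged above: protobuf composite (message-typed) fields, including `google.protobuf.Struct`, cannot be replaced by plain assignment from Python and must be populated in place. The standalone illustration below uses the well-known `Value` type rather than this PR's generated classes.

```python
# Minimal illustration of the Struct assignment pitfall; not this PR's actual code.
from google.protobuf import struct_pb2

value = struct_pb2.Value()
# Direct assignment to a composite (message-typed) field is rejected by protobuf:
#     value.struct_value = {"name": "UserService"}   # raises AttributeError
# Composite fields must be populated in place instead:
value.struct_value.update({"name": "UserService", "line": 42})
print(value.struct_value["name"])  # -> "UserService"
```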
    Package package = 2;
    Folder folder = 3;
    Module module = 4;
    Class class = 5;
While `class` is a valid field name in Protobuf, it's a reserved keyword in Python and many other languages. The Protobuf compiler renames it (e.g., to `class_`), but this can be a source of confusion and potential issues with tooling.
To improve clarity and maintainability, consider renaming this field to something that isn't a keyword, like `class_payload`. This change would also require updating the `LABEL_TO_ONEOF_FIELD` mapping in `ProtobufFileIngestor`.
Suggested change:
-    Class class = 5;
+    Class class_payload = 5;
@jkumz is this really an issue? If it is safe to leave it defined as 'class', then I am okay with this, and we can resolve this.
        self._relationships: dict[tuple[str, int, str], pb.Relationship] = {}
        logger.info(f"ProtobufFileIngestor initialized to write to: {self.output_path}")

    def _get_node_id(self, label: str, properties: dict) -> str:
The `properties` argument is missing a type hint. Based on the `mypy` configuration (`disallow_untyped_defs = true`), this should be typed. Using `dict[str, Any]` would be consistent with its usage in `ensure_node_batch` and improve type safety.
Suggested change:
-    def _get_node_id(self, label: str, properties: dict) -> str:
+    def _get_node_id(self, label: str, properties: dict[str, Any]) -> str:
@jkumz address this as well, please
for key, value in properties.items():
    if hasattr(payload_message, key):
        if value is None:
            continue
        destination_attribute = getattr(payload_message, key)
        if hasattr(destination_attribute, "extend") and isinstance(value, list):
            # This is a repeated field. Use .extend() to populate it.
            destination_attribute.extend(value)
        else:
            # This is a scalar field. Use a simple assignment.
            setattr(payload_message, key, value)
The current logic for populating protobuf messages is complex and could be brittle. It manually checks for repeated fields and might not handle all edge cases correctly (e.g., non-list values for repeated fields).
Using `google.protobuf.json_format.ParseDict` would simplify this logic, make it more robust, and align with standard practices for converting dictionaries to protobuf messages. This change would improve maintainability.
You'll need to add `from google.protobuf.json_format import ParseDict` at the top of the file.
try:
    # Filter out None values, as ParseDict can error on them for scalar fields.
    valid_properties = {k: v for k, v in properties.items() if v is not None}
    ParseDict(valid_properties, payload_message, ignore_unknown_fields=True)
except Exception as e:
    logger.error(
        f"Failed to parse properties for label '{label}' into protobuf message: {e}\n"
        f"Properties: {valid_properties}"
    )
    return
@jkumz, please implement this suggestion and mark it as done.
Forgot to mention: parsing directly to Protobuf eliminates the idempotency provided by the MERGE calls, so any downstream consumers will need to deal with this accordingly. There are also more relationships in the proto file than in the graph, again as a result of Memgraph processing: when dumping from Memgraph, the ingestion was filtered by MATCH clauses, so it was not possible to connect a node to a non-existing node; nodes had to exist before MERGING. GraphUpdater only creates Module nodes for files it finds physically within the repository, so it skips Modules for external libraries, which implicitly filters out relationships that would connect those never-created nodes. That implicit filtering isn't in place when parsing straight to Protobuf instead of into Memgraph.

In my testing on the Pydantic library I got ±24K relationships with the Protobuf route and ±18K with the Memgraph dump to JSON. Mostly there are extra IMPORTS and INHERITS relationships to items from external libraries, whether their Modules exist in the directory or not. If someone wants to ingest into Memgraph from the protobuf file in the future and treat it as a way to produce compact snapshots, this isn't an issue, as the same idempotency and filtering can be applied; but for external processing it's worth noting there will be some extra work to do. I think this is fair enough, as it's better to have an excess of info than a lack of info in the case of external processing.
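For downstream consumers that want Memgraph-like semantics, a rough post-processing sketch is shown below. It is illustrative only: the `GraphCodeIndex` field names (`nodes`, `relationships`, `id`, `source_id`, `target_id`, `type`) are assumptions based on the schema and review comments in this PR, so adjust them to the actual generated code.

```python
# Hedged sketch: re-apply the MERGE-style dedup and node-existence filtering downstream.
# Field names are assumptions; rel.source_id/target_id/type follow the review comments above.
from codec import schema_pb2 as pb

index = pb.GraphCodeIndex()
with open("out/graph.bin", "rb") as f:
    index.ParseFromString(f.read())

known_ids = {node.id for node in index.nodes}
seen: set[tuple[str, int, str]] = set()
kept = []
for rel in index.relationships:
    key = (rel.source_id, rel.type, rel.target_id)
    # Drop relationships pointing at nodes that were never created (e.g. external
    # modules) and collapse duplicates, as repeated MERGE calls would have done.
    if rel.source_id in known_ids and rel.target_id in known_ids and key not in seen:
        seen.add(key)
        kept.append(rel)
print(f"kept {len(kept)} of {len(index.relationships)} relationships")
```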
This is an incredible contribution, @jkumz. I will review this PR as soon as possible.
Thanks! Let me know what needs to change to get it merged and I'll be happy to sort it out.
While I review this, can you please clear all AI bot comments if they make sense to you? Thanks
@jkumz Left some comments and suggestions. I think we need one more round of refinement, and we are good to merge.
Please see my comments.
Hey @jkumz, are you going to submit the changes? Just bumping this...
Summary by Sourcery
Introduce Protobuf-based ingestion by defining ingestion/query protocols, generating and integrating a Protobuf schema with Python bindings, and implementing ProtobufFileIngestor for efficient serialization of graph data. Update dependencies, linting, and CI configurations and add tests to verify the new pipeline.
New Features:
Enhancements:
CI:
Tests: