Skip to content

prevent offline all system tables #5433

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 16, 2025
Merged

Conversation

kevinrr888
Copy link
Member

noticed when working on #5427

@kevinrr888 kevinrr888 added this to the 4.0.0 milestone Mar 27, 2025
@kevinrr888 kevinrr888 self-assigned this Mar 27, 2025
|| tableName.equals(AccumuloTable.ROOT.tableName())) {
throw new AccumuloException("Cannot set table to offline state");
}
NOT_BUILTIN_TABLE.validate(tableName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, this would have thrown an AccumuloException. Now it throws an IllegalArgumentException. This is user-facing client side behavior. I think IllegalArgumentException is okay here, but I would be curious to know what the 2.1 behavior did before merging this to determine if it's an acceptable change in behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but I would be curious to know what the 2.1 behavior did before merging this to determine if it's an acceptable change in behavior.

Could add an IT in 2.1 that attempts to take a system table offline and see what the exception is. Then run the same IT w/ this PR and see what the exception is. See what comes across thrift in both cases

Copy link
Member Author

@kevinrr888 kevinrr888 Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IllegalArgumentException is the same exception and message a user would get if they tried to delete a system table, so assumed it was a fine replacement. Edit: double checked, delete throws an AccumuloException but same message

Tested 2.1... Offlining a system table is an acceptable operation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can target 2.1 with these changes

Copy link
Member Author

@kevinrr888 kevinrr888 Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some more digging. The difference in Exceptions is where they are thrown, if validated in thrift (FateServiceHandler) will get the AccumuloException. So can move this validation from TableOperationsImpl to FateServiceHandler, where most of the table validation is done.

As for what ops are/aren't acceptable on system tables, in 2.1, can online/offline any table that's not the ROOT table, can deleteRows of any table that's not the METADATA table, and can bulkImportv2 any table that's not the ROOT table. The other ops like delete, create, etc. ensure that it's not a built in table (which is just META and ROOT in 2.1, I believe)

These were the 3 that stood out as weird to me. Seems like these should be all system tables, not just META or ROOT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be optimized using the AccumuloTable class, and have a contains method on it? To see if it's a built-in system table in the accumulo namespace. I also saw that there was a possibility that the scan ref table could actually be deleted, so it may not be safe to assume these tables are always existing. Since this is just an optimization, to short-circuit a check for built-in tables that should always exist, then it might be okay to just check these metadata tables and do the longer check for everything else.

Copy link
Member Author

@kevinrr888 kevinrr888 Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding something to AccumuloTable seems good to me.

I also saw that there was a possibility that the scan ref table could actually be deleted

I don't believe this can be done:

case TABLE_DELETE: {
TableOperation tableOp = TableOperation.DELETE;
validateArgumentCount(arguments, tableOp, 1);
String tableName =
validateName(arguments.get(0), tableOp, EXISTING_TABLE_NAME.and(NOT_BUILTIN_TABLE));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding something to AccumuloTable seems good to me.

I also saw that there was a possibility that the scan ref table could actually be deleted

I don't believe this can be done:

case TABLE_DELETE: {
TableOperation tableOp = TableOperation.DELETE;
validateArgumentCount(arguments, tableOp, 1);
String tableName =
validateName(arguments.get(0), tableOp, EXISTING_TABLE_NAME.and(NOT_BUILTIN_TABLE));

Okay, that will ensure we can't do it in our API. Where I saw it, TableManager was being called directly, to delete its ZooKeeper contents. I guess that's not an issue from the public API or Fate RPC because of what you linked.

Copy link
Member Author

@kevinrr888 kevinrr888 Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Yeah the concern is with the public API.

I'm still not entirely sure what should be done for this. Should probably make changes to both 2.1 and 4.0. The main concerns are:

  1. What ops are/aren't acceptable on system tables.

in 2.1, can online/offline any table that's not the ROOT table, can deleteRows of any table that's not the METADATA table, and can bulkImportv2 any table that's not the ROOT table... Seems like these should be all system tables, not just META or ROOT.

I'm wondering if it would be better to move all validation for the table names (e.g., ensuring it's not a system table, ensuring the table exists, doesn't exists, etc.) into TableOperationsImpl. Seems like some things are validated in TableOperationsImpl and some are validated in FateServiceHandler or both, which is a bit confusing. If we always validate in TableOperationsImpl we can avoid the RPC.

If FateServiceHandler.executeFateOperation is called somewhere else other than table operations, this may not work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a separate discussion, I had mentioned to you directly @ctubbsii in regards to

in 2.1, can online/offline any table that's not the ROOT table, can deleteRows of any table that's not the METADATA table, and can bulkImportv2 any table that's not the ROOT table.

That I was wrong about offline here, but I was thinking of deleteRows. So here is the corrected statement:

in 2.1, can online/offline any table that's not the ROOT table, can deleteRows of any table that's not a METADATA table (METADATA, ROOT) (so this is correct in 2.1), and can bulkImportV2 to any table that's not the ROOT table

Since you shouldn't be able to offline any system table through the table ops API, will change this PR to target 2.1.

This PR only pertains to offline so discussion around the other ops will be pushed to other tickets/PRs most likely as discussion in the PR I make for #5427

@kevinrr888 kevinrr888 changed the base branch from main to 2.1 April 15, 2025 19:19
Previously, could offline any table that was not ROOT. This commit
prevents offline of any system table
@kevinrr888
Copy link
Member Author

kevinrr888 commented Apr 15, 2025

I've updated this PR to target 2.1, as could offline METADATA in 2.1. AccumuloException is thrown with the message: Table must not be in the 'accumulo' namespace
(Validation is identical to delete)

Discussion around other potential issues with table operations has been pushed to #5474

@kevinrr888 kevinrr888 requested a review from ctubbsii April 15, 2025 19:30
Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the semantics of the RPC arguments in a bugfix release. It also too aggressively prevents all built-in tables from going offline/online. This prevents the correct behavior of the replication table in 2.1, which by default is offline, but can be turned online to enable replication. While this feature is deprecated and is removed in 3.0 and later, this completely breaks it in 2.1, since it is a built-in table that must be able to be turned online/offline on demand.

@@ -1463,7 +1463,7 @@ public void offline(String tableName, boolean wait)
EXISTING_TABLE_NAME.validate(tableName);

TableId tableId = context.getTableId(tableName);
List<ByteBuffer> args = Arrays.asList(ByteBuffer.wrap(tableId.canonical().getBytes(UTF_8)));
List<ByteBuffer> args = Arrays.asList(ByteBuffer.wrap(tableName.getBytes(UTF_8)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes the semantics of the RPC argument from that of a tableId to that of a tableName. This makes this RPC incompatible across a bugfix release, preventing rolling upgrades, which is something we generally support for bugfix releases.

Comment on lines 420 to 421
final TableId tableId =
ClientServiceHandler.checkTableId(manager.getContext(), tableName, tableOp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tableId was already resolved on the client side. Resolving it a second time here, makes it possible for the client side to refer to a different table than that of the server side if some table renames/creates happened concurrently. It also creates an extra lookup that shouldn't be necessary.

The current code using the table ID was fine, but just needed to verify that the TableId didn't refer to the metadata table. Other builtin tables may still be available to go online/offline.

@ctubbsii ctubbsii modified the milestones: 4.0.0, 2.1.4 Apr 15, 2025
@kevinrr888
Copy link
Member Author

@ctubbsii - Thanks for the info. I was unaware of the replication table - thought the only built in tables in 2.1 were ROOT and METADATA.

I avoid any RPC changes now, and added a new Validator.

Thinking when this is merged into 4.0 will have to change NOT_ROOT_TABLE_ID.and(NOT_METADATA_TABLE_ID) for online/offline to use a new Validator<TableId> NOT_BUILTIN_TABLE_ID (since there is no replication table in 4.0, and we want to avoid offlining ROOT, METADATA, FATE, and SCAN_REF). Let me know if this approach sounds good.

Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct to me, and I think changing to prevent all of the tables in the accumulo namespace when merging to main is sensible.

@kevinrr888 kevinrr888 merged commit 8b6a49a into apache:2.1 Apr 16, 2025
8 checks passed
@kevinrr888 kevinrr888 deleted the fixOfflineTableBug branch April 16, 2025 20:41
ddanielr pushed a commit to keith-turner/accumulo that referenced this pull request Apr 24, 2025
In 2.1:
* Previously, could offline any table that was not ROOT.
* This commit prevents offlining all system tables except for the replication table (this includes METADATA and ROOT). Does not prevent offlining all system tables in 2.1, as offlining the replication table is a desired feature.

In main/4.x:
* Previously, could offline any table that was not ROOT or METADATA.
* This commit prevents offlining all system tables (this currently includes METADATA, ROOT, FATE, and SCAN_REF). The replication table is removed in 4.x, and we want to prevent offlining any of the currently existing system tables in 4.x.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants