Feasible approach to build a large database

Hello, I am trying to build a database from the NCBI nr FASTA (707,338,897 entries) for more extensive protein searches. I have tried splitting the FASTA into smaller chunks (about 250 entries per run) and combining the resulting npy files. Larger chunks frequently run into GPU memory issues, and I only have access to a 24 GB GPU. At this rate, however, the build would take far too long to finish (roughly 3 years). Is there any method to speed up this process?
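For reference, the figures above can be turned into a rough per-run budget, and the chunking step can be done in a streaming fashion so the full FASTA never has to sit in memory. The sketch below is only an illustration under stated assumptions: `estimate` and `fasta_chunks` are hypothetical helper names, not part of any tool's API, and the 3-year figure is taken at face value as wall-clock time. It does not touch the embedding step or the npy format, since neither is described here.

```python
import math

TOTAL_ENTRIES = 707_338_897  # entry count quoted in the post
CHUNK_SIZE = 250             # entries per run quoted in the post
YEARS = 3                    # rough estimate quoted in the post


def estimate(total_entries, chunk_size, years):
    """Return (number of runs, implied wall-clock seconds per run)."""
    runs = math.ceil(total_entries / chunk_size)
    seconds_per_run = years * 365 * 24 * 3600 / runs
    return runs, seconds_per_run


def fasta_chunks(path, chunk_size):
    """Stream a FASTA file, yielding lists of (header, sequence) tuples.

    Each yielded list holds at most `chunk_size` records; sequences that
    span several lines are joined into one string.
    """
    chunk, header, seq = [], None, []
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    chunk.append((header, "".join(seq)))
                    if len(chunk) == chunk_size:
                        yield chunk
                        chunk = []
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    # Flush the final record and any partially filled chunk.
    if header is not None:
        chunk.append((header, "".join(seq)))
    if chunk:
        yield chunk


if __name__ == "__main__":
    runs, secs = estimate(TOTAL_ENTRIES, CHUNK_SIZE, YEARS)
    print(f"{runs:,} runs, about {secs:.1f} s per run")
```

With the numbers quoted above this works out to 2,829,356 runs at roughly 33 seconds each, which suggests that per-run overhead, and not only the chunk size, is worth measuring.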


Feasible approach to build a large database #159
