Feasible approach to build a large database #159

@yzlwk

Hello, I am trying to build a database from the NCBI nr FASTA (707,338,897 entries) for more extensive protein searches. I have tried splitting the FASTA into smaller chunks (about 250 entries per run) and combining the resulting npy files. Larger chunks frequently run out of GPU memory, and I only have access to a 24 GB GPU. At this rate, however, the job would take roughly 3 years to finish. Is there any way to speed up this process?
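For reference, the ~3 year estimate works out to roughly 7.5 entries per second. Below is a minimal sketch of one possible direction, not something this tool is known to support: group sequences by length so that each chunk stays under a fixed padded-residue budget instead of a fixed entry count. It assumes the memory errors come from a few very long sequences forcing the whole chunk to be padded to their length, which may not be the actual cause. The names `read_fasta`, `token_budget_batches`, `required_rate`, and the budget value are hypothetical, and the sort is done in memory, so it would apply per shard rather than to the whole nr file.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def read_fasta(handle: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def token_budget_batches(
    records: Iterable[Record], max_padded_residues: int
) -> Iterator[List[Record]]:
    """Group records so that batch_size * longest_sequence stays within budget.

    Records are sorted by length first, so short sequences share large
    batches and long sequences end up in small ones. A single sequence
    longer than the budget still gets a batch of its own.
    """
    batch: List[Record] = []
    for rec in sorted(records, key=lambda r: len(r[1])):
        longest = len(rec[1])  # ascending order: the newest record is the longest
        if batch and (len(batch) + 1) * longest > max_padded_residues:
            yield batch
            batch = []
        batch.append(rec)
    if batch:
        yield batch


def required_rate(n_entries: int, days: float) -> float:
    """Entries per second needed to finish n_entries within the given days."""
    return n_entries / (days * 86400)


if __name__ == "__main__":
    # Tiny made-up shard, for illustration only.
    demo = [">a", "MK", ">b", "MKTAYIAKQR", ">c", "MKT", ">d", "MKTA"]
    for batch in token_budget_batches(read_fasta(demo), max_padded_residues=12):
        print([header for header, _ in batch])
    print(f"{required_rate(707338897, 30):.1f} entries/s to finish in 30 days")
```

Each batch would then be written to its own FASTA chunk and processed as before. Whether this helps depends on how the tool pads and batches internally, which I have not verified.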
