PC-NSF-HiFiGAN with ability of pitch shifting and extremely wide pitch range

This is a major release of the DiffSinger Community Vocoder Project, featuring our first public model weights for a brand-new vocoder architecture: PC-NSF-HiFiGAN. Main improvements:

  • The HN-NSF module is replaced by the ultra-lightweight MiniNSF, which is much faster to compute and better suited to GPU acceleration.
  • By applying a special training paradigm, PC-NSF-HiFiGAN gains the ability to shift pitch while preserving formants (like WORLD vocoder), and still achieves the same level of audio quality as normal NSF-HiFiGAN.
  • An effective yet universal augmentation workflow is used to expand the pitch range, pushing the typical upper limit to D#7 (2489.0 Hz).
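In a mel + f0 vocoder, formant-preserving pitch shifting is commonly done by leaving the mel spectrogram (which carries the spectral envelope) unchanged and scaling only the f0 curve fed to the vocoder. The sketch below assumes that interface; the release does not document the exact model inputs here, and `shift_f0` is a hypothetical helper, not part of any released package.

```python
def shift_f0(f0_hz: list[float], semitones: float) -> list[float]:
    """Scale an f0 curve by 2**(semitones / 12) (hypothetical helper).

    Unvoiced frames, encoded here as 0 Hz, are kept at 0.
    The mel spectrogram is assumed to be passed to the vocoder unchanged.
    """
    factor = 2.0 ** (semitones / 12.0)
    return [f * factor if f > 0 else 0.0 for f in f0_hz]


f0 = [220.0, 0.0, 440.0]
print(shift_f0(f0, 12))   # [440.0, 0.0, 880.0]
print(shift_f0(f0, -12))  # [110.0, 0.0, 220.0]
```

One semitone corresponds to a frequency ratio of 2^(1/12) ≈ 1.0595, so the supported -12 to +12 semitone range spans a factor of 0.5 to 2 on f0.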

This release is distributed as follows:

  • A pretrained model for inference in the DiffSinger repository
  • A pretrained model for fine-tuning in the SingingVocoders repository (see release)
  • A packaged OpenUTAU dependency that can be installed directly into OpenUTAU (rename the file extension to .zip and unzip it to get the ONNX model)
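The package's file name and internal layout are not given here; the sketch below assumes only that it is a ZIP archive (as the rename-to-.zip instruction implies) and that the model file ends in `.onnx`. `extract_onnx` is a hypothetical helper. Python's `zipfile` detects archives by content rather than by extension, so the rename step is only needed for tools that rely on the suffix.

```python
import zipfile
from pathlib import Path


def extract_onnx(package: Path, out_dir: Path) -> list[Path]:
    """Extract every .onnx member of a ZIP-based package into out_dir.

    The package is assumed to be a ZIP archive; its extension does not matter.
    Returns the paths of the extracted files.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted: list[Path] = []
    with zipfile.ZipFile(package) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".onnx"):
                # ZipFile.extract returns the path of the written file as a str.
                extracted.append(Path(zf.extract(name, out_dir)))
    return extracted
```

Usage, with a hypothetical file name: `extract_onnx(Path("vocoder_package.oudep"), Path("onnx_out"))`.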

Please note: the file and package names of this released model are different from the former release in February, 2024. You may have to edit your configuration files to switch from the old model to the new model.

Overview

Architecture: PC-NSF-HiFiGAN
Training data: ~79h carefully selected singing voice
Training steps: 40k+108k for fine-tuning
Sampling rate: 44100 Hz
Number of mel bins: 128
Hop size: 512 samples
Window size: 2048 samples
Mel frequency range (input): 40-16000 Hz
Pitch shifting ability: -12 to +12 semitones
Pitch range (output): E2 to D#7 (typical); may shrink with pitch shifting
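A quick sanity check of the figures above, assuming 12-tone equal temperament with A4 = 440 Hz (the release does not state its tuning reference). The `max_input_hz` helper is hypothetical: it assumes the output ceiling stays at D#7 and that a shift of n semitones scales f0 by 2^(n/12), which is one way the range could "shrink with pitch shifting"; the release does not specify the exact behavior.

```python
# 12-TET reference pitch and semitone offsets relative to A in the same octave.
A4_HZ = 440.0
NOTE_OFFSETS = {"C": -9, "C#": -8, "D": -7, "D#": -6, "E": -5, "F": -4,
                "F#": -3, "G": -2, "G#": -1, "A": 0, "A#": 1, "B": 2}

# Values copied from the Overview.
SAMPLE_RATE = 44100
HOP_SIZE = 512
WIN_SIZE = 2048


def note_to_hz(note: str) -> float:
    """Convert a note name with a single-digit octave (e.g. 'D#7') to Hz."""
    name, octave = note[:-1], int(note[-1])
    semitones_from_a4 = NOTE_OFFSETS[name] + 12 * (octave - 4)
    return A4_HZ * 2.0 ** (semitones_from_a4 / 12.0)


def max_input_hz(output_ceiling_hz: float, shift_semitones: float) -> float:
    """Highest input f0 that stays under the output ceiling after shifting (assumption)."""
    return output_ceiling_hz / 2.0 ** (shift_semitones / 12.0)


print(f"E2  = {note_to_hz('E2'):.2f} Hz")                  # E2  = 82.41 Hz
print(f"D#7 = {note_to_hz('D#7'):.1f} Hz")                 # D#7 = 2489.0 Hz
print(f"frames/s = {SAMPLE_RATE / HOP_SIZE:.2f}")          # frames/s = 86.13
print(f"hop = {1000 * HOP_SIZE / SAMPLE_RATE:.2f} ms")     # hop = 11.61 ms
print(f"window = {1000 * WIN_SIZE / SAMPLE_RATE:.2f} ms")  # window = 46.44 ms
print(f"+12 smt ceiling = {max_input_hz(note_to_hz('D#7'), 12):.1f} Hz")  # +12 smt ceiling = 1244.5 Hz
```

Under these assumptions, a +12 semitone shift lowers the highest usable input note from D#7 to D#6 (about 1244.5 Hz).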

Notice

Pretrained models are released under the Attribution-NonCommercial-ShareAlike 4.0 International license. Please read the notice in the folder if you want to redistribute these pretrained models.

Special Statements

We regret having to publish a verified Registry of Hostile Conduct (shown below). This registry documents individuals/entities who have engaged in long-term destructive activities against the development team.

We solemnly declare the following:

  1. We strongly recommend that all users review this registry before downloading and using this vocoder.
  2. No technical or legal restrictions are currently imposed on the listed parties, as the vocoder is still licensed under CC BY-NC-SA 4.0.
  3. We reserve the right to apply further restrictions in case of persistent malicious acts.

Registry of Hostile Conduct

Name: 旋转_turning_point
Identifiers: QQ: 2673587414; Bilibili UID: 285801087; Discord username: colstone233
Reason: Engaging in long-term hostile and personal attacks against developers, repeatedly spreading false information about DiffSinger and the development team, and interfering with the development process of the vocoder and other projects in the community