Skip to content

Database size optimizations #191

Open
@ioqy

Description

@ioqy
  • I have checked the existing issues to avoid duplicates
  • I have redacted any info hashes and content metadata from any logs or screenshots attached to this issue

Is your feature request related to a problem? Please describe

While having a look at the database, I have found that by using a different foreign key or using a foreign key at all could save more than 10% of disk space.

Describe the solution you'd like

Every table that has a relation to a torrent is linked via the info-hash. By adding an ID field with the data type bigserial (or bigint) to the torrents table and using the new ID as a foreign key in the tables torrent_files and torrents_torrent_sources and removing the info_hash field from the tables (except in the torrents table of course), I saved more than 12% space in a test database I converted.

table records table size before [GB] table size after [GB] indexes size before [GB] indexes size after [GB] difference [GB] difference [%]
torrents 6,292,151 0.94 0.99 0.932 1.04 0.16 +8.44 %
torrent_files 30,949,330 3.9 3.45 4.82 3.86 -1.41 -16.17 %
torrents_torrent_sources 6,531,913 2.58 2.55 0.782 0.363 -0.45 -13.36 %
43,773,394 7.42 6.99 6.534 5.263 -1.70 -12.19 %

Of course the info_hash field should also be replaced in the other tables I didn't mention.

Maybe the info-hash in the field torrent_contents.id could also be replaced by the ID.

Additional context

Using a small foreign key field and storing the text/data in different table could also be used for other tables that store text that may seem small because it's only a few characters, but by having the same text a few million times adds up quickly.

Some other fields that I haven't tried but which could be replaced with a smallserial or smallint foreign key (which only takes up 2 bytes and the maximum value of 32,767 should suffice because the fields have very few unique values):

  • torrent_hints.content_type
  • torrent_hints.content_source
  • torrent_hints.video_resolution
  • torrent_hints.video_source
  • torrent_hints.video_codec
  • torrent_hints.video_3d
  • torrent_hints.video_modifier
  • torrent_hints.release_group
  • The link between torrent_sources and torrent_torrent_sources

Maybe even torrent_hints.content_id with a normal integer, because there are more unique values.

I don't know how the bloom filters work, but maybe the unique values of the fields torrents_torrent_sources.bfsd and torrents_torrent_sources.bfpe could also be stored in a separate table and only be referenced by a foreign key. In my database the table torrents_torrent_sources has 6,531,913 records and the two bloom filter fields have only 20,513 and 19,745 unique values, which looks like a lot of duplicates to me.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions