Description
- I have checked the existing issues to avoid duplicates
- I have redacted any info hashes and content metadata from any logs or screenshots attached to this issue
Is your feature request related to a problem? Please describe
While having a look at the database, I have found that by using a different foreign key or using a foreign key at all could save more than 10% of disk space.
Describe the solution you'd like
Every table that has a relation to a torrent is linked via the info-hash. By adding an ID field with the data type bigserial
(or bigint
) to the torrents
table and using the new ID as a foreign key in the tables torrent_files
and torrents_torrent_sources
and removing the info_hash
field from the tables (except in the torrents
table of course), I saved more than 12% space in a test database I converted.
table | records | table size before [GB] | table size after [GB] | indexes size before [GB] | indexes size after [GB] | difference [GB] | difference [%] |
---|---|---|---|---|---|---|---|
torrents | 6,292,151 | 0.94 | 0.99 | 0.932 | 1.04 | 0.16 | +8.44 % |
torrent_files | 30,949,330 | 3.9 | 3.45 | 4.82 | 3.86 | -1.41 | -16.17 % |
torrents_torrent_sources | 6,531,913 | 2.58 | 2.55 | 0.782 | 0.363 | -0.45 | -13.36 % |
43,773,394 | 7.42 | 6.99 | 6.534 | 5.263 | -1.70 | -12.19 % |
Of course the info_hash
field should also be replaced in the other tables I didn't mention.
Maybe the info-hash in the field torrent_contents.id
could also be replaced by the ID.
Additional context
Using a small foreign key field and storing the text/data in different table could also be used for other tables that store text that may seem small because it's only a few characters, but by having the same text a few million times adds up quickly.
Some other fields that I haven't tried but which could be replaced with a smallserial
or smallint
foreign key (which only takes up 2 bytes and the maximum value of 32,767 should suffice because the fields have very few unique values):
torrent_hints.content_type
torrent_hints.content_source
torrent_hints.video_resolution
torrent_hints.video_source
torrent_hints.video_codec
torrent_hints.video_3d
torrent_hints.video_modifier
torrent_hints.release_group
- The link between
torrent_sources
andtorrent_torrent_sources
Maybe even torrent_hints.content_id
with a normal integer, because there are more unique values.
I don't know how the bloom filters work, but maybe the unique values of the fields torrents_torrent_sources.bfsd
and torrents_torrent_sources.bfpe
could also be stored in a separate table and only be referenced by a foreign key. In my database the table torrents_torrent_sources
has 6,531,913 records and the two bloom filter fields have only 20,513 and 19,745 unique values, which looks like a lot of duplicates to me.