-
Notifications
You must be signed in to change notification settings - Fork 11
Fix norm fn sigs #80
base: master
Are you sure you want to change the base?
Fix norm fn sigs #80
Conversation
col_sum = np.sum(dge_list.counts, axis=0) | ||
assert isinstance(first_pos, np.integer) | ||
tpm_dge = dge_list.tpm(icd) | ||
ensg_gene = icd.pick_gene_id(icd.get_genes_from_symbol(first_gene)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to how I designed the the test_tpm in the first run. Here is the comment by clintval to that:
2276d19#r224543989
That's the reason why I chose a very simple table that is manually calculatable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm a bit confused. Is this in reference to the comments about using random values? This test shouldn't be using anything random; it should be pulling the first entry from the test data, similar to what the RPKM test is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link I posted is not working as I expected! I'll copy the comment here:
clintval: "Not bad, you are dynamically creating an expected result by copying some code from the main application.
This makes it hard to track down bugs with the logic since the logic exists in two places.
Norm. methods should probably be tested with a very small, hand-designed test case so you can explicitly write out all inputs, and all outputs."
This is why I didn't implement the test_tpm function like the test_rpkm function in the end. As a consequence also test_rpkm shouldn't be implemented as is at the moment..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, that makes much more sense. In that case, it may make sense to us apfejes' RPKM function sig and your test function as standards. It seems reasonable but probably worth discussing (unless it's already been discussed before I arrived).
Would it be worth it to possibly create a minimal test data set and run it through edgeR to match the edgeR outputs to ours? That's probably a larger discussion, but known and verified 3rd-party outputs may make sense for test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider too, an additional layer of testing as property-based for these mathematical calculations:
https://hypothesis.readthedocs.io/en/latest/quickstart.html#writing-tests
edgePy/DGEList.py
Outdated
prior_count: float = PRIOR_COUNT, | ||
) -> "DGEList": | ||
"""Return the DGEList normalized to transcripts per million reads. | ||
TPM = countsPerBase * ( 1/sum[countsPerBase]) * 10^6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my commented tpm function you'll see how to do this with latex/sphinx so it's nicer in the docs.
Harmonized TPM and RPKM function signatures.
Both currently require a CanonicalDataStore. I tried to create an adapter, but it quickly became unwieldy, and I couldn't figure out a good way to do it.