Skip to content
This repository was archived by the owner on Jan 23, 2025. It is now read-only.

Fix norm fn sigs #80

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

zbohannan
Copy link

Harmonized TPM and RPKM function signatures.

Both currently require a CanonicalDataStore. I tried to create an adapter, but it quickly became unwieldy, and I couldn't figure out a good way to do it.

col_sum = np.sum(dge_list.counts, axis=0)
assert isinstance(first_pos, np.integer)
tpm_dge = dge_list.tpm(icd)
ensg_gene = icd.pick_gene_id(icd.get_genes_from_symbol(first_gene))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to how I designed the the test_tpm in the first run. Here is the comment by clintval to that:
2276d19#r224543989

That's the reason why I chose a very simple table that is manually calculatable

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm a bit confused. Is this in reference to the comments about using random values? This test shouldn't be using anything random; it should be pulling the first entry from the test data, similar to what the RPKM test is doing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link I posted is not working as I expected! I'll copy the comment here:
clintval: "Not bad, you are dynamically creating an expected result by copying some code from the main application.

This makes it hard to track down bugs with the logic since the logic exists in two places.

Norm. methods should probably be tested with a very small, hand-designed test case so you can explicitly write out all inputs, and all outputs."

This is why I didn't implement the test_tpm function like the test_rpkm function in the end. As a consequence also test_rpkm shouldn't be implemented as is at the moment..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, that makes much more sense. In that case, it may make sense to us apfejes' RPKM function sig and your test function as standards. It seems reasonable but probably worth discussing (unless it's already been discussed before I arrived).

Would it be worth it to possibly create a minimal test data set and run it through edgeR to match the edgeR outputs to ours? That's probably a larger discussion, but known and verified 3rd-party outputs may make sense for test cases.

Copy link
Member

@clintval clintval Dec 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider too, an additional layer of testing as property-based for these mathematical calculations:

https://hypothesis.readthedocs.io/en/latest/quickstart.html#writing-tests

prior_count: float = PRIOR_COUNT,
) -> "DGEList":
"""Return the DGEList normalized to transcripts per million reads.
TPM = countsPerBase * ( 1/sum[countsPerBase]) * 10^6
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my commented tpm function you'll see how to do this with latex/sphinx so it's nicer in the docs.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants