List comprehensions

A common pattern

In the code you've written so far, you may have noticed some common patterns. One of these looks something like this:

new_list = []
for element in old_list:
    if some_condition:
        new_list.append(element)

Let's revisit the GC content calculation for a specific example:

def calculate_gc_ratio(sequence):
    gcs = []
    for base in sequence:
        if base in ['G', 'C']:
            gcs.append(base)
    return len(gcs)/len(sequence)

Do you see the pattern?

This is the pattern that is conveniently wrapped up in list comprehensions.

List comprehension syntax

Here is the general pattern from above, transformed to a list comprehension.

new_list = [element for element in old_list if some_condition]

It looks a little backwards at first, so compare it with the first example and match up the parts. Reading from the for onward: take each element in old_list, check whether it meets some_condition, and if it does, store it in a list, which is assigned to new_list. Notice that all of this happens between the list brackets [ ... ].
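To see how the parts line up, here is a small sketch that runs both forms on the same input and checks that they agree. The contents of old_list and the condition (len(element) > 3) are made up for illustration.

```python
# Hypothetical input and condition, chosen only for illustration.
old_list = ['ATG', 'GGCC', 'TTAGA', 'CG']

# Loop version: start empty, test each element, append the ones that pass.
loop_result = []
for element in old_list:
    if len(element) > 3:
        loop_result.append(element)

# Comprehension version: the same three parts on one line.
#   [ what to store   for each element      if condition ]
comp_result = [element for element in old_list if len(element) > 3]

print(loop_result)                  # ['GGCC', 'TTAGA']
print(comp_result)                  # ['GGCC', 'TTAGA']
print(loop_result == comp_result)   # True
```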

Here is the GC function rewritten to use a list comprehension:

def calculate_gc_ratio(sequence):
    gcs = [base for base in sequence if base in ['G', 'C']]
    return len(gcs)/len(sequence)
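A quick way to try the function out is to call it on a short sequence whose answer can be counted by hand. The sequence below is made up for illustration; four of its seven bases are G or C.

```python
def calculate_gc_ratio(sequence):
    # Keep only the G and C bases, then divide by the total length.
    gcs = [base for base in sequence if base in ['G', 'C']]
    return len(gcs) / len(sequence)

# Example sequence chosen for illustration: G, G, C, G are the GC bases.
ratio = calculate_gc_ratio('GGCAGTT')
print(ratio == 4 / 7)  # True
```

Note that, as written, the function only counts uppercase bases, and it raises a ZeroDivisionError if the sequence is empty.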

Below are a few examples of the kind of things that are commonly done with list comprehensions.

Filtering values in a list

calculate_gc_ratio is an example of filtering a list to remove values that don't fit a certain condition (in this case, bases that aren't a "G" or "C").

Here's one more example that filters a list of numbers for even numbers.

numbers = list(range(1, 11))
evens = [num for num in numbers if num % 2 == 0]
print(evens)
[2, 4, 6, 8, 10]

With the loop pattern from the beginning, the same code would look like this:

numbers = list(range(1, 11))
evens = []
for num in numbers:
    if num % 2 == 0:
        evens.append(num)

Transforming values in a list

Instead of (or in combination with) filtering a list, we can perform a transformation on all the list values.

As an example, we can take a list of percentages and transform them into ratios.

percents = [99, 34, 89, 77, 88]
ratios = [percent/100 for percent in percents]
print(ratios)
[0.99, 0.34, 0.89, 0.77, 0.88]

In this case, we simply divided by one hundred, but you could also apply a function.
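As a minimal sketch of both ideas, the example below applies a function inside the comprehension and then combines the transformation with a filter. The helper percent_to_ratio and the cutoff of 80 are hypothetical and exist only for this illustration.

```python
def percent_to_ratio(percent):
    # Hypothetical helper: convert a percentage to a ratio.
    return percent / 100

percents = [99, 34, 89, 77, 88]

# Transform only: call the function on every value.
ratios = [percent_to_ratio(percent) for percent in percents]
print(ratios)  # [0.99, 0.34, 0.89, 0.77, 0.88]

# Filter and transform together: keep values of at least 80, then convert them.
high_ratios = [percent_to_ratio(percent) for percent in percents if percent >= 80]
print(high_ratios)  # [0.99, 0.89, 0.88]
```

The expression before the for decides what is stored, and the if at the end decides which elements are considered at all.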

Tasks

Cleaning a sequence

Write a function that cleans a sequence by removing gaps and making sure that all letters are uppercase. Use a single list comprehension in your function. The list comprehension should take care of both removing the gaps and converting each base to uppercase. Write tests (with assert statements at the end of the file or with py.test) to show that your function works as expected.

In [1]: clean_sequence('GGc--AGTT')
GGCAGTT
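One possible way to lay out such tests is sketched here on a hypothetical keep_evens function rather than on clean_sequence. The function name and the test values are assumptions made for the illustration.

```python
def keep_evens(numbers):
    # Hypothetical function used only to demonstrate the testing style.
    return [num for num in numbers if num % 2 == 0]

# Plain assert statements at the end of the file: each one raises an
# AssertionError if the function returns something unexpected.
assert keep_evens([1, 2, 3, 4]) == [2, 4]
assert keep_evens([1, 3, 5]) == []
assert keep_evens([]) == []

# The same checks as a pytest-style test function (pytest collects
# functions whose names start with "test_").
def test_keep_evens():
    assert keep_evens([1, 2, 3, 4]) == [2, 4]
    assert keep_evens([]) == []
```

Including edge cases such as an empty input or an input where nothing passes the filter makes the tests more convincing.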

Released under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
