Nucleotide and protein BLAST analysis for FTO—a (much) better way

The Basic Local Alignment Search Tool (BLAST) is a mainstay among scientists working with protein, DNA and RNA sequences. BitesizeBio even says it is “arguably the most heavily used tool for sequence analysis”, which is no surprise considering how prominent large molecules are becoming across diagnostics, biotech and agriculture. BLAST lets you find sequences whose letters appear in an order that aligns closely with your search query. This is important for freedom to operate (FTO) and novelty searches, especially considering some scientists generate new sequences on a near-daily basis.

But the way we run sequence-based FTO and novelty searches today is similar to what I imagine the world’s worst game of hopscotch would be like—jumping barefoot from box to box, each filled with needles.

Most of us have to search through sequence files provided by different patent offices, before extracting and analysing relevant results in tools built for academic research (such as NCBI BLAST). Or we start our search in free patent search tools (which have dangerously poor data coverage). Then we save similar sequences, and the patents in which they appear, on local machines before sharing with colleagues via email (or other business communication tools). Throughout this process, we’ll hop between sequence analysis and patent search tools.

On top of this, when companies rely solely on sequence files from patent offices or free tools, they miss out on the full picture:

  • Not all patent offices include sequence files—and those that do often use different formats from each other.
  • Some sequences are found in patent text, rather than attached as separate files, and this makes them unsearchable.
  • Unless you read every single patent containing sequences similar to yours, it’s impossible to know which inventions are relevant to your industry.

Far be it from me to malign the great work done by the people at the National Center for Biotechnology Information (NCBI) when they created BLAST. In fact, it’s not BLAST that’s the problem—it’s that no one’s made it easy enough to use BLAST in the ideal context.

The ideal context is one that allows you to move seamlessly through activities that naturally exist alongside BLAST analysis: patent searching (to understand the legal landscape), filtering similar sequences by industry or application (to reduce manual labour) and bulk sequence analysis (to increase speed). The ability to move seamlessly between activities such as these affects our likelihood of achieving certain desirable outcomes:

  • Mitigating legal risks
  • Understanding blockers to R&D projects
  • Faster iteration on experiments
  • Reducing costs
  • Monitoring market activity

In this article, I’ll cover these five positive outcomes and how our new tool—PatSnap Bio—makes their realisation much easier.

1.) Mitigating legal risks

The last thing anyone wants is to get slapped with a lawsuit as they prepare for the market launch of their brilliant discovery. But in areas involving large molecules—for example, biologic drugs—the landscape is murkier than London in summer.

Biotech patent filings are growing by 25% annually and are highly represented in litigation: 11% of all litigation takes place in this industry, where median infringement damages are around $21.5 million, roughly 3.6 times the $5.9 million median for other industries. Because biologics are so new, the legal and policy landscape is underdeveloped, meaning there’s an even greater number of hidden dangers.

PatSnap Bio has far greater patent coverage than any other sequence searching tool on the market. It combines 300 million sequences, including those within patent text, with 130+ million patents across 128+ patent jurisdictions. Free tools will make only a fraction of this data available to you, not least because PatSnap extracts the sequences in patent text and SEQ. ID., while free tools do not.

[Figure: PatSnap platform patent data coverage]

You also don’t have to change platforms to move between sequences and patents. Search and BLAST analysis of sequences, and identification and analysis of relevant patents are seamlessly connected.

[Figure: PatSnap Bio search results page]

All these combine to eradicate legal blind spots that could come back to bite you.

2.) Understanding blockers to R&D projects

This is an area where current tools really come up short—and PatSnap Bio’s tight linking of sequences to patents shines.

For example, if you were interested in creating a biosimilar of the biologic Avelumab, you’d have to tread carefully. Merck has taken serious steps to protect its prized asset, with tens of patents comprising multiple families. There are other companies filing patents around this sequence too. How do you assess, let alone reduce, the risk around such a decision?

Because PatSnap Bio is built as part of a family of powerful patent searching, analysis and visualisation tools, there’s so much more you can do once you’ve identified similar sequences. After refining sequence search results by legal status, identity or similarity score, and (depending on use case) drug target, you can extract patents containing relevant sequences for further analysis. For this reason, Bio is superior to any other tool out there when it comes to whitespace analysis.
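To make the refinement step concrete, here is a minimal sketch of filtering hits by legal status and identity score. It is not PatSnap Bio's API: the `Hit` record, its field names, the status labels and the 90% default threshold are all hypothetical, chosen only for illustration. Identity is computed the way BLAST reports it, as identical positions divided by alignment length.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Hit:
    """One sequence hit tied to a patent (hypothetical record layout)."""
    patent_id: str
    legal_status: str     # assumed labels: "active", "pending", "expired"
    aligned_query: str    # query row of the alignment, "-" marks a gap
    aligned_subject: str  # subject row of the alignment, same length


def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percentage of alignment columns where both rows hold the same residue."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned rows must have equal length")
    if not aligned_query:
        return 0.0
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def refine(
    hits: Iterable[Hit],
    min_identity: float = 90.0,
    statuses: Tuple[str, ...] = ("active", "pending"),
) -> List[Hit]:
    """Keep hits with an allowed legal status and enough identity, best first."""
    scored = [
        (percent_identity(h.aligned_query, h.aligned_subject), h)
        for h in hits
        if h.legal_status in statuses
    ]
    kept = [(score, h) for score, h in scored if score >= min_identity]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in kept]
```

The surviving hits are the ones whose patents would be carried forward into further analysis; loosening `min_identity` widens the net at the cost of more manual review.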

In the case of Avelumab, I took all (active or pending) patents protecting inventions relating to similar sequences and therapeutic mechanisms, then plotted them out on a landscape.

[Figure: Avelumab sequence patent landscape]

This makes it much easier to identify any competitively disadvantageous areas of R&D. And this is just one of many ways you can slice and dice the data, when there’s a syntactically coherent link between sequence searching and patent analysis.

3.) Faster iteration on experiments

Many organisations working with large molecules will generate hundreds of sequences when running experiments—and some will run freedom to operate searches as frequently as every three days.

PatSnap Bio’s high-throughput sequence searching capability, which allows you to search up to 200 sequences at once, means scientists can learn much faster about the feasibility of their work.

You can upload a FASTA file containing reams of sequences and Bio will analyse them all at once.
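If an experiment produces more sequences than fit in one search, the FASTA file has to be split first. The following is a minimal sketch of that preparation step, assuming only the 200-sequence limit quoted above; `parse_fasta` and `batch_records` are hypothetical helper names, not part of PatSnap Bio.

```python
from typing import List, Tuple

MAX_BATCH = 200  # per-search limit quoted in the article

Record = Tuple[str, str]  # (header without ">", sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs, joining wrapped lines."""
    records: List[Record] = []
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def batch_records(records: List[Record], size: int = MAX_BATCH) -> List[List[Record]]:
    """Split records into consecutive batches of at most `size` sequences."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [records[i:i + size] for i in range(0, len(records), size)]
```

A file with 450 sequences would come out as three batches (200, 200 and 50), each small enough for a single multi-sequence search.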

[Figure: PatSnap Bio multi-sequence searching]

The platform also makes it easy to switch between the results pages for different sequences.

[Figure: Multi-sequence search selector]

The faster you learn about the legal, technical and competitive landscape surrounding your sequences, the quicker you can adapt. Even better when the robustness of the data on which your analyses are based means you’ll also be making decisions of a higher quality.

4.) Reducing costs

Conversations about money have a sneaky propensity to turn awkward, so I won’t dwell on this one.

But I should highlight that, to avoid all the complexity associated with FTO and novelty searches on sequences, many organisations do one of a few things:

  • Use free tools to run the bare minimum search, as a box-ticking exercise
  • Outsource FTO searches to external consultancies or firms
  • Attempt to build their own tools, which circumvent many of the problems I’ve discussed here

Option 1 is extremely costly in the long term (as I explained in the section about legal risks). Option 2 is quite costly in the short term and prolongs the whole process. Option 3 is also extremely costly in the short term, in both money and time.

All I’m saying is, it doesn’t have to be that way.

5.) Monitoring market activity

Another quality of good sequence-patent search compatibility is that you can use features built for patents to monitor activity around sequences.

For example, when you take relevant patents from a search in PatSnap Bio into our Analytics tool, you can take advantage of the automatic alerts feature. This is just one of several features, typically reserved for patent analysis, that comes in handy when you’re dealing with sequences.

[Figure: PatSnap sequence monitoring email alerts]

The alert tool automatically sends you an email whenever something changes with one of the entries in your search results, or when a new entry is added. All this is based on your sequence search query, meaning you can effectively monitor legal and competitive activity around a sequence (or set of sequences).
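Conceptually, this kind of monitoring comes down to comparing two snapshots of the same search. The sketch below illustrates the idea only and says nothing about how PatSnap implements its alerts; the snapshot layout (a dictionary mapping a patent identifier to its legal status) and the function name `diff_results` are assumptions.

```python
from typing import Dict, List, Tuple


def diff_results(
    previous: Dict[str, str],
    current: Dict[str, str],
) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Compare two snapshots of a saved search.

    Returns (new_entries, changed_entries), where changed_entries holds
    (patent_id, old_status, new_status) tuples. Both lists are sorted by id.
    """
    new_entries = sorted(pid for pid in current if pid not in previous)
    changed_entries = sorted(
        (pid, previous[pid], current[pid])
        for pid in current
        if pid in previous and previous[pid] != current[pid]
    )
    return new_entries, changed_entries
```

Anything returned in either list is what an alert email would need to report for that sequence query.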

