Description: The transcript below is about big data analytics at North Carolina State University. The speaker demonstrates a project in which IBM’s jStart team is working with North Carolina State University to test some of IBM’s analytic technologies on web data.
David Barnes here with another episode of ET information, where I bring you information on new and emerging technologies from IBM. Today I want to show you a demonstration of a very cool project in which our IBM jStart team is working with North Carolina State University to test some of our analytic technologies on big web data, to help the university realize more revenue from the patents it has produced.
What would motivate the folks at NC State to want to do this? Quite simple: patents. I had no idea, but in 2007 alone, the top 150 universities in the United States produced 1.4 billion dollars’ worth of patents. It’s big money, and it’s all money that can be returned to the university to fund more research.
Here is the problem they’re facing, and by “they” I mean the Office of Technology Transfer (OTT) at NC State. The problem is finding the right suitor to license the patents: going out on the web and searching around, getting on the phone, reaching out to past contacts. It’s manually intensive, and they have a limited amount of manpower to work on it.
Here’s the back story. A professor at NC State’s business school, the Poole College of Management, was telling his class about this problem. As it happened, one of the students in the class was an intern with IBM’s jStart team. They started talking, one thing led to another, and they got the university’s Center for Innovation Management Studies and the OTT involved.
The center is called the CIMS group. The OTT and the CIMS group started talking and working together, and the next thing you know, they were off and running on the project. To start the project, the folks at the OTT needed to identify a sample patent to use as a test case.
In this case, they chose a smart inhaler technology that was invented at the university. Next, the OTT folks got together with the CIMS group and identified the websites we were going to crawl and scrape to get the documents we would analyze.
In this case, the websites were blogs, wikis, government websites, et cetera. We took the URLs for those websites and fed them to IBM Content Analytics and IBM Project BigSheets, and those two tools went out and gathered the information for us to analyze.
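The transcript does not say how IBM Content Analytics or BigSheets crawl, so the following is only a generic sketch of what “feeding seed URLs to a crawler” involves: a breadth-first crawl that follows links but stays on the seed sites’ hosts. The names `crawl` and `fetch` are hypothetical; `fetch` is injected (for example, a wrapper around `urllib.request`) so the sketch runs offline and says nothing about how the IBM tools actually retrieve pages.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin, urlparse


class _LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags on one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds: List[str], fetch: Callable[[str], str], max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl from seed URLs, staying on the seeds' hosts.

    `fetch` is a hypothetical callable that returns a page's HTML.
    """
    allowed_hosts = {urlparse(u).netloc for u in seeds}
    queue = deque(seeds)
    seen = set(seeds)
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        parser = _LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc in allowed_hosts and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# Usage with a tiny in-memory "site" standing in for the real web:
site = {
    "http://a.example/": '<a href="/p1">p1</a> <a href="http://other.example/">off-site</a>',
    "http://a.example/p1": '<a href="/">home</a>',
}
pages = crawl(["http://a.example/"], lambda u: site.get(u, ""))
print(sorted(pages))
```

Restricting the crawl to the seed hosts mirrors the demo’s approach of starting from a hand-picked list of blogs, wikis, and government sites rather than the open web.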
In fact, the documents behind me are part of those that we gathered. Next, they worked with our IBM jStart team to identify the keywords to use in the analytic routine. In this case, the keywords are the same ones the folks at the OTT would look for if they were visually scanning a document.
They took those keywords, and we built out an analytic routine with our IBM language-analysis tooling, plugged it into IBM Content Analytics, and ran the analytics against the entire corpus of data.
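The routine itself is not shown in the video. As a rough illustration of the idea, here is a minimal dictionary annotator in Python that tags keyword and company-name mentions with their character offsets. The keyword and company lists are hypothetical stand-ins (the company names come from the demo; the keyword list is an assumption), and this is a plain-regex sketch, not IBM Content Analytics’ annotator API.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Annotation:
    kind: str   # "keyword" or "company"
    text: str   # surface text that matched
    start: int  # character offset where the match begins
    end: int    # character offset just past the match


def dictionary_pattern(terms: Iterable[str]) -> re.Pattern:
    """Compile one case-insensitive, whole-word regex that matches any listed term."""
    # Try longer terms first so a multi-word keyword beats a shorter one it contains.
    ordered = sorted(set(terms), key=len, reverse=True)
    if not ordered:
        return re.compile(r"(?!)")  # empty dictionary: a pattern that never matches
    alternation = "|".join(re.escape(t) for t in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def annotate(text: str, keywords: Iterable[str], companies: Iterable[str]) -> List[Annotation]:
    """Tag every keyword and company-name mention in `text`, ordered by position."""
    found: List[Annotation] = []
    for kind, terms in (("keyword", keywords), ("company", companies)):
        for m in dictionary_pattern(terms).finditer(text):
            found.append(Annotation(kind, m.group(0), m.start(), m.end()))
    return sorted(found, key=lambda a: a.start)


KEYWORDS = ["dry powder inhaler", "inhaler"]          # hypothetical OTT keyword list
COMPANIES = ["Novartis", "GSK", "Schering-Plough"]    # names seen in the demo

for a in annotate("Novartis is developing a dry powder inhaler device.", KEYWORDS, COMPANIES):
    print(a.kind, a.text)
# company Novartis
# keyword dry powder inhaler
```

Keeping offsets on each annotation is what makes the page, paragraph, and sentence scoring in the demo possible later on.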
In this case the corpus was rather small, about 176,000 documents. Here are the results. This is the IBM Content Analytics user interface. On the right side, I can look at the documents that we gathered and scan through them, but remember, there are 176,000 documents.
Obviously, I don’t want to do that. We’ve already run the analytics against them, so on the left side I can look at the results of the analytic routine. I’m going to go down here and select “strong hit,” but notice there are also “medium hit” and “weak hit.” Why do we care? Remember, the point of running the analytics is to find a suitor to license NC State’s patent.
Ultimately, we’d like to find one of their keywords along with a company’s name; that really zeroes in on what they’re looking for. So it’s a weak hit if one of those keywords appears on a page with a company’s name. If one of those keywords appears in the same paragraph as a company’s name, that gets stronger.
Then, if one of those keywords appears in the same sentence as a company’s name, that’s a strong hit, weighted very highly, and that’s exactly what they’re looking for at NC State. By NC State I mean the Office of Technology Transfer; those are the folks actually doing this work.
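The three-tier scoring just described can be sketched as “grade a page by the tightest scope in which a keyword and a company name co-occur.” This is a minimal Python illustration under simple assumptions: paragraphs are separated by blank lines and sentences end at `.`, `!`, or `?`, which is far cruder than real text segmentation. The function `classify_hit` and its labels are hypothetical names, not the labels or logic of the actual IBM routine.

```python
import re
from typing import Iterable, List, Optional


def _mentions(text: str, terms: List[str]) -> bool:
    """True if any term occurs in `text` as a whole word, ignoring case."""
    return any(re.search(rf"\b{re.escape(t)}\b", text, re.IGNORECASE) for t in terms)


def _cooccur(unit: str, keywords: List[str], companies: List[str]) -> bool:
    """True if `unit` names at least one keyword and at least one company."""
    return _mentions(unit, keywords) and _mentions(unit, companies)


def classify_hit(page: str, keywords: Iterable[str], companies: Iterable[str]) -> Optional[str]:
    """Grade a page by the tightest scope in which a keyword and a company co-occur."""
    kw, co = list(keywords), list(companies)
    # Naive segmentation: blank lines split paragraphs; ., ! or ? plus whitespace ends a sentence.
    paragraphs = [p for p in re.split(r"\n\s*\n", page) if p.strip()]
    sentences = [s for p in paragraphs for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
    if any(_cooccur(s, kw, co) for s in sentences):
        return "strong"   # same sentence: weighted most heavily
    if any(_cooccur(p, kw, co) for p in paragraphs):
        return "medium"   # same paragraph
    if _cooccur(page, kw, co):
        return "weak"     # same page only
    return None           # no co-occurrence at all


KW, CO = ["dry powder inhaler"], ["Novartis"]
print(classify_hit("Novartis unveiled a dry powder inhaler.", KW, CO))             # strong
print(classify_hit("Novartis reported results.\n\nA dry powder inhaler was tested.", KW, CO))  # weak
```

Checking the narrowest scope first means each page gets exactly one grade, its best one, which is what lets the interface offer “strong hit” as a single facet.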
So let me go ahead and select “strong hit.” I’ll select Facets, and these are the documents that have been identified as strong hits. You can see the company names here: GSK, Roche, Nektar, et cetera.
If I’m interested in one, say Novartis, I can look at its documents; in this case, I think it said seven documents. And here’s a surprise, the kind of insight I’m looking for: Novartis is identified, but the analysis also threw up Schering-Plough.
If you’re looking for these company names and keywords, you’re probably interested in the fact that Schering-Plough is also named in one of those seven documents.
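The facet counts and the “surprise” co-mention of Schering-Plough can both be computed from a simple mapping of each strong-hit document to the company names found in it. This is a minimal sketch; the document IDs and the `company_facets` and `co_mentioned` helpers are hypothetical, and the sample data is illustrative, not the project’s actual results.

```python
from collections import Counter
from typing import Dict, Set


def company_facets(doc_companies: Dict[str, Set[str]]) -> Counter:
    """Count how many documents mention each company (one facet count per company)."""
    counts: Counter = Counter()
    for companies in doc_companies.values():
        counts.update(companies)  # each company counted once per document
    return counts


def co_mentioned(doc_companies: Dict[str, Set[str]], selected: str) -> Counter:
    """Among documents mentioning `selected`, count the other companies they name."""
    counts: Counter = Counter()
    for companies in doc_companies.values():
        if selected in companies:
            counts.update(companies - {selected})
    return counts


# Hypothetical strong-hit documents and the company names annotated in each.
docs = {
    "doc1": {"Novartis", "Schering-Plough"},
    "doc2": {"Novartis"},
    "doc3": {"GSK", "Roche"},
}
print(company_facets(docs).most_common(1))   # [('Novartis', 2)]
print(co_mentioned(docs, "Novartis"))        # Counter({'Schering-Plough': 1})
```

Storing companies as a set per document keeps a company named many times on one page from inflating its facet count.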
If I pick Novartis and say I want to look at one of the seven documents, I select the Documents view. I can click on this and go directly to the document on the web that we crawled.
Let me show you why this one got a strong hit: the company name Novartis next to “dry powder inhaler device,” one of the strong keywords that the folks in the CIMS group and the OTT identified. That’s exactly what they’re looking for: a strong hit out of 176,000 documents. There’s one other thing the analytic routine found that I want to point out.
First, I’m going to do a quick refresh so that all documents are in the view again. This one is “possible failed projects,” so what’s that all about? If another company has been working on a similar technology and its project failed, it’s a prime candidate to approach about possibly licensing NC State’s successful smart inhaler technology.
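The video does not explain how “possible failed projects” were detected. One plausible approach, sketched here purely as an assumption, is to look for failure cue phrases near the technology’s key terms. The cue list in `FAILURE_CUES` and the function `failed_project_snippets` are hypothetical; only the “termination of …” wording is confirmed by the demo.

```python
import re
from typing import List

# Hypothetical cue phrases; the demo only confirms "termination of ..." wording.
# After a cue, capture up to 80 more characters, stopping at the end of the sentence.
FAILURE_CUES = re.compile(
    r"\b(?:termination of|terminated|discontinu(?:ed|ation of)|halted)\b[^.]{0,80}",
    re.IGNORECASE,
)


def failed_project_snippets(text: str, topic_terms: List[str]) -> List[str]:
    """Return cue-phrase snippets that also mention one of the topic terms."""
    snippets = []
    for m in FAILURE_CUES.finditer(text):
        snippet = m.group(0)
        if any(t.lower() in snippet.lower() for t in topic_terms):
            snippets.append(snippet.strip())
    return snippets


text = "The company announced the termination of inhaled insulin programs. Sales grew."
print(failed_project_snippets(text, ["inhaled"]))
# ['termination of inhaled insulin programs']
```

Requiring a topic term inside the snippet is what separates a terminated inhaler program from, say, the termination of an office lease.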
So I select “possible failed projects” and, in this case, select the facets for that. You’ll see there are a lot of documents returned by the analytic routine, so I’m going to filter down to the ones that mention a termination.
So I’ll start typing in “termination.” I type quickly, not accurately. Here are the documents that have been returned; it looks like there are 11 of them.
I’m going to select the Documents view, and look: at the top, you can see “termination of inhaled” highlighted in blue and, in an ugly green, “termination of our inhaled.” I can select this specific document and drill down to it.
Here’s the document, from the Securities and Exchange Commission. I could scroll through it, but there’s an easier way: I’m going to press Ctrl+F and have Firefox find the text in the document. Again, I type fast but not accurately: “termination.”
There it is: “termination of.” Let me grab this, swipe it, and scroll down a little: “termination of inhaled insulin programs.” That’s exactly what NC State was looking for. So let’s take a step back and look at what I showed you. The folks at NC State used IBM Content Analytics and IBM Project BigSheets.
They crawled the web, grabbed 176,000 documents, ran analytics against those documents, and put this tool in the hands of the folks at the Office of Technology Transfer,
so they could very quickly get down to things like those strong hits and possible failed projects. Imagine trying to do this manually with a search engine; you couldn’t, it’s unmanageable.
And even if you could, this isn’t only searching; this is analytics against big data to find insight. This project was intentionally kept relatively small, at 176,000 documents. Since then, the folks at NC State have gone out, crawled hundreds of websites, and grabbed over ten million documents.
From what I understand, in the first week they identified about ten strong hits. That is one example of what our IBM jStart team is doing around big data and big data analytics.
If you want to learn more, I’m building out more videos and putting them on YouTube; my YouTube channel is youtube.com/IBMETinformation. If you want to learn more about all of the cool projects the CIMS group at NC State is working on, you can visit their website at cims.ncsu.edu.
Lastly, if you’re interested in engaging with our jStart team to help jump-start your company’s big data analytics projects, you can visit their website at ibm.com/jstart.