One of the main features of the CVRC website is to provide faculty and lab managers with a drill-down of their finances so they can track where they stand monthly and approve financial reports. The second most used feature is a poster submission process where anyone can pay to have posters printed on either 36” or 42” paper. Other features include faculty, trainee, and employee profiles and an overall hub for the events going on in the Cardiovascular Research Center. Finally, the site handles mouse surgery requests and key requests, and hosts an antibody database.
All of the financial data is stored in Oracle. Oracle being the beast that it is, extracting your data and presenting it as information on a website is no small task. My predecessor took an impressive, but extremely laborious and non-automated, route to solve this issue. Everything else on the site will be relatively simple to replicate.
Current Process In Place:
Now I am going to attempt to explain what the current process is. I hope to paint as clear a picture as possible. Bear in mind, it took me a couple of weeks to wrap my brain around the entire process, as I was also dealing with orientation and all the other tasks and nuances of working in the CVRC.
First, I need to explain Oracle really quickly. Oracle tracks all expenses having anything to do with the grants and budgets set forth for our labs and faculty. The data, for the most part, is real-time. At night (midnight, I think), Oracle updates certain views and tables with the current data.
Currently, to use and manipulate the data in Oracle, the CVRC has a process in place where, at the first of the month (we need to wait until the end of the month to provide a complete report for faculty and staff), a series of queries is run (two per lab). Essentially, the queries SELECT the data we need FROM Oracle and insert it into a Microsoft SQL database with three tables per lab. From these tables, the CVRC website gleans its data. I will go into the problems with this process in a minute. Just to give you an idea, one query takes up half a page. The picture below might help you visualize the process.
There are many problems with this process; I will highlight the main ones. The queries used to select the data from Oracle and then write it to MS SQL are all hardcoded. Therefore, if a project number, task number, or award number changes, the queries must be edited. Now, if you were paying attention, you noticed that I mentioned there are three tables in MS SQL per lab. All three of these tables are NOT related in any way, shape, or form. So basically the entire MS SQL database is non-relational. Also, there is NO reason to have three separate tables per lab. So let's get back to the hardcoded part. At the first of the month, the date in the package that runs against MS SQL to grab the data from Oracle needs to be changed to reflect the previous month. If ANY accounts or grants are added or removed, you have to do a copy/paste routine on the first of the month to update ALL the queries (that's two per lab). Then, even if you have everything right and you run all the packages, sometimes things just break, and you spend your time dumping and reloading tables until it's right.
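To make the hardcoding concrete, here is a minimal sketch of what one of these per-lab queries amounts to in spirit. The view, column names, and account numbers are hypothetical placeholders, not the actual Oracle schema:

```sql
-- A hypothetical sketch of one per-lab extraction query.
-- View, columns, and account numbers are illustrative only.
SELECT e.project_number,
       e.task_number,
       e.award_number,
       e.expenditure_type,
       e.amount
  FROM lab_expenditures e                                      -- hypothetical view
 WHERE e.project_number = '104500'                             -- hardcoded per lab
   AND e.award_number   = 'GB10250'                            -- edited whenever an award changes
   AND e.expense_date  >= TO_DATE('2012-08-01', 'YYYY-MM-DD')  -- edited by hand every month
   AND e.expense_date  <  TO_DATE('2012-09-01', 'YYYY-MM-DD');
-- Multiply this by two queries per lab, and a change to any one
-- number means hand-editing every copy.
```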
Maintenance? Yeah, there is some. You have to maintain an MS SQL database, you have to keep up with the changes in accounts so the queries stay accurate, and you have to deal with the issues that plague Cold Fusion. My point is, this process should be automated, and there should be NO reason for a middleman database.
There it is in a nutshell. I left out the entire runaround I went through and the other stuff that is not pertinent to what I am actually trying to do here. However, going forward, I will detail everything I tried, failed at, and solved, plus anything else I learned along the way.
My Quest To Migrate CVRC Website Out Of Cold Fusion To Drupal
And so begins the journey…
With the Cold Fusion servers being phased out at the University, it is time to rewrite the CVRC website. I thought, with all the push for Microsoft products, .NET would be a great platform to build the new site on. A few emails later I had a mapped network drive and space to begin developing in .NET. A week later I stumbled upon a group of developers at the University migrating sites into Drupal. I thought, wait, Drupal requires Unix hosting. So I emailed a few of them to see what support there was for Unix-based platforms and hosting. Apparently, the University hosts a rather large Unix cluster. I was sold! A few emails later I had a mapped network drive and access to a MySQL database.
I spent the next week designing, coding, and shaping a new CVRC website from scratch. I had finished a rough design and was ready to start tackling the database side of the site, hoping to reuse a lot of the queries that were already in place. Then I got an email saying that a few of the developers around UVA were getting together to put on a Drupal conference. I figured, why not! I was not developing in Drupal, but it would be nice to network and know a few more developers. So I went. After really seeing what Drupal could do, I was sold, again.
Drupal is extremely appealing because it handles data and workflow extremely well, it is a content management system, and maintenance would not require a web developer … just someone who understands the site and can click a lot. I figured, what a great platform to build the new CVRC site on. Out with the old and in with the new … I moved my current site and installed Drupal. With the help of Anson Parker I was well on my way.
There was still the nagging problem of how to eliminate the middleman database and automate all of this data movement. Drupal does NOT play nice with Oracle (come to think of it, what does?), so I couldn't just query it directly. I started nagging Hamp Carruth and Kate Bakich to see if they could help me with this process. I had heard Hamp was good at writing scripts and batch files, and Kate's group manages the server environment for human resources and finance. After about a week of emails there was still no clear solution. I shelved it, thinking the solution would come to me as I learned Drupal.
Poster Printing: (detailed guide here)
Let’s build out the poster printing model. By now I had spent a bit of time with Drupal and was confident I could have this done in a day or two. Poster printing allows anyone to log into the site and request a poster. They can attach a file to the request (we usually get either PowerPoint or Adobe files). They can also request a proof. If they do, the request goes into the awaiting-proof queue, and at the office we print their proof on a normal sheet of paper. Once their proof is approved, the request moves to the awaiting-print queue. After the poster has been printed and the invoice generated and printed, it moves to the completed-posters queue. Each step fires off emails to me and to the person who requested the poster. As an administrator I can log in and track ALL poster prints and where they are in the process. The user can only see their own poster, plus a history of past submitted posters (this feature is not in the current version, only my version). My new version will also solve the issue where users trying to log in to approve their poster hit authentication errors or find that their poster just isn't there.
Drupal has a module called Feeds. Basically, what it does is parse, or step through, various forms of data (CSV in my case) and insert the data into a content type using the field mapping you provide. So if you had a CSV file with the headers fname and lname, you could map fname and lname to field_fname and field_lname in Drupal. Then you can schedule Feeds to import the data, or just do it once. From there, you can use all of your Drupal goodies to manipulate the data and turn it into information.
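For example, the feed file for that mapping would be as simple as this (the names are made up):

```csv
fname,lname
Jane,Doe
John,Smith
```

In the Feeds mapping settings you point the fname column at field_fname and the lname column at field_lname, and each row becomes (or updates) a piece of content.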
That got me thinking about my shelved problem of eliminating the middleman database. What if the data I need from Oracle could be extracted and stored somewhere on the network in a CSV file? Then I could schedule Feeds in Drupal to check the file at a certain time and update the content. Well, that would work. But how would I get the data out of Oracle automatically, without me running something? That is an issue I still have not solved. I imagine I could write a batch script to run on my machine, but it would get tricky because it would require the VPN connection to be up, and I would have to connect to the Oracle database at the command-line level. Back on the shelf it goes for now, while I build out one complete lab, start to finish.
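For what it's worth, here is a rough sketch of what that unattended dump might look like: a SQL*Plus script that spools query results to a CSV file, kicked off by a scheduled task once the VPN problem is sorted out. The connection string, view name, and output path are hypothetical placeholders:

```sql
-- export_expenses.sql: a sketch of an unattended Oracle-to-CSV dump.
-- A scheduled task would run it with something like:
--   sqlplus username/password@oracle_host @export_expenses.sql
-- The view, columns, and output file are hypothetical.
SET PAGESIZE 0
SET LINESIZE 2000
SET TRIMSPOOL ON
SET FEEDBACK OFF
SPOOL expenses.csv
PROMPT project,task,award,amount
SELECT project_number || ',' ||
       task_number    || ',' ||
       award_number   || ',' ||
       TO_CHAR(amount)
  FROM lab_expenditures;    -- hypothetical Oracle view
SPOOL OFF
EXIT
```

Drop the resulting file where Feeds can see it, and the rest of the pipeline would take over.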
September 2012 Update:
We have gone live with my solution. I have yet to find any computational errors, and displaying the data as information in a usable format has become incredibly easy and efficient. What was my solution? Glad you asked.
Basically, like I mentioned before, my solution uses a combination of Feeds, Views, and Organic Groups, with some flags, panels, and other modules sprinkled in. A summarized version of the process goes like this: at the beginning of every month, I use an Oracle SQL development tool to run a series of queries that extract the financial data I need. It exports the data as a CSV file and stores it on a server. My website has Feeds scheduled to look at that data and update all the nodes. The only thing that isn't automated is the query and dump of the data from Oracle. I hope to tackle that eventually, but seeing as it takes five minutes, it is not a priority. Once the data is imported, I use Views to pipe the data into tables as I see fit. That is, of course, an understatement; a lot goes into displaying the data, including some very clever data relations and helper modules from the Drupal community.
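To give a flavor of it, the monthly export is just a flat file along these lines (the columns and figures here are hypothetical, not real account data):

```csv
project,task,award,org,expense_date,amount,description
104500,101,GB10250,311,2012-08-14,182.50,Lab supplies
104500,101,GB10250,311,2012-08-21,1240.00,Core facility charge
```

Feeds can then match rows against existing nodes on the account columns plus the date, so a re-import updates content in place instead of duplicating it.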
As far as I can tell, this solution is completely scalable and reusable. The system basically revolves around PTAO numbers: project, task, award, and org numbers that, when combined, make up the accounts here at UVA. Any PTAO can be managed with my solution. More or less data can be shown depending on the situation, and having exposed filters allows the front-end user to see some really powerful, summarized data. My next step will be to provide a business-intelligence-style dashboard where the front-end user can spit out graphs and compare data.
The site and all the other development sites I work and build on are NOT being hosted by ITS or the University. I am running my own rack with a few servers running VMware and a couple of virtual machines. One server is development, and the other, faster, more reliable server is production. Everything has static IPs, and the servers are hooked up to emergency power. In about a week or so I will have access to the official UVA datacenter, where I will be moving everything. In my experience, Drupal sites run better the more RAM you give them and the faster that RAM is. But having my own servers and several virtual machines is so useful!
Currently I am fleshing out the site and moving content over. I am also implementing a ticketing system so I can handle the requests that come to me. I have also gotten about 75% of the way through a new process for managing poster prints, mouse surgery requests, and slide stainers.