Hairpin / Bulge Calculator
Client: Allegheny College Biochemistry Department Chairperson
Deadline: May 2005
Project Description:
[Figure: Hairpin / Bulge Calculator GUI screenshot]
My client wanted a program that would parse files containing nucleotide pairings and search RNA strands for structures called hairpins and bulges. The input files were produced from the output of another molecule-analysis program.
The molecule-analysis program would take a molecule and break it down into its nucleotides (A - Adenine, G - Guanine, C - Cytosine, and U - Uracil) and their pairings. It wrote this information to a data file that listed the number of each nucleotide and the number of the nucleotide it was paired with, leaving a zero if no pairing was present. For example, Adenine (nucleotide number 23) paired with Uracil (nucleotide number 634) would appear as: 23 A 634 U.
He wanted a program that would detect either hairpins or bulges, allowing the user to choose between the two, so the program had to define both. Either entity could appear in the input file as a series of one or more consecutive unpaired nucleotide bases sandwiched between paired nucleotides. Once the program accepted a hairpin or bulge choice, it had to accept a size for it. Size is determined by the number of consecutive zeros (unpaired nucleotides) in the hairpin or bulge. The following is an example of a hairpin with a size of 3:
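To make the row format concrete, here is a minimal sketch of how one line of such a file might be parsed in Java. The class name `PairingRecord` and its fields are hypothetical and are not taken from the original program. The sketch assumes whitespace-separated fields and ignores the optional partner base at the end of a paired row.

```java
/** One row of the pairing file. Hypothetical helper, not the original program's code. */
final class PairingRecord {
    final int position; // number of the current nucleotide
    final char base;    // A, G, C, or U
    final int partner;  // number of the paired nucleotide, or 0 if unpaired

    PairingRecord(int position, char base, int partner) {
        this.position = position;
        this.base = base;
        this.partner = partner;
    }

    boolean isPaired() {
        return partner != 0;
    }

    /** Parses rows such as "23 A 634 U" (paired) or "33 G 0" (unpaired). */
    static PairingRecord parse(String line) {
        // Assumption: fields are separated by one or more whitespace characters.
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 3) {
            throw new IllegalArgumentException("Expected at least 3 fields: " + line);
        }
        int position = Integer.parseInt(fields[0]);
        char base = Character.toUpperCase(fields[1].charAt(0));
        int partner = Integer.parseInt(fields[2]);
        // A fourth field (the partner's base) may follow; it is not needed here.
        return new PairingRecord(position, base, partner);
    }
}
```

With this sketch, `PairingRecord.parse("23 A 634 U")` yields position 23, base A, and partner 634, while a row ending in 0 reports `isPaired()` as false.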
32 A 36 C
33 G 0
34 G 0
35 C 0
36 C 32 A
The program had to be able to open a series of dynamic boxes in which the user could enter values to search for specific hairpins or bulges. In the case of hairpins, it sufficed to search for the unpaired nucleotides. In the example above, a match would be found if GGC was entered in the dynamic boxes for a hairpin of size 3. These input boxes had to change based on the size entered and the choice of either a hairpin or bulge.
The bulge information was slightly different from the hairpin. A user might find it useful to search for specific types of bulges, comparing the pairings at the ends in conjunction with the unpaired nucleotides. This provided some specifics for the GUI: some input boxes would need to be editable for some choices but ghosted out for others. He said that he didn’t foresee searching for entities beyond size ten, which provided a cap for the number of dynamic boxes. The program’s output had to show the number of matches it found, and the program had to be able to write that information to a file.
An additional request was for the program to support batch analysis, parsing many files in succession during a single run to search for a certain hairpin or bulge. However, the program also needed the option to search only one file as opposed to many. The client also asked for a browse button to search for and select the directory or file that the program would parse.
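The hairpin search described above can be sketched as a scan for runs of unpaired nucleotides. This is an illustrative sketch only, not the original implementation. The class `HairpinSearch`, the method `countHairpins`, and the parallel-array layout are all hypothetical. It also assumes that a hairpin is closed by the two flanking nucleotides pairing with each other, as 32 and 36 do in the example.

```java
/** Hypothetical sketch of a hairpin search over parsed pairing rows. */
final class HairpinSearch {
    /**
     * Counts hairpins whose unpaired bases equal the query (for example "GGC").
     * Row i is described by positions[i], bases.charAt(i), and partners[i];
     * partners[i] == 0 means the nucleotide is unpaired.
     */
    static int countHairpins(int[] positions, String bases, int[] partners, String query) {
        int n = positions.length;
        int matches = 0;
        int i = 0;
        while (i < n) {
            if (partners[i] != 0) {
                i++;
                continue;
            }
            // Found the start of a run of zeros; advance to the end of the run.
            int start = i;
            while (i < n && partners[i] == 0) {
                i++;
            }
            int end = i; // exclusive index of the run
            // The run must be sandwiched between paired nucleotides.
            boolean flanked = start > 0 && end < n;
            if (!flanked || end - start != query.length()) {
                continue;
            }
            // Assumption: the flanking nucleotides pair with each other.
            boolean closed = partners[start - 1] == positions[end]
                    && partners[end] == positions[start - 1];
            if (closed && bases.substring(start, end).equalsIgnoreCase(query)) {
                matches++;
            }
        }
        return matches;
    }
}
```

For the five-row example above (positions 32 to 36, bases AGGCC, partners 36, 0, 0, 0, 32), a query of GGC finds one match.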
Project Solution:
The solution proved to be an engaging and ultimately successful challenge as far as coding was concerned. I chose Java as the base for the program because of my client’s desire for a GUI. I placed radio buttons at the top of the program window so the user could select the type of entity to search for, either a hairpin or a bulge.
An error detection and report panel on the window also doubled as a step-by-step manual that told the user how to operate the program. The user entered a size in the hairpin/bulge size box and clicked the open input boxes button to enter the nucleotide data to match the files against. The dynamic box section could handle hairpin or bulge sizes up to 12. This met my client’s criteria, but I wanted the program to be scalable, so I added a section where the user could enter information for sizes up to 50 in a manual text field. The program ghosted out boxes that were not being used and un-ghosted the boxes that required information.
I had two sections for specifying the input for the program. One text field had a browse button that allowed the user to select a single file from the hard drive. In the other text field, the user could select an entire directory or folder to parse a group of files. The interface was configured so that a user could fill in only one input text field at a time, with the other one reverting to blank, to reduce the possibility of errors.
Once all the information was provided, the program ran, counting the matches it found and displaying the result next to the number of files it had searched. The results could be written to an output file. The program used a default “Output.txt” file if the user didn’t enter a name for an output file. If the user did not provide a file extension, the program automatically appended a .txt extension to the filename.
The program was robust. It could parse .doc, .txt, .dat, and .bpseq files. I did my best to optimize its performance. Normally it takes a person at least 1 to 2 hours to parse a single ten-thousand-nucleotide molecule file by hand, which was how the professors had done it until then. My program was able to parse a whole directory of 500 of these files in under 30 seconds.
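The output-file naming rules described above could be expressed roughly as follows. This is a sketch under assumptions, not the original code. The class `OutputNames` and the method `resolve` are hypothetical names. The sketch treats a name as having an extension only if a dot appears after the last path separator with at least one character on each side.

```java
/** Hypothetical sketch of the output filename defaults. */
final class OutputNames {
    static final String DEFAULT_NAME = "Output.txt";

    /** Returns the filename to write to, applying the default name and .txt extension. */
    static String resolve(String userInput) {
        // No name entered: fall back to the default output file.
        if (userInput == null || userInput.trim().isEmpty()) {
            return DEFAULT_NAME;
        }
        String name = userInput.trim();
        // Only look for a dot in the final path component.
        int lastSeparator = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        int lastDot = name.lastIndexOf('.');
        boolean hasExtension = lastDot > lastSeparator + 1 && lastDot < name.length() - 1;
        return hasExtension ? name : name + ".txt";
    }
}
```

Under these assumptions, an empty entry resolves to Output.txt, `results` becomes `results.txt`, and `results.dat` is left unchanged.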
The program had some bonus features, such as allowing the user to enter X as a wild card. An X matched any valid base at that position, provided the other, fixed nucleotides for the bulge or hairpin were matched.
I was given a generous deadline since I was concurrently enrolled in classes and the program had many features. I was able to get the final version of the program to him ahead of time, by January 16th, 2005.
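The wild card behavior can be illustrated with a short comparison routine. This is a minimal sketch, not the original code. The class `WildcardMatch` and the method `matches` are hypothetical, and the set of valid bases is assumed to be A, G, C, and U as listed in the project description.

```java
/** Hypothetical sketch of matching a search pattern that may contain X wild cards. */
final class WildcardMatch {
    // Assumption: the valid bases are the four RNA bases named in the description.
    private static final String VALID_BASES = "AGCU";

    /** Returns true if every position of bases matches the pattern, where X matches any valid base. */
    static boolean matches(String pattern, String bases) {
        if (pattern.length() != bases.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = Character.toUpperCase(pattern.charAt(i));
            char b = Character.toUpperCase(bases.charAt(i));
            if (p == 'X') {
                // Wild card: accept any valid base, reject anything else.
                if (VALID_BASES.indexOf(b) < 0) {
                    return false;
                }
            } else if (p != b) {
                // Fixed position: the base must match exactly.
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, the pattern GXC matches both GGC and GAC but not GGA, because the fixed third position must still be C.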