
R00063-Bioinformatika
Literature search:
"Bioinformatics."
Date prepared: 2004-10-21
Requirements:
Bioinformatics. Information on the Bioinformatika II seminar at Invex 2004. General
information on bioinformatics. Access to data. Companies focused on software, with links
to them. Information on IBM's "Blue Gene" project, on the use of the HP AlphaServer in the
"GeneProt" project, and on the project of the "SUN - Computational biology" group. Records
from the databases of the university libraries of Oxford and Cambridge and of the Institute of
Neurology, University of London. Records from the Dialog database.
Keywords:
"Bioinformatika."
"Bioinformatics."
"Blue Gene."
Invex 2004 - Bioinformatika II seminar:
Bioinformatika II - Methods, Technologies and Software
Venue: Pavilion E, Press Center
Organizers: Faculty of Science, Masaryk University in Brno; Veletrhy Brno, a.s.
Scientific guarantor: prof. Jiří Damborský
Description:
The second year of this specialist symposium, attended by leading foreign scientists and devoted to a topic
that is inspiring the future development of informatics, will focus, among other things, on the practical
applicability of scientific findings.
Contact details for the seminar's scientific guarantor, doc. Mgr. Jiří Damborský, Dr., can be found at www.muni.cz:
doc. Mgr. Jiří Damborský, Dr.
Faculty/Institute: Faculty of Science
Department/Unit: National Centre for Biomolecular Research
Office: pav. 07/02001 (Kotlářská 2, 611 37 Brno)
Phone: 549 49 3467
Fax: 549 49 2556
E-mail: [email protected]
WWW Home Page: http://ncbr.chemi.muni.cz/~jiri/
The programme of the Bioinformatika II seminar is available at
http://www.cba.muni.cz/projekty/bioinformatics/program.htm.
Program:
13:00*
13:00-13:05
Jan Žaloudík and Milan Gelnar : Welcome
13:05-13:50
Janusz Bujnicki, International Institute of Molecular and Cellular Biology, Warsaw,
Poland (EMBO Young Investigator Lecture): Metaservers and Frankenstein's
Monsters: Protein Structure Prediction by Consensus Fold Recognition
and Assembly of Fragments
14:00
14:00-14:25
Jan Pačes, Institute of Molecular Genetics:
Bioinformatics: What Can We Do
with Genomes in Computers?
14:25-14:50
Matej Lexa, Masaryk University: Weak Similarity in Biological Sequences:
Rapid Approximate Word Searches and Their Use to Identify Structural
Features in Protein Sequences
14:50-15:15
Martina Réblová, Botanical Institute: Phylogenetic Analysis: Methods and
Principles for Constructing Phylogenies
15:15
coffee-break, poster session, software demonstrations
16:30
16:30-16:55
Jiří Vondrášek, Institute of Organic Chemistry and Biochemistry: Structural
Bioinformatics: How Far We Can Go from an Amino Acids Sequence
16:55-17:20
Petr Hořín, Veterinary and Pharmaceutical University: Genomic Approaches in
Analysis of Complex Traits: Example of Innate Immunity
17:20-17:45
Ladislav Dušek, Centre of Biostatistics and Analyses, Masaryk University:
Multidimensional Data Sources in Current Biology and Medical Sciences:
How to Get Information Effectively?
17:45
17:45-17:55
Closing discussion
17:55-18:00
Jiří Damborský: Closure
* presenting authors are kindly requested to be at the lecture room 40 min before the beginning of the meeting to
check their presentations, mount posters and install software.
General information on bioinformatics:
Bioinformatics uses computers to answer biological questions. Sometimes it models processes or
states, sometimes it makes data accessible in the form of databases, and sometimes it predicts something (for
example, it predicts/finds genes and other "elements" within the nucleotide sequence that makes up
a chromosome). It also predicts the function of the proteins contained in the predicted
genes (that is, encoded by those genes), predicts the structures of molecules and their interactions
(for instance with other molecules), compares individual organisms with one another on the
basis of the different combinations of genes they contain, and, last but not least, uses
sequence divergence to estimate how closely they are related. And this list is certainly not complete.
In short, bioinformatics has taken over from the other biological disciplines everything that has to do with computers.
Sometime last week, reading an article from the New York Times on my Pocket PC
(though it may have been from somewhere else entirely), I came across a very
interesting point. The article was about the necessary generational turnover among the people
who drive today's Internet. It talked about something I have been saying for about two
years. Today's Internet was set in motion by a crowd of enthusiasts, people unafraid to take
risks and make decisions. They still run it, and in many cases they are
unruly devils rather than people fond of administration, order, plans,
business cases and all those seemingly pointless things.
The article, however, made two important points. The first was
something that many of these Internet "movers" may be feeling first-hand:
a kind of exhaustion, "burnout", a lack of that original
drive, enthusiasm, exuberance, frenzy and spontaneity. After the two to six
years these people have devoted to the Internet, they simply do not feel like it any more. All
that time they worked well over eight and a half hours a day. They knew no weekends and worked
late into the night. Some of them got rich, some did not.
Either way, they achieved the impossible: they pushed the Internet a long way
forward.
The second important point made in the article is what must come next.
An army of administratively minded managers and directors has to take over.
Conventional ways of doing business have to take hold: ordinary commercial relationships, ordinary
problems, ordinary procedures. Contracts, agreements, disputes. The original group of people,
who can move things where nothing existed before, is rather unsuited to this
role. They simply do not have the stomach for it. They are visionaries, people who think up,
build, push through and bring to life things an "ordinary" person would never dare attempt.
The ordinary person would collapse the moment he realised how much work and money is
needed, would apply the standard textbook rules and procedures, and would then condemn most
such projects to oblivion at the bottom of a dark drawer.
What the article also happened to mention, though, was something else entirely. The article
was, naturally, American, so one of the people in the category of
"Internet pioneers" mentioned in passing that he was now going to take about six
months off, finally learn to surf and to fly a plane. And then he might
throw himself into biotechnology, because that is exactly the field that is going to take off. And
the Internet? He probably won't be going back.
Then, towards the end of last week, I came across another article. News.com
reported on a 100 million dollar investment in the biotechnology revolution. It is IBM
that wants to pour 100 million dollars into developing products to help
scientists studying the massive amounts of data (possibly) related to the behaviour
of genes and proteins. Part of the investment is the creation of a new division devoted
to the life sciences. And IBM is not alone; the News.com article
pointed out that similar investments have already been made by companies such as HP and Sun!
What is more, this continues earlier investment in the field - last December IBM already
invested (another 100 million) in the Blue Gene initiative (ah, my favourite shade of
blue!). Part of that investment is the construction of a supercomputer that should help
us understand how proteins are formed. And to make it all pay off, IBM
has of course created a new business unit that sells computers and services
aimed at biotechnology, healthcare, pharmaceuticals, genetics and other
similar scientific fields. Biotechnology (and genetics) simply cannot do without
serious computing power (see, for example, the article "Dolování dat pomáhá vědcům",
i.e. data mining helps scientists).
So there probably is something to the idea that biotechnology (nicknamed
biotech) may be exactly the field to which the right kind of visionaries
and pioneers will flee. Which reminds me of the ever-growing stream of reports about what
might even be called biotechnology affairs. They include the notoriously
famous cloned sheep Dolly, the recent announcement of the completion of the human genome
mapping project, attempts to patent genetic information
(a hot topic in the USA) and various speculations about the uses of genetic engineering (a
little of that, for instance, in the article "Balancování nad genetickou propastí").
What do you think?
Internet resources:
Biotechnology@Yahoo!
Bio Online
BioSpace
National Biotechnology Information Facility
Biotech Chronicles
DNAPatent.com
SciWeb
Algorithms for biotechnology: from pharmacogenetics to sequencing
Bioinformatics drives hardware and software vendors to develop ever more powerful
computers and ever smarter algorithms. But what is the nature of the computationally demanding tasks
from which so much is currently expected, for example in the pharmaceutical industry? The following
article introduces a few of them.
We can start, for example, with the evaluation of clinical and other trials that are part of the drug
development cycle. In essence this is ordinary statistics. What makes the task interesting is
the fact that there are not only effective and ineffective substances, but also drugs that work only under
certain conditions or in certain groups of the population. Informatics then has to supply
tools that can pick out, from enormous data sets, correlations that escape notice at first
glance.
Tailor-made drugs
An example is the case of the drug BiDil, intended for heart disease (reported in more detail
by, for example, the server Osel.cz, see http://www.osel.cz/index.php?obsah=6&clanek=843).
The preparation was tested in the 1980s, but its efficacy in the general population could not
be demonstrated and the drug never went into production. Only a new analysis of the old data "by
individual groups", carried out by informaticians at the US company NitroMed, showed
that the substance gives promising results in African Americans while being practically ineffective in
whites. Subsequent clinical trials confirmed this difference, and the result is the first drug intended
for a specific population. BiDil is now in the approval process.
A drug for a specific population is of course only a first step, because it is still a very
coarse-grained measure. In the future, however, medicine is expected to be tailored directly to
individual patients on the basis of an analysis of their genetic information. Even now,
medicine that distinguishes between populations offers a chance to various isolated groups and
minorities that differ considerably from the "general sample" and often suffer from specific
diseases. Alongside the large pharmaceutical companies operating across whole markets, small
biotechnology firms are expected to emerge that focus precisely on developing drugs for such
specific groups/populations. At least, that was the scenario presented at the spring First
Tuesday meeting devoted to biotechnology
(details:
http://www.scienceworld.cz/sw.nsf/ID/9DA53EA026ECDF20C1256EA70037B88B?OpenDocument&cast=1).
From the informatics point of view, most of the problems described above belong to the category of
knowledge discovery in data.
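To make the subgroup idea concrete, the following Python sketch re-tests a treatment/placebo response table separately for each stratum; the counts, group names and the choice of Fisher's exact test are hypothetical illustrations, not data from BiDil or any real trial. An effect that is diluted in the pooled data can stand out once the cohort is stratified.

    # Per-subgroup re-analysis of trial outcomes (hypothetical counts only):
    # an effect invisible in the pooled table can appear after stratification.
    from scipy.stats import fisher_exact

    # (responders, non-responders) for each arm, per made-up subgroup
    subgroups = {
        "group A": {"treated": (45, 55), "placebo": (40, 60)},   # little difference
        "group B": {"treated": (38, 12), "placebo": (20, 30)},   # clear difference
    }

    for name, arms in subgroups.items():
        table = [list(arms["treated"]), list(arms["placebo"])]
        odds_ratio, p_value = fisher_exact(table)
        print(f"{name}: odds ratio = {odds_ratio:.2f}, p = {p_value:.4f}")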
"This concept is known as pharmacogenomics, and a bright future is being predicted for it. Unfortunately,
developing a new drug is demanding and expensive, and pharmaceutical companies are not charities.
They have to make money, which means that their considerable investments have to pay off. That
significantly limits the development of drugs that would work only in small groups of people - African
Americans are, in this respect, a fairly numerous and relatively affluent population. The potential market
for a drug for a genuine minority would be too small. For the time being we can therefore expect rather that
physicians will use the results of pharmacogenomic research to choose, from existing
preparations, those that carry a lower risk of adverse side effects for a given group of
patients," comments Prof. Ing. Jaroslav Petr, DrSc., who works at the
Research Institute of Animal Production in Prague-Uhříněves and lectures on biotechnology at
the Czech University of Agriculture.
One more view of pharmacogenetics
Michael Storek,
biochemist at Compound Therapeutics,
[email protected]
"A few years ago many biotechnology start-ups began to work on pharmacogenomics.
The idea is fairly simple: just read the variation in a patient's genetic
information (that is, DNA) and use it to determine whether a given drug will help the patient
or whether side effects are likely. So far, however, these simple principles have not been
turned into commercially successful technologies. The first problem is the cost of reading the
DNA. Although DNA sequencing technology keeps improving, the cost of reading the genes
responsible for the effect of a given drug still runs to hundreds of dollars. Nor have the big pharmaceutical
companies ever been particularly keen on pharmacogenomics, since a smaller group of patients
would mean lower revenues for them. Smaller biotechnology companies counted in vain on
cooperation with the pharmaceutical giants and either went bankrupt or quickly changed their line of
business.
What next for pharmacogenomics? A large part of the research into tailor-made drugs has now
moved to the universities. Pharmaceutical companies use pharmacogenomics to "resurrect"
drugs that showed efficacy in only a fraction of patients during clinical trials. We can only
hope that the falling cost of DNA sequencing will make it possible to read a patient's entire genetic code,
which will then become part of their medical record - much as information about
vaccination is today."
DNA sequences
An almost classic task in bioinformatics is sequencing, that is, "reading"
DNA letter by letter. The best-known example is of course the human genome
project.
Bioinformatics has helped above all in the following way: instead of reading the DNA letter by
letter, the usual procedure now is to amplify the DNA molecules, cut the copies up at random and
then analyse the overlaps in software so as to reconstruct the original sequence. (In reality it is
a little more complicated; the ability of DNA to be transcribed into RNA is also exploited - probably
the most widely used approach here is the so-called EST method, introduced by Craig Venter, the former
head of Celera and probably the best-known figure of the whole human genome project. The principle,
however, remains the same. For details see e.g.
http://www.scienceworld.cz/sw.nsf/ID/7B352C62F13B62D4C1256E970048FADD?OpenDocument&cast=1
http://www.scienceworld.cz/sw.nsf/ID/E237DD7AF94ADBDDC1256E970048FAD5?OpenDocument&cast=1).
The task as described sounds trivial, but we must bear in mind that the strings involved are billions
of letters long. Of course, we could "solve" the problem simply by laying all the existing cut-up
sequences end to end. Such a result would satisfy the assignment in the sense that every sequence
would be used - but we cut the DNA precisely because we need to find the overlaps. The real task is
to find the shortest string satisfying all the constraints, a minimum in an enormous state space.
Algorithmically, the problem is close to the well-known travelling salesman problem.
Moreover, DNA copying is not 100% accurate; errors creep in. The algorithm's job is therefore to
find the most probable sequence. And it should be added (as is very often the case in bioinformatics)
that for research teams all over the world to be able to work on the problem, it has to be
parallelised efficiently.
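As a purely illustrative sketch of the overlap idea (not how production assemblers work), the following Python fragment greedily merges the pair of fragments with the longest exact suffix/prefix overlap; the three short reads are invented. Real assemblers additionally have to cope with sequencing errors, repeats and billions of reads processed in parallel.

    # Toy greedy assembler: repeatedly merge the pair of fragments with the
    # longest exact suffix/prefix overlap (a sketch of the idea only).
    def overlap(a, b):
        """Length of the longest suffix of a that is a prefix of b."""
        for k in range(min(len(a), len(b)), 0, -1):
            if a.endswith(b[:k]):
                return k
        return 0

    def greedy_assemble(fragments):
        frags = list(fragments)
        while len(frags) > 1:
            best = (0, 0, 1)                      # (overlap length, i, j)
            for i in range(len(frags)):
                for j in range(len(frags)):
                    if i != j:
                        k = overlap(frags[i], frags[j])
                        if k > best[0]:
                            best = (k, i, j)
            k, i, j = best
            merged = frags[i] + frags[j][k:]      # join the best-overlapping pair
            frags = [f for n, f in enumerate(frags) if n not in (i, j)] + [merged]
        return frags[0]

    print(greedy_assemble(["ATGGCC", "GCCTTA", "TTAGGA"]))  # -> ATGGCCTTAGGA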
"Without progress in computing, progress in genomics would certainly not have gathered the
pace we are now witnessing," explains Jaroslav Petr. "But the role of computers in genomics does not
end with reading DNA sequences. Computers help us understand what is actually written in
the genome. Finding genes is a problem in its own right. Genes make up only a fraction of
the whole genome - in humans about 1.5%. Today we have algorithms that can pick
genes out of the flood of letters of the genetic code. Suppose we find a gene in the human genome
in this way and want to know what it is good for. One way to answer that question is to use
specialised software to search huge databases for a similar gene in another animal, for example
the mouse. The mouse can then be subjected to an experiment in which the selected gene is knocked
out and the scientists observe what the affected mice lack. From there it is only a small step to
identifying the cause of hereditary diseases and searching for drugs against
them. Let us admit, though, that current algorithms are only good at finding "typical" genes.
They may be blind to genes that deviate from what we currently know about genes -
and which would therefore probably be tremendously interesting."
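A crude illustration of what picking genes out of the flood of letters means at its very simplest is an open reading frame scan. The sketch below (Python, with a made-up sequence) only looks for ATG...stop stretches on one strand; real gene finders also model codon usage, splice sites and cross-species conservation.

    # Extremely simplified gene finding: scan one strand for open reading
    # frames (ATG ... stop codon). Illustrative only.
    STOPS = {"TAA", "TAG", "TGA"}

    def find_orfs(seq, min_codons=3):
        seq = seq.upper()
        orfs = []
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if codon == "ATG" and start is None:
                    start = i
                elif codon in STOPS and start is not None:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((start, i + 3, seq[start:i + 3]))
                    start = None
        return orfs

    for begin, end, orf in find_orfs("CCATGGCTAAATAGGTATGTTTCCCTGACC"):
        print(begin, end, orf)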
Proteins
A key procedure that could make drug development considerably more efficient is the computational
modelling of the 3D structure of proteins. It is the 3D structure that is closely related
to biological function.
Preparing a protein in the laboratory and then studying its effects is expensive and time-consuming;
it is far more efficient to use "in silico" modelling. The input is just the sequence of the
protein (i.e. the order of its amino acids), from which we should gradually learn to estimate its
spatial structure and biological function. Laboratory testing proper would then be carried out
only on molecules that have already been pre-selected computationally.
The whole problem is complicated by the fact that the shape and function of a protein depend on the
"letters" of the individual amino acids to very different degrees - sometimes the substitution of a single
amino acid is enough to produce a non-functional protein, while at other times changes have no obvious
effect and the code shows considerable redundancy. A functionally equivalent protein can also
often be built from completely different chains of amino acids.
Rather than analysing the protein sequence letter by letter, therefore, the approach used is the
recognition of more general structures, so-called patterns. The category of pattern recognition, which lies
on the very boundary of artificial intelligence, also includes a number of tasks in genomics (see e.g.
the article "DNA bojuje proti spamu",
http://www.scienceworld.cz/sw.nsf/pocitace/352C372DF858F4FFC1256EF600533709?OpenDocument&cast=1).
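A very small example of this pattern-based view: a PROSITE-style motif can be translated into a regular expression and searched for in a protein sequence. The sketch below uses the classic N-glycosylation pattern N-{P}-[ST]-{P}; the protein sequence itself is invented.

    # Motif ("pattern") recognition sketch: a simple PROSITE-style pattern
    # converted to a regular expression and applied to a made-up sequence.
    import re

    def prosite_to_regex(pattern):
        out = []
        for element in pattern.split("-"):
            if element.startswith("{"):          # {P} = any residue except P
                out.append("[^" + element[1:-1] + "]")
            else:                                # [ST] or a literal residue
                out.append(element)
        return "".join(out)

    regex = re.compile(prosite_to_regex("N-{P}-[ST]-{P}"))
    sequence = "MKANGSFLNVTKDQW"
    for match in regex.finditer(sequence):
        print(match.start(), match.group())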
In this context it may be interesting that an efficient quantum algorithm has already been proposed for
pattern recognition (for details see the article "Kvantové rozpoznávání obrazů",
http://www.scienceworld.cz/sw.nsf/ID/C27175EFCA2B2CFBC1256E970048FF68?OpenDocument&cast=1).
Let us give the floor to Jaroslav Petr again: "The scientific discipline called proteomics - the science
of the proteins in the organism - is currently booming. Particularly interesting are the cases in
which a protein changes its three-dimensional arrangement without any change in its
amino acid composition. With the new shape the protein also acquires new properties. That is the case of the
so-called prions, or proteinaceous infectious particles, which cause such notorious diseases
as BSE in cattle or Creutzfeldt-Jakob disease in humans. These diseases in fact arise from the
misfolding of a protein that is our own and in its original shape does us
no harm. The study of such spatial rearrangements appears to be important not only for the
study of disease but also for understanding the normal functions of our body. A very similar
refolding of another protein in our brain plays an important part in storing information in
memory."
Figure: model of the prion structure
Cladistics
Cladistic analyses are used above all in evolutionary biology. Roughly speaking, they start
from the assumption that individual species gradually split off from one another in the familiar
"tree". But how do we determine the actual course of that branching?
Imagine we have, say, a human, a ground squirrel and an elephant. How do we build the tree? Which of
these species split off from the common ancestor first? (In other words: is the human closer
to the ground squirrel or to the elephant, or equally distant from both? The last version would hold
if the ancestor of humans had separated first, and only then had the ancestor of the ground squirrel
separated from the ancestor of the elephant.)
Cladistics works by choosing a set of characters (it matters little whether these are
DNA sequences or, say, the structure of the eye) and comparing the organisms according to them. The result
may be, for example, a multidimensional space full of zeros and ones - assuming that for each
organism examined we only record whether it has a given character or not.
In principle the task again has infinitely many solutions (mutations arise at random), but once again we
are looking for the most economical path through the graph - the minimum of the state space. We are simply
asking what is the smallest number of branchings and mutation steps by which we can arrive at the existing
diversity.
Once we have established an evolutionary tree for some (usually very large) set of characters, we
choose a different set of characters and carry out the comparison again. What interests us above all
is the stability of the tree once constructed. If a different set of characters yields the same
tree, then we have probably reconstructed the evolutionary events correctly.
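The scoring step behind this search can be illustrated with the Fitch small-parsimony algorithm: for one fixed candidate tree it counts the minimum number of character changes, and the candidate with the lowest count is preferred. The taxa, the 0/1 characters and the two topologies below are illustrative assumptions only.

    # Fitch parsimony score for a fixed binary tree given as nested tuples;
    # characters are hypothetical 0/1 traits.
    def fitch_score(tree, states):
        n_chars = len(next(iter(states.values())))
        changes = 0

        def post_order(node, char):
            nonlocal changes
            if isinstance(node, str):             # leaf: its observed state
                return {states[node][char]}
            left = post_order(node[0], char)
            right = post_order(node[1], char)
            if left & right:                      # intersection: no change needed
                return left & right
            changes += 1                          # disjoint sets: one more change
            return left | right

        for c in range(n_chars):
            post_order(tree, c)
        return changes

    traits = {
        "human":     (1, 1, 0),
        "mouse":     (1, 1, 0),
        "chicken":   (0, 1, 1),
        "zebrafish": (0, 0, 1),
    }
    tree_a = ((("human", "mouse"), "chicken"), "zebrafish")
    tree_b = ((("human", "chicken"), "mouse"), "zebrafish")
    for topology in (tree_a, tree_b):
        print(topology, "->", fitch_score(topology, traits))   # 3 vs. 5 changes

The tree needing fewer changes (here the one grouping human with mouse) is the more parsimonious explanation of the chosen characters.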
Cladistics leads to conclusions that do not sit well with traditional biological taxonomy as taught
in primary and secondary schools. It turns out, for instance, that the coelacanth (a fish close to
the ancestors of the amphibians) is actually more closely related to humans than to the carp, so that the
whole group "fish" makes no sense from the evolutionary point of view. (To explain: in this case the tree
branched in such a way that the ancestor of the carp separated first, and only later did the ancestor of
humans and the ancestor of the coelacanth diverge.) Readers interested in a more detailed account of
cladistic methods can be referred, for example, to the book Jak se dělá evoluce (Jan Zrzavý, David Storch,
Stanislav Mihulka: Jak se dělá evoluce, Paseka, Prague, 2004; excerpts from the book can also be
found on Science World).
Cladistics is not, however, only about building theoretical constructions and evolutionary
trees. It is important, for example, to know how close individual organisms are to humans and to
identify the similarities and differences in their metabolic processes - say, when testing
new drugs on animals or when attempting to use animals to grow transplants
for human patients.
Professor Jaroslav Petr adds the following curiosity in this context: "Similar methods are used
to search for the image of the hypothetical ancestor of all existing organisms on
Earth (LUCA - the last universal common ancestor). It is a tricky job, because all the
processes by which this ancient forefather of all today's living creatures arose are
obscured by the countless subsequent changes in the hereditary information of each of its descendants.
Moreover, it appears that simple microorganisms traded genes among themselves so briskly that the
notion of a tree that branches but never fuses again simply does not apply to them."
For details see the article "Hledá se první buňka" (Wanted: the first cell),
http://www.scienceworld.cz/sw.nsf/ID/5EAF67184C501F09C1256EAF004BB31D?OpenDocument&cast=1
Language trees
The following application is somewhat removed from bioinformatics proper, but it illustrates nicely
that some algorithms, once developed, have much more general uses.
Just as species branch, individual languages also branched in the past.
The situation here is of course complicated by the fact that once languages have arisen they are not
completely separate; they mix, and they go on borrowing words and grammatical
rules from one another. Especially in the past, however, such borrowings were not very frequent, so even for
languages we can use cladistic analyses to construct our favourite
trees. Again, the output of the program may be some particular tree.
We then change the criteria/input data and analyse the stability of the resulting tree. If we
get the same tree after comparing, say, personal pronouns and then the names of family members,
it makes our results considerably more credible.
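One simple way to make such a stability check concrete is to compare the clades (groups of leaves) implied by the two trees: if the clade sets coincide, the two analyses agree. The sketch below does this for two toy trees; the languages and topologies are invented for illustration.

    # Compare two nested-tuple trees by the sets of leaves under their
    # internal nodes (their clades). Illustrative data only.
    def clades(node, collected=None):
        if collected is None:
            collected = set()
        if isinstance(node, str):
            return frozenset([node]), collected
        leaves = frozenset()
        for child in node:
            child_leaves, _ = clades(child, collected)
            leaves |= child_leaves
        collected.add(leaves)
        return leaves, collected

    tree_pronouns = ((("Czech", "Polish"), "German"), "Hindi")
    tree_kinship  = ((("Czech", "Polish"), "German"), "Hindi")

    _, clades_a = clades(tree_pronouns)
    _, clades_b = clades(tree_kinship)
    print("shared clades:", len(clades_a & clades_b), "of", len(clades_a | clades_b))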
So far, cladistic analyses have been used above all to trace the genesis of the Indo-European
languages. The outcome of such attempts is interesting not only for linguists; it also says a great deal
about the course of prehistoric migrations (it gives us information not only about how
certain events unfolded but also about when they happened). An obvious next step is to combine the
findings with historical and archaeological research.
For details see e.g. the article "Evoluce jazyků a pravěké migrace" (The evolution of languages and prehistoric migrations),
http://www.scienceworld.cz/sw.nsf/ID/D741AB35B059C852C1256E970049223D?OpenDocument&cast=1
DNA as a computer
A special chapter of bioinformatics is formed by so-called DNA computers and DNA chips, which we
have covered repeatedly on Science World, most recently in the article "DNA počítače odhalí
nádorové buňky" (DNA computers will detect tumour cells),
http://www.scienceworld.cz/sw.nsf/ID/2132140648ADD43DC1256E9700492310?OpenDocument&cast=1
Prof. K. R. Bruckdorfer Key Publications
Signalling Pathways and Reactive Oxygen and Nitrogen Species in the Vasculature
The cells of the artery wall regulate the rate of blood flow, platelet activity and the formation of
thrombi. These important functions may be disrupted in diseased arteries, leading to increased risks of
thrombosis. Much of the work in our group is centred around the functions of these arterial cells and
platelets, investigating both the basic biochemistry and clinical manifestations of these phenomena.
One key area of research concerns the gas nitric oxide, which is biosynthesised from L-arginine in
endothelial cells, platelets and macrophages. Nitric oxide relaxes smooth muscle cells and
inhibits platelet activation. We have been interested particularly in the interactions of nitric oxide with
reactive oxygen species such as hydrogen peroxide and superoxide anions. The latter react with NO
to form peroxynitrite which in turn modifies proteins by nitration of tyrosine residues. We have shown
already, in platelets, that nitration of platelets is a naturally occurring phenomenon. Nitration is clearly
important in pathological tissues, e.g. there is a large accumulation of nitrated proteins in
atherosclerotic lesions. We have also found that there may be mechanisms by which the nitro group
may be removed. This may be of relevance to tyrosine phosphorylation mechanisms in cell activation
and inhibition, since we have preliminary evidence that nitration may block some of the
phosphorylation sites, at least transiently.
The aim of the projects in this area would be to investigate the role of nitration in tyrosine signalling
mechanisms in platelets and the endothelium.
Other projects concern the role of tissue factor, best known as the initiator of the soluble coagulation
cascade. Tissue factor, a transmembrane glycoprotein, has other roles in relation to cell proliferation
and development of blood vessels via signal transduction mechanisms. We are currently interested in
the mechanism by which tissue factor, particularly its cytoplasmic domain, can interact with known
signalling mechanisms.
Prof. S.J. Perkins Key Publications
Molecular structures of proteins in infection, inflammation and immunity
Protein structural studies are central to many biochemical and molecular biology investigations in
molecular medicine. Three-dimensional structures are invaluable for diagnosis of disease-causing
mutations or development of rational strategies for therapy. There is a long tradition of protein
structural work in our laboratory at the Royal Free, where we work with antibody and complement
proteins. We also collaborate with clinical biochemists and molecular biologists on-site in Hospital
Departments who use protein structures to visualise their projects. So our working environment is
diverse and stimulating. Possible projects on offer are:
(1) Structural determinations of the antibody classes and their interactions with receptors: the different
forms of IgA. There are five antibody classes. Monomers and dimers of immunoglobulin IgA are
extremely abundant in body secretions, yet their structures are poorly known. In addition IgA forms a
number of important complexes with secretory component and with IgA alpha-receptors. We are using
a range of technologies to determine protein structures at medium resolution using a novel
methodology based on synchrotron X-rays at Daresbury (Cheshire), neutrons at the Rutherford (near
to Oxford) and the ILL (Grenoble, France), and an analytical ultracentrifuge in our laboratory. We then
use protein structure modelling methods performed on a cluster of modern Silicon Graphics
Workstations using molecular graphics to determine the structures. The antibody structures will be
correlated with the unique immunological role of IgA, possibly by additional experimentation in
collaboration with our collaborators at Ninewells Hospital in Dundee. Related projects on offer may
include work with the IgG and IgD classes, and with the antibody receptors themselves using the
same technology. See Boehm et al (1999) and Perkins et al (1998).
(2) Expression and NMR or crystallographic structure determinations of complement proteins
important in inflammation: factor B and properdin. The components of the complement system provide
a major non-adaptive immune defence mechanism for its host. These are activated in response to the
challenge of foreign material in plasma. Complement activation proceeds through a series of limited
proteolytic steps in one of three pathways, the alternative, the classical and the lectin pathways.
The serine protease factor B plays a key role in the alternative pathway of activation and control of
complement. Once the central protein C3 is activated, it forms a complex with factor B (C3b.FB) which
is activated to form C3b.Bb, a protease which activates more C3, and enhances C3 deposition to
promote opsonisation. We have made progress towards understanding the unique functional role of
the five domains in FB by a combination of molecular biology expression studies, NMR and
crystallographic approaches, and projects are available in these areas.
Human properdin is an essential regulator of the activation of the alternative pathway, as it stabilises
the complex between C3b and factor B. It contains a novel small thrombospondin type I repeat
domain, the second most abundant domain type in complement, which occurs in many other
multidomain proteins. Its molecular structure is wholly unknown. We have successfully over-expressed
five TSR domains of properdin and obtained high-quality NMR spectra. There are chances to
determine the molecular structure of several of these domains in order to describe the full structure of
properdin using an appropriate combination of multinuclear NMR and protein crystallography in the
first instance, followed by X-ray and neutron scattering and analytical ultracentrifugation. See
Hinshelwood et al (1999).
(3) Molecular modelling of mutation sites in proteins. Bioinformatics strategies and protein homology
modelling are invaluable tools to help interpret a rapidly increasing database of known natural human
genetic mutations that result in dysfunctional proteins. We work with several Departments to examine
these, most notably the Haemophilia Unit at the Royal Free Hospital, one of the three largest in the
UK, where many patients reveal novel mutations in their blood coagulation proteins. Other relevant
proteins include membrane proteins and developmental proteins. The significance of these mutations
is not clear without performing molecular biology expression work to test the behaviour of the mutant
proteins, in combination with structural studies to decipher the effect of the mutation on protein folding
or activity. Projects on offer will include the development of an integrated database using
bioinformatics technology for interrogation in order to interpret these mutations and those yet to be
identified, as well as appropriate wet lab work. See Jenkins et al. (1998).
Dr. K. Srai Key Publications
Summary of research interests
i) Nutrients and gene regulation
ii) Iron transport across cellular membranes in relation to Iron deficiency and hereditary
haemochromatosis
iii) Absorption and metabolism of polyphenols and flavonoids
iv) Glucose metabolism and Diabetes
v) Functional and molecular characterisation of renal purinergic receptors in health and disease.
Molecular Basis of Iron Homeostasis in Health and Disease
My group together with Professor Robert Hider and his colleague at KGT, Kings College and
Professor John Porter at Royal Free and University College Medical School have been awarded a
MRC co-operative group status. This award also includes two component grants. In addition to this I
have project grants from The Wellcome Trust, EU, BBSRC, Sir Jules Thorn Charitable Trust and
NKRF. (See Current Grants)
My research in the field of Iron metabolism can be grouped under the heading of Molecular basis of
Iron Homeostasis in Health and disease. Aim: To define the genetic and molecular contribution to the
control of body iron stores and cellular iron overload, and to provide the basis of the development of
novel strategies for the better management and prevention of iron deficiency and iron overload.
Objectives: To characterise genes that are responsible for cellular iron trafficking and for intestinal iron
absorption and regulation. To quantify genetic and environmental contribution to the control of body
iron stores. To quantify the influence of various plasma factors (which are altered as a result of iron
deficiency, hypoxia, pregnancy and increased erythropoesis) on the control of intestinal iron
absorption.
Background:
Iron is the only nutrient of which there is a known widespread deficiency in the UK, with up to 25%
prevalence in certain sociodemographic groups ( including women of child bearing age, vegetarians
and adolescent females). Conversely, primary iron overload (genetic haemochromatosis) is the most
common genetic disorder in the UK. Treatment of iron deficiency in the population may increase the
penetrance (currently low) of genetic haemochromatosis with toxic consequences. On the other hand,
it is not known how genetic factors interact with the known variability of food-iron bio-availability to
determine body iron stores. Gene cloning projects have recently identified an array of genes that
contribute to the mechanism and regulation of iron absorption.
The following projects will be running in parallel with the ultimate goal of achieving the above aims.
Project 1.
Molecular mechanism of iron transfer across the placenta: Immunohistochemical localisation and role
of DCT1 (DMT1), Ireg1, and hephaestin in iron transfer across placenta.
( The Wellcome Trust Project grant :Sep 2000- Dec 2003; £ 220,000).
The aim of this study is to determine the molecular mechanism of iron efflux across the placenta into the
fetal circulation, in particular to determine the role of DCT1 (DMT1), Ireg1 and hephaestin in this
process.
Iron transfer in placenta can be considered in three stages, uptake across the placental microvillous
border membrane, transfer across the placental cell and efflux into the fetal circulation. Iron uptake
across the microvillous brush border is through transferrin receptor and receptor mediated
endocytosis. Very little is known about transfer across the cell or the efflux of iron from the placenta
into the fetal circulation. Recently, however, there have been several different discoveries (cloning of the
divalent cation transporter DCT1 (DMT1), cloning of Ireg1 and of the caeruloplasmin homologue hephaestin),
which may provide information necessary to elucidate the mechanism of iron efflux across the
placental basolateral membrane.
Hypothesis: DCT1 (DMT1) and/or Ireg1 is localised to the basolateral membrane and involved in the
efflux of iron into the fetal circulation. This is co-localised with hephaestin, which oxidises ferrous to ferric
iron prior to its binding to transferrin in the fetal circulation.
Sub-cellular localisation of DCT1 (DMT1), Ireg1 and hephaestin will be investigated using antibodies
raised against synthetic peptides. Once localisation of these proteins has been determined, regulation
of these proteins by dietary iron and their involvement in the regulation of iron transfer across
placenta, particularly efflux across the basolateral membrane, will be determined.
Rats will be used to study the effect of decreasing dietary iron on the regulation of DCT1 (DMT1), Ireg1
and hephaestin. BeWo cells in culture will be employed to study the effect of iron deficiency on
transfer of iron across the placenta and its regulation in relation to expression and localisation.
The information obtained from these studies will be used to devise an iron supplementation regime for
pregnant mothers in order to prevent iron deficiency in both the mother and the child.
Project 2:
The molecular regulation of iron absorption with reference to the defect in haemochromatosis Dr SKS
SRAI (Sir Jules Thorn Charitable Trust Project Grant: Nov 1999 - Oct 2001: £82,000) (MRC Grant
under consideration) Dr SKS Srai, Dr A Bomford, Dr R Simpson & Dr E Debnam) The mutation for
hereditary haemochromatosis (HH) is the commonest genetic abnormality in people of Northern
European descent. The recent cloning of HFE, the gene for HH, has not yet led to an increased
understanding of the molecular defect, which is expressed as increased iron absorption from a normal
diet, because
1. regulation of absorption of dietary iron under normal conditions is not understood and the role of
wild type HFE in the process is unclear.
2. evidence for a mechanistic linkage between HFE and the genetic component of the mucosal iron
transport pathway is lacking.
Plan: Information on the function and the regulation of the mucosal iron transport pathway is urgently
needed. We propose a detailed genetic and functional analysis of the component of this pathway in
HH using information on novel genes derived from an initial study by our group, of inbred mouse
strains with well characterised defects in mucosal iron transport. This initial study has led to the
cloning of a ferric reductase in the apical membrane of the villus enterocytes and, in the basolateral
membrane, of the transporter responsible for iron efflux. Gene expression studies at the mRNA and
protein levels using duodenal tissue samples will be performed together with in situ techniques to
obtain information on localisation of transcript along the crypt-villus axis. These results should permit
further analysis of how individual genes in the iron transport pathway are regulated.
Project 3:
Molecular mechanisms involved in the dietary regulation of DMT1 expression in human intestinal
epithelial cells Dr SKS Srai (RF&UCMS) & Dr P Sharp (Surrey University){ BBSRC Joint Project Grant
Oct2000 - Sep2003: £189,444)
The nutritional significance of maintaining adequate dietary levels of the transition metals iron, zinc
and copper is clear due to their essential role in a plethora of biochemical events in the body. This is
confirmed by the large number of pathologies associated with imbalances in metal ion homeostasis.
There is good evidence, from studies on animals and cell lines, that dietary levels of individual metals
can influence the absorption and utilisation of others. Our study, using the Caco-2 TC7 cell model of
human enterocytes, will investigate the biochemical basis for these dietary interactions and will focus
on the putative metal ion transporter, DMT1. The data from this project will advance our knowledge of
diet-gene interaction in regulating mineral metabolism at the cell and molecular level, and is thus
relevant in understanding the underlying causes of metal ion deficiency and overload disorders.
The objectives of this work are to test the hypothesis that DMT1 acts as a divalent metal ion
transporter in human enterocytes and that its activity can be regulated by dietary levels of nutritionally
important trace metals. This hypothesis will be tested using the Caco-2 TC7 cell model of human small
intestinal enterocytes and the work will address the following issues:
1. Functional measurement of metal ion transport in Caco-2 TC7 cells in response to chronic
adaptations (10 days) in dietary levels of iron, zinc or copper.
2. Effect of these dietary changes on the expression of the putative metal ion transporter, DMT1, at the
protein (western blotting) and mRNA (RT-PCR) level.
3. Cellular distribution of the two splice variants of the DMT1 gene, following dietary metal ion
manipulation, using confocal microscopy.
4. The molecular events underlying changes in DMT1 homeostasis, focussing on the 5' promoter
region of the gene and in particular the role of the 5 metal response elements, using luciferase
reporter gene assays.
5. The role that DMT1 plays in the transport of iron, zinc and copper, using site-directed mutagenesis and
the Xenopus oocyte expression system.
Project 4: Evaluation of the safety and efficacy of iron supplementation in pregnant women.
European Commission Framework V Grant (Feb 2000 - Jan 2003) £1,101,077 with eight other
partners. Dr Srai (RF&UCMS) & Dr McArdle's (The Rowett Research Institute) share £194,917 for 1
RA1B and 1 PhD student.
Iron deficiency is common and can have harmful effects on the mother and her foetus. Anaemia is
therefore always treated with iron supplements. However, the levels given vary widely and there is
growing concern about the risks associated with iron overload. Since iron can generate free radicals,
and interact with other nutrients, assessment of supplementation in pregnant women is essential.
Volunteers will be given two levels of iron within the range given clinically, or a placebo, and the
effects of parameters such as oxidative stress, cardiovascular well being, zinc and copper metabolism
will be measured. We will study treatment directly in patients with ileostoma, identifying the cause of
GI upset. We will measure the effects on babies at term, on placental function and on expression of
different genes in supplemented rats and cultured cells, to elucidate the molecular basis of the change
and a rational basis for supplementation.
Project 5: Haem metabolism and control of intestinal iron absorption. Dr SKS Srai (RF&UCMS) & R
Simpson (King's College, KGT) {MRC project grant Oct 2000 - Sep2003; £171,241)
Background: Iron homeostasis is maintained primarily by controlling intestinal iron absorption of
dietary iron. Alterations in body iron levels (deficiency/overload) are often associated with important
and clinical consequences. The mechanism and regulation of the absorptive process is however,
unclear. Modifications in haem biosynthesis in animals and human (experimental/accidental/ genetic)
have been reported to induce changes in iron metabolism and iron absorption. The relationship
between the two parameters is however, not well understood. We will investigate, at the cellular and
molecular level, how dynamic changes in levels of haem and intermediates of its biosynthesis
(particularly ALA) affect intestinal iron transport. These studies will help to further clarify the iron
absorption regulatory process and elucidate changes in iron metabolism in certain porphyrias and
haemoglobinopathies.
We will assay urinary ALA and porphobilinogen output, urinary and biliary porphyrins, tissue haem
levels and enzymatic activities in mice with altered iron metabolism (hypoxic, dietary iron deficient,
hypotransferrinaemic, iron loaded). In addition, the effects of ALA administration on gene expression
and iron absorption will be ascertained in these mouse models. We will study the mechanism by which
ALA and other specific reagents influence haem biosynthesis and iron transport in epithelial cells.
MOLECULAR MECHANISM OF POLYPHENOL ABSORPTION, METABOLISM AND ANTIOXIDANT
EFFECT
Dr SKS SRAI, Dr E DEBNAM (RF& UCMS) AND Prof C RICE-EVANS (Guy's Hospital, KGT) (BBSRC
Project Grant Jan 2001 - Mar 2003; £118,000)
Professor Rice-Evans is the principal applicant.
Gastrointestinal factors, influencing the metabolism and functional activities of dietary, plant
polyphenols.
The importance of dietary antioxidants in the maintenance of health and protection from damage
induced by oxidative stress, implicated in the risk of chronic diseases, is coming to the forefront of
dietary recommendation and the development of functional foods. Recent work is beginning to
highlight a role for the flavonoid and polyphenolic components of the diet, known to be powerful hydrogen
donating antioxidants and scavengers of reactive oxygen and reactive nitrogen species in vitro.
The purpose of this project is to elucidate the functional forms of diet derived flavonoids in vivo by
investigating the gastro-intestinal factors influencing their metabolism and functional activities at
various levels, namely pre--absorption events in the gastric lumen and the modification and
metabolism they undergo in the small intestine. The antioxidant activities of the identified conjugates
and metabolites will also be assessed.
MOLECULAR MECHANISM OF GLUCOSE TRANSPORT ACROSS INTESTINAL AND RENAL
EPITHELIAL CELLS
DR SKS SRAI, DR ES DEBNAM AND PROF R UNWIN (The Wellcome Trust Project Grant, Sep2000
- Feb 2002; £72126)
Changes in renal and intestinal glucose transport in diabetes mellitus and control by glucagon:
Involvement of protein kinase A and protein kinase C signalling pathways.
The intracellular processes involved in the control of renal and intestinal glucose transport are poorly
understood. We have shown that both the PKC and PKA pathways are involved in the control of renal and
intestinal brush-border glucose transport and that they differentially regulate GLUT (facilitated glucose
transporters) and SGLT(sodium dependent glucose transporter) transporters, respectively. Aims of
this project are:
1. To determine the role of protein kinase A (PKA) and protein kinase C (PKC) in controlling the
expression and activity of the two classes of renal and intestinal transporters: GLUT (facilitated) and
SGLT (Na+ coupled).
2. To define the relationship of these signalling pathways to the changes in renal and intestinal
glucose transport that occur in insulinopenic diabetes mellitus, and in response to pancreatic
glucagon.
3. To establish the contribution of glucagon or glucagon like peptide receptors along the renal tubule
and intestinal tract. The long term goal is to define the significance of altered renal tubular transport of
glucose in diabetes to its renal pathophysiology.
FUNCTIONAL AND MOLECULAR CHARACTERISATION OF RENAL PURINERGIC RECEPTORS
IN HEALTH AND DISEASE DR SKS SRAI, DR ES DEBNAM AND PROF ROBERT UNWIN (
Supported by Grants to Professor Robert Unwin from the NKRF, the Wellcome Trust and the MRC)
Dr A.E. Michael Key Publications
Cellular & Molecular Endocrinology
Since joining the Department in 1991, Tony Michael and members of his team have been investigating
cellular & molecular aspects of endocrinology. Although current projects include ongoing research into
the control of renal function by adrenal steroids, the majority of research by this team is concerned
with cellular aspects of reproductive endocrinology. Consequently, most of the members of Tony
Michael's research team are also members of the interdisciplinary "Reproduction & Development
Group" at the Royal Veterinary College (RVC) (London).
At present, there are 2 main research themes being investigated by the team:
- Metabolism of cortisol by isoforms of the enzyme 11ß-hydroxysteroid dehydrogenase (11ßHSD)
- Regulation of the expression and function of prostaglandin receptors by steroid hormones.
In a range of tissues, 11ßHSD converts the anti-inflammatory adrenal steroid, cortisol (hydrocortisone)
to the inactive metabolite, cortisone. In the kidney, this enzyme activity is vital to deny cortisol access
to non-specific mineralocorticoid receptors. The team are currently investigating physiological
scenarios in which renal metabolism of cortisol is decreased so that cortisol can act in concert with
aldosterone to control sodium, potassium, and acid-base balance. In the placenta, 11ßHSD acts as an
enzymatic barrier, preventing cortisol from passing from the maternal circulation into the foetus.
Ongoing research has demonstrated that maternal nutrient restriction decreases placental inactivation
of cortisol, and that this decrease in 11ßHSD activity is associated with intra-uterine growth restriction
("small-for-date" babies): a phenomenon strongly implicated in increased risk of adult diseases (e.g.
diabetes and cardiovascular disease).
The major focus of research into 11ßHSD is in the context of the ovary. Studies performed in the
1990's indicated a link between high rates of ovarian cortisol oxidation and failure of women to
become pregnant through in vitro fertilisation (IVF). Research over the past 5 years has sought to
explain this association, examining those endocrine, paracrine and intra-cellular factors that influence
the expression and activities of specific 11ßHSD isoforms in the ovary. These are currently being
investigated in a wide range of species.
As regards the actions of prostaglandins in the human ovary, these appear to be mediated via specific
hepta-helical, G-protein-coupled receptors. In non-ovarian cells, prostaglandin E2 (PGE2) acts via at
least 4 different isoforms of the EP receptor, whereas PGF2a acts via the FP receptor. Recently, the
team has established that human ovarian cells, recovered from IVF patients, express functional EP1,
EP2, EP4 and FP receptors. Current studies are investigating whether progesterone and oestradiol
can affect either the expression of these receptors or their ability to couple to the cyclic AMP, inositol
polyphosphate and calcium signal transduction pathways.
Team Members (Last Updated 01 October 2001)
Ms Christina (C) Chandras*
Dr Tracey (TE) Harris
Ms Kim (KC) Jonas*
Dr Tony (AE) Michael
Mr Dean (DP) Norgate
Dr Lisa (LM) Thurston
(*Graduate Student)
Current Collaborators
Dr D Robert E Abayasekara (RVC, London, UK)
Dr John Carroll (UCL, UK)
Professor John RG Challis (University of Toronto, Canada)
Dr Robert C Fowkes (St.Bart's, London, UK)
Dr Linda Gregory (University Hospital of Wales, Cardiff, UK)
Dr HJ (Lenus) Kloosterboer (Organon, Oss, Netherlands) Professor Andres Lopez-Bernal (University
of Bristol, UK)
Dr S Kaila S Srai (UCL, UK)
Professor Paul M Stewart (University of Birmingham, UK) Professor Robert J Unwin (UCL, UK)
Professor D Claire Wathes (RVC, London, UK)
Professor Robert J Webb (University of Nottingham, UK)
Dr Peter J Wood (University of Southampton, UK)
Professor Kaiping Yang (University of Western Ontario, Canada)
Data access:
Bioinformatics Tools - www.Stratagene.com
Analyze pathways, gene expression data, protein and DNA sequences.
Software.
Bio IT & Informatics - bioteam.net
Honest, objective & vendor neutral clusters & pipelines our specialty
Partners in Life Science Informatics
Email:[email protected]
Tel: (978) 304-1222
BioTeam is a consulting collective dedicated to delivering vendor-neutral informatics solutions to the
life science industry. BioTeam principals Athanas, Dagdigian, Gloss, and Van Etten have been
jointly serving the biotech and pharmaceutical communities as a team for several years.
Individually they possess a broad spectrum of skills and experience from scientific analysis to high
performance technical computing infrastructures. Together they complement each other to provide
complete beginning-to-end life science informatics and Bio-IT solutions.
Bioinformatics News - www.BioInform.com
Get the exclusive insider's report on bioinformatics with BioInform
News in the field of bioinformatics.
Bioinformatics Toolbox - www.mathworks.com
Analyze genomic, proteomic, & microarray data in MATLAB®.
Read, analyze, and visualize genomic, proteomic, and microarray data
The Bioinformatics Toolbox offers computational molecular biologists and other research scientists
an open and extensible environment in which to explore ideas, prototype new algorithms, and build
applications in drug research, genetic engineering, and other genomics and proteomics projects.
The toolbox provides access to genomic and proteomic data formats, analysis techniques, and
specialized visualizations for genomic and proteomic sequence and microarray analysis. Most functions
are implemented in the open MATLAB language, enabling you to customize the algorithms or
develop your own.
Companies focused on software, and the software itself:
Companies in the Bioinformatics Software and Software-Related Services Sector
(Company - Headquarters - Company website, grouped by technology area)

DNA/Protein Sequence Analysis:
Allometra - Davis, Calif. - www.allometra.com/
(company name missing) - Edinburgh, Scotland, UK - www.anedabio.com/
Apocom - Knoxville, Tenn. - www.apocom.com
Bioinformatics Solutions - Waterloo, Ontario, Canada - www.bioinformaticssolutions.com/
BioMax Informatics - Martinsried, Germany - www.biomax.de
BioTools - Edmonton, Alberta, Canada - www.biotools.com
DNAStar - Madison, Wisc. - www.dnastar.com
DNAtools - Fort Collins, Colo. - www.dnatools.com
Gene Codes - Ann Arbor, Mich. - www.genecodes.com
Gene-IT - Paris, France - www.gene-it.com
GeneStudio - Suwannee, Ga. - www.genestudio.com
Genomatix Software - Munich, Germany - www.genomatix.de
Genometrician - Saint-Sulpice, Switzerland - www.genometrician.com
Genomix - Oak Ridge, Tenn. - www.genomix.com
Geospiza - Seattle, Wash. - www.geospiza.com
MiraiBio - Alameda, Calif. - www.miraibio.com
Ocimum Biosolutions - Hyderabad, India - www.ocimumbio.com
Paracel - Pasadena, Calif. - www.paracel.com
Redasoft - Bradford, Ontario, Canada - www.redasoft.com
Rescentris - Columbus, Ohio - www.rescentris.com
Scinova Informatics - Mumbai, India - www.scinovaindia.com
Softberry - Mount Kisco, NY - www.softberry.com
TimeLogic - Carlsbad, Calif. - www.timelogic.com
Textco BioSoftware - West Lebanon, NH - www.textco.com

Microarray/Gene Expression Analysis:
Aber Genomic Computing - Aberystwyth, Wales, UK - www.abergc.com/
Alma Bioinformatics - Madrid, Spain - www.almabioinfo.com
Amersham Biosciences Niagara (formerly Imaging Research) - St. Catherines, Ontario, Canada - www.imagingresearch.com/
BioDiscovery - El Segundo, Calif. - www.biodiscovery.com
BioMind - Bethesda, Md. - www.biomind.com
Chang Bioscience - Castro Valley, Calif. - www.changbioscience.com
Corimbia - Berkeley, Calif. - www.corimbia.com/
Genedata - Basel, Switzerland - www.genedata.com/
Insightful (life science business) - Seattle, Wash. - www.insightful.com/industry/pharm/default.asp
Iobion Informatics (affiliate of Stratagene) - La Jolla, Calif. - www.iobion.com
Koada Technology - Glasgow, Scotland, UK - www.koada.com
MicroDiscovery - Berlin, Germany - www.microdiscovery.de
MolecularWare (subsidiary of Calbatech) - Cambridge, Mass. - www.molecularware.com/
MolMine - Bergen, Norway - www.molmine.com
OmniViz - Maynard, Mass. - www.omniviz.com
Partek - St. Charles, Mo. - www.partek.com
Predictive Patterns - Kingston, Ontario, Canada - www.predictivepatterns.com
Rosetta Biosoftware - Seattle, Wash. - www.rosettabio.com/
SAS - Cary, NC - www.sas.com/industry/pharma/
Silicon Genetics - Redwood City, Calif. - www.silicongenetics.com/cgi/SiG.cgi/index.smf
Spotfire - Somerville, Mass. - www.spotfire.com
SPSS (life science group) - Chicago, Ill. - www.spss.com/applications/science/
Strand Genomics - Bangalore, India - www.strandgenomics.com
TG Services - El Sobrante, Calif. - www.genepilot.com/index.html
ViaLogy - Altadena, Calif. - www.vialogy.com
VizX Labs - Seattle, Wash. - www.vizxlabs.com/index.html

Proteomics (Mass Spec, 2D Gel Analysis):
Decodon - Greifswald, Germany - www.decodon.com
Geneva Bioinformatics - Geneva, Switzerland - www.genebio.com/
Genomic Solutions (subsidiary of Harvard Biosciences) - Ann Arbor, Mich. - http://65.219.84.5/index.html
Imaxia - Cupertino, Calif. - www.imaxia.com
Matrix Science - London, UK - www.matrixscience.com/
Nonlinear Dynamics - Newcastle upon Tyne, UK - www.nonlinear.com

Structural Proteomics:
Eidogen - Pasadena, Calif. - www.eidogen.com/
Epitope Informatics - Edmundbyers (near Durham), UK - www.epitope-informatics.com
Molsoft - San Diego, Calif. - www.molsoft.com
RedStorm Scientific - Houston, Texas - www.redstormscientific.com
Proceryon Biosciences - Salzburg, Austria - www.proceryon.com
Protein Mechanics - Mountain View, Calif. - www.proteinmechanics.com

Pathway Analysis:
Ariadne Genomics - Rockville, Md. - www.ariadnegenomics.com/index.html
GeneGo - St. Joseph, Mich. - www.genego.com
Hippron Physiomics - Ottawa, Canada - www.hippron.com
Ingenuity Systems - Mountain View, Calif. - www.ingenuity.com/index.html
Jubilant Biosys - Bangalore, India - www.jubilantbiosys.com/
Silico Insights - Woburn, Mass. - silicoinsights.com/index.html

Genetic Variation Analysis:
Ananomouse - San Francisco, Calif. - www.ananomouse.com/home.html
Biodata - Tartu, Estonia - www.biodata.ee
Forensic Bioinformatic Services - Fairborn, Ohio - www.bioforensics.com
Golden Helix - Bozeman, Montana - www.goldenhelix.com/index.jsp
SoftGenetics - State College, Penn. - www.softgenetics.com/
Visualize (life science group) - Phoenix, Az. - www.visualizeinc.com/index.html

Cellular Simulation:
BioAnalytics Group - Hightstown, NJ - www.bioanalyticsgroup.com/default.htm
Cellicon Biotechnologies - Boston, Mass. - www.puretechventures.com/portfolio/cellicon/?section=ventures
Entelos - Foster City, Calif. - www.entelos.com/
Gene Network Sciences - Ithaca, NY - www.gnsbiotech.com//index.php
Genomatica - San Diego, Calif. - www.genomatica.com/index1.html
Teranode - Seattle, Wash. - www.teranode.com/

Ontologies:
BioWisdom - Cambridge, UK - www.biowisdom.com/
Electric Genetics - Cape Town, South Africa - www.egenetics.com/index.html

Text Mining:
Axontologic - Orlando, Fl. - www.axontologic.com/
Definiens - Munich, Germany - www.definiens.com
eTexx - Irving, Texas - www.etexxbio.com/
InPharmix - Greenwood, Ind. - www.inpharmix.com/
Molecular Connections - Bangalore, India - www.molecularconnections.com/website/
PubGene - Oslo, Norway - www.pubgene.com/
Reel Two - San Francisco, Calif. - www.reeltwo.com/
SemanTx Life Sciences (subsidiary of Jarg) - Waltham, Mass. - www.semantxls.com/

Workflow/Pipelining:
Incogen - Williamsburg, Va. - www.incogen.com/
Inforsense - London, UK - www.inforsense.com/
KooPrime - Singapore - www.kooprime.com/webpage.htm
SciTegic - San Diego, Calif. - www.scitegic.com/main.html
TurboWorx - Burlington, Mass. - www.turbogenomics.com/

Integration:
GeneticXchange - Menlo Park, Calif. - www.geneticxchange.com/v3/index.php
Genomining - Montrouge, France - www.genomining.com/home.en.html
IO Informatics - Emeryville, Calif. - www.io-informatics.com/

Multiple Products:
Accelrys - San Diego, Calif. - www.accelrys.com/
Applied Maths - Sint-Martens-Latem, Belgium - www.applied-maths.com/
Compugen - Tel Aviv, Israel - www.cgen.com/
InforMax (Invitrogen subsidiary) - Frederick, Md. - www.informaxinc.com/content.cfm?pageid=1
Lion Bioscience - Heidelberg, Germany - www.lionbioscience.com/
Databases of the university libraries of Oxford, Cambridge and the Institute of Neurology, University of London:
1)
Personal name: Bergeron, Bryan P.
Title: Bioinformatics computing / Bryan Bergeron
Subject: Bioinformatics
Class number: 570.285, BER
Publication: Upper Saddle River, N.J.; London: Prentice Hall PTR, 2003
Format: Books
2)
Title: Bioinformatics, sequence, structure and databanks - a practical approach / ed. Des Higgins and W. Taylor
Subject: Bioinformatics
Class number: 570.285, BIO
Publication: Oxford: Oxf. U.P., 2000
Series: Practical Approach
Format: Books
3)
Personal name: Baldi, Pierre
Title: Bioinformatics, the machine learning approach / Pierre Baldi, Søren Brunak
Subject: Bioinformatics
Class number: 570.285, BAL
Publication: Cambridge, Mass.; London: MIT Press, c1998
Series: Adaptive computation and machine learning
General note: "A Bradford book."
Format: Books
4)
Personal name: Baldi, Pierre
Title: Bioinformatics, the machine learning approach / Pierre Baldi, Søren Brunak
Subject: Bioinformatics
Class number: 570.285, BAL
Edition: 2nd ed
Publication: Cambridge, Mass.: MIT Press, 2001
Series: Adaptive computation and machine learning
Edition history note: Previous ed.: 1998
General note: "A Bradford book"
Format: Books
5)
Personal name: Attwood, T. K.
Title: Introduction to bioinformatics
Subject: Bioinformatics
Class number: 570.285, ATT
Publication: Harlow: Longman, 1999
Format: Books
6)
Personal name: Krane, Dan E.
Title: Fundamental concepts of bioinformatics / Dan E. Krane and Michael L. Raymer
Subject: Bioinformatics
Class number: 570.285, KRA
Publication: San Francisco; London: Benjamin Cummings, 2003
Format: Books
7)
Title: Bioinformatics, managing scientific data / edited by Zoe Lacroix, Terence Critchlow
Subject: Bioinformatics
Class number: 570.285, BIO
Publication: San Francisco, Calif.: Morgan Kaufmann; Oxford: Elsevier Science, 2003
Format: Books
9)
Title: Essentials of genomics and bioinformatics / edited by C.W. Sensen
Subject: Bioinformatics
Class number: 570.285, ESS
Publication: Weinheim: Wiley-VCH, c2002
Edition history note: Concise ed. of Biotechnology, vol 5b, 2001
Added title: Biotechnology
Format: Books
10)
Title: Bioinformatics, databases and systems / edited by Stanley Letovsky
Subject: Bioinformatics
Class number: 570.285, BIO
Publication: Boston; London: Kluwer Academic, c1999
Format: Books
11)
Main Entry: Lesk, A. M.
Title: Introduction to bioinformatics
Imprint: Oxford University Press, 2005
12)
Dewey class mark: 574.1925
Title: Chromatin and chromatin remodelling enzymes. Part A / edited by C. David Allis, Carl Wu
Imprint: San Diego, CA; London : Elsevier Academic Press, 2004
Descr.: xxxviii, 540p. : ill. ; 24cm
Series: Methods in enzymology ; 375
Contents: Histone bioinformatics ; Biochemistry of histones, nucleosomes, and chromatin ; Molecular cytology of chromatin functions
Subject - Lib.Cong.: Chromatin; Enzymes; Enzymology
Add.Entry: Allis, C. David; Wu, Carl
ISBN: 0121827798 (cased) : 99.95
13)
Dewey class mark: 574.0285
Title: Bioinformatics and genomes : current perspectives / edited by Miguel A. Andrade
Imprint: Wymondham : Horizon Scientific, 2003
Descr.: xii, 227p ; 24cm
Subject - Lib.Cong.: Bioinformatics; Genomes -- Data processing
Add.Entry: Andrade, Miguel A.
ISBN: 1898486476 (cased) : 80.00
14)
Dewey class mark: 574.0285
Title: Bioinformatics and genomes : current perspectives / edited by Miguel A. Andrade
Imprint: Wymondham : Horizon Scientific, 2003
Descr.: xii, 227p ; 24cm
Subject - Lib.Cong.: Bioinformatics; Genomes -- Data processing
Add.Entry: Andrade, Miguel A.
ISBN: 1898486476 (cased) : 80.00
15)
Main Entry: Institute of Electrical and Electronics Engineers
Title: Transactions on nanobioscience
Imprint: Institute of Electrical and Electronics Engineers
Subject - Lib.Cong.: Ultrastructure (Biology); Nanoscience; Bioinformatics
Add.Title: IEEE transactions on nanobioscience
Holdings: Year 2003; Year 2004
ISSN: 1536-1241
16)
Dewey class mark: 574.870285
Title: Bioinformatics : genes, proteins and computers / edited by Christine Orengo, David Jones, Janet Thornton
Imprint: Oxford : BIOS, 2003
Descr.: xiv, 298p ; 25cm
Subject - Lib.Cong.: Molecular biology -- Computer simulation; Proteins -- Analysis -- Data processing; Genetics
Add.Entry: Jones, David; Orengo, Christine; Thornton, Janet
ISBN: 1859960545 (pbk) : 29.99
17)
Dewey class mark: 574.870285
Title: Bioinformatics : genes, proteins and computers / edited by Christine Orengo, David Jones, Janet Thornton
Imprint: Oxford : BIOS, 2003
Descr.: xiv, 298p ; 25cm
Subject - Lib.Cong.: Molecular biology -- Computer simulation; Proteins -- Analysis -- Data processing; Genetics
Add.Entry: Jones, David; Orengo, Christine; Thornton, Janet
ISBN: 1859960545 (pbk) : 29.99
18)
Dewey class mark: 574.0285
Main Entry: Lesk, Arthur M.
Title: Introduction to bioinformatics
Imprint: Oxford : Oxford University Press, 2002
Descr.: xvi, 283p : ill ; 25cm
Subject - Lib.Cong.: Bioinformatics
ISBN: 0199251967 (pbk) : 19.99
19)
Dewey class mark: 574.0285
Main Entry: Westhead, David R.
Title: Bioinformatics / David R. Westhead, J. Howard Parish and Richard M. Twyman
Imprint: Oxford : BIOS, 2002
Descr.: viii, 257p : ill ; 25cm
Series: Instant notes
Bibliogr.: Includes bibliographical references and index
Subject - Lib.Cong.: Bioinformatics -- Examinations -- Study guides
Add.Entry: Parish, J. H. (John Howard); Twyman, Richard M.
Add.Title: Instant notes in bioinformatics
ISBN: 1859962726 (pbk) : 16.99
20)
Dewey class mark: 574.870285
Main Entry: Baxevanis, Andreas D.
Title: Bioinformatics : a practical guide to the analysis of genes and proteins / Andreas D. Baxevanis [and] B.F. Francis Ouellette
Edition: 2nd ed.
Imprint: New York ; Chichester : Wiley, 2001
Descr.: xviii, 470p
Series: Methods of biochemical analysis ; 43
Gen. note: Previous ed.: 1998
Subject - Lib.Cong.: Molecular biology -- Computer simulation; Molecular biology -- Mathematical models; Proteins -- Analysis -- Data processing; Genetics
Add.Entry: Ouellette, B. F. Francis
ISBN: 0471383902 (cased); 0471383910 (pbk.) : 51.95
21)
Dewey class mark: 574.8702854
Title: Bioinformatics : sequence, structure, and databanks : a practical approach / edited by D. Higgins and W. Taylor
Imprint: Oxford : Oxford University Press, 2000
Descr.: xx, 249 p. 1 v. (pbk) ; 24cm
Series: The practical approach series
Subject - Lib.Cong.: Biology -- Data processing; Biomolecules -- Data processing
Add.Entry: Higgins, Des; Taylor, Willie
ISBN: 0199637903 (pbk.) : 29.95; 0199637911 (cased)
22)
Dewey class mark: 574.0285
Main Entry: Attwood, Teresa K.
Title: Introduction to bioinformatics / Teresa K. Attwood and David J. Parry-Smith
Imprint: Harlow : Pearson Education, 1999
Descr.: xxv, 218p : ill (pbk) ; 24cm
Series: Cell and molecular biology in action series
Bibliogr.: Includes index
Subject - Lib.Cong.: Molecular biology -- Data processing
Add.Entry: Parry-Smith, David J.
ISBN: 0582327881 (pbk) : 23.99
23)
Title: Bioinformatics : the analysis of protein sequence homology and protein struc
Author: Mistry, T
Publication Date: 2002
Control Number: M0018105KP
Copies: 1 copy
24)
Title: Bioinformatics
Author: Lacey, N
Publication Date: 2003
Control Number: M0022613KP
Copies: 1 copy
25)
Title: Bioinformatics : a practical guide to the analysis of genes and proteins / [. - 2nd ed
Publication Date: 2001
Control Number: 0471383902
26)
Title: Briefings in bioinformatics
Publisher: London : Henry Stewart
Control Number: 1467-5463
Shelved at: journals
27)
Title: Bioinformatics : sequence, structure, and databanks : a practical approach /
Publication Date: 2000
Control Number: 0199637911
28)
Title: Bioinformatics : the machine learning approach / Pierre Baldi, Søren Brunak. - 2nd ed
Author: Baldi, Pierre
Publication Date: 2001
Control Number: 026202506x
29)
Title: Developing bioinformatics computer skills / Cynthia Gibas, James Fenton and
Author: Gibas, Cynthia
Publication Date: 2000
Control Number: 1565926641
30)
Title: Genetically yours : bioinforming, biopharming, biofarming / Hwa A. Lim
Author: Lim, Hwa A.
Publication Date: 2002
Control Number: 9810249381
31)
Title: Molecular biology of the gene / James Watson ... [et al.]. - 5th ed
Publication Date: 2004
Control Number: 0321223683
32)
Title: Molecular biology of the gene / James Watson ... [et al.]. - 5th ed
Format: CD-ROM
Publication Date: 2004
Control Number: M0025595KP
33)
1. Proteomics Techniques
Source: Royal Free Campus departmental pages, http://www.rfc.ucl.ac.uk/departments/Biomedical-... (retrieved 7 September 2004)
Introduction
This facility has been set up to help researchers establish and run proteomic projects and identify
biological molecules of interest – particularly proteins.
The facility is currently equipped with Amersham Biosciences 2D gel electrophoresis equipment and a Waters/Micromass capillary HPLC system coupled to a qToF-µ electrospray mass spectrometer.
2D gel electrophoresis is a suitable technique for asking, “Where do
differences arise amongst the proteins in two similar samples?” For
example, closely matched samples from diseased and healthy cells can
be compared. Differences in protein abundance or covalent modification
(e.g. phosphorylation, glycosylation and acylation) can provide
important clues to the pathogenesis, progress and treatment of a
disease.
Once a protein has been isolated and digested, the mass spectrometer is a suitable tool for asking,
“What is this protein?”, “Which residues are modified?”, and “What is the modification?”
As there are many practical considerations in setting up these types of experiments it is strongly
advised that you contact the facility staff about the design of the study prior to the preparation of
samples. The following guidelines provide a brief introduction for those interested in proteomics:
Sample Preparation
Optimal sample preparation is essential for good 2D results. The ideal process will cause complete
solubilization, disaggregation and denaturation of the proteins in the sample. However, as samples
vary in their constituent properties, optimal procedures must be determined empirically for each type of
sample.
Development of the sample preparation must take into account the object of the investigation: an increase in the range and number of detectable proteins can sometimes be obtained only at the expense of clarity and reproducibility. Close collaboration between investigators and facility staff
should produce a sample preparation protocol tailored to the needs of the investigation.
Protecting Samples against Proteolysis
Disrupting tissue or cultured cells can liberate or activate endogenous proteases. As proteolytic
degradation of proteins greatly complicates 2D electrophoresis, measures should be included to avoid
this problem. Proteases can be inactivated by immediate snap-freezing or denaturing samples for
example with 10% TCA, 8M urea or 2% SDS. However, the addition of a cocktail of protease inhibitors
has often proved adequate. Facility staff will be pleased to advise collaborators on suitable strategies.
Gel Electrophoresis
2D gel electrophoresis is the main technique used to separate proteins in the facility. It is suitable for
analyzing the proteins present in complicated biological samples, but can be supplemented with a
variety of biochemical techniques in order to concentrate proteins of interest. 2D electrophoresis is
often suitable for separating highly complex protein mixtures; however, some proteins will be masked
by other more abundant species and many proteins will remain beyond the limit of detection. The pH
range for the 1st dimension needs to be optimized, and this may consume considerable amounts of
time and sample.
Image Analysis
Computer-aided comparison of 2D gels facilitates the identification of changes in protein mobility and
abundance. The facility has two workstations equipped with PDQuest, a standard software package in
proteomics. Gels can be stained with silver or Deep Purple using protocols that allow subsequent
analysis by mass spectrometry. It is hoped to develop comparative fluorescence-based techniques at
a later stage. Note that there is inherently some gel-to-gel variation in staining and mobility; consequently, changes are considered significant if they are reproducible and generally greater than 2- to 3-fold in
intensity.
Excision and Digestion
Spots from 2D gels that reproducibly differ between experimental and control samples are manually
removed for digestion, usually with trypsin. As such it is not practical to excise large numbers of
proteins from multiple gels. Up to this stage it is vital that samples are not contaminated with
environmental proteins (typically keratins). Proteolytic fragments can then be desalted and analysed
directly on the mass spectrometer or separated using the CapLC HPLC system and automatically
loaded onto the mass spectrometer.
Mass Spectrometry
The Micromass qToF-µ electrospray mass spectrometer is configured for high-sensitivity analysis of
small numbers of samples. It is not a high-throughput instrument and requires expertise to operate. It
is not suitable for screening vast numbers of samples. Sensitivity is highly compromised by salts
and detergents in the sample. In addition to peptide analysis, the qToF-µ is suitable for studying many
intact proteins (contact the facility staff for details). Facility staff will advise you on the best buffer in
which to prepare samples.
Bioinformatics
Data from the mass spectrometer consists of a series of molecular ion m/z (mass/charge) values that are translated into a corresponding series of molecular weights, or "peptide fingerprint". Each fingerprint is then compared with the predicted fingerprints of all proteins in a comprehensive sequence database to identify the parent protein. In addition, molecular ions can be fragmented by collision in order to derive sets of daughter ions, from which information on amino acid sequence and
sites of covalent modification can be obtained. Note that in general covalent modifications need to be
stable and of high stoichiometry to be identified. The facility is equipped with a workstation running
MassLynx 4.0 software. Facility staff will assist researchers with the identification of target proteins.
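As an illustrative note on the peptide-fingerprint comparison described above, the following minimal Python sketch (hypothetical sequence, measured masses and tolerance; not the facility's MassLynx workflow) digests a protein in silico with a simplified trypsin rule, predicts peptide masses, and counts how many measured masses match within a tolerance:

```python
# Minimal peptide-mass-fingerprint sketch (illustrative only).
# Monoisotopic residue masses; WATER accounts for the peptide termini.

RESIDUE_MASS = {
    'G': 57.02146, 'A': 71.03711, 'S': 87.03203, 'P': 97.05276,
    'V': 99.06841, 'T': 101.04768, 'C': 103.00919, 'L': 113.08406,
    'I': 113.08406, 'N': 114.04293, 'D': 115.02694, 'Q': 128.05858,
    'K': 128.09496, 'E': 129.04259, 'M': 131.04049, 'H': 137.05891,
    'F': 147.06841, 'R': 156.10111, 'Y': 163.06333, 'W': 186.07931,
}
WATER = 18.01056


def tryptic_peptides(sequence):
    """Cleave after K or R, except before P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in 'KR' and (i + 1 == len(sequence) or sequence[i + 1] != 'P'):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fingerprint_score(measured_masses, sequence, tol=0.5):
    """Count measured masses matching predicted tryptic peptide masses."""
    predicted = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(any(abs(m - p) <= tol for p in predicted) for m in measured_masses)


if __name__ == '__main__':
    seq = 'MKWVTFISLLFLFSSAYSRGVFRR'   # toy sequence, not a real database entry
    print(fingerprint_score([477.27, 277.15], seq))   # two toy measured masses -> 2
```

In practice the score would be computed against every protein in the sequence database and the highest-scoring parent proteins reported.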
Useful references and web pages
British Mass Spectrometry Society
A basic mass spectrometry tutorial
Bio-Rad Proteomics Guide
International Mass Spectrometry News
American Society for Mass Spectrometry
Information on IBM's "Blue Gene" research project:
http://www.research.ibm.com/bluegene/
Partial 'Blue Gene' Systems Are Now Two of the Top Ten Most
Powerful Supercomputers on Earth
June 21, 2004--For the first time, two IBM Blue Gene/L prototype systems appear on the Top 10 list of
supercomputers. The Blue Gene/L prototype represents a radical new design for supercomputing. At
1/20th the physical size of existing machines of comparable power, Blue Gene/L enables dramatic
reductions in power consumption, cost and space requirements for businesses requiring immense
computing power. For a new architecture to produce so much compute power in such a small package
is a stunning achievement, and provides a glimpse of the future of supercomputing.
The number four-ranked Blue Gene/L DD1 Prototype, with a sustained speed of 11.68 teraflops and a
peak speed of 16 teraflops, uses more than 8,000 PowerPC processors packed into just four
refrigerator-sized racks. This groundbreaking system is only 1/16 of its planned final capacity and has skyrocketed to 4th place from the 73rd spot it held on the list in November 2003. The eighth-ranked Blue
Gene/L DD2 Prototype has a sustained speed of 8.66 teraflops and a peak speed of 11.47 teraflops.
The DD2 system is based on the second generation of the Blue Gene/L chips, which are more
powerful than those used in the DD1 prototype.
About IBM's Blue Gene Supercomputing Project
Blue Gene is an IBM supercomputing project dedicated to building a new family of supercomputers
optimized for bandwidth, scalability and the ability to handle large amounts of data while consuming a
fraction of the power and floor space required by today's fastest systems. The full Blue Gene/L
machine is being built for the Lawrence Livermore National Laboratory in California, and will have a
peak speed of 360 teraflops. When completed in 2005, IBM expects Blue Gene/L to lead the Top500
supercomputer list. A second Blue Gene/L machine is planned for ASTRON, a leading astronomy
organization in the Netherlands. IBM and its partners are currently exploring a growing list of
applications including hydrodynamics, quantum chemistry, molecular dynamics, climate modeling and
financial modeling.
Presentations, Preprints, and Publications
Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 1. Theory; The Journal of
Physical Chemistry B; 108(21); 6571-6581
Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 2. Example Applications to
Alanine Dipeptide and a beta-Hairpin Peptide; The Journal of Physical Chemistry B; 108(21); 6582-6594
Agenda and Presentations for Blue Gene Briefing Day--February 6, 2004
Molecular Dynamics Investigation of the Structural Properties of Phosphatidylethanolamine Lipid
Bilayers
Design and Analysis of the BlueGene/L Torus Interconnection Network
A Volumetric FFT for Blue Gene/L, to appear in the Proceedings of HiPC2003
Blue Matter, An Application Framework for Molecular Simulation on Blue Gene, Journal of Parallel and
Distributed Computing Volume 63, Issues 7-8 July-August 2003 , Pages 759-773
Understanding folding and design: Replica-exchange simulations of "Trp-cage" miniproteins, Proc.
Natl. Acad. Sci. USA, Vol. 100, Issue 13, June 24, 2003, pp. 7587-7592
An overview of the BlueGene/L supercomputer, Supercomputing 2002 Technical Papers, November
2002
Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in
water?, Proc. Natl. Acad. Sci. USA, Vol. 99, Issue 20, October 1, 2002, pp. 12777-12782
The free energy landscape for beta-hairpin folding in explicit water, Proc. Natl. Acad. Sci. USA, Vol.
98, Issue 26, December 18, 2001, pp. 14931-14936
Blue Gene Project Update
Efficient multiple time step method for use with Ewald and particle mesh Ewald for large biomolecular
systems, The Journal of Chemical Physics, Volume 115, Issue 5, 2001, pp. 2348-2358
Blue Gene: A vision for protein science using a petaflop supercomputer, IBM Systems Journal,
Volume 40, Number 2, 2001, p. 310
Industry Links
Unraveling the Mystery of Protein Folding
Physicists Take on Challenge Of Showing How Proteins Fold, The Scientist
The Bridge from Genes to Proteins
Information on the use of HP's AlphaServer in the GeneProt project:
http://www.hp.com/techservers/life_sciences/success_geneprot.pdf
Information on the SUN "Computational biology" group project (www.sun.com/edu/hpc/compbiosig):
http://www.sun.com/products-nsolutions/edu/events/archive/hpc/presentations/june01/stefan_unger.pdf
The Dialog database:
Conferences: (61 records from professional conferences in the most recent period of 2004 were selected)
1)
A sequence-focused parallelisation of EMBOSS on a cluster of workstations
Podesta, K.; Crane, M.; Ruskin, H.J.
Sch. of Comput., Dublin City Univ., Ireland
Conference: Computational Science and Its Applications - ICCSA 2004. International
Conference. Proceedings (Lecture Notes in Comput. Sci. Vol.3045)
Part: Vol.3 , Page: 473-80 Vol.3
Editor: Lagana, A.; Gavrilova,M.L.; Kumar,V.; Mun,Y.; Tan,C.J.K.; Gervasi,O.
Publisher: Springer-Verlag , Berlin, Germany , 2004 , 4588 Pages
Conference: Computational Science and Its Applications - ICCSA 2004. International
Conference. Proceedings , Sponsor: Univ. of Perugia, Italy, Univ. of Calgary, Canada, Univ.
of Minnesota, USA, Queen's Univ. of Belfast, UK, Heuchera Technol., UK, GRID.IT:
Enabling Platforms for High-Performance Computational Grids Oriented to Scalable Virtual
Organizations of the Ministry of Sci. and Educ. of Italy, COST - European Cooperation in the
Field of Sci. and Tech. Res , 14-17 May 2004 , Assisi, Italy
Language: English
Abstract: A number of individual bioinformatics applications (particularly BLAST and
other sequence searching methods) have recently been implemented over clusters of
workstations to take advantage of extra processing power. Performance improvements
are achieved for increasingly large sets of input data (sequences and databases), using
these implementations. We present an analysis of programs in the EMBOSS suite based
on increasing sequence size, and implement these programs in parallel over a cluster of
workstations using sequence segmentation with overlap. We observe general increases in
runtime for all programs, and examine the speedup for the most intensive ones to
establish an optimum segmentation size for those programs across the cluster.
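The segmentation-with-overlap step mentioned in this abstract can be sketched as follows (an illustrative Python fragment under assumed segment and overlap sizes; not the authors' EMBOSS implementation):

```python
def segment_with_overlap(sequence, segment_len=10000, overlap=500):
    """Split a sequence into overlapping segments for independent processing.

    Each segment shares `overlap` residues with its neighbour so that
    features spanning a cut point are not lost; results from the workers
    would afterwards be merged and de-duplicated in the overlap regions.
    """
    if segment_len <= overlap:
        raise ValueError("segment_len must exceed overlap")
    segments, start = [], 0
    while start < len(sequence):
        end = min(start + segment_len, len(sequence))
        segments.append((start, sequence[start:end]))
        if end == len(sequence):
            break
        start = end - overlap
    return segments


# Example: a 25 kb toy sequence split into ~10 kb pieces with 500 bp overlap.
if __name__ == '__main__':
    toy = 'ACGT' * 6250
    for offset, seg in segment_with_overlap(toy):
        print(offset, len(seg))
```

Each segment, together with its offset, would then be dispatched to a separate workstation in the cluster.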
2)
Genome database integration
Robinson, A.; Rahayu, W.
Dept. Comput. Sci. & Comput. Eng., LaTrobe Univ., Bundoora, Vic., Australia
Conference: Computational Science and Its Applications - ICCSA 2004. International
Conference. Proceedings (Lecture Notes in Comput. Sci. Vol.3045)
Part: Vol.3 , Page: 443-53 Vol.3
Editor: Lagana, A.; Gavrilova,M.L.; Kumar,V.; Mun,Y.; Tan,C.J.K.; Gervasi,O.
Publisher: Springer-Verlag , Berlin, Germany , 2004 , 4588 Pages
Conference: Computational Science and Its Applications - ICCSA 2004. International
Conference. Proceedings , Sponsor: Univ. of Perugia, Italy, Univ. of Calgary, Canada, Univ.
of Minnesota, USA, Queen's Univ. of Belfast, UK, Heuchera Technol., UK, GRID.IT:
Enabling Platforms for High-Performance Computational Grids Oriented to Scalable Virtual
Organizations of the Ministry of Sci. and Educ. of Italy, COST - European Cooperation in the
Field of Sci. and Tech. Res , 14-17 May 2004 , Assisi, Italy
Language: English
Abstract: This paper presents a solution to many of the problems in genome database
integration including an integrated interface for accessing all genome databases
simultaneously and the problem of a common interchange data format. The solution is
the addition of a middle or mediation layer of a three layer approach. The solution
provides a simple step by step approach to connect other existing genome databases
quickly and efficiently. The internal data format used is a commonly used bioinformatics
format called BSML, a subset of the XML standard. The architecture also allows easy
addition and deletion of functionality. Finally, an implementation of this solution is
presented with the required support functionality to validate the proposed integration
method.
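The three-layer idea described in this abstract (a mediation layer that fans a query out to several genome databases and normalizes the answers into one internal format) can be illustrated with a minimal sketch; the adapter names and record fields below are hypothetical and the BSML/XML serialization itself is omitted:

```python
# Minimal mediation-layer sketch: source-specific adapters translate their
# native records into one shared internal format, so clients query every
# connected genome database through a single interface.

class SourceAdapter:
    """Base class for database adapters; subclasses wrap one data source."""
    name = "abstract"

    def search(self, gene_symbol):
        raise NotImplementedError


class ToyGenBankAdapter(SourceAdapter):          # hypothetical adapter
    name = "toy-genbank"

    def search(self, gene_symbol):
        # A real adapter would call the remote database; this one returns a
        # canned record in the source's own field names.
        return [{"locus": gene_symbol, "organism": "H. sapiens", "len": 1500}]


class ToyEnsemblAdapter(SourceAdapter):          # hypothetical adapter
    name = "toy-ensembl"

    def search(self, gene_symbol):
        return [{"id": "ENSG0000TOY", "display_name": gene_symbol, "length": 1498}]


class Mediator:
    """Middle layer: fans a query out to all adapters and normalizes results."""

    def __init__(self, adapters):
        self.adapters = list(adapters)

    def search(self, gene_symbol):
        unified = []
        for adapter in self.adapters:
            for rec in adapter.search(gene_symbol):
                unified.append({
                    "source": adapter.name,
                    "symbol": rec.get("locus") or rec.get("display_name"),
                    "length": rec.get("len") or rec.get("length"),
                })
        return unified


if __name__ == "__main__":
    mediator = Mediator([ToyGenBankAdapter(), ToyEnsemblAdapter()])
    for record in mediator.search("TP53"):
        print(record)
```

New databases are connected by adding another adapter, which matches the paper's goal of plugging sources in "quickly and efficiently" without touching the client layer.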
3)
Cell modeling using agent-based formalisms
Webb, K.; White, T.
Sch. of Comput. Sci., Carleton Univ., Ont., Canada
Conference: Innovations in Applied Artificial Intelligence. 17th International Conference on
Industrial and Engineering Applications of Artificial Intelligence and Expert Systems,
IEA/AIE 2004. Proceedings (Lecture Notes in Artificial Intelligence Vol.3029) , Page: 128-37
Editor: Orchard, B.; Yang, C.; Ali, M.
Publisher: Springer-Verlag , Berlin, Germany , 2004 , xxi+1272 Pages
Conference: Innovations in Applied Artificial Intelligence. 17th International Conference on
Industrial and Engineering Applications of Artificial Intelligence and Expert Systems,
IEA/AIE 2004. Proceedings , 17-20 May 2004 , Ottawa, Ont., Canada
Language: English
Abstract: The systems biology community is building increasingly complex models and
simulations of cells and other biological entities. In doing so the community is beginning
to look at alternatives to traditional representations such as those provided by ordinary
differential equations (ODE). Making use of the object-oriented (OO) paradigm, the
unified modeling language (UML) and real-time object-oriented modeling (ROOM) visual
formalisms, we describe a simple model that includes membranes with lipid bilayers,
multiple compartments including a variable number of mitochondria, substrate
molecules, enzymes with reaction rules, and metabolic pathways. We demonstrate the
validation of the model by comparison with Gepasi and comment on the reusability of
model components.
4)
Digital signal processing in predicting secondary structures of proteins
Mitra, D.; Smith, M.
Dept. of Comput. Sci., Florida Inst. of Technol., Melbourne, FL, USA
Conference: Innovations in Applied Artificial Intelligence. 17th International Conference on
Industrial and Engineering Applications of Artificial Intelligence and Expert Systems,
IEA/AIE 2004. Proceedings (Lecture Notes in Artificial Intelligence Vol.3029) , Page: 40-9
Editor: Orchard, B.; Yang, C.; Ali, M.
Publisher: Springer-Verlag , Berlin, Germany , 2004 , xxi+1272 Pages
Conference: Innovations in Applied Artificial Intelligence. 17th International Conference on
Industrial and Engineering Applications of Artificial Intelligence and Expert Systems,
IEA/AIE 2004. Proceedings , 17-20 May 2004 , Ottawa, Ont., Canada
Language: English
Abstract: Traditionally protein secondary structure prediction methods work with
aggregate knowledge gleaned over a training set of proteins, or with some knowledge
acquired from the experts about how to assign secondary structural elements to each
amino acid. We are proposing here a methodology that is primarily targeted for any
given query protein rather than being trained over a pre-determined training set. For some
query proteins our prediction accuracies are predictably higher than most other methods,
while for other proteins they may not be so, but we would at least know that even before
running the algorithms. Our method is based on homology-modeling. When a
significantly homologous protein (to the query) with known structure is available in the
database our prediction accuracy could be even 90% or above. Our objective is to
improve the accuracy of the predictions for the so called "easy" proteins (where
sufficiently similar homologues with known structures are available), rather than
improving the bottom-line of the structure prediction problem, or the average prediction
accuracy over many query proteins. We use a digital signal processing (DSP) technique that is global in nature in assigning structural elements to the respective residues. This is the key to our success. We have tried some variations of the proposed core methodology
and the experimental results are presented in this article.
5)
Proceedings. Fourth IEEE Symposium on Bioinformatics and Bioengineering
Publisher: IEEE , Los Alamitos, CA, USA , 2004 , xviii+613 Pages
Conference: Proceedings. Fourth IEEE Symposium on Bioinformatics and Bioengineering ,
19-21 May 2004 , Taichung, Taiwan
Language: English
Abstract: The following topics are dealt with: bioengineering; data integration; medical image processing; parallel computing; medical informatics; gene analysis; transcriptome and functional genomics; homology search; structural biology; algorithms; protein-protein interactions; indexing techniques and intelligent systems.
6)
Lymphoma cancer classification using genetic programming with SNR features
Jin-Hyuk Hong; Sung-Bae Cho
Dept. of Comput. Sci., Yonsei Univ., South Korea
Conference: Genetic Programming. 7th European Conference on Genetic Programming
EuroGP 2004. Proceedings. (Lecture Notes in Comput. Sci. Vol.3003) , Page: 78-88
Editor: Keijzer, M.; O'Reilly, U.-M.; Lucas, S.M.; Costa, E.; Soule, T.
Publisher: Springer-Verlag , Berlin, Germany , 2004 , xi+410 Pages
Conference: Genetic Programming. 7th European Conference on Genetic Programming
EuroGP 2004. Proceedings , 5-7 April 2004 , Coimbra, Portugal
Language: English
Abstract: Lymphoma cancer classification with DNA microarray data is one of the
important problems in bioinformatics. Many machine learning techniques have been
applied to the problem and produced valuable results. However the medical field requires
not only a high-accuracy classifier, but also the in-depth analysis and understanding of
classification rules obtained. Since gene expression data have thousands of features, it is
nearly impossible to represent and understand their complex relationships directly. In
this paper, we adopt the SNR (signal-to-noise ratio) feature selection to reduce the
dimensionality of the data, and then use genetic programming to generate cancer
classification rules with the features. In the experimental results on Lymphoma cancer
dataset, the proposed method yielded 96.6% test accuracy on average, and an excellent
arithmetic classification rule set that classifies all the samples correctly is discovered by
the proposed method.
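The SNR (signal-to-noise ratio) feature selection mentioned in this abstract is commonly computed per gene as (mean of class 1 - mean of class 2) / (std of class 1 + std of class 2); a minimal sketch of such a ranking (illustrative only, not the authors' code) is:

```python
import numpy as np


def snr_ranking(expression, labels, top_k=10):
    """Rank genes by the signal-to-noise ratio between two classes.

    expression: array of shape (n_samples, n_genes)
    labels:     array of 0/1 class labels per sample
    Returns the indices of the top_k genes by |SNR|, where
    SNR = (mean_0 - mean_1) / (std_0 + std_1).
    """
    x = np.asarray(expression, dtype=float)
    y = np.asarray(labels)
    a, b = x[y == 0], x[y == 1]
    snr = (a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)
    return np.argsort(-np.abs(snr))[:top_k]


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    data = rng.normal(size=(20, 100))
    data[:10, 5] += 3.0            # make gene 5 discriminative for class 0
    labels = np.array([0] * 10 + [1] * 10)
    print(snr_ranking(data, labels, top_k=3))
```

Only the selected genes would then be passed to the genetic-programming stage that evolves the classification rules.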
7)
Bioinformatics in the undergraduate curriculum: opportunities for computer science
educators
Burhans, D.T.; Doom, T.E.; DeJongh, M.; Leblanc, M.
Dept. of Comput. Sci., Canisius Coll., Buffalo, NY, USA
SIGCSE Bulletin
Conference: SIGCSE Bull. (USA) , vol.36, no.1 , Page: 229-30
Publisher: ACM , March 2004
Conference: Thirty-Fifth SIGCSE Technical Symposium on Computer Science Education ,
Sponsor: ACM Special Interest Group on Comput. Sci. Educ , 3-7 March 2004 , Norfolk, VA,
USA
Language: English
Abstract: Biology has become an increasingly data-driven science. Modern experimental
techniques, including automated DNA sequencing, gene expression microarrays, and X-ray crystallography, are producing molecular data at a rate that has made traditional data
analysis methods impractical. Computational methods are becoming an increasingly
important aspect of the evaluation and analysis of experimental data in molecular
biology. Bioinformatics is the term coined for the new field that merges biology and
computer science to manage and analyze this data, with the ultimate goal of
understanding and modeling living systems (2003). The emergence of bioinformatics
provides new challenges and opportunities for computer science educators. This panel
assembles four individuals who collectively have experience teaching bioinformatics at
both liberal arts colleges and universities, and who also have industry experience in
bioinformatics, to discuss various approaches to incorporating bioinformatics into the
undergraduate curriculum.
8)
hMiDas and hMitChip: new opportunities in mitochondrial bioinformatics and genomic
medicine
Alesci, S.; Su, Y.A.; Chrousos, G.P.
Conference: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems ,
Page: 329-34
Editor: Long, R.; Antani, S.; Lee, D.J.; Nutter, B.; Zhang, M.
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xv+603 Pages
Conference: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems ,
Sponsor: IEEE Comput. Soc. Tech. Committee on Computational Medicine, Texas Tech Univ.
College of Eng , 24-25 June 2004 , Bethesda, MD, USA
Language: English
Abstract: We developed a human mitochondria-focused gene database (hMiDas) and
customized cDNA microarray chip (hMitChip) to help biomedical research in mitochondrial
genomics. The current version of hMiDas contains 1,242 gene entries (including mtDNA
genes, nuclear genes related to mitochondria structure and junctions, predicted loci and
experimental genes), organized in 15 categories and 24 subcategories. The database
interface allows keyword-based searches as well as advanced field and/or case-sensitive
searches. Each gene record includes 19 fields, mostly hyperlinked to the corresponding
source. Moreover, for each gene, the user is given the option to run literature search
using PubMed, and gene/protein homology search using BLAST and FASTA. The hMitChip
was constructed using hMiDas as a reference. Currently, it contains a selection of 501
mitochondria-related nuclear genes and 192 control elements, all spotted in duplicate on
glass slides. Slide quality was checked by microarray hybridization with 50 µg of Cy3-labeled sample cDNA and Cy5-labeled comparing cDNA, followed by array scan and
image analysis. The hMitChip was tested in vitro using RNA extracted from cancer cell
lines. Gene expression changes detected by hMitChip were confirmed by quantitative
real-time RT-PCR analysis.
9)
From sequence to structure using PF2: improving methods for protein folding
prediction
Hussain, S.
Conference: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems ,
Page: 323-8
Editor: Long, R.; Antani, S.; Lee, D.J.; Nutter, B.; Zhang, M.
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xv+603 Pages
Conference: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems ,
Sponsor: IEEE Comput. Soc. Tech. Committee on Computational Medicine, Texas Tech Univ.
College of Eng , 24-25 June 2004 , Bethesda, MD, USA
Language: English
Abstract: Projects dependent on proteomic data are challenged not by the lack of
methods to analyze this information, but by the lack of means to capture and manage
the data. A few primary players in the bioinformatics realm are promoting the use of
selected standardized technologies to access biological data. Many organizations
exposing bioinformatics tools, however, do not have the resources required for utilizing
these technologies. In order to provide interfaces for non-standardized bioinformatics
tools, open-source projects have led to the development of hundreds of software
libraries. These tools lack architectural unity, making it difficult to script bioinformatics
research projects, such as protein structure prediction algorithms, which involve the use
of multiple tools in varying order and number. As a solution, we have focused on building
a software model, named the Protein Folding Prediction Framework (PF2), which provides
a unifying method for the addition and usage of connection modules to bioinformatics
databases exposed via Web-based tools, software suites, or e-mail services. The
framework provides mechanisms that allow users to create and add new connections
without supplementary code as well as to introduce entirely new logical scenarios. In
addition, PF2 offers a convenient interface, a multi-threaded execution-engine, and a
built-in visualization suite to provide the bioinformatics community with an end-to-end
solution for performing complex genomic and proteomic inquiries.
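The unifying "connection module" idea described in this abstract can be illustrated with a small registry-plus-pipeline sketch; the module names below are hypothetical and this is not the PF2 code:

```python
# Minimal sketch of a framework where new "connection modules" can be
# registered without changing the pipeline code.

MODULES = {}


def connection_module(name):
    """Decorator that registers a tool wrapper under a given name."""
    def register(func):
        MODULES[name] = func
        return func
    return register


@connection_module("toy-secondary-structure")
def toy_secondary_structure(sequence):
    # Stand-in for a call to a web tool, software suite or e-mail service.
    return {"ss": "C" * len(sequence)}


@connection_module("toy-disorder")
def toy_disorder(sequence):
    return {"disorder": [0.5] * len(sequence)}


def run_pipeline(sequence, steps):
    """Run the named modules in order and collect their results."""
    results = {}
    for step in steps:
        results[step] = MODULES[step](sequence)
    return results


if __name__ == "__main__":
    out = run_pipeline("MKWVTFISLL", ["toy-secondary-structure", "toy-disorder"])
    print(sorted(out))
```

Adding a connection to a new bioinformatics database or tool then amounts to registering one more decorated wrapper, without touching the execution engine.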
10)
Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems
Editor: Long, R.; Antani, S.; Lee, D.J.; Nutter, B.; Zhang, M.
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xv+603 Pages
Conference: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems ,
Sponsor: IEEE Comput. Soc. Tech. Committee on Computational Medicine, Texas Tech Univ.
College of Eng , 24-25 June 2004 , Bethesda, MD, USA
Language: English
Abstract: The following topics are dealt with: medical databases; content-based image
retrieval; medical systems; signal processing; imaging, telemedicine; data mining; image
processing; pattern recognition; segmentation; medical devices; image processing tools;
clinical applications; handheld computing for medicine; decision support systems; and
bioinformatics.
11)
Integrating ontology and workflow in PROTEUS, a grid-based problem solving
environment for bioinformatics
Cannataro, M.; Comito, C.; Guzzo, A.; Veltri, P.
Univ. of Catanzaro, Italy
Conference: Proceedings. ITCC 2004. International Conference on Information Technology:
Coding and Computing
Part: Vol.2 , Page: 90-4 Vol.2
Editor: Srimani, P.K.
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , 1710 Pages
Conference: Proceedings. ITCC 2004. International Conference on Information Technology:
Coding and Computing , Sponsor: IEEE Comput. Soc. Task Force on Information Technology
for Business Application , 5-7 April 2004 , Las Vegas, NV, USA
Language: English
Abstract: Bioinformatics is a bridge between life science and computer science: computer algorithms are needed to face the complexity of biological processes.
Bioinformatics applications manage complex biological data stored into distributed and
often heterogeneous databases and require large computing power. We discuss
requirements of such applications and present the architecture of PROTEUS, a grid-based
problem solving environment that integrates ontology and workflow approaches to
enhance composition and execution of bioinformatics applications on the grid.
12)
Algorithm Theory - SWAT 2004. 9th Scandinavian Workshop on Algorithm Theory.
Proceedings (Lecture Notes in Comput. Sci. Vol.3111)
Editor: Hagerup, T.; Katajainen, J.
Publisher: Springer-Verlag , Berlin, Germany , 2004 , xi+506 Pages
Conference: Algorithm Theory - SWAT 2004. 9th Scandinavian Workshop on Algorithm
Theory. Proceedings , Sponsor: DIKU, Univ. of Southern Denmark, Dept. of Math. and
Comput. Sci., IT Univ. Copenhagen, Danish Nat. Sci. Res. Council, First Graduate School,
LESS, Nokia, SAS , 8-10 July 2004 , Humlebaek, Denmark
Language: English
Abstract: The following topics are dealt with: dynamic multithreaded algorithms; cache-oblivious algorithms and data structures; graphs and trees; optimally competitive list
batching; algorithmic complexity; scheduling; and approximation algorithms.
13)
Proceedings of the IEEE 30th Annual Northeast Bioengineering Conference (IEEE Cat.
No.04CH37524)
Editor: Schreiner, S.; Cezeaux, J.L.; Muratore, D.M.
Publisher: IEEE , Piscataway, NJ, USA , 2004 , xxiii+262 Pages
Conference: Proceedings of the IEEE 30th Annual Northeast Bioengineering Conference ,
Sponsor: BEACON, Tyco Healthcare, Reebok, BEI, The Whitaker Found , 17-18 April 2004 ,
Springfield, MA, USA
Language: English
Abstract: The following topics were dealt with: neural engineering; biomedical
instrumentation; medical imaging; physiological monitoring; cardiovascular
biomechanics; biosensors; bioMEMS; biomaterials tissue and cellular engineering;
rehabilitation engineering; telemedicine and virtual reality in medicine; biomedical
education; pharmaceutical engineering; drug delivery; bio-optics; bioinformatics; surgical
devices; and the medical applications of nanosystems and nanotechnology
14)
Applications of Evolutionary Computing. Evo Workshops 2004: EvoBIO,
EvoCOMNET, EvoHOT, EvoMUSART, and EvoSTOC. Proceedings (Lecture Notes in
Comput. Sci. Vol.3005)
Editor: Raidl, G.R.
Publisher: Springer-Verlag , Berlin, Germany , 2004 , xix+562 Pages
Conference: Applications of Evolutionary Computing. Evo Workshops 2004: EvoBIO,
EvoCOMNET, EvoHOT, EvoMUSART, and EvoSTOC. Proceedings , Sponsor: EvoNET,
Univ. of Coimbra , 5-7 April 2004 , Coimbra, Portugal
Language: English
Abstract: The following topics are dealt with: EvoBIO; evolutionary bioinformatics;
EvoCOMNET; evolutionary computation; communications, networks, and connected
systems; EvoHOT; hardware optimization techniques; binary decision diagrams;
multilayer floorplan layout problem; EvoIASP; image analysis; signal processing; object
recognition systems; EvoMUSART; evolutionary music; evolutionary art; EvoSTOC;
evolutionary algorithms; stochastic environment; optimization problems; and dynamic
environments.
15)
Bioinformatics: a knowledge engineering approach
Kasabov, N.
Sch. of Bus., Auckland Univ. of Technol., New Zealand
Conference: 2004 2nd International IEEE Conference on 'Intelligent Systems'. Proceedings
(IEEE Cat. No.04EX791)
Part: Vol.1 , Page: 19-24 Vol.1
Editor: Yager, R.R.; Sgurev, V.S.
Publisher: IEEE , Piscataway, NJ, USA , 2004 , 756 Pages
Conference: 2004 2nd International IEEE Conference on 'Intelligent Systems'. Proceedings ,
Sponsor: IEEE Instrumentation and Measurement Soc., IEEE IM/CS/SMC Joint Chapter of
Bulgaria , 22-24 June 2004 , Varna, Bulgaria
Language: English
Abstract: The paper introduces the knowledge engineering (KE) approach for the
modeling and the discovery of new knowledge in bioinformatics. This approach extends
the machine learning approach with various rule extraction and other knowledge
representation procedures. Examples of applying the KE approach, and especially one of the recently developed techniques - evolving connectionist systems (ECOS) - to challenging problems in bioinformatics are given; they include DNA sequence analysis, microarray
gene expression profiling, protein structure prediction, finding gene regulatory networks,
medical prognostic systems, computational neurogenetic modeling.
16)
Unordered tree mining with applications to phylogeny
Shasha, D.; Wang, J.T.L.; Sen Zhang
Courant Inst. of Math. Sci., New York Univ., NY, USA
Conference: Proceedings. 20th International Conference on Data Engineering , Page: 708-19
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xx+880 Pages
Conference: Proceedings. 20th International Conference on Data Engineering , Sponsor:
Microsoft Res., bea, IBM, MITRE, Sun Microsystems , 30 March-2 April 2004 , Boston, MA,
USA
Language: English
Abstract: Frequent structure mining (FSM) aims to discover and extract patterns
frequently occurring in structural data, such as trees and graphs. FSM finds many
applications in bioinformatics, XML processing, Web log analysis, and so on. We present a
new FSM technique for finding patterns in rooted unordered labeled trees. The patterns
of interest are cousin pairs in these trees. A cousin pair is a pair of nodes sharing the
same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T,
our algorithm finds all interesting cousin pairs of T in O(|T|/sup 2/) time where |T| is the
number of nodes in T. Experimental results on synthetic data and phylogenies show the
scalability and effectiveness of the proposed technique. To demonstrate the usefulness of
our approach, we discuss its applications to locating co-occurring patterns in multiple
evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding
kernel trees of groups of phylogenies. We also describe extensions of our algorithms for
undirected acyclic graphs (or free trees).
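The cousin-pair notion in this abstract (two nodes sharing a parent, a grandparent, a great-grandparent, and so on) can be illustrated with a naive sketch over an assumed child-to-parent map; this is not the authors' algorithm, just a simple O(|T|^2) enumeration:

```python
from itertools import combinations


def ancestors(node, parent):
    """Yield (ancestor, distance) pairs walking up the parent map."""
    dist = 0
    while node in parent:
        node = parent[node]
        dist += 1
        yield node, dist


def cousin_pairs(parent, max_degree=3):
    """Enumerate node pairs whose nearest common ancestor lies at most
    `max_degree` levels above both of them (1 = siblings, 2 = nodes sharing
    a grandparent, ...). Naive quadratic version for illustration."""
    nodes = set(parent) | set(parent.values())
    pairs = []
    for u, v in combinations(sorted(nodes), 2):
        up = dict(ancestors(u, parent))
        for anc, d_v in ancestors(v, parent):
            if anc in up:
                d = max(up[anc], d_v)
                if d <= max_degree:
                    pairs.append((u, v, d))
                break  # the first shared ancestor found is the nearest one
    return pairs


if __name__ == '__main__':
    # toy rooted tree given as child -> parent
    parent = {'b': 'a', 'c': 'a', 'd': 'b', 'e': 'b', 'f': 'c'}
    print(cousin_pairs(parent, max_degree=2))
```

Frequent cousin pairs across a collection of phylogenies would then point to co-occurring groupings of taxa, which is the application the abstract describes.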
17)
LDC: enabling search by partial distance in a hyper-dimensional space
Koudas, N.; Ooi, B.C.; Shen, H.T.; Tung, A.K.H.
Shannon Lab., AT&T Labs Res., Basking Ridge, NJ, USA
Conference: Proceedings. 20th International Conference on Data Engineering , Page: 6-17
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xx+880 Pages
Conference: Proceedings. 20th International Conference on Data Engineering , Sponsor:
Microsoft Res., bea, IBM, MITRE, Sun Microsystems , 30 March-2 April 2004 , Boston, MA,
USA
Language: English
Abstract: Recent advances in research fields like multimedia and bioinformatics have
brought about a new generation of hyper-dimensional databases which can contain
hundreds or even thousands of dimensions. Such hyper-dimensional databases pose
significant problems to existing high-dimensional indexing techniques which have been
developed for indexing databases with (commonly) less than a hundred dimensions. To
support efficient querying and retrieval on hyper-dimensional databases, we propose a
methodology called local digital coding (LDC) which can support k-nearest neighbors
(KNN) queries on hyper-dimensional databases and yet co-exist with ubiquitous indices,
such as B+-trees. LDC extracts a simple bitmap representation called digital code(DC) for
each point in the database. Pruning during KNN search is performed by dynamically
selecting only a subset of the bits from the DC based on which subsequent comparisons
are performed. In doing so, expensive operations involved in computing L-norm distance
functions between hyper-dimensional data can be avoided. Extensive experiments are
conducted to show that our methodology offers significant performance advantages over
other existing indexing methods on both real life and synthetic hyper-dimensional
datasets.
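The general flavour of a bitmap "digital code" used for pruning, as described in this abstract, can be sketched as follows; this is a simplified illustration under assumed design choices (one bit per dimension, split at the median, Hamming-based pruning on a sampled subset of bits), not the paper's exact LDC scheme:

```python
import numpy as np


def digital_codes(data):
    """One bit per dimension: 1 if the value is above the dimension median."""
    medians = np.median(data, axis=0)
    return data > medians, medians


def knn_with_pruning(data, query, k=3, sample_bits=64, max_mismatch=20):
    """Prune by Hamming distance on a sampled subset of code bits, then
    compute exact Euclidean distances only for the surviving candidates."""
    codes, medians = digital_codes(data)
    qcode = query > medians
    rng = np.random.default_rng(0)
    dims = rng.choice(data.shape[1], size=min(sample_bits, data.shape[1]), replace=False)
    mismatch = (codes[:, dims] != qcode[dims]).sum(axis=1)
    candidates = np.where(mismatch <= max_mismatch)[0]
    if len(candidates) < k:               # fall back if pruning was too aggressive
        candidates = np.arange(len(data))
    d = np.linalg.norm(data[candidates] - query, axis=1)
    return candidates[np.argsort(d)[:k]]


if __name__ == '__main__':
    rng = np.random.default_rng(1)
    points = rng.normal(size=(1000, 500))       # toy hyper-dimensional data
    q = rng.normal(size=500)
    print(knn_with_pruning(points, q))
```

The point of such schemes is that the cheap bitwise comparison avoids most of the expensive L-norm computations in very high-dimensional data, at the cost of approximate pruning.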
18)
Proceedings. 20th International Conference on Data Engineering
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xx+880 Pages
Conference: Proceedings. 20th International Conference on Data Engineering , Sponsor:
Microsoft Res., bea, IBM, MITRE, Sun Microsystems , 30 March-2 April 2004 , Boston, MA,
USA
Language: English
Abstract: The following topics are dealt with: XML; query processing; tree data
structures; database management systems; Internet; indexing; semi-structured data;
data mining; streams; sensors; middleware; workflow; Web data management; security;
data warehouses; OLAP; enterprise systems; scientific and biological databases;
bioinformatics; and clustering.
19)
Design and implementation of a computational grid for bioinformatics
Chao-Tung Yang; Yu-Lun Kuo; Chuan-Lin Lai
Dept. of Comput. Sci. & Inf. Eng., Tunghai Univ., Taichung, Taiwan
Conference: Proceedings. 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service , Page: 448-51
Editor: Yuan, S.-T.; Liu, J.
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xxi+575 Pages
Conference: Proceedings. 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service , Sponsor: IEEE Task Committee of e-Commerce, Fu-Jen Univ. of
Taiwan, BIKMrdc of Fu-Jen Univ., Academia Sinica, Nat. Sci. Council of Taiwan, Ministry of
Educ. of Taiwan, Information Syst. Frontiers, Microsoft, ChungHwa Data Mining Soc , 28-31
March 2004 , Taipei, Taiwan
Language: English
Abstract: Internet computing and grid technologies promise to change the way we tackle complex problems. They enable large-scale aggregation and sharing of computational, data and other resources across institutional boundaries, and harnessing these new technologies effectively transforms scientific disciplines ranging from high-energy physics to the life sciences. The computational analysis of biological sequences is a computation-driven science. Because biological data grow quickly and the databases are heterogeneous, a grid system can be used to share and integrate these heterogeneous biology databases. Bioinformatics tools can speed up the analysis of large-scale sequence data, especially sequence alignment and analysis; FASTA is a tool for aligning multiple protein or nucleotide sequences. The two bioinformatics tools we used are distributed and parallel versions. The software uses a message-passing library called MPI (message passing interface) and runs on distributed workstation clusters as well as on traditional parallel computers. A grid computing environment is proposed and constructed on multiple Linux PC clusters by using the Globus Toolkit (GT) and SUN Grid Engine (SGE). The experimental results and performance of the bioinformatics tools on the grid system are also presented.
20)
Aligning multiple sequences by genetic algorithm
Li-fang Liu; Hong-wei Huo; Bao-shu Wang
Sch. of Comput. Sci. & Technol., Xidian Univ., Xi'an, China
Conference: 2004 International Conference on Communications, Circuits and Systems (IEEE
Cat. No.04EX914)
Part: Vol.2 , Page: 994-8 Vol.2
Publisher: IEEE , Piscataway, NJ, USA , 2004 , 1584 Pages
Conference: 2004 International Conference on Communications, Circuits and Systems ,
Sponsor: Ministry of Educ. (MOE) of PR China, City Univ. of Hong Kong, K.C. Wong Educ.
Found , 27-29 June 2004 , Chengdu, China
Language: English
Abstract: The paper presents a genetic algorithm for solving multiple sequence
alignment in bioinformatics. The algorithm involves four different operators, one type of
selection operator, two types of crossover operators, and one type of mutation operator;
the mutation operator is realized by a dynamic programming method. Experimental
results of benchmarks from the BAliBASE show that the proposed algorithm is feasible for
aligning equidistant protein sequences, and the quality of alignment is comparable to
that obtained with ClustalX.
21)
Algorithms for estimating information distance with application to bioinformatics and
linguistics
Kaitchenko, A.
Dept. of Phys. & Comput., Wilfrid Laurier Univ., Waterloo, Ont., Canada
Conference: Canadian Conference on Electrical and Computer Engineering 2004 (IEEE Cat.
No.04CH37513)
Part: Vol.4 , Page: 2255-8 Vol.4
Publisher: IEEE , Piscataway, NJ, USA , 2004 , 2908 Pages
Conference: Canadian Conference on Electrical and Computer Engineering 2004 , Sponsor:
Cisco Syst., General Elec., Ryerson Univ., AVFX Audio Visual, Bell Canada, Dofasco, Dye &
Durham, Gennum Corp., IEEE Canada Found., Univ. of Toronto, Niagara College of Appl.
Arts and Technol , 2-5 May 2004 , Niagara Falls, Ont., Canada
Language: English
Abstract: We review unnormalized and normalized information distances based on
incomputable notions of Kolmogorov complexity and discuss how Kolmogorov complexity
can be approximated by data compression algorithms. We argue that optimal algorithms
for data compression with side information can be successfully used to approximate the
normalized distance. Next, we discuss an alternative information distance, which is based
on relative entropy rate (also known as Kullback-Leibler divergence), and compression-
based algorithms for its estimation. We conjecture that in bioinformatics and
computational linguistics this alternative distance is more relevant and important than
the ones based on Kolmogorov complexity.
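The compression-based approximation of the information distance mentioned in this abstract is often computed as the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is a compressed length; a minimal sketch using zlib (a generic illustration, not the authors' algorithms) is:

```python
import zlib


def clen(data: bytes) -> int:
    """Compressed length, used as a rough stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: values near 0 mean 'similar',
    values near (or slightly above) 1 mean 'unrelated'."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == '__main__':
    seq1 = b"ACGTACGTACGTACGTGGGTTTACGT" * 20
    seq2 = b"ACGTACGTACGTACGTGGGTTTACGA" * 20   # one substitution per repeat
    seq3 = b"TTTTGGGGCCCCAAAATTGGCCAATT" * 20
    print(round(ncd(seq1, seq2), 3), round(ncd(seq1, seq3), 3))
```

The same scheme works on natural-language text, which is why the abstract mentions both bioinformatics and computational linguistics as application areas.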
22)
Predicting the three-dimensional structures of proteins: combined alignment approach
Jaehyun Sim; Seung-Yeon Kim; Jooyoung Lee; Ahrim Yoo
Sch. of Comput. Sci., Korea Inst. for Adv. Study, Seoul, South Korea
Journal of the Korean Physical Society
Conference: J. Korean Phys. Soc. (South Korea) , vol.44, no.3, pt.1 , Page: 611-16
Publisher: Korean Phys. Soc , March 2004
Conference: 12th Thermal and Statistical Physics Workshop , 19-21 Aug. 2003 , Suanbo,
Chungbuk, South Korea
Language: English
Abstract: Protein structure prediction is a great challenge in molecular biophysics and
bioinformatics. Most approaches to structure prediction use known structure information
from the Protein Data Bank (PDB). In these approaches, it is most crucial to find a
homologous protein (template) from the PDB to a query sequence and to align the query
sequence to the template sequence. We propose a profile-profile alignment method
based on the cosine similarity criterion, and combine this with a sequence-profile
alignment, the secondary structure prediction of the query protein, and the experimental
secondary structure of the template protein. Our method, which we call combined
alignment, provides good results for the 1107 query-template pairs of the SCOP database
and the CASP5 target proteins. They show that combined alignment significantly
improves the recognition of distant homology.
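One common way to realise the cosine similarity criterion mentioned in this abstract is to score a pair of profile columns by the cosine of the angle between their 20-dimensional amino-acid frequency vectors; a minimal sketch of such a column-score matrix (illustrative only, not the authors' combined-alignment method) is:

```python
import numpy as np


def cosine_score_matrix(profile_a, profile_b):
    """Pairwise cosine similarity between profile columns.

    profile_a: (len_a, 20) per-position amino-acid frequency vectors
    profile_b: (len_b, 20)
    Returns a (len_a, len_b) score matrix that a standard dynamic-programming
    alignment (e.g. Needleman-Wunsch with gap penalties) could then consume.
    """
    a = profile_a / (np.linalg.norm(profile_a, axis=1, keepdims=True) + 1e-12)
    b = profile_b / (np.linalg.norm(profile_b, axis=1, keepdims=True) + 1e-12)
    return a @ b.T


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    pa = rng.random((5, 20))   # toy query profile
    pb = rng.random((7, 20))   # toy template profile
    print(cosine_score_matrix(pa, pb).shape)   # (5, 7)
```

Secondary-structure agreement terms, as described in the abstract, would be added to these column scores before the alignment is computed.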
23)
The role of computer science in undergraduate bioinformatics education
Burhans, D.T.; Skuse, G.R.
Dept. of Comput. Sci., Canisius Coll., Buffalo, NY, USA
SIGCSE Bulletin
Conference: SIGCSE Bull. (USA) , vol.36, no.1 , Page: 417-21
Publisher: ACM , March 2004
Conference: Thirty-Fifth SIGCSE Technical Symposium on Computer Science Education ,
Sponsor: ACM Special Interest Group on Comput. Sci. Educ , 3-7 March 2004 , Norfolk, VA,
USA
Language: English
Abstract: The successful implementation of educational programs in bioinformatics
presents many challenges. The interdisciplinary nature of bioinformatics requires close
cooperation between computer scientists and biologists despite inescapable differences in
the ways in which members of these professions think. It is clear that the development of
quality curricula for bioinformatics must draw upon the expertise of both disciplines. In
addition, biologists and computer scientists can benefit from opportunities to carry out
interdisciplinary research with one another. This paper examines the role of computer
science in undergraduate bioinformatics education from the perspectives of two
bioinformatics program directors. Their respective programs exemplify two substantively
different approaches to undergraduate education in bioinformatics due to the fact that
they are at markedly different institutions. One institution is a large, technical university,
offering both undergraduate and graduate degrees in bioinformatics while the other is a
small, Jesuit liberal arts college with an undergraduate program in bioinformatics.
Despite these differences there is considerable overlap with respect to the role of
computer science. This paper discusses the ways in which computer science has been
integrated into these two undergraduate bioinformatics programs, compares alternative
approaches, and presents some of the inherent challenges.
24)
Challenges posed by adoption issues from a bioinformatics point of view
Moise, D.L.; Wong, K.; Moise, G.
Dept. of Comput. Sci., Alberta Univ., Edmonton, Alta., Canada
Conference: "Fourth International Workshop on Adoption-Centric Software Engineering
(ACSE 2004)" W6S Workshop - 26th International Conference on Software Engineering ,
Page: 75-9
Publisher: IEE , Stevenage, UK , 2004 , vi+85 Pages
Conference: "Fourth International Workshop on Adoption-Centric Software Engineering
(ACSE 2004)" W6S Workshop - 26th International Conference on Software Engineering ,
Sponsor: IEEE Comput. Soc., SIGSOFT, IEE , 25 May 2004 , Edinburgh, Scotland, UK
Language: English
Abstract: Developing interoperability models for data is a crucial factor for the adoption
of research tools within industry. In this paper, we discuss efficient data interoperability
models within a field where they are highly needed: the bioinformatics field. We present
the challenges that interoperability models for data must face within this field and we
discuss some existing strategies built to address these challenges. The potential of a
semi-structured data model based on XML is discussed. Also, a novel approach that
enhances the capabilities of the data integration model by automatically identifying XML
documents generated based on the same DTD is presented. Practices developed within
this application domain can be used for the benefit of similar adoption issues in various
other domains.
25)
Software engineering challenges in bioinformatics
Barker, J.; Thornton, J.
Eur. Bioinformatics Inst., Cambridge, UK
Conference: Proceedings. 26th International Conference on Software Engineering , Page:
12-15
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xviii+786 Pages
Conference: Proceedings. 26th International Conference on Software Engineering , Sponsor:
IEE, Assoc. for Comput. Machinery Special Interest Group on Software Eng., IEEE Comput.
Soc , 23-28 May 2004 , Edinburgh, UK
Language: English
Abstract: Data from biological research is proliferating rapidly and advanced data
storage and analysis methods are required to manage it. We introduce the main sources
of biological data available and outline some of the domain specific problems associated
with automated analysis. We discuss two major areas in which we are likely to experience
software engineering challenges over the next ten years: data integration and
presentation.
26)
BLID: an application of logical information systems to bioinformatics
Ferre, S.; King, R.D.
Dept. of Comput. Sci., Wales Univ., Aberystwyth, UK
Conference: Concept Lattices. Second International Conference on Formal Concept Analysis,
ICFCA 2004. Proceedings (Lecture Notes in Artificial Intelligence Vol.2961) , Page: 47-54
Editor: Eklund, P.
Publisher: Springer-Verlag , Berlin, Germany , 2004 , ix+409 Pages
Conference: Concept Lattices. Second International Conference on Formal Concept Analysis,
ICFCA 2004. Proceedings , 23-26 Feb. 2004 , Sydney, NSW, Australia
Language: English
Abstract: BLID (bio-logical intelligent database) is a bioinformatic system designed to
help biologists extract new knowledge from raw genome data by providing high-level
facilities for both data browsing and analysis. We describe BLID's novel data browsing
system which is based on the idea of logical information systems. This enables combined
querying and navigation of data in BLID (extracted from public bioinformatic
repositories). The browsing language is a logic especially designed for bioinformatics. It
currently includes sequence motifs, taxonomies, and macromolecule structures, and it is
designed to be easily extensible, as it is composed of reusable components. Navigation is
tightly combined with this logic, and assists users in browsing a genome through a form
of human-computer dialog.
27)
The automatic generation of programs for classification problems with grammatical
swarm
O'Neill, M.; Brabazon, A.; Adley, C.
Biocomputing & Dev. Syst. Group, Univ. of Limerick, Ireland
Conference: Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat.
No.04TH8753)
Part: Vol.1 , Page: 104-10 Vol.1
Publisher: IEEE , Piscataway, NJ, USA , 2004 , xxx+2371 Pages
Conference: Proceedings of the 2004 Congress on Evolutionary Computation , Sponsor:
IEEE Neural Network Soc., Evolutionary Programming Soc., IEE , 19-23 June 2004 ,
Portland, OR, USA
Language: English
Abstract: This case study examines the application of grammatical swarm to
classification problems, and illustrates the particle swarm algorithms' ability to specify
the construction of programs. Each individual particle represents choices of program
construction rules, where these rules are specified using a Backus-Naur Form grammar.
Two problem instances are tackled, the first a mushroom classification problem, the
second a bioinformatics problem that involves the detection of eukaryotic DNA promoter
sequences. For the first problem we generate solutions that take the form of conditional
statements in a C-like language subset, and for the second problem we generate simple
regular expressions. The results demonstrate that it is possible to generate programs
using the grammatical swarm technique with a performance similar to the grammatical
evolution evolutionary automatic programming approach.
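Illustrative note: the abstract above relies on mapping a particle's integer vector through a Backus-Naur Form grammar to construct a program. A minimal Python sketch of that genotype-to-phenotype mapping follows; the toy grammar, the codon vector and the test string are hypothetical, not those of the paper.

# Minimal sketch of grammar-driven program construction as used in grammatical
# evolution / grammatical swarm: an integer vector ("codons", e.g. a rounded
# particle position) picks production rules from a BNF-style grammar to derive
# a small regular expression. The grammar below is illustrative only.
GRAMMAR = {
    "<expr>": [["<base>", "<expr>"], ["<base>"]],
    "<base>": [["A"], ["C"], ["G"], ["T"], ["[AT]"], ["[CG]"], ["."]],
}

def derive(codons, start="<expr>", max_steps=50):
    """Expand the leftmost non-terminal repeatedly, choosing each rule with
    codon % number_of_rules."""
    symbols = [start]
    i = 0
    for _ in range(max_steps):
        try:
            nt_index = next(k for k, s in enumerate(symbols) if s in GRAMMAR)
        except StopIteration:
            break  # fully expanded
        rules = GRAMMAR[symbols[nt_index]]
        choice = rules[codons[i % len(codons)] % len(rules)]
        symbols[nt_index:nt_index + 1] = choice
        i += 1
    # Drop any unexpanded non-terminals left over when max_steps is reached.
    return "".join(s for s in symbols if s not in GRAMMAR)

if __name__ == "__main__":
    import re
    pattern = derive([0, 4, 0, 6, 1, 2])   # hypothetical particle position
    print("derived regex:", pattern)
    print("matches TAAGCT:", bool(re.search(pattern, "TAAGCT")))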
28)
Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat.
No.04TH8753)
Part: Vol.1
Publisher: IEEE , Piscataway, NJ, USA , 2004 , xxx+2371 Pages
Conference: Proceedings of the 2004 Congress on Evolutionary Computation , Sponsor:
IEEE Neural Network Soc., Evolutionary Programming Soc., IEE , 19-23 June 2004 ,
Portland, OR, USA
Language: English
Abstract: The following topics are discussed: evolutionary multiobjective optimization;
evolutionary algorithms; combinatorial and numerical optimization; swarm intelligence;
evolutionary computation and games; evolutionary computation in bioinformatics and
computational biology; evolutionary design; evolutionary computing in the process
industry; evolutionary computation in finance and economics; evolutionary scheduling;
evolutionary design and evolvable hardware; evolutionary design automation;
evolutionary computation in cryptology and computer security; learning and
approximation in design optimization; and coevolution and collective behavior.
29)
Construct a grid computing environment for bioinformatics
Yu-Lun Kuo; Chao-Tung Yang; Chuan-Lin Lai; Tsai-Ming Tseng
Dept. of Comput. Sci. & Inf. Eng., Tunghai Univ., Taichung, Taiwan
Conference: Proceedings. 7th International Symposium on Parallel Architectures, Algorithms
and Networks. I-SPAN'04 , Page: 339-44
Editor: Hsu, D.F.; Hiraki, K.; Shen, S.; Sudborough, H.
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xvi+645 Pages
Conference: Proceedings. 7th International Symposium on Parallel Architectures, Algorithms
and Networks. I-SPAN'04 , Sponsor: Univ. of Hong Kong , 10-12 May 2004 , Hong Kong,
China
Language: English
Abstract: Internet computing and grid technologies promise to change the way we
tackle complex problems. They will enable large-scale aggregation and sharing of
computational, data and other resources across institutional boundaries, and harnessing
these new technologies effectively will transform scientific disciplines ranging from
high-energy physics to the life sciences. The computational analysis of biological
sequences is a kind of computation-driven science. Because biological data are growing
quickly and the databases are heterogeneous, a grid system can be used to share and
integrate these heterogeneous biology databases. As is well known, bioinformatics tools
can speed up the analysis of large-scale sequence data, especially sequence alignment.
FASTA is a tool for aligning multiple protein or nucleotide sequences; the version we
used is distributed and parallel. The software uses a message-passing library called MPI
(Message Passing Interface) and runs on distributed workstation clusters as well as on
traditional parallel computers. A grid computing environment is proposed and
constructed on multiple Linux PC clusters by using Globus Toolkit (GT) and SUN Grid
Engine (SGE). The experimental results and performance of the bioinformatics tool
running on the grid system are also presented in this paper.
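Illustrative note: the abstract above runs a parallel FASTA over MPI. The sketch below (assuming mpi4py and an MPI runtime are available) shows only the general master/worker partitioning pattern; the toy scoring function is a stand-in, not FASTA.

# Run with e.g.: mpiexec -n 4 python fasta_grid_sketch.py
from mpi4py import MPI

def toy_score(query: str, target: str) -> int:
    """Placeholder scoring: count shared 3-mers (not the FASTA algorithm)."""
    q = {query[i:i + 3] for i in range(len(query) - 2)}
    return sum(1 for i in range(len(target) - 2) if target[i:i + 3] in q)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    database = [("seq%d" % i, "ATGCGT" * (i + 1)) for i in range(8)]  # toy data
    chunks = [database[i::size] for i in range(size)]                 # round-robin split
else:
    chunks = None

query = "ATGCGTATG"
local = comm.scatter(chunks, root=0)                  # rank 0 distributes the database
local_hits = [(name, toy_score(query, seq)) for name, seq in local]
all_hits = comm.gather(local_hits, root=0)            # collect partial results

if rank == 0:
    ranked = sorted((h for part in all_hits for h in part), key=lambda x: -x[1])
    print(ranked[:3])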
30)
Proceedings. 7th International Symposium on Parallel Architectures, Algorithms and
Networks. I-SPAN'04
Editor: Hsu, D.F.; Hiraki, K.; Shen, S.; Sudborough, H.
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xvi+645 Pages
Conference: Proceedings. 7th International Symposium on Parallel Architectures, Algorithms
and Networks. I-SPAN'04 , Sponsor: Univ. of Hong Kong , 10-12 May 2004 , Hong Kong,
China
Language: English
Abstract: The following topics are dealt with: routing; wireless networks; content
distribution; parallel algorithms; interconnection networks; fault tolerance; graphs; load
balancing; semantic Web; data distribution; communication performance; parallel
architecture; Internet technology and applications; quality of service; optical networks;
mobile computing; network security and management; and bioinformatics.
31)
Experiences on adaptive grid scheduling of parameter sweep applications
Huedo, E.; Montero, R.S.; Llorente, I.M.
Lab. Computacion Avanzada, CSIC-INTA, Torrejon de Ardoz, Spain
Conference: Proceedings. 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing , Page: 28-33
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xiii+442 Pages
Conference: Proceedings. 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing , 11-13 Feb. 2004 , Coruna, Spain
Language: English
Abstract: Grids offer a dramatic increase in the number of available compute and
storage resources that can be delivered to applications. This new computational
infrastructure provides a promising platform to execute loosely coupled, high-throughput
parameter sweep applications. This kind of application arises naturally in many scientific
and engineering fields like bioinformatics, computational fluid dynamics (CFD), particle
physics, etc. The efficient execution and scheduling of parameter sweep applications is
challenging because of the dynamic and heterogeneous nature of grids. We present a
scheduling algorithm built on top of the GridWay framework that combines: (i) adaptive
scheduling to reflect the dynamic grid characteristics; (ii) adaptive execution to migrate
running jobs to better resources and provide fault tolerance; (iii) re-use of common files
between tasks to reduce the file transfer overhead. The efficiency of the approach is
demonstrated in the execution of a CFD application on a highly heterogeneous research
testbed.
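Illustrative note: the abstract above combines adaptive scheduling with re-use of common files. The Python sketch below is not GridWay; it only illustrates those two ideas with hypothetical hosts, costs and speeds.

# Illustrative sketch: re-rank resources on every scheduling pass (adaptive
# scheduling) and account for a common input file already staged on a host
# (file re-use) when placing parameter-sweep tasks.
import random

COMMON_FILE_COST = 5.0   # hypothetical transfer time of the shared input file

def schedule(tasks, hosts):
    staged = set()                      # hosts that already hold the common file
    finish_time = {h: 0.0 for h in hosts}
    plan = []
    for task in tasks:
        # "Adaptive": query current host speed before every placement decision.
        speed = {h: random.uniform(0.5, 2.0) for h in hosts}
        def cost(h):
            transfer = 0.0 if h in staged else COMMON_FILE_COST
            return finish_time[h] + transfer + task / speed[h]
        best = min(hosts, key=cost)
        finish_time[best] = cost(best)
        staged.add(best)
        plan.append((task, best))
    return plan, max(finish_time.values())

if __name__ == "__main__":
    random.seed(0)
    tasks = [10.0] * 12                         # a 12-point parameter sweep
    plan, makespan = schedule(tasks, ["hostA", "hostB", "hostC"])
    for task, host in plan:
        print(f"task({task}) -> {host}")
    print("estimated makespan:", round(makespan, 1))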
32)
Asynchronous HMM with applications to speech recognition
Garg, A.; Balakrishnan, S.; Vaithyanathan, S.
Almaden Res. Center, San Jose, CA, USA
Conference: 2004 IEEE International Conference on Acoustics, Speech, and Signal
Processing
Part: vol.1 , Page: I-1009-12 vol.1
Publisher: IEEE , Piscataway, NJ, USA , 2004 , 5 vol. (cix+1045) Pages
Conference: 2004 IEEE International Conference on Acoustics, Speech, and Signal
Processing , 17-21 May 2004 , Montreal, Que., Canada
Language: English
Abstract: We develop a novel formalism for modeling speech signals which are
irregularly or incompletely sampled. This situation can arise in real world applications
where the speech signal is being transmitted over an error prone channel where parts of
the signal can be dropped. Typical speech systems based on hidden Markov models
cannot handle such data since HMMs rely on the assumption that observations are
complete and made at regular intervals. We introduce the asynchronous HMM, a variant
of the inhomogeneous HMM commonly used in bioinformatics, and show how it can be
used to model irregularly or incompletely sampled data. A nested EM algorithm is
presented in brief which can be used to learn the parameters of this asynchronous HMM.
Evaluation on real world speech data, which has been modified to simulate channel
errors, shows that this model and its variants significantly outperform the standard HMM
and methods based on data interpolation.
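Illustrative note: the abstract above handles irregularly or incompletely sampled observations in an HMM. The Python sketch below shows the general idea only (not the paper's asynchronous HMM or its nested EM): a forward pass in which a missing observation contributes no emission factor, i.e. the emission is marginalized out at that time step. All numbers are toy values.

import numpy as np

def forward_with_gaps(obs, pi, A, B):
    """obs: list of symbol indices or None; pi: (S,), A: (S,S), B: (S,V)."""
    alpha = pi * (B[:, obs[0]] if obs[0] is not None else 1.0)
    for o in obs[1:]:
        alpha = alpha @ A              # propagate through the transition matrix
        if o is not None:
            alpha = alpha * B[:, o]    # apply the emission only when observed
    return alpha.sum()                 # likelihood of the partially observed sequence

if __name__ == "__main__":
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
    B = np.array([[0.9, 0.1],          # emission probabilities over 2 symbols
                  [0.2, 0.8]])
    # Third observation was dropped by the channel.
    print(forward_with_gaps([0, 1, None, 0], pi, A, B))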
33)
An asynchronous GALS interface with applications
Smith, S.F.
Electr. & Comput. Eng. Dept., Boise State Univ., USA
Conference: 2004 IEEE Workshop on Microelectronics and Electron Devices (IEEE Cat.
No.04EX810) , Page: 41-4
Publisher: IEEE , Piscataway, NJ, USA , 2004 , xii+136 Pages
Conference: 2004 IEEE Workshop on Microelectronics and Electron Devices , 16 April 2004
, Boise, ID, USA
Language: English
Abstract: A low-latency asynchronous interface for use in globally-asynchronous locally-synchronous (GALS) integrated circuits is presented. The interface is compact and does
not alter the local clocks of the interfaced local clock domains in any way (unlike many
existing GALS interfaces). Two applications of the interface to GALS systems are shown.
The first is a single-chip shared-memory multiprocessor for generic supercomputing use.
The second is an application-specific coprocessor for hardware acceleration of the Smith-Waterman algorithm. This is a bioinformatics algorithm used for sequence alignment
(similarity searching) between DNA or amino acid (protein) sequences and sequence
databases such as the recently completed human genome database.
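Illustrative note: the coprocessor in the abstract above accelerates Smith-Waterman local alignment. A minimal software reference version of that algorithm in Python follows, with a linear gap penalty and toy scoring values chosen for illustration.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment never drops below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

if __name__ == "__main__":
    print(smith_waterman("ACACACTA", "AGCACACA"))   # prints the best local score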
34)
Computational Methods for SNPs and Haplotype Inference. DIMACS/RECOMB
Satellite Workshop. Revised Papers. (Lecture Notes in Bioinformatics Vol.2983)
Editor: Istrail, S.; Waterman, M.; Clark, A.
Publisher: Springer-Verlag , Berlin, Germany , 2004 , ix+152 Pages
Conference: Computational Methods for SNPs and Haplotype Inference. DIMACS/RECOMB
Satellite Workshop. Revised Papers , 21-22 Nov. 2002 , Piscataway, NJ, USA
Language: English
Abstract: The conference focused on methods for SNP and haplotype analysis and their
applications to disease associations. The ability to score large numbers of DNA variants
(SNPs) in large samples of humans is rapidly accelerating, as is the demand to apply
these data to tests of association with diseased states. The problem suffers from
excessive dimensionality, so any means of reducing the number of dimensions of the
space of genotype classes in a biologically meaningful way would likely be of benefit.
Linked SNPs are often statistically associated with one another (in "linkage
disequilibrium"), and the number of distinct configurations of multiple tightly linked SNPs
in a sample is often far lower than one would expect from independent sampling. These
joint configurations, or haplotypes, might be a more biologically meaningful unit, since
they represent sets of SNPs that co-occur in a population. Recently there has been much
excitement over the idea that such haplotypes occur as blocks across the genome, as
these blocks suggest that fewer distinct SNPs need to be scored to capture the
information about genotype identity. There is need for formal analysis of this dimension
reduction problem, for formal treatment of the hierarchical structure of haplotypes, and
for consideration of the utility of these approaches toward meeting the end goal of
finding genetic variants associated with complex diseases.
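Illustrative note: the abstract above centres on linkage disequilibrium between linked SNPs. A small worked Python example of the standard D and r^2 statistics computed from haplotype counts follows; the counts are toy numbers, purely illustrative.

def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """D and r^2 for two biallelic SNPs from the four haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n          # allele frequency at the first SNP
    p_B = (n_AB + n_aB) / n          # allele frequency at the second SNP
    D = n_AB / n - p_A * p_B         # deviation from independence
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, r2

if __name__ == "__main__":
    D, r2 = ld_stats(n_AB=474, n_Ab=26, n_aB=26, n_ab=474)
    print(f"D = {D:.3f}, r^2 = {r2:.3f}")   # strongly associated SNPs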
35)
IT service infrastructure for integrative systems biology
Curcin, Vasa; Ghanem, Moustafa; Guo, Yike; Rowe, Anthony; He, Wayne; Pei, Hao; Qiang,
Lu; Li, Yuanyuan
Department of Computing Imperial College London, London SW7 2BZ, United Kingdom
Conference: Proceedings - 2004 IEEE International Conference on Services Computing, SCC
2004 , Shanghai, China , 20040915-20040918 , (Sponsor: IEEE Computer Society, TSC-SC;
IBM T.J. Watson Research Center; Shanghai Jiao Tong University (SJTU), China; University
of Hong Kong, E-Business Technology Institute, China)
Proceedings - 2004 IEEE International Conference on Services Computing, SCC 2004
Proceedings - 2004 IEEE International Conference on Services Computing, SCC 2004 2004. ,
2004
Language: English
Abstract: Despite the large number of software tools and hardware platforms aiming to
solve the problems that bioinformatics is facing today, there is no platform solution that
can scale up to its demands, in terms of both scope and sheer volume. DiscoveryNet
scientific workflow system is here extended into a service-centric component architecture
that brings together cross-domain applications through web and grid services and
composes them as novel service offerings. Two case studies implemented on top of the
platform, SARS analysis and microarray/metabonomics, are described.
36)
Integrating text mining into distributed bioinformatics workflows: A Web services
implementation
Gaizauskas, Rob; Davis, Neil; Demetriou, George; Guo, Yikun; Roberts, Ian
Department of Computer Science University of Sheffield, Sheffield, United Kingdom
Conference: Proceedings - 2004 IEEE International Conference on Services Computing, SCC
2004 , Shanghai, China , 20040915-20040918 , (Sponsor: IEEE Computer Society, TSC-SC;
IBM T.J. Watson Research Center; Shanghai Jiao Tong University (SJTU), China; University
of Hong Kong, E-Business Technology Institute, China)
Proceedings - 2004 IEEE International Conference on Services Computing, SCC 2004
Proceedings - 2004 IEEE International Conference on Services Computing, SCC 2004 2004. ,
2004
Language: English
Abstract: Workflows are useful ways to support scientific researchers in carrying out
repetitive analytical tasks on digital information. Web services can provide a useful
implementation mechanism for workflows, particularly when they are distributed, i.e.,
where some of the data or processing resources are remote from the scientist initiating
the workflow. While many scientific workflows primarily involve operations on structured
or numerical data, all interpretation of results is done in the context of related work in
the field, as reflected in the scientific literature. Text mining technology can assist in
automatically building helpful pathways into the relevant literature as part of a workflow
in order to support the scientific discovery process. In this paper we demonstrate how
these three technologies - workflows, text mining, and web services - can be fruitfully
combined in order to support bioinformatics researchers investigating the genetic basis of
two physiological disorders - Graves' disease and Williams syndrome.
37)
Bioinformatics and Systems Biology, rapidly evolving tools for interpreting plant
response to global change
Blanchard, Jeffrey L.
Conference: Linking Functional Genomics with Physiology for Global Change , Denver, CO,
United States , 20031105-20031105
Field Crops Research v 90 n 1 Nov 8 2004. p 117-131 , 2004
Language: English
Abstract: Global change is impacting the evolutionary trajectory of our planet's biota. In
spite of the widely appreciated magnitude of this process, we still have a limited ability to
estimate biological effects of increased atmospheric CO2 or of climate change. Many
new molecular techniques, including microarrays and metabolic profiling, are emerging
that allow the direct observation of the vast repertoire of an organism's cellular processes
in laboratory and ecological settings. The challenge now is to integrate these large data
sets containing spatial and temporal components into models that enable us to explain
how organisms respond to increased atmospheric CO2 and eventually to develop
models that accurately predict their evolutionary trajectory. In response, the field of
bioinformatics is expanding to better facilitate information transfer between laboratory
experiments and mathematical modeling in support of the emerging field of Systems
Biology. © 2004 Elsevier B.V. All rights reserved.
38)
Integration of genomics approach with traditional breeding towards improving abiotic
stress adaptation: Drought and aluminum toxicity as case studies
Ishitani, Manabu; Rao, Idupulapati; Wenzl, Peter; Beebe, Steve; Tohme, Joe
Conference: Linking Functional Genomics with Physiology for Global Change , Denver, CO,
United States , 20031105-20031105
Field Crops Research v 90 n 1 Nov 8 2004. p 35-45 , 2004
Language: English
Abstract: Traditional breeding efforts are expected to be greatly enhanced through
collaborative approaches incorporating functional, comparative and structural genomics.
Potential benefits of combining genomic tools with traditional breeding have been a
source of widespread interest and resulted in numerous efforts to achieve the desired
synergy among disciplines. The International Center for Tropical Agriculture (CIAT) is
applying functional genomics by focusing on characterizing genetic diversity for crop
improvement in common bean (Phaseolus vulgaris L.), cassava (Manihot esculenta
Crantz), tropical grasses, and upland rice (Oriza sativa L.). This article reviews how CIAT
combines genomic approaches, plant breeding, and physiology to understand and exploit
underlying genetic mechanisms of abiotic stress adaptation for crop improvement. The
overall CIAT strategy combines both bottom-up (gene to phenotype) and top-down
(phenotype to gene) approaches by using gene pools as sources for breeding tools. The
strategy offers broad benefits by combining not only in-house crop knowledge, but
publicly available knowledge from well-studied model plants such as arabidopsis
[Arabidopsis thaliana (L.) Heynh.]. Successfully applying functional
genomics in trait gene discovery requires diverse genetic resources, crop phenotyping,
genomics tools integrated with bioinformatics and proof of gene function in planta (proof
of concept). In applying genomic approaches to crop improvement, two major gaps
remain. The first gap lies in understanding the desired phenotypic trait of crops in the
field and enhancing that knowledge through genomics. The second gap concerns
mechanisms for applying genomic information to obtain improved crop phenotypes. A
further challenge is to effectively combine different genomic approaches, integrating
information to maximize crop improvement efforts. Research at CIAT on drought
tolerance in common bean and aluminum resistance in tropical forage grasses (Brachiaria
spp.) is used to illustrate the opportunities and constraints in breeding for adaptation to
abiotic stresses.
39)
From sequence to structure using PF2: Improving methods for protein folding
prediction
Hussain, Saleem
Conference: Proceedings 17th IEEE Symposium on Computer-Based Medical Systems, CBMS
2004 , Bethesda, MD, United States , 20040624-20040625 , (Sponsor: IEEE Computer
Society; Texas Tech University College of Engineering)
Proceedings of the IEEE Symposium on Computer-Based Medical Systems Proceedings 17th
IEEE Symposium on Computer-Based Medical Systems, CBMS 2004 v 17 2004. , 2004
Language: English
Abstract: Projects dependent on proteomic data are challenged not by the lack of
methods to analyze this information, but by the lack of means to capture and manage
the data. A few primary players in the bioinformatics realm are promoting the use of
selected standardized technologies to access biological data. Many organizations
exposing bioinformatics tools, however, do not have the resources required for utilizing
these technologies. In order to provide interfaces for non-standardized bioinformatics
tools, open-source projects have led to the development of hundreds of software
libraries. These tools lack architectural unity, making it difficult to script bioinformatics
research projects, such as protein structure prediction algorithms, which involve the use
of multiple tools in varying order and number. As a solution, we have focused on building
a software model, named the Protein Folding Prediction Framework (PF2), which provides
a unifying method for the addition and usage of connection modules to bioinformatics
databases exposed via web-based tools, software suites, or e-mail services. The
framework provides mechanisms that allow users to create and add new connections
without supplementary code as well as to introduce entirely new logical scenarios. In
addition, PF2 offers a convenient interface, a multi-threaded execution-engine, and a
built-in visualization suite to provide the bioinformatics community with an end-to-end
solution for performing complex genomic and proteomic inquiries.
40)
26th international conference on software engineering: ICSE 2004
Anon (Ed.)
Conference: Proceedings - 26th International Conference on Software Engineering, ICSE
2004 , Edinburgh, United Kingdom , 20040523-20040528 , (Sponsor: Institution of Electrical
Engineers, IEE; British Computer Society, BCS; Association for Computing Machinery, ACM
SIGSOFT; Association for Computing Machinery, ACM SIGPLAN; IEEE Computer Society
Technical Council on Software Engineering)
Proceedings - International Conference on Software Engineering Proceedings - 26th
International Conference on Software Engineering, ICSE 2004 v 26 2004. , 2004
Language: English
Abstract: The proceedings contain 122 papers from the 26th International
Conference on Software Engineering: ICSE 2004. The topics discussed include:
Controlling the complexity of software designs; software engineering challenges in
bioinformatics; adding high availability and autonomic behavior to Web services; grid
small and large: distributed systems and global communities; a model driven approach
for software systems reliability; component-based self-adaptability in peer-to-peer
architectures and one more step in the direction of modularized integration concerns.
41)
Software engineering challenges in bioinformatics
Barker, Jonathan; Thornton, Janet
European Bioinformatics Institute Wellcome Trust Genome Campus, Cambridge CB10 1SD,
United Kingdom
Conference: Proceedings - 26th International Conference on Software Engineering, ICSE
2004 , Edinburgh, United Kingdom , 20040523-20040528 , (Sponsor: Institution of Electrical
Engineers, IEE; British Computer Society, BCS; Association for Computing Machinery, ACM
SIGSOFT; Association for Computing Machinery, ACM SIGPLAN; IEEE Computer Society
Technical Council on Software Engineering)
Proceedings - International Conference on Software Engineering Proceedings - 26th
International Conference on Software Engineering, ICSE 2004 v 26 2004. , 2004
Language: English
Abstract: Data from biological research is proliferating rapidly and advanced data
storage and analysis methods are required to manage it. We introduce the main sources
of biological data available and outline some of the domain-specific problems associated
with automated analysis. We discuss two major areas in which we are likely to experience
software engineering challenges over the next ten years: data integration and
presentation.
42)
hMiDas and hMitChip: New opportunities in mitochondrial bioinformatics and genomic
medicine
Alesci, Salvatore; Su, Yan A.; Chrousos, George P.
Conference: Proceedings 17th IEEE Symposium on Computer-Based Medical Systems, CBMS
2004 , Bethesda, MD, United States , 20040624-20040625 , (Sponsor: IEEE Computer
Society; Texas Tech University College of Engineering)
Proceedings of the IEEE Symposium on Computer-Based Medical Systems Proceedings 17th
IEEE Symposium on Computer-Based Medical Systems, CBMS 2004 v 17 2004. , 2004
Language: English
Abstract: We developed a human mitochondria-focused gene database (hMiDas) and
customized cDNA microarray chip (hMitChip) to help biomedical research in mitochondrial
genomics. The current version of hMiDas contains 1,242 gene entries (including mtDNA
genes, nuclear genes related to mitochondria structure and functions, predicted loci and
experimental genes), organized in 15 categories and 24 subcategories. The database
interface allows keyword-based searches as well as advanced field and/or case-sensitive
searches. Each gene record includes 19 fields, mostly hyperlinked to the corresponding
source. Moreover, for each gene, the user is given the option to run literature search
using PubMed, and gene/protein homology search using BLAST and FASTA. The hMitChip
was constructed using hMiDas as a reference. Currently, it contains a selection of 501
mitochondria-related nuclear genes and 192 control elements, all spotted in duplicate on
glass slides. Slide quality was checked by microarray hybridization with 50 µg of Cy3-labeled sample cDNA and Cy5-labeled comparing cDNA, followed by array scan and
image analysis. The hMitChip was tested in vitro using RNA extracted from cancer cell
lines. Gene expression changes detected by hMitChip were confirmed by quantitative
real-time RT-PCR analysis.
43)
DWDM-RAM: A data intensive grid service architecture enabled by dynamic optical
networks
Lavian, T.; Mambretti, J.; Cutrell, D.; Cohen, H.; Merrill, S.; Durairaj, R.; Daspit, P.; Monga,
I.; Naiksatam, S.; Figueira, S.; Gutierrez, D.; Hoang, D.; Travostino, F.
Conference: 2004 IEEE International Symposium on Cluster Computing and the Grid,
CCGrid 2004 , Chicago, IL, United States , 20040419-20040422 , (Sponsor: Institute of
Electrical and Electronics Engineers, IEEE)
2004 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2004 2004
IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2004 2004. ,
2004
Language: English
Abstract: Next generation applications and architectures (for example, Grids) are driving
radical changes in the nature of traffic, service models, technology, and cost, creating
opportunities for an advanced communications infrastructure to tackle next generation
data services. To take advantage of these trends and opportunities, research
communities are creating new architectures, such as the Open Grid Service Architecture
(OGSA), which are being implemented in new prototype advanced infrastructures. The
DWDM-RAM project, funded by DARPA, is actively addressing the challenges of next
generation applications. DWDM-RAM is an architecture for data-intensive services
enabled by next generation dynamic optical networks. It develops and demonstrates a
novel architecture for new data communication services, within the OGSA context, that
allows for managing extremely large sets of distributed data. Novel features move
network services beyond notions of the network as a managed resource, for example, by
including capabilities for dynamic on-demand provisioning and advance scheduling.
DWDM-RAM encapsulates optical network resources (Lambdas, lightpaths) into a Grid
Service and integrates their management within the Open Grid Service Architecture.
Migration to emerging standards such as WS-Resource Framework (WS-RF) should be
straightforward. In initial applications, DWDM-RAM targets specific data-intensive
services such as rapid, massive data transfers used by large scale eScience applications,
including: high-energy physics, geophysics, life science, bioinformatics, genomics,
medical morphometry, tomography, microscopy imaging, astronomical and astrophysical
imaging, complex modeling, and visualization.
44)
Soft Semantic Web services agent
Wang, Haibin; Zhang, Yan-Qing; Sunderraman, Rajshekhar
Department of Computer Science Georgia State University, Atlanta, GA 30302, United States
Conference: NAFIPS 2004 - Annual Meeting of the North American Fuzzy Information
Processing Society: Fuzzy Sets in the Heart of the Canadian Rockies , Banff, Alta, Canada ,
20040627-20040630 , (Sponsor: IEEE Systems, Man, and Cybernetics Society; North
American Fuzzy Information Processing Society,NAFIPS; Institute of Electrical and
Electronics Engineers, IEEE)
Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS
NAFIPS 2004 - Annual Meeting of the North American Fuzzy Information Processing
Society: Fuzzy Sets in the Heart of the Canadian Rockies v 1 2004. , 2004
Language: English
Abstract: Web services play an active role in business integration and other fields
such as bioinformatics. Current Web services technologies such as WSDL, UDDI,
BPEL4WS and BSML are not semantic-oriented. Several proposals have been made to
develop Semantic Web services to facilitate the discovery of relevant Web services. In
our vision, as Semantic Web services technologies mature, there will be many
public or private Semantic Web services Registries based on specific ontologies. These
Registries may provide many similar Web services, so providing high quality of
service (QoS) Semantic Web services for a specific domain using these Registries will be
a challenging task. Because different domains have different QoS requirements, it is
impractical to use classical mathematical modeling methods to evaluate the QoS of
Semantic Web services. In this paper, we propose a framework called Soft Semantic Web
services Agent (SSWSA) for providing high-QoS Semantic Web services using soft
computing methodology, and we use a fuzzy neural network with a GA learning
algorithm as our case study. Simulation results show that the SSWSA can handle fuzzy
and uncertain QoS metrics effectively.
45)
Asynchronous HMM with applications to speech recognition
Garg, Ashutosh; Balakrishnan, Sreeram; Vaithyanathan, Shivakumar
IBM Almaden Research Center, San Jose, CA 95120, United States
Conference: Proceedings - IEEE International Conference on Acoustics, Speech, and Signal
Processing , Montreal, Que, Canada , 20040517-20040521 , (Sponsor: Institute of Electrical
and Electronics Engineers,)
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings Proceedings - IEEE International Conference on Acoustics, Speech, and Signal
Processing v 1 2004. , 2004
Language: English
Abstract: We develop a novel formalism for modeling speech signals which are
irregularly or incompletely sampled. This situation can arise in real world applications
where the speech signal is being transmitted over an error prone channel where parts of
the signal can be dropped. Typical speech systems based on Hidden Markov Models
cannot handle such data since HMMs rely on the assumption that observations are
complete and made at regular intervals. In this paper we introduce the asynchronous
HMM, a variant of the inhomogeneous HMM commonly used in Bioinformatics, and show
how it can be used to model irregularly or incompletely sampled data. A nested EM
algorithm is presented in brief which can be used to learn the parameters of this
asynchronous HMM. Evaluation on real world speech data that has been modified to
simulate channel errors, shows that this model and its variants significantly outperform
the standard HMM and methods based on data interpolation.
46)
Toward large-scale modeling of the microbial cell for computer simulation
Ishii, Nobuyoshi; Robert, Martin; Nakayama, Yoichi; Kanai, Akio; Tomita, Masaru
Conference: Highlights from the ECB11: Building Bridges Between Bioscience , Basel,
Switzerland , 20030801-20030801
Journal of Biotechnology v 113 n 1-3 Sep 30 2004. p 281-294 , 2004
Language: English
Abstract: In the post-genomic era, the large-scale, systematic, and functional analysis
of all cellular components using transcriptomics, proteomics, and metabolomics, together
with bioinformatics for the analysis of the massive amount of data generated by these
"omics" methods are the focus of intensive research activities. As a consequence of these
developments, systems biology, whose goal is to comprehend the organism as a complex
system arising from interactions between its multiple elements, becomes a more tangible
objective. Mathematical modeling of microorganisms and subsequent computer
simulations are effective tools for systems biology, which will lead to a better
understanding of the microbial cell and will have immense ramifications for biological,
medical, environmental sciences, and the pharmaceutical industry. In this review, we
describe various types of mathematical models (structured, unstructured, static,
dynamic, etc.), of microorganisms that have been in use for a while, and others that are
emerging. Several biochemical/cellular simulation platforms to manipulate such models
are summarized and the E-Cell system developed in our laboratory is introduced.
Finally, our strategy for building a "whole cell metabolism model", including the
experimental approach, is presented. © 2004 Elsevier B.V. All rights reserved.
47)
Bringing planning to autonomic applications with ABLE
Srivastava, Biplav; Bigus, Joseph P.; Schlosnagle, Donald A.
IBM India Research Laboratory IIT Delhi, Hauz Khas, New Delhi 110016, India
Conference: Proceedings - International Conference on Autonomic Computing , New York,
NY, United States , 20040517-20040518 , (Sponsor: IEEE Computer Society; IBM; Sun
Microsystems; National Science Foundation)
Proceedings - International Conference on Autonomic Computing Proceedings International Conference on Autonomic Computing 2004. , 2004
Language: English
Abstract: Planning has received tremendous interest as a research area within AI over
the last three decades but it has not been applied commercially as widely as its other AI
counterparts like learning or data mining. The reasons are many: the utility of planning
in business applications was unclear, the planners used to work best in small domains
and there was no general purpose planning and execution infrastructure widely available.
Much has changed lately. Compelling applications have emerged, e.g., computing
systems have become so complex that the IT industry recognizes the necessity of
deliberative methods to make these systems self-configuring, self-healing, self-optimizing and self-protecting. Planning has seen an upsurge in the last decade with new
planners that are orders of magnitude faster than before and are able to scale this
performance to complex domains, e.g., those with metric and temporal constraints.
However, planning and execution infrastructure is still tightly tied to a specific application
which can have its own idiosyncrasies. In this paper, we fill the infrastructural gap by
providing a domain independent planning and execution environment that is
implemented in the ABLE agent building toolkit, and demonstrate its ability to solve
practical business applications. The planning-enabled ABLE is publicly available and is
being used to solve a variety of planning applications in IBM including the self-management/autonomic computing scenarios.
48)
Design and implementation of a computational Grid for bioinformatics
Yang, Chao-Tung; Kuo, Yu-Lun; Lai, Chuan-Lin
High-Perf. Computing Laboratory Department of Computer Science Tunghai University,
Taichung, 407, Taiwan
Conference: Proceedings - 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service, EEE 2004 , Taipei, Taiwan , 20040328-20040331 , (Sponsor: IEEE
Task Committee on E-Commerce; Fu-Jen University of Taiwan; BIKMrdc of Fu-Jen
University; Academia Sinica; National Science Council of Taiwan)
Proceedings - 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service, EEE 2004 Proceedings - 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service, EEE 2004 2004. , 2004
Language: English
Abstract: The popular technologies, internet computing and Grid technologies promise
to change the way we tackle complex problems. They will enable large-scale aggregation
and sharing of computational, data and other resources across institutional boundaries.
And harnessing these new technologies effectively will transform scientific disciplines
ranging from high-energy physics to the life sciences. The computational analysis of
biological sequences is a kind of computation-driven science. Because biological data
are growing quickly and the databases are heterogeneous, a grid system can be used to
share and integrate these heterogeneous biology databases. As is well known, bioinformatics
tools can speed up the analysis of large-scale sequence data, especially sequence
alignment and analysis. FASTA is a tool for aligning multiple protein or nucleotide
sequences. The bioinformatics software we used is a distributed and parallel
version. The software uses a message-passing library called MPI (Message Passing
Interface) and runs on distributed workstation clusters as well as on traditional parallel
computers. A grid computing environment is proposed and constructed on multiple Linux
PC Clusters by using Globus Toolkit (GT) and SUN Grid Engine (SGE). The experimental
results and performance of the bioinformatics tool running on the grid system are also
presented in this paper.
49)
Proceedings - Fourth IEEE symposium on bioinformatics and bioengineering, BIBE
2004
Anon (Ed.)
Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering,
BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
2004. , 2004
Language: English
Abstract: The proceedings contain 73 papers from the Fourth IEEE
Symposium on Bioinformatics and Bioengineering, BIBE 2004. The topics discussed
include: techniques for enhancing computation of DNA curvature molecules; towards
automating an interventional radiological procedure; reducing the computational load of
energy evaluations for protein folding; segmentation of the sylvian fissure in brain MR
images; biomedical ontologies in post-genomic information systems; identifying
significant genes from microarray data; good spaced seeds for homology search; and
estimating seed sensitivity on homogeneous alignments.
50)
SemanticObjects and biomedical informatics
Kitazawa, Atsushi; Yoshimura, Masayoshi
Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering,
BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
2004. , 2004
Language: English
Abstract: The use of SemanticObjects (SO) in biomedical informatics is discussed. SO is
a virtual database supporting an object-relational data model that accommodates nested
relational data. It is observed that advances in biomedical informatics will lead to a new
generation of database, knowledge base, software engineering, security, user interface
and operating system technologies. Bioinformatics requires intelligent algorithms to be
developed to solve complex biomedical problems and also new tools to assist physicians
and biologists to manage and utilize the large amount of information available.
51)
Automating the determination of open reading frames in genomic sequences using the
web service techniques - A case study using SARS Coronavirus
Chang, Paul Hsueh-Min; Soo, Von-Wun; Chen, Tai-Yu; Lai, Wei-Shen; Su, Shiun-Cheng;
Huang, Yu-Ling
Department of Computer Science National Tsing-Hua University, Hsinchu, 300, Taiwan
Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering,
BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
2004. , 2004
Language: English
Abstract: As more and more new genome sequences are reported, analyzing the
functions of a new genome sequence becomes increasingly desirable and compelling.
However, determining the functions of a genomic sequence is not an easy task. Even
with several bioinformatic tools, the task is still labor-intensive, because human experts
have to intervene while using these tools. For efficiency, immediacy and reduction of
human labor, a system automating the analysis process is proposed. We take the automated determination of
Open Reading Frames of a genomic sequence as the domain tasks that involve using a
number of computational tools and interpreting the results returned from the tools. A
service-oriented approach is taken, in which analyzing tools are wrapped as Web services
and described in Semantic Web languages including OWL and OWL-S. The SARS
Coronavirus genomic sequence is taken as a test case for our approaches. We are in the
process of building an agent-based system for automating the tasks, in which an
intelligent agent is responsible for understanding purposes of the Web services by
parsing the service descriptions, and carrying out the interpretation tasks according to a
workflow.
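Illustrative note: the core step the abstract above automates is locating open reading frames (ORFs) in a genomic sequence. A minimal Python sketch follows: it scans the three forward reading frames for ATG-to-stop spans; the toy sequence and minimum length are illustrative, and real pipelines also scan the reverse complement.

STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (frame, start, end, orf) for ATG..stop spans in forward frames."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i + 3
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 <= len(seq) and (j - i) // 3 >= min_codons:
                    orfs.append((frame, i, j + 3, seq[i:j + 3]))
                i = j + 3
            else:
                i += 3
    return orfs

if __name__ == "__main__":
    for frame, start, end, orf in find_orfs("CCATGAAATTTGGGTAAACGATGCCC"):
        print(frame, start, end, orf)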
52)
Efficient filtration of sequence similarity search through singular value decomposition
Aghili, S. Alireza; Sahin, Ozgur D.; Agrawal, Divyakant; El Abbadi, Amr
Department of Computer Science Univ. of California Santa Barbara, Santa Barbara, CA
93106, United States
Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering,
BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
2004. , 2004
Language: English
Abstract: Similarity search in textual databases and bioinformatics has received
substantial attention in the past decade. Numerous filtration and indexing techniques
have been proposed to reduce the curse of dimensionality. This paper proposes a novel
approach to map the problem of whole-genome sequence similarity search into an
approximate vector comparison in the well-established multidimensional vector space.
We propose the application of the Singular Value Decomposition (SVD) dimensionality
reduction technique as a pre-processing filtration step to effectively reduce the search
space and the running time of the search operation. Our empirical results on a
Prokaryote and a Eukaryote DNA contig dataset demonstrate effective filtration to prune
non-relevant portions of the database with up to 2.3 times faster running time compared
with the q-gram approach. SVD filtration may easily be integrated as a pre-processing step
for any of the well-known sequence search heuristics such as BLAST, QUASAR and FastA. We
analyze the precision of applying SVD filtration as a transformation-based dimensionality
reduction technique, and finally discuss the imposed trade-offs.
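Illustrative note: the abstract above filters candidates by comparing SVD-reduced vector representations of sequences. A rough Python sketch of that idea follows; the q-gram length, SVD rank, distance ranking and toy sequences are illustrative, not the paper's actual parameters or thresholds.

import itertools
import numpy as np

Q = 2
ALPHABET = "ACGT"
GRAMS = ["".join(p) for p in itertools.product(ALPHABET, repeat=Q)]

def qgram_vector(seq):
    """Normalized q-gram frequency vector of a DNA sequence."""
    counts = {g: 0 for g in GRAMS}
    for i in range(len(seq) - Q + 1):
        g = seq[i:i + Q]
        if g in counts:
            counts[g] += 1
    v = np.array([counts[g] for g in GRAMS], dtype=float)
    return v / max(v.sum(), 1.0)

def svd_filter(database, query, rank=4, keep=2):
    """Keep the indices of the database entries closest to the query in the
    SVD-reduced q-gram space; these are the candidates for full alignment."""
    X = np.array([qgram_vector(s) for s in database])
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:rank].T                              # projection onto top singular vectors
    db_low = X @ P
    q_low = qgram_vector(query) @ P
    dist = np.linalg.norm(db_low - q_low, axis=1)
    return np.argsort(dist)[:keep]

if __name__ == "__main__":
    db = ["ATATATATGCGC", "GGGGCCCCGGGG", "ATATATATATAT", "TTTTAAAATTTT"]
    print(svd_filter(db, "ATATATGG"))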
53)
An IDC-based algorithm for efficient homology filtration with guaranteed seriate
coverage
Lee, Hsiao Ping; Shih, Ching Hua; Tsai, Yin Te; Sheu, Tzu Fang; Tang, Chuan Yi
Department of Computer Science National Tsing-Hua University, Hsinchu, Taiwan
Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering,
BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
2004. , 2004
Language: English
Abstract: The homology search within genomic databases is a fundamental and crucial
work for biological knowledge discovery. With exponentially increasing sizes and accesses
of databases, the filtration approach, which filters impossible homology candidates to
reduce the time for homology verification, becomes more important in bioinformatics.
Most of the known gram-based filtration approaches in the literature, like QUASAR, have
limited error tolerance and potentially produce more false positives. In this paper,
we present an IDC-based lossless filtration algorithm with guaranteed seriate coverage
and error tolerance for efficient homology discovery. In our method, the original work of
homology extraction with requested seriate coverage and error levels is transformed to a
longest increasing subsequence problem with range constraints, and an efficient
algorithm is proposed for the problem in this paper. The experimental results show that
the method significantly outperforms QUASAR. At comparable sensitivity levels, our
homology filter makes the discovery more than three orders of magnitude faster than
QUASAR does, and more than four orders of magnitude faster than exhaustive search.
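Illustrative note: the abstract above reduces filtration to a longest increasing subsequence problem over shared q-gram hits. The Python sketch below shows that underlying idea only (not the paper's IDC algorithm or its range constraints): candidates whose hits form a long ordered chain survive filtration. Sequences and the q-gram length are toy choices.

import bisect

def qgram_hits(query, cand, q=3):
    """(query_pos, cand_pos) pairs of exactly matching q-grams, sorted so that
    hits sharing a query position cannot be chained together."""
    index = {}
    for i in range(len(query) - q + 1):
        index.setdefault(query[i:i + q], []).append(i)
    hits = []
    for j in range(len(cand) - q + 1):
        for i in index.get(cand[j:j + q], []):
            hits.append((i, j))
    return sorted(hits, key=lambda t: (t[0], -t[1]))

def chain_length(hits):
    """Length of the longest chain increasing in both coordinates
    (patience-style LIS over the candidate positions)."""
    tails = []
    for _, j in hits:
        k = bisect.bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
        else:
            tails[k] = j
    return len(tails)

if __name__ == "__main__":
    query = "ACGTACGTGGCC"
    for cand in ["TTACGTACGTGG", "CCCCCCCCCCCC"]:
        print(cand, chain_length(qgram_hits(query, cand)))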
54)
ARMEDA II: Supporting genomic medicine through the integration of medical and
genetic databases
Garcia-Remesal, M.; Maojo, V.; Billhardt, H.; Crespo, J.; Alonso-Calvo, R.; Perez-Rey, D.;
Martin, F.; Sousa, A.
Biomedical Informatics Group Polytechnical University of Madrid, Madrid, Spain
Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering,
BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
2004. , 2004
Language: English
Abstract: In this paper we present ARMEDA II, a project designed to integrate
distributed heterogeneous medical and genetic databases in support of genomic
medicine. In this project, we have followed a "virtual repository" or VR approach.
Although VRs are entities that contain no data, only metadata, they give users the
perception of working with local repositories that integrate data from different
remote sources. Our approach is based on two basic operators employed to connect new
databases to the system: mapping and unification. The mapping process produces what
is called the "virtual conceptual schema" of the newly created VR while the unification
process provides tools to create an integrated virtual schema for at least two pre-existing
VRs. We tested the current implementation of ARMEDA II using two tumor databases,
one containing information from a hospital and the other containing genetic data
associated with the tumor samples. The performance of the system was also evaluated
using a pre-created set of 30 queries. For all queries the test yielded promising results
since the system successfully retrieved the correct information. The ARMEDA II project is
the current version of an ongoing project developed in the framework of a European
Commission-funded project.
55)
European support to biomedical informatics development: In pursue of genomic
medicine
Sanz, Ferran; Diaz, Carlos; Martin-Sanchez, Fernando; Bonis, Julio
Biomed. Informatics Research Group Munic. Inst. of Medical Research IMIM, Barcelona,
Spain
Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering,
BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
2004. , 2004
Language: English
Abstract: Analyses of the relationships and synergy between Bioinformatics (BI) and
Medical Informatics (MI) show that there is a great potential for synergy between both
disciplines with a view on continuity and individualisation of healthcare, but that a
collaborative effort is needed to bridge the current gap between them. Biomedical
Informatics (BMI) is the emerging discipline that aims to put these two worlds together
so that the discovery and creation of novel diagnostic and therapeutic methods is
fostered. The INFOBIOMED network is a new approach that aims to set a durable
structure for this collaborative strategy in Europe, mobilising the critical mass of
resources necessary for enabling the consolidation of BMI as a crucial scientific discipline
for future healthcare. The specific objectives of INFOBIOMED aim at enabling systematic
progress in clinical and genetic data interoperability and integration and advancing the
exchange and interfacing of methods, tools and technologies used in both MI and BI.
Moreover, it intends to enable pilot applications in particular fields that demonstrate the
benefits of a synergetic approach in BMI, as well as to create a robust framework for
education, training and mobility of involved researchers in BMI for the creation of a solid
European BMI research capacity.
56)
Biomedical ontologies in post-genomic information systems
Perez-Rey, D.; Maojo, V.; Garcia-Remesal, M.; Alonso-Calvo, R.
Artificial Intelligence Laboratory School of Computer Science Polytechnic University of
Madrid, Boadilla del Monte, 28660 Madrid, Spain
Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering,
BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
2004. , 2004
Language: English
Abstract: After the completion of the Human Genome Project, a new post-genomic era
is beginning, in which the huge amount of genomic information must be analyzed and
interpreted. Information methods and techniques from areas such as database integration,
information retrieval, knowledge discovery in databases (KDD) and decision support
systems (DSS) are needed. These systems should take into account idiosyncratic
differences between the two interacting fields, medicine and biology. The corresponding
disciplines, medical informatics (MI) and bioinformatics (BI), should also interact, and a
common point is needed to support this communication. Biomedical ontologies can be
used to enhance biomedical information systems, providing a knowledge sharing
framework. However, ontology tools are still in their infancy, and standards, services,
automatic management tools and the like are needed before this technology can be
properly applied. Nevertheless, ontologies are just the technical framework; the most
important issues are the content and the use policy.
57)
GeneWebEx: Gene annotation web extraction, aggregation, and updating from web-based biomolecular databanks
Masseroli, Marco; Stella, Andrea; Meani, Natalia; Alcalay, Myriam; Pinciroli, Francesco
Bioengineering Department Politecnico di Milano, I-20133 Milano, Italy
Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering,
BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
2004. , 2004
Language: English
Abstract: Numerous genomic annotations are currently stored in different web-accessible databanks that scientists need to mine with user-defined queries and in a
batch mode to orderly integrate the diverse mined data in suitable user-customizable
working environments. Unfortunately, to date, most accessible databanks can be
interrogated only for a single gene or protein at a time and generally the data retrieved
are available in HTML page format only. We developed GeneWebEx to effectively mine
data of interest in different HTML pages of web-based databanks, and organize extracted
data for further analyses. GeneWebEx utilizes user-defined templates to identify data to
extract, and aggregates and structures them in a database designed to allocate the
various extractions from distinct biomolecular databanks. Moreover, a template-based
module enables automatic updating of extracted data. Validations performed on
GeneWebEx allowed us to efficiently gather relevant annotations from various sources,
and comprehensively query them to highlight significant biological characteristics.
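Illustrative note: the abstract above describes user-defined templates that identify which data to extract from databank HTML pages. The Python sketch below is not GeneWebEx; it only illustrates template-driven extraction, with hypothetical field names, patterns and a toy page.

import re

TEMPLATES = {
    # hypothetical templates: field name -> regex with one capture group
    "symbol":   r"<td>Gene symbol</td>\s*<td>([^<]+)</td>",
    "organism": r"<td>Organism</td>\s*<td>([^<]+)</td>",
    "location": r"<td>Map location</td>\s*<td>([^<]+)</td>",
}

def extract(html: str, templates=TEMPLATES) -> dict:
    """Apply each template to the raw HTML of a databank entry."""
    record = {}
    for field, pattern in templates.items():
        m = re.search(pattern, html, flags=re.IGNORECASE)
        record[field] = m.group(1).strip() if m else None
    return record

if __name__ == "__main__":
    page = """
    <table>
      <tr><td>Gene symbol</td><td>TP53</td></tr>
      <tr><td>Organism</td><td>Homo sapiens</td></tr>
      <tr><td>Map location</td><td>17p13.1</td></tr>
    </table>
    """
    print(extract(page))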
58)
Design of specie-specific primers for virus diagnosis in plants with PCR
Rocha, K.; Medeiros, C.; Monteiro, M.; Goncalves, L.; Marinho, P.
Univ. Federal do Rio Grande do Norte DCA-CT-UFRN, CEP. 59. 072-970, Natal, RN, Brazil
Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering,
BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
2004. , 2004
Language: English
Abstract: We propose specialist software to diagnose viral diseases in plants. Our
strategy is to align nucleotide sequences of plant viruses to discover species-specific
regions of genes in the viral genomes, so as to design primers. The program designs
oligonucleotide primers used for the polymerase chain reaction (PCR), a very cheap diagnosis
technique. The user can specify (or use default) constraints for primer and amplified
product lengths, such as the percentage of G+C, absolute or relative melting temperatures,
and the primer's 3' nucleotides. The program screens candidate primer sequences with
user-specifiable parameters in order to help minimize nonspecific priming and
primer secondary structure. We tested this tool by designing two specific primers which
were used to amplify known viral species and then to perform a virus diagnosis.
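Illustrative note: the abstract above screens candidate primers against constraints such as G+C percentage and melting temperature. A minimal Python sketch of such screening follows; the Wallace rule and the GC-based Tm formula are standard approximations, and the acceptance ranges are illustrative defaults, not the paper's settings.

def gc_content(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    p = primer.upper()
    return 100.0 * sum(p.count(b) for b in "GC") / len(p)

def melting_temp(primer: str) -> float:
    """Approximate Tm: Wallace rule for short primers, GC-based formula otherwise."""
    p = primer.upper()
    a, t, g, c = (p.count(b) for b in "ATGC")
    if len(p) < 14:
        return 2 * (a + t) + 4 * (g + c)
    return 64.9 + 41 * (g + c - 16.4) / len(p)

def passes(primer, gc_range=(40.0, 60.0), tm_range=(50.0, 65.0)) -> bool:
    return (gc_range[0] <= gc_content(primer) <= gc_range[1]
            and tm_range[0] <= melting_temp(primer) <= tm_range[1])

if __name__ == "__main__":
    for cand in ["ATGCGTACGTTAGCCTAGGT", "AAAAATTTTTAAAAATTTTT"]:
        print(cand, round(gc_content(cand), 1), round(melting_temp(cand), 1),
              "OK" if passes(cand) else "rejected")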
59)
Using distributed computing platform to solve high computing and huge data processing
problems in bioinformatics
Chen, Shih-Nung; Tsai, Jeffrey J.P.; Huang, Chih-Wei; Chen, Rong-Ming; Lin, Raymond
C.K.
Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering,
BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004
2004. , 2004
Language: English
Abstract: Problems in bioinformatics involve massive computing and massive data. In
recent years, distributed computing has been gaining recognition, so tasks that originally
required high computing power no longer have to rely solely on supercomputers.
Distributed computing using off-the-shelf PCs connected by a high-speed network can
offer low-cost, high-performance computing power to handle such tasks. The purpose of
this paper is therefore to implement a complete distributed computing platform based on
peer-to-peer file sharing technology. The platform integrates scheduling, load balancing,
file sharing, maintenance of data integrity, a user-friendly interface, and related functions.
The platform can assist bioinformaticists with massive computing and massive data
problems, and it is easier to use, more reliable, and more helpful than other platforms
for researchers conducting bioinformatics research.
60)
An effective approach for constructing the phylogenetic tree on a grid-based
architecture
Liu, Damon Shing-Min; Wu, Che-Hao
Department of Computer Science, National Chung Cheng University, Chiayi 621, Taiwan
Conference: Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE 2004),
Taichung, Taiwan, 2004-05-19 to 2004-05-21 (Sponsors: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Published in: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004, 2004
Language: English
Abstract: In biological research, scientists often need to use information about species to
infer the evolutionary relationships among them. These relationships are generally
represented by a labeled binary tree, called the evolutionary tree (or phylogenetic tree).
Reconstructing an evolutionary tree is a major research problem in biology, often known
as the phylogeny problem. Its difficulty is that the number of possible evolutionary trees
is very large: as the number of species increases, exhaustive enumeration of all possible
relationships becomes infeasible. The quantitative nature of species relationships
therefore requires the development of more rigorous methods for tree construction. The
phylogeny problem is computationally intensive and thus well suited to a distributed
computing environment. Grid Computing (or the Computational Grid) is a new concept for
integrating CPU power, storage, and other resources via the Internet in order to obtain
greater overall computing power. Many bioinformaticians are now developing BioGrid
technology to solve challenges in biology that require intensive computing. In this paper,
we design and develop a Grid-based system and propose an efficient method based on
the concept of quartets for solving the phylogeny problem on this architecture.
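The quartet idea mentioned in the abstract can be illustrated with a minimal sketch: for each set of four taxa, choose the pairing whose summed pairwise distances are smallest (the four-point condition). The taxa and distance matrix below are invented for illustration, and the grid distribution of the quartet computations is omitted:

    # Minimal quartet-topology sketch (four-point condition on a distance matrix).
    # Distances and taxa are invented; a real grid-based method would distribute
    # the quartet computations and then combine the resulting topologies.
    from itertools import combinations

    taxa = ["A", "B", "C", "D", "E"]
    dist = {
        ("A","B"): 2, ("A","C"): 6, ("A","D"): 7, ("A","E"): 7,
        ("B","C"): 6, ("B","D"): 7, ("B","E"): 7,
        ("C","D"): 3, ("C","E"): 5,
        ("D","E"): 4,
    }

    def d(x, y):
        return 0 if x == y else dist.get((x, y), dist.get((y, x)))

    def quartet_topology(a, b, c, d_):
        # The supported split is the pairing with the smallest summed distance.
        options = {
            ((a, b), (c, d_)): d(a, b) + d(c, d_),
            ((a, c), (b, d_)): d(a, c) + d(b, d_),
            ((a, d_), (b, c)): d(a, d_) + d(b, c),
        }
        return min(options, key=options.get)

    for quartet in combinations(taxa, 4):
        print(quartet, "->", quartet_topology(*quartet))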
61)
Towards Ubiquitous Bio-Information Computing: Data protocols, middleware, and web
services for heterogeneous biological information integration and retrieval
Hong, Pengyu; Zhong, Sheng; Wong, Wing H.
Conference: Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE 2004),
Taichung, Taiwan, 2004-05-19 to 2004-05-21 (Sponsors: IEEE Computer Society;
IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan;
Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information
Industry, Taiwan)
Published in: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004, 2004
Language: English
Abstract: Biological information computing is rapidly advancing from homogeneous data
computation to large-scale heterogeneous data computation. However, the development
of data specification protocols, software middleware, and Web services that support
large-scale heterogeneous data exchange, integration, and computation generally falls
behind data expansion rates and bioinformatics demands. The Ubiquitous Bio-Information
Computing (UBIC2) project aims to disseminate software packages that assist the
development of heterogeneous bio-information computing applications which are
interoperable and can run in a distributed fashion. UBIC2 lays down the software
architecture for integrating, retrieving, and manipulating heterogeneous biological
information so that the data behave as if they were stored in a unified database. The
UBIC2 programming library implements this architecture and provides application
programming interfaces (APIs) to facilitate the development of heterogeneous
bio-information computing applications. To achieve interoperability, UBIC2 Web services
use XML-based data communication, which allows distributed applications to consume
heterogeneous bio-information regardless of platform. The documentation and software
package of UBIC2 are available at http://www.ubic2.org.
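As a simple illustration of XML-based exchange of heterogeneous records, the general approach this abstract describes, the following sketch serializes a small annotation record to XML and parses it back. The element and attribute names are invented for the example; they are not the UBIC2 data protocol or API:

    # Illustrative XML round-trip for a small biological record.
    # Element and attribute names are invented; this is not the UBIC2 schema.
    import xml.etree.ElementTree as ET

    def annotation_to_xml(gene_id, organism, description):
        record = ET.Element("annotation", attrib={"geneId": gene_id})
        ET.SubElement(record, "organism").text = organism
        ET.SubElement(record, "description").text = description
        return ET.tostring(record, encoding="unicode")

    def xml_to_annotation(xml_text):
        record = ET.fromstring(xml_text)
        return {
            "geneId": record.get("geneId"),
            "organism": record.findtext("organism"),
            "description": record.findtext("description"),
        }

    xml_text = annotation_to_xml("BRCA1", "Homo sapiens", "DNA repair associated gene")
    print(xml_text)
    print(xml_to_annotation(xml_text))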
End of literature search