R00063-Bioinformatika
Research report: "Bioinformatics." Date compiled: 2004-10-21
Requirements: Bioinformatics. Information on the Bioinformatics II seminar at Invex 2004. General information about bioinformatics. Access to data. Companies focused on software, with links to them. Information on IBM's "Blue Gene" project, on the use of the HP AlphaServer in the "GeneProt" project, and on the project of the "SUN - Computational Biology" group. Records from the databases of the university libraries of Oxford and Cambridge and of the Institute of Neurology, University of London. Records from the Dialog database.
Keywords: "Bioinformatika." "Bioinformatics." "Blue Gene."

Invex 2004 - Bioinformatics II seminar:
Bioinformatics II - Methods, Technologies and Software
Venue: Pavilion E, Press Center
Organisers: Faculty of Science, Masaryk University in Brno; Veletrhy Brno, a.s.
Scientific guarantor: prof. Jiří Damborský
Description: The second year of this scientific symposium, attended by leading international researchers and devoted to a topic that is an inspiration for the future development of informatics, will focus, among other things, on the applicability of scientific findings in practice.

Contact details for the seminar's scientific guarantor, doc. Mgr. Jiří Damborský, Dr., can be found at www.muni.cz:
doc. Mgr. Jiří Damborský, Dr.
Faculty/Institute: Faculty of Science
Department: National Centre for Biomolecular Research
Office: pav. 07/02001 (Kotlářská 2, 611 37 Brno)
Telephone: 549 49 3467
Fax: 549 49 2556
E-mail: [email protected]
WWW home page: http://ncbr.chemi.muni.cz/~jiri/

The programme of the Bioinformatics II seminar is available at http://www.cba.muni.cz/projekty/bioinformatics/program.htm.
Programme:
13:00*
13:00-13:05 Jan Žaloudík and Milan Gelnar: Welcome
13:05-13:50 Janusz Bujnicki, International Institute of Molecular and Cellular Biology, Warsaw, Poland (EMBO Young Investigator Lecture): Metaservers and Frankenstein's Monsters: Protein Structure Prediction by Consensus Fold Recognition and Assembly of Fragments
14:00
14:00-14:25 Jan Pačes, Institute of Molecular Genetics: Bioinformatics: What Can We Do with Genomes in Computers?
14:25-14:50 Matej Lexa, Masaryk University: Weak Similarity in Biological Sequences: Rapid Approximate Word Searches and Their Use to Identify Structural Features in Protein Sequences
14:50-15:15 Martina Réblová, Botanical Institute: Phylogenetic Analysis: Methods and Principles for Constructing Phylogenies
15:15 coffee break, poster session, software demonstrations
16:30
16:30-16:55 Jiří Vondrášek, Institute of Organic Chemistry and Biochemistry: Structural Bioinformatics: How Far We Can Go from an Amino Acids Sequence
16:55-17:20 Petr Hořín, Veterinary and Pharmaceutical University: Genomic Approaches in Analysis of Complex Traits: Example of Innate Immunity
17:20-17:45 Ladislav Dušek, Centre of Biostatistics and Analyses, Masaryk University: Multidimensional Data Sources in Current Biology and Medical Sciences: How to Get Information Effectively?
17:45
17:45-17:55 Closing discussion
17:55-18:00 Jiří Damborský: Closure
* Presenting authors are kindly requested to be in the lecture room 40 minutes before the beginning of the meeting to check their presentations, mount posters and install software.

General information about bioinformatics:
Bioinformatics uses computers to answer biological questions. Sometimes it models processes or states, sometimes it makes data available in the form of databases, and sometimes it predicts something (for example, it predicts/finds genes and other "elements" within the nucleotide sequence that makes up a chromosome).
It also predicts, for instance, the function of the proteins contained in the predicted genes (that is, encoded by those genes), predicts the structures of molecules and their interactions (say, with other molecules), compares individual organisms with one another on the basis of the various combinations of genes they contain and, last but not least, determines the degree of their relatedness from their sequence differences. And this list is certainly not complete. In short, bioinformatics has taken over from the other biological disciplines everything that has to do with computers.

When, sometime last week, I was reading an article from the New York Times on my Pocket PC (though it may have been from somewhere else entirely), it discussed a very interesting thing. The article was about the necessary generational change among the people who drive today's Internet. It talked about something I have been saying for about two years now. The current Internet was set in motion by a crowd of enthusiasts, people unafraid to take risks and to make decisions. They have been running it ever since, and in many cases they are unruly devils rather than people fond of administration, order, plans, business cases and all those other seemingly pointless things. The article, however, made two important points. The first is a fact that many of the "movers" of the Internet may be feeling on their own skin: a kind of exhaustion, "burnout", a lack of that original drive, enthusiasm, exhilaration, frenzy and spontaneity. After the two to six years these people have devoted to the Internet, they simply do not "feel like it" any more. All that time they certainly worked more than eight and a half hours a day; they knew no weekends and worked late into the night. Some of them got rich, some did not. Either way, they achieved the impossible: they pushed the Internet a long way forward. The second important point of the article is what must come next. An army of administratively minded managers and directors has to take over. Classic ways of doing business must arrive: ordinary commercial relationships, ordinary problems, ordinary procedures; contracts, agreements, disputes. The original group of people, who can move things where previously there was nothing, is rather unsuited to this goal. They simply do not have the stomach for it. They are visionaries, people who think up, build, push through and bring to life things an "ordinary" person would never dare attempt; he would collapse the moment he realised how much work and money were needed, apply the standard textbook rules and procedures, and condemn most of the projects to oblivion at the bottom of a dark drawer.

What the article also happened to mention, though, was something else entirely. The article was, of course, American, so one of the people in the "Internet evangelist" category remarked in passing that he was now going to take about six months off, finally learn to surf and to fly a plane. And then he might throw himself into biotechnology, because that is exactly the field that is going to take off. And the Internet? He probably will not be coming back to it.

Then, towards the end of last week, I came across another article. News.com reported on a 100-million-dollar investment in the biotechnology revolution. IBM is the one that wants to pour 100 million dollars into developing products to help scientists who are studying massive amounts of data (possibly) related to the behaviour of genes and proteins. Part of the investment is the founding of a new division devoted precisely to the life sciences. And IBM is not alone: the News.com article pointed out that similar investments have already been made by companies such as HP and Sun! What is more, this is a continuation of investment in the field - last December IBM had already invested (also 100 million) in the Blue Gene initiative (ah, my favourite colour blue!).
Part of that investment is also the construction of a supercomputer capable of helping us understand how proteins are formed! And to make it all pay off, IBM has of course created a new business unit, which sells computers and services aimed at biotechnology, healthcare, pharmaceuticals, genetics and other similar scientific fields. Biotechnology (and genetics) simply cannot do without serious computing power (see, for example, the article "Dolování dat pomáhá vědcům").

So there probably is something to the idea that biotechnology ("biotech" for short) may be exactly the field to which the right visionaries and evangelists will flee. Which reminds me of the ever-growing stream of news about what one might even call biotechnology affairs. These include the notoriously famous cloned sheep Dolly, the recent announcement of the completion of the human genome mapping project, attempts to patent genetic information (a hot topic in the USA) and various speculations about the uses of genetic engineering (a little of that, for instance, in the article "Balancování nad genetickou propastí"). What do you think?

Sources on the Internet: Biotechnology@Yahoo!, Bio Online, BioSpace, National Biotechnology Information Facility, Biotech Chronicles, DNAPatent.com, SciWeb

Algorithms for biotechnology: from pharmacogenetics to sequencing
Bioinformatics motivates hardware and software vendors to develop ever more powerful computers and ever smarter algorithms. But what is the nature of those computationally demanding tasks from which so much is currently expected, for example in the pharmaceutical industry? The following article introduces a few of them.

We can start, for example, with the evaluation of clinical and other trials that are part of the drug development cycle. In essence this is ordinary statistics. What makes the task interesting is the fact that there are not only effective and ineffective substances, but also drugs that work only under certain conditions or only in certain groups of the population. Informatics must then supply tools capable of picking out, in enormous data sets, connections that escape notice at first glance.

Drugs made to measure
An example is the case of the drug BiDil, intended for heart disease (covered in more detail by, e.g., the server Osel.cz, see http://www.osel.cz/index.php?obsah=6&clanek=843). The preparation was tested in the 1980s, but its efficacy in the general population could not be demonstrated and the drug never went into production. Only a new analysis of the old data "by individual subgroups", carried out by informaticians at the American company NitroMed, showed that the substance gives promising results in African Americans while being practically ineffective in whites. Subsequent clinical trials confirmed this difference, and the result is the first drug intended for a specific population. BiDil is now in the approval process.

A drug for a specific population is of course only a first step, because it is still a very coarse measure. In the future, however, medicine is expected to be tailored directly to individual patients on the basis of an analysis of their genetic information. Even now, medicine segmented by population gives a chance to various isolated groups and minorities that differ markedly from the "general sample" and often suffer from specific diseases. Besides the large pharmaceutical concerns operating across the board, the emergence of small biotechnology firms focused precisely on developing drugs for such specific groups/populations is also expected. At least, such a scenario was outlined at the spring First Tuesday meeting, which was devoted to biotechnology (details at http://www.scienceworld.cz/sw.nsf/ID/9DA53EA026ECDF20C1256EA70037B88B?OpenDocument&cast=1).
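The subgroup re-analysis described above is, at its core, a stratification exercise. The sketch below illustrates the idea in Python on an invented toy data set (the field names arm, group and event, and all the values, are made up for illustration); it is not the method NitroMed actually used, only a minimal picture of why pooled and per-subgroup event rates can tell different stories.

```python
# A minimal sketch of "analysis by subgroups": compare event rates pooled
# over all patients and then stratified by one covariate. All data invented.
from collections import defaultdict

def event_rate(records):
    """Fraction of patients with an adverse event (NaN if the arm is empty)."""
    return sum(r["event"] for r in records) / len(records) if records else float("nan")

def stratified_rates(records, factor):
    """Group records by one covariate and compare treated vs. placebo arms."""
    groups = defaultdict(list)
    for r in records:
        groups[r[factor]].append(r)
    summary = {}
    for level, recs in groups.items():
        treated = [r for r in recs if r["arm"] == "treated"]
        placebo = [r for r in recs if r["arm"] == "placebo"]
        summary[level] = (event_rate(treated), event_rate(placebo))
    return summary

# Tiny made-up trial: pooled over everyone the two arms look identical,
# but subgroup A clearly benefits while subgroup B does not.
trial = [
    {"arm": "treated", "group": "A", "event": 0},
    {"arm": "treated", "group": "A", "event": 0},
    {"arm": "placebo", "group": "A", "event": 1},
    {"arm": "placebo", "group": "A", "event": 1},
    {"arm": "treated", "group": "B", "event": 1},
    {"arm": "treated", "group": "B", "event": 1},
    {"arm": "placebo", "group": "B", "event": 0},
    {"arm": "placebo", "group": "B", "event": 0},
]

treated_all = [r for r in trial if r["arm"] == "treated"]
placebo_all = [r for r in trial if r["arm"] == "placebo"]
print("pooled (treated, placebo):", event_rate(treated_all), event_rate(placebo_all))
print("by subgroup:", stratified_rates(trial, "group"))
```

Real pharmacogenomic analyses work with thousands of covariates and need multiple-testing corrections; the point here is only that the signal can be invisible in the pooled numbers.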
From the informatics point of view, most of the problems described above belong to the category of knowledge discovery in data.

"This concept is called pharmacogenomics and a bright future is being predicted for it. Unfortunately, developing a new drug is demanding and expensive, and pharmaceutical companies are not charities. They have to make money, which means that their considerable investments have to pay off. That significantly limits the development of drugs that would work only for small groups of people - African Americans are, in this respect, a fairly numerous and relatively affluent population. The potential market for a drug for a genuine minority would be too small. For the time being we can therefore expect rather that physicians, based on the results of pharmacogenomic research, will choose from existing preparations those that carry a lower risk of adverse side effects for a given group of the population," comments Prof. Ing. Jaroslav Petr, DrSc., who works at the Research Institute of Animal Production in Prague-Uhříněves and lectures on biotechnology at the Czech University of Agriculture.

One more view of pharmacogenetics
Michael Storek, biochemist at Compound Therapeutics, [email protected]:
"A few years ago, many biotech start-ups began working on pharmacogenomics. The idea is fairly simple: just read the variations in a patient's genetic information (i.e. DNA) and use them to determine whether a given drug will help the patient or whether they are at risk of side effects. These simple principles, however, have not yet been turned into commercially successful technologies. The first problem is the cost of reading DNA. Although DNA sequencing technology keeps improving, the cost of reading the genes responsible for the effect of a given drug is still in the hundreds of dollars. Large pharmaceutical companies have also never been very keen on pharmacogenomics, since a smaller group of patients would mean lower revenues for them. Smaller biotech companies relied in vain on cooperation with the pharmaceutical giants and either went bankrupt or quickly changed their line of business. What next for pharmacogenomics? A large part of the research into tailor-made drugs has now moved to universities. Pharmaceutical companies use pharmacogenomics to 'resurrect' drugs that showed efficacy in only a subset of patients during clinical trials. We can only hope that the falling cost of DNA sequencing will make it possible to read a patient's entire genetic code, which will then become part of their medical record - much as vaccination records are today."

DNA sequences
An almost classic bioinformatics task is sequencing, i.e. "reading" DNA letter by letter. The best-known example is of course the Human Genome Project. Bioinformatics helped above all in the following way: instead of reading the DNA letter by letter, the procedure now is essentially to amplify the DNA molecules, cut them up at random, and then analyse the overlaps in software, from which the original sequence is to be reconstructed. (In reality it is a bit more complicated; the ability of DNA to be transcribed into RNA is also exploited - probably the most widely used approach here is the so-called EST method, introduced by Craig Venter, former head of Celera and probably the best-known figure of the whole Human Genome Project. The principle, however, remains the same. For details see e.g. http://www.scienceworld.cz/sw.nsf/ID/7B352C62F13B62D4C1256E970048FADD?OpenDocument&cast=1 and http://www.scienceworld.cz/sw.nsf/ID/E237DD7AF94ADBDDC1256E970048FAD5?OpenDocument&cast=1.)
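The shotgun strategy just described - amplify, cut at random, reassemble from overlaps - can be illustrated with a deliberately naive greedy assembler. The Python sketch below works only on short, error-free, invented toy reads; real assemblers must cope with sequencing errors, repeats and billions of fragments, which is exactly where the computational difficulty discussed next comes from.

```python
# A toy illustration of shotgun-style assembly: repeatedly merge the pair of
# fragments with the longest suffix/prefix overlap until one contig remains.
# Real assemblers handle errors, repeats and huge read counts; this does not.

def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def assemble(fragments):
    frags = list(dict.fromkeys(fragments))          # drop exact duplicates
    while len(frags) > 1:
        best = (0, 0, 1)                            # (overlap length, i, j)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = frags[i] + frags[j][k:]            # glue the best pair together
        frags = [f for n, f in enumerate(frags) if n not in (i, j)] + [merged]
    return frags[0]

reads = ["ATGGCGT", "GCGTACC", "TACCTTA", "CCTTAGG"]
print(assemble(reads))   # -> ATGGCGTACCTTAGG
```

Even this toy version is quadratic in the number of reads per merge step, which hints at why the full-scale problem, related to shortest-superstring and travelling-salesman-style optimisation, needs clever algorithms and parallel hardware.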
The task described above looks trivial, but we must realise that we are dealing with strings billions of letters long. Of course, we could "solve" the task simply by laying all the existing cut-up sequences linearly one after another. Such a result would satisfy the assignment in the sense that all the sequences would be used - but we cut the DNA up precisely so that we could find the overlaps. The task is essentially to find the shortest string satisfying all the constraints, a minimum in a huge state space. In algorithmic terms, the problem is close to the well-known travelling salesman problem. Moreover, DNA copying is not 100% accurate; errors occur during it. The algorithm's task is therefore to find the most probable sequence. And it remains to add (something that holds very often in bioinformatics) that, for research teams from all over the world to be able to work on the problem, it must be parallelised efficiently.

"Without progress in computing, progress in genomics would certainly not have gathered the pace we are witnessing," explains Jaroslav Petr. "But the role of computers in genomics does not end with reading DNA sequences. Computers help us understand what is actually written in the genome. Finding genes is a problem in its own right. Genes make up only a fraction of the whole genome - in humans about 1.5%. Today we have algorithms that can pick genes out of the flood of letters of the genetic code. Suppose we find a gene in the human genome this way and want to know what it is good for. One way to answer that question is to use specialised software to search large databases for a similar gene in another animal, e.g. the mouse. The mouse can then be subjected to an experiment in which the selected gene is knocked out and scientists observe what the affected mice lack. From there it is only a small step to identifying the cause of hereditary diseases and searching for a cure. Let us admit, though, that current algorithms are only good at finding 'typical' genes. They may be blind to genes that depart from what we currently know about genes - and which would therefore probably be tremendously interesting."

Proteins
A key procedure that could make drug development significantly more efficient is the computational modelling of the 3D structure of proteins. It is precisely the 3D structure that is closely related to biological function. Preparing a protein in the laboratory and then studying its effects is costly and time-consuming; it is much more efficient to use modelling "in silico". As input we have only the protein sequence (i.e. the order of amino acids), from which we should gradually learn to estimate the spatial structure and the biological function. Actual laboratory testing would then be carried out only on molecules that have already been pre-selected computationally. The whole problem is complicated by the fact that the shape and function of a protein depend on the "letters" of the individual amino acids to varying degrees - sometimes the substitution of a single amino acid is enough to produce a non-functional protein, while at other times changes have no appreciable effect and the code shows considerable redundancy. A functionally equivalent protein can also often be built from completely different chains of amino acids. Rather than analysing the protein sequence letter by letter, what is therefore applied is the recognition of more general structures, so-called patterns. The category of pattern recognition - right on the edge of artificial intelligence - also includes a number of tasks in genomics (more in, e.g., the article "DNA bojuje proti spamu", http://www.scienceworld.cz/sw.nsf/pocitace/352C372DF858F4FFC1256EF600533709?OpenDocument&cast=1).
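To make the contrast between letter-by-letter comparison and pattern-style searching concrete, here is a minimal Python sketch that scans a protein sequence for approximate occurrences of a short motif, tolerating a limited number of mismatches. The sequence and the motif are arbitrary examples, not a real biological signature; genuine tools use much richer pattern descriptions, scoring matrices and indexing schemes.

```python
# A minimal sketch of pattern-oriented sequence scanning: report every window
# that matches a short motif up to a small Hamming distance, rather than
# requiring exact, letter-by-letter identity. Example data is invented.

def approximate_matches(sequence, motif, max_mismatches=1):
    """Yield (position, window, mismatches) for near-matches of the motif."""
    m = len(motif)
    for i in range(len(sequence) - m + 1):
        window = sequence[i:i + m]
        mismatches = sum(1 for a, b in zip(window, motif) if a != b)
        if mismatches <= max_mismatches:
            yield i, window, mismatches

protein = "MKVLAAGITGKDECFRGKTAACGKDE"          # made-up amino acid string
for pos, window, d in approximate_matches(protein, "GKDE", max_mismatches=1):
    print(f"position {pos}: {window} ({d} mismatch(es))")
```

Allowing mismatches is what lets such searches pick up "weak similarity": functionally related sites that are no longer identical at the letter level.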
In this connection it may be interesting that an efficient quantum algorithm has already been proposed for pattern recognition (details in the article "Kvantové rozpoznávání obrazů", http://www.scienceworld.cz/sw.nsf/ID/C27175EFCA2B2CFBC1256E970048FF68?OpenDocument&cast=1).

Let us give the floor to Jaroslav Petr again: "The scientific discipline called proteomics - the science of the proteins in an organism - is currently booming. Very interesting are the cases in which a protein changes its three-dimensional arrangement without any change in its amino acid composition. With the new shape the protein also acquires new properties. That is the case of the so-called prions, or proteinaceous infectious particles, which cause such sadly famous diseases as BSE in cattle or Creutzfeldt-Jakob disease in humans. These diseases arise, in effect, from the 'tangling' of a protein that is our own and that, in its original shape, does us no harm. The study of such spatial rearrangements seems to be important not only for the study of disease, but also for understanding the normal functions of our body. A very similar 'tangling' of another protein in our brain plays a significant part in storing information in memory."

[model of a prion structure]

Cladistics
Cladistic analyses are used primarily in evolutionary biology. Roughly speaking, they start from the assumption that individual species gradually split off from one another in the familiar "tree" pattern. But how do we determine the actual course of that branching? Imagine we have, say, a human, a ground squirrel and an elephant. How do we establish the tree? Which of these species split off from the common ancestor first? (In other words: is the human closer to the ground squirrel or to the elephant, or equally distant from both? The last option would hold if the ancestor of humans separated first and only then did the ancestor of the ground squirrel separate from the ancestor of the elephant.) Cladistics works by choosing certain characters (it does not much matter whether these are DNA sequences or, say, the structure of the eye) and comparing the organisms according to them. The result is then, for example, a multidimensional space full of zeros and ones - assuming that for each organism tested we record only whether it has a given character or not. In principle the task again has infinitely many solutions (mutations arise at random), but once again we are looking for the most economical path through the graph - the minimum of the state space. We simply ask by what smallest number of branchings and mutation steps we can arrive at the existing diversity. Once we have established an evolutionary tree for some (usually very large) set of characters, we choose a different set of characters and run the comparison again. What interests us most is the stability of the tree once formed. If a different set of characters yields the same tree, then we have probably registered the evolutionary events correctly.

Cladistics leads to conclusions that do not sit well with the traditional biological taxonomy taught in primary and secondary schools. It turns out, for example, that the coelacanth (a fish close to the ancestors of amphibians) is actually more closely related to humans than to the carp, so that the whole group "fish" makes no sense from an evolutionary point of view. (By way of explanation: in this case the tree branched so that the ancestor of the carp separated first, and only later did the ancestor of humans and the ancestor of the coelacanth diverge.) Readers interested in a more detailed description of cladistic methods can be referred, for example, to the book Jak se dělá evoluce (Jan Zrzavý, David Storch, Stanislav Mihulka: Jak se dělá evoluce, Paseka, Prague, 2004; excerpts from the book can also be found on Science World).
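The "fewest branchings and mutation steps" criterion described above can be made concrete with the classic Fitch small-parsimony count. The Python sketch below scores three candidate topologies against an invented 0/1 character matrix (the taxa and characters are made up; a carp is added as a fourth taxon so that the topologies actually differ in cost). The stability check mentioned in the text would amount to repeating the comparison with different character sets.

```python
# A compact sketch of the parsimony idea behind cladistics: for a fixed tree
# topology, the Fitch algorithm counts the minimum number of character-state
# changes needed to explain the 0/1 characters observed at the leaves.
# The taxa and the character matrix below are invented for illustration.

def fitch_changes(tree, states):
    """tree: nested 2-tuples of taxon names; states: taxon -> character string.
    Returns the minimum total number of state changes over all characters."""
    n_chars = len(next(iter(states.values())))
    total = 0
    for i in range(n_chars):
        def solve(node):
            nonlocal total
            if isinstance(node, str):                  # leaf: its observed state
                return {states[node][i]}
            left, right = solve(node[0]), solve(node[1])
            common = left & right
            if common:
                return common                          # no change needed here
            total += 1                                 # one change on this branch
            return left | right
        solve(tree)
    return total

characters = {"human": "11010", "squirrel": "11000",
              "elephant": "00110", "carp": "00011"}
for topology in [((("human", "squirrel"), "elephant"), "carp"),
                 ((("human", "elephant"), "squirrel"), "carp"),
                 ((("squirrel", "elephant"), "human"), "carp")]:
    print(topology, "->", fitch_changes(topology, characters), "changes")
```

With this made-up matrix the human-plus-squirrel grouping needs the fewest changes; real analyses search over enormous numbers of topologies, which is what makes the problem computationally hard.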
Cladistics, however, is not only about building theoretical constructions and evolutionary trees. It is important, for example, to know how close individual organisms are to humans and to identify similarities and differences in metabolic processes - for instance when testing new drugs on animals, or when attempting to use animals to grow transplants intended for human patients.

Professor Jaroslav Petr adds the following point of interest in this context: "Similar methods are also used to search for the image of the hypothetical forefather of all existing organisms on Earth (LUCA, the last universal common ancestor). It is a tricky job, because all the processes by which this ancient forefather of all today's living creatures arose are obscured by the countless subsequent changes in the hereditary information of each of its descendants. Moreover, it seems that simple microorganisms traded genes among themselves so briskly that the picture of a tree that branches but never rejoins simply does not apply to them." For details see the article "Hledá se první buňka", http://www.scienceworld.cz/sw.nsf/ID/5EAF67184C501F09C1256EAF004BB31D?OpenDocument&cast=1

Language trees
The following application is somewhat removed from bioinformatics proper, but it nicely shows that some algorithms, once created, have a much more general use. Just as species branch, individual languages also branched in the past. The situation here is of course complicated by the fact that languages, once formed, are not completely separate: they mix, and words and grammatical rules continue to be borrowed between them. Such borrowings, however, were not particularly frequent, especially in the past, and so even for languages we can construct our favourite trees on the basis of cladistic analyses. Again, the output of the program may be one particular tree. We then change the criteria/input data and analyse the stability of the tree we have obtained. If we get the same tree after comparing, say, personal pronouns and the names of family members, it makes our results considerably more credible. Cladistic analyses have so far been used mainly to trace the genesis of the Indo-European languages. The outcome of such experiments is interesting not only for linguists; it also says a lot about the course of prehistoric migrations (it gives us information not only about how certain events took place, but also about when they occurred). An obvious next step is to combine the findings obtained in this way with historical and archaeological research. Details e.g. in the article "Evoluce jazyků a pravěké migrace", http://www.scienceworld.cz/sw.nsf/ID/D741AB35B059C852C1256E970049223D?OpenDocument&cast=1

DNA as a computer
A special chapter of bioinformatics is formed by so-called DNA computers and DNA chips, which we have covered repeatedly on Science World, most recently in the article "DNA počítače odhalí nádorové buňky", http://www.scienceworld.cz/sw.nsf/ID/2132140648ADD43DC1256E9700492310?OpenDocument&cast=1

Prof. K. R. Bruckdorfer
Key Publications
Signalling Pathways and Reactive Oxygen and Nitrogen Species in the Vasculature
The cells of the artery wall regulate the rate of blood flow, platelet activity and the formation of thrombi. These important functions may be disrupted in diseased arteries, leading to increased risks of thrombosis. Much of the work in our group is centred around the functions of these arterial cells and platelets, investigating both the basic biochemistry and the clinical manifestations of these phenomena. One key area of research concerns the gas nitric oxide, which is biosynthesised from L-arginine in endothelial cells, platelets and macrophages. Nitric oxide relaxes smooth muscle cells and inhibits platelet activation.
We have been particularly interested in the interactions of nitric oxide with reactive oxygen species such as hydrogen peroxide and superoxide anions. The latter react with NO to form peroxynitrite, which in turn modifies proteins by nitration of tyrosine residues. We have already shown that nitration in platelets is a naturally occurring phenomenon. Nitration is clearly important in pathological tissues; e.g. there is a large accumulation of nitrated proteins in atherosclerotic lesions. We have also found that there may be mechanisms by which the nitro group may be removed. This may be of relevance to tyrosine phosphorylation mechanisms in cell activation and inhibition, since we have preliminary evidence that nitration may block some of the phosphorylation sites, at least transiently. The aim of the projects in this area would be to investigate the role of nitration in tyrosine signalling mechanisms in platelets and the endothelium.

Other projects concern the role of tissue factor, best known as the initiator of the soluble coagulation cascade. Tissue factor, a transmembrane glycoprotein, has other roles in relation to cell proliferation and the development of blood vessels via signal transduction mechanisms. We are currently interested in the mechanism by which tissue factor, particularly its cytoplasmic domain, can interact with known signalling mechanisms.

Prof. S. J. Perkins
Key Publications
Molecular structures of proteins in infection, inflammation and immunity
Protein structural studies are central to many biochemical and molecular biology investigations in molecular medicine. Three-dimensional structures are invaluable for the diagnosis of disease-causing mutations or the development of rational strategies for therapy. There is a long tradition of protein structural work in our laboratory at the Royal Free, where we work with antibody and complement proteins. We also collaborate with clinical biochemists and molecular biologists on-site in hospital departments who use protein structures to visualise their projects. So our working environment is diverse and stimulating. Possible projects on offer are:

(1) Structural determinations of the antibody classes and their interactions with receptors: the different forms of IgA. There are five antibody classes. Monomers and dimers of immunoglobulin IgA are extremely abundant in body secretions, yet their structures are poorly known. In addition, IgA forms a number of important complexes with secretory component and with IgA alpha-receptors. We are using a range of technologies to determine protein structures at medium resolution, using a novel methodology based on synchrotron X-rays at Daresbury (Cheshire), neutrons at the Rutherford (near Oxford) and the ILL (Grenoble, France), and an analytical ultracentrifuge in our laboratory. We then use protein structure modelling methods performed on a cluster of modern Silicon Graphics workstations, using molecular graphics to determine the structures. The antibody structures will be correlated with the unique immunological role of IgA, possibly through additional experiments with our collaborators at Ninewells Hospital in Dundee. Related projects on offer may include work with the IgG and IgD classes, and with the antibody receptors themselves, using the same technology. See Boehm et al. (1999) and Perkins et al. (1998).

(2) Expression and NMR or crystallographic structure determinations of complement proteins important in inflammation: factor B and properdin.
The components of the complement system provide a major non-adaptive immune defence mechanism for the host. These are activated in response to the challenge of foreign material in plasma. Complement activation proceeds through a series of limited proteolytic steps in one of three pathways: the alternative, the classical and the lectin pathways. The serine protease factor B plays a key role in the alternative pathway of activation and control of complement. Once the central protein C3 is activated, it forms a complex with factor B (C3b.FB), which is activated to form C3b.Bb, a protease which activates more C3 and enhances C3 deposition to promote opsonisation. We have made progress towards understanding the unique functional role of the five domains in FB by a combination of molecular biology expression studies, NMR and crystallographic approaches, and projects are available in these areas.

Human properdin is an essential regulator of the activation of the alternative pathway, as it stabilises the complex between C3b and factor B. It contains a novel small thrombospondin type I repeat (TSR) domain, the second most abundant domain type in complement, which occurs in many other multidomain proteins. Its molecular structure is wholly unknown. We have successfully over-expressed five TSR domains of properdin and obtained high-quality NMR spectra. There are opportunities to determine the molecular structure of several of these domains in order to describe the full structure of properdin, using an appropriate combination of multinuclear NMR and protein crystallography in the first instance, followed by X-ray and neutron scattering and analytical ultracentrifugation. See Hinshelwood et al. (1999).

(3) Molecular modelling of mutation sites in proteins. Bioinformatics strategies and protein homology modelling are invaluable tools to help interpret a rapidly growing database of known natural human genetic mutations that result in dysfunctional proteins. We work with several departments to examine these, most notably the Haemophilia Unit at the Royal Free Hospital, one of the three largest in the UK, where many patients reveal novel mutations in their blood coagulation proteins. Other relevant proteins include membrane proteins and developmental proteins. The significance of these mutations is not clear without performing molecular biology expression work to test the behaviour of the mutant proteins, in combination with structural studies to decipher the effect of the mutation on protein folding or activity. Projects on offer will include the development of an integrated database using bioinformatics technology for interrogation, in order to interpret these mutations and those yet to be identified, as well as appropriate wet lab work. See Jenkins et al. (1998).

Dr. K. Srai
Key Publications
Summary of research interests:
i) Nutrients and gene regulation
ii) Iron transport across cellular membranes in relation to iron deficiency and hereditary haemochromatosis
iii) Absorption and metabolism of polyphenols and flavonoids
iv) Glucose metabolism and diabetes
v) Functional and molecular characterisation of renal purinergic receptors in health and disease

Molecular Basis of Iron Homeostasis in Health and Disease
My group, together with Professor Robert Hider and his colleagues at KGT, King's College, and Professor John Porter at the Royal Free and University College Medical School, has been awarded MRC co-operative group status. This award also includes two component grants.
In addition to this I have project grants from The Wellcome Trust, the EU, the BBSRC, the Sir Jules Thorn Charitable Trust and the NKRF (see Current Grants). My research in the field of iron metabolism can be grouped under the heading of the molecular basis of iron homeostasis in health and disease.

Aim: To define the genetic and molecular contribution to the control of body iron stores and cellular iron overload, and to provide the basis for the development of novel strategies for the better management and prevention of iron deficiency and iron overload.

Objectives: To characterise genes that are responsible for cellular iron trafficking and for intestinal iron absorption and regulation. To quantify the genetic and environmental contribution to the control of body iron stores. To quantify the influence of various plasma factors (which are altered as a result of iron deficiency, hypoxia, pregnancy and increased erythropoiesis) on the control of intestinal iron absorption.

Background: Iron is the only nutrient of which there is a known widespread deficiency in the UK, with up to 25% prevalence in certain sociodemographic groups (including women of child-bearing age, vegetarians and adolescent females). Conversely, primary iron overload (genetic haemochromatosis) is the most common genetic disorder in the UK. Treatment of iron deficiency in the population may increase the penetrance (currently low) of genetic haemochromatosis, with toxic consequences. On the other hand, it is not known how genetic factors interact with the known variability of food-iron bioavailability to determine body iron stores. Gene cloning projects have recently identified an array of genes that contribute to the mechanism and regulation of iron absorption. The following projects will be running in parallel, with the ultimate goal of achieving the above aims.

Project 1. Molecular mechanism of iron transfer across the placenta: immunohistochemical localisation and role of DCT1 (DMT1), Ireg1 and hephaestin in iron transfer across the placenta. (The Wellcome Trust project grant: Sep 2000 - Dec 2003; £220,000.)
The aim of this study is to determine the molecular mechanism of iron efflux across the placenta into the fetal circulation, in particular to determine the role of DCT1 (DMT1), Ireg1 and hephaestin in this process. Iron transfer in the placenta can be considered in three stages: uptake across the placental microvillous border membrane, transfer across the placental cell, and efflux into the fetal circulation. Iron uptake across the microvillous brush border is through the transferrin receptor and receptor-mediated endocytosis. Very little is known about transfer across the cell or the efflux of iron from the placenta into the fetal circulation. Recently, however, there have been several discoveries (the cloning of the divalent cation transporter DCT1 (DMT1), the cloning of Ireg1 and of the caeruloplasmin homologue hephaestin) which may provide the information necessary to elucidate the mechanism of iron efflux across the placental basolateral membrane.
Hypothesis: DCT1 (DMT1) and/or Ireg1 is localised to the basolateral membrane and involved in the efflux of iron into the fetal circulation. It is co-localised with hephaestin, which oxidises ferrous to ferric iron prior to its binding to transferrin in the fetal circulation. The sub-cellular localisation of DCT1 (DMT1), Ireg1 and hephaestin will be investigated using antibodies raised against synthetic peptides.
Once the localisation of these proteins has been determined, their regulation by dietary iron and their involvement in the regulation of iron transfer across the placenta, particularly efflux across the basolateral membrane, will be determined. Rats will be used to study the effect of decreasing dietary iron on the regulation of DCT1 (DMT1), Ireg1 and hephaestin. BeWo cells in culture will be employed to study the effect of iron deficiency on the transfer of iron across the placenta and its regulation in relation to expression and localisation. The information obtained from these studies will be used to devise an iron supplementation regime for pregnant mothers in order to prevent iron deficiency in both the mother and the child.

Project 2: The molecular regulation of iron absorption with reference to the defect in haemochromatosis
Dr SKS Srai (Sir Jules Thorn Charitable Trust project grant: Nov 1999 - Oct 2001; £82,000). (MRC grant under consideration: Dr SKS Srai, Dr A Bomford, Dr R Simpson & Dr E Debnam.)
The mutation for hereditary haemochromatosis (HH) is the commonest genetic abnormality in people of Northern European descent. The recent cloning of HFE, the gene for HH, has not yet led to an increased understanding of the molecular defect, which is expressed as increased iron absorption from a normal diet, because 1. the regulation of absorption of dietary iron under normal conditions is not understood and the role of wild-type HFE in the process is unclear, and 2. evidence for a mechanistic linkage between HFE and the genetic components of the mucosal iron transport pathway is lacking.
Plan: Information on the function and regulation of the mucosal iron transport pathway is urgently needed. We propose a detailed genetic and functional analysis of the components of this pathway in HH, using information on novel genes derived from an initial study by our group of inbred mouse strains with well characterised defects in mucosal iron transport. This initial study has led to the cloning of a ferric reductase in the apical membrane of the villus enterocytes and, in the basolateral membrane, of a transporter responsible for iron efflux. Gene expression studies at the mRNA and protein levels using duodenal tissue samples will be performed, together with in situ techniques, to obtain information on the localisation of transcripts along the crypt-villus axis. These results should permit further analysis of how individual genes in the iron transport pathway are regulated.

Project 3: Molecular mechanisms involved in the dietary regulation of DMT1 expression in human intestinal epithelial cells
Dr SKS Srai (RF&UCMS) & Dr P Sharp (Surrey University) (BBSRC Joint Project Grant: Oct 2000 - Sep 2003; £189,444)
The nutritional significance of maintaining adequate dietary levels of the transition metals iron, zinc and copper is clear, owing to their essential role in a plethora of biochemical events in the body. This is confirmed by the large number of pathologies associated with imbalances in metal ion homeostasis. There is good evidence, from studies on animals and cell lines, that dietary levels of individual metals can influence the absorption and utilisation of others. Our study, using the Caco-2 TC7 cell model of human enterocytes, will investigate the biochemical basis for these dietary interactions and will focus on the putative metal ion transporter DMT1.
The data from this project will advance our knowledge of diet-gene interaction in regulating mineral metabolism at the cell and molecular level, and are thus relevant to understanding the underlying causes of metal ion deficiency and overload disorders. The objectives of this work are to test the hypothesis that DMT1 acts as a divalent metal ion transporter in human enterocytes and that its activity can be regulated by dietary levels of nutritionally important trace metals. This hypothesis will be tested using the Caco-2 TC7 cell model of human small intestinal enterocytes, and the work will address the following issues:
1. Functional measurement of metal ion transport in Caco-2 TC7 cells in response to chronic adaptations (10 days) in dietary levels of iron, zinc or copper.
2. The effect of these dietary changes on the expression of the putative metal ion transporter DMT1, at the protein (western blotting) and mRNA (RT-PCR) level.
3. The cellular distribution of the two splice variants of the DMT1 gene, following dietary metal ion manipulation, using confocal microscopy.
4. The molecular events underlying changes in DMT1 homeostasis, focussing on the 5' promoter region of the gene and in particular the role of the 5 metal response elements, using luciferase reporter gene assays.
5. The role that DMT1 plays in the transport of iron, zinc and copper, using site-directed mutagenesis and the Xenopus oocyte expression system.

Project 4: Evaluation of the safety and efficacy of iron supplementation in pregnant women.
European Commission Framework V Grant (Feb 2000 - Jan 2003), £1,101,077, with eight other partners. Dr Srai (RF&UCMS) & Dr McArdle (The Rowett Research Institute) share £194,917 for 1 RA1B and 1 PhD student.
Iron deficiency is common and can have harmful effects on the mother and her foetus. Anaemia is therefore always treated with iron supplements. However, the levels given vary widely and there is a growing concern about the risks associated with iron overload. Since iron can generate free radicals and interact with other nutrients, assessment of supplementation in pregnant women is essential. Volunteers will be given two levels of iron within the range given clinically, or a placebo, and the effects on parameters such as oxidative stress, cardiovascular well-being, and zinc and copper metabolism will be measured. We will study treatment directly in patients with an ileostomy, identifying the cause of GI upset. We will measure the effects on babies at term, on placental function and on the expression of different genes in supplemented rats and cultured cells, to elucidate the molecular basis of the change and a rational basis for supplementation.

Project 5: Haem metabolism and control of intestinal iron absorption.
Dr SKS Srai (RF&UCMS) & R Simpson (King's College, KGT) (MRC project grant: Oct 2000 - Sep 2003; £171,241)
Background: Iron homeostasis is maintained primarily by controlling the intestinal absorption of dietary iron. Alterations in body iron levels (deficiency/overload) are often associated with important clinical consequences. The mechanism and regulation of the absorptive process are, however, unclear. Modifications in haem biosynthesis in animals and humans (experimental/accidental/genetic) have been reported to induce changes in iron metabolism and iron absorption. The relationship between the two parameters is, however, not well understood.
We will investigate, at the cellular and molecular level, how dynamic changes in the levels of haem and of intermediates of its biosynthesis (particularly ALA) affect intestinal iron transport. These studies will help to further clarify the regulation of iron absorption and elucidate the changes in iron metabolism seen in certain porphyrias and haemoglobinopathies. We will assay urinary ALA and porphobilinogen output, urinary and biliary porphyrins, tissue haem levels and enzymatic activities in mice with altered iron metabolism (hypoxic, dietary iron deficient, hypotransferrinaemic, iron loaded). In addition, the effects of ALA administration on gene expression and iron absorption will be ascertained in these mouse models. We will study the mechanism by which ALA and other specific reagents influence haem biosynthesis and iron transport in epithelial cells.

MOLECULAR MECHANISM OF POLYPHENOL ABSORPTION, METABOLISM AND ANTIOXIDANT EFFECT
Dr SKS Srai, Dr E Debnam (RF&UCMS) and Prof C Rice-Evans (Guy's Hospital, KGT) (BBSRC Project Grant: Jan 2001 - Mar 2003; £118,000). Professor Rice-Evans is the principal applicant.
Gastrointestinal factors influencing the metabolism and functional activities of dietary plant polyphenols. The importance of dietary antioxidants in the maintenance of health and in protection from damage induced by oxidative stress, implicated in the risk of chronic diseases, is coming to the forefront of dietary recommendations and the development of functional foods. Recent work is beginning to highlight a role for the flavonoid and polyphenolic components of the diet, known to be powerful hydrogen-donating antioxidants and scavengers of reactive oxygen and reactive nitrogen species in vitro. The purpose of this project is to elucidate the functional forms of diet-derived flavonoids in vivo by investigating the gastrointestinal factors influencing their metabolism and functional activities at various levels, namely pre-absorption events in the gastric lumen and the modification and metabolism they undergo in the small intestine. The antioxidant activities of the identified conjugates and metabolites will also be assessed.

MOLECULAR MECHANISM OF GLUCOSE TRANSPORT ACROSS INTESTINAL AND RENAL EPITHELIAL CELLS
Dr SKS Srai, Dr ES Debnam and Prof R Unwin (The Wellcome Trust Project Grant: Sep 2000 - Feb 2002; £72,126)
Changes in renal and intestinal glucose transport in diabetes mellitus and control by glucagon: involvement of the protein kinase A and protein kinase C signalling pathways. The intracellular processes involved in the control of renal and intestinal glucose transport are poorly understood. We have shown that both the PKC and PKA pathways are involved in the control of renal and intestinal brush-border glucose transport and that they differentially regulate the GLUT (facilitated glucose transporter) and SGLT (sodium-dependent glucose transporter) transporters, respectively. The aims of this project are:
1. To determine the role of protein kinase A (PKA) and protein kinase C (PKC) in controlling the expression and activity of the two classes of renal and intestinal transporters: GLUT (facilitated) and SGLT (Na+ coupled).
2. To define the relationship of these signalling pathways to the changes in renal and intestinal glucose transport that occur in insulinopenic diabetes mellitus, and in response to pancreatic glucagon.
3. To establish the contribution of glucagon or glucagon-like peptide receptors along the renal tubule and intestinal tract.
The long-term goal is to define the significance of altered renal tubular transport of glucose in diabetes for its renal pathophysiology.

FUNCTIONAL AND MOLECULAR CHARACTERISATION OF RENAL PURINERGIC RECEPTORS IN HEALTH AND DISEASE
Dr SKS Srai, Dr ES Debnam and Prof Robert Unwin (supported by grants to Professor Robert Unwin from the NKRF, the Wellcome Trust and the MRC)

Dr A. E. Michael
Key Publications
Cellular & Molecular Endocrinology
Since joining the Department in 1991, Tony Michael and members of his team have been investigating cellular and molecular aspects of endocrinology. Although current projects include ongoing research into the control of renal function by adrenal steroids, the majority of research by this team is concerned with cellular aspects of reproductive endocrinology. Consequently, most of the members of Tony Michael's research team are also members of the interdisciplinary "Reproduction & Development Group" at the Royal Veterinary College (RVC), London. At present, there are 2 main research themes being investigated by the team:
- Metabolism of cortisol by isoforms of the enzyme 11ß-hydroxysteroid dehydrogenase (11ßHSD)
- Regulation of the expression and function of prostaglandin receptors by steroid hormones.

In a range of tissues, 11ßHSD converts the anti-inflammatory adrenal steroid cortisol (hydrocortisone) to the inactive metabolite cortisone. In the kidney, this enzyme activity is vital to deny cortisol access to non-specific mineralocorticoid receptors. The team are currently investigating physiological scenarios in which renal metabolism of cortisol is decreased so that cortisol can act in concert with aldosterone to control sodium, potassium and acid-base balance. In the placenta, 11ßHSD acts as an enzymatic barrier, preventing cortisol from passing from the maternal circulation into the foetus. Ongoing research has demonstrated that maternal nutrient restriction decreases placental inactivation of cortisol, and that this decrease in 11ßHSD activity is associated with intra-uterine growth restriction ("small-for-date" babies): a phenomenon strongly implicated in an increased risk of adult diseases (e.g. diabetes and cardiovascular disease). The major focus of research into 11ßHSD is in the context of the ovary. Studies performed in the 1990s indicated a link between high rates of ovarian cortisol oxidation and the failure of women to become pregnant through in vitro fertilisation (IVF). Research over the past 5 years has sought to explain this association, examining those endocrine, paracrine and intra-cellular factors that influence the expression and activities of specific 11ßHSD isoforms in the ovary. These are currently being investigated in a wide range of species.

As regards the actions of prostaglandins in the human ovary, these appear to be mediated via specific hepta-helical, G-protein-coupled receptors. In non-ovarian cells, prostaglandin E2 (PGE2) acts via at least 4 different isoforms of the EP receptor, whereas PGF2a acts via the FP receptor. Recently, the team has established that human ovarian cells recovered from IVF patients express functional EP1, EP2, EP4 and FP receptors. Current studies are investigating whether progesterone and oestradiol can affect either the expression of these receptors or their ability to couple to the cyclic AMP, inositol polyphosphate and calcium signal transduction pathways.
Team Members (Last Updated 01 October 2001): Ms Christina (C) Chandras*, Dr Tracey (TE) Harris, Ms Kim (KC) Jonas*, Dr Tony (AE) Michael, Mr Dean (DP) Norgate, Dr Lisa (LM) Thurston (*Graduate Student)
Current Collaborators: Dr D Robert E Abayasekara (RVC, London, UK), Dr John Carroll (UCL, UK), Professor John RG Challis (University of Toronto, Canada), Dr Robert C Fowkes (St. Bart's, London, UK), Dr Linda Gregory (University Hospital of Wales, Cardiff, UK), Dr HJ (Lenus) Kloosterboer (Organon, Oss, Netherlands), Professor Andres Lopez-Bernal (University of Bristol, UK), Dr S Kaila S Srai (UCL, UK), Professor Paul M Stewart (University of Birmingham, UK), Professor Robert J Unwin (UCL, UK), Professor D Claire Wathes (RVC, London, UK), Professor Robert J Webb (University of Nottingham, UK), Dr Peter J Wood (University of Southampton, UK), Professor Kaiping Yang (University of Western Ontario, Canada)

Access to data:

Bioinformatics Tools - www.Stratagene.com
Analyze pathways, gene expression data, protein and DNA sequences. Software.

Bio IT & Informatics - bioteam.net
Honest, objective & vendor-neutral; clusters & pipelines our specialty. Partners in Life Science Informatics. E-mail: [email protected], tel: (978) 304-1222. BioTeam is a consulting collective dedicated to delivering vendor-neutral informatics solutions to the life science industry. BioTeam principals Athanas, Dagdigian, Gloss, and Van Etten have been jointly serving the biotech and pharmaceutical communities as a team for several years. Individually they possess a broad spectrum of skills and experience, from scientific analysis to high-performance technical computing infrastructures. Together they complement each other to provide complete beginning-to-end life science informatics and Bio-IT solutions.

Bioinformatics News - www.BioInform.com
Get the exclusive insider's report on bioinformatics with BioInform. News in the field of bioinformatics.

Bioinformatics Toolbox - www.mathworks.com
Analyze genomic, proteomic, and microarray data in MATLAB®. Read, analyze, and visualize genomic, proteomic, and microarray data. The Bioinformatics Toolbox offers computational molecular biologists and other research scientists an open and extensible environment in which to explore ideas, prototype new algorithms, and build applications in drug research, genetic engineering, and other genomics and proteomics projects. The toolbox provides access to genomic and proteomic data formats, analysis techniques, and specialized visualizations for genomic and proteomic sequence and microarray analysis. Most functions are implemented in the open MATLAB language, enabling you to customize the algorithms or develop your own.

Companies focused on software, and the software itself:

Companies in the Bioinformatics Software and Software-Related Services Sector
(grouped by technology area; company name, headquarters, company website)

DNA/Protein Sequence Analysis:
- Allometra (Davis, Calif.): www.allometra.com/
- (Edinburgh, Scotland, UK): www.anedabio.com/
- Apocom (Knoxville, Tenn.): www.apocom.com
- Bioinformatics Solutions (Waterloo, Ontario, Canada): www.bioinformaticssolutions.com/
- BioMax Informatics (Martinsried, Germany): www.biomax.de
- BioTools (Edmonton, Alberta, Canada): www.biotools.com
- DNAStar (Madison, Wisc.): www.dnastar.com
- DNAtools (Fort Collins, Colo.): www.dnatools.com
- Gene Codes (Ann Arbor, Mich.): www.genecodes.com
- Gene-IT (Paris, France): www.gene-it.com
- GeneStudio (Suwannee, Ga.): www.genestudio.com
- Genomatix Software (Munich, Germany): www.genomatix.de
- Genometrician (Saint-Sulpice, Switzerland): www.genometrician.com
- Genomix (Oak Ridge, Tenn.): www.genomix.com
- Geospiza (Seattle, Wash.): www.geospiza.com
- MiraiBio (Alameda, Calif.): www.miraibio.com
- Ocimum Biosolutions (Hyderabad, India): www.ocimumbio.com
- Paracel (Pasadena, Calif.): www.paracel.com
- Redasoft (Bradford, Ontario, Canada): www.redasoft.com
- Rescentris (Columbus, Ohio): www.rescentris.com
- Scinova Informatics (Mumbai, India): www.scinovaindia.com
- Softberry (Mount Kisco, NY): www.softberry.com
- TimeLogic (Carlsbad, Calif.): www.timelogic.com
- Textco BioSoftware (West Lebanon, NH): www.textco.com

Microarray/Gene Expression Analysis:
- Aber Genomic Computing (Aberystwyth, Wales, UK): www.abergc.com/
- Alma Bioinformatics (Madrid, Spain): www.almabioinfo.com
- Amersham Biosciences Niagara (formerly Imaging Research) (St. Catherines, Ontario, Canada): www.imagingresearch.com/
- BioDiscovery (El Segundo, Calif.): www.biodiscovery.com
- BioMind (Bethesda, Md.): www.biomind.com
- Chang Bioscience (Castro Valley, Calif.): www.changbioscience.com
- Corimbia (Berkeley, Calif.): www.corimbia.com/
- Genedata (Basel, Switzerland): www.genedata.com/
- Insightful (life science business) (Seattle, Wash.): www.insightful.com/industry/pharm/default.asp
- Iobion Informatics (affiliate of Stratagene) (La Jolla, Calif.): www.iobion.com
- Koada Technology (Glasgow, Scotland, UK): www.koada.com
- MicroDiscovery (Berlin, Germany): www.microdiscovery.de
- MolecularWare (subsidiary of Calbatech) (Cambridge, Mass.): www.molecularware.com/
- MolMine (Bergen, Norway): www.molmine.com
- OmniViz (Maynard, Mass.): www.omniviz.com
- Partek (St. Charles, Mo.): www.partek.com
- Predictive Patterns (Kingston, Ontario, Canada): www.predictivepatterns.com
- Rosetta Biosoftware (Seattle, Wash.): www.rosettabio.com/
- SAS (Cary, NC): www.sas.com/industry/pharma/
- Silicon Genetics (Redwood City, Calif.): www.silicongenetics.com/cgi/SiG.cgi/index.smf
- Spotfire (Somerville, Mass.): www.spotfire.com
- SPSS (life science group) (Chicago, Ill.): www.spss.com/applications/science/
- Strand Genomics (Bangalore, India): www.strandgenomics.com
- TG Services (El Sobrante, Calif.): www.genepilot.com/index.html
- ViaLogy (Altadena, Calif.): www.vialogy.com
- VizX Labs (Seattle, Wash.): www.vizxlabs.com/index.html

Proteomics (Mass Spec, 2D Gel Analysis):
- Decodon (Greifswald, Germany): www.decodon.com
- Geneva Bioinformatics (Geneva, Switzerland): www.genebio.com/
- Genomic Solutions (subsidiary of Harvard Biosciences) (Ann Arbor, Mich.): http://65.219.84.5/index.html
- Imaxia (Cupertino, Calif.): www.imaxia.com
- Matrix Science (London, UK): www.matrixscience.com/
- Nonlinear Dynamics (Newcastle upon Tyne, UK): www.nonlinear.com

Structural Proteomics:
- Eidogen (Pasadena, Calif.): www.eidogen.com/
- Epitope Informatics (Edmundbyers, near Durham, UK): www.epitope-informatics.com
- Molsoft (San Diego, Calif.): www.molsoft.com
- RedStorm Scientific (Houston, Texas): www.redstormscientific.com
- Proceryon Biosciences (Salzburg, Austria): www.proceryon.com
- Protein Mechanics (Mountain View, Calif.): www.proteinmechanics.com

Pathway Analysis:
- Ariadne Genomics (Rockville, Md.): www.ariadnegenomics.com/index.html
- GeneGo (St. Joseph, Mich.): www.genego.com
- Hippron Physiomics (Ottawa, Canada): www.hippron.com
- Ingenuity Systems (Mountain View, Calif.): www.ingenuity.com/index.html
- Jubilant Biosys (Bangalore, India): www.jubilantbiosys.com/
- Silico Insights (Woburn, Mass.): silicoinsights.com/index.html

Genetic Variation Analysis:
- Ananomouse (San Francisco, Calif.): www.ananomouse.com/home.html
- Biodata (Tartu, Estonia): www.biodata.ee
- Forensic Bioinformatic Services (Fairborn, Ohio): www.bioforensics.com
- Golden Helix (Bozeman, Montana): www.goldenhelix.com/index.jsp
- SoftGenetics (State College, Penn.): www.softgenetics.com/
- Visualize (life science group) (Phoenix, Az.): www.visualizeinc.com/index.html

Cellular Simulation:
- BioAnalytics Group (Hightstown, NJ): www.bioanalyticsgroup.com/default.htm
- Cellicon Biotechnologies (Boston, Mass.): www.puretechventures.com/portfolio/cellicon/?section=ventures
- Entelos (Foster City, Calif.): www.entelos.com/
- Gene Network Sciences (Ithaca, NY): www.gnsbiotech.com//index.php
- Genomatica (San Diego, Calif.): www.genomatica.com/index1.html
- Teranode (Seattle, Wash.): www.teranode.com/

Ontologies / Text Mining:
- BioWisdom (Cambridge, UK): www.biowisdom.com/
- Electric Genetics (Cape Town, South Africa): www.egenetics.com/index.html
- Axontologic (Orlando, Fl.): www.axontologic.com/
- Definiens (Munich, Germany): www.definiens.com
- eTexx (Irving, Texas): www.etexxbio.com/
- InPharmix (Greenwood, Ind.): www.inpharmix.com/
- Molecular Connections (Bangalore, India): www.molecularconnections.com/website/
- PubGene (Oslo, Norway): www.pubgene.com/
- Reel Two (San Francisco, Calif.): www.reeltwo.com/
- SemanTx Life Sciences (subsidiary of Jarg) (Waltham, Mass.): www.semantxls.com/

Workflow/Pipelining:
- Incogen (Williamsburg, Va.): www.incogen.com/
- Inforsense (London, UK): www.inforsense.com/
- KooPrime (Singapore): www.kooprime.com/webpage.htm
- SciTegic (San Diego, Calif.): www.scitegic.com/main.html
- TurboWorx (Burlington, Mass.): www.turbogenomics.com/

Integration:
- GeneticXchange (Menlo Park, Calif.): www.geneticxchange.com/v3/index.php
- Genomining (Montrouge, France): www.genomining.com/home.en.html
- IO Informatics (Emeryville, Calif.): www.io-informatics.com/

Multiple Products:
- Accelrys (San Diego, Calif.): www.accelrys.com/
- Applied Maths (Sint-Martens-Latem, Belgium): www.applied-maths.com/
- Compugen (Tel Aviv, Israel): www.cgen.com/
- InforMax (Invitrogen subsidiary) (Frederick, Md.): www.informaxinc.com/content.cfm?pageid=1
- Lion Bioscience (Heidelberg, Germany): www.lionbioscience.com/

Databases of the university libraries of Oxford and Cambridge and of the Institute of Neurology, University of London:

1) Bergeron, Bryan P.: Bioinformatics computing. Upper Saddle River, N.J. / London: Prentice Hall PTR, 2003. Subject: Bioinformatics. Class number 570.285, BER.
2) Bioinformatics: sequence, structure and databanks - a practical approach. Ed. Des Higgins and W. Taylor. Oxford: Oxford University Press, 2000. (Practical Approach series.) Subject: Bioinformatics. Class number 570.285, BIO.
3) Baldi, Pierre; Brunak, Søren: Bioinformatics: the machine learning approach. Cambridge, Mass. / London: MIT Press, c1998. (Adaptive computation and machine learning; "A Bradford book.") Subject: Bioinformatics. Class number 570.285, BAL.
4) Baldi, Pierre; Brunak, Søren: Bioinformatics: the machine learning approach. 2nd ed. Cambridge, Mass.: MIT Press, 2001. (Adaptive computation and machine learning; "A Bradford book." Previous ed.: 1998.) Subject: Bioinformatics. Class number 570.285, BAL.
5) Attwood, T. K.: Introduction to bioinformatics. Harlow: Longman, 1999. Subject: Bioinformatics. Class number 570.285, ATT.
Fundamental concepts of bioinformatics / Dan E. Krane and Michael L. Raymer. San Francisco; London: Benjamin Cummings, 2003. Subject: Bioinformatics. Class number: 570.285 KRA. Format: Books.

7) Bioinformatics, managing scientific data / edited by Zoe Lacroix, Terence Critchlow. San Francisco, Calif.: Morgan Kaufmann; Oxford: Elsevier Science, 2003. Subject: Bioinformatics. Class number: 570.285 BIO. Format: Books.

9) Essentials of genomics and bioinformatics / edited by C.W. Sensen. Weinheim: Wiley-VCH, c2002. Edition history note: Concise ed. of Biotechnology, vol 5b, 2001. Added title: Biotechnology. Subject: Bioinformatics. Class number: 570.285 ESS. Format: Books.

10) Bioinformatics, databases and systems / edited by Stanley Letovsky. Boston; London: Kluwer Academic, c1999. Subject: Bioinformatics. Class number: 570.285 BIO. Format: Books.

11) Lesk, A. M.: Introduction to bioinformatics. Oxford University Press, 2005.

12) Chromatin and chromatin remodelling enzymes. Part A / edited by C. David Allis, Carl Wu. San Diego, CA; London: Elsevier Academic Press, 2004. xxxviii, 540p.: ill.; 24cm. Series: Methods in enzymology; 375. Dewey class mark: 574.1925. Contents: Histone bioinformatics; Biochemistry of histones, nucleosomes, and chromatin; Molecular cytology of chromatin functions. Subjects (Lib. Cong.): Chromatin; Enzymes; Enzymology. Added entries: Allis, C. David; Wu, Carl. ISBN 0121827798 (cased): 99.95.

13) Bioinformatics and genomes: current perspectives / edited by Miguel A. Andrade. Wymondham: Horizon Scientific, 2003. xii, 227p; 24cm. Dewey class mark: 574.0285. Subjects (Lib. Cong.): Genomes -- Data processing; Bioinformatics. Added entry: Andrade, Miguel A. ISBN 1898486476 (cased): 80.00.

14) Bioinformatics and genomes: current perspectives / edited by Miguel A. Andrade. Wymondham: Horizon Scientific, 2003. xii, 227p; 24cm. Dewey class mark: 574.0285. Subjects (Lib. Cong.): Genomes -- Data processing; Bioinformatics. Added entry: Andrade, Miguel A. ISBN 1898486476 (cased): 80.00.

15) Institute of Electrical and Electronics Engineers: Transactions on nanobioscience. Added title: IEEE transactions on nanobioscience. Subjects (Lib. Cong.): Ultrastructure (Biology); Nanoscience; Bioinformatics. Holdings: Year 2003; Year 2004. ISSN 1536-1241.

16) Bioinformatics: genes, proteins and computers / edited by Christine Orengo, David Jones, Janet Thornton. Oxford: BIOS, 2003. xiv, 298p; 25cm. Dewey class mark: 574.870285. Subjects (Lib. Cong.): Molecular biology -- Computer simulation; Proteins -- Analysis -- Data processing; Genetics. Added entries: Orengo, Christine; Jones, David; Thornton, Janet. ISBN 1859960545 (pbk): 29.99.

17) Bioinformatics: genes, proteins and computers / edited by Christine Orengo, David Jones, Janet Thornton. Oxford: BIOS, 2003. xiv, 298p; 25cm. Dewey class mark: 574.870285. Subjects (Lib. Cong.): Molecular biology -- Computer simulation; Proteins -- Analysis -- Data processing; Genetics. Added entries: Orengo, Christine; Jones, David; Thornton, Janet. ISBN 1859960545 (pbk): 29.99.

18) Lesk, Arthur M.:
Introduction to bioinformatics. Oxford: Oxford University Press, 2002. xvi, 283p: ill; 25cm. Dewey class mark: 574.0285. Subject (Lib. Cong.): Bioinformatics. ISBN 0199251967 (pbk): 19.99.

19) Westhead, David R.: Bioinformatics / David R. Westhead, J. Howard Parish and Richard M. Twyman. Oxford: BIOS, 2002. viii, 257p: ill; 25cm. Series: Instant notes. Includes bibliographical references and index. Dewey class mark: 574.0285. Subject (Lib. Cong.): Bioinformatics -- Examinations -- Study guides. Added entries: Parish, J. H. (John Howard); Twyman, Richard M. Added title: Instant notes in bioinformatics. ISBN 1859962726 (pbk): 16.99.

20) Baxevanis, Andreas D.: Bioinformatics: a practical guide to the analysis of genes and proteins / Andreas D. Baxevanis [and] B.F. Francis Ouellette. 2nd ed. New York; Chichester: Wiley, 2001. xviii, 470p. Series: Methods of biochemical analysis; 43. General note: Previous ed.: 1998. Dewey class mark: 574.870285. Subjects (Lib. Cong.): Molecular biology -- Computer simulation; Molecular biology -- Mathematical models; Proteins -- Analysis -- Data processing; Genetics. Added entry: Ouellette, B. F. Francis. ISBN 0471383902 (cased); 0471383910 (pbk.): 51.95.

21) Bioinformatics: sequence, structure, and databanks: a practical approach / edited by D. Higgins and W. Taylor. Oxford: Oxford University Press, 2000. xx, 249p (pbk); 24cm. Series: The practical approach series. Dewey class mark: 574.8702854. Subjects (Lib. Cong.): Biology -- Data processing; Biomolecules -- Data processing. Added entries: Higgins, Des; Taylor, Willie. ISBN 0199637903 (pbk.): 29.95; 0199637911 (cased).

22) Attwood, Teresa K.: Introduction to bioinformatics / Teresa K. Attwood and David J. Parry-Smith. Harlow: Pearson Education, 1999. xxv, 218p: ill (pbk); 24cm. Series: Cell and molecular biology in action series. Includes index. Dewey class mark: 574.0285. Subject (Lib. Cong.): Molecular biology -- Data processing. Added entry: Parry-Smith, David J. ISBN 0582327881 (pbk): 23.99.

23) Mistry, T.: Bioinformatics: the analysis of protein sequence homology and protein struc... Publication Date: 2002. Control Number: M0018105KP. Copies: 1.

24) Lacey, N.: Bioinformatics. Publication Date: 2003. Control Number: M0022613KP. Copies: 1.

25) Bioinformatics: a practical guide to the analysis of genes and proteins. 2nd ed. Publication Date: 2001. Control Number: 0471383902.

26) Briefings in bioinformatics. Publisher: London: Henry Stewart. Control Number: 1467-5463. Shelved at journals.

27) Bioinformatics: sequence, structure, and databanks: a practical approach. Publication Date: 2000. Control Number: 0199637911.

28) Baldi, Pierre: Bioinformatics: the machine learning approach / Pierre Baldi, Søren Brunak. 2nd ed. Publication Date: 2001. Control Number: 026202506x.

29) Gibas, Cynthia: Developing bioinformatics computer skills / Cynthia Gibas, James Fenton and... Publication Date: 2000. Control Number: 1565926641.

30) Lim, Hwa A.: Genetically yours: bioinforming, biopharming, biofarming / Hwa A. Lim. Publication Date: 2002. Control Number: 9810249381.

31) Molecular biology of the gene / James Watson ... [et al.]. 5th ed. Publication Date: 2004. Control Number: 0321223683.

32) Molecular biology of the gene / James Watson ... [et al.].
5th ed. CD-ROM. Publication Date: 2004. Control Number: M0025595KP.

33) Proteomics Techniques - Royal Free Campus. http://www.rfc.ucl.ac.uk/departments/Biomedical-... (60%, Tue, 07 Sep 2004 10:08:56 GMT)

Introduction
This facility has been set up to help researchers establish and run proteomic projects and identify biological molecules of interest, particularly proteins. The facility is currently equipped with Amersham Biosciences 2D gel electrophoresis equipment and a Waters/Micromass capillary HPLC system coupled to a qToF-µ electrospray mass spectrometer. 2D gel electrophoresis is a suitable technique for asking, "Where do differences arise amongst the proteins in two similar samples?" For example, closely matched samples from diseased and healthy cells can be compared. Differences in protein abundance or covalent modification (e.g. phosphorylation, glycosylation and acylation) can provide important clues to the pathogenesis, progress and treatment of a disease. Once a protein has been isolated and digested, the mass spectrometer is a suitable tool for asking, "What is this protein?", "Which residues are modified?", and "What is the modification?" As there are many practical considerations in setting up these types of experiments, it is strongly advised that you contact the facility staff about the design of the study prior to the preparation of samples. The following guidelines provide a brief introduction for those interested in proteomics:

Sample Preparation
Optimal sample preparation is essential for good 2D results. The ideal process will cause complete solubilization, disaggregation and denaturation of the proteins in the sample. However, as samples vary in their constituent properties, optimal procedures must be determined empirically for each type of sample. The development of the sample preparation must take into account the object of the investigation: increasing the range and number of detectable proteins can sometimes only be obtained at the expense of clarity and reproducibility. Close collaboration between investigators and facility staff should produce a sample preparation protocol tailored to the needs of the investigation.

Protecting Samples against Proteolysis
Disrupting tissue or cultured cells can liberate or activate endogenous proteases. As proteolytic degradation of proteins greatly complicates 2D electrophoresis, measures should be included to avoid this problem. Proteases can be inactivated by immediate snap-freezing or by denaturing samples, for example with 10% TCA, 8M urea or 2% SDS. However, the addition of a cocktail of protease inhibitors has often proved adequate. Facility staff will be pleased to advise collaborators on suitable strategies.

Gel Electrophoresis
2D gel electrophoresis is the main technique used to separate proteins in the facility. It is suitable for analyzing the proteins present in complicated biological samples, but can be supplemented with a variety of biochemical techniques in order to concentrate proteins of interest.
2D electrophoresis is often suitable for separating highly complex protein mixtures; however, some proteins will be masked by other more abundant species and many proteins will remain beyond the limit of detection. The pH range for the 1st dimension needs to be optimized, and this may consume considerable amounts of time and sample.

Image Analysis
Computer-aided comparison of 2D gels facilitates the identification of changes in protein mobility and abundance. The facility has two workstations equipped with PDQuest, a standard software package in proteomics. Gels can be stained with silver or Deep Purple using protocols that allow subsequent analysis by mass spectrometry. It is hoped to develop comparative fluorescence-based techniques at a later stage. Note that there is inherently some gel-to-gel variation in staining and mobility; consequently changes are significant if they are reproducible and generally greater than 2- to 3-fold in intensity.

Excision and Digestion
Spots from 2D gels that reproducibly differ between experimental and control samples are manually removed for digestion, usually with trypsin. As such it is not practical to excise large numbers of proteins from multiple gels. It is vital up to this stage that samples are not contaminated with environmental proteins (typically keratins). Proteolytic fragments can then be desalted and analysed directly on the mass spectrometer or separated using the CapLC HPLC system and automatically loaded onto the mass spectrometer.

Mass Spectrometry
The Micromass qToF-µ electrospray mass spectrometer is configured for high-sensitivity analysis of small numbers of samples. It is not a high-throughput instrument and requires expertise to operate. It is not suitable for screening vast numbers of samples. Sensitivity is highly compromised by salts and detergents in the sample. In addition to peptide analysis, the qToF-µ is suitable for studying many intact proteins (contact the facility staff for details). Facility staff will advise you on the best buffer in which to prepare samples.

Bioinformatics
Data from the mass spectrometer consist of a series of molecular ion m/z (mass/charge) values that are translated into a corresponding series of molecular weights, or "peptide fingerprint". Each fingerprint is then compared with the predicted fingerprints of all proteins in a comprehensive sequence database to identify the parent protein. In addition, molecular ions can be fragmented by collision in order to derive sets of daughter ions, from which information on amino acid sequence and sites of covalent modification can be obtained. Note that in general covalent modifications need to be stable and of high stoichiometry to be identified. The facility is equipped with a workstation running MassLynx 4.0 software. Facility staff will assist researchers with the identification of target proteins.
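For illustration only, a minimal sketch of the fingerprint-matching idea described above; the observed masses, candidate sequences and helper names below are invented and have nothing to do with the facility's MassLynx software:

```python
# Illustrative peptide-mass-fingerprint matching (hypothetical data and names).
# An observed list of peptide masses is scored against in-silico tryptic
# digests of candidate protein sequences.

WATER = 18.010565  # monoisotopic mass of H2O added on peptide bond hydrolysis

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def tryptic_peptides(sequence):
    """Cleave after K or R, except when followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in "KR" and (i + 1 == len(sequence) or sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def peptide_mass(peptide):
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def fingerprint_score(observed_masses, sequence, tolerance=0.2):
    """Count observed masses that match a predicted tryptic peptide mass."""
    predicted = [peptide_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(obs - pred) <= tolerance for pred in predicted)
        for obs in observed_masses
    )

if __name__ == "__main__":
    # Hypothetical observed fingerprint and two made-up database entries.
    observed = [804.41, 1045.53, 1285.67]
    candidates = {
        "protein_A": "MKWVTFISLLFLFSSAYSRGVFRRDAHK",
        "protein_B": "MENTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRK",
    }
    for name in sorted(candidates, key=lambda n: -fingerprint_score(observed, candidates[n])):
        print(name, fingerprint_score(observed, candidates[name]))
```

A real search engine such as Mascot additionally scores matches statistically and allows for missed cleavages and modifications; the sketch only counts masses matched within a fixed tolerance.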
Useful references and web pages: British Mass Spectrometry Society; A basic mass spectrometry tutorial; Bio-Rad Proteomics Guide; International Mass Spectrometry News; American Society for Mass Spectrometry.

Informace o projektu „Blue gene“ společnosti IBM-výzkum: http://www.research.ibm.com/bluegene/

Partial 'Blue Gene' Systems Are Now Two of the Top Ten Most Powerful Supercomputers on Earth
June 21, 2004--For the first time, two IBM Blue Gene/L prototype systems appear on the Top 10 list of supercomputers. The Blue Gene/L prototype represents a radical new design for supercomputing. At 1/20th the physical size of existing machines of comparable power, Blue Gene/L enables dramatic reductions in power consumption, cost and space requirements for businesses requiring immense computing power. For a new architecture to produce so much compute power in such a small package is a stunning achievement, and provides a glimpse of the future of supercomputing.
The number four-ranked Blue Gene/L DD1 Prototype, with a sustained speed of 11.68 teraflops and a peak speed of 16 teraflops, uses more than 8,000 PowerPC processors packed into just four refrigerator-sized racks. This groundbreaking system is only 1/16 of its planned final capacity and has skyrocketed to the 4th place from the 73rd spot on the list in November 2003. The eighth-ranked Blue Gene/L DD2 Prototype has a sustained speed of 8.66 teraflops and a peak speed of 11.47 teraflops. The DD2 system is based on the second generation of the Blue Gene/L chips, which are more powerful than those used in the DD1 prototype.

About IBM's Blue Gene Supercomputing Project
Blue Gene is an IBM supercomputing project dedicated to building a new family of supercomputers optimized for bandwidth, scalability and the ability to handle large amounts of data while consuming a fraction of the power and floor space required by today's fastest systems. The full Blue Gene/L machine is being built for the Lawrence Livermore National Laboratory in California, and will have a peak speed of 360 teraflops. When completed in 2005, IBM expects Blue Gene/L to lead the Top500 supercomputer list. A second Blue Gene/L machine is planned for ASTRON, a leading astronomy organization in the Netherlands. IBM and its partners are currently exploring a growing list of applications including hydrodynamics, quantum chemistry, molecular dynamics, climate modeling and financial modeling.

Presentations, Preprints, and Publications
Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 1. Theory; The Journal of Physical Chemistry B; 108(21); 6571-6581
Describing Protein Folding Kinetics by Molecular Dynamics Simulations. 2. Example Applications to Alanine Dipeptide and a beta-Hairpin Peptide; The Journal of Physical Chemistry B; 108(21); 6582-6594
Agenda and Presentations for Blue Gene Briefing Day--February 6, 2004
Molecular Dynamics Investigation of the Structural Properties of Phosphatidylethanolamine Lipid Bilayers
Design and Analysis of the BlueGene/L Torus Interconnection Network
A Volumetric FFT for Blue Gene/L, to appear in the Proceedings of HiPC2003
Blue Matter, An Application Framework for Molecular Simulation on Blue Gene, Journal of Parallel and Distributed Computing, Volume 63, Issues 7-8, July-August 2003, Pages 759-773
Understanding folding and design: Replica-exchange simulations of "Trp-cage" miniproteins, Proc. Natl. Acad. Sci. USA, Vol. 100, Issue 13, June 24, 2003, pp. 7587-7592
An overview of the BlueGene/L supercomputer, Supercomputing 2002 Technical Papers, November 2002
Can a continuum solvent model reproduce the free energy landscape of a beta-hairpin folding in water?, Proc. Natl. Acad. Sci. USA, Vol. 99, Issue 20, October 1, 2002, pp. 12777-12782
The free energy landscape for beta-hairpin folding in explicit water, Proc. Natl. Acad. Sci. USA, Vol. 98, Issue 26, December 18, 2001, pp. 14931-14936
Blue Gene Project Update
Efficient multiple time step method for use with Ewald and particle mesh Ewald for large biomolecular systems, The Journal of Chemical Physics, Volume 115, Issue 5, 2001, pp. 2348-2358
Blue Gene: A vision for protein science using a petaflop supercomputer, IBM Systems Journal, Volume 40, Number 2, 2001, p. 310

Industry Links
Unraveling the Mystery of Protein Folding
Physicists Take on Challenge Of Showing How Proteins Fold, The Scientist
The Bridge from Genes to Proteins

Informace o aplikaci AlphaServer společnosti HP v projektu GeneProt:
http://www.hp.com/techservers/life_sciences/success_geneprot.pdf

Informace o projektu skupiny „Computational biology“ (www.sun.com/edu/hpc/compbiosig):
http://www.sun.com/products-nsolutions/edu/events/archive/hpc/presentations/june01/stefan_unger.pdf

Databáze Dialog:
Konference: (Vybráno bylo 61 záznamů z odborných konferencí v posledním období r. 2004)

1) A sequence-focused parallelisation of EMBOSS on a cluster of workstations Podesta, K.; Crane, M.; Ruskin, H.J. Sch. of Comput., Dublin City Univ., Ireland Conference: Computational Science and Its Applications - ICCSA 2004. International Conference. Proceedings (Lecture Notes in Comput. Sci. Vol.3045) Part: Vol.3 , Page: 473-80 Vol.3 Editor: Lagana, A.; Gavrilova, M.L.; Kumar, V.; Mun, Y.; Tan, C.J.K.; Gervasi, O. Publisher: Springer-Verlag , Berlin, Germany , 2004 , 4588 Pages Conference: Computational Science and Its Applications - ICCSA 2004. International Conference. Proceedings , Sponsor: Univ. of Perugia, Italy, Univ. of Calgary, Canada, Univ. of Minnesota, USA, Queen's Univ. of Belfast, UK, Heuchera Technol., UK, GRID.IT: Enabling Platforms for High-Performance Computational Grids Oriented to Scalable Virtual Organizations of the Ministry of Sci. and Educ. of Italy, COST - European Cooperation in the Field of Sci. and Tech. Res , 14-17 May 2004 , Assisi, Italy Language: English Abstract: A number of individual bioinformatics applications (particularly BLAST and other sequence searching methods) have recently been implemented over clusters of workstations to take advantage of extra processing power. Performance improvements are achieved for increasingly large sets of input data (sequences and databases), using these implementations. We present an analysis of programs in the EMBOSS suite based on increasing sequence size, and implement these programs in parallel over a cluster of workstations using sequence segmentation with overlap. We observe general increases in runtime for all programs, and examine the speedup for the most intensive ones to establish an optimum segmentation size for those programs across the cluster.
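The "sequence segmentation with overlap" mentioned in the record above can be sketched in a few lines of Python; the segment size and overlap below are arbitrary illustrative values, not those used by the authors:

```python
def segment_with_overlap(sequence, segment_size, overlap):
    """Split a sequence into fixed-size segments that overlap by `overlap`
    characters, so that features shorter than the overlap which span a cut
    point are still seen whole in at least one segment (the idea behind
    farming per-segment jobs out to cluster nodes)."""
    if overlap >= segment_size:
        raise ValueError("overlap must be smaller than segment_size")
    step = segment_size - overlap
    segments = []
    for start in range(0, len(sequence), step):
        segments.append((start, sequence[start:start + segment_size]))
        if start + segment_size >= len(sequence):
            break
    return segments

# Example: 25-residue segments with a 5-residue overlap; each tuple records
# the offset so per-segment hits can be mapped back to the full sequence.
seq = "ACDEFGHIKLMNPQRSTVWY" * 5
for offset, chunk in segment_with_overlap(seq, 25, 5):
    print(offset, chunk)
```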
2) Genome database integration Robinson, A.; Rahayu, W. Dept. Comput. Sci. & Comput. Eng., LaTrobe Univ., Bundoora, Vic., Australia Conference: Computational Science and Its Applications - ICCSA 2004. International Conference. Proceedings (Lecture Notes in Comput. Sci. Vol.3045) Part: Vol.3 , Page: 443-53 Vol.3 Editor: Lagana, A.; Gavrilova, M.L.; Kumar, V.; Mun, Y.; Tan, C.J.K.; Gervasi, O. Publisher: Springer-Verlag , Berlin, Germany , 2004 , 4588 Pages Conference: Computational Science and Its Applications - ICCSA 2004. International Conference. Proceedings , Sponsor: Univ. of Perugia, Italy, Univ. of Calgary, Canada, Univ. of Minnesota, USA, Queen's Univ. of Belfast, UK, Heuchera Technol., UK, GRID.IT: Enabling Platforms for High-Performance Computational Grids Oriented to Scalable Virtual Organizations of the Ministry of Sci. and Educ. of Italy, COST - European Cooperation in the Field of Sci. and Tech. Res , 14-17 May 2004 , Assisi, Italy Language: English Abstract: This paper presents a solution to many of the problems in genome database integration, including an integrated interface for accessing all genome databases simultaneously and the problem of a common interchange data format. The solution is the addition of a middle or mediation layer of a three-layer approach. The solution provides a simple step-by-step approach to connect other existing genome databases quickly and efficiently. The internal data format used is a commonly used bioinformatics format called BSML, a subset of the XML standard. The architecture also allows easy addition and deletion of functionality. Finally, an implementation of this solution is presented with the required support functionality to validate the proposed integration method.

3) Cell modeling using agent-based formalisms Webb, K.; White, T. Sch. of Comput. Sci., Carleton Univ., Ont., Canada Conference: Innovations in Applied Artificial Intelligence. 17th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE 2004. Proceedings (Lecture Notes in Artificial Intelligence Vol.3029) , Page: 128-37 Editor: Orchard, B.; Yang, C.; Ali, M. Publisher: Springer-Verlag , Berlin, Germany , 2004 , xxi+1272 Pages Conference: Innovations in Applied Artificial Intelligence. 17th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE 2004. Proceedings , 17-20 May 2004 , Ottawa, Ont., Canada Language: English Abstract: The systems biology community is building increasingly complex models and simulations of cells and other biological entities. In doing so the community is beginning to look at alternatives to traditional representations such as those provided by ordinary differential equations (ODE). Making use of the object-oriented (OO) paradigm, the unified modeling language (UML) and real-time object-oriented modeling (ROOM) visual formalisms, we describe a simple model that includes membranes with lipid bilayers, multiple compartments including a variable number of mitochondria, substrate molecules, enzymes with reaction rules, and metabolic pathways. We demonstrate the validation of the model by comparison with Gepasi and comment on the reusability of model components.

4) Digital signal processing in predicting secondary structures of proteins Mitra, D.; Smith, M. Dept. of Comput. Sci., Florida Inst. of Technol., Melbourne, FL, USA Conference: Innovations in Applied Artificial Intelligence. 17th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE 2004. Proceedings (Lecture Notes in Artificial Intelligence Vol.3029) , Page: 40-9 Editor: Orchard, B.; Yang, C.; Ali, M. Publisher: Springer-Verlag , Berlin, Germany , 2004 , xxi+1272 Pages Conference: Innovations in Applied Artificial Intelligence. 17th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE 2004. Proceedings , 17-20 May 2004 , Ottawa, Ont., Canada Language: English Abstract: Traditionally, protein secondary structure prediction methods work with aggregate knowledge gleaned over a training set of proteins, or with some knowledge acquired from the experts about how to assign secondary structural elements to each amino acid.
We are proposing here a methodology that is primarily targeted at any given query protein rather than being trained over a pre-determined training set. For some query proteins our prediction accuracies are predictably higher than most other methods, while for other proteins they may not be so, but we would at least know that even before running the algorithms. Our method is based on homology modeling. When a significantly homologous protein (to the query) with known structure is available in the database, our prediction accuracy could be even 90% or above. Our objective is to improve the accuracy of the predictions for the so-called "easy" proteins (where sufficiently similar homologues with known structures are available), rather than improving the bottom line of the structure prediction problem, or the average prediction accuracy over many query proteins. We use a digital signal processing (DSP) technique that is of a global nature in assigning structural elements to the respective residues. This is the key to our success. We have tried some variations of the proposed core methodology and the experimental results are presented in this article.

5) Proceedings. Fourth IEEE Symposium on Bioinformatics and Bioengineering Publisher: IEEE , Los Alamitos, CA, USA , 2004 , xviii+613 Pages Conference: Proceedings. Fourth IEEE Symposium on Bioinformatics and Bioengineering , 19-21 May 2004 , Taichung, Taiwan Language: English Abstract: The following topics are dealt with: bioengineering; data integration; medical image processing; parallel computing; medical informatics; gene analysis; transcriptome and functional genomics; homology search; structural biology; algorithms; protein-protein interactions; indexing techniques and intelligent systems.

6) Lymphoma cancer classification using genetic programming with SNR features Jin-Hyuk Hong; Sung-Bae Cho Dept. of Comput. Sci., Yonsei Univ., South Korea Conference: Genetic Programming. 7th European Conference on Genetic Programming EuroGP 2004. Proceedings. (Lecture Notes in Comput. Sci. Vol.3003) , Page: 78-88 Editor: Keijzer, M.; O'Reilly, U.-M.; Lucas, S.M.; Costa, E.; Soule, T. Publisher: Springer-Verlag , Berlin, Germany , 2004 , xi+410 Pages Conference: Genetic Programming. 7th European Conference on Genetic Programming EuroGP 2004. Proceedings , 5-7 April 2004 , Coimbra, Portugal Language: English Abstract: Lymphoma cancer classification with DNA microarray data is one of the important problems in bioinformatics. Many machine learning techniques have been applied to the problem and produced valuable results. However, the medical field requires not only a high-accuracy classifier, but also the in-depth analysis and understanding of the classification rules obtained. Since gene expression data have thousands of features, it is nearly impossible to represent and understand their complex relationships directly. In this paper, we adopt SNR (signal-to-noise ratio) feature selection to reduce the dimensionality of the data, and then use genetic programming to generate cancer classification rules with the features. In the experimental results on the Lymphoma cancer dataset, the proposed method yielded 96.6% test accuracy on average, and an excellent arithmetic classification rule set that classifies all the samples correctly is discovered by the proposed method.
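The SNR (signal-to-noise ratio) feature selection named in the record above is commonly defined, Golub-style, as the difference of class means divided by the sum of class standard deviations; a minimal sketch with invented expression values (not the authors' data or code):

```python
import statistics

def snr(values_class1, values_class2):
    """Signal-to-noise ratio of one gene: difference of class means divided
    by the sum of class standard deviations (larger magnitude = more
    discriminative)."""
    m1, m2 = statistics.mean(values_class1), statistics.mean(values_class2)
    s1, s2 = statistics.pstdev(values_class1), statistics.pstdev(values_class2)
    return (m1 - m2) / (s1 + s2) if (s1 + s2) > 0 else 0.0

def top_genes(expression, labels, k=2):
    """Rank genes of an expression matrix {gene: [sample values]} by |SNR|
    for a two-class labelling and return the k best."""
    scored = []
    for gene, values in expression.items():
        c1 = [v for v, lab in zip(values, labels) if lab == 1]
        c2 = [v for v, lab in zip(values, labels) if lab == 0]
        scored.append((abs(snr(c1, c2)), gene))
    return [gene for _, gene in sorted(scored, reverse=True)[:k]]

# Hypothetical toy data: 3 genes, 6 samples, label 1 = one lymphoma subtype.
expression = {
    "gene1": [5.1, 4.9, 5.3, 1.2, 1.0, 1.1],
    "gene2": [2.0, 2.1, 1.9, 2.2, 2.0, 2.1],
    "gene3": [0.5, 0.7, 0.4, 3.8, 3.9, 4.1],
}
labels = [1, 1, 1, 0, 0, 0]
print(top_genes(expression, labels, k=2))  # genes with the clearest class split
```

The selected genes would then be handed to the rule-generating stage (genetic programming in the cited work); only the ranking step is shown here.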
7) Bioinformatics in the undergraduate curriculum: opportunities for computer science educators Burhans, D.T.; Doom, T.E.; DeJongh, M.; Leblanc, M. Dept. of Comput. Sci., Canisius Coll., Buffalo, NY, USA SIGCSE Bulletin Conference: SIGCSE Bull. (USA) , vol.36, no.1 , Page: 229-30 Publisher: ACM , March 2004 Conference: Thirty-Fifth SIGCSE Technical Symposium on Computer Science Education , Sponsor: ACM Special Interest Group on Comput. Sci. Educ , 3-7 March 2004 , Norfolk, VA, USA Language: English Abstract: Biology has become an increasingly data-driven science. Modern experimental techniques, including automated DNA sequencing, gene expression microarrays, and X-ray crystallography, are producing molecular data at a rate that has made traditional data analysis methods impractical. Computational methods are becoming an increasingly important aspect of the evaluation and analysis of experimental data in molecular biology. Bioinformatics is the term coined for the new field that merges biology and computer science to manage and analyze this data, with the ultimate goal of understanding and modeling living systems (2003). The emergence of bioinformatics provides new challenges and opportunities for computer science educators. This panel assembles four individuals who collectively have experience teaching bioinformatics at both liberal arts colleges and universities, and who also have industry experience in bioinformatics, to discuss various approaches to incorporating bioinformatics into the undergraduate curriculum.

8) hMiDas and hMitChip: new opportunities in mitochondrial bioinformatics and genomic medicine Alesci, S.; Su, Y.A.; Chrousos, G.P. Conference: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems , Page: 329-34 Editor: Long, R.; Antani, S.; Lee, D.J.; Nutter, B.; Zhang, M. Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xv+603 Pages Conference: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems , Sponsor: IEEE Comput. Soc. Tech. Committee on Computational Medicine, Texas Tech Univ. College of Eng , 24-25 June 2004 , Bethesda, MD, USA Language: English Abstract: We developed a human mitochondria-focused gene database (hMiDas) and customized cDNA microarray chip (hMitChip) to help biomedical research in mitochondrial genomics. The current version of hMiDas contains 1,242 gene entries (including mtDNA genes, nuclear genes related to mitochondria structure and junctions, predicted loci and experimental genes), organized in 15 categories and 24 subcategories. The database interface allows keyword-based searches as well as advanced field and/or case-sensitive searches. Each gene record includes 19 fields, mostly hyperlinked to the corresponding source. Moreover, for each gene, the user is given the option to run literature search using PubMed, and gene/protein homology search using BLAST and FASTA. The hMitChip was constructed using hMiDas as a reference. Currently, it contains a selection of 501 mitochondria-related nuclear genes and 192 control elements, all spotted in duplicate on glass slides. Slide quality was checked by microarray hybridization with 50 µg of Cy3-labeled sample cDNA and Cy5-labeled comparing cDNA, followed by array scan and image analysis. The hMitChip was tested in vitro using RNA extracted from cancer cell lines. Gene expression changes detected by hMitChip were confirmed by quantitative real-time RT-PCR analysis.

9) From sequence to structure using PF2: improving methods for protein folding prediction Hussain, S. Conference: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems , Page: 323-8 Editor: Long, R.; Antani, S.; Lee, D.J.; Nutter, B.; Zhang, M.
Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xv+603 Pages Conference: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems , Sponsor: IEEE Comput. Soc. Tech. Committee on Computational Medicine, Texas Tech Univ. College of Eng , 24-25 June 2004 , Bethesda, MD, USA Language: English Abstract: Projects dependent on proteomic data are challenged not by the lack of methods to analyze this information, but by the lack of means to capture and manage the data. A few primary players in the bioinformatics realm are promoting the use of selected standardized technologies to access biological data. Many organizations exposing bioinformatics tools, however, do not have the resources required for utilizing these technologies. In order to provide interfaces for non-standardized bioinformatics tools, open-source projects have led to the development of hundreds of software libraries. These tools lack architectural unity, making it difficult to script bioinformatics research projects, such as protein structure prediction algorithms, which involve the use of multiple tools in varying order and number. As a solution, we have focused on building a software model, named the Protein Folding Prediction Framework (PF2), which provides a unifying method for the addition and usage of connection modules to bioinformatics databases exposed via Web-based tools, software suites, or e-mail services. The framework provides mechanisms that allow users to create and add new connections without supplementary code as well as to introduce entirely new logical scenarios. In addition, PF2 offers a convenient interface, a multi-threaded execution-engine, and a built-in visualization suite to provide the bioinformatics community with an end-to-end solution for performing complex genomic and proteomic inquiries. 10) Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems Editor: Long, R.; Antani, S.; Lee, D.J.; Nutter, B.; Zhang, M. Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xv+603 Pages Conference: Proceedings. 17th IEEE Symposium on Computer-Based Medical Systems , Sponsor: IEEE Comput. Soc. Tech. Committee on Computational Medicine, Texas Tech Univ. College of Eng , 24-25 June 2004 , Bethesda, MD, USA Language: English Abstract: The following topics are dealt with: medical databases; content-based image retrieval; medical systems; signal processing; imaging, telemedicine; data mining; image processing; pattern recognition; segmentation; medical devices; image processing tools; clinical applications; handheld computing for medicine; decision support systems; and bioinformatics. 11) Integrating ontology and workflow in PROTEUS, a grid-based problem solving environment for bioinformatics Cannataro, M.; Comito, C.; Guzzo, A.; Veltri, P. Univ. of Catanzaro, Italy Conference: Proceedings. ITCC 2004. International Conference on Information Technology: Coding and Computing Part: Vol.2 , Page: 90-4 Vol.2 Editor: Srimani, P.K. Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , 1710 Pages Conference: Proceedings. ITCC 2004. International Conference on Information Technology: Coding and Computing , Sponsor: IEEE Comput. Soc. Task Force on Information Technology for Business Application , 5-7 April 2004 , Las Vegas, NV, USA Language: English Abstract: Bioinformatics is as a bridge between life science and computer science: computer algorithms are needed to face complexity of biological processes. 
Bioinformatics applications manage complex biological data stored into distributed and often heterogeneous databases and require large computing power. We discuss requirements of such applications and present the architecture of PROTEUS, a grid-based problem solving environment that integrates ontology and workflow approaches to enhance composition and execution of bioinformatics applications on the grid. 12) Algorithm Theory - SWAT 2004. 9th Scandinavian Workshop on Algorithm Theory. Proceedings (Lecture Notes in Comput. Sci. Vol.3111) Editor: Hagerup, T.; Katajainen, J. Publisher: Springer-Verlag , Berlin, Germany , 2004 , xi+506 Pages Conference: Algorithm Theory - SWAT 2004. 9th Scandinavian Workshop on Algorithm Theory. Proceedings , Sponsor: DIKU, Univ. of Southern Denmark, Dept. of Math. and Comput. Sci., IT Univ. Copenhagen, Danish Nat. Sci. Res. Council, First Graduate School, LESS, Nokia, SAS , 8-10 July 2004 , Humlebaek, Denmark Language: English Abstract: The following topics are dealt with: dynamic multithreaded algorithms; cacheoblivious algorithms and data structures; graphs and trees; optimally competitive list batching; algorithmic complexity; scheduling; and approximation algorithms. 13) Proceedings of the IEEE 30th Annual Northeast Bioengineering Conference (IEEE Cat. No.04CH37524) Editor: Schreiner, S.; Cezeaux, J.L.; Muratore, D.M. Publisher: IEEE , Piscataway, NJ, USA , 2004 , xxiii+262 Pages Conference: Proceedings of the IEEE 30th Annual Northeast Bioengineering Conference , Sponsor: BEACON, Tyco Healthcare, Reebok, BEI, The Whitaker Found , 17-18 April 2004 , Springfield, MA, USA Language: English Abstract: The following topics were dealt with: neural engineering; biomedical instrumentation; medical imaging; physiological monitoring; cardiovascular biomechanics; biosensors; bioMEMS; biomaterials tissue and cellular engineering; rehabilitation engineering; telemedicine and virtual reality in medicine; biomedical education; pharmaceutical engineering; drug delivery; bio-optics; bioinformatics; surgical devices; and the medical applications of nanosystems and nanotechnology 14) Applications of Evolutionary Computing. Evo Workshops 2004: EvoBIO, EvoCOMNET, EvoHOT, EvoMUSART, and EvoSTOC. Proceedings (Lecture Notes in Comput. Sci. Vol.3005) Editor: Raidl, G.R. Publisher: Springer-Verlag , Berlin, Germany , 2004 , xix+562 Pages Conference: Applications of Evolutionary Computing. Evo Workshops 2004: EvoBIO, EvoCOMNET, EvoHOT, EvoMUSART, and EvoSTOC. Proceedings , Sponsor: EvoNET, Univ. of Coimbra , 5-7 April 2004 , Coimbra, Portugal Language: English Abstract: The following topics are dealt with: EvoBIO; evolutionary bioinformatics; EvoCOMNET; evolutionary computation; communications, networks, and connected systems; EvoHOT; hardware optimization techniques; binary decision diagrams; multilayer floorplan layout problem; EvoIASP; image analysis; signal processing; object recognition systems; EvoMUSART; evolutionary music; evolutionary art; EvoSTOC; evolutionary algorithms; stochastic environment; optimization problems; and dynamic environments. 15) Bioinformatics: a knowledge engineering approach Kasabov, N. Sch. of Bus., Auckland Univ. of Technol., New Zealand Conference: 2004 2nd International IEEE Conference on 'Intelligent Systems'. Proceedings (IEEE Cat. No.04EX791) Part: Vol.1 , Page: 19-24 Vol.1 Editor: Yager, R.R.; Sgurev, V.S. Publisher: IEEE , Piscataway, NJ, USA , 2004 , 756 Pages Conference: 2004 2nd International IEEE Conference on 'Intelligent Systems'. 
Proceedings , Sponsor: IEEE Instrumentation and Measurement Soc., IEEE IM/CS/SMC Joint Chapter of Bulgaria , 22-24 June 2004 , Varna, Bulgaria Language: English Abstract: The paper introduces the knowledge engineering (KE) approach for the modeling and the discovery of new knowledge in bioinformatics. This approach extends the machine learning approach with various rule extraction and other knowledge representation procedures. Examples of the KE approach, and especially of one of the recently developed techniques - evolving connectionist systems (ECOS), to challenging problems in bioinformatics are given, that include: DNA sequence analysis, microarray gene expression profiling, protein structure prediction, finding gene regulatory networks, medical prognostic systems, computational neurogenetic modeling. 16) Unordered tree mining with applications to phylogeny Shasha, D.; Wang, J.T.L.; Sen Zhang Courant Inst. of Math. Sci., New York Univ., NY, USA Conference: Proceedings. 20th International Conference on Data Engineering , Page: 708-19 Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xx+880 Pages Conference: Proceedings. 20th International Conference on Data Engineering , Sponsor: Microsoft Res., bea, IBM, MITRE, Sun Microsystems , 30 March-2 April 2004 , Boston, MA, USA Language: English Abstract: Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data, such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. We present a new FSM technique for finding patterns in rooted unordered labeled trees. The patterns of interest are cousin pairs in these trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|/sup 2/) time where |T| is the number of nodes in T. Experimental results on synthetic data and phylogenies show the scalability and effectiveness of the proposed technique. To demonstrate the usefulness of our approach, we discuss its applications to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. We also describe extensions of our algorithms for undirected acyclic graphs (or free trees). 17) LDC: enabling search by partial distance in a hyper-dimensional space Koudas, N.; Ooi, B.C.; Shen, H.T.; Tung, A.K.H. Shannon Lab., AT&T Labs Res., Basking Ridge, NJ, USA Conference: Proceedings. 20th International Conference on Data Engineering , Page: 6-17 Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xx+880 Pages Conference: Proceedings. 20th International Conference on Data Engineering , Sponsor: Microsoft Res., bea, IBM, MITRE, Sun Microsystems , 30 March-2 April 2004 , Boston, MA, USA Language: English Abstract: Recent advances in research fields like multimedia and bioinformatics have brought about a new generation of hyper-dimensional databases which can contain hundreds or even thousands of dimensions. Such hyper-dimensional databases pose significant problems to existing high-dimensional indexing techniques which have been developed for indexing databases with (commonly) less than a hundred dimensions. 
To support efficient querying and retrieval on hyper-dimensional databases, we propose a methodology called local digital coding (LDC) which can support k-nearest neighbors (KNN) queries on hyper-dimensional databases and yet co-exist with ubiquitous indices, such as B+-trees. LDC extracts a simple bitmap representation called digital code(DC) for each point in the database. Pruning during KNN search is performed by dynamically selecting only a subset of the bits from the DC based on which subsequent comparisons are performed. In doing so, expensive operations involved in computing L-norm distance functions between hyper-dimensional data can be avoided. Extensive experiments are conducted to show that our methodology offers significant performance advantages over other existing indexing methods on both real life and synthetic hyper-dimensional datasets. 18) Proceedings. 20th International Conference on Data Engineering Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xx+880 Pages Conference: Proceedings. 20th International Conference on Data Engineering , Sponsor: Microsoft Res., bea, IBM, MITRE, Sun Microsystems , 30 March-2 April 2004 , Boston, MA, USA Language: English Abstract: The following topics are dealt with: XML; query processing; tree data structures; database management systems; Internet; indexing; semi-structured data; data mining; streams; sensors; middleware; workflow; Web data management; security; data warehouses; OLAP; enterprise systems; scientific and biological databases; bioinformatics; and clustering. 19) Design and implementation of a computational grid for bioinformatics Chao-Tung Yang; Yu-Lun Kuo; Chuan-Lin Lai Dept. of Comput. Sci. & Inf. Eng., Tunghai Univ., Taichung, Taiwan Conference: Proceedings. 2004 IEEE International Conference on e-Technology, eCommerce and e-Service , Page: 448-51 Editor: Yuan, S.-T.; Liu, J. Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xxi+575 Pages Conference: Proceedings. 2004 IEEE International Conference on e-Technology, eCommerce and e-Service , Sponsor: IEEE Task Committee of e-Commerce, Fu-Jen Univ. of Taiwan, BIKMrdc of Fu-Jen Univ., Academia Sinica, Nat. Sci. Council of Taiwan, Ministry of Educ. of Taiwan, Information Syst. Frontiers, Microsoft, ChungHwa Data Mining Soc , 28-31 March 2004 , Taipei, Taiwan Language: English Abstract: The popular technologies, Internet computing and grid technologies promise to change the way we tackle complex problems. They enable large-scale aggregation and sharing of computational, data and other resources across institutional boundaries. And harnessing these new technologies effectively transforms scientific disciplines ranging from high-energy physics to the life sciences. The computational analysis of biological sequences is a kind of computation driven science. Cause the biology data growing quickly and these databases are heterogeneous. We can use the grid system sharing and integrating the heterogeneous biology database. As we know, bioinformatics tools can speed up analysis the large-scale sequence data, especially about sequence alignment and analysis. The FASTA is a tool for aligning multiple protein or nucleotide sequences. These two bioinformatics software, which we used is a distributed and parallel version. The software uses a message-passing library called MPI (message passing interface) and runs on distributed workstation clusters as well as on traditional parallel computers. 
A grid computing environment is proposed and constructed on multiple Linux PC clusters by using Globus Toolkit (GT) and SUN Grid Engine (SGE). The experimental results and performance of the bioinformatics tools on the grid system are also presented.

20) Aligning multiple sequences by genetic algorithm Li-fang Liu; Hong-wei Huo; Bao-shu Wang Sch. of Comput. Sci. & Technol., Xidian Univ., Xi'an, China Conference: 2004 International Conference on Communications, Circuits and Systems (IEEE Cat. No.04EX914) Part: Vol.2 , Page: 994-8 Vol.2 Publisher: IEEE , Piscataway, NJ, USA , 2004 , 1584 Pages Conference: 2004 International Conference on Communications, Circuits and Systems , Sponsor: Ministry of Educ. (MOE) of PR China, City Univ. of Hong Kong, K.C. Wong Educ. Found , 27-29 June 2004 , Chengdu, China Language: English Abstract: The paper presents a genetic algorithm for solving multiple sequence alignment in bioinformatics. The algorithm involves four different operators: one type of selection operator, two types of crossover operators, and one type of mutation operator; the mutation operator is realized by a dynamic programming method. Experimental results on benchmarks from BAliBASE show that the proposed algorithm is feasible for aligning equidistant protein sequences, and the quality of alignment is comparable to that obtained with ClustalX.

21) Algorithms for estimating information distance with application to bioinformatics and linguistics Kaitchenko, A. Dept. of Phys. & Comput., Wilfrid Laurier Univ., Waterloo, Ont., Canada Conference: Canadian Conference on Electrical and Computer Engineering 2004 (IEEE Cat. No.04CH37513) Part: Vol.4 , Page: 2255-8 Vol.4 Publisher: IEEE , Piscataway, NJ, USA , 2004 , 2908 Pages Conference: Canadian Conference on Electrical and Computer Engineering 2004 , Sponsor: Cisco Syst., General Elec., Ryerson Univ., AVFX Audio Visual, Bell Canada, Dofasco, Dye & Durham, Gennum Corp., IEEE Canada Found., Univ. of Toronto, Niagara College of Appl. Arts and Technol , 2-5 May 2004 , Niagara Falls, Ont., Canada Language: English Abstract: We review unnormalized and normalized information distances based on incomputable notions of Kolmogorov complexity and discuss how Kolmogorov complexity can be approximated by data compression algorithms. We argue that optimal algorithms for data compression with side information can be successfully used to approximate the normalized distance. Next, we discuss an alternative information distance, which is based on relative entropy rate (also known as Kullback-Leibler divergence), and compression-based algorithms for its estimation. We conjecture that in bioinformatics and computational linguistics this alternative distance is more relevant and important than the ones based on Kolmogorov complexity.
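The compression-based approximation of information distance reviewed in the record above can be reproduced in outline with the standard normalized compression distance formula; the sketch below uses zlib as a stand-in compressor and invented sequences:

```python
import zlib

def clen(data: bytes) -> int:
    """Compressed length in bytes; zlib stands in for a stronger compressor."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    a computable approximation of the normalized information distance."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Toy example: related sequences compress well together, so their NCD tends
# to be lower than that of compositionally unrelated ones.
seq1 = b"ATGGCGTACGTTAGCATGGCGTACGTTAGC" * 10
seq2 = b"ATGGCGTACGTTAGCATGGCGTACCTTAGC" * 10   # one substitution per repeat
seq3 = b"GGCATTACAGGTTCAAGGCCTTAATCGGAA" * 10   # unrelated sequence
print("related:  ", round(ncd(seq1, seq2), 3))
print("unrelated:", round(ncd(seq1, seq3), 3))
```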
22) Predicting the three-dimensional structures of proteins: combined alignment approach Jaehyun Sim; Seung-Yeon Kim; Jooyoung Lee; Ahrim Yoo Sch. of Comput. Sci., Korea Inst. for Adv. Study, Seoul, South Korea Journal of the Korean Physical Society Conference: J. Korean Phys. Soc. (South Korea) , vol.44, no.3, pt.1 , Page: 611-16 Publisher: Korean Phys. Soc , March 2004 Conference: 12th Thermal and Statistical Physics Workshop , 19-21 Aug. 2003 , Suanbo, Chungbuk, South Korea Language: English Abstract: Protein structure prediction is a great challenge in molecular biophysics and bioinformatics. Most approaches to structure prediction use known structure information from the Protein Data Bank (PDB). In these approaches, it is most crucial to find a homologous protein (template) from the PDB to a query sequence and to align the query sequence to the template sequence. We propose a profile-profile alignment method based on the cosine similarity criterion, and combine this with a sequence-profile alignment, the secondary structure prediction of the query protein, and the experimental secondary structure of the template protein. Our method, which we call combined alignment, provides good results for the 1107 query-template pairs of the SCOP database and the CASP5 target proteins. They show that combined alignment significantly improves the recognition of distant homology.

23) The role of computer science in undergraduate bioinformatics education Burhans, D.T.; Skuse, G.R. Dept. of Comput. Sci., Canisius Coll., Buffalo, NY, USA SIGCSE Bulletin Conference: SIGCSE Bull. (USA) , vol.36, no.1 , Page: 417-21 Publisher: ACM , March 2004 Conference: Thirty-Fifth SIGCSE Technical Symposium on Computer Science Education , Sponsor: ACM Special Interest Group on Comput. Sci. Educ , 3-7 March 2004 , Norfolk, VA, USA Language: English Abstract: The successful implementation of educational programs in bioinformatics presents many challenges. The interdisciplinary nature of bioinformatics requires close cooperation between computer scientists and biologists despite inescapable differences in the ways in which members of these professions think. It is clear that the development of quality curricula for bioinformatics must draw upon the expertise of both disciplines. In addition, biologists and computer scientists can benefit from opportunities to carry out interdisciplinary research with one another. This paper examines the role of computer science in undergraduate bioinformatics education from the perspectives of two bioinformatics program directors. Their respective programs exemplify two substantively different approaches to undergraduate education in bioinformatics due to the fact that they are at markedly different institutions. One institution is a large, technical university, offering both undergraduate and graduate degrees in bioinformatics, while the other is a small, Jesuit liberal arts college with an undergraduate program in bioinformatics. Despite these differences there is considerable overlap with respect to the role of computer science. This paper discusses the ways in which computer science has been integrated into these two undergraduate bioinformatics programs, compares alternative approaches, and presents some of the inherent challenges.

24) Challenges posed by adoption issues from a bioinformatics point of view Moise, D.L.; Wong, K.; Moise, G. Dept. of Comput. Sci., Alberta Univ., Edmonton, Alta., Canada Conference: "Fourth International Workshop on Adoption-Centric Software Engineering (ACSE 2004)" W6S Workshop - 26th International Conference on Software Engineering , Page: 75-9 Publisher: IEE , Stevenage, UK , 2004 , vi+85 Pages Conference: "Fourth International Workshop on Adoption-Centric Software Engineering (ACSE 2004)" W6S Workshop - 26th International Conference on Software Engineering , Sponsor: IEEE Comput. Soc., SIGSOFT, IEE , 25 May 2004 , Edinburgh, Scotland, UK Language: English Abstract: Developing interoperability models for data is a crucial factor for the adoption of research tools within industry. In this paper, we discuss efficient data interoperability models within a field where they are highly needed: the bioinformatics field.
We present the challenges that interoperability models for data must face within this field and we discuss some existing strategies built to address these challenges. The potential of a semi-structured data model based on XML is discussed. Also, a novel approach that enhances the capabilities of the data integration model by automatically identifying XML documents generated based on the same DTD is presented. Practices developed within this application domain can be used for the benefit of similar adoption issues in various other domains. 25) Software engineering challenges in bioinformatics Barker, J.; Thornton, J. Eur. Bioinformatics Inst., Cambridge, UK Conference: Proceedings. 26th International Conference on Software Engineering , Page: 12-15 Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xviii+786 Pages Conference: Proceedings. 26th International Conference on Software Engineering , Sponsor: IEE, Assoc. for Comput. Machinery Special Interest Group on Software Eng., IEEE Comput. Soc , 23-28 May 2004 , Edinburgh, UK Language: English Abstract: Data from biological research is proliferating rapidly and advanced data storage and analysis methods are required to manage it. We introduce the main sources of biological data available and outline some of the domain specific problems associated with automated analysis. We discuss two major areas in which we are likely experience software engineering challenges over the next ten years: data integration and presentation. 26) BLID: an application of logical information systems to bioinformatics Ferre, S.; King, R.D. Dept. of Comput. Sci., Wales Univ., Aberystwyth, UK Conference: Concept Lattices. Second International Conference on Formal Concept Analysis, ICFCA 2004. Proceedings (Lecture Notes in Artificial Intelligence Vol.2961) , Page: 47-54 Editor: Eklund, P. Publisher: Springer-Verlag , Berlin, Germany , 2004 , ix+409 Pages Conference: Concept Lattices. Second International Conference on Formal Concept Analysis, ICFCA 2004. Proceedings , 23-26 Feb. 2004 , Sydney, NSW, Australia Language: English Abstract: BLID (bio-logical intelligent database) is a bioinformatic system designed to help biologists extract new knowledge from raw genome data by providing high-level facilities for both data browsing and analysis. We describe BLID's novel data browsing system which is based on the idea of logical information systems. This enables combined querying and navigation of data in BLID (extracted from public bioinformatic repositories). The browsing language is a logic especially designed for bioinformatics. It currently includes sequence motifs, taxonomies, and macromolecule structures, and it is designed to be easily extensible, as it is composed of reusable components. Navigation is tightly combined with this logic, and assists users in browsing a genome through a form of human-computer dialog. 27) The automatic generation of programs for classification problems with grammatical swarm O'Neill, M.; Brabazon, A.; Adley, C. Biocomputing & Dev. Syst. Group, Univ. of Limerick, Ireland Conference: Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. 
No.04TH8753) Part: Vol.1 , Page: 104-10 Vol.1 Publisher: IEEE , Piscataway, NJ, USA , 2004 , xxx+2371 Pages Conference: Proceedings of the 2004 Congress on Evolutionary Computation , Sponsor: IEEE Neural Network Soc., Evolutionary Programming Soc., IEE , 19-23 June 2004 , Portland, OR, USA Language: English Abstract: This case study examines the application of grammatical swarm to classification problems, and illustrates the particle swarm algorithm's ability to specify the construction of programs. Each individual particle represents choices of program construction rules, where these rules are specified using a Backus-Naur Form grammar. Two problem instances are tackled, the first a mushroom classification problem, the second a bioinformatics problem that involves the detection of eukaryotic DNA promoter sequences. For the first problem we generate solutions that take the form of conditional statements in a C-like language subset, and for the second problem we generate simple regular expressions. The results demonstrate that it is possible to generate programs using the grammatical swarm technique with a performance similar to the grammatical evolution evolutionary automatic programming approach.

28) Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. No.04TH8753) Part: Vol.1 Publisher: IEEE , Piscataway, NJ, USA , 2004 , xxx+2371 Pages Conference: Proceedings of the 2004 Congress on Evolutionary Computation , Sponsor: IEEE Neural Network Soc., Evolutionary Programming Soc., IEE , 19-23 June 2004 , Portland, OR, USA Language: English Abstract: The following topics are discussed: evolutionary multiobjective optimization; evolutionary algorithms; combinatorial and numerical optimization; swarm intelligence; evolutionary computation and games; evolutionary computation in bioinformatics and computational biology; evolutionary design; evolutionary computing in the process industry; evolutionary computation in finance and economics; evolutionary scheduling; evolutionary design and evolvable hardware; evolutionary design automation; evolutionary computation in cryptology and computer security; learning and approximation in design optimization; and coevolution and collective behavior.

29) Construct a grid computing environment for bioinformatics Yu-Lun Kuo; Chao-Tung Yang; Chuan-Lin Lai; Tsai-Ming Tseng Dept. of Comput. Sci. & Inf. Eng., Tunghai Univ., Taichung, Taiwan Conference: Proceedings. 7th International Symposium on Parallel Architectures, Algorithms and Networks. I-SPAN'04 , Page: 339-44 Editor: Hsu, D.F.; Hiraki, K.; Shen, S.; Sudborough, H. Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xvi+645 Pages Conference: Proceedings. 7th International Symposium on Parallel Architectures, Algorithms and Networks. I-SPAN'04 , Sponsor: Univ. of Hong Kong , 10-12 May 2004 , Hong Kong, China Language: English Abstract: Internet computing and grid technologies promise to change the way we tackle complex problems. They will enable large-scale aggregation and sharing of computational, data and other resources across institutional boundaries. And harnessing these new technologies effectively will transform scientific disciplines ranging from high-energy physics to the life sciences. The computational analysis of biological sequences is a kind of computation-driven science. The biology data are growing quickly and these databases are heterogeneous.
a grid system can be used to share and integrate these heterogeneous biological databases. As we know, bioinformatics tools can speed up the analysis of large-scale sequence data, especially sequence alignment. FASTA is a tool for aligning multiple protein or nucleotide sequences; the version we used is distributed and parallel. The software uses a message-passing library called MPI (Message Passing Interface) and runs on distributed workstation clusters as well as on traditional parallel computers. A grid computing environment is proposed and constructed on multiple Linux PC clusters by using Globus Toolkit (GT) and SUN Grid Engine (SGE). The experimental results and performance of the bioinformatics tool used on the grid system are also presented in this paper. 30) Proceedings. 7th International Symposium on Parallel Architectures, Algorithms and Networks. I-SPAN'04 Editor: Hsu, D.F.; Hiraki, K.; Shen, S.; Sudborough, H. Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xvi+645 Pages Conference: Proceedings. 7th International Symposium on Parallel Architectures, Algorithms and Networks. I-SPAN'04 , Sponsor: Univ. of Hong Kong , 10-12 May 2004 , Hong Kong, China Language: English Abstract: The following topics are dealt with: routing; wireless networks; content distribution; parallel algorithms; interconnection networks; fault tolerance; graphs; load balancing; semantic Web; data distribution; communication performance; parallel architecture; Internet technology and applications; quality of service; optical networks; mobile computing; network security and management; and bioinformatics. 31) Experiences on adaptive grid scheduling of parameter sweep applications Huedo, E.; Montero, R.S.; Llorente, I.M. Lab. Computacion Avanzada, CSIC-INTA, Torrejon de Ardoz, Spain Conference: Proceedings. 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing , Page: 28-33 Publisher: IEEE Comput. Soc , Los Alamitos, CA, USA , 2004 , xiii+442 Pages Conference: Proceedings. 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing , 11-13 Feb. 2004 , Coruna, Spain Language: English Abstract: Grids offer a dramatic increase in the number of available compute and storage resources that can be delivered to applications. This new computational infrastructure provides a promising platform to execute loosely coupled, high-throughput parameter sweep applications. This kind of application arises naturally in many scientific and engineering fields like bioinformatics, computational fluid dynamics (CFD), particle physics, etc. The efficient execution and scheduling of parameter sweep applications is challenging because of the dynamic and heterogeneous nature of grids. We present a scheduling algorithm built on top of the GridWay framework that combines: (i) adaptive scheduling to reflect the dynamic grid characteristics; (ii) adaptive execution to migrate running jobs to better resources and provide fault tolerance; (iii) re-use of common files between tasks to reduce the file transfer overhead. The efficiency of the approach is demonstrated in the execution of a CFD application on a highly heterogeneous research testbed. 32) Asynchronous HMM with applications to speech recognition Garg, A.; Balakrishnan, S.; Vaithyanathan, S. Almaden Res. Center, San Jose, CA, USA Conference: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing Part: vol.1 , Page: I-1009-12 vol.1 Publisher: IEEE , Piscataway, NJ, USA , 2004 , 5 vol.
(cix+1045) Pages Conference: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing , 17-21 May 2004 , Montreal, Que., Canada Language: English Abstract: We develop a novel formalism for modeling speech signals which are irregularly or incompletely sampled. This situation can arise in real-world applications where the speech signal is being transmitted over an error-prone channel where parts of the signal can be dropped. Typical speech systems based on hidden Markov models cannot handle such data since HMMs rely on the assumption that observations are complete and made at regular intervals. We introduce the asynchronous HMM, a variant of the inhomogeneous HMM commonly used in bioinformatics, and show how it can be used to model irregularly or incompletely sampled data. A nested EM algorithm is presented in brief which can be used to learn the parameters of this asynchronous HMM. Evaluation on real-world speech data, which has been modified to simulate channel errors, shows that this model and its variants significantly outperform the standard HMM and methods based on data interpolation. 33) An asynchronous GALS interface with applications Smith, S.F. Electr. & Comput. Eng. Dept., Boise State Univ., USA Conference: 2004 IEEE Workshop on Microelectronics and Electron Devices (IEEE Cat. No.04EX810) , Page: 41-4 Publisher: IEEE , Piscataway, NJ, USA , 2004 , xii+136 Pages Conference: 2004 IEEE Workshop on Microelectronics and Electron Devices , 16 April 2004 , Boise, ID, USA Language: English Abstract: A low-latency asynchronous interface for use in globally-asynchronous locally-synchronous (GALS) integrated circuits is presented. The interface is compact and does not alter the local clocks of the interfaced local clock domains in any way (unlike many existing GALS interfaces). Two applications of the interface to GALS systems are shown. The first is a single-chip shared-memory multiprocessor for generic supercomputing use. The second is an application-specific coprocessor for hardware acceleration of the Smith-Waterman algorithm. This is a bioinformatics algorithm used for sequence alignment (similarity searching) between DNA or amino acid (protein) sequences and sequence databases such as the recently completed human genome database. 34) Computational Methods for SNPs and Haplotype Inference. DIMACS/RECOMB Satellite Workshop. Revised Papers. (Lecture Notes in Bioinformatics Vol.2983) Editor: Istrail, S.; Waterman, M.; Clark, A. Publisher: Springer-Verlag , Berlin, Germany , 2004 , ix+152 Pages Conference: Computational Methods for SNPs and Haplotype Inference. DIMACS/RECOMB Satellite Workshop. Revised Papers , 21-22 Nov. 2002 , Piscataway, NJ, USA Language: English Abstract: The conference focused on methods for SNP and haplotype analysis and their applications to disease associations. The ability to score large numbers of DNA variants (SNPs) in large samples of humans is rapidly accelerating, as is the demand to apply these data to tests of association with disease states. The problem suffers from excessive dimensionality, so any means of reducing the number of dimensions of the space of genotype classes in a biologically meaningful way would likely be of benefit. Linked SNPs are often statistically associated with one another (in "linkage disequilibrium"), and the number of distinct configurations of multiple tightly linked SNPs in a sample is often far lower than one would expect from independent sampling.
These joint configurations, or haplotypes, might be a more biologically meaningful unit, since they represent sets of SNPs that co-occur in a population. Recently there has been much excitement over the idea that such haplotypes occur as blocks across the genome, as these blocks suggest that fewer distinct SNPs need to be scored to capture the information about genotype identity. There is need for formal analysis of this dimension reduction problem, for formal treatment of the hierarchical structure of haplotypes, and for consideration of the utility of these approaches toward meeting the end goal of finding genetic variants associated with complex diseases. 35) IT service infrastructure for integrative systems biology Curcin, Vasa; Ghanem, Moustafa; Guo, Yike; Rowe, Anthony; He, Wayne; Pei, Hao; Qiang, Lu; Li, Yuanyuan Department of Computing Imperial College London, London SW7 2BZ, United Kingdom Conference: Proceedings - 2004 IEEE International Conference on Services Computing, SCC 2004 , Shanghai, China , 20040915-20040918 , (Sponsor: IEEE Computer Society, TSC-SC; IBM T.J. Watson Research Center; Shanghai Jiao Tong University (SJTU), China; University of Hong Kong, E-Business Technology Institute, China) Proceedings - 2004 IEEE International Conference on Services Computing, SCC 2004 Proceedings - 2004 IEEE International Conference on Services Computing, SCC 2004 2004. , 2004 Language: English Abstract: Despite the large number of software tools and hardware platforms aiming to solve the problems that bioinformatics is facing today, there is no platform solution that can scale up to its demands, in terms of both scope and sheer volume. DiscoveryNet scientific workflow system is here extended into a service-centric component architecture that brings together cross-domain applications through web and grid services and composes them as novel service offerings. Two case studies implemented on top of the platform, SARS analysis and microarray/metabonomics, are described. 36) Integrating text mining into distributed bioinformatics workflows: A Web services implementation Gaizauskas, Rob; Davis, Neil; Demetriou, George; Guo, Yikun; Roberts, Ian Department of Computer Science University of Sheffield, Sheffield, United Kingdom Conference: Proceedings - 2004 IEEE International Conference on Services Computing, SCC 2004 , Shanghai, China , 20040915-20040918 , (Sponsor: IEEE Computer Society, TSC-SC; IBM T.J. Watson Research Center; Shanghai Jiao Tong University (SJTU), China; University of Hong Kong, E-Business Technology Institute, China) Proceedings - 2004 IEEE International Conference on Services Computing, SCC 2004 Proceedings - 2004 IEEE International Conference on Services Computing, SCC 2004 2004. , 2004 Language: English Abstract: Workflows are useful ways to support scientific researchers in carrying out repetitive analytical tasks on digital information. Web services can provide a useful implementation mechanism for workflows, particularly when they are distributed, i.e., where some of the data or processing resources are remote from the scientist initiating the workflow. While many scientific workflows primarily involve operations on structured or numerical data, all interpretation of results is done in the context of related work in the field, as reflected in the scientific literature. Text mining technology can assist in automatically building helpful pathways into the relevant literature as part of a workflow in order to support the scientific discovery process. 
In this paper we demonstrate how these three technologies - workflows, text mining, and web services - can be fruitfully combined in order to support bioinformatics researchers investigating the genetic basis of two physiological disorders - Graves' disease and Williams syndrome. 37) Bioinformatics and Systems Biology, rapidly evolving tools for interpreting plant response to global change Blanchard, Jeffrey L. Conference: Linking Functional Genomics with Physiology for Global Change , Denver, CO, United States , 20031105-20031105 Field Crops Research v 90 n 1 Nov 8 2004. p 117-131 , 2004 Language: English Abstract: Global change is impacting the evolutionary trajectory of our planet's biota. In spite of the widely appreciated magnitude of this process, we still have a limited ability to estimate biological effects of increased atmospheric CO2 or of climate change. Many new molecular techniques, including microarrays and metabolic profiling, are emerging that allow the direct observation of the vast repertoire of an organism's cellular processes in laboratory and ecological settings. The challenge now is to integrate these large data sets containing spatial and temporal components into models that enable us to explain how organisms respond to increased atmospheric CO2 and eventually to develop models that accurately predict their evolutionary trajectory. In response, the field of bioinformatics is expanding to better facilitate information transfer between laboratory experiments and mathematical modeling in support of the emerging field of Systems Biology. © 2004 Elsevier B.V. All rights reserved. 38) Integration of genomics approach with traditional breeding towards improving abiotic stress adaptation: Drought and aluminum toxicity as case studies Ishitani, Manabu; Rao, Idupulapati; Wenzl, Peter; Beebe, Steve; Tohme, Joe Conference: Linking Functional Genomics with Physiology for Global Change , Denver, CO, United States , 20031105-20031105 Field Crops Research v 90 n 1 Nov 8 2004. p 35-45 , 2004 Language: English Abstract: Traditional breeding efforts are expected to be greatly enhanced through collaborative approaches incorporating functional, comparative and structural genomics. Potential benefits of combining genomic tools with traditional breeding have been a source of widespread interest and resulted in numerous efforts to achieve the desired synergy among disciplines. The International Center for Tropical Agriculture (CIAT) is applying functional genomics by focusing on characterizing genetic diversity for crop improvement in common bean (Phaseolus vulgaris L.), cassava (Manihot esculenta Crantz), tropical grasses, and upland rice (Oryza sativa L.). This article reviews how CIAT combines genomic approaches, plant breeding, and physiology to understand and exploit underlying genetic mechanisms of abiotic stress adaptation for crop improvement. The overall CIAT strategy combines both bottom-up (gene to phenotype) and top-down (phenotype to gene) approaches by using gene pools as sources for breeding tools. The strategy offers broad benefits by combining not only in-house crop knowledge, but publicly available knowledge from well-studied model plants such as arabidopsis [Arabidopsis thaliana (L.) Heynh.]. Successfully applying functional genomics in trait gene discovery requires diverse genetic resources, crop phenotyping, genomics tools integrated with bioinformatics and proof of gene function in planta (proof of concept).
In applying genomic approaches to crop improvement, two major gaps remain. The first gap lies in understanding the desired phenotypic trait of crops in the field and enhancing that knowledge through genomics. The second gap concerns mechanisms for applying genomic information to obtain improved crop phenotypes. A further challenge is to effectively combine different genomic approaches, integrating information to maximize crop improvement efforts. Research at CIAT on drought tolerance in common bean and aluminum resistance in tropical forage grasses (Brachiaria spp.) is used to illustrate the opportunities and constraints in breeding for adaptation to abiotic stresses. 39) From sequence to structure using PF2: Improving methods for protein folding prediction Hussain, Saleem Conference: Proceedings 17th IEEE Symposium on Computer-Based Medical Systems, CBMS 2004 , Bethesda, MD, United States , 20040624-20040625 , (Sponsor: IEEE Computer Society; Texas Tech University College of Engineering) Proceedings of the IEEE Symposium on Computer-Based Medical Systems Proceedings 17th IEEE Symposium on Computer-Based Medical Systems, CBMS 2004 v 17 2004. , 2004 Language: English Abstract: Projects dependent on proteomic data are challenged not by the lack of methods to analyze this information, but by the lack of means to capture and manage the data. A few primary players in the bioinformatics realm are promoting the use of selected standardized technologies to access biological data. Many organizations exposing bioinformatics tools, however, do not have the resources required for utilizing these technologies. In order to provide interfaces for non-standardized bioinformatics tools, open-source projects have led to the development of hundreds of software libraries. These tools lack architectural unity, making it difficult to script bioinformatics research projects, such as protein structure prediction algorithms, which involve the use of multiple tools in varying order and number. As a solution, we have focused on building a software model, named the Protein Folding Prediction Framework (PF2), which provides a unifying method for the addition and usage of connection modules to bioinformatics databases exposed via web-based tools, software suites, or e-mail services. The framework provides mechanisms that allow users to create and add new connections without supplementary code as well as to introduce entirely new logical scenarios. In addition, PF2 offers a convenient interface, a multi-threaded execution-engine, and a built-in visualization suite to provide the bioinformatics community with an end-to-end solution for performing complex genomic and proteomic inquiries. 40) 26th international conference on software engineering: ICSE 2004 Anon (Ed.) Conference: Proceedings - 26th International Conference on Software Engineering, ICSE 2004 , Edinburgh, United Kingdom , 20040523-20040528 , (Sponsor: Institution of Electrical Engineers, IEE; British Computer Society, BCS; Association for Computing Machinery, ACM SIGSOFT; Association for Computing Machinery, ACM SIGPLAN; IEEE Computer Society Technical Council on Software Engineering) Proceedings - International Conference on Software Engineering Proceedings - 26th International Conference on Software Engineering, ICSE 2004 v 26 2004. , 2004 Language: English Abstract: The proceedings contain 122 papers from the 26th International Conference on Software Engineering: ICSE 2004.
The topics discussed include: Controlling the complexity of software designs; software engineering challenges in bioinformatics; adding high availability and autonomic behavior to Web services; grid small and large: distributed systems and global communities; a model driven approach for software systems reliability; component-based self-adaptability in peer-to-peer architectures; and one more step in the direction of modularized integration concerns. 41) Software engineering challenges in bioinformatics Barker, Jonathan; Thornton, Janet European Bioinformatics Institute Wellcome Trust Genome Campus, Cambridge CB10 1SD, United Kingdom Conference: Proceedings - 26th International Conference on Software Engineering, ICSE 2004 , Edinburgh, United Kingdom , 20040523-20040528 , (Sponsor: Institution of Electrical Engineers, IEE; British Computer Society, BCS; Association for Computing Machinery, ACM SIGSOFT; Association for Computing Machinery, ACM SIGPLAN; IEEE Computer Society Technical Council on Software Engineering) Proceedings - International Conference on Software Engineering Proceedings - 26th International Conference on Software Engineering, ICSE 2004 v 26 2004. , 2004 Language: English Abstract: Data from biological research is proliferating rapidly and advanced data storage and analysis methods are required to manage it. We introduce the main sources of biological data available and outline some of the domain-specific problems associated with automated analysis. We discuss two major areas in which we are likely to experience software engineering challenges over the next ten years: data integration and presentation. 42) hMiDas and hMitChip: New opportunities in mitochondrial bioinformatics and genomic medicine Alesci, Salvatore; Su, Yan A.; Chrousos, George P. Conference: Proceedings 17th IEEE Symposium on Computer-Based Medical Systems, CBMS 2004 , Bethesda, MD, United States , 20040624-20040625 , (Sponsor: IEEE Computer Society; Texas Tech University College of Engineering) Proceedings of the IEEE Symposium on Computer-Based Medical Systems Proceedings 17th IEEE Symposium on Computer-Based Medical Systems, CBMS 2004 v 17 2004. , 2004 Language: English Abstract: We developed a human mitochondria-focused gene database (hMiDas) and customized cDNA microarray chip (hMitChip) to help biomedical research in mitochondrial genomics. The current version of hMiDas contains 1,242 gene entries (including mtDNA genes, nuclear genes related to mitochondria structure and functions, predicted loci and experimental genes), organized in 15 categories and 24 subcategories. The database interface allows keyword-based searches as well as advanced field and/or case-sensitive searches. Each gene record includes 19 fields, mostly hyperlinked to the corresponding source. Moreover, for each gene, the user is given the option to run a literature search using PubMed, and a gene/protein homology search using BLAST and FASTA. The hMitChip was constructed using hMiDas as a reference. Currently, it contains a selection of 501 mitochondria-related nuclear genes and 192 control elements, all spotted in duplicate on glass slides. Slide quality was checked by microarray hybridization with 50 µg of Cy3-labeled sample cDNA and Cy5-labeled comparison cDNA, followed by array scan and image analysis. The hMitChip was tested in vitro using RNA extracted from cancer cell lines. Gene expression changes detected by hMitChip were confirmed by quantitative real-time RT-PCR analysis.
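The hMiDas entry above (record 42) describes a gene database whose interface supports keyword-based searches as well as field-restricted and case-sensitive queries. As a rough illustration of that kind of lookup (a minimal sketch only; the field names and sample records below are hypothetical and not taken from hMiDas), the idea can be expressed in a few lines of Python:

records = [
    {"symbol": "MT-CO1", "category": "mtDNA genes", "description": "cytochrome c oxidase subunit I"},
    {"symbol": "POLG", "category": "nuclear genes", "description": "mitochondrial DNA polymerase gamma"},
]

def search(records, keyword, field=None, case_sensitive=False):
    # Return records whose chosen field (or any field, if none is given) contains the keyword.
    norm = (lambda s: s) if case_sensitive else (lambda s: s.lower())
    key = norm(keyword)
    hits = []
    for rec in records:
        values = [rec[field]] if field else rec.values()
        if any(key in norm(str(v)) for v in values):
            hits.append(rec)
    return hits

print(search(records, "polymerase"))              # keyword search over all fields
print(search(records, "mtDNA", field="category")) # field-restricted search

A real gene record would of course carry many more fields (the abstract mentions 19, mostly hyperlinked), but the query logic scales in the same way.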
43) DWDM-RAM: A data intensive grid service architecture enabled by dynamic optical networks Lavian, T.; Mambretti, J.; Cutrell, D.; Cohen, H.; Merrill, S.; Durairaj, R.; Daspit, P.; Monga, I.; Naiksatam, S.; Figueira, S.; Gutierrez, D.; Hoang, D.; Travostino, F. Conference: 2004 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2004 , Chicago, IL, United States , 20040419-20040422 , (Sponsor: Institute of Electrical and Electronics Engineers, IEEE) 2004 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2004 2004 IEEE International Symposium on Cluster Computing and the Grid, CCGrid 2004 2004. , 2004 Language: English Abstract: Next generation applications and architectures (for example, Grids) are driving radical changes in the nature of traffic, service models, technology, and cost, creating opportunities for an advanced communications infrastructure to tackle next generation data services. To take advantage of these trends and opportunities, research communities are creating new architectures, such as the Open Grid Service Architecture (OGSA), which are being implemented in new prototype advanced infrastructures. The DWDM-RAM project, funded by DARPA, is actively addressing the challenges of next generation applications. DWDM-RAM is an architecture for data-intensive services enabled by next generation dynamic optical networks. It develops and demonstrates a novel architecture for new data communication services, within the OGSA context, that allows for managing extremely large sets of distributed data. Novel features move network services beyond notions of the network as a managed resource, for example, by including capabilities for dynamic on-demand provisioning and advance scheduling. DWDM-RAM encapsulates optical network resources (Lambdas, lightpaths) into a Grid Service and integrates their management within the Open Grid Service Architecture. Migration to emerging standards such as the WS-Resource Framework (WS-RF) should be straightforward. In initial applications, DWDM-RAM targets specific data-intensive services such as rapid, massive data transfers used by large-scale eScience applications, including: high-energy physics, geophysics, life science, bioinformatics, genomics, medical morphometry, tomography, microscopy imaging, astronomical and astrophysical imaging, complex modeling, and visualization. 44) Soft Semantic Web services agent Wang, Haibin; Zhang, Yan-Qing; Sunderraman, Rajshekhar Department of Computer Science Georgia State University, Atlanta, GA 30302, United States Conference: NAFIPS 2004 - Annual Meeting of the North American Fuzzy Information Processing Society: Fuzzy Sets in the Heart of the Canadian Rockies , Banff, Alta, Canada , 20040627-20040630 , (Sponsor: IEEE Systems, Man, and Cybernetics Society; North American Fuzzy Information Processing Society, NAFIPS; Institute of Electrical and Electronics Engineers, IEEE) Annual Conference of the North American Fuzzy Information Processing Society - NAFIPS NAFIPS 2004 - Annual Meeting of the North American Fuzzy Information Processing Society: Fuzzy Sets in the Heart of the Canadian Rockies v 1 2004. , 2004 Language: English Abstract: Web services play an active role in business integration and other fields such as bioinformatics. Current Web services technologies such as WSDL, UDDI, BPEL4WS and BSML are not semantic-oriented. Several proposals have been made to develop Semantic Web services to facilitate the discovery of relevant Web services.
In our vision, as Semantic Web services technologies mature, there will be many public or private Semantic Web services Registries based on specific ontologies. These Registries may provide many similar Web services, so providing high quality of service (QoS) Semantic Web services for a specific domain using these Registries will be a challenging task. Different domains have different QoS requirements, and it is impractical to use classical mathematical modeling methods to evaluate the QoS of Semantic Web services. In this paper, we propose a framework called Soft Semantic Web services Agent (SSWSA) for providing high QoS Semantic Web services using soft computing methodology. We use a fuzzy neural network with a GA learning algorithm as our case study. Simulation results show that the SSWSA can handle fuzzy and uncertain QoS metrics effectively. 45) Asynchronous HMM with applications to speech recognition Garg, Ashutosh; Balakrishnan, Sreeram; Vaithyanathan, Shivakumar IBM Almaden Research Center, San Jose, CA 95120, United States Conference: Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing , Montreal, Que, Canada , 20040517-20040521 , (Sponsor: Institute of Electrical and Electronics Engineers) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing v 1 2004. , 2004 Language: English Abstract: We develop a novel formalism for modeling speech signals which are irregularly or incompletely sampled. This situation can arise in real-world applications where the speech signal is being transmitted over an error-prone channel where parts of the signal can be dropped. Typical speech systems based on Hidden Markov Models cannot handle such data since HMMs rely on the assumption that observations are complete and made at regular intervals. In this paper we introduce the asynchronous HMM, a variant of the inhomogeneous HMM commonly used in bioinformatics, and show how it can be used to model irregularly or incompletely sampled data. A nested EM algorithm is presented in brief which can be used to learn the parameters of this asynchronous HMM. Evaluation on real-world speech data that has been modified to simulate channel errors shows that this model and its variants significantly outperform the standard HMM and methods based on data interpolation. 46) Toward large-scale modeling of the microbial cell for computer simulation Ishii, Nobuyoshi; Robert, Martin; Nakayama, Yoichi; Kanai, Akio; Tomita, Masaru Conference: Highlights from the ECB11: Building Bridges Between Bioscience , Basel, Switzerland , 20030801-20030801 Journal of Biotechnology v 113 n 1-3 Sep 30 2004. p 281-294 , 2004 Language: English Abstract: In the post-genomic era, the large-scale, systematic, and functional analysis of all cellular components using transcriptomics, proteomics, and metabolomics, together with bioinformatics for the analysis of the massive amount of data generated by these "omics" methods, is the focus of intensive research activities. As a consequence of these developments, systems biology, whose goal is to comprehend the organism as a complex system arising from interactions between its multiple elements, becomes a more tangible objective.
Mathematical modeling of microorganisms and subsequent computer simulations are effective tools for systems biology, which will lead to a better understanding of the microbial cell and will have immense ramifications for biological, medical, environmental sciences, and the pharmaceutical industry. In this review, we describe various types of mathematical models (structured, unstructured, static, dynamic, etc.) of microorganisms that have been in use for a while, and others that are emerging. Several biochemical/cellular simulation platforms to manipulate such models are summarized and the E-Cell system developed in our laboratory is introduced. Finally, our strategy for building a "whole cell metabolism model", including the experimental approach, is presented. © 2004 Elsevier B.V. All rights reserved. 47) Bringing planning to autonomic applications with ABLE Srivastava, Biplav; Bigus, Joseph P.; Schlosnagle, Donald A. IBM India Research Laboratory IIT Delhi, Hauz Khas, New Delhi 110016, India Conference: Proceedings - International Conference on Autonomic Computing , New York, NY, United States , 20040517-20040518 , (Sponsor: IEEE Computer Society; IBM; Sun Microsystems; National Science Foundation) Proceedings - International Conference on Autonomic Computing Proceedings International Conference on Autonomic Computing 2004. , 2004 Language: English Abstract: Planning has received tremendous interest as a research area within AI over the last three decades but it has not been applied commercially as widely as its other AI counterparts like learning or data mining. The reasons are many: the utility of planning in business applications was unclear, the planners used to work best in small domains and there was no general-purpose planning and execution infrastructure widely available. Much has changed lately. Compelling applications have emerged, e.g., computing systems have become so complex that the IT industry recognizes the necessity of deliberative methods to make these systems self-configuring, self-healing, self-optimizing and self-protecting. Planning has seen an upsurge in the last decade with new planners that are orders of magnitude faster than before and are able to scale this performance to complex domains, e.g., those with metric and temporal constraints. However, planning and execution infrastructure is still tightly tied to a specific application which can have its own idiosyncrasies. In this paper, we fill the infrastructural gap by providing a domain-independent planning and execution environment that is implemented in the ABLE agent building toolkit, and demonstrate its ability to solve practical business applications. The planning-enabled ABLE is publicly available and is being used to solve a variety of planning applications in IBM including the self-management/autonomic computing scenarios. 48) Design and implementation of a computational Grid for bioinformatics Yang, Chao-Tung; Kuo, Yu-Lun; Lai, Chuan-Lin High-Perf.
Computing Laboratory Department of Computer Science Tunghai University, Taichung, 407, Taiwan Conference: Proceedings - 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service, EEE 2004 , Taipei, Taiwan , 20040328-20040331 , (Sponsor: IEEE Task Committee on E-Commerce; Fu-Jen University of Taiwan; BIKMrdc of Fu-Jen University; Academia Sinica; National Science Council of Taiwan) Proceedings - 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service, EEE 2004 Proceedings - 2004 IEEE International Conference on e-Technology, e-Commerce and e-Service, EEE 2004 2004. , 2004 Language: English Abstract: Popular technologies such as Internet computing and Grid technologies promise to change the way we tackle complex problems. They will enable large-scale aggregation and sharing of computational, data and other resources across institutional boundaries. Harnessing these new technologies effectively will transform scientific disciplines ranging from high-energy physics to the life sciences. The computational analysis of biological sequences is a kind of computation-driven science. Because biological data are growing quickly and these databases are heterogeneous, a grid system can be used to share and integrate the heterogeneous biological databases. As we know, bioinformatics tools can speed up the analysis of large-scale sequence data, especially sequence alignment and analysis. FASTA is a tool for aligning multiple protein or nucleotide sequences. The two bioinformatics tools we used are distributed and parallel versions. The software uses a message-passing library called MPI (Message Passing Interface) and runs on distributed workstation clusters as well as on traditional parallel computers. A grid computing environment is proposed and constructed on multiple Linux PC Clusters by using Globus Toolkit (GT) and SUN Grid Engine (SGE). The experimental results and performance of the bioinformatics tools used on the grid system are also presented in this paper. 49) Proceedings - Fourth IEEE symposium on bioinformatics and bioengineering, BIBE 2004 Anon (Ed.) Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: The proceedings contain 73 papers from the Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004. The topics discussed include: techniques for enhancing computation of DNA curvature molecules; towards automating an interventional radiological procedure; reducing the computational load of energy evaluations for protein folding; segmentation of the sylvian fissure in brain MR images; biomedical ontologies in post-genomic information systems; identifying significant genes from microarray data; good spaced seeds for homology search; and estimating seed sensitivity on homogeneous alignments.
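Records 29 and 48 above describe running a distributed, MPI-based FASTA on Linux PC clusters managed by the Globus Toolkit and SUN Grid Engine. The sketch below is not that setup; it only illustrates, under the assumption that mpi4py is available, the basic master/worker pattern such tools rely on: the root process scatters database chunks, each rank scores the query against its own chunk with a toy scoring function, and the best hits are gathered back (run with something like "mpiexec -n 4 python align_mpi.py"; the file name and sequences are made up).

from mpi4py import MPI

def score(query, subject):
    # Toy scoring: count matching positions; a real tool would run
    # Smith-Waterman or a FASTA/BLAST-style heuristic here.
    return sum(q == s for q, s in zip(query, subject))

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

query = "ACGTACGT"
if rank == 0:
    database = ["ACGTTCGT", "TTTTACGT", "ACGAACGA", "GGGTACGT",
                "ACGTACGA", "CCCCCCCC", "ACGTACGT", "TACGTACG"]
    chunks = [database[i::size] for i in range(size)]  # one chunk per rank
else:
    chunks = None

local_chunk = comm.scatter(chunks, root=0)
local_best = max(((score(query, s), s) for s in local_chunk), default=(0, ""))
results = comm.gather(local_best, root=0)

if rank == 0:
    print("best hit:", max(results))

The same splitting idea carries over whether the chunks are dispatched by MPI ranks, SGE array jobs, or Globus-managed nodes; only the scoring routine and the data volumes change.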
50) SemanticObjects and biomedical informatics Kitazawa, Atsushi; Yoshimura, Masayoshi Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: The use of SemanticObjects (SO) in biomedical informatics is discussed. SO is a virtual database supporting object relational data model accommodating nested relational data. It is observed that advances in biomedical informatics will lead to a new generation of database, knowledge base, software engineering, security, user interface and operating system technologies. Bioinformatics requires intelligent algorithm be developed to solve complex biomedical problems and also new tools to assist physicians and biologists to manage and utilize the large amount of information available. 51) Automating the determination of open reading frames in genomic sequences using the web service techniques - A case study using SARS Coronavirus Chang, Paul Hsueh-Min; Soo, Von-Wun; Chen, Tai-Yu; Lai, Wei-Shen; Su, Shiun-Cheng; Huang, Yu-Ling Department of Computer Science National Tsing-Hua University, Hsinchu, 300, Taiwan Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: As more and more new genome sequences were reported nowadays, analyzing the functions of a new genome sequence becomes more and more desirable and compelling. However, the determination of the functions of a genomic sequence is not an easy task. Even with several bioinformatic tools, the task is still a labor-intensive one. This is because human experts have to intervene during the processing of using these tools. For efficiency, immediacy and reduction of human labor, a system of automating the analyzing process is proposed. We take the automated determination of Open Reading Frames of a genomic sequence as the domain tasks that involve using a number of computational tools and interpreting the results returned from the tools. A service-oriented approach is taken, in which analyzing tools are wrapped as Web services and described in Semantic Web languages including OWL and OWL-S. The SARS Coronavirus genomic sequence is taken as a test case for our approaches. We are in the process of building an agent-based system for automating the tasks, in which an intelligent agent is responsible for understanding purposes of the Web services by parsing the service descriptions, and carrying out the interpretation tasks according to a workflow. 52) Efficient filtration of sequence similarity search through singular value decomposition Aghili, S. 
Alireza; Sahin, Ozgur D.; Agrawal, Divyakant; El Abbadi, Amr Department of Computer Science Univ. of California Santa Barbara, Santa Barbara, CA 93106, United States Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: Similarity search in textual databases and bioinformatics has received substantial attention in the past decade. Numerous filtration and indexing techniques have been proposed to reduce the curse of dimensionality. This paper proposes a novel approach to map the problem of whole-genome sequence similarity search into an approximate vector comparison in the well-established multidimensional vector space. We propose the application of the Singular Value Decomposition (SVD) dimensionality reduction technique as a pre-processing filtration step to effectively reduce the search space and the running time of the search operation. Our empirical results on a Prokaryote and a Eukaryote DNA contig dataset, demonstrate effective filtration to prune non-relevant portions of the database with up to 2.3 times faster running time compared with q-gram approach. SVD filtration may easily be integrated as a pre-processing step for any of the well-known sequence search heuristics as BLAST, QUASAR and FastA. We analyze the precision of applying SVD filtration as a transformation-based dimensionality reduction technique, and finally discuss the imposed trade-offs. 53) An IDC-based algorithm for efficient homology filtration with guaranteed seriate coverage Lee, Hsiao Ping; Shih, Ching Hua; Tsai, Yin Te; Sheu, Tzu Fang; Tang, Chuan Yi Department of Computer Science National Tsing-Hua University, Hsinchu, Taiwan Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: The homology search within genomic databases is a fundamental and crucial work for biological knowledge discovery. With exponentially increasing sizes and accesses of databases, the filtration approach, which filters impossible homology candidates to reduce the time for homology verification, becomes more important in bioinformatics. Most of known gram-based filtration approaches, like QUASAR, in the literature have limited error tolerance and would conduct potentially higher false-positives. In this paper, we present an IDC-based lossless filtration algorithm with guaranteed seriate coverage and error tolerance for efficient homology discovery. 
In our method, the original work of homology extraction with requested seriate coverage and error levels is transformed to a longest increasing subsequence problem with range constraints, and an efficient algorithm is proposed for the problem in this paper. The experimental results show that the method significantly outperforms QUASAR. On some comparable sensitivity levels, our homology filter would make the discovery more than three orders of magnitude faster than that QUASAR does, and more than four orders faster than the exhaustive search. 54) ARMEDA II: Supporting genomic medicine through the integration of medical and genetic databases Garcia-Remesal, M.; Maojo, V.; Billhardt, H.; Crespo, J.; Alonso-Calvo, R.; Perez-Rey, D.; Martin, F.; Sousa, A. Biomedical Informatics Group Polytechnical University of Madrid, Madrid, Spain Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: In this paper we present ARMEDA II, a project designed to integrate distributed heterogeneous medical and genetic databases in support of genomic medicine. In this project, we have followed a "virtual repository" or VR approach. Although VRs are entities that do not contain any data, but metadata, they give users the perception of being working with local repositories that integrate data from different and remote sources. Our approach is based on two basic operators employed to connect new databases to the system: mapping and unification. The mapping process produces what is called the "virtual conceptual schema" of the newly created VR while the unification process provides tools to create an integrated virtual schema for at least two pre-existing VRs. We tested the current implementation of ARMEDA II using two tumor databases, one containing information from a hospital and the other containing genetic data associated to the tumor samples. The performance of the system was also evaluated using a pre-created set of 30 queries. For all queries the test yielded promising results since the system successfully retrieved the correct information. The ARMEDA II project is the current version of an ongoing project developed in the framework of an European Commission funded project. 55) European support to biomedical informatics development: In pursue of genomic medicine Sanz, Ferran; Diaz, Carlos; Martin-Sanchez, Fernando; Bonis, Julio Biomed. Informatics Research Group Munic. Inst. of Medical Research IMIM, Barcelona, Spain Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. 
, 2004 Language: English Abstract: Analyses of the relationships and synergy between Bioinformatics (BI) and Medical Informatics (MI) show that there is a great potential for synergy between both disciplines with a view on continuity and individualisation of healthcare, but that a collaborative effort is needed to bridge the current gap between them. Biomedical Informatics (BMI) is the emerging discipline that aims to put these two worlds together so that the discovery and creation of novel diagnostic and therapeutic methods are fostered. The INFOBIOMED network is a new approach that aims to set a durable structure for this collaborative strategy in Europe, mobilising the critical mass of resources necessary for enabling the consolidation of BMI as a crucial scientific discipline for future healthcare. The specific objectives of INFOBIOMED aim at enabling systematic progress in clinical and genetic data interoperability and integration and advancing the exchange and interfacing of methods, tools and technologies used in both MI and BI. Moreover, it intends to enable pilot applications in particular fields that demonstrate the benefits of a synergetic approach in BMI, as well as to create a robust framework for education, training and mobility of involved researchers in BMI for the creation of a solid European BMI research capacity. 56) Biomedical ontologies in post-genomic information systems Perez-Rey, D.; Maojo, V.; Garcia-Remesal, M.; Alonso-Calvo, R. Artificial Intelligence Laboratory School of Computer Science Polytechnic University of Madrid, Boadilla del Monte, 28660 Madrid, Spain Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: After the completion of the Human Genome Project, a new, post-genomic era is beginning to analyze and interpret the huge amount of genomic information. Information methods and techniques from areas such as database integration, information retrieval, knowledge discovery in databases (KDD) and decision support systems (DSS) are needed. These systems should take into account idiosyncratic differences between the two interacting fields, medicine and biology. The corresponding disciplines, medical informatics (MI) and bioinformatics (BI), should also interact, and a common point is needed to support this communication. Biomedical ontologies can be used to enhance biomedical information systems, providing a knowledge-sharing framework. However, ontology tools are still in their infancy, and standards, services, automatic management tools, etc. are needed to properly apply this technology. Nevertheless, ontologies are just the technical framework; the most important issues are the content and the use policy.
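Record 56 above argues that biomedical ontologies provide the knowledge-sharing framework needed between medical informatics and bioinformatics. As a toy illustration of the underlying mechanism (the concepts and is-a links below are invented for the example and do not come from any real biomedical ontology), a minimal subsumption check in Python:

is_a = {
    "mitochondrial disorder": "metabolic disease",
    "metabolic disease": "disease",
    "disease": "biological concept",
    "gene": "biological concept",
}

def ancestors(concept):
    # Walk the is-a chain upward and collect every broader concept.
    found = []
    while concept in is_a:
        concept = is_a[concept]
        found.append(concept)
    return found

def subsumed_by(narrow, broad):
    # True if 'narrow' is the same concept as 'broad' or one of its descendants.
    return narrow == broad or broad in ancestors(narrow)

print(ancestors("mitochondrial disorder"))              # ['metabolic disease', 'disease', 'biological concept']
print(subsumed_by("mitochondrial disorder", "disease")) # True

Real ontologies add multiple parents, synonyms, and formally defined relations, which is exactly where the standards, services and management tools called for in the abstract come in.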
57) GeneWebEx: Gene annotation web extraction, aggregation, and updating from web-based biomolecular databanks Masseroli, Marco; Stella, Andrea; Meani, Natalia; Alcalay, Myriam; Pinciroli, Francesco Bioengineering Department Politecnico di Milano, I-20133 Milano, Italy Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: Numerous genomic annotations are currently stored in different web-accessible databanks that scientists need to mine with user-defined queries and in a batch mode to orderly integrate the diverse mined data in suitable user-customizable working environments. Unfortunately, to date, most accessible databanks can be interrogated only for a single gene or protein at a time and generally the data retrieved are available in HTML page format only. We developed GeneWebEx to effectively mine data of interest in different HTML pages of web-based databanks, and organize extracted data for further analyses. GeneWebEx utilizes user-defined templates to identify data to extract, and aggregates and structures them in a database designed to allocate the various extractions from distinct biomolecular databanks. Moreover, a template-based module enables automatic updating of extracted data. Validations performed on GeneWebEx allowed us to efficiently gather relevant annotations from various sources, and comprehensively query them to highlight significant biological characteristics. 58) Design of specie-specific primers for virus diagnosis in plants with PCR Rocha, K.; Medeiros, C.; Monteiro, M.; Goncalves, L.; Marinho, P. Univ. Federal do Rio Grande do Norte DCA-CT-UFRN, CEP. 59. 072-970, Natal, RN, Brazil Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: We propose specialist software to diagnose viral disease in plants. Our strategy is to align nucleotide sequences of plant viruses to discover specie-specific regions of genes of the viral genomes, so as to design a primer. The program designs oligonucleotide primers used for polymerase chain reaction (PCR), a very cheap diagnostic technique. The user can specify (or use default) constraints for primer and amplified product lengths, as well as percentage of G+C, absolute or relative melting temperatures, and primer 3' nucleotides. The program screens candidate primer sequences with displayed user-specifiable parameters in order to help minimize nonspecific priming and primer secondary structure.
We tested this tool by designing two specific primers which were used to amplify known viral species and then to perform a virus diagnosis. 59) Using distributed computing platform to solve high computing and huge data processing problems in bioinformatics Chen, Shih-Nung; Tsai, Jeffrey J.P.; Huang, Chih-Wei; Chen, Rong-Ming; Lin, Raymond C.K. Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: The problems in bioinformatics involve massive computing and massive data. In recent years, distributed computing has been gaining recognition, so tasks that originally required high computing power no longer have to rely solely on supercomputers. Distributed computing using off-the-shelf PCs with high-speed networks can offer low-cost and high-performance computing power to handle such tasks. Therefore, the purpose of this paper is to implement a complete distributed computing platform based on peer-to-peer file sharing technology. The platform integrates scheduling, load balancing, file sharing, maintenance of data integrity, a user-friendly interface, and other functions. The platform can assist bioinformaticians with massive computing and massive data problems. Moreover, the platform is easier to use, more reliable, and more helpful than others for researchers conducting bioinformatics research. 60) An effective approach for constructing the phylogenetic tree on a grid-based architecture Liu, Damon Shing-Min; Wu, Che-Hao Department of Computer Science National Chung Cheng University, Chiayi, 621, Taiwan Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: In biological research, scientists often need to use information about species to infer the evolutionary relationships among them. The evolutionary relationships are generally represented by a labeled binary tree, called the evolutionary tree (or phylogenetic tree). Reconstructing the evolutionary tree is a major research problem in biology, often known as the phylogeny problem. The difficulty of this problem is that the number of possible evolutionary trees is very large. As the number of species increases, exhaustive enumeration of all possible relationships is not feasible. The quantitative nature of species relationships therefore requires the development of more rigorous methods for tree construction. The phylogeny problem is computationally intensive, and thus it is suitable for a distributed computing environment.
Grid Computing (or Computational Grid) is a new concept for integrating CPU power, storage and other resources via the Internet in order to obtain overall computing power. Nowadays, many bioinformaticists are developing BioGrid technology in order to solve the challenges that need intensive computing in biology. In this paper, we design and develop a Grid-based system, and propose an efficient method based on the concept of quartets for solving the phylogeny problem on this architecture. 61) Towards Ubiquitous Bio-Information Computing: Data protocols, middleware, and web services for heterogeneous biological information integration and retrieval Hong, Pengyu; Zhong, Sheng; Wong, Wing H. Conference: Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 , Taichung, Taiwan , 20040519-20040521 , (Sponsor: IEEE Computer Society; IEEE Neural Networks Society; Taichung Healthcare and Management University, Taiwan; Ministry of Education, Taiwan; National Sciences Council, Taiwan; Institute for Information Industry, Taiwan) Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 Proceedings - Fourth IEEE Symposium on Bioinformatics and Bioengineering, BIBE 2004 2004. , 2004 Language: English Abstract: Biological information computing is rapidly advancing from homogeneous data computation to large-scale heterogeneous data computation. However, the development of data specification protocols, software middleware, and Web services, which support large-scale heterogeneous data exchange, integration, and computation, generally falls behind data expansion rates and bioinformatics demands. The Ubiquitous Bio-Information Computing (UBIC2) project aims to disseminate software packages to assist the development of heterogeneous bio-information computing applications that are interoperable and may run in a distributed manner. UBIC2 lays down the software architecture for integrating, retrieving, and manipulating heterogeneous biological information so that data behave as if they were stored in a unified database. The UBIC2 programming library implements the software architecture and provides application programming interfaces (APIs) to facilitate the development of heterogeneous bio-information computing applications. To achieve interoperability, UBIC2 Web services use XML-based data communication, which allows distributed applications to consume heterogeneous bio-information regardless of platform. The documents and software package of UBIC2 are available at http://www.ubic2.org. End of search report.