The Xdom 2.0 documentation
- What is xdom ?
- Running xdom
- The Project menu
- The query menu
- The view menu
- The popup menus
- Citing the xdom2 program
What is xdom ?
Xdom is an X Window application that visualizes the output of the mkdom2 script, helping you to carry out a systematic analysis of the modular organisation of any set of protein sequences.
With this program, you can run different queries to retrieve all or part of your proteins. The tool displays each protein as a set of boxes, each box representing a domain. The boxes are clickable and open a popup menu, from which you can start a new query or display the whole family the domain belongs to.
The protein name is also clickable, which gives you the opportunity to execute still other queries, or to display the protein sequence.
You may also print part or all of your query.
Running xdom
You may run xdom with either of the following commands:
    xdom &
    xdom project.prj &
The second form starts xdom and opens the project file given on the command line.
The Project menu
For full functionality, xdom needs 3 files, with the extensions srs, xdom and fasta:
- The xdom file describes the domain decomposition of the initial protein set. It is the only required file.
- The srs file describes the domain families. Clicking the box that represents a domain displays the family this domain belongs to. If the srs file is missing, this functionality is unavailable.
- The fasta file is the protein data set. Clicking a protein name displays the corresponding protein entry of this file. If the fasta file is missing, this functionality is unavailable.
As it is tedious to open 3 files every time xdom is launched, the names of those 3 files are saved in a file with the extension .prj. This file is called a project file.
Create a New Project
Use this option to create a new project. Only the field labelled Xdom File is required; however, it is better to fill in the Project Name field as well, since the application windows are titled with this name. You must also fill in the other two fields to benefit from all the functionalities.
Save a Project
Use this option to save the project (i.e. the name and up to 3 data files, as previously discussed) in a new project file.
Open a project
Use this option to open a previously saved project. It is also possible to start the application with the project file specified on the command line, in which case the project is opened automatically. Besides, the mkdom2pp.pl script creates a project file, which can be opened by the
Open a new window
Use this option to open a new window, which will use the same data as the first window. This is convenient to display simultaneously different parts of the file.
The query menu
You may run queries in order to display all or part of the protein set.
During an Xdom session, you will run several queries one after another. The queries are stored in a stack, so that you may redo them as necessary during the session.
Use this option to display all the proteins. When a project is opened, xdom executes this query. However, it is sometimes convenient to execute this request again during a work session.
Use this option to run a query with a protein name as parameter. There are several ways to select the protein to query for:
- If you know its name, type its name inside the
- If you do not know its exact name, type only some characters in the Filter field, then press the Filter button: only the matching proteins are then displayed in the scrolled list, so that it is much easier to select the protein you are interested in.
- You may also just scroll the list, then click on your protein when it appears in the list. This can be a rather tedious process, however, when there are many proteins in the set.
You may also use one of the radio buttons labelled View option to choose between several options:
- display only the selected protein.
- display the selected protein and all the proteins which have the same architecture as the selected protein, i.e. the same composition in domains. This query may return several proteins.
- display the selected protein and all the proteins which share a domain with the selected protein. This query may result in still more proteins returned.
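The three view options can be illustrated with a minimal Python sketch. It is not taken from the xdom source: the data layout (a dictionary mapping protein names to ordered lists of domain family numbers), the function name `select` and the mode names are all hypothetical, and "same architecture" is assumed here to mean the same ordered list of domains.

```python
from typing import Dict, List

# Hypothetical data set: protein name -> ordered list of domain family numbers.
PROTEINS: Dict[str, List[int]] = {
    "P1": [76, 34, 76, 100, 76],
    "P2": [76, 34, 76, 100, 76],
    "P3": [34, 12],
    "P4": [5],
}


def select(proteins: Dict[str, List[int]], name: str, mode: str) -> List[str]:
    """Return the protein names displayed for one of the three view options."""
    selected = proteins[name]
    if mode == "only":
        # Display only the selected protein.
        return [name]
    if mode == "same_architecture":
        # Assumption: same architecture means the same ordered domain list.
        return [p for p, doms in proteins.items() if doms == selected]
    if mode == "shared_domain":
        # Any protein having at least one domain in common with the selection.
        wanted = set(selected)
        return [p for p, doms in proteins.items() if wanted & set(doms)]
    raise ValueError(f"unknown mode: {mode}")


print(select(PROTEINS, "P1", "same_architecture"))
print(select(PROTEINS, "P1", "shared_domain"))
```

Each option returns a superset of the previous one, which matches the remark that the last query may return still more proteins.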
Use this option to run a query with one or several domain names as parameters. You may fill in 3 fields to build the query:
AND: write here the domain identifiers that must all be present in the retrieved proteins. If you type several domain identifiers (separated by a space), the identifiers are connected by a logical AND.
OR: write here the domain identifiers of which at least one must be present in the retrieved proteins. If you type several domain identifiers (separated by a space), the identifiers are connected by a logical OR.
NOT: write here the domain identifiers you do not want to be present in the returned proteins.
If you fill in the AND field together with the OR and NOT fields, the returned proteins will show:
- all the domains from the AND field,
- or at least one of the domains from the OR field,
- but no domain from the NOT field.
For example, the query illustrated by the figure will return proteins composed of domains 2, 5 and 18 (all three domains), or of domain 10, but without the domains 138, 136 and 135.
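The way the three fields combine can be sketched in Python. This is an illustration only, not the xdom implementation: the function name `matches` and its parameters are hypothetical, and the handling of empty fields (an empty AND or OR field contributes no match) is an assumption that the documentation does not settle.

```python
from typing import Iterable


def matches(domains: Iterable[int],
            and_ids: Iterable[int] = (),
            or_ids: Iterable[int] = (),
            not_ids: Iterable[int] = ()) -> bool:
    """Tell whether a protein with the given domains satisfies the query."""
    present = set(domains)
    # Any domain listed in the NOT field rejects the protein outright.
    if present & set(not_ids):
        return False
    and_set = set(and_ids)
    # AND field: every listed domain must be present.
    # Assumption: an empty AND field does not match by itself.
    has_all = bool(and_set) and and_set <= present
    # OR field: at least one listed domain must be present.
    has_any = bool(present & set(or_ids))
    return has_all or has_any


# The example query from the text: AND = 2 5 18, OR = 10, NOT = 138 136 135.
query = dict(and_ids=(2, 5, 18), or_ids=(10,), not_ids=(138, 136, 135))
print(matches([2, 5, 18, 7], **query))   # all three AND domains present
print(matches([10, 138], **query))       # domain 10 present, but 138 is excluded
```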
An alternative way of selecting a domain
A click with the right mouse button when the cursor is on a domain box opens a popup menu with the option of displaying all proteins which include that domain (domain 34, for example). This is equivalent to choosing the Query/Select Domains option and filling in only the AND field, querying only domain number 34.
The view menu
With this option, you can select a protein to find among the displayed proteins: you may type the name of the protein, select its name in the list (this list shows only the names of the displayed proteins), or type part of its name and then click the FILTER button to restrict the listed protein names. When found, the protein name is displayed in blue (next figure).
If you have just found a protein and lost it, because you scrolled the window or because you executed another request, you may find it again with this option. Should the protein not be returned by the current request, an error message is produced. If no protein was selected using the Find option, the Find again option is inactive.
This option deactivates the Find again button, thus resetting the Find protein action.
Graphical or textual representations
A protein domain is always represented as a clickable rectangular box; however, the box may be labelled in different ways:
- Write the number of the domain family inside the box.
- Draw a black and white motif inside the box. There is one motif per family, so we are limited to 59 different representations.
- Draw a two-color motif inside the box, using one (motif, foreground color, background color) combination per family, with 16 different colors; we are thus limited to 59 x 15 x 16 = 14160 different representations.
- Draw a two-color motif inside the box, using one (motif, foreground color, background color) combination per family, with 32 different colors; we are thus limited to 59 x 31 x 32 = 58528 different representations.
- Draw a two-color motif inside the box, using one (motif, foreground color, background color) combination per family, with 64 different colors; we are thus limited to 59 x 63 x 64 = 237888 different representations.
The last representation should in theory be the best one; however, it is sometimes difficult to distinguish between 2 neighbouring colors when there are as many as 64 color levels. This could lead to interpretation errors, so it is sometimes convenient to switch between these representations.
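The limits quoted above follow a single pattern: 59 motifs multiplied by the number of color pairs. The short Python check below reproduces the three figures. Reading the (colors - 1) factor as "foreground and background must differ" is an assumption, since the documentation gives only the products; the names `MOTIFS` and `representations` are hypothetical.

```python
MOTIFS = 59  # number of distinct motifs, as stated in the text


def representations(colors: int) -> int:
    """Number of (motif, foreground, background) combinations.

    Assumption: the foreground and background colors differ, which gives
    colors * (colors - 1) ordered color pairs per motif.
    """
    return MOTIFS * (colors - 1) * colors


for colors in (16, 32, 64):
    print(colors, representations(colors))
```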
An alternative way of selecting a color mode
The color mode may be selected from the command line, as follows:
    xdom --color 64
Sort by alphabetical order
Use this option to alphabetically sort the displayed proteins, using their names as a sort key.
Sort by architecture
Use this option to sort the displayed proteins, using their architecture as a sort key. This is a very convenient option, as it puts together the proteins which share the same architecture.
What is an architecture ?
The architecture is defined as the domain composition of the protein. For example, if a protein is composed of domains 76, 34, 76, 100, 76, the architecture is the
The sort algorithm
For sorting the proteins by architecture, the following algorithm is executed:
- For each protein, compute a key, which is the architecture as defined above with any repetition removed: in the example above, the key is 34-76-100. Please note that many proteins
- Sort the keys, from the lowest family number to the highest. In case of equality, the keys with many families are considered first. The family ID is related to the number of domains present in the family. Thus, this sort will produce the highest families first, and the keys with many families first (they are considered the most significant ones from a biological point of view).
- For each key, split the set of proteins into different subsets, using the architecture (ie the
When the displayed proteins are sorted by architecture, one can switch between these two options:
simplified output: for each architecture, display only one protein, which may be considered as representing the architecture.
complete output: display each protein, still sorted by architecture.
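The sort described above can be sketched in Python. This is one possible reading, not the xdom code, and several points are assumptions: the key is taken to be the sorted set of distinct family numbers (which gives 34-76-100 for the example), "equality" is taken to mean that one key is a prefix of the other, the architecture is taken to be the ordered domain list, and ties are broken by protein name. All function names are hypothetical.

```python
from functools import cmp_to_key
from typing import Dict, List, Tuple


def architecture_key(domains: List[int]) -> Tuple[int, ...]:
    """Distinct family numbers in ascending order, e.g. (34, 76, 100)."""
    return tuple(sorted(set(domains)))


def compare_keys(a: Tuple[int, ...], b: Tuple[int, ...]) -> int:
    """Lowest family number first; on a shared prefix, the longer key first."""
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return len(b) - len(a)


def sort_by_architecture(proteins: Dict[str, List[int]]) -> List[str]:
    """Complete output: every protein, grouped by key, then by architecture."""
    def compare(p: str, q: str) -> int:
        c = compare_keys(architecture_key(proteins[p]),
                         architecture_key(proteins[q]))
        if c:
            return c
        ap, aq = tuple(proteins[p]), tuple(proteins[q])
        if ap != aq:
            return -1 if ap < aq else 1
        return (p > q) - (p < q)  # assumption: ties broken by name
    return sorted(proteins, key=cmp_to_key(compare))


def simplified(proteins: Dict[str, List[int]]) -> List[str]:
    """Simplified output: the first protein found for each architecture."""
    seen, result = set(), []
    for name in sort_by_architecture(proteins):
        arch = tuple(proteins[name])
        if arch not in seen:
            seen.add(arch)
            result.append(name)
    return result


# Hypothetical data set: protein name -> ordered list of family numbers.
data = {
    "A": [76, 34, 76, 100, 76],
    "B": [34, 76],
    "C": [34, 76, 100],
    "D": [5, 200],
    "E": [76, 34, 76, 100, 76],
}
print(sort_by_architecture(data))
print(simplified(data))
```

In this sketch, proteins A, C and E share the key 34-76-100 but fall into two architecture subsets, so the simplified output keeps one representative of each.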
The popup menus
You may call several functionalities by clicking with the left or the right mouse button when the pointer is located on one of two sorts of areas:
- Some protein name
- Some domain box
The protein popup menus
This menu is displayed when you click a protein name with the right button.
Only ...: Use this option to display only the protein whose name you are clicking.
Proteins sharing architecture: Use this option to display the proteins which share the same architecture as the protein whose name you are clicking. Generally more than 1 protein is displayed by this option.
Proteins sharing a domain: Use this option to display the proteins which share at least one domain with the protein whose name you are clicking. This option generally displays still more proteins than the previous one.
Visualisation of the protein sequence: Use this option to display the protein sequence in a popup window. This option is also executed when you click the protein name with the left button.
The domain popup menus
This menu is displayed when you click a domain box with the right button.
Proteins with domain...: Use this option to display all the proteins which contain the domain you are clicking.
Visualization of the family: Use this option to display the family the clicked domain belongs to. This option is also executed when you click the domain box with the left button.
Citing the xdom2 program
Should you use this program for a publication, please cite the following reference: Gouzy J., Eugene P., Greene E.A., Kahn D., Corpet F. (1997). XDOM, a graphical tool to analyse domain arrangements in protein families. Comp. Appl. BioSci. Release 2 of xdom is a complete rewrite of the software, but the main functionalities, as described in this paper, were kept unchanged.
The authors of the xdom2 program
Release 2 of xdom was written by:
- Glawdys Zielinski
- Yoann Beausse
- Emmanuel Courcelle
at L.I.P.M. (I.N.R.A.-C.N.R.S.), in the research group managed by Daniel Kahn. This work was partially funded by Genopole.