VBASE2 - the integrative germ-line V gene database

VBASE2 - Documentation

TOPICS

1. Outline
2. Help
3. Frequently Asked Questions (FAQ)
4. Links
5. References

! If you don't find the information you are looking for, please send an email to
info@vbase2.org.

1. Outline

The Service of VBASE2
The Generation of VBASE2
The Philosophy of VBASE2

The Service of VBASE2

VBASE2 is an integrative database of germ-line V genes from the immunoglobulin loci of human and mouse. It presents V gene sequences extracted from the EMBL nucleotide sequence database and Ensembl together with links to the respective source sequences. Based on the properties of the source sequences, V genes are classified into 3 different classes:

class 1	genomic and rearranged evidence
class 2	genomic evidence only
class 3	rearranged evidence only

This allows careful sequence quality validation by the user.
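The classification rule in the table above can be sketched in a few lines of Python. This is only an illustration: the function name `vbase2_class` and its two boolean parameters are hypothetical and not part of VBASE2, whose generation procedure decides what counts as genomic or rearranged evidence.

```python
def vbase2_class(has_genomic: bool, has_rearranged: bool) -> int:
    """Return the VBASE2 class (1, 2 or 3) for the given kinds of evidence.

    Hypothetical helper for illustration only; it mirrors the class table
    and nothing else.
    """
    if has_genomic and has_rearranged:
        return 1  # class 1: genomic and rearranged evidence
    if has_genomic:
        return 2  # class 2: genomic evidence only
    if has_rearranged:
        return 3  # class 3: rearranged evidence only
    # The table defines no class for a sequence without any evidence.
    raise ValueError("at least one kind of evidence is required")


if __name__ == "__main__":
    for genomic, rearranged in [(True, True), (True, False), (False, True)]:
        print(genomic, rearranged, vbase2_class(genomic, rearranged))
```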

References to other immunological databases (KABAT, IMGT/LIGM and VBASE) are given to provide all public annotation data for each V gene.

The VBASE2 database can be accessed through either the Direct Query interface or the DNAPLOT Query interface. Direct Query lets you enter sequence IDs and names (Field 1), choose species, locus, V gene family and class (Field 2), or search for sequences with a 100% match (Field 3). In the DNAPLOT Query, the sequences given by the user are aligned with DNAPLOT against the VBASE2 database. The DNAPLOT program provides V gene nucleotide sequence alignments that follow the IMGT V gene unique numbering.

The Quick Search can be used either as a Direct Query, to search for sequence IDs and V gene names, or as a DNAPLOT Query for up to 5 sequences.

The new Fab Analysis allows you to align Fab, scFab, scAb or scFv sequences with DNAPLOT against the VBASE2 database; both the heavy and the light chain are analysed.

The Generation of VBASE2

The VBASE2 dataset is generated in an automatic process based on a BLAST search of V genes against the source nucleotide databases (Ensembl and EMBL-Bank, including Whole Genome Shotgun (WGS) and High Throughput Genomic (HTG) sequences). The sequences of all relevant BLAST hits are aligned against master sequences, compared and sorted with the DNAPLOT program. V(D)J rearrangements and RSS elements are automatically detected.

The resulting germ-line V gene sequences are assigned to V gene families; the V gene family nomenclature in VBASE2 refers to the gene nomenclature of HUGO (human) and MGI (mouse).

Furthermore, the V gene sequences are compared to the VBASE, KABAT and IMGT/LIGM databases and to a set of germ-line V gene sequences. V gene trivial names, assigned by their discoverers, are added where known.

The Philosophy of VBASE2

VBASE2 continues the approach of the VBASE sequence database in aiming to present germ-line sequences only and to sort all V(D)J rearrangements by their germ-line V genes. However, VBASE2 is not manually annotated and is not limited to human V genes. It combines entries from several databases to offer integrated access to V gene sequences and their annotation. The sequence quality evaluation within VBASE2 is based solely on the available sequence information.

2. Help

Used Sequence Formats (FASTA and RAW)
Example Sequences

Used Sequence Formats (FASTA and RAW)

FASTA Format

According to NCBI 'a sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column'. An example is shown below:

> musIGHV057 294 bp 
CAGGTCCAACTGCAGCAGCCTGGGGCTGAGCTTGTGAAGCCTGGGGCTTCAGTGAAGCTG
TCCTGCAAGGCTTCTGGCTACACCTTCACCAGCTACTGGATGCACTGGGTGAAGCAGAGG
CCTGGACGAGGCCTTGAGTGGATTGGAAGGATTGATCCTAATAGTGGTGGTACTAAGTAC
AATGAGAAGTTCAAGAGCAAGGCCACACTGACTGTAGACAAACCCTCCAGCACAGCCTAC
ATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCGGTCTATTATTGTGCAAGA
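
As an illustration of the format, the following minimal Python sketch splits FASTA text into (description, sequence) pairs. The function name `parse_fasta` is hypothetical, and the sketch makes no claim about how VBASE2 itself reads input; it simply follows the NCBI description quoted above.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (description, sequence) pairs.

    Hypothetical helper for illustration: a line starting with '>' opens a
    new record, and all following non-empty lines are joined into its
    sequence.
    """
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()  # drop '>' and surrounding spaces
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before a '>' description line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    demo = "> demo entry\nACGTACGT\nTTGA\n"
    for description, sequence in parse_fasta(demo):
        print(description, len(sequence))
```

Applied to the example above, the function would return one record whose description is `musIGHV057 294 bp` and whose sequence is the five data lines joined into a single string.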

RAW Format

A single sequence in RAW format includes one or more lines of sequence data without any description. An example is shown below:

CAGGTCCAACTGCAGCAGCCTGGGGCTGAGCTTGTGAAGCCTGGGGCTTCAGTGA
AGCTGTCCTGCAAGGCTTCTGGCTACACCTTCACCAGCTACTGGATGCACTG
GGTGAAGCAGAGGCCTGGACGAGGCCTTGAGTGGATTGGAAGGATTGATCCTAATAGTGGTGGTACTAAGTACAATGAGAAGTTCAAGAGCAAGGCCACACTGACTGTAGACAAACCCTCCAGCACAGCCTACATGCAGCTCAGCAGCCTGACATCTGAGGACTCTGCGGTCTATTATTGTGCAAGA
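
Because RAW input may be spread over lines of any length, it can be convenient to collapse it into one string before further use. The Python sketch below does this under stated assumptions: the function name `normalize_raw` is hypothetical, and the accepted alphabet (A, C, G, T and N) is an assumption made for this example, not a documented VBASE2 rule.

```python
import re

# Assumed alphabet for this sketch only; VBASE2 may accept other symbols.
ALLOWED = set("ACGTN")


def normalize_raw(text: str) -> str:
    """Join RAW sequence lines into one upper-case string.

    Hypothetical helper: removes all whitespace (including line breaks)
    and rejects characters outside the assumed nucleotide alphabet, such
    as a FASTA description line.
    """
    sequence = re.sub(r"\s+", "", text).upper()
    if not sequence:
        raise ValueError("empty sequence")
    unexpected = set(sequence) - ALLOWED
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return sequence


if __name__ == "__main__":
    print(normalize_raw("caggtccaac\nTGCAGCAGCC\n  tggggctgag\n"))
```

Applied to the three-line example above, the function would return the same nucleotides as one uninterrupted string.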

Example Sequences

Single Fab / scFab / scAb / scFv Sequences

These sequences contain both the heavy and the light chain. An example is shown below:

> Example
CAGGTGCAGCTGCAGCAGTGGGGGGGAGGCGTGGTCCAGCCTGGGAGGTCCCTGAGACTCTCCTGTGCAGCCTCTGGATTCACCTTCAGTAGCTATGCTATGCACTGGGTCCGCCAGGCTCCAGGCAAGGGGCTGGAGTGGGTGGCAGTTATATCATATGATGGAAGCAATAAATACTACGCAGACTCCGTGAAGGG
CCGATTCACCATCTCCAGAGACAATTCCAAGAACACGCTGTATCTGCAAATGAACAGCCTGAGAGCTGAGGACACGGCTGTGTATTACTGTGCGAGAGATCGTTACGGTATGGACGTCTGGGGCCAAGGGACCACGGTCACCGTCTCCTCAGGGAGTGCATCCGCCCCAAAGCTTGAAGAAGGTGAATTTTCAGAAG
CACGCGTACTGCCTGTGCTGACTCAGCCCCCCTCAGCGTCTGGGACCCCCGGGCAGAGGGTCACCATCTCTTGTTCTGGAAGCAGCTCCAACATCGGAAGTAATACTGTAAACTGGTACCAGCAGCTCCCAGGAACGGCCCCCAAACTCCTCATCTATAGTAATAATCAGCGGCCCTCAGCGGTCCCTGACCGATTC
TCTGGCTCCAAGTCTGGCACCTCAGCCTCCCTGGCCATCAGTGGGCTCCGGTCCGAGGATGAGGGTGATTATTACTGTGCAGCATGGGATGACAGCCTGAATGGTGTGGTATTCGGCGGAGGGACCAAGCTGACCGTCCTAG

! Test this sequence with Fab Analysis

3. Frequently Asked Questions (FAQ)

Direct Query
DNAPLOT Query
V gene data
VBASE2 DAS Server (temporarily out of service)
Download
Internet Browser

Direct Query

Why does VBASE2 say 'no entries found' although the sequence I pasted is held in the database?

There may be several reasons why your sequence was not found:
A. Your sequence is longer than the VBASE2 entry. Please restrict your sequence to FR1-FR3 or use the DNAPLOT query.
B. The sequence format is not recognized. Please convert it to a plain sequence without line breaks or to FASTA format, or use the DNAPLOT query.
C. There are one or more mismatches in the alignment of your sequence with the VBASE2 sequence. Please use the DNAPLOT query to find the corresponding VBASE2 sequence.

I have entered a V gene name in Field 1 and I have chosen 'human' in Field 2. Why do I retrieve a murine sequence as a result?

Query Fields 1 - 3 must be used for separate queries; they cannot be combined. The query is always performed with the values from the field whose query button you clicked.

DNAPLOT Query

Why do I get a useless DNAPLOT result without any sequence in it?

There may be several reasons why the DNAPLOT program has not recognized your sequence:
A. DNAPLOT may not 'find' your V gene when it has a long 5'-extension (for example, when you use the genomic sequence starting with exon1 and intron). Please shorten your sequence at the 5'-end and try again.
B. There is no V gene in your sequence - please try another one!

V gene data

What does 'Unpublished' mean as a source reference for a genomic sequence?

The genomic sequence of the heavy chain locus of the 129/Sv mouse strain is currently being determined at the GBF in the lab of Helmut Bloecker. The sequence has not yet been published, but it will soon be released in EMBL-Bank/GenBank/DDBJ. As the sequence coverage in the assembly is 8 - 10 fold, the sequence quality is very high.

The last few nucleotides of a VBASE2 V gene are missing, although they can be found in the source reference. Why is that?

This is a limitation of the automatic procedure. As V genes vary in length, the procedure may miss some nucleotides at the end of an exceptionally long V gene. Every effort is being made to overcome this limitation in the next version of the generation procedure.

Why are there more IMGT/LIGM references for a V gene than EMBL-Bank source references?

The source references from EMBL-Bank are selected by the VBASE2 generation procedure. As this procedure is completely automated, it needs to be very strict and requires the proper recognition of J elements and RSS sites. This limits the number of EMBL-Bank source references, whereas all IMGT/LIGM sequences with a 100% match to the V gene sequence are shown.

What does family 'na' mean?

'na' means 'not assigned'. By definition, variable gene sequences that share a sequence identity of at least 80% belong to the same V gene family. The VBASE2 generation procedure aligns each V gene against a set of well-established, manually edited family master (consensus) sequences. Some V genes differ so substantially from the master sequences that they cannot be aligned properly and therefore cannot be assigned to a V gene family. The annotation 'na' thus indicates that the V gene is unusual. Many 'na' entries are pseudogenes or orphans.
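
The 80% identity criterion can be made concrete with a small Python sketch. It is not the VBASE2 procedure, which relies on DNAPLOT alignments against the family master sequences. The function names `percent_identity` and `same_family`, the requirement that both inputs are already aligned to equal length, and the choice to skip gap columns ('-') are all assumptions made for this example.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Hypothetical helper: columns containing a gap ('-') in either sequence
    are skipped, which is an assumption of this sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # ignore gap columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def same_family(aligned_a: str, aligned_b: str, threshold: float = 80.0) -> bool:
    """Apply the 'at least 80% identity' family criterion."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    master = "ACGTACGTAC"
    candidate = "ACGTACGTAA"  # differs from master at the last position
    print(percent_identity(master, candidate), same_family(master, candidate))
```

A V gene that cannot be aligned properly to any master sequence has no meaningful identity value of this kind, which corresponds to the 'na' case described above.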

Why are there different nomenclatures for mouse and human V gene families?

The V gene nomenclature used in the VBASE2 database refers to the respective recommendations of the genome nomenclature committees HUGO (human) and MGI (mouse).

VBASE2 DAS Server (temporarily out of service)

How can I show the VBASE2 entries within the Ensembl Genome Browser?

You need to attach the human and mouse DAS sources from the DAS server at http://www.dnaplot.com/das by performing the following steps:
Go to the Ensembl website and choose a species and chromosomal position, e.g. human chromosome 14, 105500000-105900000. Scroll to the DetailedView frame. Open the pull-down menu 'DAS Sources' and select 'Manage sources ...'. Click on 'Add DSN' and then on 'more' next to the field 'domain'. Type 'www.dnaplot.com/das' into the domain field and click on 'DSN list'. Select 'vbase2_human' to show the human VBASE2 V genes in your browser, or 'vbase2_mouse' to show the mouse dataset. Click on 'next', 'next' and 'finish'. After you reload the page, the Ensembl site shows the VBASE2 V gene entries within the selected region.

Download

It seems that some VBASE2 IDs are missing in the VBASE2 V gene sequence file I downloaded. Why is that?

There is no serial order of IDs. The sole purpose of the VBASE2 ID is to provide a unique and stable identifier for a unique V gene sequence within the database.
A VBASE2 ID (example: "musIGHV057") consists of three parts. The first two parts give the species (lower case) and the locus (upper case) of the V gene, in the example: "musIGHV". The third part is a 3-digit number (in the example: "057"). This number was selected randomly and has no systematic background. However, some incidental concordance between the order of genes on the chromosome and the ID numbers might occur.
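
The three-part structure described above can be illustrated with a short Python sketch that splits an ID into its parts. The names `Vbase2Id` and `parse_vbase2_id` and the regular expression are hypothetical; they encode only the description given here (lower-case species, upper-case locus, 3-digit number) and are not an official VBASE2 specification.

```python
import re
from typing import NamedTuple


class Vbase2Id(NamedTuple):
    species: str  # lower-case part, e.g. "mus"
    locus: str    # upper-case part, e.g. "IGHV"
    number: str   # 3-digit part, kept as text to preserve leading zeros


# Assumed pattern derived from the description in the text above.
_ID_PATTERN = re.compile(r"([a-z]+)([A-Z]+)(\d{3})")


def parse_vbase2_id(identifier: str) -> Vbase2Id:
    """Split an ID such as 'musIGHV057' into species, locus and number."""
    match = _ID_PATTERN.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a recognisable VBASE2 ID: {identifier!r}")
    return Vbase2Id(*match.groups())


if __name__ == "__main__":
    print(parse_vbase2_id("musIGHV057"))
```

The number is kept as a string because it is only an identifier; as noted above, it carries no ordering information.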

Internet Browser

Which internet browser should I use?

All PHP pages are W3C-compliant and should work in all browsers. Nevertheless, they were optimized for the Firefox browser.

4. Links

Primary data resources

Other immunological databases

Tools

DAS Server (for use with Ensembl) (temporarily out of service)

The VBASE2 DAS server is located at http://www.dnaplot.com/das/

5. References

If you publish results obtained using VBASE2, please cite:
Retter I, Althaus HH, Münch R, Müller W: VBASE2, an integrative V gene database. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D671-4.
If you publish results obtained using DNAPLOT, please cite:
Mollova S, Retter I, Müller W: Visualising the immune repertoire. BMC Systems Biology 2007, 1(Suppl 1):P30.

Other References

Retter I: Generation of a dynamic germ-line V gene database and in-silico characterisation of the immunoglobulin heavy chain locus of the mouse. Ph.D. Thesis, 2006
Retter I, Chevillard C et al.: Sequence and Characterization of the Ig Heavy Chain Constant and Partial Variable Region of the Mouse Strain 129S1. The Journal of Immunology 2007, 179: 2419-2427.
