Schema for Assembly - Assembly from Fragments
  Database: hg38    Primary Table: gold    Row Count: 45,130
Format description: How to get through chromosome based on fragments
field       example     SQL type          info    description
bin         585         smallint(6)       range   Indexing field to speed chromosome range queries.
chrom       chr1        varchar(255)      values  Reference sequence chromosome or scaffold
chromStart  10000       int(10) unsigned  range   start position in chromosome
chromEnd    10615       int(10) unsigned  range   end position in chromosome
ix          2           int(11)           range   ix of this fragment (useless)
type        F           char(1)           values  (W)GS contig, (P)redraft, (D)raft, (F)inished or (O)ther
frag        AP006221.1  varchar(255)      values  which fragment
fragStart   36116       int(10) unsigned  range   start position in frag
fragEnd     36731       int(10) unsigned  range   end position in frag
strand      -           char(1)           values  + or - (orientation of fragment)
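
The bin field lets a range query skip most rows by comparing small integers before comparing coordinates. The following is a minimal Python sketch of how such a bin could be computed, assuming the table uses the standard UCSC binning scheme. The offsets and shift constants are assumptions taken from that scheme, not from this page, and `bin_from_range` is a hypothetical helper name. It reproduces the schema example (chromStart 10000, chromEnd 10615, bin 585).

```python
# Assumed constants of the standard UCSC binning scheme (not stated on this page).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # the smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each coarser level spans 8 times more bases


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the 0-based, half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            # Both ends fall in the same bin at this level.
            return offset + start_bin
        # Otherwise move up to the next, coarser level.
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} does not fit the assumed scheme (512 Mb maximum)")


print(bin_from_range(10000, 10615))
```

A fragment that crosses a 128 kb boundary, such as 10615 to 177417, lands in a coarser bin (73 under these assumptions), which is why small bin numbers appear next to large ones in the table.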

Sample Rows
 
bin  chrom  chromStart  chromEnd  ix  type  frag         fragStart  fragEnd  strand
585  chr1   10000       10615     2   F     AP006221.1   36116      36731    -
73   chr1   10615       177417    3   F     AL627309.15  102        166904   +
586  chr1   177417      207666    4   F     FO538757.3   2000       32249    +
73   chr1   257666      297968    6   F     AP006222.1   0          40302    +
73   chr1   347968      501617    8   F     AL732372.15  0          153649   -
73   chr1   501617      535988    9   P     FO681485.2   0          34371    -
73   chr1   585988      697537    11  F     AC114498.2   0          111549   +
73   chr1   697537      835050    12  F     AL669831.13  0          137513   +
591  chr1   835050      835333    13  O     KF495845.1   0          283      -
591  chr1   835333      871977    14  F     AL669831.13  137796     174440   +
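
Each row maps a span of the chromosome onto a span of the same length in one fragment. The sketch below shows one way to load a row into a typed record and check that property. It assumes the table is available as a tab-separated text dump with the ten fields in schema order; `GoldRow` and `parse_gold_line` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldRow:
    bin_id: int       # "bin" in the schema
    chrom: str
    chrom_start: int  # 0-based start on the chromosome
    chrom_end: int
    ix: int
    comp_type: str    # "type" in the schema: W, P, D, F, O, ...
    frag: str         # fragment accession, e.g. AP006221.1
    frag_start: int   # 0-based start within the fragment
    frag_end: int
    strand: str       # "+" or "-"

    @property
    def length(self) -> int:
        """Number of chromosome bases covered by this fragment."""
        return self.chrom_end - self.chrom_start


def parse_gold_line(line: str) -> GoldRow:
    """Parse one tab-separated line (assumed dump format) into a GoldRow."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 10:
        raise ValueError(f"expected 10 fields, got {len(f)}")
    return GoldRow(int(f[0]), f[1], int(f[2]), int(f[3]), int(f[4]),
                   f[5], f[6], int(f[7]), int(f[8]), f[9])


row = parse_gold_line("585\tchr1\t10000\t10615\t2\tF\tAP006221.1\t36116\t36731\t-")
# The chromosome span and the fragment span cover the same number of bases.
assert row.length == row.frag_end - row.frag_start
print(row.frag, row.length)
```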

Note: all start coordinates in our database are 0-based, not 1-based. See explanation here.
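
When comparing table rows with 1-based sources such as an AGP file, the start coordinate has to be shifted. The sketch below assumes that end coordinates are exclusive (the half-open convention), so only the start changes; the function names are hypothetical.

```python
from typing import Tuple


def to_one_based(start0: int, end0: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open range [start0, end0) to a 1-based, closed range."""
    return start0 + 1, end0


def to_zero_based(start1: int, end1: int) -> Tuple[int, int]:
    """Convert a 1-based, closed range [start1, end1] to a 0-based, half-open range."""
    return start1 - 1, end1


# First sample row: chromStart 10000, chromEnd 10615.
print(to_one_based(10000, 10615))
```

Under this assumption the length of a range is simply end minus start in 0-based form, and end minus start plus one in 1-based form.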

Assembly (gold) Track Description
 

Description

This track shows the contigs used to construct the GRCh38 (hg38) genome assembly, as defined in the AGP file delivered with the sequence. For information on the AGP file format, see the NCBI AGP Specification. The NCBI website also provides an overview of genome assembly procedures, as well as specific information about the hg38 assembly.

In dense mode, this track depicts the contigs that make up the currently viewed scaffold. Contig boundaries are distinguished by alternating gold and brown coloration. Where gaps exist between contigs, spaces are shown between the gold and brown blocks. The relative order and orientation of the contigs within a scaffold are always known; therefore, a line is drawn in the graphical display to bridge the blocks.

Component types found in this track (with the count of each type in parentheses):

  • F - finished sequence (35,798)
  • O - other sequence (8,536)
  • W - whole genome shotgun (764)
  • P - pre draft (16)
  • D - draft sequence (8)
  • A - active finishing (8)
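
The counts above add up to the table's row count of 45,130. As a rough illustration of how such a tally could be produced, the sketch below checks that sum and then runs a GROUP BY query against a toy in-memory SQLite table. The toy table is an assumption for demonstration only: it holds just the ix and type values read from the ten sample rows, not the real hg38 data.

```python
import sqlite3

# Published component counts from the list above.
TYPE_COUNTS = {"F": 35798, "O": 8536, "W": 764, "P": 16, "D": 8, "A": 8}
total = sum(TYPE_COUNTS.values())
assert total == 45130  # equals the Row Count given in the schema header

# Toy table with (ix, type) taken from the ten sample rows only.
sample = [(2, "F"), (3, "F"), (4, "F"), (6, "F"), (8, "F"),
          (9, "P"), (11, "F"), (12, "F"), (13, "O"), (14, "F")]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gold (ix INTEGER, type TEXT)")
conn.executemany("INSERT INTO gold VALUES (?, ?)", sample)

# Count rows per component type.
counts = dict(conn.execute("SELECT type, COUNT(*) FROM gold GROUP BY type"))
conn.close()
print(counts)
```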

In addition to the standard nucleotide codes, the raw sequence files from NCBI also include IUPAC ambiguity codes for bases that could not be positively identified as A, C, G or T (see Wikipedia's IUPAC notation article for more information). As part of the UCSC assembly creation process, all IUPAC ambiguity characters are converted to Ns. The FASTA files available for download from UCSC reflect this. The raw data files containing the original IUPAC characters can be downloaded from the NCBI FTP site.
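
The conversion described above can be sketched in a few lines of Python. This is an illustration, not the UCSC pipeline itself: `mask_ambiguity` and `count_ambiguity` are hypothetical helpers, and keeping lower-case letters lower-case (as `n`) is an assumption made here.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes other than N.
IUPAC_AMBIGUITY = "RYSWKMBDHV"

# Map upper-case codes to N and lower-case codes to n (case handling is an assumption).
_TO_N = str.maketrans(
    IUPAC_AMBIGUITY + IUPAC_AMBIGUITY.lower(),
    "N" * len(IUPAC_AMBIGUITY) + "n" * len(IUPAC_AMBIGUITY),
)


def mask_ambiguity(seq: str) -> str:
    """Replace every IUPAC ambiguity code in seq with N, leaving A, C, G, T and N as they are."""
    return seq.translate(_TO_N)


def count_ambiguity(seq: str) -> Counter:
    """Count ambiguity codes in seq, ignoring case."""
    return Counter(c for c in seq.upper() if c in IUPAC_AMBIGUITY)


example = "ACGTRYkmN"
print(mask_ambiguity(example))
print(count_ambiguity(example))
```

Running `count_ambiguity` over each chromosome of the original NCBI files would give per-code tallies of the kind listed in the table of counts by chromosome.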

The following table lists the counts by chromosome of the various IUPAC ambiguity characters in the original NCBI data files:

chromosome
1 2 3 6 7 9 10 12 13 16 17 21 22 X Y Total
code
B 1 1 2
K 1 4 1 2 8
M 1 1 3 1 2 8
R 1 1 1 1 1 13 1 3 1 2 1 1 27
S 1 1 1 1 1 5
W 2 2 6 1 1 1 1 14
Y 4 3 1 2 2 8 2 2 5 2 2 2 35
Total 2 9 7 1 4 3 36 3 3 1 12 3 5 5 5 99