Database overview
Gw files
GeneWeb can create a genealogy database from one or more source files with the .gw extension. These files give a structured description of persons, families, all kinds of relationships, events, etc. You can read more about the file structure here. The gwc executable reads the .gw files, extracts all information about persons and families, and passes it to the Gwdb module, which creates the database.
Database entries
The information passed to Gwdb is mainly composed of:
- An array of all strings, where a string can hold any kind of information, for example a person’s name, birth place, marriage place, etc. The identifier istr references a string in this array (it is the index of an element inside the array).
- An array of persons, where each element holds the information about one person. Every string field of a person (name, birth place, etc.) is an istr identifier whose real string is stored in the strings array. Other persons are referenced by the identifier iper, the index of an element inside the persons array.
- An array of families, where each element holds the information about one family (couple, children, marriage date, etc.). The identifier ifam references a family in this array (the index of an element inside the array).
Each array holds values of a data structure defined in the module Def. These entries are then the main source of data for every database request.
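The three arrays and their identifiers can be pictured with the following Python sketch. It is only a model, not GeneWeb code: the real types are the OCaml definitions in Def, the field names are hypothetical, identifiers are plain integer indexes, and the helper add_string storing each distinct string once is a simplifying assumption.

```python
from dataclasses import dataclass, field

# In this sketch each identifier is simply an index into its array.
Istr = int  # index into the strings array
Iper = int  # index into the persons array
Ifam = int  # index into the families array


@dataclass
class Person:
    first_name: Istr      # string fields hold an istr, not the text itself
    surname: Istr
    birth_place: Istr


@dataclass
class Family:
    father: Iper          # persons are referenced by iper
    mother: Iper
    marriage_place: Istr
    children: list[Iper] = field(default_factory=list)


strings: list[str] = []
persons: list[Person] = []
families: list[Family] = []


def add_string(s: str) -> Istr:
    """Return the istr of s, appending it once if it is new (hypothetical)."""
    if s not in strings:
        strings.append(s)
    return strings.index(s)


def add_person(first: str, last: str, place: str) -> Iper:
    persons.append(Person(add_string(first), add_string(last), add_string(place)))
    return len(persons) - 1


pierre = add_person("Pierre", "Martin", "Paris")
marie = add_person("Marie", "Martin", "Lyon")
paul = add_person("Paul", "Martin", "Paris")
families.append(Family(pierre, marie, add_string("Paris"), [paul]))
fam: Ifam = 0

print(strings[persons[marie].surname])            # Martin
print(strings[families[fam].marriage_place])      # Paris
print([strings[persons[c].first_name] for c in families[fam].children])  # ['Paul']
```

Resolving any field is always a two-step lookup: first the identifier, then the array it indexes.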
Storage
Gwdb is responsible for creating the database on disk from the provided inputs. It creates a directory dbname.gwb containing several files. The main file, base, contains a marshalled representation of each array, and base.acc stores the offset of every entry, which allows constant-time access. Additionally, Gwdb creates several index files that associate information useful for requests with the identifier of the corresponding entry in the base file. This lets requests find an entry immediately, without iterating over all the entries in the database. For example, strings.inx is a string index that gives the id of a searched string. One file is slightly different: the patches file, which stores every modification made to the base (see the Modifications subsection). The storage manipulation interface is described in lib/gwdb_driver.mli/gwdb_driver.mli. This is a virtual module whose current implementation is available in gwdb-legacy. The format and description of every database file are listed below:
base - the base itself
magic number (magic_gwb) : string of length 8
number of persons : binary_int
number of families : binary_int
number of strings : binary_int
persons array offset in file : binary_int
ascends array offset in file : binary_int
unions array offset in file : binary_int
families array offset in file : binary_int
couples array offset in file : binary_int
descends array offset in file : binary_int
strings array offset in file : binary_int
notes origin file : value
persons array : value
ascends array : value
unions array : value
families array : value
couples array : value
descends array : value
strings array : value
base.acc - direct accesses to arrays inside base
persons offsets : array of binary_ints
ascends offsets : array of binary_ints
unions offsets : array of binary_ints
families offsets : array of binary_ints
couples offsets : array of binary_ints
descends offsets : array of binary_ints
strings offsets : array of binary_ints
names.inx - index for names, strings of first names and surnames
offset to sindex : binary_int
offset to findex : binary_int
1st index (mixes between names) : value
array, length = 16383, associating:
- a hash value of a "crushed" (module "Name") name
(modulo length)
- to the array of ids of the corresponding persons
2nd index (surnames sub-strings) : value
array, length = "table_size", associating:
- a hash value of the "crushed" (module "Name") surname
sub-string (modulo length)
- to the array of the corresponding surnames (string ids)
that contain the given surname sub-string
3rd index (first name sub-strings) : value
array, length = 16383, associating:
- a hash value of the "crushed" (module "Name") first name
sub-string (modulo length)
- to the array of the corresponding string ids that contain
the given first name sub-string
names.acc - direct accesses to values inside arrays in names.inx
strings.inx - index for all strings
length of the strings offset array : binary_int
strings hash table index : 2 arrays of binary_ints
strings offset array (length = prime after 10 * strings
array length)
- associating a hash value of the string modulo length
- to its id in the string array
strings list array (length = string array length)
- associating a string id
- to the id of the next index (previous value) holding the
same hash value
snames.inx - index for surnames
array ordered by surname
- associating the string id of a surname
- to a pointer (offset) inside snames.dat
snames.dat - data associated with snames.inx
array of lists of persons holding a surname
fnames.inx - index for first names
array ordered by first name
- associating the string id of a first name
- to a pointer (offset) inside fnames.dat
fnames.dat - data associated with fnames.inx
array of lists of persons holding a first name
notes - text file containing the database notes.
notes_d - directory containing a .txt file for each extended page
particles.txt - text file with the authorised name particles
patches - modifications of the database
When updated, none of the previous files are modified.
Only this one is written and rewritten. It holds a record
of type "patches", composed of association lists
"index" - "new value".
nb_persons - number of real persons (including those added by patches)
synchro_patches - timestamped history of the base's modifications.
restrict - defines visibility of each person in the base
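The way base and base.acc work together can be sketched in Python for a database holding only persons. Assumptions: binary_int is the 4-byte big-endian encoding written by OCaml's output_binary_int, the magic string is a placeholder because the value of magic_gwb is not given here, the notes origin and the other arrays are left out, and each entry is pickled separately as a stand-in for OCaml marshalling. write_base and read_person are hypothetical names.

```python
import io
import pickle
import struct

INT = struct.Struct(">i")          # assumed binary_int: 4-byte big-endian
HEADER = struct.Struct(">8s10i")   # magic + 3 counts + 7 array offsets
MAGIC = b"GWB-TEST"                # placeholder, NOT the real magic_gwb


def write_base(persons: list) -> tuple[bytes, bytes]:
    """Return (base, acc) for a database holding only a persons array."""
    body, acc = io.BytesIO(), io.BytesIO()
    for person in persons:
        acc.write(INT.pack(HEADER.size + body.tell()))  # absolute offset in base
        pickle.dump(person, body)                       # stand-in for Marshal
    offsets = [HEADER.size] + [0] * 6                   # other arrays omitted
    header = HEADER.pack(MAGIC, len(persons), 0, 0, *offsets)
    return header + body.getvalue(), acc.getvalue()


def read_person(base: bytes, acc: bytes, iper: int):
    """Constant-time access: one offset lookup, then decode a single entry."""
    magic, nb_persons, *_ = HEADER.unpack_from(base, 0)
    if magic != MAGIC or not 0 <= iper < nb_persons:
        raise ValueError("bad magic number or iper out of range")
    (offset,) = INT.unpack_from(acc, iper * INT.size)
    stream = io.BytesIO(base)      # with a real file this is just a seek
    stream.seek(offset)
    return pickle.load(stream)


base, acc = write_base([{"first_name": "Pierre"}, {"first_name": "Marie"}])
print(len(acc))                    # 8: one binary_int per person
print(read_person(base, acc, 1))   # {'first_name': 'Marie'}
```

Since base.acc holds one fixed-size integer per entry, the offset of entry i is always at position 4 × i, whatever the size of the entries themselves.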
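The two-array layout of strings.inx amounts to a hash table with chaining stored in flat integer arrays. In the sketch below, following the description above, the offset array maps a bucket to the last string id inserted with that hash, and the list array maps each id to the previous one in the same bucket. The CRC-32 hash, -1 as the empty marker, reading "prime after" as the smallest prime strictly greater, and the function names are all assumptions.

```python
import zlib
from typing import Optional


def next_prime(n: int) -> int:
    """Smallest prime strictly greater than n."""
    p = n + 1
    while p < 2 or any(p % d == 0 for d in range(2, int(p ** 0.5) + 1)):
        p += 1
    return p


def string_hash(s: str, size: int) -> int:
    return zlib.crc32(s.encode("utf-8")) % size  # stand-in hash function


def build_strings_index(strings: list[str]) -> tuple[list[int], list[int]]:
    size = next_prime(10 * len(strings))
    offsets = [-1] * size          # bucket -> last string id with that hash
    chain = [-1] * len(strings)    # string id -> previous id in the same bucket
    for istr, s in enumerate(strings):
        bucket = string_hash(s, size)
        chain[istr] = offsets[bucket]
        offsets[bucket] = istr
    return offsets, chain


def find_istr(strings: list[str], offsets: list[int], chain: list[int],
              s: str) -> Optional[int]:
    istr = offsets[string_hash(s, len(offsets))]
    while istr != -1:              # walk the chain, comparing real strings
        if strings[istr] == s:
            return istr
        istr = chain[istr]
    return None


strings = ["", "Pierre", "Martin", "Paris"]
offsets, chain = build_strings_index(strings)
print(len(offsets))                                   # 41: next prime after 40
print(find_istr(strings, offsets, chain, "Martin"))   # 2
print(find_istr(strings, offsets, chain, "Lyon"))     # None
```

Sizing the table at about ten times the number of strings keeps chains short, so a lookup usually compares only one or two strings.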
Modifications
When a modification is requested, GeneWeb does not update the base file itself. Instead, its changes go to the patches file, which contains all the latest modifications of the base. Every modification (patch) stays pending until the patches are committed with a commit_patches request; the commit is what actually updates the patches file.
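This read-only base with a separate patches layer can be sketched in Python. Assumptions: persons are reduced to strings, dicts stand in for the association lists of the "patches" record, pickle stands in for GeneWeb's serialization, and pending patches are visible to reads before the commit. The write-then-rename step in commit_patches is a common safety idiom chosen for the sketch, not something documented here. PatchedBase and its methods are hypothetical names; only commit_patches echoes the request named above.

```python
import os
import pickle
import tempfile
from typing import Optional


class PatchedBase:
    """Hypothetical model: base is read-only and changes live in patches."""

    def __init__(self, base_persons: list[str], patches_path: str):
        self.base_persons = base_persons       # never modified in place
        self.patches_path = patches_path
        self.committed: dict[int, str] = {}    # "index" -> "new value"
        self.pending: dict[int, str] = {}      # patches not yet committed
        if os.path.exists(patches_path):
            with open(patches_path, "rb") as f:
                self.committed = pickle.load(f)

    def patch_person(self, iper: int, value: str) -> None:
        self.pending[iper] = value             # nothing is written to disk yet

    def get_person(self, iper: int) -> Optional[str]:
        # Newest value wins: pending patch, then committed patch, then base.
        for layer in (self.pending, self.committed):
            if iper in layer:
                return layer[iper]
        if 0 <= iper < len(self.base_persons):
            return self.base_persons[iper]
        return None

    def commit_patches(self) -> None:
        self.committed.update(self.pending)
        self.pending.clear()
        tmp = self.patches_path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(self.committed, f)
        os.replace(tmp, self.patches_path)     # only the patches file changes


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "patches")
    db = PatchedBase(["Pierre", "Marie"], path)
    db.patch_person(1, "Marie Curie")
    db.patch_person(2, "Jean")                 # a person added by a patch
    print(os.path.exists(path))                # False: still pending
    db.commit_patches()
    again = PatchedBase(["Pierre", "Marie"], path)
    print(again.get_person(1), again.get_person(2))  # Marie Curie Jean
```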
Patching covers only operations that add or modify an entry. Deleting an entry works quite differently: the entry is replaced by a dummy entry, which is later removed by GeneWeb’s garbage collector, gwgc, when it compacts the database arrays. Another useful tool, fixbase, locates and fixes inconsistencies in the base and updates all database files.
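Compaction needs a dedicated pass because removing dummy entries shifts every later index, so every reference must be renumbered. The sketch below illustrates that general idea, not gwgc's actual algorithm. A deleted person is modelled as None, couples refer to persons by iper, and compact is a hypothetical name.

```python
from typing import Optional


def compact(persons: list[Optional[str]], couples: list[tuple[int, int]]):
    """Drop dummy entries (None) and renumber the iper references in couples."""
    new_iper: dict[int, int] = {}
    kept: list[str] = []
    for old, person in enumerate(persons):
        if person is not None:                # dummy entries are skipped
            new_iper[old] = len(kept)
            kept.append(person)
    # Every reference must be rewritten, since ids shift after removal.
    # (Assumes no couple still refers to a deleted person.)
    remapped = [(new_iper[f], new_iper[m]) for f, m in couples]
    return kept, remapped


persons = ["Pierre", None, "Marie", "Jean"]   # iper 1 was deleted
couples = [(0, 2), (3, 2)]
print(compact(persons, couples))
# (['Pierre', 'Marie', 'Jean'], [(0, 1), (2, 1)])
```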
Example
Here is an example of how GeneWeb displays the birth dates of persons with a given first name (say “Pierre”), ignoring caches:
- First, it performs a dichotomous search in fnames.inx for the string id (istr) that references “Pierre”.
- Then, using the offset associated with “Pierre” in fnames.inx, it reads the position in the file fnames.dat where the list of ids (iper) of persons with the first name “Pierre” is stored.
- For every person id, it gets the offset of the person’s entry from the base.acc file.
- Then it reads the person’s entry at that offset and gets the field holding the birth date.
- Finally, it displays all extracted birth dates.
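The steps above can be sketched end to end in Python with toy in-memory stand-ins for the files. Everything here is hypothetical. fnames.inx is a sorted list of (istr, offset) pairs, and fnames.dat is a list of iper lists indexed by that "offset". base is one JSON line per person (real entries are OCaml-marshalled), and base.acc is the list of line offsets. Names are compared with plain Python string ordering, which may differ from GeneWeb's.

```python
import json
from typing import Optional

# Hypothetical toy database (caches ignored, as in the description).
strings = ["Pierre", "Marie", "1850-03-02", "1852-07-14", "1879-11-30"]
persons = [
    {"first_name": 0, "birth": 2},   # iper 0: Pierre
    {"first_name": 1, "birth": 3},   # iper 1: Marie
    {"first_name": 0, "birth": 4},   # iper 2: Pierre
]

# base: one JSON line per person; base.acc: the offset of each line.
base = b""
base_acc: list[int] = []
for person in persons:
    base_acc.append(len(base))
    base += json.dumps(person).encode("utf-8") + b"\n"

# fnames.inx: (istr, offset into fnames.dat), sorted by first name;
# fnames.dat: lists of ipers, the "offset" being a list position here.
fnames_inx = [(1, 0), (0, 1)]        # "Marie" < "Pierre"
fnames_dat = [[1], [0, 2]]


def find_first_name(name: str) -> Optional[tuple[int, int]]:
    """Step 1: dichotomous search of fnames.inx, comparing the real strings."""
    lo, hi = 0, len(fnames_inx)
    while lo < hi:
        mid = (lo + hi) // 2
        if strings[fnames_inx[mid][0]] < name:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(fnames_inx) and strings[fnames_inx[lo][0]] == name:
        return fnames_inx[lo]
    return None


def read_person(iper: int) -> dict:
    """Steps 3 and 4: offset from base.acc, then decode the entry there."""
    start = base_acc[iper]
    end = base.index(b"\n", start)
    return json.loads(base[start:end])


def birth_dates(name: str) -> list[str]:
    found = find_first_name(name)
    if found is None:
        return []
    _istr, dat_offset = found
    ipers = fnames_dat[dat_offset]   # step 2: persons with this first name
    return [strings[read_person(iper)["birth"]] for iper in ipers]


for date in birth_dates("Pierre"):   # step 5: display
    print(date)
# 1850-03-02
# 1879-11-30
```

Only the index lookup costs a logarithmic number of comparisons; each following step is a direct access through a stored offset.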