Accessing Heurist Data via SQL

Heurist is underpinned by a clearly defined SQL data model implemented in MySQL. Because the structure of every Heurist database is identical and the data model is simple and highly normalised, it is extremely easy to bypass the web interface and manipulate the underlying data directly with SQL statements. The underlying data includes the semantic structure of the Heurist database, making the database fully self-documenting.

 

Please note that this page is a work in progress and, to some degree, internal notes on useful SQL and scripts, which we expect to complete rather slowly unless it becomes a higher priority. Probable completion 2015.

Use of codes for record types, fields and terms

Data in a Heurist database is never referenced by name or label, always by internal ID, allowing record type names, field names and terms to be edited at will without affecting existing data. In order to manipulate data in the database you therefore need to know the internal code values used by your database.

Note: Internal IDs for the same record type/field type/term will vary between databases (some essential items have fixed low-number codes which are established when the database is created). However, all items have an origin database ID and origin database code which together provide a unique identifier or ‘concept ID’ across the whole Heurist network, allowing for later cross-database search and comparison.

Database Tables

The following section describes some of the more important fields in the Heurist database tables.

Record types

defRecTypes table: Records are stored in the Records data table. rec_RecTypeID identifies the type of each record

rty_ID Code identifying the record type (max 65000)
rty_Name The name of the record (object) type (max 63 char)

Record details (fields)

defDetailTypes table: Fields are stored as key-value pairs in the recDetails data table. dtl_RecID specifies the key. The use of the term ‘detail’ is a historical legacy.

dty_ID Code identifying the field/attribute/detail
dty_Name The name of the field/attribute/detail
dty_Type The type of data stored in this field/attribute

Field data types:

enum, relationtype Term ID (relationtype is specialised version for relationships)
resource Record ID (record pointer value)
freetext, blocktext single and multi line (memo)
integer, float: numeric values
date date/time value (may include fuzzy date descriptors)
file FK > recUploadedFiles (references local or remote file)
geo geographic object (geometry stored as WKT)

Terms

defTerms table: Terms are organised in a hierarchical tree. Term IDs are used as values in the key-value pairsTerms

trm_ID Code identifying the term
trm_Label Human readable term
trm_Code Standardised codes eg. ISO
trm_Domain enum indicates a normal term, relation indicates relationship type terms used in relationship records
trm_ParentTermID ID of the parent term in the hierarchy. 0 or NULL indicates a vocabulary (top level) – vocabularies can never be used as values in a record
Optional code eg. alphanumeric code which may be required for import or export

 

Useful SQL Queries

Count usage of detail type by record type

This query simply reports the usage of each detail type for each record type. It does not consult the record type structures, so it cannot list detail types in a record type which have not been used, or detail types which should not occur for a given record type. The database health check function in Heurist identifies these additional conditions.

select rec_RecTypeID, rty_Name, dtl_DetailTypeID, dty_Name, count(*)
from Records, recDetails, defRecTypes, defDetailTypes
where (Records.rec_ID = recDetails.dtl_RecID) and
(recDetails.dtl_DetailTypeID=defDetailTypes.dty_ID) and
(Records.rec_RecTypeID =  defRecTypes.rty_ID)
group by rec_RecTypeID, dtl_DetailTypeID
order by rec_RecTypeID, dtl_DetailTypeID;

Insert records

For example, to insert a new (blank) record, we can simply run the following SQL:

 INSERT INTO Records (rec_Type, rec_UGrpID, l………….) VALUES (…………………..)

This record will be incomplete as it contains no values and no composite record title is created, so it is necessary to know a few fairly simple rules in order to insert correct data. Heurist also has some rebuild and data verification functions which can come to the rescue.

Insert fields

To insert field values into a record, one simply uses the field codes

 insert into recDetails (dtl_RecID, dtl_dtyID, dtl_Value) VALUES ( …………)

This can also be used to insert data from a temporary table into which some existing data has been loaded, for example (the Heurist ID of existing records is in hid, 134 is the field into which the data is being inserted):

insert into recDetails (dtl_RecID,dtl_DetailTypeID,dtl_Value) select hid,134,entry from tmp_pdf_texts where `hid` IS NOT NULL

It can be useful to use Heurist’s report formatter to format an output file which has the Heurist IDs in it, which can then be edited with additional data to create a file for import using this method (or via the Heurist import functions).

Delete records and fields

Records are easily deleted, and Heurist’s cascading referential integrity takes care of deleting associated fields, bookmarks and tags. For instance, to delete a specific record and all its values:

delete from Records where rec_ID = 2785

or to delete all records and fields with a value under 10 in field 37 (we will return to internal codes in a moment)

delete from Records, recDetails
where (rec_ID = dtl_RecID) AND (dtl_dtyID = 37) AND (dtl_Value < 10)

Modify field values

In the same way, field values can be systematically modified using:

update recDetails set dtl_Value = ???? where <condition>

Locating records with specific detail values

SELECT r.rec_ID, d.dtl_Value FROM Records r left join recDetails d
on rec_Id=dtl_RecID and dtl_DetailTypeID=5
where rec_RecTypeId=1 and (d.dtl_Value is null or d.dtl_Value=”);

Record Type 1 is Relationships, Detail type 5 is the Relationship Type, this searches for null and blank values in this detail type

Finding ‘floating’ records (not linked from another record)

This query looks for multimedia records (type 5) not linked from another record. Omit final Where restriction ot RecTypeId=5 for any records which are unlinked.

SELECT r.rec_ID, r.rec_Title, d.dtl_DetailTypeID FROM Records r left join recDetails d on r.rec_ID=d.dtl_Value and d.dtl_DetailTypeID in (select dty_ID from defDetailTypes where dty_Type=’resource’) where rec_RecTypeID=5 and d.dtl_Value is NULL;

Obtaining user and group IDs

Changing or deleting field values is even easier <to do>

Adding pointers from a record to its child records

In some cases, child records – such as events in a person’s life – may be imported into the database using the CSV / TSV importer. These child events will contain a pointer to the parent (Person) record, normally set by matching the name(s) of the parent, eg. person name(s), place name, reference ID or other unique identifier of the parent. You then wish to set record pointer fields in the parent records pointing to the child records. The following query shows how to do this:

Eventlet records are code 24 and contain a pointer field (code 16) back to their parent (Person). Person records contain a record pointer field (code 79) which points to eventlets belonging to that person.

insert into recDetails (dtl_RecID,dtl_DetailTypeID,dtl_Value,dtl_ValShortened)
select recDetails.dtl_Value,79,rec_ID,rec_ID
— record ID of person to be updated, eventlet record pointer code, eventlet record ID x 2
from Records,recDetails
where (Records.rec_ID=recDetails.dtl_RecID) — detail belongs to the record
and (Records.rec_RecTypeID=24) — Record is an eventlet record
and (recDetails.dtl_DetailTypeID=16);
— detail is a record pointer to the parent person of the eventlet record

Check for duplicate records (on multiple fields)

SELECT firstnames, surname,COUNT(*) AS Cnt
FROM temp
GROUPBY SURNAME, FIRSTNAMES
HAVING Cnt >1

Average Active Users

This procedure can be modified for other metrics. Run the following scripts:

get_metrics_for_all.sql

rm metrics.csv
for db in `echo “select schema_name from information_schema.schemata where schema_name like ‘hdb_%’ “| mysql -u root -psmith18`; do mysql –skip-column-names -u root -psmith18 $db < update_h3_mysql_structure.sql  >> metrics.csv;  done

get_metrics_for_db.sql

SELECT database(),’,’, ‘rec’ as typ,’,’,year(rec_Modified) as yr,’,’,month(rec_Modified)as mth,’,’,count(distinct day(rec_Modified),rec_OwnerUGrpID) as cnt from Records group by year(rec_Modified),month(rec_Modified) order by year(rec_Modified),month(rec_Modified),day(rec_Modified),rec_OwnerUGrpID;

Note: the extra commas are b/c MySQL exports with tabs and imports with commas …

 Empty and then load metrics into Heurist_METRICS.newrec_person_month_counts

id int(11) NOT NULL auto_increment,  db varchar(100) collate utf8_unicode_ci NOT NULL,  typ text collate utf8_unicode_ci NOT NULL,  yr year(4) NOT NULL,  mth tinyint(2) NOT NULL,  cnt mediumint(9) NOT NULL COMMENT ‘Number of distinct users/days of record or bookmark mods per month’,  PRIMARY KEY  (id)

 To get metrics result:

SELECT yr,mth,sum(cnt) FROM newrec_person_month_counts group by yr,mth order by yr,mth

 Alternative direct insertion approach (but current script won’t pass database as parameter, so it tries to open Records in the metrics database:

insert into Heurist_METRICS.newrec_person_month_counts (db,typ, yr, mth, cnt) ( SELECT database(), ‘rec’ as typ,year(rec_Modified) as yr,month(rec_Modified)as mth,count(distinct day(rec_Modified),rec_OwnerUGrpID) as cnt from ????.Records
group by year(rec_Modified),month(rec_Modified)
order by year(rec_Modified),month(rec_Modified),day(rec_Modified),rec_OwnerUGrpID)

Alternative person-month metrics based on bookmarks:

 SELECT ‘bkm’ as typ,year(bkm_Modified) as year, month(bkm_Modified) as month, count(distinct day(bkm_Modified),bkm_UGrpID)
from usrBookmarks
group by year(bkm_Modified),month(bkm_Modified)
order by year(bkm_Modified),month(bkm_Modified),day(bkm_Modified),bkm_UGrpID

Select the detail types (fields) of the given record type

SELECT * FROM defRecStructure rst INNER JOIN defDetailTypes dty ON rst.rst_DetailTypeID=dty.dty_ID WHERE rst.rst_RectypeID=1;

Map layer data source

dty_Type = resource

dty_PtrTargetRectypeIDs = 11, 1014, 1017, 1018, 1021

Use the record name

$rectype

use rst_DisplayName

dty_Type=“resource”

 Count occurrence

SELECT COUNT(rl.rl_ID) FROM recLinks rl WHERE rl_DetailTypeID=1018;

DATABASE SUMMARY QUERIES
Select all resource detail (field) types for the given record type

SELECT * FROM defRecStructure rst INNER JOIN defDetailTypes dty ON rst.rst_DetailTypeID=dty.dty_ID WHERE rst.rst_RectypeID=1020 AND dty.dty_Type LIKE “resource”;

Use:

rst.rst_DetailTypeID as id

rst.rst_DisplayName as name

dty.dty_PtrTargetRectypeIDs as ids

Retrieve count:

SELECT COUNT(rd.dtl_ID) as count FROM recDetails rd WHERE rd.dtl_DetailTypeID=1083;

Use:

COUNT(rd.dtl_ID) as count

Combine queries:

SELECT * FROM defRecStructure rst INNER JOIN defDetailTypes dty ON rst.rst_DetailTypeID=dty.dty_ID INNER JOIN recDetails rd ON rd.dtl_DetailTypeID=rst.rst_DetailTypeID WHERE rst.rst_RectypeID=1020 AND dty.dty_Type LIKE “resource” GROUP BY rst.rst_DetailTypeID;

Correct fields:

SELECT rst_DetailTypeID as id, rst_DisplayName as name, COUNT(rd.dtl_ID) as count, dty.dty_PtrTargetRectypeIDs as ids FROM defRecStructure rst INNER JOIN defDetailTypes dty ON rst.rst_DetailTypeID=dty.dty_ID INNER JOIN recDetails rd ON rd.dtl_DetailTypeID=rst.rst_DetailTypeID WHERE rst.rst_RectypeID=1020 AND dty.dty_Type LIKE “resource” GROUP BY rst.rst_DetailTypeID;

Retrieve the pointed records

Count the occurrence:

SELECT COUNT(r2.rec_ID) as count, rl.rl_DetailTypeID, r1.rec_Title, r1.rec_RecTypeID, r2.rec_Title, r2.rec_RecTypeID FROM recLinks rl INNER JOIN Records r1 ON r1.rec_ID=rl.rl_SourceID INNER JOIN Records r2 ON r2.rec_ID=rl.rl_TargetID WHERE r1.rec_RecTypeID=5 AND r2.rec_RecTypeID=10;

Use:

COUNT(rd.dtl_ID) as count


Dictionary of Sydney

Update to split up entities into multiple entity types

Insert the next available ID (30) into defRectyes, copying the current rectype (in this case 25):

INSERT INTO defRecTypes (SELECT 30,”Entity30″, rty_OrderInGroup, rty_Description, rty_TitleMask, rty_CanonicalTitleMask, rty_Plural, rty_Status, rty_OriginatingDBID, rty_NameInOriginatingDB, rty_IDInOriginatingDB, rty_NonOwnerVisibility, rty_ShowInLists, rty_RecTypeGroupID, rty_RecTypeModelIDs, rty_FlagAsFieldset, rty_ReferenceURL, rty_AlternativeRecEditor, rty_Type, rty_ShowURLOnEditForm, rty_ShowDescriptionOnEditForm, rty_Modified, rty_LocallyModified
FROM `hdb_DoS3_redesign`.`defRecTypes` where rty_ID=25);

INSERT INTO hdb_DoS3_redesign.defRecStructure (rst_RecTypeID,rst_DetailTypeID, rst_DisplayName, rst_DisplayHelpText, rst_DisplayExtendedDescription, rst_DisplayOrder, rst_DisplayWidth, rst_DefaultValue, rst_RecordMatchOrder, rst_CalcFunctionID, rst_RequirementType, rst_NonOwnerVisibility, rst_Status, rst_MayModify, rst_OriginatingDBID, rst_IDInOriginatingDB, rst_MaxValues, rst_MinValues, rst_DisplayDetailTypeGroupID, rst_FilteredJsonTermIDTree, rst_PtrFilteredIDs, rst_OrderForThumbnailGeneration, rst_TermIDTreeNonSelectableIDs, rst_Modified, rst_LocallyModified)
SELECT 30, rst_DetailTypeID, rst_DisplayName, rst_DisplayHelpText, rst_DisplayExtendedDescription, rst_DisplayOrder, rst_DisplayWidth, rst_DefaultValue, rst_RecordMatchOrder, rst_CalcFunctionID, rst_RequirementType, rst_NonOwnerVisibility, rst_Status, rst_MayModify, rst_OriginatingDBID, rst_IDInOriginatingDB, rst_MaxValues, rst_MinValues, rst_DisplayDetailTypeGroupID, rst_FilteredJsonTermIDTree, rst_PtrFilteredIDs, rst_OrderForThumbnailGeneration, rst_TermIDTreeNonSelectableIDs, rst_Modified, rst_LocallyModified
from hdb_DoS3_redesign.defRecStructure where rst_RecTypeID=25;

detail 75 is Entity type, codes are as below:

use hdb_DoS3_redesign;
update Records, recDetails set rec_RecTypeID=30
where (Records.rec_ID=recDetails.dtl_RecID)
And (rec_RecTypeID=25) and (dtl_DetailTypeID=75) and (dtl_Value=3291);

artefact=3291 –> rectype 30
building 3294 –> 31
event 3296 –> 32
nat ftr 3298 –> 33
org 3300 –> 34
pers 3301 –> 35
place 3302 –> 36
struc 3305  –> 37

Change all entities of a particular type into the appropriate record type

use hdb_DoS3_redesign;
update Records, recDetails set rec_RecTypeID=30
where (Records.rec_ID=recDetails.dtl_RecID)
And (rec_RecTypeID=25) and (dtl_DetailTypeID=75) and (dtl_Value=3291);

Rectype 25 is Entity (undifferentiated), Detail type 75 = entity type, Dtl_value 3291 is Artefact

Heurist Registration

Registration allows creation of Heurst databases and subscribes you to occasional news updates (single-click to unsubscribe).

We will not share your email information with any third party.

Thank you for registering. We have sent you an email, allowing you to confirm your registration and create your first Heurist database.