Accessing Heurist Data via SQL
Heurist is underpinned by a clearly defined SQL data model implemented in MySQL. Because the structure of every Heurist database is identical and the data model is simple and highly normalised, it is extremely easy to bypass the web interface and manipulate the underlying data directly with SQL statements. The underlying data includes the semantic structure of the Heurist database, making the database fully self-documenting.
Please note that this page is a work in progress and, to some degree, internal notes on useful SQL and scripts, which we expect to complete rather slowly unless it becomes a higher priority. Probable completion 2015.
Use of codes for record types, fields and terms
Data in a Heurist database is never referenced by name or label, always by internal ID, allowing record type names, field names and terms to be edited at will without affecting existing data. In order to manipulate data in the database you therefore need to know the internal code values used by your database.
Note: Internal IDs for the same record type/field type/term will vary between databases (some essential items have fixed low-number codes which are established when the database is created). However, all items have an origin database ID and origin database code which together provide a unique identifier or ‘concept ID’ across the whole Heurist network, allowing for later cross-database search and comparison.
Database Tables
The following section describes some of the more important fields in the Heurist database tables.
Record types
defRecTypes table: Records are stored in the Records data table. rec_RecTypeID identifies the type of each record
rty_ID | Code identifying the record type (max 65000) |
rty_Name | The name of the record (object) type (max 63 char) |
Record details (fields)
defDetailTypes table: Fields are stored as key-value pairs in the recDetails data table. dtl_RecID specifies the key. The use of the term ‘detail’ is a historical legacy.
dty_ID | Code identifying the field/attribute/detail |
dty_Name | The name of the field/attribute/detail |
dty_Type | The type of data stored in this field/attribute |
Field data types:
enum, relationtype | Term ID (relationtype is specialised version for relationships) |
resource | Record ID (record pointer value) |
freetext, blocktext | single and multi line (memo) |
integer, float: | numeric values |
date | date/time value (may include fuzzy date descriptors) |
file | FK > recUploadedFiles (references local or remote file) |
geo | geographic object (geometry stored as WKT) |
Terms
defTerms table: Terms are organised in a hierarchical tree. Term IDs are used as values in the key-value pairsTerms
trm_ID | Code identifying the term |
trm_Label | Human readable term |
trm_Code | Standardised codes eg. ISO |
trm_Domain | enum indicates a normal term, relation indicates relationship type terms used in relationship records |
trm_ParentTermID | ID of the parent term in the hierarchy. 0 or NULL indicates a vocabulary (top level) – vocabularies can never be used as values in a record |
Optional code eg. alphanumeric code which may be required for import or export |
Useful SQL Queries
Count usage of detail type by record type
This query simply reports the usage of each detail type for each record type. It does not consult the record type structures, so it cannot list detail types in a record type which have not been used, or detail types which should not occur for a given record type. The database health check function in Heurist identifies these additional conditions.
select rec_RecTypeID, rty_Name, dtl_DetailTypeID, dty_Name, count(*)
from Records, recDetails, defRecTypes, defDetailTypes
where (Records.rec_ID = recDetails.dtl_RecID) and
(recDetails.dtl_DetailTypeID=defDetailTypes.dty_ID) and
(Records.rec_RecTypeID = defRecTypes.rty_ID)
group by rec_RecTypeID, dtl_DetailTypeID
order by rec_RecTypeID, dtl_DetailTypeID;
Insert records
For example, to insert a new (blank) record, we can simply run the following SQL:
INSERT INTO Records (rec_Type, rec_UGrpID, l………….) VALUES (…………………..)
This record will be incomplete as it contains no values and no composite record title is created, so it is necessary to know a few fairly simple rules in order to insert correct data. Heurist also has some rebuild and data verification functions which can come to the rescue.
Insert fields
To insert field values into a record, one simply uses the field codes
insert into recDetails (dtl_RecID, dtl_dtyID, dtl_Value) VALUES ( …………)
This can also be used to insert data from a temporary table into which some existing data has been loaded, for example (the Heurist ID of existing records is in hid, 134 is the field into which the data is being inserted):
insert into recDetails (dtl_RecID,dtl_DetailTypeID,dtl_Value) select hid,134,entry from tmp_pdf_texts where `hid` IS NOT NULL
It can be useful to use Heurist’s report formatter to format an output file which has the Heurist IDs in it, which can then be edited with additional data to create a file for import using this method (or via the Heurist import functions).
Delete records and fields
Records are easily deleted, and Heurist’s cascading referential integrity takes care of deleting associated fields, bookmarks and tags. For instance, to delete a specific record and all its values:
delete from Records where rec_ID = 2785
or to delete all records and fields with a value under 10 in field 37 (we will return to internal codes in a moment)
delete from Records, recDetails
where (rec_ID = dtl_RecID) AND (dtl_dtyID = 37) AND (dtl_Value < 10)
Modify field values
In the same way, field values can be systematically modified using:
update recDetails set dtl_Value = ???? where <condition>
Locating records with specific detail values
SELECT r.rec_ID, d.dtl_Value FROM Records r left join recDetails d
on rec_Id=dtl_RecID and dtl_DetailTypeID=5
where rec_RecTypeId=1 and (d.dtl_Value is null or d.dtl_Value=”);
Record Type 1 is Relationships, Detail type 5 is the Relationship Type, this searches for null and blank values in this detail type
Finding ‘floating’ records (not linked from another record)
This query looks for multimedia records (type 5) not linked from another record. Omit final Where restriction ot RecTypeId=5 for any records which are unlinked.
SELECT r.rec_ID, r.rec_Title, d.dtl_DetailTypeID FROM Records r left join recDetails d on r.rec_ID=d.dtl_Value and d.dtl_DetailTypeID in (select dty_ID from defDetailTypes where dty_Type=’resource’) where rec_RecTypeID=5 and d.dtl_Value is NULL;
Obtaining user and group IDs
Changing or deleting field values is even easier <to do>
Adding pointers from a record to its child records
In some cases, child records – such as events in a person’s life – may be imported into the database using the CSV / TSV importer. These child events will contain a pointer to the parent (Person) record, normally set by matching the name(s) of the parent, eg. person name(s), place name, reference ID or other unique identifier of the parent. You then wish to set record pointer fields in the parent records pointing to the child records. The following query shows how to do this:
Eventlet records are code 24 and contain a pointer field (code 16) back to their parent (Person). Person records contain a record pointer field (code 79) which points to eventlets belonging to that person.
Check for duplicate records (on multiple fields)
SELECT firstnames, surname,COUNT(*) AS Cnt
FROM temp
GROUPBY SURNAME, FIRSTNAMES
HAVING Cnt >1
Average Active Users
This procedure can be modified for other metrics. Run the following scripts:
get_metrics_for_all.sql
rm metrics.csv
for db in `echo “select schema_name from information_schema.schemata where schema_name like ‘hdb_%’ “| mysql -u root -psmith18`; do mysql –skip-column-names -u root -psmith18 $db < update_h3_mysql_structure.sql >> metrics.csv; done
get_metrics_for_db.sql
SELECT database(),’,’, ‘rec’ as typ,’,’,year(rec_Modified) as yr,’,’,month(rec_Modified)as mth,’,’,count(distinct day(rec_Modified),rec_OwnerUGrpID) as cnt from Records group by year(rec_Modified),month(rec_Modified) order by year(rec_Modified),month(rec_Modified),day(rec_Modified),rec_OwnerUGrpID;
Note: the extra commas are b/c MySQL exports with tabs and imports with commas …
Empty and then load metrics into Heurist_METRICS.newrec_person_month_counts
id int(11) NOT NULL auto_increment, db varchar(100) collate utf8_unicode_ci NOT NULL, typ text collate utf8_unicode_ci NOT NULL, yr year(4) NOT NULL, mth tinyint(2) NOT NULL, cnt mediumint(9) NOT NULL COMMENT ‘Number of distinct users/days of record or bookmark mods per month’, PRIMARY KEY (id)
To get metrics result:
SELECT yr,mth,sum(cnt) FROM newrec_person_month_counts group by yr,mth order by yr,mth
Alternative direct insertion approach (but current script won’t pass database as parameter, so it tries to open Records in the metrics database:
insert into Heurist_METRICS.newrec_person_month_counts (db,typ, yr, mth, cnt) ( SELECT database(), ‘rec’ as typ,year(rec_Modified) as yr,month(rec_Modified)as mth,count(distinct day(rec_Modified),rec_OwnerUGrpID) as cnt from ????.Records
group by year(rec_Modified),month(rec_Modified)
order by year(rec_Modified),month(rec_Modified),day(rec_Modified),rec_OwnerUGrpID)
Alternative person-month metrics based on bookmarks:
SELECT ‘bkm’ as typ,year(bkm_Modified) as year, month(bkm_Modified) as month, count(distinct day(bkm_Modified),bkm_UGrpID)
from usrBookmarks
group by year(bkm_Modified),month(bkm_Modified)
order by year(bkm_Modified),month(bkm_Modified),day(bkm_Modified),bkm_UGrpID
Select the detail types (fields) of the given record type
SELECT * FROM defRecStructure rst INNER JOIN defDetailTypes dty ON rst.rst_DetailTypeID=dty.dty_ID WHERE rst.rst_RectypeID=1;
Map layer data source
dty_Type = resource
dty_PtrTargetRectypeIDs = 11, 1014, 1017, 1018, 1021
Use the record name
$rectype
use rst_DisplayName
dty_Type=“resource”
Count occurrence
SELECT COUNT(rl.rl_ID) FROM recLinks rl WHERE rl_DetailTypeID=1018;
DATABASE SUMMARY QUERIES
Select all resource detail (field) types for the given record type
SELECT * FROM defRecStructure rst INNER JOIN defDetailTypes dty ON rst.rst_DetailTypeID=dty.dty_ID WHERE rst.rst_RectypeID=1020 AND dty.dty_Type LIKE “resource”;
Use:
rst.rst_DetailTypeID as id
rst.rst_DisplayName as name
dty.dty_PtrTargetRectypeIDs as ids
Retrieve count:
SELECT COUNT(rd.dtl_ID) as count FROM recDetails rd WHERE rd.dtl_DetailTypeID=1083;
Use:
COUNT(rd.dtl_ID) as count
Combine queries:
SELECT * FROM defRecStructure rst INNER JOIN defDetailTypes dty ON rst.rst_DetailTypeID=dty.dty_ID INNER JOIN recDetails rd ON rd.dtl_DetailTypeID=rst.rst_DetailTypeID WHERE rst.rst_RectypeID=1020 AND dty.dty_Type LIKE “resource” GROUP BY rst.rst_DetailTypeID;
Correct fields:
SELECT rst_DetailTypeID as id, rst_DisplayName as name, COUNT(rd.dtl_ID) as count, dty.dty_PtrTargetRectypeIDs as ids FROM defRecStructure rst INNER JOIN defDetailTypes dty ON rst.rst_DetailTypeID=dty.dty_ID INNER JOIN recDetails rd ON rd.dtl_DetailTypeID=rst.rst_DetailTypeID WHERE rst.rst_RectypeID=1020 AND dty.dty_Type LIKE “resource” GROUP BY rst.rst_DetailTypeID;
Retrieve the pointed records
Count the occurrence:
SELECT COUNT(r2.rec_ID) as count, rl.rl_DetailTypeID, r1.rec_Title, r1.rec_RecTypeID, r2.rec_Title, r2.rec_RecTypeID FROM recLinks rl INNER JOIN Records r1 ON r1.rec_ID=rl.rl_SourceID INNER JOIN Records r2 ON r2.rec_ID=rl.rl_TargetID WHERE r1.rec_RecTypeID=5 AND r2.rec_RecTypeID=10;
Use:
COUNT(rd.dtl_ID) as count
Dictionary of Sydney
Update to split up entities into multiple entity types
Insert the next available ID (30) into defRectyes, copying the current rectype (in this case 25):
INSERT INTO defRecTypes (SELECT 30,”Entity30″, rty_OrderInGroup, rty_Description, rty_TitleMask, rty_CanonicalTitleMask, rty_Plural, rty_Status, rty_OriginatingDBID, rty_NameInOriginatingDB, rty_IDInOriginatingDB, rty_NonOwnerVisibility, rty_ShowInLists, rty_RecTypeGroupID, rty_RecTypeModelIDs, rty_FlagAsFieldset, rty_ReferenceURL, rty_AlternativeRecEditor, rty_Type, rty_ShowURLOnEditForm, rty_ShowDescriptionOnEditForm, rty_Modified, rty_LocallyModified
FROM `hdb_DoS3_redesign`.`defRecTypes` where rty_ID=25);
INSERT INTO hdb_DoS3_redesign.defRecStructure (rst_RecTypeID,rst_DetailTypeID, rst_DisplayName, rst_DisplayHelpText, rst_DisplayExtendedDescription, rst_DisplayOrder, rst_DisplayWidth, rst_DefaultValue, rst_RecordMatchOrder, rst_CalcFunctionID, rst_RequirementType, rst_NonOwnerVisibility, rst_Status, rst_MayModify, rst_OriginatingDBID, rst_IDInOriginatingDB, rst_MaxValues, rst_MinValues, rst_DisplayDetailTypeGroupID, rst_FilteredJsonTermIDTree, rst_PtrFilteredIDs, rst_OrderForThumbnailGeneration, rst_TermIDTreeNonSelectableIDs, rst_Modified, rst_LocallyModified)
SELECT 30, rst_DetailTypeID, rst_DisplayName, rst_DisplayHelpText, rst_DisplayExtendedDescription, rst_DisplayOrder, rst_DisplayWidth, rst_DefaultValue, rst_RecordMatchOrder, rst_CalcFunctionID, rst_RequirementType, rst_NonOwnerVisibility, rst_Status, rst_MayModify, rst_OriginatingDBID, rst_IDInOriginatingDB, rst_MaxValues, rst_MinValues, rst_DisplayDetailTypeGroupID, rst_FilteredJsonTermIDTree, rst_PtrFilteredIDs, rst_OrderForThumbnailGeneration, rst_TermIDTreeNonSelectableIDs, rst_Modified, rst_LocallyModified
from hdb_DoS3_redesign.defRecStructure where rst_RecTypeID=25;
detail 75 is Entity type, codes are as below:
use hdb_DoS3_redesign;
update Records, recDetails set rec_RecTypeID=30
where (Records.rec_ID=recDetails.dtl_RecID)
And (rec_RecTypeID=25) and (dtl_DetailTypeID=75) and (dtl_Value=3291);
artefact=3291 –> rectype 30
building 3294 –> 31
event 3296 –> 32
nat ftr 3298 –> 33
org 3300 –> 34
pers 3301 –> 35
place 3302 –> 36
struc 3305 –> 37
Change all entities of a particular type into the appropriate record type
use hdb_DoS3_redesign;
update Records, recDetails set rec_RecTypeID=30
where (Records.rec_ID=recDetails.dtl_RecID)
And (rec_RecTypeID=25) and (dtl_DetailTypeID=75) and (dtl_Value=3291);
Rectype 25 is Entity (undifferentiated), Detail type 75 = entity type, Dtl_value 3291 is Artefact