AUTOMATED FEATURE ENGINEERING OF HIERARCHICAL ENSEMBLE CONNECTOMES / INGÉNIERIE AUTOMATISÉE DES CARACTÉRISTIQUES DE CONNECTOMES D'ENSEMBLE HIÉRARCHIQUE

  • Publication Date:
    Apr 01, 2019
  • Language:
    English
  • Additional Information
    • Patent Number:
      2019191784
    • Filing Authority:
      World Intellectual Property Organization (WIPO)
    • Appl. No:
      PCT/US2019/025260
    • Application Filed:
      Apr 01, 2019
    • Kind Code:
      A1
    • Abstract:
      Existing methods for analyzing person-specific 'connectomes' are not computationally equipped for scalable, flexible, and integrated processing across multiple network resolutions and drawing from disparate data modalities, a major roadblock to utilizing ensembles and hierarchies of connectomes to solve person-specific machine-learning problems. The processes implemented in software described herein consist of an end-to-end pipeline for deploying ensembles and hierarchies of network-generating workflows that can utilize multimodal, person-specific data to sample networks, extracted from that data, across a grid of network-defining hyperparameters. In essence, this pipeline enables users to perform ensemble sampling of connectomes for given individual(s) based on any input phenotypic datatype, constructed from any data modality or hierarchy of modalities at any scale, and based on any set of network-defining hyperparameters.
      Selon la présente invention, les procédés existants d'analyse des « connectomes » spécifiques à une personne ne sont pas équipés en termes de capacité de calcul pour mettre en œuvre un traitement évolutif, flexible et intégré sur de multiples résolutions de réseau et pour mobiliser des modalités de données disparates - un frein majeur à l'utilisation d'ensembles et de hiérarchies de connectomes pour résoudre des problèmes d'apprentissage automatique spécifiques à une personne. Les processus mis en œuvre dans un logiciel selon la présente invention se composent d'un pipeline de bout en bout permettant de déployer des ensembles et des hiérarchies de flux de travaux de génération de réseau qui peuvent utiliser des données multimodales, spécifiques à une personne, pour échantillonner des réseaux, extraits de ces données, sur une grille d'hyperparamètres définissant un réseau. Essentiellement, ce pipeline permet à des utilisateurs d'effectuer un échantillonnage d'ensemble de connectomes pour un ou plusieurs individus donnés sur la base de n'importe quel type de données phénotypiques d'entrée, construit à partir de n'importe quelle modalité de données ou de n'importe quelle hiérarchie de modalités à n'importe quelle échelle, et sur la base de n'importe quel ensemble d'hyperparamètres définissant un réseau.
    • Inventors:
      PISNER, Derek (US)
    • Applicants:
      PISNER, Derek (US)
    • Claim:
      CLAIMS:
    • Claim:
      What is claimed is:
    • Claim:
      1. A connectome generating process, comprising:
    • Claim:
      a. An automated method of transforming person-related data into at least one ensemble of connectomes, in the form of networks defined by nodes and edges, recognized among a plurality of possible networks that can be generated based on at least one combination of hyperparameters used in the connectome-generating process;
    • Claim:
      i. Wherein connectome ensemble(s) are sampled at one or more hyperparameter resolutions, comprising a set of either: modality-specific networks that collectively capture one or more target network(s) or optimize some derivative graph metric(s), a hierarchy (or hierarchies) of modality-specific networks that collectively capture one or multiple target network(s) of interest or optimize some derivative graph metric(s), or a hierarchy (or hierarchies) of networks across multiple data modalities that collectively capture one or multiple target networks of interest or optimize some derivative graph metric(s);
    • Claim:
      ii. Wherein connectome-generating hyperparameters comprise data modality-specific hyperparameters and/or modality-unspecific hyperparameters which impact the estimation of the networks produced by the connectome-generating process;
    • Claim:
      iii. Wherein modality-unspecific hyperparameters include but are not limited to connectivity model type used to define network edges, along with network thresholding approach;
    • Claim:
      1. With respect to the connectivity model hyperparameter, any combination of a variety of connectivity models can be used, including covariance, correlation, and/or a modality-specific definition (i.e. making the hyperparameter modality-specific);
    • Claim:
      2. With respect to the thresholding approach, multiple sub-parameters may be specified including types and schemes, where types include both global and local forms of thresholding, and schemes include any method(s) used to achieve the target thresholding type(s);
    • Claim:
      iv. Wherein modality-specific hyperparameters are those hyperparameters which are unique to specific data modalities for which sub-workflows (or interfaces to connectome-generating sub-workflows from third-party packages) are available;
    • Claim:
      v. Wherein a hard-coded hyperparameter filter, preceding modality-specific sub-workflow execution, consists of preprogrammed logic for parsing and evaluating the compatibility of user-specified hyperparameters; and
    • Claim:
      vi. Wherein input data may be manually specified by the user (i.e. via explicit file path(s) stated as a sequence of alphanumeric strings) or semi-automatically specified (i.e. via explicit path(s), stated as a sequence of alphanumeric strings, to a base directory of a dataset or datasets structured according to a supported standardized data specification protocol).
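The grid sampling and compatibility filter described in Claim 1 can be illustrated with a minimal Python sketch. Everything named here is a hypothetical stand-in rather than the applicant's implementation: the hyperparameter names (`modality`, `conn_model`, `threshold`), their values, and the compatibility table are assumptions chosen only to show how a grid could be expanded and then filtered before any sub-workflow runs.

```python
from itertools import product

# Hypothetical hyperparameter grid; names and values are illustrative only.
GRID = {
    "modality": ["func", "dwi"],
    "conn_model": ["correlation", "covariance", "streamline_count"],
    "threshold": [0.1, 0.2],
}

# Assumed compatibility rules: the connectivity models each modality supports.
COMPATIBLE_MODELS = {
    "func": {"correlation", "covariance"},
    "dwi": {"streamline_count"},
}


def expand_grid(grid):
    """Return every combination of hyperparameter values as a dict."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def is_compatible(combo):
    """Filter applied before any modality-specific sub-workflow would run."""
    return combo["conn_model"] in COMPATIBLE_MODELS[combo["modality"]]


all_combos = expand_grid(GRID)
valid_combos = [combo for combo in all_combos if is_compatible(combo)]
print(len(all_combos), "combinations requested,", len(valid_combos), "compatible")
```

Each entry of `valid_combos` would correspond to one network in the sampled ensemble; incompatible combinations are dropped up front instead of failing partway through a workflow.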
    • Claim:
      2. The connectome generating process in Claim (1) wherein multiple layers of computational parallelism facilitate the generation of connectome ensembles using distributed multiprocessing (i.e. as opposed to serial execution); these parallel layers may include but are not limited to:
    • Claim:
      a. Parallelization across multiple individuals;
    • Claim:
      b. Parallelization of nested sub-workflows specific to each data modality; and
    • Claim:
      c. Parallelization of modular functions within each nested sub-workflow specific to each data modality.
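The three nested layers of parallelism in Claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: it uses thread pools from Python's standard library as a stand-in for distributed multiprocessing, and the function names (`build_network`, `run_modality`, `run_subject`, `run_all`) and the string returned in place of a real network are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def build_network(subject, modality, model):
    """Placeholder for one modular connectome-generating function."""
    return f"{subject}:{modality}:{model}"


def run_modality(subject, modality, models):
    # Layer (c): modular functions within one modality-specific sub-workflow.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return list(pool.map(lambda m: build_network(subject, modality, m), models))


def run_subject(subject, plan):
    # Layer (b): nested sub-workflows, one per data modality.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [
            pool.submit(run_modality, subject, modality, models)
            for modality, models in plan.items()
        ]
        return [network for future in futures for network in future.result()]


def run_all(subjects, plan):
    # Layer (a): parallelization across individuals.
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = pool.map(lambda s: run_subject(s, plan), subjects)
        return dict(zip(subjects, results))


plan = {"func": ["correlation", "covariance"], "dwi": ["streamline_count"]}
ensembles = run_all(["sub-01", "sub-02"], plan)
print(ensembles["sub-01"])
```

Each layer owns its own pool, so an outer task waiting on an inner pool cannot starve it of workers; a process-based or cluster-based executor could be substituted where true multiprocessing is needed.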
    • Claim:
      3. The connectome generating process in Claim (1) wherein, when the degree of user-requested or possible parallelism described in Claim (2) exceeds the available compute resources required to ensure efficient scheduling of processes, a network flow-optimized job scheduling utility for assigning priority to parallelized connectome-generating processes may be used. Network flow optimization is performed for the sake of optimizing runtime execution according to any of several optimization objectives, including but not limited to the following:
    • Claim:
      a. Minimizing computational load;
    • Claim:
      b. Minimizing compute dollar spending; and
    • Claim:
      c. Minimizing runtime duration.
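The idea of assigning priority to queued jobs under a chosen objective, as in Claim 3, can be illustrated with a deliberately simplified sketch. Note that this is not network flow optimization: it swaps in a plain priority queue that orders jobs by a single estimated quantity. The job records, their fields (`load`, `cost`, `runtime`), and all values are hypothetical.

```python
import heapq

# Hypothetical job descriptions: estimated CPU load, dollar cost, and runtime (s).
JOBS = [
    {"name": "sub-01_func", "load": 4, "cost": 0.30, "runtime": 120},
    {"name": "sub-01_dwi", "load": 8, "cost": 0.10, "runtime": 300},
    {"name": "sub-02_func", "load": 2, "cost": 0.20, "runtime": 60},
]


def prioritize(jobs, objective):
    """Order job names so the job with the smallest `objective` value runs first."""
    # The index breaks ties deterministically and keeps tuples comparable.
    heap = [(job[objective], index, job["name"]) for index, job in enumerate(jobs)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


for objective in ("load", "cost", "runtime"):
    print(objective, prioritize(JOBS, objective))
```

A scheduler of the kind the claim describes would additionally model resource capacities and job dependencies; the sketch only shows how the choice of objective changes the resulting priority order.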
    • Claim:
      4. A four-stage process for converting person-related data into network ensemble(s) comprised of multiple steps:
    • Claim:
      a. Initializing at least one meta-workflow configured to trigger one or more nested sub-workflows determined by hard-coded logic to be applicable according to the unique combination of input data and hyperparameters as defined in Claim (1);
    • Claim:
      i. Whereby any of a variety of inputs (boolean, string, and numeric) are specified by the user as command-line options or through a Graphical User Interface (GUI) on a cloud or dedicated server, and with a local, remote, or containerized environment; and
    • Claim:
      ii. Whereby the initialized individual or multi-individual workflow itself may or may not form a Directed Acyclic execution Graph (DAG) that pre-defines the phases of workflow execution and iterable expansion of all specified files and hyperparameters as defined according to Claim (1).
    • Claim:
      b. Executing nested sub-workflows as indicated according to the logic outlined in Claim (1a, v-vi);
    • Claim:
      i. Whereby each sub-workflow receives dependent variable inputs from the processes described in Claim (4a) (i.e. both saved in memory cache and in the DAG);
    • Claim:
      ii. Whereby modality-specific sub-workflows are triggered if and only if the minimally required set of data and hyperparameter inputs (including defaults) are supplied by the user; and
    • Claim:
      iii. Whereby a modality-unspecific sub-workflow may be triggered in the case that a raw graph or raw graphs (i.e. pre-constructed connectomes) are provided as inputs by the user during the processes described in Claim (4a), thus bypassing the connectome-generating process described in Claim (1);
    • Claim:
      c. Applying graph analysis to the ensemble(s) of connectomes generated from the processes described in Claim (4a-b) such that:
    • Claim:
      i. Graph analysis commences after all valid connectomes from the processes described in Claim (4a-b) have been generated;
    • Claim:
      ii. Graph analysis repeats iteratively (or may be parallelized to accommodate for this iteration in accordance with Claims (2) and (3)) for each of the connectomes generated from the processes described in Claim (4a-b);
    • Claim:
      iii. Whereby graph analysis may consist of any of a variety of conventional or custom global and/or local graph analysis algorithms, as listed in a provided configuration file that can be modified as needed by the user;
    • Claim:
      iv. Whereby those graph analysis algorithms that have been specified but which are inapplicable to the network input(s) are hard-coded to be skipped;
    • Claim:
      v. Whereby resulting graph analysis measures are organized into a 1-dimensional vector of scalar values (per unique network input into the processes described in Claim (4c)), which is subsequently written to disk as an intermediary text file to be used in the processes described in Claim (4d).
    • Claim:
      d. Aggregating the graph-analytic vectors produced from the processes described in Claim (4c) for each of the networks produced from the processes described in Claim (4a-b) into a dataframe (or dataframes in the case that multiple individual workflows are executed or multiple independent target networks are specified) of features for subsequent use in machine-learning or other analytic scenarios, wherein the networks produced by the processes described in Claim (4a-b) are:
    • Claim:
      i. Aggregated into 1-dimensional vector(s) of edge weights for each node-pair;
    • Claim:
      ii. Used to calculate basic measures of central tendency (e.g. mean, median, mode), or to select sub-samples of the network ensemble(s) based on some feature-selection criterion specified by the user during the processes described in Claim (4a) including but not limited to:
    • Claim:
      1. Bayesian Markov-Chain probabilities assigned to each network;
    • Claim:
      2. Discriminability, identifiability, or other reproducibility/reliability optimization function;
    • Claim:
      3. Minimum-variance and/or maximum collinearity threshold;
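Stages (c) and (d) of Claim 4 can be sketched in a few lines of Python: each network is reduced to a 1-dimensional vector of graph measures, inapplicable measures are skipped, and the vectors are aggregated into a feature table. This is a minimal sketch under assumptions. The two measures, the applicability checks, the example adjacency matrices, and the use of a list of row dictionaries in place of a dataframe are all illustrative choices, and the intermediary text file is omitted.

```python
def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    n_edges = sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j] != 0)
    return n_edges / (n * (n - 1) / 2)


def mean_edge_weight(adj):
    """Average weight over the edges that are present."""
    n = len(adj)
    weights = [adj[i][j] for i in range(n) for j in range(i + 1, n) if adj[i][j] != 0]
    return sum(weights) / len(weights)


def has_edges(adj):
    return any(value != 0 for row in adj for value in row)


# (measure name, function, applicability check); a stand-in for a config file.
METRICS = [
    ("density", density, lambda adj: len(adj) > 1),
    ("mean_edge_weight", mean_edge_weight, has_edges),
]


def analyze(adj):
    """Compute only the measures that apply to this network."""
    return {name: fn(adj) for name, fn, applies in METRICS if applies(adj)}


def aggregate(ensemble):
    """Build one feature row per network; skipped measures become None."""
    columns = [name for name, _, _ in METRICS]
    rows = []
    for label, adj in ensemble.items():
        vector = analyze(adj)
        rows.append({"network": label, **{c: vector.get(c) for c in columns}})
    return rows


# Hypothetical ensemble: the same data thresholded at two levels.
ENSEMBLE = {
    "corr_thr0.2": [[0, 0.5, 0.0], [0.5, 0, 0.3], [0.0, 0.3, 0]],
    "corr_thr0.6": [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
}
features = aggregate(ENSEMBLE)
for row in features:
    print(row)
```

The second network has no surviving edges, so `mean_edge_weight` is skipped for it rather than raising a division error, mirroring the skip behavior described in Claim (4c, iv).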
    • Claim:
      5. The four-stage process in Claim (4) wherein, when multiple modality-specific (or modality-unspecific) sub-workflows are executed, hierarchical ‘multigraph’ analysis will be triggered for the processes described in Claim (4c), unless otherwise specified by the user through an override indicated by the user during the processes described in Claim (4a);
    • Claim:
      a. Whereby hierarchical multigraph analysis consists of the simultaneous consideration of modality-specific and/or modality-unspecific networks or ensembles of networks as distinct network objects to be analyzed;
    • Claim:
      b. Whereby additional analytic approaches and hyperparameters (i.e. beyond those available for non-hierarchical graphs) may be implemented including but not limited to:
    • Claim:
      i. Adaptive thresholding of networks in the hierarchy or hierarchies during the processes described in Claim (4a-c) to optimize similarity of graph properties across the hierarchy or hierarchies; and
    • Claim:
      ii. The inclusion of additional derivative graph analytic calculations (e.g. versatility) that are only applicable for the analysis of hierarchical multigraphs.
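One possible reading of the adaptive thresholding in Claim (5b, i) is sketched below: every layer of a multimodal hierarchy is thresholded to the density of its sparsest layer, so that one graph property (edge density) becomes comparable across layers. The choice of density as the property to match, the use of the sparsest layer as the target, the layer names, and the example matrices are all assumptions made for illustration.

```python
def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    n_edges = sum(1 for i in range(n) for j in range(i + 1, n) if adj[i][j] != 0)
    return n_edges / (n * (n - 1) / 2)


def threshold_to_density(adj, target):
    """Keep only the strongest edges so the result has roughly `target` density."""
    n = len(adj)
    n_keep = int(target * n * (n - 1) / 2 + 0.5)  # round to nearest edge count
    ranked = sorted(
        ((abs(adj[i][j]), i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j] != 0),
        reverse=True,
    )
    out = [[0.0] * n for _ in range(n)]
    for _, i, j in ranked[:n_keep]:
        out[i][j] = adj[i][j]
        out[j][i] = adj[j][i]
    return out


def match_densities(layers):
    """Threshold every layer to the density of the sparsest layer."""
    target = min(density(adj) for adj in layers.values())
    return {name: threshold_to_density(adj, target) for name, adj in layers.items()}


# Hypothetical two-layer hierarchy over the same four nodes.
LAYERS = {
    "func": [
        [0, 0.9, 0.2, 0.4],
        [0.9, 0, 0.7, 0.1],
        [0.2, 0.7, 0, 0.6],
        [0.4, 0.1, 0.6, 0],
    ],
    "dwi": [
        [0, 12, 0, 8],
        [12, 0, 5, 0],
        [0, 5, 0, 0],
        [8, 0, 0, 0],
    ],
}
matched = match_densities(LAYERS)
print({name: density(adj) for name, adj in matched.items()})
```

Ranking edges within each layer, rather than applying one absolute cutoff, lets layers with very different weight scales (correlations versus streamline counts in this example) end up with the same number of edges.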
    • IPCR Classification Code:
      G09G 5/02 20060101AFI BHUS; G06T 11/20 20060101ALI BHUS
    • Rights:
      User is aware and acknowledges that Lighthouse IP shall retain all right, title and interest in and to this record and its structure under relevant and applicable copyright laws. User receives no ownership or any other rights to this record and its structure. User is aware and confirms accepting the terms and conditions of use as defined in the relevant user agreement either with Lighthouse IP or with its partner(s).
    • Date Entry:
      20191003
    • Accession Number:
      WO2019191784A1