• Mainframe files contain structured or unstructured data. Copybooks are the data dictionaries that describe the schema of these data files.
  • Copybooks are maintained manually. Over time they drift out of sync with the actual data structure, making the data files incomprehensible.
  • Build a feature set from the data in the data files, capturing their hierarchical, spatial, and other relationships.
  • Use an unsupervised clustering model to group the data into clusters based on the feature set generated above.
  • Analyse the clusters together with current knowledge of the data schema from existing copybook information and subject matter expertise, to extract the missing or lost knowledge.
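
The feature-building and clustering steps above could be sketched roughly as follows. This is a minimal illustration under several assumptions, not the actual pipeline: records are assumed to be fixed-width and already decoded to text, the only features are per-byte-offset character-class shares (one simple "spatial" feature), and a small hand-written k-means stands in for whichever clustering model is actually used. The function names (`column_features`, `kmeans`, `candidate_fields`) and the sample records are hypothetical.

```python
def column_features(records):
    """One feature vector per byte offset: share of digits, letters, spaces."""
    n = len(records)
    width = len(records[0])  # assumption: all records have the same length
    feats = []
    for pos in range(width):
        col = [rec[pos] for rec in records]
        feats.append((
            sum(ch.isdigit() for ch in col) / n,
            sum(ch.isalpha() for ch in col) / n,
            sum(ch == " " for ch in col) / n,
        ))
    return feats


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from all centres chosen so far.
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest centre.
        labels = [
            min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in points
        ]
        # Move each centre to the mean of its members (skip empty clusters).
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


def candidate_fields(labels):
    """Turn runs of identical cluster labels into (start, end) offset spans."""
    spans, start = [], 0
    for pos in range(1, len(labels) + 1):
        if pos == len(labels) or labels[pos] != labels[start]:
            spans.append((start, pos))
            start = pos
    return spans


# Hypothetical sample: 4-digit id, 4-letter name, 2 filler spaces.
records = ["0012JOHN  ", "0456MARY  ", "7890ALEX  "]
labels = kmeans(column_features(records), k=3)
spans = candidate_fields(labels)
print(spans)  # [(0, 4), (4, 8), (8, 10)]
```

The resulting spans are only candidate field boundaries. They would still need to be compared against the existing copybook and reviewed by a subject matter expert, as the last step above describes.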