Course: Graph Mining: An Introduction

Section overview

General

- Misc Announcements Forum
  
  Course: This course relates to Big Data, Data Science, and Graph Mining. It can be thought of as a 4th-year course or an MSc-level course in Computer Science, Data Science, or related departments. The course is under development. We had an online session to introduce the course on Nov 1st, 2019.
- Prerequisite Page
  
  As long as you are interested, you can enroll in the course. Please open this page for details.
- Job Prospect for Graph Mining Page
  
  Open this page and check the details for sample jobs.
- Introducing Graph Mining Course File
Course Topics
- Course Topics Page
  
  Graph Theory Introduction, Graph Mining Introduction, Network Properties, Random Graphs, Small World Graphs, Node Importance, Node Similarity, Clustering & Community Detection, Link Prediction, Anomaly detection, Time Evolving Graphs, Influence/Virus Propagation, Graph Mining Use Cases, Big Data Graph Databases, Big Data Graph Processing, GraphX, Neo4j, Feature Based Classification, Graph Classification, Graph Kernels, Label Propagation
- Resources: Public URLs for Some of the Included Topics Page
Contact and Teacher/Instructor/Professor/Faculty Information
Chat with Classmates
- Every day at 20:00 EST (Toronto), 6 AM (Dhaka): Chat with each other to discuss the course topics
  
  Chat with each other to discuss the course topics. Every day at 22:00 EST. Discuss only relevant topics. Chat messages, including past ones, are visible to others.
- Every day at 10:00 AM EST (Toronto), 8 PM (Bangladesh): Chat with Others
  
  Focus on the course topics and related matters; be respectful and collaborative.
Discussion Forum
- Discussion Forum: Discuss Course Topics By Modules
Learning Outcomes and Assessments
- Learning Outcomes Page
  
  Learning Outcomes: these cover all three domains of learning: Cognitive, Psychomotor, and Affective. Click the above link for details.
- Assessments when a full length course Page
  
  Assignments: Three (3): 3 * 15% = 45%, Project: 30%, Class Participation: 15% (Attendance and Participation: 7%; Quizzes and Reflection: 8%), Final Review Exam: 10% (subject to revision over time)
- Assessment for a Certificate course: Short Term: An Introductory Course Page
  
  Might include: Primarily Quiz, Reflection for Each Topic, Small Code Test, Short Questions and Answers, Tests on understanding how to apply the concepts. Details will be provided at assessment times.
  To earn a certificate, participants/students must demonstrate their knowledge and skills; multiple attempts will be allowed. We will make sure that anyone we issue a certificate to is qualified to receive it, and you will be given multiple opportunities to prove yourself, though the question set will be different each time.
Introducing Graphs
- Resources to Learn From Page
- Read the resources above on Introducing Graphs, then learn by finding answers to the questions in this list. Page
- Quiz: Pre-Test (i.e. start of the class)
  
  Future work.
- Quiz: Post-Test (end of the class)
  
  Future work.
- Every Class Assignment: Reflection on today's class Assignment
  
  Your learning and your relation to today's topics: Write down in a document what you learned from today's class. What were the most interesting/useful/important topics today (in your judgement)? What psychomotor skill did you achieve today? How will you relate today's topics to the real world, i.e. industry and research? What ethical and legal considerations apply to today's topics? How can you relate your past experience to these topics, and where can you or do you want to take them? Any other relevant thoughts? What more could be related to these topics in the class?
Random Graphs
- Resources to Learn From Page
- Read the resources above and find answers on Small World Graphs and Random Graph Generators. Learn by finding answers to the following questions. Page
HITS Pagerank HUBS Anchors and Graph Theories
- Resources to Learn From Page
- Read the resources above and learn by finding answers to the following questions. Can you answer them? Page
Node Importance
- Resources to Learn From Page
- Read the resources above to find answers. Betweenness Based Clustering: Learn by finding answers to the following questions. Can you answer the following? Page
Community and Cluster Detection
- Resources to Learn From Page
- Read the resources above to find answers. Community Detection: Learn by finding answers to the following questions. Can you answer the following questions on Community Detection? Page
Shared Nearest Neighbors - Community Detection
- Resources to Learn From Page
- Read the resources above to find answers. Shared Nearest Neighbors : Clustering : Community Detection Page
Spanning Tree and Community Detection
- Resources to Learn From Page
- Read the resources above to find answers. K-Spanning Trees: Learn by finding answers to the following questions. Can you answer the questions? Page
Louvain Modularity and Community Detection
- Resources to Learn From Page
- Read the resources above to find answers. Louvain Modularity: Learn by finding answers to the following questions. Can you answer the following questions? Page
Highly Connected Subgraphs
- Resources to Learn From Page
- Read the resources above to find answers. Highly Connected Subgraph Clustering: Learn by finding answers to the following questions. Page
Link Prediction
- Resources to Learn From Page
- Read the resources above to find answers. Link Prediction. Learn by finding answers to the following questions. Page
Time Evolving Graphs
- To Learn From: Time Evolving Graphs URL
  
  https://link.springer.com/article/10.1007/s41019-019-00105-0
- Learn by finding the answers to the following questions from the given/external resources. Page
Graph Mining Use Cases
- To Learn From: Graphs are everywhere URL
  
  Graphs are everywhere. https://pdfs.semanticscholar.org/144b/130323cb94d618c2c5e66982a56f31d36396.pdf
Anomaly detection
- Learn From: Anomaly Detection and Graph Mining URL
  
  https://hpi.de/fileadmin/user_upload/fachgebiete/mueller/courses/graphmining/GraphMining-13a-AnomalyDetection.pdf
- Graph based Anomaly Detection and Description: A Survey URL
  
  Graph based Anomaly Detection and Description: A Survey
Influence/Virus/Label Propagation
- Resources to learn from Page
- A presentation on Influence/Virus Propagation URL
Big Data Graph Databases
Big Data Graph Processing
Social Network Analysis
- Learn From: Social Network Analysis URL
  
  Some earlier topics relate to this topic. https://hpi.de/fileadmin/user_upload/fachgebiete/mueller/courses/graphmining/GraphMining-02-Social-Network-Analysis.pdf
Querying Graphs: Isomorphic Graphs
- Learn From: Querying Graphs: Isomorphic Graphs URL
  
  https://hpi.de/fileadmin/user_upload/fachgebiete/mueller/courses/graphmining/GraphMIning-03-Querying-Graphs.pdf
Mining graph patterns
- Learn From: Mining graph patterns URL
  
  https://hpi.de/fileadmin/user_upload/fachgebiete/mueller/courses/graphmining/GraphMining-04-FrequentSubgraph.pdf
Graph Mining and Classifications
Tools and Examples
- List 1: Top 30 Social Network Analysis and Visualization Tools URL
  
  Commetrix, Cuttlefish, Cytoscape, EgoNet, Gephi, Graph-tool, GraphChi, Graphviz
- List 2: Graph Analytics Tools URL
  
  igraph, NetworkX, graph-tool, Gephi, SNAP, JUNG, Mathematica, D3.js, Cytoscape
Probable Assignments
- Graph Datasets for Assignments and Projects; also for research Page
  
  You will see many datasets at the URL. You can apply your implementation to a dataset/graph of your choice, but make sure its properties (directed/undirected, weighted/unweighted, connected/not connected, and others) match the question.
- Python or R Libraries for Graph Mining Implementations Page
  
  https://networkx.github.io/documentation/latest/_downloads/networkx_reference.pdf
  https://networkx.github.io/documentation/stable/reference/introduction.html
- List 1: Implement the following algorithms and apply on large graphs (in Plain Python or R without Graph Libraries) Page
  
  Click the above link for details: BFS, DFS, Single Source Shortest Path, All Pairs Shortest Path, Karger's Algorithm, Min-Cut Algorithm, Clique Algorithms, Dijkstra's Shortest Path implementation (for shortest paths or the longest paths)
  More will be added later. Any algorithm that you come across can be an assignment as well. You might study and implement the most important algorithms that are used in practice and are well known (or that solve important problems).
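  As a starting point for two of the algorithms named above, here is a minimal plain-Python sketch of BFS and Dijkstra's single-source shortest paths. The adjacency-dictionary layout, the function names, and the toy graphs are assumptions made for illustration; they are not taken from the assignment page.

```python
from collections import deque
import heapq


def bfs_order(adj, source):
    """Return the vertices reachable from `source` in BFS visiting order."""
    visited = {source}
    order = []
    queue = deque([source])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj.get(u, []):
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return order


def dijkstra(weighted_adj, source):
    """Shortest-path distances from `source`; edge weights must be non-negative."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in weighted_adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical toy graphs (directed), used only to exercise the functions.
toy_adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
toy_weighted = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 6)],
    "c": [("d", 1)],
    "d": [],
}

print(bfs_order(toy_adj, "a"))
print(dijkstra(toy_weighted, "a"))
```

  For large graphs the same functions apply unchanged once the edge list has been loaded into an adjacency dictionary.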
- List 2: Use Graph Algorithms provided by NetworkX (Python, R) libraries Page
  
  Assignments that demonstrate the ability to use the graph algorithms provided in the NetworkX library. For example, find the library methods for the algorithms mentioned in List 1 and apply them to the same datasets: do you get the same results?
- List 3: Spanning Tree, Connected Subgraphs Page
  
  In Short: Implement Spanning Tree, and Highly Connected Subgraphs
- List 4: Based on Graph Properties and Concepts Page
  
  Given a graph (use a dataset or a small graph first, then apply to large graphs), write Python or R code to identify the isolated nodes in the graph, count the bi-directional edges, identify the top 10 vertices by in-degree, identify the top 100 vertices by out-degree, count the number of cliques, and identify the number of disconnected subgraphs.
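  Several of these properties can be computed directly from an edge list. The sketch below is one possible approach in plain Python, assuming a directed graph given as a node list plus a list of `(source, target)` pairs with comparable node IDs and every edge endpoint present in the node list. The function name `graph_properties` and the toy data are hypothetical, and clique counting is left out.

```python
from collections import Counter


def graph_properties(nodes, edges, k=10):
    """Compute a few basic properties of a directed graph."""
    edge_set = set(edges)  # drop duplicate edges
    in_deg = Counter(v for _, v in edge_set)
    out_deg = Counter(u for u, _ in edge_set)

    # Isolated nodes have no incoming and no outgoing edges.
    isolated = sorted(n for n in nodes if in_deg[n] == 0 and out_deg[n] == 0)

    # Count each reciprocal pair (u -> v and v -> u) once.
    bidirectional = sum(1 for u, v in edge_set if u < v and (v, u) in edge_set)

    # Top-k vertices by degree; ties are broken by node ID.
    top_in = [(n, in_deg[n]) for n in sorted(nodes, key=lambda n: (-in_deg[n], n))[:k]]
    top_out = [(n, out_deg[n]) for n in sorted(nodes, key=lambda n: (-out_deg[n], n))[:k]]

    # Disconnected subgraphs: weakly connected components, ignoring direction.
    undirected = {n: set() for n in nodes}
    for u, v in edge_set:
        undirected[u].add(v)
        undirected[v].add(u)
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            x = stack.pop()
            for y in undirected[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)

    return {
        "isolated": isolated,
        "bidirectional_pairs": bidirectional,
        "top_in_degree": top_in,
        "top_out_degree": top_out,
        "components": components,
    }


# Hypothetical toy graph: node 6 is isolated, 1 <-> 2 is a reciprocal pair.
props = graph_properties([1, 2, 3, 4, 5, 6], [(1, 2), (2, 1), (3, 2), (4, 5)], k=2)
print(props)
```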
- List 5: Centrality-PageRank-Betweenness
  
  For a Graph/graph-dataset such as political blogs (see example dataset section), implement Vertex Betweenness, Edge Betweenness, Closeness Centrality.
  
  Implement Page ranking Algorithms such as HITS and Anchor/Hubs
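  To make the HITS part concrete, the following is a minimal sketch of the hub/authority iteration on a directed edge list. It assumes a fixed number of iterations and L2 normalization after each update; the function name `hits` and the toy edges are illustrative assumptions, not part of the assignment text.

```python
def hits(edges, iterations=50):
    """Return (hub, authority) score dictionaries for a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    def normalize(scores):
        # Scale so the squared scores sum to 1 (guarding against all zeros).
        norm = sum(x * x for x in scores.values()) ** 0.5 or 1.0
        return {n: x / norm for n, x in scores.items()}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking to the node.
        new_auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_auth[v] += hub[u]
        auth = normalize(new_auth)

        # Hub score: sum of authority scores of the pages the node links to.
        new_hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            new_hub[u] += auth[v]
        hub = normalize(new_hub)

    return hub, auth


# Hypothetical toy web graph: "a" and "b" link to "c"; "a" also links to "d".
toy_edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub_scores, auth_scores = hits(toy_edges)
print(max(hub_scores, key=hub_scores.get), max(auth_scores, key=auth_scores.get))
```

  On the toy graph, the page with the most outgoing links to well-cited pages should come out as the strongest hub, and the most linked-to page as the strongest authority.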
Potential Project Ideas
- List 1: Project Ideas with real life applications Page
  From: https://cs.stanford.edu/people/jure/talks/www08tutorial/ . You need to find datasets to implement these ideas. Check the above URLs for datasets (I have not checked them yet).
  
  Part 4: Case studies
  Communication patterns of MSN Messenger. The application of above mentioned tools and algorithms to a large network of communication on MSN Instant Messenger (30 billion conversations, 240 million people).
  Detecting fraud on eBay. How to find fraudulent people on eBay. We present a belief propagation method that is able to find fraudulent people in large networks.
  Monitoring social and communication networks over time -- intrusion and outlier detection. An application of tensor decomposition techniques to monitor multiple time series over time and detect outliers and abnormal events
  Web projections. Exploiting the structure of web graph to predict the quality of search results, user intention to reformulate queries and to find spam search results.
  Connection subgraphs and CenterPiece subgraphs.
- List 2: Check the Data Science Competitions to get Project Ideas Page
  
  Click above and Check the Data Science Competitions to get Project Ideas. For example, can you map current dengue spread into a graph, and predict the path for how the disease will spread?
- List 3: Take a Google Function Area in Graph Mining and Apply on Very Large Graphs and see what insight you can get Page
  
  You can also do research in any of these areas to improve the algorithms and their performance, or apply them to new problems and challenges.
  Google Job Areas: Large-Scale Balanced Partitioning (example: Google Maps Driving Directions); Large-Scale Clustering (clustering graphs at Google scale); Large-Scale Connected Components; Large-Scale Link Modeling (similarity ranking and centrality metrics, link prediction, and anomalous link discovery); Large-Scale Similarity Ranking (Personalized PageRank, Egonet similarity, Adamic Adar, and others); Public-private Graph Computation; Streaming and Dynamic Graph Algorithms; ASYMP: Async Message Passing Graph Mining; Large-Scale Centrality Ranking; Large-Scale Graph Building
Potential Research Topics
- List 1: Research on Google Job/Function areas in Graph Mining Page
  
  Take one of these Google job/function areas in Graph Mining and improve the algorithms' performance, or apply them to new problems and challenges.
  Google Job Areas: Large-Scale Balanced Partitioning (example: Google Maps Driving Directions); Large-Scale Clustering (clustering graphs at Google scale); Large-Scale Connected Components; Large-Scale Link Modeling (similarity ranking and centrality metrics, link prediction, and anomalous link discovery); Large-Scale Similarity Ranking (Personalized PageRank, Egonet similarity, Adamic Adar, and others); Public-private Graph Computation; Streaming and Dynamic Graph Algorithms; ASYMP: Async Message Passing Graph Mining; Large-Scale Centrality Ranking; Large-Scale Graph Building
Example Projects in Graph Mining
- My Implementation: Louvain Algorithm for Community Detection in Large Graph Networks URL
  
  https://github.com/sayedum/spark-implementation-louvian-modularity.git.
  This is a private repository. I may give selected participants access on request; I need to know what you will do with it. It uses PySpark and Spark GraphFrames on Hadoop platforms. There is also a non-Spark, non-parallel Python implementation. With some extensions, and by trying to answer the right question, this has the potential to become a research publication.
- GraphX Implementation of Louvain Modularity Algorithm for Community Detection URL
  
  https://github.com/Sotera/spark-distributed-louvain-modularity.
  Not my implementation. GraphX is older than GraphFrames.
- PageRank Algorithm Implementation URL
  
  https://www.geeksforgeeks.org/page-rank-algorithm-implementation/ . Not my implementation. I might share code blocks from my implementation.
- Centrality and Betweenness or similar concept implementation. URL
  
  https://www.geeksforgeeks.org/betweenness-centrality-centrality-measure/
  However, it is best to first try to implement it on your own. Rather than memorizing, try to build the ability to convert a textual concept or algorithm (mathematics) into code.
- Dijkstra’s shortest path algorithm URL
  
  https://www.geeksforgeeks.org/dijkstras-shortest-path-algorithm-greedy-algo-7/
- Floyd Warshall Algorithm | All pair Shortest Path URL
  
  https://www.geeksforgeeks.org/floyd-warshall-algorithm-dp-16/
- Karger's mincut algorithm in Python URL
  
  Details: http://goatleaps.xyz/programming/kargers-algorithm.html Code File: http://goatleaps.xyz/assets/code/kargers_mincut.py
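  Before reading the linked code, it may help to see the idea in a few lines. The sketch below is not the linked implementation; it is one common way to write Karger's randomized contraction, assuming an unweighted, connected, undirected graph given as an edge list. Contracting edges in a random order is simulated with a union-find structure, and the trial count, the seed, and the name `karger_min_cut` are assumptions chosen for illustration.

```python
import random


def karger_min_cut(edges, trials=200, seed=0):
    """Estimate the minimum cut size by repeated random contraction."""
    rng = random.Random(seed)
    edges = list(edges)
    nodes = {n for edge in edges for n in edge}
    best = len(edges)

    for _ in range(trials):
        parent = {n: n for n in nodes}

        def find(x):
            # Union-find lookup with path halving.
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        order = edges[:]
        rng.shuffle(order)  # a random edge order stands in for random contractions
        remaining = len(nodes)
        for u, v in order:
            if remaining == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:  # contract the edge: merge the two super-nodes
                parent[ru] = rv
                remaining -= 1

        # Edges whose endpoints ended up in different super-nodes form the cut.
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut)

    return best


# Hypothetical toy graph: two triangles joined by the single bridge (3, 4).
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
min_cut = karger_min_cut(toy_edges)
print(min_cut)
```

  A single trial only finds the minimum cut with some probability, which is why the sketch repeats the contraction many times and keeps the smallest cut seen.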