---
layout: page
title: Milestone 1
parent: Projects
nav_order: 1
---
Key dates:
- project proposals due: Sep 20th
- staff feedback: Sep 25th
For the first milestone, your team will propose a project that aligns with your personal, professional, and academic interests and passions.
Allowing you to propose your own project will enhance your engagement and lead to better learning outcomes. This approach will also foster your independence, critical thinking skills, and creativity, preparing you for real-world scenarios where you may be required to initiate and lead your own projects. Call on your inner data scientist and take charge of your project experience.
Platform for Team Formation:
You may use the Ed platform to find your teammates. Alternatively, you may form teams independently.
Team Registration:
Once you have finalized your team, please enter your team name and the names of team members in this shared spreadsheet.
Your Statement of Work should act as a blueprint for your project. It doesn't have to be extensive, but it should be clear and focused.
- Title: An engaging, relevant, and informative title that captures the essence of your project.
- Authors: Names of all team members and their respective email addresses.
Provide a brief background on the topic you have chosen. Explain why you find it interesting or important, and mention any previous background, research interests, or readings that have influenced your choice.
Clearly outline the problem or question your project aims to solve. Make sure the scope is well-defined so that there is no ambiguity regarding your project's objectives.
- Length: 1-2 pages
- Format: PDF
- Submit via Canvas
Data is the backbone of any data science project, and therefore of any MLOps project, so identifying appropriate datasets is crucial. In your Statement of Work, you must address the following aspects of your data:
- Identify where the data comes from (e.g., public repository, generated by the team, etc.).
- Offer a brief overview of what the dataset contains. Is it time-series data, images, textual data, etc.?
- Describe the variables or features that are most relevant to your problem.
- Explain how the data is suited to solving the problem or question you've posed. Why is this dataset useful or relevant?
- If applicable, indicate any potential challenges related to data quality that you foresee (e.g., missing data, inconsistencies, or the need to merge multiple datasets). Mention your preliminary plan to tackle these issues.
Statements of Work that do not include information on available and relevant data will not be accepted.
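To show what a preliminary data-quality check might look like in practice, here is a minimal Python sketch that audits a hypothetical image manifest. It counts missing values per field, flags duplicate file entries, and merges a second source on a shared key. The column names (`filename`, `species`, `location`, `date`, `photographer`) and the two in-memory CSV sources are illustrative assumptions. A real project would read its own files and choose its own checks.

```python
import csv
import io
from collections import Counter

# Hypothetical manifests; file names, columns, and values are illustrative only.
SOURCE_A = """filename,species,location,date
img001.jpg,Danaus plexippus,Boston,2023-06-01
img002.jpg,,Cambridge,2023-06-02
img003.jpg,Vanessa cardui,,2023-06-03
img001.jpg,Danaus plexippus,Boston,2023-06-01
"""

SOURCE_B = """filename,photographer
img001.jpg,alice
img003.jpg,bob
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_counts(rows):
    """Count empty or absent values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if not value or not value.strip():
                counts[column] += 1
    return dict(counts)


def duplicate_keys(rows, key):
    """Return the key values that appear more than once."""
    seen = Counter(row[key] for row in rows)
    return sorted(k for k, n in seen.items() if n > 1)


def left_merge(left, right, key):
    """Attach fields from `right` to each `left` row sharing `key`.

    Rows without a match keep only their original fields.
    """
    index = {row[key]: row for row in right}
    return [{**index.get(row[key], {}), **row} for row in left]


if __name__ == "__main__":
    rows = load_rows(SOURCE_A)
    print("missing values:", missing_counts(rows))
    print("duplicate files:", duplicate_keys(rows, "filename"))
    merged = left_merge(rows, load_rows(SOURCE_B), "filename")
    print("rows without photographer:", sum("photographer" not in r for r in merged))
```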
The scope of your project is largely up to you and your team. Whether it's simple or complex, the aim should be to align with the course's learning objectives. However, for a project to be considered comprehensive, it should ideally include a few of the following minimum components:
- Large or Heterogeneous Data: Your project should involve a sizable or diverse dataset that requires careful handling and processing.
- Scalability: Consider how your solution will scale for many users, particularly in the application you intend to build.
- Complex Models: The project should explore models that are challenging to train, which will showcase your understanding of MLOps challenges.
- Computationally Expensive Inference: If your project involves inference models, they should be computationally intensive to align with real-world challenges.
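To make the scalability and expensive-inference components more concrete, the sketch below uses Python's `asyncio` to serve many simulated users. It caps how many inference calls run at once, which is one common way to keep an expensive model from being overwhelmed. The `fake_inference` coroutine and its sleep-based cost are stand-ins (assumptions), not a real model. The limit of 4 is an arbitrary illustrative value.

```python
import asyncio

MAX_CONCURRENT = 4  # illustrative limit on simultaneous model calls (assumption)


class InferenceServer:
    """Toy server that bounds concurrent calls to an expensive model."""

    def __init__(self, max_concurrent):
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self.active = 0
        self.peak = 0  # highest number of calls observed running at once

    async def fake_inference(self, request_id):
        # Hypothetical stand-in for a costly model call; sleeping simulates compute.
        await asyncio.sleep(0.01)
        return f"prediction-for-{request_id}"

    async def handle(self, request_id):
        # Requests beyond the limit wait here until a slot frees up.
        async with self._semaphore:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await self.fake_inference(request_id)
            finally:
                self.active -= 1


async def simulate_users(num_users, max_concurrent=MAX_CONCURRENT):
    """Fire `num_users` requests at once and return results plus peak concurrency."""
    server = InferenceServer(max_concurrent)
    results = await asyncio.gather(*(server.handle(i) for i in range(num_users)))
    return results, server.peak


if __name__ == "__main__":
    results, peak = asyncio.run(simulate_users(20))
    print(len(results), "requests served; peak concurrency:", peak)
```

The semaphore makes extra requests queue rather than fail. A production design might instead add autoscaling, request batching, or a dedicated model server, and those choices are what the Scalability component asks teams to think through.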
- Clearly outline the problem or question your project addresses.
- List the primary goals or outcomes, which should align with your problem statement and the minimum components outlined above.
- Opt for models and methods that your team understands. The project should reflect your grasp of course concepts.
- Include a preliminary design or sketch for the application you intend to develop. This could range from simple wireframes to a more detailed, clickable prototype.
- Reference papers, blog posts, or other scholarly materials that aid your project and align with your objectives.
- The project should also be a space for you to enjoy both the subject matter and the developmental process.
- Discuss any anticipated challenges or limitations, such as data quality issues or technical constraints.
- List key milestones for both your project and application development. Include tentative deadlines, if possible.
Statements of Work that do not include both a well-defined scope and a preliminary design for the application will not be accepted.
Deliverables: Submit a PDF of your proposal on Canvas by 9:00 PM EST on Sep 20th.
Below are two sample SOWs for such apps:
- [Pavlov Protovief]
- [Paolo Primopadre]
- [Pablo El Padron]
To develop an application that can identify various species of butterflies in the wild using computer vision and offer educational content through a chatbot interface.
- Large Data: Collection and utilization of a varied and substantial dataset of butterfly images.
- Scalability: Ability to handle multiple users querying the computer vision model and chatbot simultaneously.
- Complex Models: Use of deep learning-based computer vision models for accurate species identification.
- Computationally Expensive Inference: Implementation of efficient algorithms to minimize latency during species identification.
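One way a team could make a latency goal measurable is to time repeated calls and report percentiles rather than a single average. The sketch below does this with the standard library. `identify_species` is a hypothetical placeholder for the real model call, and reporting the 95th percentile is an illustrative choice rather than a course requirement.

```python
import statistics
import time


def identify_species(image_bytes):
    """Hypothetical placeholder for the real computer vision model call."""
    return "Danaus plexippus" if image_bytes else "unknown"


def measure_latency_ms(fn, inputs):
    """Time each call to `fn` and return the latencies in milliseconds."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize(latencies):
    """Return median and 95th-percentile latency (intended for 2+ samples)."""
    cuts = statistics.quantiles(latencies, n=100)  # 99 cut points: p1..p99
    return {"p50": statistics.median(latencies), "p95": cuts[94]}


if __name__ == "__main__":
    samples = [b"fake-image"] * 200
    report = summarize(measure_latency_ms(identify_species, samples))
    print(f"p50={report['p50']:.4f} ms, p95={report['p95']:.4f} ms")
```

Tail percentiles such as p95 capture the slow requests that users notice under load, which an average can hide.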
- Collect and preprocess a diverse dataset of butterfly images.
- Develop a computer vision model to identify butterfly species.
- Implement a scalable backend to handle multiple queries.
- Design an intuitive and user-friendly frontend.
- Integrate a chatbot for answering user questions about butterflies.
The project will focus on employing a convolutional neural network for image recognition and natural language processing techniques for the chatbot, both areas covered in the course.
The application will feature two main interfaces:
- A camera interface for capturing butterfly images.
- A chatbot interface to interact with users. (Additional wireframes or mock-ups can be attached).
- Collect and preprocess a diverse dataset of butterfly images:
  - Data Source: The dataset will primarily come from public repositories like [X Dataset Source] and [Y Dataset Source]. We will also supplement this with images captured during field trips and those sourced from citizen science platforms.
  - Data Attributes: Images should ideally be labeled with the species name, geographic location, and date of capture.
  - Data Relevance: The comprehensiveness of the dataset is vital for training a robust computer vision model capable of identifying a wide range of butterfly species.

  (Note: If acquiring new images, all team members should adhere to ethical guidelines concerning wildlife photography and data collection.)
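A diverse dataset usually has uneven species counts, so one preprocessing step worth planning for is a split that keeps every species represented in training and, where possible, validation. The sketch below performs a simple per-species (stratified) split with the standard library. The record format, the 20% validation fraction, and the fixed seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_split(records, label_key="species", val_fraction=0.2, seed=0):
    """Split records per label so every label keeps at least one training example.

    Labels with two or more examples also contribute at least one validation
    example; labels with a single example go entirely to training.
    """
    by_label = defaultdict(list)
    for record in records:
        by_label[record[label_key]].append(record)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        if len(group) < 2:
            n_val = 0  # a lone example stays in training
        else:
            n_val = max(1, round(len(group) * val_fraction))
            n_val = min(n_val, len(group) - 1)  # always leave one for training
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


if __name__ == "__main__":
    demo = (
        [{"id": i, "species": "Danaus plexippus"} for i in range(10)]
        + [{"id": 10 + i, "species": "Vanessa cardui"} for i in range(3)]
        + [{"id": 13, "species": "Papilio glaucus"}]
    )
    train, val = stratified_split(demo)
    print(len(train), "train /", len(val), "validation")
```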
- Source: Data will be collated from a combination of open-source databases, user-generated content from platforms like iNaturalist, and field data collection by team members.
- Description: The dataset will comprise images of various butterfly species, ideally annotated with species names and other metadata like geographic location and date.
- Key Attributes: The dataset should include high-resolution images suitable for computer vision algorithms, along with associated metadata to enrich the model's understanding.
- Relevance: The dataset is fundamental for training the computer vision model to identify different butterfly species accurately.
- Data Quality: We anticipate that some images might be poorly labeled or low in quality. These will either be cleaned or supplemented with additional data.
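The cleaning step described in the Data Quality item could start with simple rules. The sketch below is a hypothetical example rather than a prescribed pipeline. It normalizes species labels against a small lookup of known names and common-name aliases, and it sets aside images whose shorter side falls below a minimum resolution. The `min_side` threshold of 224 pixels, the record fields (`species`, `width`, `height`), and the alias table are all illustrative assumptions.

```python
# Illustrative canonical names and common-name aliases (assumptions).
CANONICAL = {"Danaus plexippus", "Vanessa cardui"}
ALIASES = {"monarch": "Danaus plexippus", "painted lady": "Vanessa cardui"}

# Case-insensitive lookup: lowercase key -> canonical species name.
LOOKUP = {name.lower(): name for name in CANONICAL}
LOOKUP.update(ALIASES)


def normalize_label(raw):
    """Map a raw label to a canonical species name, or None if unrecognized."""
    if not raw:
        return None
    cleaned = " ".join(raw.split()).lower()  # collapse whitespace, ignore case
    return LOOKUP.get(cleaned)


def clean_records(records, min_side=224):
    """Keep records with a recognized label and both sides >= min_side pixels.

    Returns (kept, rejected) so rejected items can be reviewed or re-labeled.
    """
    kept, rejected = [], []
    for record in records:
        label = normalize_label(record.get("species"))
        big_enough = min(record.get("width", 0), record.get("height", 0)) >= min_side
        if label is not None and big_enough:
            kept.append({**record, "species": label})
        else:
            rejected.append(record)
    return kept, rejected


if __name__ == "__main__":
    sample = [
        {"species": "monarch", "width": 300, "height": 300},
        {"species": "Vanessa cardui", "width": 100, "height": 400},
        {"species": "moth", "width": 500, "height": 500},
    ]
    kept, rejected = clean_records(sample)
    print(len(kept), "kept,", len(rejected), "set aside for review")
```

Returning the rejected records, rather than silently dropping them, supports the plan to supplement or re-label poor data instead of discarding it.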
We will review literature and open-source projects related to computer vision in biological classification and natural language processing for educational chatbots.
Exploring the intersection of biology and technology, while learning about butterflies, makes this project particularly engaging.
- Possible challenges in obtaining a large and diverse enough dataset.
- Computational limitations when deploying complex models.
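A quick back-of-the-envelope calculation can help a team judge whether a model's weights will fit on its deployment hardware. The sketch below counts the parameters of a stack of 2D convolutional layers using the standard count `k * k * C_in * C_out + C_out` (weights plus biases). It then converts that count to megabytes for 32-bit and 16-bit weights. The layer sizes are made-up examples, and the estimate ignores activations, optimizer state, and runtime overhead.

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Parameters in a 2D conv layer: kernel*kernel*c_in*c_out weights (+ c_out biases)."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def weight_memory_mb(num_params, bytes_per_param):
    """Memory for the weights alone, in megabytes (1 MB = 1024**2 bytes)."""
    return num_params * bytes_per_param / 1024 ** 2


# Hypothetical layer stack: (kernel size, input channels, output channels).
LAYERS = [(3, 3, 64), (3, 64, 128), (3, 128, 256)]


def total_params(layers):
    """Sum parameter counts over a list of (kernel, c_in, c_out) conv layers."""
    return sum(conv_params(k, c_in, c_out) for k, c_in, c_out in layers)


if __name__ == "__main__":
    total = total_params(LAYERS)  # 370,816 parameters for this stack
    print("parameters:", total)
    print("float32 MB:", round(weight_memory_mb(total, 4), 2))
    print("float16 MB:", round(weight_memory_mb(total, 2), 2))
```

Halving the bytes per parameter halves the weight memory. This is one reason reduced-precision inference is often considered when models outgrow their hardware.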
- Data collection and preprocessing: [Tentative Deadline]
- Computer vision model development: [Tentative Deadline]
- Backend implementation: [Tentative Deadline]
- Frontend development: [Tentative Deadline]
- Chatbot integration: [Tentative Deadline]
- Final testing and deployment: [Tentative Deadline]