# Key Concepts

Source: https://nixtla.io/docs/about/key-concepts

Understanding the foundations of time series forecasting with TimeGPT

These key concepts cover the foundations of time series data, how forecasts are generated, and the role of TimeGPT in predicting future values and detecting anomalies. Use these concepts as a reference to better understand how TimeGPT simplifies tasks such as demand forecasting, anomaly detection, and multi-series forecasting.

- **Time series**: A sequence of numerical data points arranged in chronological order.
- **Forecasting**: Predicting future values by analyzing historical data and patterns.
- **Anomaly detection**: Identifying unusual or unexpected events that deviate from typical behavior.
- **Multiple series**: Managing and forecasting multiple time series at once.
- **TimeGPT**: Nixtla's generative pre-trained model for time series forecasting.
- **Historical windows (tokens)**: Segments of historical data that inform TimeGPT's forecasting process.

## Time Series

A time series is a sequence of numerical data points arranged in chronological order. In the context of TimeGPT, each data point in the series serves as input to the model. The model learns from patterns in the data and uses this understanding to forecast future values.

Time series data appear in various domains, such as stock prices, weather recordings, and sales figures.

## Forecasting

Forecasting is a method used in many fields, such as business and environmental studies, to predict future outcomes based on historical information. It involves analyzing past data to detect patterns, trends, or recurring behaviors and extending these insights into the future.

One significant advancement in forecasting is the application of modern machine-learning methods, including deep learning. Models like TimeGPT can handle large datasets and identify complex patterns with enhanced prediction accuracy.

For example, a retailer might analyze past sales to forecast product demand, while an economist uses historical data to anticipate future economic conditions.
TimeGPT makes these advanced capabilities accessible even to users without in-depth machine-learning expertise.

![TimeGPT output](https://files.readme.io/f0402e5-image.png)

## Anomaly Detection

Analyzing sequential data often requires identifying anomalies or unexpected events that deviate from standard patterns. TimeGPT supports anomaly detection by monitoring data sequences (such as daily temperatures) for unusual fluctuations.

Detecting anomalies is crucial for timely responses. Sudden changes in market behavior, unusual network activity, or abnormal sensor readings can all indicate a need for prompt investigation. For example, in finance, TimeGPT can highlight abrupt market changes; in cybersecurity, it helps uncover suspicious network activity.

Anomaly detection enhances forecasting by flagging significant outliers, improving overall data insights.

![Anomaly detection](https://files.readme.io/9655290-slice4.png)

## Multiple Series

TimeGPT provides robust support for multi-series forecasting, allowing simultaneous analysis of multiple time series. Users can train the model on many related series, improving accuracy and enabling more flexible customization for specific forecasting requirements.

![Multiple series forecasting](https://files.readme.io/8b9b818-slice8.png)

## TimeGPT

TimeGPT by Nixtla is a generative pre-trained model specifically designed for time series forecasting. It reviews historical series values (and optional exogenous variables) to generate predictions. Beyond forecasting, TimeGPT enables tasks like anomaly detection and financial forecasts.

TimeGPT scans time series data similarly to how a person might read text: sequentially, from left to right. It can interpret historical windows (tokens) and leverage temporal patterns learned from billions of data points.

With the TimeGPT API, you can access these forecasting capabilities for various potential use cases, from scenario planning to anomaly detection and beyond.

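
To make the idea of historical windows more concrete, here is a minimal sketch in plain Python that slices a chronologically ordered series into overlapping fixed-length windows. It only illustrates windowing in general; how TimeGPT actually builds its tokens is not described here. The function name `sliding_windows`, the window size, and the sample values are all hypothetical.

```python
from datetime import date, timedelta


def sliding_windows(values, size, step=1):
    """Return consecutive fixed-length windows over an ordered series."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # Stop early enough that every window has exactly `size` elements.
    return [values[i:i + size] for i in range(0, len(values) - size + 1, step)]


# Hypothetical daily series: seven observations starting on 2024-01-01.
start = date(2024, 1, 1)
observations = [10, 12, 11, 13, 40, 12, 11]
series = [(start + timedelta(days=i), float(v)) for i, v in enumerate(observations)]

# The series is already in chronological order, so keep just the values.
values = [v for _, v in series]

# Each window is one "segment of historical data" in the sense used above.
windows = sliding_windows(values, size=3)
for window in windows:
    print(window)
```

A series shorter than the window size yields no windows at all, which is one reason a model that reads history in segments needs a minimum amount of past data.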
![TimeGPT API](https://files.readme.io/6f59c1b-Screenshot_2023-08-09_at_2.49.05_PM.png)

## Get Started with TimeGPT

Now that you understand the key concepts, you're ready to start using TimeGPT for your forecasting needs.

- Learn more about TimeGPT and how it can transform your time series analysis.
- Get up and running with TimeGPT in minutes with our step-by-step guide.

# Privacy Notice

Source: https://nixtla.io/docs/about/privacy-notice

Details on how Nixtla collects, uses, and protects your personal information.

We at Nixtla Inc. (together with our affiliates, “**Nixtla**”, “**we**”, “**our**” or “**us**”) respect your privacy and are strongly committed to keeping secure any information we obtain from you or about you. This Privacy Policy describes our practices with respect to Personal Information we collect from or about you when you use our website, applications, and services (collectively, “**Services**”).

This Privacy Policy does not apply to content that we process on behalf of customers of our business offerings, such as our API. Our use of that data is governed by our customer agreements covering access to and use of those offerings.

# 1. Personal Information we collect

We collect personal information relating to you (“**Personal Information**”) as follows:

**Personal Information You Provide**: We collect Personal Information if you create an account to use our Services or communicate with us as follows:

**Account Information**: When you create an account with us, we will collect information associated with your account, including your name, contact information, account credentials, payment card information, and transaction history (collectively, “**Account Information**”).

**User Content**: When you use our Services, we collect Personal Information that is included in the input, file uploads, or feedback that you provide to our Services (“**Content**”).

**Communication Information**: If you communicate with us, we collect your name, contact information, and the contents of any messages you send (“**Communication Information**”).

**Social Media Information**: We have pages on social media sites like Medium, Twitter, YouTube, and LinkedIn. When you interact with our social media pages, we will collect Personal Information that you elect to provide to us, such as your contact details (collectively, “**Social Information**”). In addition, the companies that host our social media pages may provide us with aggregate information and analytics about our social media activity.

**Personal Information We Receive Automatically From Your Use of the Services**: When you visit, use, or interact with the Services, we receive the following information about your visit, use, or interactions (“**Technical Information**”):

**Log Data**: Information that your browser automatically sends when you use our Services. Log data includes your Internet Protocol address, browser type and settings, the date and time of your request, and how you interact with our website.

**Usage Data**: We may automatically collect information about your use of the Services, such as the types of content that you view or engage with, the features you use, and the actions you take, as well as your time zone, country, the dates and times of access, user agent and version, type of computer or mobile device, and your computer connection.

**Device Information**: Includes name of the device, operating system, device identifiers, and browser you are using. Information collected may depend on the type of device you use and its settings.

**Cookies**: We use cookies to operate and administer our Services, and improve your experience. A “cookie” is a piece of information sent to your browser by a website you visit.
You can set your browser to accept all cookies, to reject all cookies, or to notify you whenever a cookie is offered so that you can decide each time whether to accept it. However, refusing a cookie may in some cases preclude you from using, or negatively affect the display or function of, a website or certain areas or features of a website. For more details on cookies, please visit All About Cookies.

**Analytics**: We may use a variety of online analytics products that use cookies to help us analyze how users use our Services and enhance your experience when you use the Services.

# 2. We may use Personal Information for the following purposes:

1. To provide, administer, maintain, and/or analyze the Services;
2. To improve our Services and conduct research;
3. To communicate with you;
4. To develop new programs and services;
5. To prevent fraud, criminal activity, or misuses of our Services, and to protect the security of our IT systems, architecture, and networks;
6. To carry out business transfers; and
7. To comply with legal obligations and legal processes and to protect our rights, privacy, safety, or property, and/or that of our affiliates, you, or other third parties.

**Aggregated or De-Identified Information**. We may aggregate or de-identify Personal Information so that it may no longer be used to identify you and use such information to analyze the effectiveness of our Services, to improve and add features to our Services, to conduct research and for other similar purposes. In addition, from time to time, we may analyze the general behavior and characteristics of users of our Services and share aggregated information like general user statistics with third parties, publish such aggregated information or make such aggregated information generally available. We may collect aggregated information through the Services, through cookies, and through other means described in this Privacy Policy.
We will maintain and use de-identified information in anonymous or de-identified form and we will not attempt to reidentify the information, unless required by law.

As noted above, we may use Content you provide us to improve our Services, for example to train the models that power TimeGPT. Fill out [this form](https://forms.gle/rvF58qkNCt2uNjSX8) to opt out of our use of your Content to train our models.

# 3. Disclosure of personal information

In certain circumstances we may provide your Personal Information to third parties without further notice to you, unless required by the law:

**Vendors and Service Providers**. To assist us in meeting business operations needs and to perform certain services and functions, we may provide Personal Information to vendors and service providers, including providers of hosting services, cloud services, and other information technology services providers, email communication software, and web analytics services, among others. Pursuant to our instructions, these parties will access, process, or store Personal Information only in the course of performing their duties to us.

**Business Transfers**. If we are involved in strategic transactions, reorganization, bankruptcy, receivership, or transition of service to another provider (collectively, a “**Transaction**”), your Personal Information and other information may be disclosed in the diligence process with counterparties and others assisting with the Transaction and transferred to a successor or affiliate as part of that Transaction along with other assets.

**Legal Requirements**.
We may share your Personal Information, including information about your interaction with our Services, with government authorities, industry peers, or other third parties (i) if required to do so by law or in the good faith belief that such action is necessary to comply with a legal obligation; (ii) to protect and defend our rights or property; (iii) if we determine, in our sole discretion, that there is a violation of our terms, policies, or the law; (iv) to detect or prevent fraud or other illegal activity; (v) to protect the safety, security, and integrity of our products, employees, or users, or the public; or (vi) to protect against legal liability.

**Affiliates**. We may disclose Personal Information to our affiliates, meaning an entity that controls, is controlled by, or is under common control with Nixtla. Our affiliates may use the Personal Information we share in a manner consistent with this Privacy Policy.

# 4. Your choices and controls

Depending on where you live, you may have the right to exercise certain controls and choices regarding our collection, use, and sharing of your Personal Information. To opt out of marketing communications, please email us at [support@nixtla.io](mailto:support@nixtla.io) or follow the instructions included in the email or text correspondence. Please note that, even if you unsubscribe from certain correspondence, we may still need to contact you with important transactional or administrative information, as permitted by law. Additionally, if you choose not to provide certain Personal Information, we may be unable to provide some or all of our Services to you.

# 5. Children

Our Services are not directed to children under the age of 13. Nixtla does not knowingly collect Personal Information from children under the age of 13.
If you have reason to believe that a child under the age of 13 has provided Personal Information to Nixtla through the Services, please email us at [support@nixtla.io](mailto:support@nixtla.io). We will investigate any notification and, if appropriate, delete the Personal Information from our systems. If you are 13 or older, but under 18, you must have consent from your parent or guardian to use our Services.

# 6. Links to other websites

The Services may contain links to other websites not operated or controlled by Nixtla, including social media services (“**Third Party Sites**”). The information that you share with Third Party Sites will be governed by the specific privacy policies and terms of service of the Third Party Sites and not by this Privacy Policy. By providing these links we do not imply that we endorse or have reviewed these sites. Please contact the Third Party Sites directly for information on their privacy practices and policies.

# 7. Security and Retention

We implement commercially reasonable technical, administrative, and organizational measures to protect Personal Information both online and offline from loss, misuse, and unauthorized access, disclosure, alteration, or destruction. However, no Internet or email transmission is ever fully secure or error-free. In particular, emails sent to or from us may not be secure. Therefore, you should take special care in deciding what information you send to us via the Services or email. In addition, we are not responsible for circumvention of any privacy settings or security measures contained on the Services, or third-party websites.

We’ll retain your Personal Information for only as long as we need in order to provide our Services to you, or for other legitimate business purposes such as resolving disputes, safety and security reasons, or complying with our legal obligations.
How long we retain Personal Information will depend on a number of factors, such as the amount, nature, and sensitivity of the information, the potential risk of harm from unauthorized use or disclosure, our purpose for processing the information, and any legal requirements.

# 8. Changes to the privacy policy

We may update this Privacy Policy from time to time. All changes will be effective immediately upon posting to this page. Material changes will be conspicuously posted on this page or otherwise communicated to you as required by law.

# 9. How to contact us

Please contact us at [support@nixtla.io](mailto:support@nixtla.io) if you have any questions or concerns not already addressed in this Privacy Policy.

# Nixtla

Source: https://nixtla.io/docs/about/sub-categoria

About us

Nixtla is to numbers what Anthropic or OpenAI are to language and images. We are the creators of TimeGPT, a pre-trained model that allows enterprises to upload their data and receive predictions within minutes. This approach saves significant money, development time, and maintenance effort.

TimeGPT was trained on the largest collection of time series data in history: over 100 billion rows across financial, weather, energy, and web data. Nixtla has also built the most comprehensive time series ecosystem, with over 5 million downloads worldwide. Our software is trusted and used in production by leading companies such as Amazon, Walmart, and Lyft.

We are a group of hackers driven by curiosity and a profound desire to make a meaningful impact. With backgrounds ranging from research and development to philosophy, we have united to revolutionize the time series field.

We embrace diversity, champion inclusivity, and believe that the future belongs to everyone. We stand by our roots in Latin America. We are queer, we are different, and we take pride in it. Our shared passion for understanding the world guides us in pushing the boundaries of what’s possible with time series analysis.

## Our Open Source Initiatives

TimeGPT is only one part of our story. Before its creation, Nixtla developed an open-source time series ecosystem that quickly flourished, garnering millions of downloads.

![Nixtla Open Source](https://files.readme.io/d1318f2-Screenshot_2023-08-04_at_3.13.41_PM.png)

Our thriving open-source community is a testament to the power of collaboration. Join us in building innovative tools for time series analysis.

## Our Origin Story

Nixtla began as a side project. We built tools for an old company we worked for, and then everyone took different paths: some pursued academic careers, others founded companies, and some focused on shipping products. We eventually reunited to turn what started as a modest open-source library into the most comprehensive time-series ecosystem.

By challenging the status quo and giants like Facebook, Amazon, and Google, we proved how a dedicated group of passionate individuals, powered by open-source software, can successfully compete with major players. As Nixtla’s usage soared, our community grew, fueling our development. Today, Nixtla is the most impactful time series ecosystem worldwide, relied upon by innovators in both industry and academia.

Recognizing this was only the beginning, we set our sights on a new challenge: pioneering foundation models for time series. This breakthrough helps us share the future of data science with everyone.

## Follow Us

Connect with fellow developers, researchers, and enthusiasts in our [Slack Channel](https://join.slack.com/t/nixtlacommunity/shared_invite/zt-1pmhan9j5-F54XR20edHk0UtYAPcW4KQ).

Stay up-to-date with the latest Nixtla news and community highlights on [Twitter](https://twitter.com/nixtlainc).

Be part of our open-source evolution by contributing to Nixtla’s core projects on [GitHub](https://github.com/Nixtla).

![Join Nixtla Community](https://files.readme.io/3f17d41-Screenshot_2024-05-04_at_3.24.35_PM.png)

Together, we are not just shaping Nixtla; we are defining the future of data science.

# Terms and Conditions

Source: https://nixtla.io/docs/about/terms-and-conditions

Terms and conditions for using Nixtla Services.

Thank you for using Nixtla's TimeGPT and/or TimeGEN!

These Terms of Use apply when you use the services of Nixtla, Inc. or our affiliates, including our application programming interface, software, tools, developer services, data, documentation, and websites ("**Services**"). The Terms include other terms and conditions, documentation, guidelines, or policies we may provide in writing. By using our Services, you agree to these Terms. Our [Privacy Notice](/docs/about/privacy-notice) explains how we collect and use personal information.

# 1. Registration and Access

You must be at least 13 years old to use the Services. If you are under 18 you must have your parent or legal guardian's permission to use the Services. If you use the Services on behalf of another person or entity, you must have the authority to accept the Terms on their behalf. You must provide accurate and complete information to register for an account. You may not make your access credentials or account available to others outside your organization, and you are responsible for all activities that occur using your credentials.

# 2. Usage Requirements

**(a) Use of Services**. You may access, and we grant you a non-exclusive right to use, the Services in accordance with these Terms. You will comply with these Terms and all applicable laws when using the Services. We and our affiliates own all rights, title, and interest in and to the Services.

**(b) Feedback**. We appreciate feedback, comments, ideas, proposals and suggestions for improvements. If you provide any of these things, we may use it without restriction or compensation to you.

**(c) Restrictions**.
You may not (i) use the Services in a way that infringes, misappropriates or violates any person's rights; (ii) reverse assemble, reverse compile, decompile, translate or otherwise attempt to discover the source code or underlying components of models, algorithms, and systems of the Services (except to the extent such restrictions are contrary to applicable law); (iii) use output from the Services to develop models that compete with Nixtla; (iv) except as permitted through the API, use any automated or programmatic method to extract data or output from the Services, including scraping, web harvesting, or web data extraction; (v) represent that output from the Services was human-generated when it is not or otherwise violate our policies; (vi) buy, sell, or transfer API keys without our prior consent; or (vii) send us any personal information of children under 13 or the applicable age of consent. You will comply with any rate limits and other requirements in our documentation. You may use Services only in geographies currently supported by Nixtla.

**(d) Third Party Services**. Any third party software, services, or other products you use in connection with the Services are subject to their own terms, and we are not responsible for third party products.

# 3. Content

**(a) Your Content**. You may provide input to the Services ("**Input**"), and receive output generated and returned by the Services based on the Input ("**Output**"). Input and Output are collectively "**Content**". As between the parties and to the extent permitted by applicable law, you own all Input. Subject to your compliance with these Terms, Nixtla hereby assigns to you all its rights, title, and interest in and to Output. This means you can use Content for any purpose, including commercial purposes such as sale or publication, if you comply with these Terms. Nixtla may use Content to provide and maintain the Services, comply with applicable law, and enforce our policies.
You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms.

**(b) Use of Content to Improve Services**. In order to improve our Services, we may use Content that you provide to or receive from our API ("**API Content**") to develop or improve our Services. We may use Content from Services other than our API ("**Non-API Content**") to help develop and improve our Services. Nixtla may use aggregated, de-identified data to enhance and operate the Services and for other business activities, including creating industry benchmarks and best practice guides for users. If you do not want your Content used to improve Services, you can opt out by filling out [this form](https://forms.gle/rvF58qkNCt2uNjSX8). If you opt out, we will not use the Content you provide after opt-out to train our machine-learning models or otherwise use your Content in any way to improve our Services. Please note that in some cases this may limit the ability of our Services to better address your specific use case.

**(c) Accuracy**. Artificial intelligence and machine learning are rapidly evolving fields of study. We are constantly working to improve our Services to make them more accurate, reliable, safe, and beneficial. Given the probabilistic nature of machine learning, the use of our Services may in some situations result in incorrect Output. You should always evaluate the accuracy of any Output as appropriate for your use case, including by using human review of the Output.

# 4. Fees and Payments

**(a) Fees and Billing**. You will pay all fees charged to your account ("**Fees**") according to the prices and terms on the applicable pricing page, or as otherwise agreed between us in writing. We have the right to correct pricing errors or mistakes even if we have already issued an invoice or received payment. You will provide complete and accurate billing information including a valid and authorized payment method.
We will charge your payment method on an agreed-upon periodic basis, but may reasonably change the date on which the charge is posted. You authorize Nixtla and its affiliates, and our third-party payment processor(s), to charge your payment method for the Fees. If your payment cannot be completed, we will provide you written notice and may suspend access to the Services until payment is received. Fees are payable in U.S. dollars and are due upon invoice issuance. Payments are nonrefundable except as provided in this Agreement.

**(b) Taxes**. Unless otherwise stated, Fees do not include federal, state, local, and foreign taxes, duties, and other similar assessments ("**Taxes**"). You are responsible for all Taxes associated with your purchase, excluding Taxes based on our net income, and we may invoice you for such Taxes. You agree to timely pay such Taxes and provide us with documentation showing the payment, or additional evidence that we may reasonably require. Nixtla uses the name and address in your account registration as the place of supply for tax purposes, so you must keep this information accurate and up-to-date.

**(c) Price Changes**. We may change our prices by posting notice to your account and/or to our website. Price increases will be effective 14 days after they are posted, except for increases made for legal reasons or increases made to Beta Services, which will be effective immediately. Any price changes will apply to the Fees charged to your account immediately after the effective date of the changes.

**(d) Disputes and Late Payments**. If you want to dispute any Fees or Taxes, please contact [support@nixtla.io](mailto:support@nixtla.io) within thirty (30) days of the date of the disputed invoice. Undisputed amounts past due may be subject to a finance charge of 1.5% of the unpaid balance per month. If any amount of your Fees is past due, we may suspend your access to the Services after we provide you written notice of late payment.

**(e) Free Tier**. You may not create more than one account to benefit from credits provided in the free tier of the Services. If we believe you are not using the free tier in good faith, we may charge you standard fees or stop providing access to the Services.

# 5. Confidentiality, Security and Data Protection

**(a) Confidentiality**. You may be given access to Confidential Information of Nixtla, its affiliates and other third parties. You may use Confidential Information only as needed to use the Services as permitted under these Terms. You may not disclose Confidential Information to any third party, and you will protect Confidential Information in the same manner that you protect your own confidential information of a similar nature, using at least reasonable care. Confidential Information means nonpublic information that Nixtla or its affiliates or third parties designate as confidential or should reasonably be considered confidential under the circumstances, including software, specifications, and other nonpublic business information. Confidential Information does not include information that: (i) is or becomes generally available to the public through no fault of yours; (ii) you already possess without any confidentiality obligations when you received it under these Terms; (iii) is rightfully disclosed to you by a third party without any confidentiality obligations; or (iv) you independently developed without using Confidential Information. You may disclose Confidential Information when required by law or the valid order of a court or other governmental authority if you give reasonable prior written notice to Nixtla and use reasonable efforts to limit the scope of disclosure, including assisting us with challenging the disclosure requirement, in each case where possible.

**(b) Security**. You must implement reasonable and appropriate measures designed to help secure your access to and use of the Services.
If you discover any vulnerabilities or breaches related to your use of the Services, you must promptly contact Nixtla and provide details of the vulnerability or breach.

**(c) Processing of Personal Data**. If you use the Services to process personal data, you must provide legally adequate privacy notices and obtain necessary consents for the processing of such data, and you represent to us that you are processing such data in accordance with applicable law.

# 6. Term and Termination

**(a) Termination; Suspension**. These Terms take effect when you first use the Services and remain in effect until terminated. You may terminate these Terms at any time for any reason by discontinuing the use of the Services and Content. We may terminate these Terms for any reason by providing you at least 30 days' advance notice. We may terminate these Terms immediately upon notice to you if you materially breach Sections 2 (Usage Requirements), 5 (Confidentiality, Security and Data Protection), 8 (Dispute Resolution) or 9 (General Terms), if there are changes in relationships with third-party technology providers outside of our control, or to comply with law or government requests. We may suspend your access to the Services if you do not comply with these Terms, if your use poses a security risk to us or any third party, or if we suspect that your use is fraudulent or could subject us or any third party to liability.

**(b) Effect on Termination**. Upon termination, you will stop using the Services and you will promptly return or, if instructed by us, destroy any Confidential Information. The sections of these Terms which by their nature should survive termination or expiration should survive, including but not limited to Sections 3 and 5-9.

# 7. Indemnification; Disclaimer of Warranties; Limitations on Liability

**(a) Indemnity**.
You will defend, indemnify, and hold harmless us, our affiliates, and our personnel, from and against any claims, losses, and expenses (including attorneys' fees) arising from or relating to your use of the Services, including your Content, products or services you develop or offer in connection with the Services, and your breach of these Terms or violation of applicable law.

**(b) Disclaimer**. THE SERVICES ARE PROVIDED "AS IS." EXCEPT TO THE EXTENT PROHIBITED BY LAW, WE AND OUR AFFILIATES AND LICENSORS MAKE NO WARRANTIES (EXPRESS, IMPLIED, STATUTORY OR OTHERWISE) WITH RESPECT TO THE SERVICES, AND DISCLAIM ALL WARRANTIES INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, SATISFACTORY QUALITY, NON-INFRINGEMENT, AND QUIET ENJOYMENT, AND ANY WARRANTIES ARISING OUT OF ANY COURSE OF DEALING OR TRADE USAGE. WE DO NOT WARRANT THAT THE SERVICES WILL BE UNINTERRUPTED, ACCURATE OR ERROR FREE, OR THAT ANY CONTENT WILL BE SECURE OR NOT LOST OR ALTERED.

**(c) Limitations of Liability**. NEITHER WE NOR ANY OF OUR AFFILIATES OR LICENSORS WILL BE LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR EXEMPLARY DAMAGES, INCLUDING DAMAGES FOR LOSS OF PROFITS, GOODWILL, USE, OR DATA OR OTHER LOSSES, EVEN IF WE HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. OUR AGGREGATE LIABILITY UNDER THESE TERMS SHALL NOT EXCEED THE GREATER OF THE AMOUNT YOU PAID FOR THE SERVICE THAT GAVE RISE TO THE CLAIM DURING THE 12 MONTHS BEFORE THE LIABILITY AROSE OR ONE HUNDRED DOLLARS (\$100). THE LIMITATIONS IN THIS SECTION APPLY ONLY TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW.

# 8. Dispute Resolution

YOU AGREE TO THE FOLLOWING MANDATORY ARBITRATION AND CLASS ACTION WAIVER PROVISIONS:

**(a) MANDATORY ARBITRATION**.
You and Nixtla agree to resolve any past or present claims relating to these Terms or our Services through final and binding arbitration, except that you have the right to opt out of these arbitration terms, and future changes to these arbitration terms, by emailing [support@nixtla.io](mailto:support@nixtla.io) within 30 days of agreeing to these arbitration terms or the relevant changes. **(b) Informal Dispute Resolution**. We would like to understand and try to address your concerns prior to formal legal action. Before filing a claim against Nixtla, you agree to try to resolve the dispute informally by sending us notice at [support@nixtla.io](mailto:support@nixtla.io) of your name, a description of the dispute, and the relief you seek. If we are unable to resolve a dispute within 60 days, you may bring a formal proceeding. Any statute of limitations will be tolled during the 60-day resolution process. If you reside in the EU, the European Commission provides for an online dispute resolution platform, which you can access at [https://ec.europa.eu/consumers/odr](https://ec.europa.eu/consumers/odr). **(c) Arbitration Forum**. Either party may commence binding arbitration through ADR Services, an alternative dispute resolution provider. The parties will pay equal shares of the arbitration fees. If the arbitrator finds that you cannot afford to pay the arbitration fees and cannot obtain a waiver, Nixtla will pay them for you. Nixtla will not seek its attorneys' fees and costs in arbitration unless the arbitrator determines that your claim is frivolous. **(d) Arbitration Procedures**. The arbitration will be conducted by telephone, based on written submissions, via video conference, or in person in San Francisco, California, or at another mutually agreed location. The arbitration will be conducted by a sole arbitrator by ADR Services under its then-prevailing rules. 
All issues are for the arbitrator to decide, except a California court has the authority to determine (i) the scope, enforceability, and arbitrability of this Section 8, including the mass filing procedures below, and (ii) whether you have complied with the pre-arbitration requirements in this section. The amount of any settlement offer will not be disclosed to the arbitrator by either party until after the arbitrator determines the final award, if any. **(e) Exceptions**. This arbitration section does not require arbitration of the following claims: (i) individual claims brought in small claims court; and (ii) injunctive or other equitable relief to stop unauthorized use or abuse of the Services or intellectual property infringement. **(f) NO CLASS ACTIONS**. Disputes must be brought on an individual basis only, and may not be brought as a plaintiff or class member in any purported class, consolidated, or representative proceeding. Class arbitrations, class actions, private attorney general actions, and consolidation with other arbitrations are not allowed. If for any reason a dispute proceeds in court rather than through arbitration, each party knowingly and irrevocably waives any right to trial by jury in any action, proceeding, or counterclaim. This does not prevent either party from participating in a class-wide settlement of claims. **(g) Mass Filings**. If, at any time, 30 or more similar demands for arbitration are asserted against Nixtla or related parties by the same or coordinated counsel or entities ("**Mass Filing**"), ADR Services will randomly assign sequential numbers to each of the Mass Filings. Claims numbered 1-10 will be the "Initial Test Cases" and will proceed to arbitration first. The arbitrators will render a final award for the Initial Test Cases within 120 days of the initial pre-hearing conference, unless the claims are resolved in advance or the parties agree to extend the deadline.
The parties will then have 90 days (the "**Mediation Period**") to resolve the remaining cases in mediation based on the awards from the Initial Test Cases. If the parties are unable to resolve the outstanding claims during this time, the parties may choose to opt out of the arbitration process and proceed in court by providing written notice to the other party within 60 days after the Mediation Period. Otherwise, the remaining cases will be arbitrated in their assigned order. Any statute of limitations will be tolled from the time the Initial Test Cases are chosen until your case is chosen as described above. **(h) Severability**. If any part of this Section 8 is found to be illegal or unenforceable, the remainder will remain in effect, except that if a finding of partial illegality or unenforceability would allow Mass Filing or class or representative arbitration, this Section 8 will be unenforceable in its entirety. Nothing in this section will be deemed to waive or otherwise limit the right to seek public injunctive relief or any other non-waivable right, pending a ruling on the substance of such claim from the arbitrator. # 9. General Terms **(a) Relationship of the Parties**. These Terms do not create a partnership, joint venture, or agency relationship between you and Nixtla or any of Nixtla's affiliates. Nixtla and you are independent contractors and neither party will have the power to bind the other or to incur obligations on the other's behalf without the other party's prior written consent. **(b) Use of Brands**. You may not use Nixtla's or any of its affiliates' names, logos, or trademarks, without our prior written consent. **(c) U.S. Federal Agency Entities**. The Services were developed solely at private expense and are commercial computer software and related documentation within the meaning of the applicable U.S. Federal Acquisition Regulation and agency supplements thereto. **(d) Copyright Complaints**. 
If you believe that your intellectual property rights have been infringed, please send notice to the address below or fill out [this form](https://forms.gle/N3xmuZss1Y7rrb889). We may delete or disable content alleged to be infringing and may terminate accounts of repeat infringers. Written claims concerning copyright infringement must include the following information: 1. A physical or electronic signature of the person authorized to act on behalf of the owner of the copyright interest; 2. A description of the copyrighted work that you claim has been infringed upon; 3. A description of where the material that you claim is infringing is located on the site; 4. Your address, telephone number, and e-mail address; 5. A statement by you that you have a good-faith belief that the disputed use is not authorized by the copyright owner, its agent, or the law; and 6. A statement by you, made under penalty of perjury, that the above information in your notice is accurate and that you are the copyright owner or authorized to act on the copyright owner's behalf. **(e) Assignment and Delegation**. You may not assign or delegate any rights or obligations under these Terms, including in connection with a change of control. Any purported assignment and delegation shall be null and void. We may assign these Terms in connection with a merger, acquisition, or sale of all or substantially all of our assets, or to any affiliate or as part of a corporate reorganization. **(f) Modifications**. We may amend these Terms from time to time by posting a revised version on the website, or if an update materially adversely affects your rights or obligations under these Terms we will provide notice to you either by emailing the email associated with your account or providing an in-product notification. Those changes will become effective no sooner than 30 days after we notify you. All other changes will be effective immediately. 
Your continued use of the Services after any change means you agree to such change. **(g) Notices**. All notices will be in writing. We may notify you using the registration information you provided or the email address associated with your use of the Services. Service will be deemed given on the date of receipt if delivered by email or on the date sent via courier if delivered by post. Nixtla accepts service of process at this address: Nixtla, Inc. 166 Geary Str 15th FL #1056 San Francisco, CA 94108 United States. Attn: Nixtla, Inc. - [support@nixtla.io](mailto:support@nixtla.io) **(h) Waiver and Severability**. If you do not comply with these Terms, and Nixtla does not take action right away, this does not mean Nixtla is giving up any of our rights. Except as provided in Section 8, if any part of these Terms is determined to be invalid or unenforceable by a court of competent jurisdiction, that term will be enforced to the maximum extent permissible and it will not affect the enforceability of any other terms. **(i) Export Controls**. The Services may not be used in or for the benefit of, exported, or re-exported (a) into any U.S. embargoed countries (collectively, the "**Embargoed Countries**") or (b) to anyone on the U.S. Treasury Department's list of Specially Designated Nationals, any other restricted party lists (existing now or in the future) identified by the Office of Foreign Asset Control, or the U.S. Department of Commerce Denied Persons List or Entity List, or any other restricted party lists (collectively, "**Restricted Party Lists**"). You represent and warrant that you are not located in any Embargoed Countries and not on any such restricted party lists. You must comply with all applicable laws related to Embargoed Countries or Restricted Party Lists, including any requirements or obligations to know your end users directly. **(j) Equitable Remedies**. 
You acknowledge that if you violate or breach these Terms, it may cause irreparable harm to Nixtla and its affiliates, and Nixtla shall have the right to seek injunctive relief against you in addition to any other legal remedies. **(k) Entire Agreement**. These Terms and any policies incorporated in these Terms contain the entire agreement between you and Nixtla regarding the use of the Services and, other than any Service specific terms of use or any applicable enterprise agreements, supersedes any prior or contemporaneous agreements, communications, or understandings between you and Nixtla on that subject. **(l) Jurisdiction, Venue and Choice of Law**. These Terms will be governed by the laws of the State of California, excluding California's conflicts of law rules or principles. Except as provided in the "Dispute Resolution" section, all claims arising out of or relating to these Terms will be brought exclusively in the federal or state courts of San Francisco County, California, USA. # Add Exogenous Variables Source: https://nixtla.io/docs/anomaly_detection/exogenous_variables Learn how to improve anomaly detection by incorporating external factors. ## Why Use Exogenous Variables? Including relevant exogenous variables can greatly improve anomaly detection, especially for time series influenced by external factors such as weather or market indicators. Key benefits of using exogenous variables: * Improve anomaly detection accuracy * Enhance model interpretability * Provide additional context for anomaly detection ## How to Use Exogenous Variables [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/historical-anomaly-detection/02_anomaly_exogenous.ipynb) ### Step 1: Set Up Data and Client Follow the steps in the [historical anomaly detection tutorial](/docs/anomaly_detection/historical_anomaly_detection) to set up the data and client. 
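As a rough picture of the kind of data Step 1 produces, the sketch below builds a small long-format DataFrame with the `unique_id`, `ds`, and `y` columns used in the tutorial plus one extra column. The `temperature` column, the name `df_example`, and all values are invented for illustration; any additional column of this kind is what the next step treats as an exogenous feature.

```python
import pandas as pd

# Hypothetical toy data: the values and the `temperature` column are made up.
dates = pd.date_range("2024-01-01", periods=5, freq="D")
df_example = pd.DataFrame({
    "unique_id": 0,                             # series identifier
    "ds": dates,                                # timestamps
    "y": [10.0, 11.5, 9.8, 30.2, 10.4],         # target values
    "temperature": [3.1, 4.0, 2.7, 3.3, 3.9],   # candidate exogenous feature
})

# Every column other than the identifier, timestamp, and target
# is a candidate exogenous feature.
required = {"unique_id", "ds", "y"}
exogenous_cols = [c for c in df_example.columns if c not in required]
print(exogenous_cols)
```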
### Step 2: Detect Anomalies with Exogenous Features

Use the `detect_anomalies` method to identify anomalies. The method automatically detects and uses any exogenous features present in your DataFrame:

```python theme={null}
anomalies_df = nixtla_client.detect_anomalies(
    df=df,
    time_col='ds',
    target_col='y'
)
```

### Step 3: Add Date Features (Optional)

Adding date features is a powerful way to enrich datasets for historical anomaly detection, especially when exogenous variables are unavailable. By passing date components like `['month', 'year']` and enabling `date_features_to_one_hot=True`, TimeGPT automatically encodes these as one-hot vectors. This allows the model to better detect seasonal patterns, calendar effects, and periodic anomalies.

```python theme={null}
anomalies_df = nixtla_client.detect_anomalies(
    df=df,
    time_col='ds',
    target_col='y',
    date_features=['month', 'year'],
    date_features_to_one_hot=True
)
```

### Step 4: Visualize Anomalies

Use the `plot` method to visualize the detected anomalies in the time series data.

```python theme={null}
nixtla_client.plot(df, anomalies_df)
```

![Detected anomalies in time series with exogenous variables](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/historical-anomaly-detection/02_anomaly_exogenous_files/figure-markdown_strict/cell-11-output-2.png)

The plot shows the time series with detected anomalies marked in red. The blue line represents the actual values, while the shaded area indicates the confidence interval. Points that fall outside this interval are flagged as anomalies.
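To make the one-hot date encoding from Step 3 more concrete, here is a pandas-only sketch of what encoding `month` and `year` as indicator columns could look like. It illustrates the idea only and is not TimeGPT's internal implementation; the toy dates and the resulting column names are assumptions.

```python
import pandas as pd

# Hypothetical timestamps spanning a month and year boundary.
dates = pd.DataFrame({
    "ds": pd.to_datetime(["2023-12-30", "2023-12-31", "2024-01-01", "2024-01-02"])
})

# Extract the date components, then expand each into 0/1 indicator columns.
features = pd.DataFrame({
    "month": dates["ds"].dt.month,
    "year": dates["ds"].dt.year,
})
one_hot = pd.get_dummies(features, columns=["month", "year"], dtype=int)
print(one_hot)
```

Each row ends up with exactly one active month column and one active year column, which lets a model treat "December" or "2024" as a category rather than as a magnitude.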
### Step 5: Inspect Model Weights (Optional)

Use the client's `weights_x` attribute to view the relative weights of the exogenous features and understand their impact:

```python theme={null}
nixtla_client.weights_x.plot.barh(
    x='features',
    y='weights'
)
```

![Weights of exogenous date features](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/historical-anomaly-detection/02_anomaly_exogenous_files/figure-markdown_strict/cell-12-output-1.png)

The horizontal bar plot shows the relative importance of each exogenous feature in the anomaly detection model. Features with larger weights have a stronger influence on the model's predictions. This visualization helps identify which external factors are most significant in determining anomalies in your time series.

# Quickstart
Source: https://nixtla.io/docs/anomaly_detection/historical_anomaly_detection

Get started with TimeGPT's historical anomaly detection capabilities.

* Understand how TimeGPT detects anomalies in historical time series.
* Learn how to set up TimeGPT and detect anomalies.
* Learn how to plot and interpret identified anomalies.
* Quickly identify outliers in large time series.
* Improve decision-making by focusing on unusual data points.
* Automate anomaly alerts to save time and resources.

## What Is Historical Anomaly Detection?

Historical anomaly detection is a technique that identifies data points that significantly deviate from expected patterns in a time series. This technique is useful for uncovering potential fraud, security breaches, or other unusual events.

## Overview of TimeGPT's Historical Anomaly Detection

TimeGPT's historical anomaly detection works by:

1. Generating predictions for future values (or reconstructing missing values) within your historical time series.
2. Constructing a confidence interval based on the model's predictions.
3. Flagging any historical data point that falls outside your chosen confidence interval as an anomaly.
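The three steps above can be sketched with a deliberately simple stand-in model. The code below does not call TimeGPT: it uses the mean of the previous few values as the prediction and a normal-approximation band around it. Both choices, along with the helper name `flag_outside_interval` and the toy series, are assumptions made only to show the sequence of predicting, building an interval, and flagging.

```python
import pandas as pd

def flag_outside_interval(y: pd.Series, window: int = 3, z: float = 2.576) -> pd.DataFrame:
    """Toy predict -> interval -> flag pipeline (not TimeGPT's method)."""
    # Step 1: stand-in prediction = mean of the previous `window` observations.
    pred = y.shift(1).rolling(window).mean()
    # Step 2: band = prediction +/- z * std of residuals (z = 2.576 is ~99% under normality).
    half_width = z * (y - pred).std()
    out = pd.DataFrame({"y": y, "pred": pred,
                        "lo": pred - half_width, "hi": pred + half_width})
    # Step 3: flag observations outside the band (rows without a prediction stay False).
    out["anomaly"] = (out["y"] < out["lo"]) | (out["y"] > out["hi"])
    return out

# Hypothetical series: alternating 10.0 / 10.2 with one spike at position 30.
values = [10.0 if i % 2 == 0 else 10.2 for i in range(40)]
values[30] = 25.0
result = flag_outside_interval(pd.Series(values))
print(result[result["anomaly"]])
```

Widening or narrowing the band (here via `z`) is the toy counterpart of choosing a different confidence interval.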
## Quickstart Example You'll learn how historical anomaly detection works—illustrated through an example analyzing daily visits to the Wikipedia page of Peyton Manning. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/historical-anomaly-detection/01_quickstart.ipynb) ### Step 1: Import Packages and Create a NixtlaClient Instance We'll start by importing required packages and setting up our API key. ```python theme={null} import pandas as pd from nixtla import NixtlaClient nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' # Defaults to os.environ.get("NIXTLA_API_KEY") ) ``` ### Step 2: Load the Dataset This dataset tracks the daily visits to the Wikipedia page of Peyton Manning. ```python theme={null} df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/peyton-manning.csv') df.head() ``` | | unique\_id | ds | y | | - | ---------- | ---------- | -------- | | 0 | 0 | 2007-12-10 | 9.590761 | | 1 | 0 | 2007-12-11 | 8.519590 | | 2 | 0 | 2007-12-12 | 8.183677 | | 3 | 0 | 2007-12-13 | 8.072467 | | 4 | 0 | 2007-12-14 | 7.893572 | ### Step 3: Visualize the Data You can visualize the time series with the following command: ```python theme={null} nixtla_client.plot(df, max_insample_length=365) ``` ![Data plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-11-output-1.png) ### Step 4: Perform Anomaly Detection By default, TimeGPT uses a 99% confidence interval. Points outside this interval are flagged as anomalies. 
```python theme={null}
anomalies_df = nixtla_client.detect_anomalies(df, freq='D')
anomalies_df.head()
```

|   | unique\_id | ds         | y         | TimeGPT  | TimeGPT-hi-99 | TimeGPT-lo-99 | anomaly |
| - | ---------- | ---------- | --------- | -------- | ------------- | ------------- | ------- |
| 0 | 0          | 2008-01-10 | 8.281724  | 8.224187 | 9.503586      | 6.944788      | False   |
| 1 | 0          | 2008-01-11 | 8.292799  | 8.151533 | 9.430932      | 6.872135      | False   |
| 2 | 0          | 2008-01-12 | 8.199189  | 8.127243 | 9.406642      | 6.847845      | False   |
| 3 | 0          | 2008-01-13 | 9.996522  | 8.917259 | 10.196658     | 7.637861      | False   |
| 4 | 0          | 2008-01-14 | 10.127071 | 9.002326 | 10.281725     | 7.722928      | False   |

A `False` anomaly value indicates a normal data point; `True` identifies an outlier.

### Step 5: Review Anomalies

```python theme={null}
nixtla_client.plot(df, anomalies_df)
```

![Anomalies plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/20_anomaly_detection_files/figure-markdown_strict/cell-13-output-1.png)

### Step 6: Inspect and Iterate

Inspect the anomalies flagged by the model. These points are potential indicators of significant deviations in your data. If you find that the model is overly sensitive or missing critical outliers, adjust the confidence interval or include additional features (e.g., exogenous data, date features) to improve detection accuracy.

Congratulations! You've successfully performed anomaly detection using TimeGPT. You can now start experimenting with this example or apply it to your own data. For advanced tips on improving detection performance, explore the following sections on using exogenous variables and adjusting confidence intervals.

# Controlling the Anomaly Detection Process
Source: https://nixtla.io/docs/anomaly_detection/real-time/adjusting_detection

Learn how to tune TimeGPT's anomaly detection parameters for optimal accuracy.
Step-by-step guide to adjusting detection_size, level, confidence intervals, and fine-tuning strategies with Python code examples. ## Overview Fine-tuning anomaly detection parameters is essential for reducing false positives and improving detection accuracy in time series data. This guide shows you how to optimize TimeGPT's `detect_anomalies_online` method by adjusting key parameters like detection sensitivity, window sizes, and model fine-tuning options. For an introduction to real-time anomaly detection, see our [Real-Time Anomaly Detection guide](/docs/anomaly_detection/real-time/introduction). To understand local vs global detection strategies, check out [Local vs Global Anomaly Detection](/docs/anomaly_detection/real-time/univariate_multivariate). ## Why Parameter Tuning Matters TimeGPT leverages forecast errors to identify anomalies in your time-series data. By optimizing parameters, you can detect subtle deviations, reduce false positives, and customize results for specific use cases. 
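To illustrate why tuning changes how many points get flagged, the sketch below thresholds absolute forecast errors at a chosen percentile. Treating the threshold as a plain percentile of absolute errors is a simplifying assumption, and the synthetic `actual` and `forecast` arrays and the helper `count_flagged` are invented; TimeGPT's actual scoring may differ.

```python
import numpy as np

# Hypothetical data: noisy observations around a flat forecast of 10.
rng = np.random.default_rng(0)
actual = rng.normal(loc=10.0, scale=1.0, size=200)
forecast = np.full_like(actual, 10.0)
abs_error = np.abs(actual - forecast)

def count_flagged(abs_error: np.ndarray, level: float) -> int:
    """Count points whose absolute error exceeds the `level`-th percentile."""
    threshold = np.percentile(abs_error, level)
    return int((abs_error > threshold).sum())

# A stricter (higher) level leaves fewer points above the threshold.
for level in (80, 95, 99):
    print(level, count_flagged(abs_error, level))
```

The same trade-off applies in practice: a stricter threshold reduces false positives at the risk of missing subtle deviations.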
## Key Parameters for Anomaly Detection TimeGPT's anomaly detection can be controlled through three primary parameters: * **detection\_size**: Controls the data window size for threshold calculation, determining how much historical context is used * **level**: Sets confidence intervals for anomaly thresholds (e.g., 80%, 95%, 99%), controlling detection sensitivity * **freq**: Aligns detection with data frequency (e.g., 'D' for daily, 'H' for hourly, 'min' for minute-level data) ## Common Use Cases Adjusting anomaly detection parameters is crucial for: * **Reducing false positives** in noisy time series data * **Increasing sensitivity** to detect subtle anomalies * **Optimizing detection** for different data frequencies (hourly, daily, weekly) * **Improving accuracy** through model fine-tuning with custom loss functions ## How to Adjust the Anomaly Detection Process [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/online-anomaly-detection/02_adjusting_detection_process.ipynb) ### Step 1: Install and Import Dependencies In your environment, install and import the necessary libraries: ```python theme={null} import pandas as pd from nixtla import NixtlaClient import matplotlib.pyplot as plt ``` ### Step 2: Initialize the Nixtla Client Create an instance of NixtlaClient with your API key: ```python theme={null} nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla') ``` ### Step 3: Conduct a baseline detection Load a portion of the Peyton Manning dataset to illustrate the default anomaly detection process. We use the Peyton Manning Wikipedia page views dataset to demonstrate parameter tuning on real-world data with natural anomalies and trends. 
```python theme={null}
df = pd.read_csv(
    'https://datasets-nixtla.s3.amazonaws.com/peyton-manning.csv',
    parse_dates=['ds']
).tail(200)  # keep only the last 200 observations
df.head()
```

|      | unique\_id | ds         | y        |
| ---- | ---------- | ---------- | -------- |
| 2764 | 0          | 2015-07-05 | 6.499787 |
| 2765 | 0          | 2015-07-06 | 6.859615 |
| 2766 | 0          | 2015-07-07 | 6.881411 |
| 2767 | 0          | 2015-07-08 | 6.997596 |
| 2768 | 0          | 2015-07-09 | 7.152269 |

Set a baseline by running the method without any fine-tuning options.

```python theme={null}
anomaly_df = nixtla_client.detect_anomalies_online(
    df,
    freq='D',
    h=14,
    level=80,
    detection_size=150
)
```

```bash Baseline Detection Log Output theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:Detection size is large. Using the entire series to compute the anomaly threshold...
INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint...
```

![Baseline Anomaly Detection Visualization](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/online-anomaly-detection/02_adjusting_detection_process_files/figure-markdown_strict/cell-13-output-1.png)

### Step 4: Fine-tuned detection

TimeGPT detects anomalies based on forecast errors. By improving your model's forecasts, you can strengthen anomaly detection performance. The following parameters can be fine-tuned:

* **finetune\_steps**: Number of additional training iterations
* **finetune\_depth**: Depth level for refining the model
* **finetune\_loss**: Loss function used during fine-tuning

```python theme={null}
anomaly_online_ft = nixtla_client.detect_anomalies_online(
    df,
    freq='D',
    h=14,
    level=80,
    detection_size=150,
    finetune_steps=10,
    finetune_depth=2,
    finetune_loss='mae'
)
```

```bash Fine-tuned Detection Log Output theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:Detection size is large. Using the entire series to compute the anomaly threshold...
INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint...
```

![Fine-tuned TimeGPT Anomaly Detection](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/online-anomaly-detection/02_adjusting_detection_process_files/figure-markdown_strict/cell-15-output-1.png)

The plot above shows that the model detected fewer anomalies, since fine-tuning helps TimeGPT forecast the series more accurately.

### Step 5: Adjusting Forecast Horizon and Step Size

Similar to cross-validation, the anomaly detection method generates forecasts for historical data by splitting the time series into multiple windows. The way these windows are defined can impact the anomaly detection results. Two key parameters control this process:

* `h`: Specifies how many steps into the future the forecast is made for each window.
* `step_size`: Determines the interval between the starting points of consecutive windows.

Note that when `step_size` is smaller than `h`, the windows overlap. This can make the detection process more robust, as TimeGPT will see the same time step more than once. However, this comes with a computational cost, since the same time step will be predicted more than once.
```python theme={null}
anomaly_df_horizon = nixtla_client.detect_anomalies_online(
    df,
    time_col='ds',
    target_col='y',
    freq='D',
    h=2,
    step_size=1,
    level=80,
    detection_size=150
)
```

![Adjusted Horizon and Step Size Visualization](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/online-anomaly-detection/02_adjusting_detection_process_files/figure-markdown_strict/cell-17-output-1.png)

**Choosing `h` and `step_size`** depends on the nature of your data:

* Frequent or short anomalies: Use smaller `h` and `step_size`
* Smooth or longer trends: Choose larger `h` and `step_size`

## Summary

You've learned how to control TimeGPT's anomaly detection process through:

1. **Baseline detection** without fine-tuning
2. **Fine-tuning** with custom training iterations and loss functions
3. **Window adjustment** using forecast horizon and step size parameters

Experiment with these parameters to optimize detection for your specific use case and data patterns.

## Frequently Asked Questions

**How do I reduce false positives in anomaly detection?**

Increase the `level` parameter (e.g., from 80 to 95 or 99) to make detection stricter, or use fine-tuning parameters like `finetune_steps` to improve forecast accuracy.

**What's the difference between detection\_size and step\_size?**

`detection_size` determines how many data points at the end of the series are analyzed, while `step_size` controls the interval between the starting points of consecutive forecast windows.

**When should I use fine-tuning for anomaly detection?**

Use fine-tuning when you have domain-specific patterns or when baseline detection produces too many false positives. Fine-tuning helps TimeGPT better understand your specific time series characteristics.

**How do overlapping windows improve detection?**

When `step_size` \< `h`, TimeGPT analyzes the same time steps multiple times from different perspectives, making detection more robust but requiring more computation.
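The effect of overlapping windows can be seen with a small counting exercise. The helper below, `window_coverage`, is a hypothetical illustration: it assumes windows start at index 0, advance by `step_size`, and must fit entirely inside the series, which may not match how the service lays out its windows.

```python
from collections import Counter

def window_coverage(n_steps: int, h: int, step_size: int) -> Counter:
    """Count how many forecast windows of length `h` cover each time step."""
    coverage = Counter()
    # Window starts are spaced `step_size` apart; each window spans `h` steps.
    for start in range(0, n_steps - h + 1, step_size):
        for t in range(start, start + h):
            coverage[t] += 1
    return coverage

overlapping = window_coverage(n_steps=6, h=2, step_size=1)  # step_size < h
disjoint = window_coverage(n_steps=6, h=2, step_size=2)     # step_size == h

print(dict(overlapping))  # interior steps are covered twice
print(dict(disjoint))     # every step is covered once
print(sum(overlapping.values()), sum(disjoint.values()))  # total predictions made
```

With `step_size=1` the interior steps are each predicted twice, so the total number of predictions grows from 6 to 10 in this toy case. That is the robustness versus computation trade-off described above.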
# Online (Real-Time) Anomaly Detection Source: https://nixtla.io/docs/anomaly_detection/real-time/introduction Learn how to detect anomalies in real-time streaming data using TimeGPT's detect_anomalies_online method. Complete Python tutorial with code examples for monitoring server logs, IoT sensors, and live data streams. ## Overview Real-time anomaly detection enables you to identify unusual patterns in streaming time series data instantly—essential for monitoring server performance, detecting fraud, identifying system failures, and tracking IoT sensor anomalies. TimeGPT's `detect_anomalies_online` method provides: * **Flexible Control**: Fine-tune detection sensitivity and confidence levels * **Local & Global Detection**: Analyze individual series or detect system-wide anomalies across multiple correlated metrics * **Stream Processing**: Monitor live data feeds with rolling window analysis ## Common Use Cases * **Server Monitoring**: Detect CPU spikes, memory leaks, and downtime * **IoT Sensors**: Identify equipment failures and sensor malfunctions * **Fraud Detection**: Flag suspicious transactions in real-time * **Application Performance**: Monitor API response times and error rates ## Quick Start [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/online-anomaly-detection/01_quickstart.ipynb) ### Step 1: Set up your environment Initialize your Python environment by importing the required libraries: ```python theme={null} import pandas as pd from nixtla import NixtlaClient import matplotlib.pyplot as plt ``` ### Step 2: Configure your NixtlaClient Provide your API key (and optionally a custom base URL). ```python theme={null} nixtla_client = NixtlaClient( # defaults to os.environ.get("NIXTLA_API_KEY") api_key='my_api_key_provided_by_nixtla' ) ``` ### Step 3: Load your dataset We use a minute-level time series dataset that monitors server usage. 
This dataset is ideal for showcasing streaming data scenarios, where the task is to detect server failures or downtime in real time.

```python theme={null}
df = pd.read_csv(
    'https://datasets-nixtla.s3.us-east-1.amazonaws.com/machine-1-1.csv',
    parse_dates=['ts']
)
```

We observe that the time series remains stable during the initial period; however, a spike occurs in the last 20 steps, indicating anomalous behavior. Our goal is to capture this abnormal jump as soon as it appears.

![Server Data with Spike Anomaly](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/online-anomaly-detection/01_quickstart_files/figure-markdown_strict/cell-11-output-1.png)

### Step 4: Detect anomalies in real time

The `detect_anomalies_online` method detects anomalies in a time series by leveraging TimeGPT's forecasting power. Because it uses the forecast error to decide which steps are anomalous, you can specify and tune its parameters much as you would for the `forecast` method. The method returns a DataFrame that contains anomaly flags and an anomaly score, whose absolute value quantifies how abnormal each value is.

To perform real-time anomaly detection, set the following parameters:

* `df`: A pandas DataFrame containing the time series data.
* `time_col`: The column that identifies the datestamp.
* `target_col`: The variable to forecast.
* `h`: The forecast horizon, i.e., the number of steps ahead to forecast.
* `freq`: The frequency of the time series in Pandas format.
* `level`: Percentile of the score distribution at which the threshold is set, controlling how strictly anomalies are flagged. Defaults to 99.
* `detection_size`: The number of steps at the end of the time series to analyze for anomalies.
```python theme={null} anomaly_online = nixtla_client.detect_anomalies_online( df, time_col='ts', target_col='y', freq='min', # Specify the frequency of the data h=10, # Specify the forecast horizon level=99, # Set the confidence level for anomaly detection detection_size=100 # Number of steps to analyze for anomalies ) anomaly_online.tail() ``` ```bash Log Output theme={null} INFO:nixtla.nixtla_client:Validating inputs... INFO:nixtla.nixtla_client:Preprocessing dataframes... INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint... ``` View last 5 anomaly detections: | unique\_id | ts | y | TimeGPT | anomaly | anomaly\_score | TimeGPT-hi-99 | TimeGPT-lo-99 | | ------------------ | ------------------- | -------- | -------- | ------- | -------------- | ------------- | ------------- | | machine-1-1\_y\_29 | 2020-02-01 22:11:00 | 0.606017 | 0.544625 | True | 18.463266 | 0.553161 | 0.536090 | | machine-1-1\_y\_29 | 2020-02-01 22:12:00 | 0.044413 | 0.570869 | True | -158.933850 | 0.579404 | 0.562333 | | machine-1-1\_y\_29 | 2020-02-01 22:13:00 | 0.038682 | 0.560303 | True | -157.474880 | 0.568839 | 0.551767 | | machine-1-1\_y\_29 | 2020-02-01 22:14:00 | 0.024355 | 0.521797 | True | -150.178240 | 0.530333 | 0.513261 | | machine-1-1\_y\_29 | 2020-02-01 22:15:00 | 0.044413 | 0.467860 | True | -127.848560 | 0.476396 | 0.459325 | ![Identified Anomalies](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/online-anomaly-detection/01_quickstart_files/figure-markdown_strict/cell-13-output-1.png) From the plot, we observe that the anomalous period is promptly detected. Here we use a detection size of 100 to illustrate the anomaly detection process. In production, running detections more frequently with smaller detection sizes can help identify anomalies as soon as they occur. 
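Once a detection call returns, a typical next step is to pull out the flagged rows and rank them. The sketch below re-creates the five rows from the table above by hand, so it runs without an API key. The `direction` column and the name `last_rows` are my own additions for illustration, not part of the returned DataFrame.

```python
import numpy as np
import pandas as pd

# The five rows shown in the table above, typed in manually.
last_rows = pd.DataFrame({
    "ts": pd.to_datetime([
        "2020-02-01 22:11:00", "2020-02-01 22:12:00", "2020-02-01 22:13:00",
        "2020-02-01 22:14:00", "2020-02-01 22:15:00",
    ]),
    "y": [0.606017, 0.044413, 0.038682, 0.024355, 0.044413],
    "anomaly": [True, True, True, True, True],
    "anomaly_score": [18.463266, -158.933850, -157.474880, -150.178240, -127.848560],
    "TimeGPT-hi-99": [0.553161, 0.579404, 0.568839, 0.530333, 0.476396],
    "TimeGPT-lo-99": [0.536090, 0.562333, 0.551767, 0.513261, 0.459325],
})

# Keep flagged rows and record on which side of the interval each one falls.
flagged = last_rows[last_rows["anomaly"]].copy()
flagged["direction"] = np.where(flagged["y"] > flagged["TimeGPT-hi-99"], "above", "below")

# Rank by the absolute anomaly score to find the most abnormal step.
worst = flagged.loc[flagged["anomaly_score"].abs().idxmax()]
print(flagged[["ts", "y", "anomaly_score", "direction"]])
print("Most abnormal step:", worst["ts"])
```

In these five rows the sign of `anomaly_score` happens to match the side of the interval the observation falls on, but treat that as an observation about this sample rather than a documented guarantee.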
## Frequently Asked Questions **What's the difference between online and historical anomaly detection?** Online detection analyzes recent data windows for immediate alerting, while historical detection analyzes complete datasets for pattern discovery. **Can I adjust detection sensitivity?** Yes, tune the `level` parameter (confidence threshold) and `detection_size` (analysis window) to control false positive rates. ## Next Steps Now that you've detected your first anomalies in real-time, explore these guides to optimize your detection: * [Controlling the Anomaly Detection Process](/docs/anomaly_detection/real-time/adjusting_detection) - Learn how to fine-tune key parameters for more accurate detection * [Local vs Global Anomaly Detection](/docs/anomaly_detection/real-time/univariate_multivariate) - Choose the right detection strategy for single vs multiple correlated time series # Local vs Global Anomaly Detection Source: https://nixtla.io/docs/anomaly_detection/real-time/univariate_multivariate Compare local vs global anomaly detection methods for time series. Learn when to use univariate detection for independent metrics vs multivariate detection for correlated server data with Python examples. ## Overview When monitoring multiple time series simultaneously, such as server metrics (CPU, memory, disk I/O), you need to choose between local and global anomaly detection strategies. This guide demonstrates: * **Local (Univariate) Detection**: Analyzing each time series independently for isolated metric anomalies * **Global (Multivariate) Detection**: Analyzing all time series collectively to detect system-wide failures Both methods use TimeGPT's `detect_anomalies_online` with the `threshold_method` parameter. The main difference is whether anomalies are identified individually per series (local) or collectively across multiple correlated series (global). 
For an introduction to real-time anomaly detection, see our [Real-Time Anomaly Detection guide](/docs/anomaly_detection/real-time/introduction). To learn about parameter tuning, check out [Controlling the Anomaly Detection Process](/docs/anomaly_detection/real-time/adjusting_detection). ## When to Use Each Method ### Use Local Detection When: * Monitoring independent, uncorrelated metrics * Each metric has distinct baseline behavior * You need low computational overhead * False positives in individual series are acceptable ### Use Global Detection When: * Monitoring correlated server or system metrics * System-wide failures affect multiple metrics simultaneously * You need to detect coordinated anomalies (e.g., CPU spike + memory spike + network spike) * Reducing false positives by considering metric relationships ## How to Detect Anomalies Across Multiple Time Series [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/online-anomaly-detection/03_univariate_vs_multivariate_anomaly_detection.ipynb) ### Step 1: Set Up Your Environment Import dependencies that you will use in the tutorial. ```python theme={null} import numpy as np import pandas as pd import matplotlib.pyplot as plt from nixtla import NixtlaClient ``` Create a NixtlaClient instance. Replace 'my\_api\_key\_provided\_by\_nixtla' with your actual API key. ```python theme={null} nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' ) ``` ### Step 2: Load the Dataset This tutorial uses the SMD (Server Machine Dataset), a benchmark dataset for anomaly detection across multiple time series. SMD monitors abnormal patterns in server machine data. We analyze monitoring data from a single server (machine-1-1) containing 38 time series. Each series represents a different server metric: CPU usage, memory usage, disk I/O, network throughput, and other system performance indicators. 
```python theme={null} df = pd.read_csv( 'https://datasets-nixtla.s3.us-east-1.amazonaws.com/SMD_test.csv', parse_dates=['ts'] ) df.unique_id.nunique() ``` Output: ```bash theme={null} 38 ``` ### Step 3: Local and Global Anomaly Detection Methods #### Method Comparison | Aspect | Local (Univariate) | Global (Multivariate) | | ------------------------- | ------------------------------- | --------------------------------- | | **Analysis Scope** | Individual series | All series collectively | | **Best For** | Independent metrics | Correlated metrics | | **Computational Cost** | Low | Higher | | **System-wide Anomalies** | May miss | Detects effectively | | **Parameter** | `threshold_method='univariate'` | `threshold_method='multivariate'` | #### Step 3.1: Local Method Local anomaly detection analyzes each time series in isolation, flagging anomalies based on each series' individual deviation from its expected behavior. This approach is efficient for individual metrics or when correlations between metrics are not relevant. However, it may miss large-scale, system-wide anomalies that are only apparent when multiple series deviate simultaneously. Example usage: ```python theme={null} anomaly_online = nixtla_client.detect_anomalies_online( df[['ts', 'y', 'unique_id']], time_col='ts', target_col='y', freq='h', h=24, level=95, detection_size=475, threshold_method='univariate' # local anomaly detection ) ``` Log output: ```bash theme={null} INFO:nixtla.nixtla_client:Validating inputs... INFO:nixtla.nixtla_client:Preprocessing dataframes... WARNING:nixtla.nixtla_client:Detection size is large. Using the entire series to compute the anomaly threshold... INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint... 
``` Visualize the anomalies: ```python theme={null} # Utility function to plot anomalies def plot_anomalies(df, unique_ids, rows, cols): fig, axes = plt.subplots(rows, cols, figsize=(12, rows * 2)) for i, (ax, uid) in enumerate(zip(axes.flatten(), unique_ids)): filtered_df = df[df['unique_id'] == uid] ax.plot(filtered_df['ts'], filtered_df['y'], color='navy', alpha=0.8, label='y') ax.plot(filtered_df['ts'], filtered_df['TimeGPT'], color='orchid', alpha=0.7, label='TimeGPT') ax.scatter( filtered_df.loc[filtered_df['anomaly'] == 1, 'ts'], filtered_df.loc[filtered_df['anomaly'] == 1, 'y'], color='orchid', label='Anomalies Detected' ) ax.set_title(f"Unique_id: {uid}", fontsize=8) ax.tick_params(axis='x', labelsize=6) fig.legend(loc='upper center', ncol=3, fontsize=8, labels=['y', 'TimeGPT', 'Anomaly']) plt.tight_layout(rect=[0, 0, 1, 0.95]) plt.show() display_ids = ['machine-1-1_y_0', 'machine-1-1_y_1', 'machine-1-1_y_6', 'machine-1-1_y_29'] plot_anomalies(anomaly_online, display_ids, rows=2, cols=2) ``` ![Local Anomaly Detection Results](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/online-anomaly-detection/03_univariate_vs_multivariate_anomaly_detection_files/figure-markdown_strict/cell-13-output-1.png) *This figure highlights anomalies detected in four selected metrics. Each metric is analyzed independently, so anomalies reflect unusual behavior within that series alone.* #### Step 3.2: Global Method Global anomaly detection considers all time series collectively, flagging a time step as anomalous if the aggregate deviation across all series at that time exceeds a threshold. This approach captures systemic or correlated anomalies that might be missed when analyzing each series in isolation. However, it comes with slightly higher complexity and computational overhead, and may require careful threshold tuning. 
Example usage: ```python theme={null} anomaly_online_multi = nixtla_client.detect_anomalies_online( df[['ts', 'y', 'unique_id']], time_col='ts', target_col='y', freq='h', h=24, level=95, detection_size=475, threshold_method='multivariate' # global anomaly detection ) ``` Log output: ```bash theme={null} INFO:nixtla.nixtla_client:Validating inputs... INFO:nixtla.nixtla_client:Preprocessing dataframes... WARNING:nixtla.nixtla_client:Detection size is large. Using the entire series to compute the anomaly threshold... INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint... ``` Visualize the anomalies: ```python theme={null} plot_anomalies(anomaly_online_multi, display_ids, rows=2, cols=2) ``` ![Global Anomaly Detection Results](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/online-anomaly-detection/03_univariate_vs_multivariate_anomaly_detection_files/figure-markdown_strict/cell-15-output-1.png) *In global mode, an anomaly is flagged when the combined deviation across these series reaches a threshold. This can reveal system-wide anomalies.* In global anomaly detection, anomaly scores from all series at each time step are aggregated. A step is anomalous if the combined score exceeds the threshold. This reveals systemic anomalies that may go unnoticed if each series is considered alone. 
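To compare the two strategies, it can help to count how many steps each one flags per series. The sketch below assumes both result DataFrames share the `unique_id`, `ts`, and `anomaly` columns; the helper `summarize_flags` and the toy flags are hypothetical and invented purely for illustration, so they say nothing about how TimeGPT actually flags this dataset.

```python
import pandas as pd

def summarize_flags(local_df: pd.DataFrame, global_df: pd.DataFrame) -> pd.DataFrame:
    """Count flagged steps per series for the local and global results."""
    keys = ['unique_id', 'ts']
    merged = local_df[keys + ['anomaly']].merge(
        global_df[keys + ['anomaly']],
        on=keys,
        suffixes=('_local', '_global'),
    )
    counts = merged.groupby('unique_id')[['anomaly_local', 'anomaly_global']].sum()
    return counts.astype(int)

# Toy flags invented for illustration; with real results you would pass
# `anomaly_online` and `anomaly_online_multi` instead.
ts = pd.date_range('2020-01-01', periods=3, freq='D')
toy_local = pd.DataFrame({
    'unique_id': ['a'] * 3 + ['b'] * 3,
    'ts': list(ts) * 2,
    'anomaly': [False, True, False, False, False, True],
})
toy_global = toy_local.assign(anomaly=[False, True, False, False, True, False])

summary = summarize_flags(toy_local, toy_global)
print(summary)
```

Series whose local and global counts differ markedly are good candidates for a closer look in the plots above.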
## Real-World Use Cases ### Local Detection Examples: * **Independent application metrics**: Response time, error rates, request counts for different microservices * **IoT sensor networks**: Temperature sensors at different locations with no correlation * **Business metrics**: Sales figures across different product categories ### Global Detection Examples: * **Server monitoring**: CPU, memory, disk I/O, and network metrics from the same server * **Distributed system health**: Correlated metrics across multiple nodes indicating cluster-wide issues * **Manufacturing equipment**: Multiple sensor readings from a single machine indicating equipment failure ## Summary * **Local:** Best for detecting anomalies in a single metric or uncorrelated metrics. Low computational overhead, but may overlook cross-series patterns. * **Global:** Considers correlations across metrics, capturing system-wide issues. More complex and computationally intensive than local methods. Both detection approaches use Nixtla's online anomaly detection method. Choose the strategy that best fits your use case and data characteristics. ## Frequently Asked Questions **What's the difference between univariate and multivariate anomaly detection?** Univariate (local) detection analyzes each time series independently using the `threshold_method='univariate'` parameter, while multivariate (global) detection analyzes all series together using `threshold_method='multivariate'`, considering correlations between metrics. **When should I use global detection instead of local?** Use global detection when your time series are correlated and system-wide failures affect multiple metrics simultaneously, such as monitoring CPU, memory, and network metrics from the same server. **Does global detection increase computational cost?** Yes, global detection requires analyzing relationships across all time series, making it more computationally intensive. 
However, it can reduce overall false positives by considering metric correlations.

**Can I run both local and global detection?** Yes, you can run both methods and compare results. Local detection may catch metric-specific anomalies while global detection identifies system-wide issues.

# Delete Fine-tuned Model

Source: https://nixtla.io/docs/api-reference/delete-fine-tuned-model

/openapi.json delete /v2/finetuned_models/{finetuned_model_id}
Delete a previously saved finetuned model. It takes the ID of the model that you want to delete as a path parameter.

# Foundational Time Series Model Multi Series

Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-multi-series

/openapi.json post /v2/forecast
Based on the provided data, this endpoint predicts the future values of multiple time series at once. It takes a JSON as an input containing information like the series frequency and historical data. (See below for a full description of the parameters.) The response contains the predicted values for each series based on the input arguments. Get your token for private beta at https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/api-reference.

# Foundational Time Series Model Multi Series Anomaly Detector

Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-multi-series-anomaly-detector

/openapi.json post /v2/anomaly_detection
Based on the provided data, this endpoint detects anomalies in the historical period of multiple time series at once. It takes a JSON as an input containing information like the series frequency and historical data. (See below for a full description of the parameters.) The response contains a flag indicating whether each date has an anomaly and also provides the prediction interval used to decide whether an observation is an anomaly. Get your token for private beta at https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/api-reference.
# Foundational Time Series Model Multi Series Cross Validation Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-multi-series-cross-validation /openapi.json post /v2/cross_validation Perform Cross Validation for multiple series # Foundational Time Series Model Multi Series Finetuning Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-multi-series-finetuning /openapi.json post /v2/finetune Fine-tune the large time model to your data and save it for later use. It takes a JSON as an input containing information like the series frequency and historical data. (See below for a full description of the parameters.) The response contains the ID of the finetuned model, which you can provide in other endpoints to use that model to make the forecasts. Get your token for private beta at https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/api-reference. # Foundational Time Series Model Multi Series Historic (Deprecated) Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-multi-series-historic-deprecated /openapi.json post /v2/historic_forecast **Deprecated:** This endpoint is deprecated and will be removed in a future release. Please use [`/v2/cross_validation`](#tag/default/POST/v2/cross_validation) instead, which offers equivalent in-sample evaluation capabilities through rolling-window cross validation. Based on the provided data, this endpoint predicts the in-sample period (historical period) values of multiple time series at once. It takes a JSON as an input containing information like the series frequency and historical data. (See below for a full description of the parameters.) The response contains the predicted values for the historical period. Usually useful for anomaly detection. Get your token for private beta at https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/api-reference. 
# Foundational Time Series Model Online Multi Series Anomaly Detector Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-online-multi-series-anomaly-detector /openapi.json post /v2/online_anomaly_detection This endpoint performs online anomaly detection based on the provided data. It uses cross-validation for more robust detection of anomalies and it supports detection for univariate and multivariate scenarios. It takes a JSON as an input containing information like the series frequency and historical data. (See below for a full description of the parameters.) The response contains a flag indicating if the date has an anomaly, it provides the prediction interval used to define if an observation is an anomaly, and it reports the associated z-score for each point. Get your token for private beta at https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/api-reference. # Get single Fine-tuned Model Source: https://nixtla.io/docs/api-reference/get-single-fine-tuned-model /openapi.json get /v2/finetuned_models/{finetuned_model_id} Retrieve metadata for a previously fine-tuned model. The response contains the metadata of a model that you have fine-tuned and is available to make forecasts. # List Fine-tuned Models Source: https://nixtla.io/docs/api-reference/list-fine-tuned-models /openapi.json get /v2/finetuned_models List all the finetuned models that you have created. The response contains a list with the IDs of the models that you have fine-tuned and are available to make forecasts. # Validate Api Key Source: https://nixtla.io/docs/api-reference/validate-api-key /openapi.json get /validate_api_key # Audit and Clean Data Source: https://nixtla.io/docs/data_requirements/audit_clean Learn how to audit and clean your data with TimeGPT. The `audit_data` and `clean_data` methods from TimeGPT can help you identify and fix potential issues in your data. 
The `audit_data` method checks for common problems such as duplicates, missing dates, categorical columns, negative values, and leading zeros. While not all issues will result in errors, addressing them can improve the quality of the forecasts, depending on your specific use case. Once identified, `clean_data` can be used to automatically fix these issues. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/24_audit_data.ipynb) ## How to Use the Audit and Clean Methods ### Step 1: Import Packages To use the `audit_data` and `clean_data` methods, you first need to import and instantiate the `NixtlaClient` class. ```python theme={null} import pandas as pd from nixtla import NixtlaClient nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' # defaults to os.environ.get("NIXTLA_API_KEY") ) ``` ### Step 2: Create Minimal Example The `audit_data` method performs a series of checks to identify issues in your data. These checks fall into two categories:
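
| Check Type    | Description                                                               | Checks Performed                                                                |
| ------------- | ------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
| Fail          | Issues that will cause errors when you run TimeGPT                        | Duplicate rows (D001), Missing dates (D002), Categorical feature columns (F001) |
| Case-specific | Issues that may not cause errors but could negatively affect your results | Negative values (V001), Leading zeros (V002)                                    |
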
To show how the `audit_data` method works, we will create a sample dataset with missing dates, negative values and leading zeros. ```python theme={null} df = pd.DataFrame({ 'unique_id': ['id1', 'id1', 'id1', 'id2', 'id2', 'id2', 'id2', 'id3', 'id3', 'id3', 'id3'], 'ds': ['2023-01-01', '2023-01-03', '2023-01-04', '2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'], 'y': [1, 1, 1, 0, 0, 1, 2, -1, 0, 1, -2] }) df ``` | unique\_id | ds | y | | ---------- | ---------- | -- | | id1 | 2023-01-01 | 1 | | id1 | 2023-01-03 | 1 | | id1 | 2023-01-04 | 1 | | id2 | 2023-01-01 | 0 | | id2 | 2023-01-02 | 0 | | id2 | 2023-01-03 | 1 | | id2 | 2023-01-04 | 2 | | id3 | 2023-01-01 | -1 | | id3 | 2023-01-02 | 0 | | id3 | 2023-01-03 | 1 | | id3 | 2023-01-04 | -2 | ### Step 3: Audit Data The `audit_data` method requires the following parameters: * `df` *(required)*: A pandas DataFrame with your input data. * `freq` *(required)*: The frequency of your time series data (e.g., `D` for daily, `M` for monthly). * `id_col`: Column name identifying each unique series. Default is `unique_id`. * `time_col`: Column name containing timestamps. Default is `ds`. * `target_col`: Column name containing the target variable. Default is `y`. Additionally, you can use the following optional parameters to specify how missing dates are identified: * `start`: The initial timestamp for the series. * `end`: The final timestamp for the series. Both `start` and `end` can take the following options: * `per_serie`: Uses the first or last timestamp of each individual series. * `global`: Uses the earliest or latest timestamp from the entire dataset. * A specific timestamp or integer (e.g., `2025-01-01`, `2025`, or `datetime(2025, 1, 1)`). 
```python theme={null} all_pass, fail_dfs, case_specific_dfs = nixtla_client.audit_data( df = df, freq = 'D', start = 'per_serie', end = 'per_serie' ) ``` The audit\_data method returns three values: * **all\_pass** (bool): True if every check passed, otherwise False. * **fail\_dfs** (dict): Any failed tests (D001, D002 or F001), each paired with the rows that failed. * **case\_specific\_dfs** (dict): Any case-specific tests (V001 or V002), each paired with the rows flagged. In the example above, the `audit_data` method should find missing dates (D002), negative values (V001), and leading zeros (V002). ### Step 4. Clean Data The `clean_data` method fixes the issues identified by the `audit_data` method. It requires the output of `audit_data`, so it must always be run after it. The `clean_data` method takes the following parameters: * `df` *(required)*: A pandas DataFrame with your input data. * `fail_dict` *(required)*: A dictionary with failed checks, as returned by the `audit_data` method. * `case_specific_dict` *(required)*: A dictionary with case-specific checks, also returned by the `audit_data` method. * `freq` *(required)*: The frequency of your time series data (e.g., `D` for daily, `M` for monthly). Can be a string, integer, or pandas offset. * `clean_case_specific`: Whether to clean case-specific issues (e.g., negative values, leading zeros). Default is `False`. * `id_col`: Column name identifying each unique series. Default is `unique_id`. * `time_col`: Column name containing timestamps or integer steps. Default is `ds`. * `target_col`: Column name containing the target variable. Default is `y`. 
```python theme={null} clean_df, all_pass, fail_dfs, case_specific_dfs = nixtla_client.clean_data( df = df, fail_dict = fail_dfs, case_specific_dict = case_specific_dfs, clean_case_specific = True, freq = 'D' ) clean_df ``` | unique\_id | ds | y | | ---------- | ---------- | --- | | id1 | 2023-01-01 | 1.0 | | id1 | 2023-01-03 | 1.0 | | id1 | 2023-01-04 | 1.0 | | id1 | 2023-01-02 | NaN | | id2 | 2023-01-03 | 1.0 | | id2 | 2023-01-04 | 2.0 | | id3 | 2023-01-01 | 0.0 | | id3 | 2023-01-02 | 0.0 | | id3 | 2023-01-03 | 1.0 | | id3 | 2023-01-04 | 0.0 | In this example, `clean_data` added the missing date in `id1`, removed the leading zeros in `id2`, and replaced the negative values in `id3`. However, replacing negative values with zeros introduced new leading zeros in `id3`, so a second run of `clean_data` is required. ```python theme={null} clean_df2, all_pass, fail_dfs, case_specific_dfs = nixtla_client.clean_data( df = clean_df, fail_dict = fail_dfs, case_specific_dict = case_specific_dfs, clean_case_specific = True, # if False, the case-specific tests will be ignored freq = 'D' ) clean_df2 ``` | unique\_id | ds | y | | ---------- | ---------- | --- | | id1 | 2023-01-01 | 1.0 | | id1 | 2023-01-03 | 1.0 | | id1 | 2023-01-04 | 1.0 | | id1 | 2023-01-02 | NaN | | id2 | 2023-01-03 | 1.0 | | id2 | 2023-01-04 | 2.0 | | id3 | 2023-01-03 | 1.0 | | id3 | 2023-01-04 | 0.0 | After the second run of `clean_data`, the leading zeros in `id3` have been removed. The only remaining step is to fill the missing value created when the missing date was added in `id1`, and to sort the DataFrame by `unique_id` and `ds`. 
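A minimal sketch of that final step, assuming the missing value should be filled with zero (which matches the table that follows). It rebuilds `clean_df2` by hand so that it runs without an API key, and the name `clean_df3` is only illustrative. Whether zero, interpolation, or leaving `NaN` is appropriate depends on your data.

```python
import numpy as np
import pandas as pd

# Hand-built copy of `clean_df2` from the table above (illustrative only).
clean_df2 = pd.DataFrame({
    'unique_id': ['id1', 'id1', 'id1', 'id1', 'id2', 'id2', 'id3', 'id3'],
    'ds': ['2023-01-01', '2023-01-03', '2023-01-04', '2023-01-02',
           '2023-01-03', '2023-01-04', '2023-01-03', '2023-01-04'],
    'y': [1.0, 1.0, 1.0, np.nan, 1.0, 2.0, 1.0, 0.0],
})

clean_df3 = (
    clean_df2
    .fillna({'y': 0.0})                # assumption: a missing day means zero
    .sort_values(['unique_id', 'ds'])  # restore chronological order per series
    .reset_index(drop=True)
)
print(clean_df3)
```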
| unique\_id | ds | y | | ---------- | ---------- | --- | | id1 | 2023-01-01 | 1.0 | | id1 | 2023-01-02 | 0.0 | | id1 | 2023-01-03 | 1.0 | | id1 | 2023-01-04 | 1.0 | | id2 | 2023-01-03 | 1.0 | | id2 | 2023-01-04 | 2.0 | | id3 | 2023-01-03 | 1.0 | | id3 | 2023-01-04 | 0.0 | ## Conclusion The `audit_data` method helps you identify issues that may prevent TimeGPT from running properly. These include fail tests (duplicate rows, missing dates, and categorical feature columns), which will always result in errors if not addressed. It also flags case-specific issues (negative values and leading zeros), which may not cause errors but can affect the quality of your forecasts depending on your use case. The `clean_data` method can automatically fix the issues identified by `audit_data`. Be cautious when removing negative values or leading zeros, as they may contain important information about your data. Above all, when auditing and cleaning your data, make decisions based on the needs and context of your specific use case. # Data Requirements Source: https://nixtla.io/docs/data_requirements/data_requirements Overview of the data format and requirements for TimeGPT forecasting. TimeGPT accepts **pandas** and **polars** dataframes in [long format](https://www.theanalysisfactor.com/wide-and-long-data/#comments). The minimum required columns are: * **unique\_id**: String or numerical value to label each series. * **ds**(timestamp): String or datetime in `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS` format. * **y**(numeric): Numerical target variable to forecast. If a DataFrame lacks the `ds` column but uses a **DatetimeIndex**, that is also supported. TimeGPT also supports distributed dataframe libraries such as **dask**, **spark**, and **ray**. 
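If your data starts in wide format (one column per series), pandas can reshape it into the long format described above. This is a minimal sketch; the column names `store_a` and `store_b` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical wide table: one timestamp column and one column per series.
wide = pd.DataFrame({
    'ds': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03']),
    'store_a': [10, 12, 11],
    'store_b': [7, 9, 8],
})

# Reshape to long format: one row per (series, timestamp) pair.
long_df = (
    wide
    .melt(id_vars='ds', var_name='unique_id', value_name='y')
    .sort_values(['unique_id', 'ds'])
    .reset_index(drop=True)
)[['unique_id', 'ds', 'y']]

print(long_df)
```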
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/getting-started/5_data_requirements.ipynb)

You can include additional exogenous features in the same DataFrame. See the [Exogenous Variables tutorial](/docs/forecasting/exogenous-variables/numeric_features) for details.

***

## Example DataFrame

Below is a sample of a valid input DataFrame for TimeGPT (with columns named `timestamp` and `value` instead of `ds` and `y`):

```python Sample Data Loading theme={null}
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df["unique_id"] = "series1"
df.head()
```

**Sample Data Preview**

| **unique\_id** | **timestamp** | **value** |
| -------------- | ------------- | --------- |
| series1        | 1949-01-01    | 112       |
| series1        | 1949-02-01    | 118       |
| series1        | 1949-03-01    | 132       |
| series1        | 1949-04-01    | 129       |
| series1        | 1949-05-01    | 121       |

In this example:

* `unique_id` identifies the series.
* `timestamp` corresponds to `ds`.
* `value` corresponds to `y`.

***

## Matching Columns to TimeGPT

You can align your DataFrame columns with TimeGPT’s expected structure in either of two ways.

The first is to rename `timestamp` to `ds` and `value` to `y`:

```python Rename Columns Example theme={null}
df = df.rename(columns={'timestamp': 'ds', 'value': 'y'})
```

Now your DataFrame has the explicitly required columns:

```python Show Head of DataFrame theme={null}
print(df.head())
```

The second is to keep the original names and specify them directly when calling a `NixtlaClient` method such as `forecast`:

```python NixtlaClient Forecast Example theme={null}
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla')
fcst = nixtla_client.forecast(
    df=df,
    h=12,
    time_col='timestamp',
    target_col='value'
)
fcst.head()
```

This way, you don’t need to rename your DataFrame columns, as TimeGPT will know which ones to treat as `ds` and `y`.
*** ## Example Forecast When you run the forecast method: ```python Forecast Example theme={null} fcst = nixtla_client.forecast( df=df, h=12, time_col='timestamp', target_col='value' ) fcst.head() ``` ```bash Forecast Logs theme={null} INFO:nixtla.nixtla_client:Validating inputs... INFO:nixtla.nixtla_client:Inferred freq: MS INFO:nixtla.nixtla_client:Preprocessing dataframes... INFO:nixtla.nixtla_client:Querying model metadata... INFO:nixtla.nixtla_client:Restricting input... INFO:nixtla.nixtla_client:Calling Forecast Endpoint... ``` | unique\_id | timestamp | TimeGPT | | ---------- | ---------- | --------- | | series1 | 1961-01-01 | 437.83792 | | series1 | 1961-02-01 | 426.06270 | | series1 | 1961-03-01 | 463.11655 | | series1 | 1961-04-01 | 478.24450 | | series1 | 1961-05-01 | 505.64648 | TimeGPT attempts to automatically infer your data’s frequency (`freq`). You can override this by specifying the **freq** parameter (e.g., `freq='MS'`). For more information, see the [TimeGPT Quickstart](/docs/forecasting/timegpt_quickstart). *** ## Important Considerations **Warning:** Data passed to TimeGPT must not contain missing values or time gaps. To handle missing data, see [Dealing with Missing Values in TimeGPT](/docs/data_requirements/missing_values). *** ### Minimum Data Requirements (Azure AI) These are the minimum data sizes required for each frequency when using Azure AI: | Frequency | Minimum Size | | -------------------------------- | ------------ | | Hourly and subhourly (e.g., "H") | 1008 | | Daily ("D") | 300 | | Weekly (e.g., "W-MON") | 64 | | Monthly and others | 48 | When preparing your data, also consider: Number of future periods you want to predict. How many times to test the model's performance. Periodic offset between validation windows during cross-validation. This ensures you have enough data for both training and evaluation. 
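The minimum sizes in the table above can be checked before sending data. The following is a rough sketch, assuming the minimum applies to each series individually; the helper `series_below_minimum` and the category keys in `MIN_SIZE` are hypothetical and not part of the Nixtla client.

```python
import pandas as pd

# Minimum sizes copied from the Azure AI table above; the keys are our own labels.
MIN_SIZE = {'hourly': 1008, 'daily': 300, 'weekly': 64, 'other': 48}

def series_below_minimum(df: pd.DataFrame, category: str, id_col: str = 'unique_id') -> pd.Series:
    """Return the length of every series shorter than the minimum for `category`."""
    lengths = df.groupby(id_col).size()
    return lengths[lengths < MIN_SIZE[category]]

# Toy monthly data: one series with 60 observations, one with only 12.
toy = pd.DataFrame({
    'unique_id': ['long'] * 60 + ['short'] * 12,
    'ds': list(pd.date_range('2015-01-01', periods=60, freq='MS'))
          + list(pd.date_range('2015-01-01', periods=12, freq='MS')),
    'y': list(range(72)),
})

too_short = series_below_minimum(toy, 'other')
print(too_short)
```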
# Missing Values

Source: https://nixtla.io/docs/data_requirements/missing_values

Learn how to handle missing values in time series data for accurate forecasting with TimeGPT.

## Missing Values in Time Series

TimeGPT can handle missing values in your target series, but it needs a continuous series of timestamps. While you may have multiple series starting and ending on different dates, each one must maintain a continuous date sequence. Any unobserved values in your target series can be labelled as `NaN`. Whenever possible, we recommend filling missing values by interpolation or another method that makes sense in your particular context.

This tutorial shows you how to handle missing values for use with TimeGPT.

For reference, this tutorial is based on the skforecast tutorial: [Forecasting Time Series with Missing Values](https://cienciadedatos.net/documentos/py46-forecasting-time-series-missing-values).

Handling missing values carefully helps keep your TimeGPT forecasts accurate and reliable. When dates or values are missing, fill or interpolate them according to the nature of your dataset. If values cannot be filled, you can label them as `NaN`.

## Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/15_missing_values.ipynb)

### Step 1: Load Data

Load the daily bike rental counts dataset using pandas. Note that the original column names are in Spanish; you will rename them to match `ds` and `y`.
```python theme={null} import pandas as pd df = pd.read_csv('https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/master/data/usuarios_diarios_bicimad.csv') df = df[['fecha', 'Usos bicis total día']] df.rename(columns={'fecha': 'ds', 'Usos bicis total día': 'y'}, inplace=True) df.head() ``` | | ds | y | | - | ---------- | --- | | 0 | 2014-06-23 | 99 | | 1 | 2014-06-24 | 72 | | 2 | 2014-06-25 | 119 | | 3 | 2014-06-26 | 135 | | 4 | 2014-06-27 | 149 | Next, convert your dates to timestamps and assign a unique identifier (`unique_id`) to handle multiple series if needed: ```python theme={null} df['ds'] = pd.to_datetime(df['ds']) df['unique_id'] = 'id1' df = df[['unique_id', 'ds', 'y']] ``` Reserve the last 93 days for testing: ```python theme={null} train_df = df[:-93] test_df = df[-93:] ``` To simulate missing data, remove specific date ranges from the training dataset: ```python theme={null} mask = ~((train_df['ds'] >= '2020-09-01') & (train_df['ds'] <= '2020-10-10')) & \ ~((train_df['ds'] >= '2020-11-08') & (train_df['ds'] <= '2020-12-15')) train_df_gaps = train_df[mask] ``` ### Step 2: Initialize TimeGPT Initialize a `NixtlaClient` object with your Nixtla API key: ```python theme={null} from nixtla import NixtlaClient nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla') ``` ### Step 3: Visualize Data Plot your dataset and examine the gaps introduced above: ```python theme={null} nixtla_client.plot(train_df_gaps) ``` ![Chart Image](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-14-output-1.png) Note that there are two gaps in the data: from September 1, 2020, to October 10, 2020, and from November 8, 2020, to December 15, 2020. To better visualize these gaps, you can use the `max_insample_length` argument of the `plot` method or you can simply zoom in on the plot. 
```python theme={null}
nixtla_client.plot(train_df_gaps, max_insample_length=800)
```

![Chart Image](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-15-output-1.png)

Additionally, notice a period from March 16, 2020, to April 21, 2020, where the data shows zero rentals. These are not missing values, but actual zeros corresponding to the COVID-19 lockdown in the city.

### Step 4: Fill Missing Dates

Before using TimeGPT, we need to ensure that **all timestamps** from the start date to the end date are present in the data. Values in the target column may be `NaN`, but the timestamps themselves cannot be missing.

To insert the missing timestamps, we will use the `fill_gaps` function from `utilsforecast`, a Python package from Nixtla that provides essential utilities for time series forecasting, such as functions for data preprocessing, plotting, and evaluation.

The `fill_gaps` function will fill in the missing dates in the data. To do this, it requires the following arguments:

* `df`: The DataFrame containing the time series data.
* `freq` (str or int): The frequency of the data.

```python theme={null}
from utilsforecast.preprocessing import fill_gaps

print('Number of rows before filling gaps:', len(train_df_gaps))
train_df_complete = fill_gaps(train_df_gaps, freq='D')
print('Number of rows after filling gaps:', len(train_df_complete))
```

```bash theme={null}
Number of rows before filling gaps: 2851
Number of rows after filling gaps: 2929
```

> NOTE: In this tutorial, the data contains only one time series. However, TimeGPT
> supports passing multiple series to the model. In this case, none of the time
> series can have missing values from their individual earliest timestamp until
> their individual latest timestamp. If these individual time series have missing
> values, the user must decide how to fill these gaps for the individual time
> series. The `fill_gaps` function provides a couple of additional arguments to
> assist with this (refer to the documentation for complete details), namely
> `start` and `end`.

Now we need to decide whether to fill the missing values in the target column or not. In this tutorial, we decide to use interpolation, but it is important to consider the specific context of your data when selecting a filling strategy. For example, if you are dealing with daily retail data, a missing value most likely indicates that there were no sales on that day, and you can fill it with zero. Conversely, if you are working with hourly temperature data, a missing value probably means that the sensor was not functioning, and you might prefer to keep the value as `NaN`.

In this case, we will handle the newly inserted missing values by interpolation.

```python theme={null}
train_df_complete['y'] = train_df_complete['y'].interpolate(
    method='linear',
    limit_direction='both'
)
train_df_complete.isna().sum()
```

```bash theme={null}
unique_id    0
ds           0
y            0
dtype: int64
```

### Step 5: Forecast with TimeGPT

Typically, a horizon longer than twice the seasonal period is considered long. In this case, the data has a seasonality of 7 days and the horizon is 93 days. Since the horizon is far longer than twice the seasonal period, we will use the `timegpt-1-long-horizon` model.

```python theme={null}
fcst = nixtla_client.forecast(
    train_df_complete,
    h=len(test_df),
    model='timegpt-1-long-horizon'
)
```

Visualize the forecasts against the actual test data:

```python theme={null}
nixtla_client.plot(test_df, fcst)
```

![Forecast with Missing Data Filled](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/15_missing_values_files/figure-markdown_strict/cell-21-output-1.png)

Evaluate performance using `utilsforecast`.
We will use Mean Absolute Error (MAE) as the evaluation metric, but you can choose others such as MSE or RMSE:

```python theme={null}
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae

fcst['ds'] = pd.to_datetime(fcst['ds'])
result = test_df.merge(fcst, on=['ds', 'unique_id'], how='left')
evaluate(result, metrics=[mae])
```

|   | unique\_id | metric | TimeGPT     |
| - | ---------- | ------ | ----------- |
| 0 | id1        | mae    | 1824.693059 |

### Step 6: Conclusion

* Always ensure that your data is free of missing dates before forecasting with TimeGPT.
* Select a gap-filling strategy based on your domain knowledge (linear interpolation, constant filling, etc.).
* You may want to keep missing values as `NaN` if no gap-filling strategy makes sense in your context.

## References

* [Exclude COVID Impact in Time Series Forecasting](https://www.cienciadedatos.net/documentos/py45-weighted-time-series-forecasting.html)
* [Forecasting Time Series with Missing Values](https://cienciadedatos.net/documentos/py46-forecasting-time-series-missing-values.html)

# Multiple Time Series

Source: https://nixtla.io/docs/data_requirements/multiple_series

Learn how to pass multiple time series to TimeGPT in a single dataset.

You can pass multiple time series within the same dataset to TimeGPT. We can then make forecasts or detect anomalies on all series simultaneously.

To include multiple series, simply include a unique identifier column. By default, we expect this column to be called `unique_id`. The identifier column assigns a value to each series so that we can distinguish between them.

## Load Data with Multiple Series

Here is an example of loading a dataset with multiple series inside.
```python theme={null}
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df['ds'] = pd.to_datetime(df['ds'])
df = df[["unique_id", "ds", "y"]]
df.groupby('unique_id').head(1)
```

| unique\_id | ds         | y     |
| ---------- | ---------- | ----- |
| BE         | 2016-10-22 | 70.00 |
| DE         | 2017-10-22 | 19.10 |
| FR         | 2016-10-22 | 54.70 |
| NP         | 2018-10-15 | 2.17  |

Above, we can see that we have four unique series in the dataset, as there are four different values in `unique_id`. Note that each series can start on a different date.

To forecast multiple series, we can simply call:

```python Multiple Series Forecast Example theme={null}
# Assumes `nixtla_client` is an initialized NixtlaClient instance
fcst = nixtla_client.forecast(df=df, h=24)
fcst.head()
```

TimeGPT will produce forecasts for all unique IDs in your DataFrame simultaneously.

### Specifying the series identifier column

If the unique identifier is not stored in a column called `unique_id`, you can specify the name of the column when making a call to TimeGPT:

```python Specify the name of the column for the series identifier theme={null}
fcst = nixtla_client.forecast(df=df, h=24, id_col="your_column_name")
fcst.head()
```

***

## Exogenous Variables

TimeGPT supports the use of exogenous features. These are variables that are not part of the series you are trying to forecast.

For example, suppose that you are forecasting electricity consumption, which is affected by the temperature outside. In this case, the temperature is an exogenous feature, meaning that you want to use the information from the temperature to forecast the electricity consumption.

In such a case, exogenous features can be included as new columns in the dataset. Any column beyond the standard `unique_id`, `ds`, `y` format is treated as an exogenous feature.

Here is an example of loading a dataset with multiple series inside and exogenous features.
```python Multiple Series Data Loading theme={null}
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df['ds'] = pd.to_datetime(df['ds'])
df.groupby('unique_id').head(1)
```

| unique\_id | ds         | y     | Exogenous1 | Exogenous2 | day\_0 | day\_1 | day\_2 | day\_3 | day\_4 | day\_5 | day\_6 |
| ---------- | ---------- | ----- | ---------- | ---------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| BE         | 2016-10-22 | 70.00 | 57253.00   | 49593      | 0.0    | 0.0    | 0.0    | 0.0    | 0.0    | 1.0    | 0.0    |
| DE         | 2017-10-22 | 19.10 | 16972.75   | 15779      | 0.0    | 0.0    | 0.0    | 0.0    | 0.0    | 0.0    | 1.0    |
| FR         | 2016-10-22 | 54.70 | 57253.00   | 49593      | 0.0    | 0.0    | 0.0    | 0.0    | 0.0    | 1.0    | 0.0    |
| NP         | 2018-10-15 | 2.17  | 34078.00   | 1791       | 1.0    | 0.0    | 0.0    | 0.0    | 0.0    | 0.0    | 0.0    |

Above, we can see that the columns from `Exogenous1` to `day_6` will be treated as exogenous features when forecasting with TimeGPT.

For more information on forecasting with exogenous features, read the [Exogenous Variables tutorial](/docs/forecasting/exogenous-variables/numeric_features).

***

# Cross-validation Tutorial

Source: https://nixtla.io/docs/forecasting/evaluation/cross_validation

Master time series cross-validation with TimeGPT. Complete Python tutorial for model validation, rolling-window techniques, and prediction intervals with code examples.

## What is Cross-validation?

Time series cross-validation is essential for validating machine learning models and ensuring accurate forecasts. Unlike traditional k-fold cross-validation, time series validation requires specialized rolling-window techniques that respect temporal order. This comprehensive tutorial shows you how to perform cross-validation in Python using TimeGPT, including prediction intervals, exogenous variables, and model performance evaluation.
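To make the rolling-window idea concrete, here is a minimal sketch of how such windows can be laid out over a daily series. It is not TimeGPT's implementation: `rolling_window_splits` is a hypothetical helper written for illustration, and it assumes that the last window ends at the final observation and that the step between windows defaults to the horizon `h`.

```python
from datetime import date, timedelta


def rolling_window_splits(n_obs, h, n_windows, step_size=None):
    """Return (train_end, test_end) index pairs, oldest window first.

    Hypothetical helper for illustration only; not part of the nixtla package.
    Assumes the step between windows defaults to the horizon h.
    """
    step_size = h if step_size is None else step_size
    splits = []
    for i in range(n_windows):
        # The last window ends at the final observation; earlier windows
        # are shifted back by multiples of step_size.
        test_end = n_obs - (n_windows - 1 - i) * step_size
        train_end = test_end - h
        splits.append((train_end, test_end))
    return splits


# Thirty days of made-up daily timestamps
dates = [date(2024, 1, 1) + timedelta(days=i) for i in range(30)]
splits = rolling_window_splits(len(dates), h=7, n_windows=3)

for train_end, test_end in splits:
    cutoff = dates[train_end - 1]        # last timestamp available for training
    test = dates[train_end:test_end]     # the h timestamps being forecast
    print(f"cutoff={cutoff}  test={test[0]} to {test[-1]}")
```

Every test window lies strictly after its cutoff, so no future information leaks into training. That is the property that randomly shuffled k-fold splits would break.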
One of the primary challenges in time series forecasting is the inherent uncertainty and variability over time, making it crucial to validate the accuracy and reliability of the models employed. Cross-validation, a robust model validation technique, is particularly adapted for this task, as it provides insights into the expected performance of a model on unseen data, ensuring the forecasts are reliable and resilient before being deployed in real-world scenarios.

TimeGPT incorporates the `cross_validation` method, designed to streamline the validation process for [time series forecasting models](/docs/forecasting/timegpt_quickstart). This functionality enables practitioners to rigorously test their forecasting models against historical data, with support for [prediction intervals](/docs/forecasting/probabilistic/prediction_intervals) and [exogenous variables](/docs/forecasting/exogenous-variables/numeric_features). This tutorial will guide you through the nuanced process of conducting cross-validation within the `NixtlaClient` class, ensuring your time series forecasting models are not just well-constructed, but also validated for trustworthiness and precision.

### Why Use Cross-Validation for Time Series?

Cross-validation provides several critical benefits for time series forecasting:

* **Prevent overfitting**: Test model performance across multiple time periods
* **Validate generalization**: Ensure forecasts work on unseen data
* **Quantify uncertainty**: Generate prediction intervals for risk assessment
* **Compare models**: Evaluate different forecasting approaches systematically
* **Optimize hyperparameters**: Fine-tune model parameters with confidence

## How to Perform Cross-validation with TimeGPT

**Quick Summary**: Learn time series cross-validation with TimeGPT in Python. This tutorial covers rolling-window validation, prediction intervals, model performance metrics, and advanced techniques with real-world examples using the Peyton Manning dataset.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/08_cross_validation.ipynb)

### Step 1: Import Packages and Initialize NixtlaClient

First, import the required packages and initialize an instance of `NixtlaClient`.

```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
from IPython.display import display

# Initialize TimeGPT client for cross-validation
nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)
```

### Step 2: Load Example Data

Use the Peyton Manning dataset as an example. The dataset can be loaded directly from Nixtla's S3 bucket:

```python theme={null}
pm_df = pd.read_csv(
    'https://datasets-nixtla.s3.amazonaws.com/peyton-manning.csv'
)
```

If you are using your own data, ensure your data is properly formatted: you must have a time column (e.g., `ds`), a target column (e.g., `y`), and, if necessary, an identifier column (e.g., `unique_id`) for multiple time series.

### Step 3: Implement Rolling-Window Cross-Validation

The `cross_validation` method of the `NixtlaClient` class performs systematic validation of time series forecasts. It takes a dataframe of time-ordered data and uses a rolling-window scheme to evaluate the model's performance across different time periods, which shows how reliable and stable the model is over time. The animation below shows how TimeGPT performs cross-validation.

![Rolling-window cross-validation](https://raw.githubusercontent.com/Nixtla/statsforecast/main/nbs/imgs/ChainedWindows.gif)

Key parameters include:

* `freq`: Frequency of your data (e.g., `'D'` for daily). If not specified, it will be inferred.
* `id_col`, `time_col`, `target_col`: Columns representing series ID, timestamps, and target values.
* `n_windows`: Number of separate validation windows.
* `step_size`: Step size between each validation window.
* `h`: Forecast horizon (e.g., the number of days ahead to predict).

In execution, `cross_validation` assesses the model's forecasting accuracy in each window, providing a robust view of the model's performance variability over time and potential overfitting. This detailed evaluation ensures the forecasts generated are not only accurate but also consistent across diverse temporal contexts.

**Key Concepts**: Rolling-window cross-validation splits your dataset into multiple training and testing sets over time. Each window moves forward chronologically, training on historical data and validating on future periods. This approach mimics real-world forecasting scenarios where you predict forward in time.

Use `cross_validation` on the Peyton Manning dataset:

```python theme={null}
# Perform cross-validation with 5 windows and 7-day forecast horizon
timegpt_cv_df = nixtla_client.cross_validation(
    pm_df,
    h=7,           # Forecast 7 days ahead
    n_windows=5,   # Test across 5 different time periods
    freq='D'       # Daily frequency
)
timegpt_cv_df.head()
```

The logs below indicate successful cross-validation calls and data preprocessing.

```bash theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
```

Cross-validation output includes the forecasted values (`TimeGPT`) aligned with historical values (`y`).
| unique\_id | ds         | cutoff     | y        | TimeGPT  |
| ---------- | ---------- | ---------- | -------- | -------- |
| 0          | 2015-12-17 | 2015-12-16 | 7.591862 | 7.939553 |
| 0          | 2015-12-18 | 2015-12-16 | 7.528869 | 7.887512 |
| 0          | 2015-12-19 | 2015-12-16 | 7.171657 | 7.766617 |
| 0          | 2015-12-20 | 2015-12-16 | 7.891331 | 7.931502 |
| 0          | 2015-12-21 | 2015-12-16 | 8.360071 | 8.312632 |

### Step 4: Plot Cross-Validation Results

Visualize forecast performance for each cutoff period. Here's an example plotting the last 100 rows of actual data along with cross-validation forecasts for each cutoff.

```python theme={null}
cutoffs = timegpt_cv_df['cutoff'].unique()

for cutoff in cutoffs:
    fig = nixtla_client.plot(
        pm_df.tail(100),
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
    )
    display(fig)
```

![Cross-validation Example](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-12-output-1.png)
![Cross-validation Example](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-12-output-2.png)
![Cross-validation Example](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-12-output-3.png)
![Cross-validation Example](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-12-output-4.png)
![Cross-validation Example](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-12-output-5.png)

### Step 5: Generate Prediction Intervals for Model Uncertainty

It is also possible to generate prediction intervals during cross-validation. To do so, we simply use the `level` argument.
```python theme={null}
timegpt_cv_df = nixtla_client.cross_validation(
    pm_df,
    h=7,
    n_windows=5,
    freq='D',
    level=[80, 90],
)
timegpt_cv_df.head()
```

|   | unique\_id | ds         | cutoff     | y        | TimeGPT  | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
| - | ---------- | ---------- | ---------- | -------- | -------- | ------------- | ------------- | ------------- | ------------- |
| 0 | 0          | 2015-12-17 | 2015-12-16 | 7.591862 | 7.939553 | 8.201465      | 8.314956      | 7.677642      | 7.564151      |
| 1 | 0          | 2015-12-18 | 2015-12-16 | 7.528869 | 7.887512 | 8.175414      | 8.207470      | 7.599609      | 7.567553      |
| 2 | 0          | 2015-12-19 | 2015-12-16 | 7.171657 | 7.766617 | 8.267363      | 8.386674      | 7.265871      | 7.146560      |
| 3 | 0          | 2015-12-20 | 2015-12-16 | 7.891331 | 7.931502 | 8.205929      | 8.369983      | 7.657075      | 7.493020      |
| 4 | 0          | 2015-12-21 | 2015-12-16 | 8.360071 | 8.312632 | 9.184893      | 9.625794      | 7.440371      | 6.999469      |

Plot the prediction intervals for the cross-validation results.

```python theme={null}
cutoffs = timegpt_cv_df['cutoff'].unique()

for cutoff in cutoffs:
    fig = nixtla_client.plot(
        pm_df.tail(100),
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
        level=[80, 90],
        models=['TimeGPT']
    )
    display(fig)
```

![Cross-validation Example with Prediction Intervals](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-14-output-1.png)
![Cross-validation Example with Prediction Intervals](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-14-output-2.png)
![Cross-validation Example with Prediction Intervals](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-14-output-3.png)
![Cross-validation Example with Prediction Intervals](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-14-output-4.png)
![Cross-validation Example with Prediction Intervals](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-14-output-5.png)

### Step 6: Enhance Forecasts with Exogenous Variables

#### Time Features

It is possible to include exogenous variables when performing cross-validation. Here we use the `date_features` parameter to create labels for each month. These features are then used by the model to make predictions during cross-validation.

```python theme={null}
timegpt_cv_df = nixtla_client.cross_validation(
    pm_df,
    h=7,
    n_windows=5,
    freq='D',
    level=[80, 90],            # needed for the interval columns shown below
    date_features=['month'],
)
timegpt_cv_df.head()
```

|   | unique\_id | ds         | cutoff     | y        | TimeGPT  | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
| - | ---------- | ---------- | ---------- | -------- | -------- | ------------- | ------------- | ------------- | ------------- |
| 0 | 0          | 2015-12-17 | 2015-12-16 | 7.591862 | 8.426320 | 8.721996      | 8.824101      | 8.130644      | 8.028540      |
| 1 | 0          | 2015-12-18 | 2015-12-16 | 7.528869 | 8.049962 | 8.452083      | 8.658603      | 7.647842      | 7.441321      |
| 2 | 0          | 2015-12-19 | 2015-12-16 | 7.171657 | 7.509098 | 7.984788      | 8.138017      | 7.033409      | 6.880180      |
| 3 | 0          | 2015-12-20 | 2015-12-16 | 7.891331 | 7.739536 | 8.306914      | 8.641355      | 7.172158      | 6.837718      |
| 4 | 0          | 2015-12-21 | 2015-12-16 | 8.360071 | 8.027471 | 8.722828      | 9.152306      | 7.332113      | 6.902636      |

Plot the cross-validation results with the time features.
```python theme={null}
cutoffs = timegpt_cv_df['cutoff'].unique()

for cutoff in cutoffs:
    # The date features were already applied during cross-validation,
    # so the plot only needs the resulting forecasts.
    fig = nixtla_client.plot(
        pm_df.tail(100),
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
        models=['TimeGPT']
    )
    display(fig)
```

![Cross-validation Example with Time Features](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-16-output-1.png)
![Cross-validation Example with Time Features](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-16-output-2.png)
![Cross-validation Example with Time Features](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-16-output-3.png)
![Cross-validation Example with Time Features](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-16-output-4.png)
![Cross-validation Example with Time Features](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-16-output-5.png)

#### Dynamic Features

Additionally, you can pass dynamic exogenous variables to better inform TimeGPT about the data. Simply add the exogenous regressors as columns after the target column.
```python theme={null}
Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity.csv')
X_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/exogenous-vars-electricity.csv')
df = Y_df.merge(X_df)
```

Now let's cross-validate `TimeGPT` with this information:

```python theme={null}
timegpt_cv_df_x = nixtla_client.cross_validation(
    df.groupby('unique_id').tail(100 * 48),
    h=48,
    n_windows=2,
    level=[80, 90]
)

cutoffs = timegpt_cv_df_x.query('unique_id == "BE"')['cutoff'].unique()

for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.query('unique_id == "BE"').tail(24 * 7),
        timegpt_cv_df_x.query('cutoff == @cutoff & unique_id == "BE"').drop(columns=['cutoff', 'y']),
        models=['TimeGPT'],
        level=[80, 90],
    )
    display(fig)
```

![Cross-validation Example with Dynamic Exogenous Variables](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-19-output-2.png)
![Cross-validation Example with Dynamic Exogenous Variables](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-19-output-3.png)

### Step 7: Long-Horizon Forecasting with TimeGPT

You can also run cross-validation with different `TimeGPT` models using the `model` argument. Here we use the base model and the model for long-horizon forecasting.
```python theme={null}
timegpt_cv_df_x_long_horizon = nixtla_client.cross_validation(
    df.groupby('unique_id').tail(100 * 48),
    h=48,
    n_windows=2,
    level=[80, 90],
    model='timegpt-1-long-horizon',
)

timegpt_cv_df_x_long_horizon.columns = timegpt_cv_df_x_long_horizon.columns.str.replace('TimeGPT', 'TimeGPT-LongHorizon')
timegpt_cv_df_x_models = timegpt_cv_df_x_long_horizon.merge(timegpt_cv_df_x)

cutoffs = timegpt_cv_df_x_models.query('unique_id == "BE"')['cutoff'].unique()

for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.query('unique_id == "BE"').tail(24 * 7),
        timegpt_cv_df_x_models.query('cutoff == @cutoff & unique_id == "BE"').drop(columns=['cutoff', 'y']),
        models=['TimeGPT', 'TimeGPT-LongHorizon'],
        level=[80, 90],
    )
    display(fig)
```

![Cross-validation Example with Long Horizon Forecasting](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-20-output-2.png)
![Cross-validation Example with Long Horizon Forecasting](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/08_cross_validation_files/figure-markdown_strict/cell-20-output-3.png)

## Frequently Asked Questions

**What is time series cross-validation?**

Time series cross-validation is a model validation technique that uses rolling windows to evaluate forecasting accuracy while preserving temporal order, ensuring reliable predictions on unseen data.

**How is time series cross-validation different from k-fold cross-validation?**

Unlike k-fold cross-validation which randomly shuffles data, time series cross-validation maintains temporal order using techniques like walk-forward validation and expanding windows to prevent data leakage.

**What are the key parameters for cross-validation in TimeGPT?**

Key parameters include `h` (forecast horizon), `n_windows` (number of validation windows), `step_size` (window increment), and `level` (prediction interval confidence levels).
**How do you evaluate cross-validation results?**

Evaluate results by comparing forecasted values against actual values across multiple time windows, analyzing prediction intervals, and calculating metrics like MAE, RMSE, and MAPE.

## Conclusion

You've mastered time series cross-validation with TimeGPT, including rolling-window validation, prediction intervals, exogenous variables, and long-horizon forecasting. These model validation techniques ensure your forecasts are accurate, reliable, and production-ready.

### Next Steps in Model Validation

* Explore [evaluation metrics](/docs/forecasting/evaluation/evaluation_metrics) to quantify forecast accuracy
* Learn about [fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) for domain-specific data
* Apply cross-validation to [multiple time series](/docs/data_requirements/multiple_series)

Ready to validate your forecasts at scale? [Start your TimeGPT trial](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/forecasting/evaluation/cross_validation) and implement robust cross-validation today.

# Evaluation Metrics

Source: https://nixtla.io/docs/forecasting/evaluation/evaluation_metrics

Learn to select the right evaluation metrics to measure the performance of TimeGPT.

Selecting the right evaluation metric is crucial, as it guides the selection of the best settings for TimeGPT to ensure the model is making accurate forecasts.

## Overview of Common Evaluation Metrics

The following table summarizes the common evaluation metrics used in forecasting depending on the type of forecasts. It also indicates when to use and when to avoid a particular metric.
| Metric | Types of forecast      | Properties                                                                       | When to avoid                                   |
| ------ | ---------------------- | -------------------------------------------------------------------------------- | ----------------------------------------------- |
| MAE    | Point forecast         | Robust to outliers; same units as the data                                       | When averaging over series of different scales  |
| MSE    | Point forecast         | Heavily penalizes large errors; squared units                                    | There are unrepresentative outliers in the data |
| RMSE   | Point forecast         | Penalizes large errors; same units as the data                                   | There are unrepresentative outliers in the data |
| MAPE   | Point forecast         | Scale-independent; expressed as a percentage                                     | When data has zero values                       |
| sMAPE  | Point forecast         | Scale-independent; more balanced than MAPE between over- and under-forecasts     | When data has zero values                       |
| MASE   | Point forecast         | Scale-independent; values below 1 beat the seasonal naive benchmark              | There is only one series to evaluate            |
| CRPS   | Probabilistic forecast | Proper scoring rule; reduces to MAE for deterministic forecasts; same units as the data | When evaluating point forecasts          |

In the following sections, we dive deeper into each metric. Note that all of these metrics can be used to evaluate the forecasts of TimeGPT using the *utilsforecast* library. For more information, read our tutorial on [evaluating TimeGPT with utilsforecast](/docs/forecasting/evaluation/evaluation_utilsforecast).

## Mean Absolute Error (MAE)

The mean absolute error simply averages the absolute distance between the forecasts and the actual values.

It is a good evaluation metric that works in the vast majority of forecasting tasks. It is robust to outliers, meaning that it will not magnify large errors, and it is expressed in the same units as the data, making it easy to interpret.

Be careful when averaging the MAE over multiple series of different scales: a series with smaller values will pull the average down, while a series with larger values will push it up.

## Mean Squared Error (MSE)

The mean squared error squares the forecast errors before averaging them, which heavily penalizes large errors while giving less weight to small ones.

As such, it is not robust to outliers since a single large error can dramatically inflate the MSE value.
Additionally, the units are squared (e.g., dollars²), making it difficult to interpret in practical terms.

Avoid MSE when your data contains outliers or when you need an easily interpretable metric. It's best used in optimization contexts where you specifically want to penalize large errors more severely.

## Root Mean Squared Error (RMSE)

The root mean squared error is simply the square root of the MSE, bringing the metric back to the original units of the data while preserving MSE's property of penalizing large errors.

RMSE is more interpretable than MSE since it's expressed in the same units as your data.

You should avoid RMSE when outliers are present or when you want equal treatment of all errors.

## Mean Absolute Percentage Error (MAPE)

The mean absolute percentage error expresses forecast errors as percentages of the actual values, making it scale-independent and easy to interpret.

MAPE is excellent for comparing forecast accuracy across different time series with varying scales. It's intuitive and easily understood in business contexts.

Avoid MAPE when your data contains zero or near-zero values (which cause division by zero) or when you have intermittent demand patterns. Note that it's also asymmetric, penalizing positive errors (over-forecasts) more heavily than negative errors (under-forecasts).

## Symmetric Mean Absolute Percentage Error (sMAPE)

The symmetric mean absolute percentage error attempts to address MAPE's asymmetry by using the average of actual and forecast values in the denominator, making it more balanced between over- and under-forecasts.

sMAPE is more stable than MAPE and less prone to extreme values. It's still scale-independent and relatively easy to interpret, though not as intuitive as MAPE.

Avoid sMAPE when dealing with zero values or when the sum of actual and forecast values approaches zero. While more symmetric than MAPE, it's still not perfectly symmetric and can behave unexpectedly in edge cases.
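The behavior of MAPE and sMAPE described above can be checked with a few lines of plain Python. This is a sketch under stated assumptions rather than the `utilsforecast` implementation: the helper functions below are written for illustration, the numbers are made up, and the sMAPE here follows the definition in this section (average of actual and forecast in the denominator). Library implementations may scale sMAPE differently.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.5 means 50%)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def smape(actual, forecast):
    """sMAPE with the average of |actual| and |forecast| in the denominator."""
    return sum(
        abs(a - f) / ((abs(a) + abs(f)) / 2) for a, f in zip(actual, forecast)
    ) / len(actual)


# MAPE asymmetry: for a non-negative forecast, an under-forecast can cost
# at most 100%, while an over-forecast has no upper bound.
print(mape([100], [300]))   # 2.0 (over-forecast)
print(mape([100], [0]))     # 1.0 (largest possible under-forecast)

# sMAPE is not perfectly symmetric either: the same absolute error of 50
# scores differently depending on its direction.
print(smape([100], [150]))  # over-forecast
print(smape([100], [50]))   # under-forecast

# A zero in the actual values breaks MAPE outright.
try:
    mape([0], [5])
except ZeroDivisionError:
    print("MAPE is undefined when an actual value is zero")
```

In this toy case sMAPE assigns the larger score to the under-forecast, which illustrates why neither percentage metric should be read as treating both error directions identically.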
## Mean Absolute Scaled Error (MASE)

The mean absolute scaled error scales forecast errors relative to the average error of a naive seasonal forecast, providing a scale-independent measure that's robust and interpretable.

MASE is excellent for comparing forecasts across different time series and scales. A MASE value less than 1 indicates your forecast is better than the naive benchmark, while values greater than 1 indicate worse performance. It's robust to outliers and handles zero values well.

While it is a good metric to compare across multiple series, it might not make sense for you to compare against naive forecasts, and it does require some technical knowledge to interpret correctly.

## Continuous Ranked Probability Score (CRPS)

The continuous ranked probability score measures the distance between the entire forecast distribution and the observed value, making it ideal for evaluating probabilistic forecasts.

CRPS is a proper scoring rule that reduces to MAE when dealing with deterministic forecasts, making it a natural extension for probabilistic forecasting. It's expressed in the same units as the original data and provides a comprehensive evaluation of forecast distributions, rewarding both accuracy and good uncertainty quantification.

CRPS is specifically designed for probabilistic forecasts, so avoid it when you only have point forecasts. It's also more computationally intensive to calculate than simpler metrics and may be less intuitive for stakeholders unfamiliar with probabilistic forecasting concepts.

## Evaluating TimeGPT

To learn how to use any of the metrics outlined above to evaluate the forecasts of TimeGPT, read our tutorial on [evaluating TimeGPT with utilsforecast](/docs/forecasting/evaluation/evaluation_utilsforecast).
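As a small worked example of the scaling idea behind MASE, the sketch below computes it by hand on made-up numbers. The `mase` function here is a hypothetical stand-alone helper written for illustration, not the `utilsforecast` implementation, and it assumes the scale is the in-sample mean absolute error of a seasonal naive forecast on the training data.

```python
def mase(actual, forecast, train, seasonality=1):
    """Forecast MAE divided by the in-sample MAE of a seasonal naive forecast.

    Illustrative helper only; assumes the naive forecast predicts y[t]
    with y[t - seasonality] on the training data.
    """
    naive_errors = [
        abs(train[t] - train[t - seasonality])
        for t in range(seasonality, len(train))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    forecast_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return forecast_mae / scale


train = [10, 12, 14, 16, 18]   # the naive one-step error is 2 at every step
actual = [20, 22]
forecast = [21, 21]            # absolute errors of 1 and 1

print(mase(actual, forecast, train))  # 0.5
```

A value of 0.5 means the forecast errors are half the size of the naive benchmark's in-sample errors. Because the scale comes from each series' own history, the ratio can be averaged across series of very different magnitudes.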
# Evaluation Pipeline

Source: https://nixtla.io/docs/forecasting/evaluation/evaluation_utilsforecast

Learn how to evaluate TimeGPT model performance using tools in utilsforecast

## Overview

After generating forecasts with TimeGPT, the next step is to evaluate how accurate those forecasts are. The `evaluate` function from the utilsforecast library provides a fast and flexible way to assess model performance using a wide range of metrics. This pipeline works seamlessly with TimeGPT and other forecasting models.

With the evaluation pipeline, you can easily select models and define metrics like MAE, MSE, or MAPE to benchmark forecasting performance.

## Step-by-Step Guide

### Step 1. Import Required Packages

Start by importing the necessary libraries and initializing the `NixtlaClient` with your API key.

```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
from functools import partial
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mse, rmse, mape, smape, mase, scaled_crps

nixtla_client = NixtlaClient(api_key='your_api_key_here')
```

### Step 2. Load and Prepare the Dataset

For this example, we use the Air Passenger dataset, which records monthly totals of international airline passengers. We'll load the dataset, format the timestamps, and split the data into a training set and a test set. The last 12 months are used for testing.

```python theme={null}
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df['unique_id'] = 'passengers'
df['timestamp'] = pd.to_datetime(df['timestamp'])
```

```python theme={null}
df_train = df.iloc[:-12]
df_test = df.iloc[-12:]
```

### Step 3. Generate Forecast with TimeGPT

Next, we will:

* Use the training set to generate a 12-step forecast with TimeGPT.
* Merge the forecast with the test set for evaluation.
```python theme={null}
fcst_timegpt = nixtla_client.forecast(
    df=df_train,
    h=12,
    time_col='timestamp',
    target_col='value',
    level=[80, 95]
)
fcst_timegpt = fcst_timegpt.merge(df_test, on=['timestamp', 'unique_id'])
```

### Step 4. Define Models and Metrics for Evaluation

Next, we define the models to evaluate and the metrics to use. For more information about supported metrics, refer to the [evaluation metrics tutorial](forecasting/evaluation/evaluation_metrics).

```python theme={null}
models = ['TimeGPT']
metrics = [
    mae,
    mse,
    rmse,
    mape,
    smape,
    partial(mase, seasonality=12),
    scaled_crps
]
```

### Step 5. Run the Evaluation

Finally, call the `evaluate` function with your merged forecast results. Include `train_df` for metrics that need the training data and `level` if using probabilistic metrics.

```python theme={null}
evaluation = evaluate(
    fcst_timegpt,
    target_col='value',
    time_col='timestamp',
    metrics=metrics,
    models=models,
    train_df=df_train,
    level=[80, 95]
)
```

| unique\_id | metric       | TimeGPT  |
| ---------- | ------------ | -------- |
| passengers | mae          | 12.67930 |
| passengers | mse          | 213.9358 |
| passengers | rmse         | 14.62654 |
| passengers | mape         | 0.026964 |
| passengers | smape        | 0.013527 |
| passengers | mase         | 0.416397 |
| passengers | scaled\_crps | 0.008991 |

# Categorical Variables

Source: https://nixtla.io/docs/forecasting/exogenous-variables/categorical_features

Learn how to incorporate external categorical variables in your TimeGPT forecasts to improve accuracy.

## What Are Categorical Variables?

Categorical variables are external factors that take on a limited range of discrete values, grouping observations by categories. Examples include "Sporting" or "Cultural" events in a dataset describing product demand.

By capturing unique external conditions, categorical variables enhance the predictive power of your model and can reduce forecasting error.
They are easy to incorporate by merging each time series data point with its corresponding categorical data. This tutorial demonstrates how to incorporate categorical (discrete) variables into TimeGPT forecasts. ## How to Use Categorical Variables in TimeGPT [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/03_categorical_variables.ipynb) ### Step 1: Import Packages and Initialize the Nixtla Client Make sure you have the necessary libraries installed: pandas, nixtla, and datasetsforecast. ```python theme={null} import pandas as pd import os from nixtla import NixtlaClient from datasetsforecast.m5 import M5 from utilsforecast.losses import smape # Initialize the Nixtla Client nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' ) ``` ### Step 2: Load M5 Data We use the **M5 dataset** — a collection of daily product sales demands across 10 US stores — to showcase how categorical variables can improve forecasts. Start by loading the M5 dataset and converting the date columns to datetime objects. ```python theme={null} Y_df, X_df, _ = M5.load(directory=os.getcwd()) Y_df['ds'] = pd.to_datetime(Y_df['ds']) X_df['ds'] = pd.to_datetime(X_df['ds']) Y_df.head(10) ``` | unique\_id | ds | y | | -------------------- | ---------- | --- | | FOODS\_1\_001\_CA\_1 | 2011-01-29 | 3.0 | | FOODS\_1\_001\_CA\_1 | 2011-01-30 | 0.0 | | FOODS\_1\_001\_CA\_1 | 2011-01-31 | 0.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-01 | 1.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-02 | 4.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-03 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-04 | 0.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-05 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-06 | 0.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-07 | 0.0 | Extract the categorical columns from the X\_df dataframe. 
```python theme={null} X_df = X_df[['unique_id', 'ds', 'event_type_1']] X_df.head(10) ``` | unique\_id | ds | event\_type\_1 | | -------------------- | ---------- | -------------- | | FOODS\_1\_001\_CA\_1 | 2011-01-29 | nan | | FOODS\_1\_001\_CA\_1 | 2011-01-30 | nan | | FOODS\_1\_001\_CA\_1 | 2011-01-31 | nan | | FOODS\_1\_001\_CA\_1 | 2011-02-01 | nan | | FOODS\_1\_001\_CA\_1 | 2011-02-02 | nan | | FOODS\_1\_001\_CA\_1 | 2011-02-03 | nan | | FOODS\_1\_001\_CA\_1 | 2011-02-04 | nan | | FOODS\_1\_001\_CA\_1 | 2011-02-05 | nan | | FOODS\_1\_001\_CA\_1 | 2011-02-06 | Sporting | | FOODS\_1\_001\_CA\_1 | 2011-02-07 | nan | Notice that there is a Sporting event on February 6, 2011, listed under `event_type_1`. ### Step 3: Prepare Data for Forecasting We'll select a specific product to demonstrate how to incorporate categorical features into TimeGPT forecasts. #### Select a High-Selling Product and Merge Data Start by selecting a high-selling product and merging the data. ```python theme={null} product = 'FOODS_3_090_CA_3' Y_df_product = Y_df.query('unique_id == @product') X_df_product = X_df.query('unique_id == @product') df = Y_df_product.merge(X_df_product) df.head(10) ``` | unique\_id | ds | y | event\_type\_1 | | -------------------- | ---------- | ----- | -------------- | | FOODS\_3\_090\_CA\_3 | 2011-01-29 | 108.0 | nan | | FOODS\_3\_090\_CA\_3 | 2011-01-30 | 132.0 | nan | | FOODS\_3\_090\_CA\_3 | 2011-01-31 | 102.0 | nan | | FOODS\_3\_090\_CA\_3 | 2011-02-01 | 120.0 | nan | | FOODS\_3\_090\_CA\_3 | 2011-02-02 | 106.0 | nan | | FOODS\_3\_090\_CA\_3 | 2011-02-03 | 123.0 | nan | | FOODS\_3\_090\_CA\_3 | 2011-02-04 | 279.0 | nan | | FOODS\_3\_090\_CA\_3 | 2011-02-05 | 175.0 | nan | | FOODS\_3\_090\_CA\_3 | 2011-02-06 | 186.0 | Sporting | | FOODS\_3\_090\_CA\_3 | 2011-02-07 | 120.0 | nan | #### Prepare Future External Variables Select future external variables for Feb 1-7, 2016. 
```python theme={null} future_ex_vars_df = df.drop(columns=['y']).query("ds >= '2016-02-01' & ds <= '2016-02-07'") ``` Separate training data before Feb 1, 2016. ```python theme={null} df_train = df.query("ds < '2016-02-01'") df_train.tail(10) ``` | unique\_id | ds | y | event\_type\_1 | | -------------------- | ---------- | ----- | -------------- | | FOODS\_3\_090\_CA\_3 | 2016-01-22 | 94.0 | nan | | FOODS\_3\_090\_CA\_3 | 2016-01-23 | 144.0 | nan | | FOODS\_3\_090\_CA\_3 | 2016-01-24 | 146.0 | nan | | FOODS\_3\_090\_CA\_3 | 2016-01-25 | 87.0 | nan | | FOODS\_3\_090\_CA\_3 | 2016-01-26 | 73.0 | nan | | FOODS\_3\_090\_CA\_3 | 2016-01-27 | 62.0 | nan | | FOODS\_3\_090\_CA\_3 | 2016-01-28 | 64.0 | nan | | FOODS\_3\_090\_CA\_3 | 2016-01-29 | 102.0 | nan | | FOODS\_3\_090\_CA\_3 | 2016-01-30 | 113.0 | nan | | FOODS\_3\_090\_CA\_3 | 2016-01-31 | 98.0 | nan | ### Step 4: Forecast Product Demand To evaluate the impact of categorical variables, we'll forecast product demand with and without them. #### Forecast Without Categorical Variables ```python theme={null} timegpt_fcst_without_cat_vars_df = nixtla_client.forecast( df=df_train, h=7, level=[80, 90] ) timegpt_fcst_without_cat_vars_df.head() ``` | unique\_id | ds | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 | | -------------------- | ---------- | --------- | ------------- | ------------- | ------------- | ------------- | | FOODS\_3\_090\_CA\_3 | 2016-02-01 | 73.304090 | 95.887380 | 98.250880 | 50.720802 | 48.357307 | | FOODS\_3\_090\_CA\_3 | 2016-02-02 | 66.335520 | 75.429660 | 76.663704 | 57.241375 | 56.007330 | | FOODS\_3\_090\_CA\_3 | 2016-02-03 | 65.881630 | 86.636480 | 87.502810 | 45.126778 | 44.260456 | | FOODS\_3\_090\_CA\_3 | 2016-02-04 | 72.371864 | 92.362690 | 96.378610 | 52.381035 | 48.365116 | | FOODS\_3\_090\_CA\_3 | 2016-02-05 | 95.141045 | 111.439224 | 114.115490 | 78.842865 | 76.166595 | Visualize the forecast without categorical variables. 
```python theme={null}
nixtla_client.plot(
    df[['unique_id', 'ds', 'y']].query("ds <= '2016-02-07'"),
    timegpt_fcst_without_cat_vars_df,
    max_insample_length=28,
)
```

Forecast without categorical variables

TimeGPT already provides a reasonable forecast, but it seems to somewhat underforecast the peak on the 6th of February 2016, the day before the Super Bowl.

#### Forecast With Categorical Variables

To forecast with categorical variables, simply provide the list of column names containing categorical features in the `categorical_exog_list` argument.

```python theme={null}
timegpt_fcst_with_cat_vars_df = nixtla_client.forecast(
    df=df_train,
    X_df=future_ex_vars_df,
    h=7,
    level=[80, 90],
    categorical_exog_list=["event_type_1"]
)
timegpt_fcst_with_cat_vars_df.head()
```

| unique\_id           | ds         | TimeGPT   | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
| -------------------- | ---------- | --------- | ------------- | ------------- | ------------- | ------------- |
| FOODS\_3\_090\_CA\_3 | 2016-02-01 | 73.839455 | 100.905910    | 104.44151     | 46.773006     | 43.237396     |
| FOODS\_3\_090\_CA\_3 | 2016-02-02 | 66.548750 | 75.294970     | 76.62822      | 57.802540     | 56.469284     |
| FOODS\_3\_090\_CA\_3 | 2016-02-03 | 66.694435 | 87.777954     | 88.63922      | 45.610912     | 44.749650     |
| FOODS\_3\_090\_CA\_3 | 2016-02-04 | 74.249530 | 94.813286     | 98.88473      | 53.685770     | 49.614326     |
| FOODS\_3\_090\_CA\_3 | 2016-02-05 | 96.052414 | 112.402090    | 115.22341     | 79.702736     | 76.881420     |

Visualize the forecast with categorical variables.

```python theme={null}
# Visualize the forecast with categorical variables
nixtla_client.plot(
    df[['unique_id', 'ds', 'y']].query("ds <= '2016-02-07'"),
    timegpt_fcst_with_cat_vars_df,
    max_insample_length=28,
)
```

Forecast with categorical variables

### Step 5: Evaluate Forecast Accuracy

Finally, we calculate the **Symmetric Mean Absolute Percentage Error (sMAPE)** for the forecasts with and without categorical variables.
```python theme={null}
# Create target dataframe
df_target = df[['unique_id', 'ds', 'y']].query("ds >= '2016-02-01' & ds <= '2016-02-07'")

# Rename forecast columns
timegpt_fcst_without_cat_vars_df = timegpt_fcst_without_cat_vars_df.rename(columns={'TimeGPT': 'TimeGPT-without-cat-vars'})
timegpt_fcst_with_cat_vars_df = timegpt_fcst_with_cat_vars_df.rename(columns={'TimeGPT': 'TimeGPT-with-cat-vars'})

# Merge forecasts with target dataframe
df_target = df_target.merge(timegpt_fcst_without_cat_vars_df[['unique_id', 'ds', 'TimeGPT-without-cat-vars']])
df_target = df_target.merge(timegpt_fcst_with_cat_vars_df[['unique_id', 'ds', 'TimeGPT-with-cat-vars']])

# Compute errors
smape_errors = smape(df_target, ['TimeGPT-without-cat-vars', 'TimeGPT-with-cat-vars'])
```

| unique\_id           | TimeGPT-without-cat-vars | TimeGPT-with-cat-vars |
| -------------------- | ------------------------ | --------------------- |
| FOODS\_3\_090\_CA\_3 | 0.109241                 | 0.108666              |

Including categorical variables improves forecast accuracy in this example: the sMAPE drops slightly, from 0.109241 to 0.108666.

## Conclusion

Categorical variables are useful additions to TimeGPT forecasts, helping capture valuable external factors. By simply passing them to the `categorical_exog_list` parameter, you can improve predictive performance, although the size of the gain depends on the data.

Continue exploring more advanced techniques or different datasets to further improve your TimeGPT forecasting models.

# Date/Time Features

Source: https://nixtla.io/docs/forecasting/exogenous-variables/date_features

Learn how to incorporate date/time features into your forecasts to improve performance.

## Why incorporate Date/Time Features in your Forecasts

Many time series display patterns that repeat based on the calendar, such as demand increasing on weekends, sales peaking at the end of the month, or traffic varying by hour of the day. Recognizing and capturing these time-based patterns can be a powerful way to improve forecasting accuracy.
While you can forecast a time series based solely on its historical values, adding additional date/time related features, such as the day of the week, month, quarter, or hour, can often enhance the model's performance. These features can be especially useful when your dataset lacks exogenous variables, but they can also complement external regressors when available. In this tutorial, we'll walk through how to incorporate these date/time features into TimeGPT to boost the accuracy of your forecasts. ## How to incorporate Date/Time Features in your Forecasts [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/date_features.ipynb) ### Step 1: Import Packages Import the necessary libraries and initialize the Nixtla client. ```python theme={null} import numpy as np import pandas as pd from nixtla import NixtlaClient # For forecast evaluation from utilsforecast.evaluation import evaluate from utilsforecast.losses import mae, rmse ``` You can instantiate the `NixtlaClient` class providing your authentication API key. ```python theme={null} nixtla_client = NixtlaClient( # defaults to os.environ.get("NIXTLA_API_KEY") api_key='my_api_key_provided_by_nixtla' ) ``` ### Step 2: Load Data In this notebook, we use hourly electricity prices as our example dataset, which consists of 5 time series, each with approximately 1700 data points. For demonstration purposes, we focus on the German electricity price series. The time series is split, with the last 240 steps (10 days) set aside as the test set. For simplicity, we will also demonstrate this tutorial without the use of any additional exogenous variables, but you could extend this same technique for datasets that have exogenous variables. 
```python theme={null}
df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv'
)
df['ds'] = pd.to_datetime(df['ds'])
df_sub = df.query('unique_id == "DE"')[['unique_id','ds','y']]
```

```python theme={null}
df_train = df_sub.query('ds < "2017-12-21"')
df_test = df_sub.query('ds >= "2017-12-21"')
df_train.shape, df_test.shape
```

```bash theme={null}
((1440, 3), (240, 3))
```

```python theme={null}
nixtla_client.plot(df_train, df_test.rename(columns={'y': 'test'}))
```

Train Test Split

### Step 3: Forecasting

#### Without Datetime Features

First, we forecast the univariate time series without the use of datetime features.

```python theme={null}
fcst_timegpt_no_dt = nixtla_client.forecast(
    df=df_train,
    h=24*10,
    model="timegpt-1-long-horizon"
)
```

We will rename the forecast column for this approach, so that we can distinguish it from forecasts created using other methods later.

```python theme={null}
fcst_timegpt_no_dt.rename(columns={"TimeGPT": "TimeGPT_no_dt"}, inplace=True)
```

#### With Inbuilt Datetime Features

Next, let's forecast the same univariate time series with datetime features. This can be done by specifying the `date_features` argument. The data is hourly, so both the hour of the day (`hour`) and the day of the week (`dayofweek`) may impact the usage. For example, the usage may peak in the afternoon and drop off at night. It can also differ between the weekdays and weekends due to working and holiday patterns. Including these features can help the model make better forecasts.

> NOTE:
>
> 1. In order to show how these features are created, we can add the
>    `feature_contributions` argument. This is just for demonstration purposes in this
>    tutorial and not truly needed to forecast with datetime features.
> 2. If you have a weekly frequency dataset, you can use
>    `date_features = ["week", "month", "year"]` or a subset of these features.
> 3.
If you have a monthly frequency dataset, you can use > `date_features = ["month", "year"]` or a subset of these features. ```python theme={null} fcst_timegpt_dt_no_ohe = nixtla_client.forecast( df = df_train, h=24*10, model="timegpt-1-long-horizon", date_features=['hour', 'dayofweek'], feature_contributions=True ) ``` ```python theme={null} shap_df = nixtla_client.feature_contributions shap_df.head() ``` | | unique\_id | ds | TimeGPT | hour | dayofweek | base\_value | | -: | ---------: | ------------------: | --------: | ---------: | --------: | ----------: | | 0 | DE | 2017-12-21 00:00:00 | 34.945976 | -12.797431 | 4.236599 | 43.506810 | | 1 | DE | 2017-12-21 01:00:00 | 33.700954 | -14.274811 | 4.168986 | 43.806778 | | 2 | DE | 2017-12-21 02:00:00 | 32.120293 | -15.785894 | 4.123096 | 43.783092 | | 3 | DE | 2017-12-21 03:00:00 | 32.544914 | -15.623017 | 4.542475 | 43.625454 | | 4 | DE | 2017-12-21 04:00:00 | 33.698105 | -14.559433 | 4.525819 | 43.731720 | As we can see, two new exogenous features (`hour` and `dayofweek`) got added to the dataset and the forecast utilized these features. However, we need to ensure that the model treats each hour (0, 1, 2, ..., 23) and each day (0, 1, 2, ..., 6) as a categorical variable and not as a numerical variable. If treated numerically, the model may exaggerate differences (e.g., hour 23 might appear 23 times more influential than hour 1), which doesn't reflect real patterns. Electricity usage at hour 23 is typically similar to hour 1, and day 6 usage often resembles day 0. To avoid this distortion, we one-hot encode these variables using the `date_features_to_one_hot` argument. This creates a separate exogenous feature for each hour and each day, allowing the model to capture their effects independently. 
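To see what this encoding looks like, the sketch below one-hot encodes the hour of a few timestamps with plain pandas. It only illustrates the idea: how TimeGPT builds these features internally is not shown here, and the variable names are made up for the example.

```python
import pandas as pd

# Four hourly timestamps that cross midnight: 22:00, 23:00, 00:00, 01:00
timestamps = pd.date_range("2017-12-21 22:00", periods=4, freq="h")

# Numeric encoding: hour 23 and hour 0 look far apart (23 vs. 0)
hours = pd.Series(timestamps.hour, index=timestamps, name="hour")

# One-hot encoding: one 0/1 indicator column per hour value seen in the data
hour_one_hot = pd.get_dummies(hours, prefix="hour", dtype=int)

print(hour_one_hot)
```

Only hours present in the data get a column in this sketch, so a full day of hourly data would yield 24 indicator columns. This is why one-hot encoding can noticeably increase the number of features.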
```python theme={null}
fcst_timegpt_dt = nixtla_client.forecast(
    df=df_train,
    h=24*10,
    model="timegpt-1-long-horizon",
    date_features=['hour', 'dayofweek'],
    date_features_to_one_hot=['hour', 'dayofweek'],
    feature_contributions=True
)
```

```python theme={null}
shap_df = nixtla_client.feature_contributions
shap_df.head()
```

|    | unique\_id | ds | TimeGPT | hour\_0 | hour\_1 | hour\_2 | hour\_3 | hour\_4 | hour\_5 | hour\_6 | ... | hour\_22 | hour\_23 | dayofweek\_0 | dayofweek\_1 | dayofweek\_2 | dayofweek\_3 | dayofweek\_4 | dayofweek\_5 | dayofweek\_6 | base\_value |
| -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: | -: |
| 0 | DE | 2017-12-21 00:00:00 | 35.248108 | -13.396377 | 0.387143 | 0.423001 | 0.392672 | 0.373034 | 0.333778 | 0.147671 | ... | 0.271507 | 0.393282 | 0.472389 | -0.377321 | -0.548429 | -0.101086 | -0.133001 | 1.455560 | 2.975230 | 44.333805 |
| 1 | DE | 2017-12-21 01:00:00 | 34.400800 | 0.358443 | -14.488875 | 0.389985 | 0.359990 | 0.341219 | 0.320964 | 0.135058 | ... | 0.266497 | 0.391259 | 0.445456 | -0.306117 | -0.436959 | -0.172850 | -0.151865 | 1.533456 | 3.022358 | 44.539093 |
| 2 | DE | 2017-12-21 02:00:00 | 33.175526 | 0.375983 | 0.372809 | -15.824338 | 0.348533 | 0.351379 | 0.317832 | 0.123833 | ... | 0.273698 | 0.410714 | 0.417348 | -0.279551 | -0.342991 | -0.171547 | -0.142890 | 1.532721 | 3.042772 | 44.515614 |
| 3 | DE | 2017-12-21 03:00:00 | 33.205390 | 0.368333 | 0.366936 | 0.372584 | -15.880591 | 0.346306 | 0.319877 | 0.136488 | ... | 0.276705 | 0.416273 | 0.508190 | -0.274014 | -0.339005 | -0.176228 | -0.152890 | 1.588364 | 3.095226 | 44.391410 |
| 4 | DE | 2017-12-21 04:00:00 | 34.689583 | 0.363581 | 0.363459 | 0.393807 | 0.362043 | -14.755774 | 0.314718 | 0.141911 | ... | 0.274819 | 0.402653 | 0.531417 | -0.277548 | -0.360688 | -0.159342 | -0.169762 | 1.692538 | 3.165733 | 44.505848 |

As we can see above, this now creates a separate feature for each hour of the day and each day of the week.

> NOTE: With one hot encoding, the number of features can increase by a lot.
> This is especially true if you have weekly frequency data and you are using
> `date_features=["week"]` because this leads to around 52 features being created after
> one hot encoding. Please make sure that your dataset has enough datapoints or
> else the model will overfit to the data. You can increase the number of
> datapoints in the dataset by increasing the available history for your time
> series, or increasing the number of unique time series that share a common
> pattern in your dataset.

```python theme={null}
fcst_timegpt_dt.rename(columns={"TimeGPT": "fcst_timegpt_dt"}, inplace=True)
```

#### With Custom Datetime Features

In the example above, we saw how to incorporate the inbuilt datetime features into the forecast. However, in some cases it may not be feasible to one hot encode the datetime features, since doing so may create too many features for the dataset size. In that case, we can create a custom datetime feature and use it in the forecast. In this example, we will create a sine/cosine encoder for the week, which is a popular technique to encode datetime features due to their circular nature described above (e.g. hour 23 behavior is close to hour 0 behavior, week 52 behavior is very close to week 1 behavior, etc.).

```python theme={null}
class SinCosWeekOfYear:
    """
    Adds sine and cosine features for each week of the year.
    This is useful for models that can benefit from understanding
    the periodicity of weeks in a year.
    """

    def __call__(self, dates: pd.DatetimeIndex):
        df = pd.DataFrame(index=dates)

        # Get week of year (1 to 53)
        weeks = np.array([date.isocalendar().week for date in dates])

        # Calculate sine and cosine features
        df["week_sin"] = np.sin((2 * np.pi) * (weeks-1) / 53).round(4)
        df["week_cos"] = np.cos((2 * np.pi) * (weeks-1) / 53).round(4)

        return df

    def __name__(self):
        return "SinCosWeekOfYear"


# Example usage
dates = pd.date_range(start='2023-01-01', periods=55, freq='W-MON')
sin_cos_week = SinCosWeekOfYear()
features = sin_cos_week(dates)
features.tail()
```

|            | week\_sin | week\_cos |
| ---------: | --------: | --------: |
| 2023-12-18 | -0.3482   | 0.9374    |
| 2023-12-25 | -0.2349   | 0.9720    |
| 2024-01-01 | 0.0000    | 1.0000    |
| 2024-01-08 | 0.1183    | 0.9930    |
| 2024-01-15 | 0.2349    | 0.9720    |

As we can see above, because of the cyclical encoding of the datetime feature, the encoded values (`week_sin` and `week_cos`) for 2023-12-25 (week 52) are very close to those for 2024-01-01 (week 1). This ensures that the learned features for week 52 will be close to those for week 1. It also brings the feature cardinality down from 53 (in the case of one hot encoding) to only 2 features.

In our example, we have the hour feature, which has a relatively high cardinality after one hot encoding. Let's encode this with sine and cosine features and use this instead of the one hot encoding.

```python theme={null}
class SinCosHourOfDay:
    """
    Adds sine and cosine features for each hour of the day.
    This is useful for models that can benefit from understanding
    the periodicity of hours in a day.
    """

    def __call__(self, dates: pd.DatetimeIndex):
        df = pd.DataFrame(index=dates)

        # Get hour of day (0 to 23)
        hours = np.array([date.hour for date in dates])

        # Calculate sine and cosine features
        df["hour_sin"] = np.sin((2 * np.pi) * (hours) / 24).round(4)
        df["hour_cos"] = np.cos((2 * np.pi) * (hours) / 24).round(4)

        return df

    def __name__(self):
        return "SinCosHourOfDay"


# Example usage
dates = pd.date_range(start='2023-01-01 00:00', periods=26, freq='h')
sin_cos_hour = SinCosHourOfDay()
features = sin_cos_hour(dates)
features.tail()
```

|                     | hour\_sin | hour\_cos |
| ------------------: | --------: | --------: |
| 2023-01-01 21:00:00 | -0.7071   | 0.7071    |
| 2023-01-01 22:00:00 | -0.5000   | 0.8660    |
| 2023-01-01 23:00:00 | -0.2588   | 0.9659    |
| 2023-01-02 00:00:00 | 0.0000    | 1.0000    |
| 2023-01-02 01:00:00 | 0.2588    | 0.9659    |

In order to use this custom datetime feature, we can simply pass an instance of the class to the `date_features` argument. Since this is already encoded, we do not need to include it in the `date_features_to_one_hot` argument.
```python theme={null} fcst_timegpt_dt_custom = nixtla_client.forecast( df = df_train, h=24*10, model="timegpt-1-long-horizon", date_features=[SinCosHourOfDay(), 'dayofweek'], date_features_to_one_hot=['dayofweek'], feature_contributions=True ) ``` ```python theme={null} shap_df = nixtla_client.feature_contributions shap_df.head() ``` | | unique\_id | ds | TimeGPT | hour\_sin | hour\_cos | dayofweek\_0 | dayofweek\_1 | dayofweek\_2 | dayofweek\_3 | dayofweek\_4 | dayofweek\_5 | dayofweek\_6 | base\_value | | -: | ---------: | ------------------: | --------: | --------: | ---------: | -----------: | -----------: | -----------: | -----------: | -----------: | -----------: | -----------: | ----------: | | 0 | DE | 2017-12-21 00:00:00 | 35.801600 | -3.609636 | -9.003666 | 0.805974 | -0.424078 | -0.343238 | -0.428668 | -0.055370 | 1.462214 | 3.295479 | 44.102590 | | 1 | DE | 2017-12-21 01:00:00 | 34.419390 | -3.824628 | -10.493365 | 0.714771 | -0.400898 | -0.282606 | -0.331269 | -0.115753 | 1.539153 | 3.245723 | 44.368263 | | 2 | DE | 2017-12-21 02:00:00 | 32.892105 | -4.959243 | -10.772224 | 0.712402 | -0.439891 | -0.261654 | -0.207954 | -0.191223 | 1.481960 | 3.206257 | 44.323673 | | 3 | DE | 2017-12-21 03:00:00 | 32.727295 | -5.161374 | -10.812295 | 0.771099 | -0.417504 | -0.262543 | -0.146066 | -0.258350 | 1.578070 | 3.268950 | 44.167310 | | 4 | DE | 2017-12-21 04:00:00 | 34.121994 | -3.687167 | -11.353230 | 0.846524 | -0.387008 | -0.278475 | -0.169525 | -0.255498 | 1.788180 | 3.362950 | 44.255240 | As we can see above, the hour has now gotten encoded using the sine and cosine features instead of the one hot encoding. ```python theme={null} fcst_timegpt_dt_custom.rename(columns={"TimeGPT": "fcst_timegpt_dt_custom"}, inplace=True) ``` ### Step 4: Compare Results #### Visual Comparison Let's compare the results visually first. For this, we will merge all the forecasts together. 
This is why we had renamed the forecast columns above so that we can distinguish the forecasts generated by the different methods. ```python theme={null} all_fcst = ( fcst_timegpt_no_dt .merge(fcst_timegpt_dt, on=['unique_id', 'ds']) .merge(fcst_timegpt_dt_custom, on=['unique_id', 'ds']) ) all_fcst.head() ``` | | unique\_id | ds | TimeGPT\_no\_dt | fcst\_timegpt\_dt | fcst\_timegpt\_dt\_custom | | -: | ---------: | ------------------: | --------------: | ----------------: | ------------------------: | | 0 | DE | 2017-12-21 00:00:00 | 34.340740 | 35.248108 | 35.801600 | | 1 | DE | 2017-12-21 01:00:00 | 34.376488 | 34.400800 | 34.419390 | | 2 | DE | 2017-12-21 02:00:00 | 32.215570 | 33.175526 | 32.892105 | | 3 | DE | 2017-12-21 03:00:00 | 34.485695 | 33.205390 | 32.727295 | | 4 | DE | 2017-12-21 04:00:00 | 34.359673 | 34.689583 | 34.121994 | ```python theme={null} nixtla_client.plot(df_sub, all_fcst) ``` Train Test Split Visually looking at the results shows that the forecast with the datetime features is closer to the actuals as compared to the forecast without the datetime features. #### Metric Comparison Next, let's compare the forecast with the actual data quantitatively. We will use two common metrics - `MAE` and `RMSE` for this purpose. 
```python theme={null} all_fcst_with_actuals = ( df_test[["unique_id", "ds", "y"]] .merge(all_fcst, on=['unique_id', 'ds']) ) all_fcst_with_actuals.head() ``` | | unique\_id | ds | y | TimeGPT\_no\_dt | fcst\_timegpt\_dt | fcst\_timegpt\_dt\_custom | | -: | ---------: | ------------------: | ----: | --------------: | ----------------: | ------------------------: | | 0 | DE | 2017-12-21 00:00:00 | 33.09 | 34.340740 | 35.248108 | 35.801600 | | 1 | DE | 2017-12-21 01:00:00 | 35.26 | 34.376488 | 34.400800 | 34.419390 | | 2 | DE | 2017-12-21 02:00:00 | 31.88 | 32.215570 | 33.175526 | 32.892105 | | 3 | DE | 2017-12-21 03:00:00 | 33.04 | 34.485695 | 33.205390 | 32.727295 | | 4 | DE | 2017-12-21 04:00:00 | 33.60 | 34.359673 | 34.689583 | 34.121994 | ```python theme={null} metrics = [mae, rmse] evaluation = evaluate( all_fcst_with_actuals, metrics=metrics, ) evaluation ``` | | unique\_id | metric | TimeGPT\_no\_dt | fcst\_timegpt\_dt | fcst\_timegpt\_dt\_custom | | -: | ---------: | -----: | --------------: | ----------------: | ------------------------: | | 0 | DE | mae | 27.527012 | 21.644545 | 21.139603 | | 1 | DE | rmse | 33.478168 | 28.099654 | 27.616988 | As we can see, the addition of the datetime features improved the forecasting metrics compared to the baseline model created without these features. ## Conclusion As demonstrated in this tutorial 1. Providing datetime features to the model during forecasting can improve the metrics substantially. 2. However, users must be careful of the cardinality of the features after datetime features have been added. If the feature cardinality is too large for the dataset, it may lead to overfitting. 3. In case of high cardinality, users may consider a custom encoding approach as demonstrated. # Holidays & Special Dates Source: https://nixtla.io/docs/forecasting/exogenous-variables/holiday_and_special_dates Guide to using holiday calendar variables and special dates to improve forecast accuracy in time series. 
## What Are Holiday Variables and Special Dates? Special dates, such as holidays, promotions, or significant events, often cause notable deviations from normal patterns in your time series. By incorporating these special dates into your forecasting model, you can better capture these expected variations and improve prediction accuracy. ## How to Add Holiday Variables and Special Dates [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/02_holidays.ipynb) ### Step 1: Import Packages Import the required libraries and initialize the Nixtla client. ```python theme={null} import pandas as pd from nixtla import NixtlaClient ``` ```python theme={null} nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' ) ``` ### Step 2: Load Data We use a Google Trends dataset on "chocolate" with monthly frequency: ```python theme={null} df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/google_trend_chocolate.csv') df['month'] = pd.to_datetime(df['month']).dt.to_period('M').dt.to_timestamp('M') df.head() ``` | | month | chocolate | | - | ---------- | --------- | | 0 | 2004-01-31 | 35 | | 1 | 2004-02-29 | 45 | | 2 | 2004-03-31 | 28 | | 3 | 2004-04-30 | 30 | | 4 | 2004-05-31 | 29 | ### Step 3: Create a Future Dataframe When adding exogenous variables (like holidays) to time series forecasting, we need a future DataFrame because: * Historical data already exists: Our training data contains past values of both the target variable and exogenous features * Future exogenous features are known: Unlike the target variable, we can determine future values of exogenous features (like holidays) in advance For example, we know that Christmas will occur on December 25th next year, so we can include this information in our future DataFrame to help the model understand seasonal patterns during the forecast period. 
Start by creating a future DataFrame with 14 months of dates starting from May 2024.

```python theme={null}
# Create future DataFrame for adding US holidays
start_date = "2024-05"
dates = pd.date_range(start=start_date, periods=14, freq="ME")
dates = dates.to_period("M").to_timestamp("M")
future_df = pd.DataFrame(dates, columns=["month"])
future_df.tail()
```

|    | month               |
| -- | ------------------- |
| 9  | 2025-02-28 00:00:00 |
| 10 | 2025-03-31 00:00:00 |
| 11 | 2025-04-30 00:00:00 |
| 12 | 2025-05-31 00:00:00 |
| 13 | 2025-06-30 00:00:00 |

### Step 4: Forecast with Holidays and Special Dates

TimeGPT automatically generates standard date-based features (like month, day of week, etc.) during forecasting. For more specialized temporal patterns, you can manually add holiday indicators to both your historical and future datasets.

#### Create a Function to Add Date Features

To make it easier to add date features to a DataFrame, we'll create the `add_date_features_to_DataFrame` function that takes:

* A pandas DataFrame
* A date extractor function, which can be `CountryHolidays` or `SpecialDates`
* A time column name
* A resampling frequency (`freq`, `"ME"` by default)

```python theme={null}
def add_date_features_to_DataFrame(df, date_extractor, time_col="month", freq="ME"):
    # Create a copy of the DataFrame
    df = df.copy()

    # Ensure time column is datetime
    datetime_types = ["datetime64[ns]", "datetime64[us]", "datetime64[ms]"]
    if df[time_col].dtype.name not in datetime_types:
        raise ValueError(
            f"Column '{time_col}' must be datetime type, got {df[time_col].dtype}"
        )

    # Create date range
    dates_range = pd.date_range(
        start=df[time_col].min(), end=df[time_col].max(), freq="D"
    )

    # Get date feature indicators and resample to specified frequency
    features_df = date_extractor(dates_range)
    features = features_df.resample(freq).max()
    features = features.reset_index(names=time_col)

    # Merge with input DataFrame
    result_df = df.merge(features)
    return result_df
```

#### Add Holiday Features

To add holiday features, we'll use the
`CountryHolidays` class to compute US holidays and merge them into the future DataFrame. ```python theme={null} from nixtla.date_features import CountryHolidays us_holidays = CountryHolidays(countries=["US"]) future_df_holidays = add_date_features_to_DataFrame(future_df, us_holidays) print(f"Future DataFrame shape: {future_df_holidays.shape}") future_df_holidays.head() ``` | | month | US\_New Year's Day | US\_Memorial Day | US\_Juneteenth National Independence Day | US\_Independence Day | US\_Labor Day | US\_Veterans Day | US\_Thanksgiving Day | US\_Christmas Day | US\_Martin Luther King Jr. Day | US\_Washington's Birthday | US\_Columbus Day | | -: | :------------------ | -----------------: | ---------------: | ---------------------------------------: | -------------------: | ------------: | ---------------: | -------------------: | ----------------: | -----------------------------: | ------------------------: | ---------------: | | 0 | 2024-05-31 00:00:00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 1 | 2024-06-30 00:00:00 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 2 | 2024-07-31 00:00:00 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 3 | 2024-08-31 00:00:00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 4 | 2024-09-30 00:00:00 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | This DataFrame now includes columns for each identified US holiday as binary indicators. Next, add holiday indicators to the historical DataFrame. ```python theme={null} df_with_holidays = add_date_features_to_DataFrame(df, us_holidays) df_with_holidays.tail() ``` | | month | chocolate | US\_New Year's Day | US\_New Year's Day (observed) | US\_Memorial Day | US\_Independence Day | US\_Independence Day (observed) | US\_Labor Day | US\_Veterans Day | US\_Thanksgiving Day | US\_Christmas Day | US\_Christmas Day (observed) | US\_Martin Luther King Jr. 
Day | US\_Washington's Birthday | US\_Columbus Day | US\_Veterans Day (observed) | US\_Juneteenth National Independence Day | US\_Juneteenth National Independence Day (observed) | | --: | :------------------ | --------: | -----------------: | ----------------------------: | ---------------: | -------------------: | ------------------------------: | ------------: | ---------------: | -------------------: | ----------------: | ---------------------------: | -----------------------------: | ------------------------: | ---------------: | --------------------------: | ---------------------------------------: | --------------------------------------------------: | | 239 | 2023-12-31 00:00:00 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 240 | 2024-01-31 00:00:00 | 64 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | | 241 | 2024-02-29 00:00:00 | 66 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | | 242 | 2024-03-31 00:00:00 | 59 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | | 243 | 2024-04-30 00:00:00 | 51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Now, your historical DataFrame also contains holiday flags for each month. Finally, forecast with the holiday features. ```python theme={null} fcst_df_holidays = nixtla_client.forecast( df=df_with_holidays, h=14, freq="ME", time_col="month", target_col="chocolate", X_df=future_df_holidays, model="timegpt-1-long-horizon", hist_exog_list=[ "US_New Year's Day (observed)", "US_Independence Day (observed)", "US_Christmas Day (observed)", "US_Veterans Day (observed)", "US_Juneteenth National Independence Day (observed)", ], feature_contributions=True, # for shapley values ) fcst_df_holidays.head() ``` Plot the forecast with holiday effects. 
```python theme={null}
nixtla_client.plot(
    df_with_holidays,
    fcst_df_holidays,
    time_col='month',
    target_col='chocolate',
)
```

![Forecast plot including holiday effects](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/02_holidays_files/figure-markdown_strict/cell-16-output-1.png)

We can then plot the weights of each holiday to see which are more important in forecasting the interest in chocolate. We will use the [SHAP library](https://shap.readthedocs.io/en/latest/) to plot the weights.

> For more details on how to use the SHAP library, see our [tutorial on model interpretability](/docs/forecasting/exogenous-variables/interpretability_with_shap).

```python theme={null}
import shap
import matplotlib.pyplot as plt

def plot_shap_values(ds_column, title):
    shap_df = nixtla_client.feature_contributions
    shap_columns = shap_df.columns.difference(
        ["unique_id", ds_column, "TimeGPT", "base_value"]
    )
    shap_obj = shap.Explanation(
        values=shap_df[shap_columns].values,
        base_values=shap_df["base_value"].values,
        feature_names=shap_columns,
    )
    shap.plots.bar(shap_obj, max_display=len(shap_columns), show=False)
    plt.title(title)
    plt.show()

plot_shap_values(ds_column="month", title="SHAP values for holidays")
```

*Figure: Holiday-related feature weights*

The SHAP values indicate that Christmas, Independence Day, and Labor Day have the strongest influence on the chocolate interest forecast. These holidays show the highest feature importance weights, which suggests they are associated with shifts in consumer behavior. This is in line with expectations, since they are major US holidays linked to gift-giving, celebrations, and seasonal consumption that plausibly drive interest in chocolate.

#### Add Special Dates

Beyond country holidays, you can create custom special dates with `SpecialDates`. These can represent unique one-time events or recurring patterns on specific dates of your choice.

Assume we already have a future DataFrame with monthly dates.
We'll create Valentine's Day and Halloween as custom special dates and add them to the future DataFrame.

```python theme={null}
from nixtla.date_features import SpecialDates

# Generate special dates programmatically for the full data range (2004-2025)
valentine_dates = [f"{year}-02-14" for year in range(2004, 2026)]
halloween_dates = [f"{year}-10-31" for year in range(2004, 2026)]

# Define custom special dates - chocolate-related seasonal events
special_dates = SpecialDates(
    special_dates={
        "Valentine_season": valentine_dates,
        "Halloween_season": halloween_dates,
    }
)

# Apply special dates to future data
future_df_special = add_date_features_to_DataFrame(future_df, special_dates)
future_df_special.head()
```

|    | month               | Valentine\_season | Halloween\_season |
| -: | :------------------ | ----------------: | ----------------: |
|  0 | 2024-05-31 00:00:00 |                 0 |                 0 |
|  1 | 2024-06-30 00:00:00 |                 0 |                 0 |
|  2 | 2024-07-31 00:00:00 |                 0 |                 0 |
|  3 | 2024-08-31 00:00:00 |                 0 |                 0 |
|  4 | 2024-09-30 00:00:00 |                 0 |                 0 |

We will also add custom special dates to the historical DataFrame.

```python theme={null}
# Apply special dates to historical data as well
df_special = add_date_features_to_DataFrame(df, special_dates)
df_special.tail()
```

|     | month               | chocolate | Valentine\_season | Halloween\_season |
| --: | :------------------ | --------: | ----------------: | ----------------: |
| 239 | 2023-12-31 00:00:00 |        90 |                 0 |                 0 |
| 240 | 2024-01-31 00:00:00 |        64 |                 0 |                 0 |
| 241 | 2024-02-29 00:00:00 |        66 |                 1 |                 0 |
| 242 | 2024-03-31 00:00:00 |        59 |                 0 |                 0 |
| 243 | 2024-04-30 00:00:00 |        51 |                 0 |                 0 |

Now, forecast with the special date features.

```python theme={null}
fcst_df_special = nixtla_client.forecast(
    df=df_special,
    h=14,
    freq="ME",  # month-end, consistent with the holiday forecast
    time_col="month",
    target_col="chocolate",
    X_df=future_df_special,
    model="timegpt-1-long-horizon",
    feature_contributions=True,
)
```

Plot the forecast with special date effects.
```python theme={null}
nixtla_client.plot(
    df_special,
    fcst_df_special,
    time_col='month',
    target_col='chocolate',
)
```

*Figure: Forecast plot including special date effects*

Examine the feature importance of the special dates.

```python theme={null}
plot_shap_values(ds_column="month", title="SHAP values for special dates")
```

*Figure: Special date feature weights*

The SHAP values indicate that Valentine's Day has the strongest positive contribution to the forecast of chocolate interest. This is in line with consumer behavior, as chocolate is a popular gift choice for Valentine's Day.

Congratulations! You have successfully integrated holiday and special date features into your time series forecasts. Use these steps as a starting point for further experimentation with advanced date features.

# Model Interpretability
Source: https://nixtla.io/docs/forecasting/exogenous-variables/interpretability_with_shap

Learn how to interpret model predictions using SHAP values to understand the impact of exogenous variables.

## What Are SHAP Values?

SHAP (SHapley Additive exPlanations) values use game theory concepts to explain how each feature influences machine learning forecasts. They're particularly useful when working with exogenous (external) variables, letting you understand contributions both at individual prediction steps and across entire forecast horizons.

SHAP values can be combined with the visualization methods of the [SHAP](https://shap.readthedocs.io/en/latest/) Python package to produce plots and insights.

Before proceeding, make sure you understand forecasting with exogenous features. For reference, see our [tutorial on exogenous variables](/docs/forecasting/exogenous-variables/numeric_features).

## How to Use SHAP Values for TimeGPT

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/21_shap_values.ipynb)

## Install SHAP

Install the SHAP library.
```bash theme={null} pip install shap ``` For more details, visit the [official SHAP documentation](https://shap.readthedocs.io/en/latest/). ### Step 1: Import Packages Import the necessary libraries and initialize the Nixtla client. ```python theme={null} import pandas as pd from nixtla import NixtlaClient nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' # Or use os.environ.get("NIXTLA_API_KEY") ) ``` ### Step 2: Load Data We'll use exogenous variables (covariates) to enhance electricity market forecasting accuracy. The widely known EPF dataset is available at [this link](https://zenodo.org/records/4624805). It contains hourly prices and relevant exogenous factors for five different electricity markets. For this tutorial, we'll focus on the Belgian electricity market (BE). The data includes: * Hourly prices (y) * Forecasts for load (Exogenous1) and generation (Exogenous2) * Day-of-week indicators (one-hot encoded) If your data relies on factors such as weather, holiday calendars, marketing, or other elements, ensure they're similarly structured. 
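As a rough illustration of what "similarly structured" can mean, the sketch below builds one-hot day-of-week indicators for a small hypothetical hourly frame. The `toy` frame is made up; the `day_0` to `day_6` naming follows the dataset, and the mapping `day_0` = Monday is an assumption based on pandas' `dayofweek` convention (it agrees with the sample rows, where Saturday 2016-10-22 has `day_5` set).

```python
import pandas as pd

# Hypothetical hourly frame using the same id/timestamp layout as the tutorial data
toy = pd.DataFrame({
    "unique_id": "BE",
    "ds": pd.date_range("2016-10-22", periods=48, freq=pd.Timedelta(hours=1)),
})

# One indicator column per weekday: day_0 = Monday ... day_6 = Sunday
for d in range(7):
    toy[f"day_{d}"] = (toy["ds"].dt.dayofweek == d).astype(float)
```

Each row then has exactly one indicator set, and every feature sits in its own numeric column next to `unique_id` and `ds`.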
```python theme={null} market = "BE" df = pd.read_csv( 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv' ) df.head() ``` | unique\_id | ds | y | Exogenous1 | Exogenous2 | day\_0 | day\_1 | day\_2 | day\_3 | day\_4 | day\_5 | day\_6 | | ---------- | ------------------- | ----- | ---------- | ---------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | | BE | 2016-10-22 00:00:00 | 70.00 | 57253.0 | 49593.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-10-22 01:00:00 | 37.10 | 51887.0 | 46073.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-10-22 02:00:00 | 37.10 | 51896.0 | 44927.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-10-22 03:00:00 | 44.75 | 48428.0 | 44483.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-10-22 04:00:00 | 37.10 | 46721.0 | 44338.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ### Step 3: Forecast with Exogenous Variables To make forecasts with exogenous variables, you must have future data for these variables available at the time of prediction. Before generating forecasts, ensure you have (or can generate) future exogenous values. 
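Before calling the API, it can help to sanity-check the future exogenous frame locally. The sketch below is one possible check, not part of the Nixtla client: `check_future_exog` is a hypothetical helper, and the two small frames are made-up data. It assumes every non-target column in the history is meant to be available in the future; columns intended as historical-only would need to be excluded first.

```python
import pandas as pd

def check_future_exog(df, X_df, h, id_col="unique_id", time_col="ds", target_col="y"):
    """Hypothetical helper: list problems found in the future exogenous frame."""
    problems = []

    # Every historical exogenous column should also exist in the future frame
    hist_exog = set(df.columns) - {id_col, time_col, target_col}
    future_exog = set(X_df.columns) - {id_col, time_col}
    missing = sorted(hist_exog - future_exog)
    if missing:
        problems.append(f"missing future columns: {missing}")

    # Each series needs exactly h future rows, one per forecast step
    rows_per_series = X_df.groupby(id_col).size()
    wrong = rows_per_series[rows_per_series != h]
    if not wrong.empty:
        problems.append(f"series without exactly {h} future rows: {list(wrong.index)}")

    return problems

# Made-up example frames
hist = pd.DataFrame({
    "unique_id": ["BE"] * 3,
    "ds": pd.date_range("2016-12-30 21:00", periods=3, freq=pd.Timedelta(hours=1)),
    "y": [50.0, 48.0, 47.0],
    "Exogenous1": [1.0, 2.0, 3.0],
})
future = pd.DataFrame({
    "unique_id": ["BE"] * 2,
    "ds": pd.date_range("2016-12-31", periods=2, freq=pd.Timedelta(hours=1)),
    "Exogenous1": [4.0, 5.0],
})

issues = check_future_exog(hist, future, h=2)
```

An empty list means the two frames line up for the requested horizon; anything else points at the column or series to fix.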
Below, we load future exogenous features to obtain 24-step-ahead predictions: ```python theme={null} future_ex_vars_df = pd.read_csv( 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv' ) future_ex_vars_df.head() ``` | unique\_id | ds | Exogenous1 | Exogenous2 | day\_0 | day\_1 | day\_2 | day\_3 | day\_4 | day\_5 | day\_6 | | ---------- | ------------------- | ---------- | ---------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | | BE | 2016-12-31 00:00:00 | 70318.0 | 64108.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-12-31 01:00:00 | 67898.0 | 62492.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-12-31 02:00:00 | 68379.0 | 61571.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-12-31 03:00:00 | 64972.0 | 60381.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-12-31 04:00:00 | 62900.0 | 60298.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Next, create forecasts using the Nixtla API: ```python theme={null} timegpt_fcst_ex_vars_df = nixtla_client.forecast( df=df, X_df=future_ex_vars_df, h=24, level=[80, 90], feature_contributions=True ) timegpt_fcst_ex_vars_df.head() ``` | unique\_id | ds | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 | | ---------- | ------------------- | --------- | ------------- | ------------- | ------------- | ------------- | | BE | 2016-12-31 00:00:00 | 51.632830 | 61.598820 | 66.088295 | 41.666843 | 37.177372 | | BE | 2016-12-31 01:00:00 | 45.750877 | 54.611988 | 60.176445 | 36.889767 | 31.325312 | | BE | 2016-12-31 02:00:00 | 39.650543 | 46.256210 | 52.842808 | 33.044876 | 26.458277 | | BE | 2016-12-31 03:00:00 | 34.000072 | 44.015310 | 47.429000 | 23.984835 | 20.571144 | | BE | 2016-12-31 04:00:00 | 33.785370 | 43.140503 | 48.581240 | 24.430239 | 18.989498 | ### Step 4: Extract SHAP Values After forecasting, you can retrieve SHAP values to see how each feature contributed to the model's 
predictions. ```python theme={null} shap_df = nixtla_client.feature_contributions shap_df = shap_df.query("unique_id == @market") shap_df.head() ``` ### Step 5: Visualization with SHAP Visualizing SHAP values helps interpret the impact of exogenous features in detail. Below, we demonstrate three common SHAP plots. #### Bar Plot Use a bar plot to see the average impact of each feature across predictions: ```python theme={null} import shap import matplotlib.pyplot as plt shap_columns = shap_df.columns.difference(['unique_id', 'ds', 'TimeGPT', 'base_value']) shap_obj = shap.Explanation( values=shap_df[shap_columns].values, base_values=shap_df['base_value'].values, feature_names=shap_columns ) shap.plots.bar( shap_obj, max_display=len(shap_columns), show=False ) plt.title(f'SHAP values for {market}') plt.show() ``` ![Bar Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-14-output-1.png) #### Waterfall Plot A waterfall plot shows how each feature contributes to a single prediction step. Here, we select the earliest date for illustration: ```python theme={null} selected_ds = shap_df['ds'].min() filtered_df = shap_df[shap_df['ds'] == selected_ds] shap_obj = shap.Explanation( values=filtered_df[shap_columns].values.flatten(), base_values=filtered_df['base_value'].values[0], feature_names=shap_columns ) shap.plots.waterfall(shap_obj, show=False) plt.title(f'Waterfall Plot: {market}, date: {selected_ds}') plt.show() ``` ![Waterfall Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-15-output-1.png) #### Heatmap Visualize how feature impacts vary across each forecast step. This often reveals time-dependent effects of certain variables. 
```python theme={null} shap_obj = shap.Explanation( values=shap_df[shap_columns].values, feature_names=shap_columns ) shap.plots.heatmap(shap_obj, show=False) plt.title(f'SHAP Heatmap (Unique ID: {market})') plt.show() ``` ![SHAP Heatmap](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/21_shap_values_files/figure-markdown_strict/cell-16-output-1.png) # Numeric Variables Source: https://nixtla.io/docs/forecasting/exogenous-variables/numeric_features Learn how to incorporate external numeric variables to improve your forecasting accuracy. ## What Are Exogenous Variables? Exogenous variables or external factors are crucial in time series forecasting as they provide additional information that might influence the prediction. These variables could include holiday markers, marketing spending, weather data, or any other external data that correlate with the time series data you are forecasting. For example, if you're forecasting ice cream sales, temperature data could serve as a useful exogenous variable. On hotter days, ice cream sales may increase. ## How to Use Exogenous Variables [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/01_exogenous_variables_reworked.ipynb) To incorporate exogenous variables in TimeGPT, you'll need to pair each point in your time series data with the corresponding external data. ### Step 1: Import Packages Import the required libraries and initialize the Nixtla client. ```python theme={null} import pandas as pd from nixtla import NixtlaClient nixtla_client = NixtlaClient( # defaults to os.environ.get("NIXTLA_API_KEY") api_key="my_api_key_provided_by_nixtla" ) ``` ### Step 2: Load Dataset In this tutorial, we'll predict day-ahead electricity prices. 
The dataset contains: * Hourly electricity prices (`y`) from various markets (identified by `unique_id`) * Exogenous variables (`Exogenous1` to `day_6`) ```python theme={null} df = pd.read_csv("https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv") df.head() ``` | unique\_id | ds | y | Exogenous1 | Exogenous2 | day\_0 | day\_1 | day\_2 | day\_3 | day\_4 | day\_5 | day\_6 | | ---------- | ------------------- | ----- | ---------- | ---------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | | BE | 2016-10-22 00:00:00 | 70.00 | 57253.0 | 49593.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-10-22 01:00:00 | 37.10 | 51887.0 | 46073.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-10-22 02:00:00 | 37.10 | 51896.0 | 44927.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-10-22 03:00:00 | 44.75 | 48428.0 | 44483.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-10-22 04:00:00 | 37.10 | 46721.0 | 44338.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | ### Step 3: Forecast without Exogenous Variables First, let's create a baseline forecast without using any exogenous variables. ```python theme={null} timegpt_fcst_no_ex_vars = nixtla_client.forecast( df=df[["unique_id", "ds", "y"]], h=24, level=[80, 90] ) ``` ### Step 4: Forecasting with Exogenous Variables Next, let's create a forecast using the exogenous variables. To make a forecast using exogenous variables, you need to provide historical and future exogenous values. Below is an example dataset containing future exogenous variables. Note that it only contains the future exogenous variable values not the target variable `y`. We need to forecast this target variable using the exogenous variables provided. 
```python theme={null} future_ex_vars_df = pd.read_csv("https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv") future_ex_vars_df.head() ``` | unique\_id | ds | Exogenous1 | Exogenous2 | day\_0 | day\_1 | day\_2 | day\_3 | day\_4 | day\_5 | day\_6 | | ---------- | ------------------- | ---------- | ---------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ | | BE | 2016-12-31 00:00:00 | 70318.0 | 64108.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-12-31 01:00:00 | 67898.0 | 62492.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-12-31 02:00:00 | 68379.0 | 61571.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-12-31 03:00:00 | 64972.0 | 60381.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | | BE | 2016-12-31 04:00:00 | 62900.0 | 60298.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | Ensure you maintain consistent data formatting and columns in both historical and future exogenous datasets (e.g., dates, unique\_id, variable names). ```python theme={null} timegpt_fcst_ex_vars = nixtla_client.forecast( df=df, X_df=future_ex_vars_df, h=24, level=[80, 90] ) ``` ### Step 5: Forecast Visualization Once you have generated your forecasts, you can visualize the results to compare forecasts between the two methods above. ```python theme={null} timegpt_fcst_no_ex_vars.rename(columns={"TimeGPT": "TimeGPT_no_ex_vars"}, inplace=True) timegpt_fcst_ex_vars.rename(columns={"TimeGPT": "TimeGPT_ex_vars"}, inplace=True) all_forecasts = ( timegpt_fcst_no_ex_vars .merge( timegpt_fcst_ex_vars, how='outer', on=["unique_id", "ds"] ) ) ``` ```python theme={null} nixtla_client.plot( df[["unique_id", "ds", "y"]], all_forecasts, max_insample_length=1000, ) ``` Forecast chart ## Key Takeaways * Exogenous variables enrich time series forecasting. * Ensure proper alignment of historical and future exogenous data. ## Next Steps Congratulations! 
You have mastered the fundamentals of adding exogenous variables to your TimeGPT forecasts. Keep refining your approach by:

* Exploring feature engineering to create domain-specific exogenous data.
* Experimenting with different modeling approaches for external variables.
* Validating forecast accuracy by comparing with real future data.

# Fine-tuning with a Specific Loss Function
Source: https://nixtla.io/docs/forecasting/fine-tuning/custom_loss

Learn how to fine-tune a model using specific loss functions, configure the Nixtla client, and evaluate performance improvements.

## Fine-tuning with a Specific Loss Function

When you fine-tune, the model trains on your dataset so that its predictions are tailored to your data. You can specify the loss function to be used during fine-tuning using the `finetune_loss` argument. Below are the available loss functions:

* `"default"`: A proprietary function robust to outliers.
* `"mae"`: Mean Absolute Error
* `"mse"`: Mean Squared Error
* `"rmse"`: Root Mean Squared Error
* `"mape"`: Mean Absolute Percentage Error
* `"smape"`: Symmetric Mean Absolute Percentage Error

## How to Fine-tune with a Specific Loss Function

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/07_loss_function_finetuning.ipynb)

### Step 1: Import Packages and Initialize Client

First, we import the required packages and initialize the Nixtla client.

```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
from utilsforecast.losses import mae, mse, rmse, mape, smape
```

```python theme={null}
nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key='my_api_key_provided_by_nixtla'
)
```

### Step 2: Load Data

Load your data and prepare it for fine-tuning. Here, we will demonstrate using an example dataset of air passenger counts.
```python theme={null} df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv') df.insert(loc=0, column='unique_id', value=1) df.head() ``` | | unique\_id | timestamp | value | | - | ---------- | ---------- | ----- | | 0 | 1 | 1949-01-01 | 112 | | 1 | 1 | 1949-02-01 | 118 | | 2 | 1 | 1949-03-01 | 132 | | 3 | 1 | 1949-04-01 | 129 | | 4 | 1 | 1949-05-01 | 121 | ### Step 3: Fine-Tune the Model Let's fine-tune the model on a dataset using the mean absolute error (MAE). For that, we simply pass the appropriate string representing the loss function to the `finetune_loss` parameter of the `forecast` method. ```python theme={null} timegpt_fcst_finetune_mae_df = nixtla_client.forecast( df=df, h=12, finetune_steps=10, finetune_loss='mae', # Select desired loss function time_col='timestamp', target_col='value', ) ``` After training completes, you can visualize the forecast: ```python theme={null} nixtla_client.plot( df, timegpt_fcst_finetune_mae_df, time_col='timestamp', target_col='value', ) ``` ![Fine tuning with MAE](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/07_loss_function_finetuning_files/figure-markdown_strict/cell-12-output-1.png) ## Explanation of Loss Functions Now, depending on your data, you will use a specific error metric to accurately evaluate your forecasting model's performance. Below is a non-exhaustive guide on which metric to use depending on your use case. 
**Mean absolute error (MAE)**

* Robust to outliers
* Easy to understand
* You care equally about all error sizes
* Same units as your data

**Mean squared error (MSE)**

* You want to penalize large errors more than small ones
* Sensitive to outliers
* Used when large errors must be avoided
* *Not* the same units as your data

**Root mean squared error (RMSE)**

* Brings the MSE back to original units of data
* Penalizes large errors more than small ones

**Mean absolute percentage error (MAPE)**

* Easy to understand for non-technical stakeholders
* Expressed as a percentage
* Asymmetric: over-forecasts can produce errors above 100%, while under-forecasts of non-negative data are capped at 100%
* To be avoided if your data has values close to 0 or equal to 0

**Symmetric mean absolute percentage error (sMAPE)**

* Designed to reduce the bias of MAPE
* More balanced sensitivity to over- and under-forecasting
* To be avoided if your data has values close to 0 or equal to 0

With TimeGPT, you can choose the loss function used during fine-tuning so that the model is optimized for the metric that matters in your use case.

## Experimentation with Loss Function

Let's run a small experiment to see how each loss function improves its associated metric when compared to the default setting.

Let's split the dataset into training and testing sets.

```python theme={null}
train = df[:-36]
test = df[-36:]
```

Next, let's compute the forecasts with the various loss functions.

```python theme={null}
losses = ['default', 'mae', 'mse', 'rmse', 'mape', 'smape']

test = test.copy()

for loss in losses:
    preds_df = nixtla_client.forecast(
        df=train,
        h=36,
        finetune_steps=10,
        finetune_loss=loss,
        time_col='timestamp',
        target_col='value')
    preds = preds_df['TimeGPT'].values
    test.loc[:,f'TimeGPT_{loss}'] = preds
```

Great! We have predictions from TimeGPT using all the different loss functions. We can evaluate the performance using their associated metric and measure the improvement.
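The improvement measured in this experiment is a relative error reduction against the default loss. A minimal sketch of that arithmetic, using a hypothetical `pct_improvement` helper and made-up error values:

```python
def pct_improvement(baseline_error, candidate_error):
    """Relative reduction in error versus the baseline, in percent (positive = better)."""
    return (baseline_error - candidate_error) / baseline_error * 100

# Hypothetical error values, chosen only to illustrate the calculation
example = pct_improvement(baseline_error=20.0, candidate_error=18.0)
```

A positive value means the specialized loss lowered its metric relative to the default; a negative value means it made the metric worse.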
```python theme={null}
loss_fct_dict = {
    "mae": mae,
    "mse": mse,
    "rmse": rmse,
    "mape": mape,
    "smape": smape
}

pct_improv = []

for loss in losses[1:]:
    # Score the default-loss and specialized-loss forecasts with the matching metric
    evaluation = loss_fct_dict[loss](test, models=['TimeGPT_default', f'TimeGPT_{loss}'], id_col='unique_id', target_col='value')
    pct_diff = (evaluation['TimeGPT_default'] - evaluation[f'TimeGPT_{loss}']) / evaluation['TimeGPT_default'] * 100
    pct_improv.append(round(pct_diff, 2))
```

```python theme={null}
data = {
    'mae': pct_improv[0].values,
    'mse': pct_improv[1].values,
    'rmse': pct_improv[2].values,
    'mape': pct_improv[3].values,
    'smape': pct_improv[4].values
}

metrics_df = pd.DataFrame(data)
metrics_df.index = ['Metric improvement (%)']

metrics_df
```

|                        | mae  | mse  | rmse | mape  | smape |
| ---------------------- | ---- | ---- | ---- | ----- | ----- |
| Metric improvement (%) | 8.54 | 0.31 | 0.64 | 31.02 | 7.36  |

From the table above, we can see that in this experiment, fine-tuning with a specific loss function improved its associated error metric compared to the default loss function. For example, using the MAE as the loss function improves the MAE by 8.54% relative to the default loss function.

That way, depending on your use case and performance metric, you can use the appropriate loss function to maximize the accuracy of the forecasts.

# Controlling the Level of Fine-Tuning
Source: https://nixtla.io/docs/forecasting/fine-tuning/depth

Learn how to use the finetune_depth parameter to control the extent of fine-tuning in TimeGPT models.

## Controlling the Level of Fine-Tuning

It is possible to control the depth of fine-tuning with the `finetune_depth` parameter.

`finetune_depth` takes values among `[1, 2, 3, 4, 5]`. By default, it is set to 1, which means that only a small set of the model's parameters is adjusted, whereas a value of 5 fine-tunes the maximum number of parameters. Increasing `finetune_depth` also increases the time to generate predictions.
While a higher depth can generate better results, we must be careful not to overfit the model, in which case the predictions may become less accurate.

Let's run a small experiment to see how `finetune_depth` impacts the performance.

## How to Control the Level of Fine-Tuning

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/23_finetune_depth_finetuning.ipynb)

### Step 1: Import Packages

First, we import the required packages and initialize the Nixtla client.

```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
from utilsforecast.losses import mae, mse
from utilsforecast.evaluation import evaluate
```

```python theme={null}
nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key='my_api_key_provided_by_nixtla'
)
```

### Step 2: Load Data

Next, load the dataset.

```python theme={null}
df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv'
)
df.head()
```

Now, we split the data into a training and test set so that we can measure the performance of the model as we vary `finetune_depth`.

```python theme={null}
train = df[:-24]
test = df[-24:]
```

### Step 3: Fine-Tuning With finetune\_depth

As mentioned above, `finetune_depth` controls how many parameters from TimeGPT are fine-tuned on your particular dataset. If the value is set to 1, only a few parameters are fine-tuned. Setting it to 5 means that all parameters of the model will be fine-tuned.

Using a large value for `finetune_depth` can lead to better performance for large datasets with complex patterns. However, it can also lead to overfitting, in which case the accuracy of the forecasts may degrade, as we will see from the small experiment below.
```python theme={null} depths = [1, 2, 3, 4, 5] test = test.copy() for depth in depths: preds_df = nixtla_client.forecast( df=train, h=24, finetune_steps=5, finetune_depth=depth, time_col='timestamp', target_col='value' ) preds = preds_df['TimeGPT'].values test.loc[:, f'TimeGPT_depth{depth}'] = preds ``` Evaluate the forecasts using MAE and MSE metrics: ```python theme={null} test['unique_id'] = 0 evaluation = evaluate( test, metrics=[mae, mse], time_col="timestamp", target_col="value" ) evaluation ``` | unique\_id | metric | TimeGPT\_depth1 | TimeGPT\_depth2 | TimeGPT\_depth3 | TimeGPT\_depth4 | TimeGPT\_depth5 | | ---------- | ------ | --------------- | --------------- | --------------- | --------------- | --------------- | | 0 | mae | 22.675540 | 17.908963 | 21.318518 | 24.745096 | 28.734302 | | 0 | mse | 677.254283 | 461.320852 | 676.202126 | 991.835359 | 1119.722602 | From the result above, we can see that a `finetune_depth` of 2 achieves the best results since it has the lowest MAE and MSE. Also notice that with a `finetune_depth` of 4 and 5, the performance degrades, which is a clear sign of overfitting. Thus, keep in mind that fine-tuning can be a bit of trial and error. You might need to adjust the number of `finetune_steps` and the level of `finetune_depth` based on your specific needs and the complexity of your data. Usually, a higher `finetune_depth` works better for large datasets. In this specific tutorial, since we were forecasting a single series with a very short dataset, increasing the depth led to overfitting. It's recommended to monitor the model's performance during fine-tuning and adjust as needed. Be aware that more `finetune_steps` and a larger value of `finetune_depth` may lead to longer training times and could potentially lead to overfitting if not managed properly. 
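Picking the best depth from an evaluation table like the one above can be automated. The following is a minimal sketch, not part of the `nixtla` API: it assumes the table has one row per metric and one `TimeGPT_depth{n}` column per depth, the values are copied from the table above for illustration, and `best_depth_per_metric` is a hypothetical helper name.

```python
import pandas as pd

# Values copied from the evaluation table above (illustrative only).
evaluation = pd.DataFrame({
    "unique_id": [0, 0],
    "metric": ["mae", "mse"],
    "TimeGPT_depth1": [22.675540, 677.254283],
    "TimeGPT_depth2": [17.908963, 461.320852],
    "TimeGPT_depth3": [21.318518, 676.202126],
    "TimeGPT_depth4": [24.745096, 991.835359],
    "TimeGPT_depth5": [28.734302, 1119.722602],
})


def best_depth_per_metric(evaluation: pd.DataFrame, prefix: str = "TimeGPT_depth") -> dict:
    """Return {metric: depth} with the lowest error for each metric (hypothetical helper)."""
    depth_cols = [c for c in evaluation.columns if c.startswith(prefix)]
    # Average over series so the helper also works with several unique_ids.
    per_metric = evaluation.groupby("metric")[depth_cols].mean()
    # idxmin(axis=1) gives, for each metric, the column holding the smallest error.
    best_cols = per_metric.idxmin(axis=1)
    return {metric: int(col[len(prefix):]) for metric, col in best_cols.items()}


best = best_depth_per_metric(evaluation)
print(best)
```

Because the lowest error on a single short test window can be noisy, treat the selected depth as a starting point rather than a final setting.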
# Re-using fine-tuned models Source: https://nixtla.io/docs/forecasting/fine-tuning/save_reuse_delete_finetuned_models Learn how to save, fine-tune, list, and delete TimeGPT models to optimize forecasting. ## Re-using Fine-tuned Models Reusing previously fine-tuned TimeGPT models can help reduce computation time and costs while maintaining or improving forecast accuracy. This guide walks you through the steps to save, fine-tune, list, and delete your TimeGPT models effectively. ## How to Re-use Fine-tuned Models [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/061_reusing_finetuned_models.ipynb) ### Step 1: Import Packages First, we import the required packages and initialize the Nixtla client ```python theme={null} import pandas as pd from nixtla import NixtlaClient from utilsforecast.losses import rmse from utilsforecast.evaluation import evaluate ``` ```python theme={null} nixtla_client = NixtlaClient( # defaults to os.environ["NIXTLA_API_KEY"] api_key='my_api_key_provided_by_nixtla' ) ``` ### Step 2: Load Data Load the forecasting dataset and prepare the train/validation split. ```python theme={null} df = pd.read_parquet('https://datasets-nixtla.s3.amazonaws.com/m4-hourly.parquet') h = 48 valid = df.groupby('unique_id', observed=True).tail(h) train = df.drop(valid.index) train.head() ``` | | unique\_id | ds | y | | - | ---------- | -- | ----- | | 0 | H1 | 1 | 605.0 | | 1 | H1 | 2 | 586.0 | | 2 | H1 | 3 | 586.0 | | 3 | H1 | 4 | 559.0 | | 4 | H1 | 5 | 511.0 | ### Step 3: Zero-shot forecast We can try forecasting without any finetuning to see how well TimeGPT does. 
```python theme={null} fcst_kwargs = { 'df': train, 'freq': 1, 'model': 'timegpt-1-long-horizon' } fcst = nixtla_client.forecast(h=h, **fcst_kwargs) zero_shot_eval = evaluate(fcst.merge(valid), metrics=[rmse], agg_fn='mean') zero_shot_eval ``` | metric | TimeGPT | | ------ | ----------- | | rmse | 1504.474342 | ### Step 4: Fine-tune the model We can now fine-tune TimeGPT a little and save our model for later use. We can define the ID that we want that model to have by providing it through `output_model_id`. This ID is also returned as the output of the `finetune` method. ```python theme={null} first_model_id = 'my-first-finetuned-model' nixtla_client.finetune(output_model_id=first_model_id, **fcst_kwargs) ``` ```bash theme={null} 'my-first-finetuned-model' ``` We can now forecast using this fine-tuned model by providing its ID through the `finetuned_model_id` argument. ```python theme={null} first_finetune_fcst = nixtla_client.forecast( h=h, finetuned_model_id=first_model_id, **fcst_kwargs ) first_finetune_eval = evaluate( first_finetune_fcst.merge(valid), metrics=[rmse], agg_fn='mean' ) zero_shot_eval.merge( first_finetune_eval, on=['metric'], suffixes=('_zero_shot', '_first_finetune') ) ``` | metric | TimeGPT\_zero\_shot | TimeGPT\_first\_finetune | | ------ | ------------------- | ------------------------ | | rmse | 1504.474342 | 1472.024619 | We can see the error was reduced. ### Step 5: Further fine-tune the model We can now take this model and fine-tune it a bit further by using the `NixtlaClient.finetune` method but providing our already fine-tuned model as `finetuned_model_id`, which will take that model and fine-tune it a bit more. We can also change the fine-tuning settings, like using `finetune_depth=3`, for example. As before, the new finetuned model ID is returned by the `finetune` method. 
```python
second_model_id = nixtla_client.finetune(
    finetuned_model_id=first_model_id,
    finetune_depth=3,
    **fcst_kwargs
)
second_model_id
```

```bash
'468b13fb-4b26-447a-bd87-87a64b50d913'
```

Since we didn't provide `output_model_id` this time, the model was assigned a UUID.

We can now use this model to forecast.

```python
second_finetune_fcst = nixtla_client.forecast(
    h=h, finetuned_model_id=second_model_id, **fcst_kwargs
)
second_finetune_eval = evaluate(
    second_finetune_fcst.merge(valid), metrics=[rmse], agg_fn='mean'
)
first_finetune_eval.merge(
    second_finetune_eval,
    on=['metric'],
    suffixes=('_first_finetune', '_second_finetune')
)
```

| metric | TimeGPT\_first\_finetune | TimeGPT\_second\_finetune |
| ------ | ------------------------ | ------------------------- |
| rmse   | 1472.024619              | 1435.365211               |

We can see the error was reduced a bit more.

### Step 6: List fine-tuned models

We can list our fine-tuned models with the `NixtlaClient.finetuned_models` method.

```python
finetuned_models = nixtla_client.finetuned_models()
finetuned_models
```

```bash
[FinetunedModel(id='468b13fb-4b26-447a-bd87-87a64b50d913', created_at=datetime.datetime(2024, 12, 30, 17, 57, 31, 241455, tzinfo=TzInfo(UTC)), created_by='user', base_model_id='my-first-finetuned-model', steps=10, depth=3, loss='default', model='timegpt-1-long-horizon', freq='MS'),
 FinetunedModel(id='my-first-finetuned-model', created_at=datetime.datetime(2024, 12, 30, 17, 57, 16, 978907, tzinfo=TzInfo(UTC)), created_by='user', base_model_id='None', steps=10, depth=1, loss='default', model='timegpt-1-long-horizon', freq='MS')]
```

While that representation may be useful for programmatic use, in this exploratory setting it's nicer to see the models as a dataframe, which we can get by providing `as_df=True`.
```python theme={null} nixtla_client.finetuned_models(as_df=True) ``` | id | created\_at | created\_by | base\_model\_id | steps | depth | loss | model | freq | | ------------------------------------ | -------------------------------- | ----------- | ------------------------ | ----- | ----- | ------- | ---------------------- | ---- | | 468b13fb-4b26-447a-bd87-87a64b50d913 | 2024-12-30 17:57:31.241455+00:00 | user | my-first-finetuned-model | 10 | 3 | default | timegpt-1-long-horizon | MS | | my-first-finetuned-model | 2024-12-30 17:57:16.978907+00:00 | user | None | 10 | 1 | default | timegpt-1-long-horizon | MS | We can see that the `base_model_id` of our second model is our first model, along with other metadata. ### Step 7: Delete fine-tuned models In order to keep things organized, and since there's a limit of 50 fine-tuned models, you can delete models that weren't so promising to make room for more experiments. For example, we can delete our first finetuned model. Note that even though it was used as the base for our second model, they're saved independently so removing it won't affect our second model, except for the dangling metadata. ```python theme={null} nixtla_client.delete_finetuned_model(first_model_id) nixtla_client.finetuned_models(as_df=True) ``` | id | created\_at | created\_by | base\_model\_id | steps | depth | loss | model | freq | | ------------------------------------ | -------------------------------- | ----------- | ------------------------ | ----- | ----- | ------- | ---------------------- | ---- | | 468b13fb-4b26-447a-bd87-87a64b50d913 | 2024-12-30 17:57:31.241455+00:00 | user | my-first-finetuned-model | 10 | 3 | default | timegpt-1-long-horizon | MS | > WARNING: Deleting a fine-tuned model is irreversible. Make sure to back up any > necessary information before removal. ## Conclusion Congratulations! You have successfully learned how to save, refine, and manage your fine-tuned TimeGPT models. 
This workflow helps optimize your forecasting pipelines by reusing previously fine-tuned models.

# Fine-tuning Tutorial TimeGPT
Source: https://nixtla.io/docs/forecasting/fine-tuning/steps

Adapt TimeGPT to your specific datasets for more accurate forecasts

Fine-tuning is a powerful process for utilizing TimeGPT more effectively. Foundation models such as TimeGPT are pre-trained on vast amounts of data, capturing wide-ranging features and patterns. These models can then be specialized for specific contexts or domains. With fine-tuning, the model's parameters are refined for a new forecasting task, allowing it to tailor its vast pre-existing knowledge towards the requirements of the new data. Fine-tuning thus serves as a crucial bridge, linking TimeGPT's broad capabilities to the specifics of your task.

Concretely, the process of fine-tuning consists of performing a certain number of training iterations on your input data to minimize the forecasting error. The forecasts will then be produced with the updated model. To control the number of iterations, use the `finetune_steps` argument of the `forecast` method.

## Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/06_finetuning.ipynb)

### Step 1: Import Packages and Initialize Client

First, we import the required packages and initialize the Nixtla client.
```python theme={null} import pandas as pd from nixtla import NixtlaClient from utilsforecast.losses import mae, mse from utilsforecast.evaluation import evaluate ``` Next, initialize the NixtlaClient instance, providing your API key (or rely on environment variables): ```python initialize-client theme={null} nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' # Defaults to os.environ.get("NIXTLA_API_KEY") ) ``` ### Step 2: Load Data Load the dataset from the provided CSV URL: ```python load-data theme={null} df = pd.read_csv( "https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv" ) df.head() ``` | | timestamp | value | | - | ---------- | ----- | | 0 | 1949-01-01 | 112 | | 1 | 1949-02-01 | 118 | | 2 | 1949-03-01 | 132 | | 3 | 1949-04-01 | 129 | | 4 | 1949-05-01 | 121 | ### Step 3: Fine-tune the Model Set the number of fine-tuning iterations with the **finetune\_steps** parameter. Here, `finetune_steps=10` means the model will go through 10 iterations of training on your time series data. ```python theme={null} timegpt_fcst_finetune_df = nixtla_client.forecast( df=df, h=12, finetune_steps=10, time_col='timestamp', target_col='value', ) ``` Visualize forecasts to confirm performance: ```python theme={null} nixtla_client.plot( df, timegpt_fcst_finetune_df, time_col='timestamp', target_col='value', ) ``` ![Forecast Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/06_finetuning_files/figure-markdown_strict/cell-12-output-1.png) ## Conclusion Keep in mind that fine-tuning can be a bit of trial and error. You might need to adjust the number of `finetune_steps` based on your specific needs and the complexity of your data. Usually, a larger value of `finetune_steps` works better for large datasets. It's recommended to monitor the model's performance during fine-tuning and adjust as needed. 
Be aware that more `finetune_steps` may lead to longer training times and could potentially lead to overfitting if not managed properly. Remember, fine-tuning is a powerful feature, but it should be used thoughtfully and carefully. ## Additional Resources * For a detailed guide on using a specific loss function for fine-tuning, check out the [Fine-tuning with a specific loss function](/docs/forecasting/fine-tuning/custom_loss) tutorial. * Also, read our detailed tutorial on [controlling the level of fine-tuning](/docs/forecasting/fine-tuning/depth) using `finetune_depth`. # Distributed Forecasting with Spark, Dask & Ray Source: https://nixtla.io/docs/forecasting/forecasting-at-scale/computing_at_scale Scale your time series forecasting with TimeGPT using Spark, Dask, or Ray. Learn distributed computing for millions of time series with Python code examples and best practices. ## Distributed Computing for Large-Scale Forecasting Handling large datasets is a common challenge in time series forecasting. For example, when working with retail data, you may need to forecast sales for 100,000+ products across hundreds of stores—generating millions of forecasts daily. Similarly, when dealing with electricity consumption data, you may need to predict consumption for millions of smart meters across multiple regions in real-time. ### Why Distributed Computing for Forecasting? 
Distributed computing offers significant advantages for time series forecasting: * **Speed**: Reduce computation time by 10-100x compared to single-machine processing * **Scalability**: Handle datasets that don't fit in memory on a single machine * **Cost-efficiency**: Process more forecasts in less time, optimizing resource utilization * **Reliability**: Fault-tolerant processing ensures forecasts complete even if individual nodes fail Nixtla's **TimeGPT** enables you to efficiently handle expansive datasets by integrating distributed computing frameworks (**[Spark](https://spark.apache.org/)**, **[Dask](https://www.dask.org/)**, and **[Ray](https://www.ray.io/)** through **Fugue**) that parallelize forecasts across multiple time series and drastically reduce computation times. ## Getting Started Before getting started, ensure you have your TimeGPT API key. Upon [registration](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/forecasting/forecasting-at-scale/computing_at_scale), you'll receive an email prompting you to confirm your signup. Once confirmed, access your dashboard and navigate to the **API Keys** section to retrieve your key. For detailed setup instructions, see the [Setting Up Your Authentication Key tutorial](/docs/setup/setting_up_your_api_key). ## How to Use TimeGPT with Distributed Computing Frameworks Using TimeGPT with distributed computing frameworks is straightforward. The process only slightly differs from non-distributed usage. ### Step 1: Instantiate a NixtlaClient class ```python theme={null} from nixtla import NixtlaClient # Replace 'YOUR_API_KEY' with the key obtained from your Nixtla dashboard client = NixtlaClient(api_key="YOUR_API_KEY") ``` ### Step 2: Load your data into a pandas DataFrame Make sure your data is properly formatted, with each time series uniquely identified (e.g., by store or product). 
```python
import pandas as pd

data = pd.read_csv("your_time_series_data.csv")
```

### Step 3: Initialize a distributed computing framework

Currently, TimeGPT supports:

* [Spark](/docs/forecasting/forecasting-at-scale/spark)
* [Dask](/docs/forecasting/forecasting-at-scale/dask)
* [Ray](/docs/forecasting/forecasting-at-scale/ray)

Follow the links above for examples on setting up each framework.

### Step 4: Use NixtlaClient methods to forecast at scale

Once your framework is initialized and your data is loaded, you can apply the forecasting methods:

```python
# Example function call within the distributed environment
forecast_results = client.forecast(
    df=data,
    h=14  # horizon (e.g., 14 days)
)
```

### Step 5: Stop the distributed computing framework

When you're finished, you may need to terminate your Spark, Dask, or Ray session. This depends on your environment and setup.

Parallelization in these frameworks operates across multiple time series within your dataset. Ensure each series is uniquely identified so the parallelization can be fully leveraged.

## Real-World Use Cases

Distributed forecasting with TimeGPT is essential for:

* **Retail & E-commerce**: Forecast demand for 100,000+ SKUs across multiple locations simultaneously
* **Energy & Utilities**: Predict consumption patterns for millions of smart meters in real time
* **Finance**: Generate forecasts for thousands of stocks, currencies, or commodities
* **IoT & Manufacturing**: Process sensor data from thousands of devices for predictive maintenance
* **Telecommunications**: Forecast network traffic across thousands of cell towers

The distributed approach reduces forecast generation time from hours to minutes, enabling faster decision-making at scale.
## Important Considerations ### When to Use a Distributed Computing Framework Consider a distributed framework if your dataset: * Contains millions of observations across multiple time series * Cannot fit into memory on a single machine * Requires extensive processing time that is impractical on a single machine ### Choosing the Right Framework When selecting Spark, Dask, or Ray, weigh your existing infrastructure and your team's expertise. Minimal code changes allow TimeGPT to work with each of these frameworks seamlessly. Pick the framework that aligns with your organization's tools and resources for the most efficient large-scale forecasting efforts. ### Framework Comparison | Framework | Best For | Ideal Dataset Size | Learning Curve | | --------- | ----------------------------------------------------------- | --------------------- | -------------- | | **Spark** | Enterprise environments with existing Hadoop infrastructure | 100M+ observations | Medium | | **Dask** | Python-native workflows, easy scaling from pandas | 10M-100M observations | Low | | **Ray** | Machine learning pipelines, complex task dependencies | 10M+ observations | Medium | Each framework integrates seamlessly with TimeGPT through Fugue, requiring minimal code changes to scale from single-machine to distributed forecasting. ### Best Practices To maximize the benefits of distributed forecasting: * **Distribute workloads efficiently**: Spread your forecasts across multiple compute nodes to handle huge datasets without exhausting memory or overwhelming single-machine resources. * **Use proper identifiers**: Ensure your data has distinct identifiers for each series. Correct labeling is crucial for successful multi-series parallel forecasts. 
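The identifier requirement can be checked on a pandas DataFrame before any data is handed to a cluster. The following is a minimal sketch, assuming the `unique_id` and `ds` column names used in the examples throughout these docs; `check_series_ids` is a hypothetical helper, not part of the `nixtla` client.

```python
import pandas as pd


def check_series_ids(df: pd.DataFrame, id_col: str = "unique_id", time_col: str = "ds") -> pd.DataFrame:
    """Return rows whose (series id, timestamp) pair appears more than once.

    An empty result means each observation maps to exactly one series and
    timestamp. Hypothetical helper for illustration only.
    """
    missing = {id_col, time_col} - set(df.columns)
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    # keep=False marks every member of a duplicated group, not just the repeats.
    return df[df.duplicated(subset=[id_col, time_col], keep=False)]


# Toy data: series "store_1" has two rows for the same timestamp.
sample = pd.DataFrame({
    "unique_id": ["store_1", "store_1", "store_1", "store_2"],
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-01"]),
    "y": [10.0, 12.0, 11.5, 7.0],
})

duplicates = check_series_ids(sample)
print(duplicates)
print(sample["unique_id"].nunique(), "series found")
```

Duplicated pairs usually mean that two different series share an identifier, for example the same product sold in two stores, and need a more specific ID before forecasting.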
## Frequently Asked Questions

**Q: Which distributed framework should I choose for TimeGPT?**

Choose **Spark** if you have existing Hadoop infrastructure, **Dask** if you're already using Python/pandas and want the easiest transition, or **Ray** if you're building complex ML pipelines.

**Q: How much faster is distributed forecasting compared to single-machine processing?**

Speed improvements typically range from 10-100x depending on your dataset size, number of time series, and cluster configuration. Datasets with more independent time series see greater parallelization benefits.

**Q: Do I need to change my TimeGPT code to use distributed computing?**

Minimal changes are required. After initializing your chosen framework (Spark/Dask/Ray), pass a Spark, Dask, or Ray DataFrame to the `NixtlaClient` methods instead of a pandas DataFrame. The API calls remain the same.

**Q: Can I use distributed computing with fine-tuning and cross-validation?**

Yes, TimeGPT supports distributed [fine-tuning](/docs/forecasting/fine-tuning/steps) and [cross-validation](/docs/forecasting/evaluation/cross_validation) across all supported frameworks.

## Related Resources

Explore more TimeGPT capabilities:

* [Spark Integration Guide](/docs/forecasting/forecasting-at-scale/spark) - Detailed Spark setup and examples
* [Dask Integration Guide](/docs/forecasting/forecasting-at-scale/dask) - Dask configuration for TimeGPT
* [Ray Integration Guide](/docs/forecasting/forecasting-at-scale/ray) - Ray distributed forecasting tutorial
* [Fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) - Improve accuracy at scale
* [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Validate distributed forecasts

# Time Series Forecasting with Dask
Source: https://nixtla.io/docs/forecasting/forecasting-at-scale/dask

Scale pandas workflows with Dask and TimeGPT for distributed time series forecasting. Learn to process 10M+ observations in Python with minimal code changes.
## Overview [Dask](https://www.dask.org/) is an open-source parallel computing library for Python that scales pandas workflows seamlessly. This guide explains how to use TimeGPT from Nixtla with Dask for distributed forecasting tasks. Dask is ideal when you're already using pandas and need to scale beyond single-machine memory limits—typically for datasets with 10-100 million observations across multiple time series. Unlike Spark, Dask requires minimal code changes from your existing pandas workflow. ## Why Use Dask for Time Series Forecasting? Dask offers unique advantages for scaling time series forecasting: * **Pandas-like API**: Minimal code changes from your existing pandas workflows * **Easy scaling**: Convert pandas DataFrames to Dask with a single line of code * **Python-native**: Pure Python implementation, no JVM required (unlike Spark) * **Flexible deployment**: Run on your laptop or scale to a cluster * **Memory efficiency**: Process datasets larger than RAM through intelligent chunking Choose Dask when you need to scale from 10 million to 100 million observations and want the smoothest transition from pandas. **What you'll learn:** * Simplify distributed computing with Fugue * Run TimeGPT at scale on a Dask cluster * Seamlessly convert pandas DataFrames to Dask ## Prerequisites Before proceeding, make sure you have an [API key from Nixtla](/docs/setup/setting_up_your_api_key). ## How to Use TimeGPT with Dask [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/17_computing_at_scale_dask_distributed.ipynb) ### Step 1: Install Fugue and Dask Fugue provides an easy-to-use interface for distributed computing over frameworks like Dask. You can install Fugue with: ```bash theme={null} pip install fugue[dask] ``` If running on a distributed Dask cluster, ensure the `nixtla` library is installed on all worker nodes. 
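One way to confirm that requirement is to check whether the package can be found in a given Python environment. The following is a minimal sketch using only the standard library; `has_package` is a hypothetical helper, and running it on every worker is an assumption about your cluster setup rather than something these docs prescribe.

```python
import importlib.util


def has_package(name: str) -> bool:
    """Return True if a top-level package called `name` can be located for import."""
    # find_spec returns None when no top-level module with that name is found.
    return importlib.util.find_spec(name) is not None


# Check the packages this guide relies on in the current environment.
status = {pkg: has_package(pkg) for pkg in ("nixtla", "fugue", "dask")}
print(status)
```

If you are connected to a cluster through a `dask.distributed.Client`, calling `client.run(has_package, "nixtla")` should report the result for each worker; treat that as a suggestion to adapt to your own deployment.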
### Step 2: Load Your Data You can start by loading data into a pandas DataFrame. In this example, we use hourly electricity prices from multiple markets: ```python theme={null} import pandas as pd df = pd.read_csv( 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv', parse_dates=['ds'], ) df.head() ``` Example pandas DataFrame: | | unique\_id | ds | y | | - | ---------- | ------------------- | ----- | | 0 | BE | 2016-10-22 00:00:00 | 70.00 | | 1 | BE | 2016-10-22 01:00:00 | 37.10 | | 2 | BE | 2016-10-22 02:00:00 | 37.10 | | 3 | BE | 2016-10-22 03:00:00 | 44.75 | | 4 | BE | 2016-10-22 04:00:00 | 37.10 | ### Step 3: Import Dask Convert the pandas DataFrame into a Dask DataFrame for parallel processing. ```python theme={null} import dask.dataframe as dd dask_df = dd.from_pandas(df, npartitions=2) dask_df ``` When converting to a Dask DataFrame, you can specify the number of partitions based on your data size or system resources. ### Step 4: Use TimeGPT on Dask To use TimeGPT with Dask, provide a Dask DataFrame to Nixtla's client methods instead of a pandas DataFrame. Instantiate the `NixtlaClient` class to interact with Nixtla's API: ```python theme={null} from nixtla import NixtlaClient nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' ) ``` You can use any method from the `NixtlaClient`, such as `forecast` or `cross_validation`. 
```python Forecast with TimeGPT and Dask theme={null} fcst_df = nixtla_client.forecast(dask_df, h=12) fcst_df.compute().head() ``` | | unique\_id | ds | TimeGPT | | - | ---------- | ------------------- | --------- | | 0 | BE | 2016-12-31 00:00:00 | 45.190453 | | 1 | BE | 2016-12-31 01:00:00 | 43.244446 | | 2 | BE | 2016-12-31 02:00:00 | 41.958389 | | 3 | BE | 2016-12-31 03:00:00 | 39.796486 | | 4 | BE | 2016-12-31 04:00:00 | 39.204533 | ```python Cross-validation with TimeGPT and Dask theme={null} cv_df = nixtla_client.cross_validation( dask_df, h=12, n_windows=5, step_size=2 ) cv_df.compute().head() ``` | | unique\_id | ds | cutoff | TimeGPT | | - | ---------- | ------------------- | ------------------- | --------- | | 0 | BE | 2016-12-30 04:00:00 | 2016-12-30 03:00:00 | 39.375439 | | 1 | BE | 2016-12-30 05:00:00 | 2016-12-30 03:00:00 | 40.039215 | | 2 | BE | 2016-12-30 06:00:00 | 2016-12-30 03:00:00 | 43.455849 | | 3 | BE | 2016-12-30 07:00:00 | 2016-12-30 03:00:00 | 47.716408 | | 4 | BE | 2016-12-30 08:00:00 | 2016-12-30 03:00:00 | 50.316650 | ## Working with Exogenous Variables TimeGPT with Dask also supports exogenous variables. Refer to the [Exogenous Variables Tutorial](/docs/forecasting/exogenous-variables/numeric_features) for details. Simply substitute pandas DataFrames with Dask DataFrames—the API remains identical. 
## Related Resources Explore more distributed forecasting options: * [Distributed Computing Overview](/docs/forecasting/forecasting-at-scale/computing_at_scale) - Compare Spark, Dask, and Ray * [Spark Integration](/docs/forecasting/forecasting-at-scale/spark) - For datasets with 100M+ observations * [Ray Integration](/docs/forecasting/forecasting-at-scale/ray) - For ML pipeline integration * [Fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) - Improve accuracy at scale * [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Validate distributed forecasts # Time Series Forecasting with Ray Source: https://nixtla.io/docs/forecasting/forecasting-at-scale/ray Scale machine learning pipelines with Ray and TimeGPT for distributed time series forecasting. Learn to integrate TimeGPT with Ray for complex ML workflows in Python. ## Overview [Ray](https://www.ray.io/) is an open-source unified compute framework that helps scale Python workloads for distributed computing. This guide demonstrates how to distribute TimeGPT forecasting jobs on top of Ray. Ray is ideal for machine learning pipelines with complex task dependencies and datasets with 10+ million observations. Its unified framework excels at orchestrating distributed ML workflows, making it perfect for integrating TimeGPT into broader AI applications. ## Why Use Ray for Time Series Forecasting? 
Ray offers unique advantages for ML-focused time series forecasting: * **ML pipeline integration**: Seamlessly integrate TimeGPT into complex ML workflows with Ray Tune and Ray Serve * **Task parallelism**: Handle complex task dependencies beyond data parallelism * **Python-native**: Pure Python with minimal boilerplate code * **Flexible architecture**: Scale from laptop to cluster with the same code * **Actor model**: Stateful computations for advanced forecasting scenarios Choose Ray when you're building ML pipelines, need complex task orchestration, or want to integrate TimeGPT with other ML frameworks like PyTorch or TensorFlow. **What you'll learn:** * Install Fugue with Ray support for distributed computing * Initialize Ray clusters for distributed forecasting * Run TimeGPT forecasting and cross-validation on Ray ## Prerequisites Before proceeding, make sure you have an [API key from Nixtla](/docs/setup/setting_up_your_api_key). When executing on a distributed Ray cluster, ensure the `nixtla` library is installed on all workers. ## How to Use TimeGPT with Ray [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/19_computing_at_scale_ray_distributed.ipynb) ### Step 1: Install Fugue and Ray Fugue provides an easy-to-use interface for distributed computation across frameworks like Ray. Install Fugue with Ray support: ```bash theme={null} pip install fugue[ray] ``` ### Step 2: Load Your Data Load your dataset into a pandas DataFrame. 
This tutorial uses hourly electricity prices from various markets: ```python theme={null} import pandas as pd df = pd.read_csv( 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv', parse_dates=['ds'], ) df.head() ``` Example pandas DataFrame: | | unique\_id | ds | y | | - | ---------- | ------------------- | ----- | | 0 | BE | 2016-10-22 00:00:00 | 70.00 | | 1 | BE | 2016-10-22 01:00:00 | 37.10 | | 2 | BE | 2016-10-22 02:00:00 | 37.10 | | 3 | BE | 2016-10-22 03:00:00 | 44.75 | | 4 | BE | 2016-10-22 04:00:00 | 37.10 | ### Step 3: Initialize Ray Create a Ray cluster locally by initializing a head node. You can scale this to multiple machines in a real cluster environment. ```python theme={null} import ray from ray.cluster_utils import Cluster ray_cluster = Cluster( initialize_head=True, head_node_args={"num_cpus": 2} ) ray.init(address=ray_cluster.address, ignore_reinit_error=True) # Convert your DataFrame to Ray format: ray_df = ray.data.from_pandas(df) ray_df ``` ### Step 4: Use TimeGPT on Ray To use TimeGPT with Ray, provide a Ray Dataset to Nixtla's client methods instead of a pandas DataFrame. The API remains the same as local usage. Instantiate the `NixtlaClient` class to interact with Nixtla's API: ```python theme={null} from nixtla import NixtlaClient nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' ) ``` You can use any method from the `NixtlaClient`, such as `forecast` or `cross_validation`. ```python theme={null} fcst_df = nixtla_client.forecast(ray_df, h=12) fcst_df.to_pandas().tail() ``` Public API models supported include `timegpt-1` (default) and `timegpt-1-long-horizon`. For long horizon forecasting, see the [long-horizon model tutorial](/docs/forecasting/model-version/longhorizon_model). 
```python theme={null} cv_df = nixtla_client.cross_validation( ray_df, h=12, freq='H', n_windows=5, step_size=2 ) cv_df.to_pandas().tail() ``` ### Step 5: Shutdown Ray Always shut down Ray after you finish your tasks to free up resources: ```python theme={null} ray.shutdown() ``` ## Working with Exogenous Variables TimeGPT with Ray also supports exogenous variables. Refer to the [Exogenous Variables Tutorial](/docs/forecasting/exogenous-variables/numeric_features) for details. Simply substitute pandas DataFrames with Ray Datasets—the API remains identical. ## Related Resources Explore more distributed forecasting options: * [Distributed Computing Overview](/docs/forecasting/forecasting-at-scale/computing_at_scale) - Compare Spark, Dask, and Ray * [Spark Integration](/docs/forecasting/forecasting-at-scale/spark) - For datasets with 100M+ observations * [Dask Integration](/docs/forecasting/forecasting-at-scale/dask) - For datasets with 10M-100M observations * [Fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) - Improve accuracy at scale * [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Validate distributed forecasts # Time Series Forecasting with Spark Source: https://nixtla.io/docs/forecasting/forecasting-at-scale/spark Scale enterprise time series forecasting with Spark and TimeGPT. Learn to process 100M+ observations across distributed clusters with Python and PySpark. ## Overview [Spark](https://spark.apache.org/) is an open-source distributed compute framework designed for large-scale data processing. This guide demonstrates how to use TimeGPT with Spark to perform forecasting and cross-validation across distributed clusters. Spark is ideal for enterprise environments with existing Hadoop infrastructure and datasets exceeding 100 million observations. Its robust distributed architecture handles massive-scale time series forecasting with fault tolerance and high performance. ## Why Use Spark for Time Series Forecasting? 
Spark offers unique advantages for enterprise-scale time series forecasting: * **Enterprise-grade scalability**: Handle datasets with 100M+ observations across distributed clusters * **Hadoop integration**: Seamlessly integrate with existing HDFS and Hadoop ecosystems * **Fault tolerance**: Automatic recovery from node failures ensures reliable computation * **Mature ecosystem**: Leverage Spark's rich ecosystem of tools and libraries * **Multi-language support**: Work with Python (PySpark), Scala, or Java Choose Spark when you have enterprise infrastructure, datasets exceeding 100 million observations, or need robust fault tolerance for mission-critical forecasting. **What you'll learn:** * Install Fugue with Spark support for distributed computing * Convert pandas DataFrames to Spark DataFrames * Run TimeGPT forecasting and cross-validation on Spark clusters ## Prerequisites Before proceeding, make sure you have an [API key from Nixtla](/docs/setup/setting_up_your_api_key). If executing on a distributed Spark cluster, ensure the `nixtla` library is installed on all worker nodes for consistent execution. ## How to Use TimeGPT with Spark [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/16_computing_at_scale_spark_distributed.ipynb) ### Step 1: Install Fugue and Spark Fugue provides a convenient interface to distribute Python code across frameworks like Spark. Install Fugue with Spark support: ```bash theme={null} pip install fugue[spark] ``` To work with TimeGPT, make sure you have the `nixtla` library installed as well. ### Step 2: Load Your Data Load the dataset into a pandas DataFrame. 
In this example, we use hourly electricity price data from different markets: ```python theme={null} import pandas as pd df = pd.read_csv( 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv', parse_dates=['ds'], ) df.head() ``` ### Step 3: Initialize Spark Create a Spark session and convert your pandas DataFrame to a Spark DataFrame: ```python theme={null} from pyspark.sql import SparkSession spark = SparkSession.builder.getOrCreate() spark_df = spark.createDataFrame(df) spark_df.show(5) ``` ### Step 4: Use TimeGPT on Spark To use TimeGPT with Spark, provide a Spark DataFrame to Nixtla's client methods instead of a pandas DataFrame. The main difference from local usage is working with Spark DataFrames instead of pandas DataFrames. Instantiate the `NixtlaClient` class to interact with Nixtla's API: ```python theme={null} from nixtla import NixtlaClient nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' ) ``` You can use any method from the `NixtlaClient`, such as `forecast` or `cross_validation`. ```python theme={null} fcst_df = nixtla_client.forecast(spark_df, h=12) fcst_df.show(5) ``` When using Azure AI endpoints, specify `model="azureai"`: ```python theme={null} nixtla_client.forecast( spark_df, h=12, model="azureai" ) ``` The public API supports two models: `timegpt-1` (default) and `timegpt-1-long-horizon`. For long horizon forecasting, see the [long-horizon model tutorial](/docs/forecasting/model-version/longhorizon_model). ```python theme={null} cv_df = nixtla_client.cross_validation( spark_df, h=12, n_windows=5, step_size=2 ) cv_df.show(5) ``` ### Step 5: Stop Spark After completing your tasks, stop the Spark session to free resources: ```python theme={null} spark.stop() ``` ## Working with Exogenous Variables TimeGPT with Spark also supports exogenous variables. Refer to the [Exogenous Variables Tutorial](/docs/forecasting/exogenous-variables/numeric_features) for details. 
Simply substitute pandas DataFrames with Spark DataFrames—the API remains identical. ## Related Resources Explore more distributed forecasting options: * [Distributed Computing Overview](/docs/forecasting/forecasting-at-scale/computing_at_scale) - Compare Spark, Dask, and Ray * [Dask Integration](/docs/forecasting/forecasting-at-scale/dask) - For datasets with 10M-100M observations * [Ray Integration](/docs/forecasting/forecasting-at-scale/ray) - For ML pipeline integration * [Fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) - Improve accuracy at scale * [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Validate distributed forecasts # Improve Forecast Accuracy with TimeGPT Source: https://nixtla.io/docs/forecasting/improve_accuracy Advanced techniques to enhance TimeGPT forecast accuracy for energy and electricity. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/22_how_to_improve_forecast_accuracy.ipynb) # Improve Forecast Accuracy with TimeGPT This guide demonstrates how to improve forecast accuracy using TimeGPT. We use hourly electricity price data from Germany as an illustrative example. Before you begin, make sure you have initialized the `NixtlaClient` object with your API key. ## Forecasting Results Overview Below is a summary of our experiments and the corresponding accuracy improvements. We progressively refine forecasts by adding fine-tuning steps, adjusting loss functions, increasing the number of fine-tuned parameters, incorporating exogenous variables, and switching to a long-horizon model. 
| Steps | Description | MAE | MAE Improvement (%) | RMSE | RMSE Improvement (%) | | ----- | ---------------------------- | ---- | ------------------- | ---- | -------------------- | | 0 | Zero-Shot TimeGPT | 18.5 | N/A | 20.0 | N/A | | 1 | Add Fine-Tuning Steps | 11.5 | 38% | 12.6 | 37% | | 2 | Adjust Fine-Tuning Loss | 9.6 | 48% | 11.0 | 45% | | 3 | Fine-tune More Parameters | 9.0 | 51% | 11.3 | 44% | | 4 | Add Exogenous Variables | 4.6 | 75% | 6.4 | 68% | | 5 | Switch to Long-Horizon Model | 6.4 | 65% | 7.7 | 62% | *** ## Step-by-Step Guide ### Step 1: Install and Import Packages Make sure all necessary libraries are installed and imported. Then set up the Nixtla client (replace with your actual API key). ```python theme={null} import numpy as np import pandas as pd from utilsforecast.evaluation import evaluate from utilsforecast.plotting import plot_series from utilsforecast.losses import mae, rmse from nixtla import NixtlaClient nixtla_client = NixtlaClient( # api_key='my_api_key_provided_by_nixtla' ) ``` ### Step 2: Load the Dataset We use hourly electricity price data from Germany (`unique_id == "DE"`). The final two days (`48` data points) form the test set. 
```python theme={null}
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df['ds'] = pd.to_datetime(df['ds'])

df_sub = df.query('unique_id == "DE"')

df_train = df_sub.query('ds < "2017-12-29"')
df_test = df_sub.query('ds >= "2017-12-29"')

df_train.shape, df_test.shape
```

```bash Dataset Shape Output theme={null}
((1632, 12), (48, 12))
```

![Electricity Price Over Time](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-11-output-1.png)

### Step 3: Benchmark Forecast with TimeGPT

**Info:** We first generate a zero-shot forecast using TimeGPT, which captures overall trends but may struggle with short-term fluctuations.

```python theme={null}
fcst_timegpt = nixtla_client.forecast(
    df=df_train[['unique_id', 'ds', 'y']],
    h=2*24,
    target_col='y',
    level=[90, 95]
)
```

```bash Forecast Logs theme={null}
[INFO logs here...]
```

#### Evaluation Metrics

| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE         | mae    | 18.519  |
| DE         | rmse   | 20.038  |

![TimeGPT Forecast for Germany](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-15-output-1.png)

### Step 4: Methods to Enhance Forecasting Accuracy

Use the following strategies to refine and improve your forecast:

#### 4.1 Add Fine-tuning Steps

Further fine-tuning typically reduces forecasting errors by adjusting the internal weights of the TimeGPT model, allowing it to better adapt to your specific data.
```python theme={null}
fcst_df = nixtla_client.forecast(
    df=df_train[['unique_id', 'ds', 'y']],
    h=24*2,
    finetune_steps=30,
    level=[90, 95]
)
```

![TimeGPT Forecast for Germany](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-18-output-1.png)

Evaluation result:

| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE         | mae    | 11.458  |
| DE         | rmse   | 12.643  |

#### 4.2 Fine-tune Using Different Loss Functions

Trying different loss functions (e.g., `MAE`, `MSE`) can yield better results for specific use cases.

```python theme={null}
fcst_df = nixtla_client.forecast(
    df=df_train[['unique_id', 'ds', 'y']],
    h=24*2,
    finetune_steps=30,
    finetune_loss='mae',
    level=[90, 95]
)
```

![TimeGPT Forecast for Germany](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-21-output-1.png)

Evaluation result:

| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE         | mae    | 9.641   |
| DE         | rmse   | 10.956  |

#### 4.3 Adjust Number of Fine-tuned Parameters

The `finetune_depth` parameter controls how many model layers are fine-tuned. It ranges from 1 (few parameters) to 5 (more parameters).

```python theme={null}
fcst_df = nixtla_client.forecast(
    df=df_train[['unique_id', 'ds', 'y']],
    h=24*2,
    finetune_steps=30,
    finetune_depth=2,
    finetune_loss='mae',
    level=[90, 95]
)
```

![TimeGPT Forecast for Germany](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-24-output-1.png)

Evaluation result:

| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE         | mae    | 9.002   |
| DE         | rmse   | 11.348  |

#### 4.4 Forecast with Exogenous Variables

Incorporate external data (e.g., weather conditions) to boost predictive performance.
```python theme={null}
# Build the future exogenous variables from the test set (drop the target)
future_ex_vars_df = df_test.drop(columns=['y'])
future_ex_vars_df.head()

# Forecast with historical and future exogenous variables
fcst_df = nixtla_client.forecast(df=df_train, X_df=future_ex_vars_df, h=24*2, level=[90, 95])
```

![TimeGPT Forecast for Germany](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-29-output-1.png)

Evaluation result:

| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE         | mae    | 4.603   |
| DE         | rmse   | 6.359   |

#### 4.5 Use a Long-Horizon Model

For longer forecasting periods, models optimized for multi-step predictions tend to perform better. You can enable this by setting the `model` parameter to `timegpt-1-long-horizon`.

```python theme={null}
fcst_df = nixtla_client.forecast(
    df=df_train[['unique_id', 'ds', 'y']],
    h=24*2,
    model='timegpt-1-long-horizon',
    level=[90, 95]
)
```

![TimeGPT Forecast for Germany](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/22_how_to_improve_forecast_accuracy_files/figure-markdown_strict/cell-32-output-1.png)

Evaluation result:

| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE         | mae    | 6.366   |
| DE         | rmse   | 7.738   |

### Step 5: Conclusion and Next Steps

Key takeaways: in this example, each of the following strategies improved accuracy over the zero-shot baseline. We recommend systematically experimenting with each approach to find the best combination for your data.

* Increase the number of fine-tuning steps.
* Experiment with different loss functions.
* Incorporate exogenous data.
* Switch to the long-horizon model for extended forecasting periods.

**Success:** Small refinements, such as adding exogenous data or adjusting fine-tuning parameters, can significantly improve your forecasting results.
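The improvement percentages reported in this guide's summary tables are relative error reductions against the zero-shot baseline. The following is a minimal sketch of that arithmetic, using the rounded MAE values from the tables. The helper name `pct_improvement` is an assumption made for illustration and is not part of the `nixtla` library.

```python
def pct_improvement(baseline: float, new: float) -> float:
    """Relative error reduction versus the baseline, in percent."""
    return (baseline - new) / baseline * 100


# Rounded MAE values taken from the results table in this guide.
baseline_mae = 18.5  # zero-shot TimeGPT
mae_by_step = {
    "Add Fine-Tuning Steps": 11.5,
    "Adjust Fine-Tuning Loss": 9.6,
    "Fine-tune More Parameters": 9.0,
    "Add Exogenous Variables": 4.6,
    "Switch to Long-Horizon Model": 6.4,
}

# Percent improvement of each experiment relative to the zero-shot run.
improvements = {
    step: round(pct_improvement(baseline_mae, value))
    for step, value in mae_by_step.items()
}

for step, pct in improvements.items():
    print(f"{step}: {pct}%")
```

Each percentage is measured against the same zero-shot baseline, so the rows are best read as separate experiments to compare rather than as strictly cumulative gains.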
*** ## Result Summary | Steps | Description | MAE | MAE Improvement (%) | RMSE | RMSE Improvement (%) | | ----- | ---------------------------- | ---- | ------------------- | ---- | -------------------- | | 0 | Zero-Shot TimeGPT | 18.5 | N/A | 20.0 | N/A | | 1 | Add Fine-Tuning Steps | 11.5 | 38% | 12.6 | 37% | | 2 | Adjust Fine-Tuning Loss | 9.6 | 48% | 11.0 | 45% | | 3 | Fine-tune More Parameters | 9.0 | 51% | 11.3 | 44% | | 4 | Add Exogenous Variables | 4.6 | 75% | 6.4 | 68% | | 5 | Switch to Long-Horizon Model | 6.4 | 65% | 7.7 | 62% | # Long-Horizon Forecasting with TimeGPT Source: https://nixtla.io/docs/forecasting/model-version/longhorizon_model Master long-horizon time series forecasting in Python using TimeGPT. Learn to predict 2+ seasonal periods ahead with confidence intervals and uncertainty quantification. ## What is Long-Horizon Forecasting? Long-horizon forecasting refers to predictions far into the future, typically exceeding two seasonal periods. For example, forecasting electricity demand 3 months ahead for hourly data, or predicting sales 2 years ahead for monthly data. The exact threshold depends on data frequency. The further you forecast, the more uncertainty you face. The key challenge with long-horizon forecasting is that these predictions extend so far into the future that they may be influenced by unforeseen factors not present in the initial dataset. This means long-horizon forecasts generally involve greater risk and uncertainty compared to short-term predictions. To address these unique challenges, Nixtla provides the specialized `timegpt-1-long-horizon` model in TimeGPT. You can access this model by simply specifying `model="timegpt-1-long-horizon"` when calling `nixtla_client.forecast`. 
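The two-seasonal-period rule of thumb described above can be expressed as a small helper. This is only a sketch: `pick_model` and its `season_length` argument are hypothetical names, not part of the `nixtla` client, and the threshold is the heuristic stated in this guide rather than a hard limit of the API.

```python
def pick_model(h: int, season_length: int) -> str:
    """Return a TimeGPT model name based on the horizon-to-season ratio.

    Hypothetical helper: applies the guide's rule of thumb that a horizon
    longer than two seasonal periods counts as long-horizon.
    """
    if h <= 0 or season_length <= 0:
        raise ValueError("h and season_length must be positive integers")
    if h > 2 * season_length:
        return "timegpt-1-long-horizon"
    return "timegpt-1"


# Hourly data with a daily cycle (24 steps per season).
print(pick_model(h=24, season_length=24))  # one day ahead
print(pick_model(h=96, season_length=24))  # four days ahead
```

The returned string can be passed as the `model` argument of `nixtla_client.forecast`.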
## When to Use Long-Horizon Forecasting Long-horizon forecasting is ideal for: * **Supply chain planning**: Predict inventory needs 3-6 months ahead * **Financial forecasting**: Model quarterly or annual revenue projections * **Energy demand**: Forecast power consumption weeks or months in advance * **Climate modeling**: Predict seasonal weather patterns Use the `timegpt-1-long-horizon` model when your forecast horizon exceeds two complete seasonal cycles in your data. ## How to Use the Long-Horizon Model [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/04_longhorizon.ipynb) ### Step 1: Import Packages Start by installing and importing the required packages, then initialize the Nixtla client: ```python theme={null} from nixtla import NixtlaClient from datasetsforecast.long_horizon import LongHorizon from utilsforecast.losses import mae nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' # defaults to os.environ.get("NIXTLA_API_KEY") ) ``` ### Step 2: Load the Data We'll demonstrate long-horizon forecasting using the ETTh1 dataset, which measures oil temperatures and load variations on an electricity transformer in China. 
Here, we only forecast oil temperatures (`y`):

```python theme={null}
Y_df, *_ = LongHorizon.load(directory='./', group='ETTh1')
Y_df.head()
```

|   | unique\_id | ds                  | y        |
| - | ---------- | ------------------- | -------- |
| 0 | OT         | 2016-07-01 00:00:00 | 1.460552 |
| 1 | OT         | 2016-07-01 01:00:00 | 1.161527 |
| 2 | OT         | 2016-07-01 02:00:00 | 1.161527 |
| 3 | OT         | 2016-07-01 03:00:00 | 0.862611 |
| 4 | OT         | 2016-07-01 04:00:00 | 0.525227 |

We'll set our horizon to 96 timestamps (4 days) for testing and use the previous 42 days as input to the model:

```python theme={null}
test = Y_df[-96:]             # 96 timestamps (4 days × 24 hours/day)
input_seq = Y_df[-1104:-96]   # 1008 timestamps (42 days × 24 hours/day)
```

### Step 3: Forecasting with the Long-Horizon Model

TimeGPT's `timegpt-1-long-horizon` model is optimized for predictions far into the future. Specify it like so:

```python theme={null}
fcst_df = nixtla_client.forecast(
    df=input_seq,
    h=96,
    level=[90],
    finetune_steps=10,
    finetune_loss='mae',
    model='timegpt-1-long-horizon',
    time_col='ds',
    target_col='y'
)
```

Next, plot the forecast along with the 90% prediction intervals:

```python theme={null}
nixtla_client.plot(
    Y_df[-168:],
    fcst_df,
    models=['TimeGPT'],
    level=[90],
    time_col='ds',
    target_col='y'
)
```

![Long-horizon time series forecast showing predicted oil temperature with 90% prediction intervals over 96 hours](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/04_longhorizon_files/figure-markdown_strict/cell-14-output-1.png)

### Step 4: Evaluation

Finally, assess forecast performance using Mean Absolute Error (MAE):

```python theme={null}
test = test.copy()
test.loc[:, 'TimeGPT'] = fcst_df['TimeGPT'].values

evaluation = mae(
    test,
    models=['TimeGPT'],
    id_col='unique_id',
    target_col='y'
)
```

Evaluation result:

| unique\_id | TimeGPT  |
| ---------- | -------- |
| OT         | 0.145393 |

The model achieves an MAE of approximately 0.145, indicating strong performance for these longer-range
forecasts. ## Frequently Asked Questions **Q: What's the difference between timegpt-1 and timegpt-1-long-horizon?** The `timegpt-1-long-horizon` model is specifically trained for extended forecast horizons (2+ seasonal periods), providing better accuracy for long-range predictions. **Q: How far ahead can I forecast with the long-horizon model?** The optimal horizon depends on your data frequency and patterns. Generally, the model performs well up to 4-6 seasonal cycles ahead. **Q: Can I use exogenous variables with long-horizon forecasting?** Yes, TimeGPT supports exogenous variables for improved long-horizon accuracy. See our [exogenous variables guide](/docs/forecasting/exogenous-variables/numeric_features) for details. ## Related Resources Learn more about TimeGPT capabilities: * [Fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) - Improve accuracy for your specific dataset * [Prediction Intervals](/docs/forecasting/probabilistic/prediction_intervals) - Quantify forecast uncertainty * [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Validate model performance * [Anomaly Detection](/docs/anomaly_detection/historical_anomaly_detection) - Identify unusual patterns in time series # Uncertainty Quantification with TimeGPT Source: https://nixtla.io/docs/forecasting/probabilistic/introduction Learn how to generate quantile forecasts and prediction intervals to capture uncertainty in your forecasts. In time series forecasting, it is important to consider the full probability distribution of the predictions rather than a single point estimate. This provides a more accurate representation of the uncertainty around the forecasts and allows better decision-making. **TimeGPT** supports uncertainty quantification through quantile forecasts and prediction intervals. ## Why Consider the Full Probability Distribution? When you focus on a single point prediction, you lose valuable information about the range of possible outcomes. 
By quantifying uncertainty, you can: * Identify best-case and worst-case scenarios * Improve risk management and contingency planning * Gain confidence in decisions that rely on forecast accuracy ## What You Will Learn Learn how to compute quantile forecasts using **TimeGPT**. Discover how to create prediction intervals with **TimeGPT**. # Prediction Intervals Source: https://nixtla.io/docs/forecasting/probabilistic/prediction_intervals Learn how to create prediction intervals with TimeGPT ## What Are Prediction Intervals? A prediction interval provides a range where a future observation of a time series is expected to fall, with a specific level of probability. For example, a 95% prediction interval means that the true future value is expected to lie within this range 95 times out of 100. Wider intervals reflect greater uncertainty, while narrower intervals indicate higher confidence in the forecast. With TimeGPT, you can easily generate prediction intervals for any confidence level between 0% and 100%. These intervals are constructed using **[conformal prediction](https://en.wikipedia.org/wiki/Conformal_prediction)**, a distribution-free framework for uncertainty quantification. Prediction intervals differ from confidence intervals: * **Prediction Intervals**: Capture the uncertainty in future observations. * **Confidence Intervals**: Quantify the uncertainty in the estimated model parameters (e.g., the mean). As a result, prediction intervals are typically wider, as they account for both model and data variability. ## How to Generate Prediction Intervals [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/forecast/10_prediction_intervals.ipynb) ### Step 1: Import Packages Import the required packages and initialize the Nixtla client. 
```python theme={null} import pandas as pd from nixtla import NixtlaClient nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' # defaults to os.environ.get("NIXTLA_API_KEY") ) ``` ### Step 2: Load Data In this tutorial, we will use the Air Passengers dataset. ```python theme={null} df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv') df.head() ``` | | timestamp | value | | - | ---------- | ----- | | 0 | 1949-01-01 | 112 | | 1 | 1949-02-01 | 118 | | 2 | 1949-03-01 | 132 | | 3 | 1949-04-01 | 129 | | 4 | 1949-05-01 | 121 | ### Step 3: Forecast with Prediction Intervals To generate prediction intervals with TimeGPT, provide a list of desired confidence levels using the `level` argument. Note that accepted values are between 0 and 100. * Higher confidence levels provide more certainty that the true value will be captured, but result in wider, less precise intervals. * Lower confidence levels provide less certainty that the true value will be captured, but result in narrower, more precise intervals. ```python theme={null} timegpt_fcst_pred_int_df = nixtla_client.forecast( df=df, h=12, level=[80, 90, 99], time_col='timestamp', target_col='value', ) timegpt_fcst_pred_int_df.head() ``` | timestamp | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-hi-99 | TimeGPT-lo-80 | TimeGPT-lo-90 | TimeGPT-lo-99 | | ---------- | ------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | | 1961-01-01 | 437.84 | 443.69 | 451.89 | 459.28 | 431.99 | 423.78 | 416.40 | | 1961-02-01 | 426.06 | 439.42 | 444.43 | 448.94 | 412.70 | 407.70 | 403.19 | | 1961-03-01 | 463.12 | 488.83 | 495.92 | 502.31 | 437.41 | 430.31 | 423.93 | | 1961-04-01 | 478.24 | 507.77 | 509.72 | 511.47 | 448.72 | 446.77 | 445.02 | | 1961-05-01 | 505.65 | 532.89 | 539.32 | 545.12 | 478.41 | 471.97 | 466.18 | You can visualize the prediction intervals using the `plot` method. 
To do so, specify the confidence levels to display using the `level` argument.

```python theme={null}
nixtla_client.plot(
    df,
    timegpt_fcst_pred_int_df,
    time_col='timestamp',
    target_col='value',
    level=[80, 90, 99]
)
```

### Step 4: Historical Forecast

You can also generate prediction intervals for historical forecasts by setting `add_history=True`.

```python theme={null}
timegpt_fcst_pred_int_historical_df = nixtla_client.forecast(
    df=df,
    h=12,
    level=[80, 90],
    time_col='timestamp',
    target_col='value',
    add_history=True,
)
timegpt_fcst_pred_int_historical_df.head()
```

Plot the prediction intervals for the historical forecasts. The `level` values must match the levels requested in the forecast call.

```python theme={null}
nixtla_client.plot(
    df,
    timegpt_fcst_pred_int_historical_df,
    time_col='timestamp',
    target_col='value',
    level=[80, 90]
)
```

### Step 5: Cross-Validation

You can use the `cross_validation` method to generate prediction intervals for each time window.

```python theme={null}
cv_df = nixtla_client.cross_validation(
    df=df,
    h=12,
    n_windows=4,
    level=[80, 90, 99],
    time_col='timestamp',
    target_col='value'
)
cv_df.head()
```

After computing the forecasts, you can visualize the results for each cross-validation cutoff to better understand model performance over time.

```python theme={null}
from IPython.display import display

cutoffs = cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.tail(100),
        cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'value']),
        level=[80, 90, 99],
        time_col='timestamp',
        target_col='value',
    )
    display(fig)
```

Congratulations! You have successfully generated prediction intervals using TimeGPT. You also visualized historical forecasts with intervals and inspected the intervals across multiple time windows using cross-validation.

# Quantile Forecasts

Source: https://nixtla.io/docs/forecasting/probabilistic/quantiles

Learn how to generate quantile forecasts with TimeGPT

## What Are Quantile Forecasts?
Quantile forecasts correspond to specific percentiles of the forecast distribution and provide a more complete representation of the range of possible outcomes. * The 0.5 quantile (or 50th percentile) is the median forecast, meaning there is a 50% chance that the actual value will fall below or above this point. * The 0.1 quantile (or 10th percentile) forecast represents a value that the actual observation is expected to fall below 10% of the time. * The 0.9 quantile (or 90th percentile) forecast represents a value that the actual observation is expected to fall below 90% of the time. TimeGPT supports quantile forecasts. In this tutorial, we will show you how to generate them. ## Why Use Quantile Forecasts * Quantile forecasts can provide information about best and worst-case scenarios, allowing you to make better decisions under uncertainty. * In many real-world scenarios, being wrong in one direction is more costly than being wrong in the other. Quantile forecasts allow you to focus on the specific percentiles that matter most for your particular use case. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts.ipynb) ## How to Generate Quantile Forecasts ### Step 1: Import Packages Import the required packages and initialize a Nixtla client to connect with TimeGPT. ```python theme={null} import pandas as pd from nixtla import NixtlaClient from IPython.display import display nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' # Defaults to os.environ.get("NIXTLA_API_KEY") ) ``` ### Step 2: Load Data In this tutorial, we will use the Air Passengers dataset. 
```python theme={null} df = pd.read_csv( 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv' ) df.head() ``` | | timestamp | value | | - | ---------- | ----- | | 0 | 1949-01-01 | 112 | | 1 | 1949-02-01 | 118 | | 2 | 1949-03-01 | 132 | | 3 | 1949-04-01 | 129 | | 4 | 1949-05-01 | 121 | ### Step 3: Forecast with Quantiles To specify the desired quantiles, you need to pass a list of quantiles to the `quantiles` parameter. Choose quantiles between 0 and 1 based on your uncertainty analysis needs. ```python theme={null} quantiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] timegpt_quantile_fcst_df = nixtla_client.forecast( df=df, h=12, quantiles=quantiles, time_col='timestamp', target_col='value' ) timegpt_quantile_fcst_df.head() ``` | timestamp | TimeGPT | TimeGPT-q-10 | TimeGPT-q-20 | TimeGPT-q-30 | TimeGPT-q-40 | TimeGPT-q-50 | TimeGPT-q-60 | TimeGPT-q-70 | TimeGPT-q-80 | TimeGPT-q-90 | | ---------- | ------- | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | | 1961-01-01 | 437.84 | 431.99 | 435.04 | 435.38 | 436.40 | 437.84 | 439.27 | 440.29 | 440.63 | 443.69 | | 1961-02-01 | 426.06 | 412.70 | 414.83 | 416.04 | 421.72 | 426.06 | 430.41 | 436.08 | 437.29 | 439.42 | | 1961-03-01 | 463.12 | 437.41 | 444.23 | 446.42 | 450.71 | 463.12 | 475.53 | 479.81 | 482.00 | 488.82 | | 1961-04-01 | 478.24 | 448.72 | 455.43 | 465.57 | 469.88 | 478.24 | 486.61 | 490.92 | 501.06 | 507.76 | | 1961-05-01 | 505.65 | 478.41 | 493.16 | 497.99 | 499.14 | 505.65 | 512.15 | 513.30 | 518.14 | 532.89 | TimeGPT returns multiple columns in the forecast output: * Each requested quantile gets its own column named in the format `TimeGPT-q-...` * The `TimeGPT` column shows the mean forecast * The mean forecast (`TimeGPT`) is identical to the 0.5 quantile (`TimeGPT-q-50`) ### Step 4: Plot the Quantile Forecasts To plot the quantile forecasts, you can use the 
`plot` method. ```python theme={null} nixtla_client.plot( df, timegpt_quantile_fcst_df, time_col='timestamp', target_col='value' ) ``` The plot displays: * The actual time series data in blue. * Multiple forecast intervals represented by different quantiles: * The 0.5 quantile (50th percentile) represents the median forecast. * The 0.1 and 0.9 quantiles (10th and 90th percentiles) show the outer bounds of the forecast. * Additional quantiles (0.2, 0.3, 0.4, 0.6, 0.7, 0.8) are shown in between, creating a gradient of uncertainty. This type of visualization is particularly useful because it: * Shows the full distribution of possible outcomes rather than just a single point forecast. * Helps identify best and worst-case scenarios. * Allows decision-makers to understand the range of uncertainty in the predictions. ### Step 5: Historical Forecast You can also use quantile forecasts to forecast historical data by setting the `add_history` parameter to `True`. ```python theme={null} timegpt_quantile_fcst_df = nixtla_client.forecast( df=df, h=12, quantiles=quantiles, time_col='timestamp', target_col='value', add_history=True, # Add historical data to the forecast ) nixtla_client.plot( df, timegpt_quantile_fcst_df, time_col='timestamp', target_col='value' ) ``` The plot now includes quantile forecasts for the historical data. This allows you to evaluate how well the quantile forecasts capture the true variability and identify any systematic bias. ### Step 6: Cross-Validation To evaluate the performance of the quantile forecasts across multiple time windows, you can use the `cross_validation` method. ```python theme={null} cv_df = nixtla_client.cross_validation( df=df, h=12, n_windows=4, quantiles=quantiles, time_col='timestamp', target_col='value' ) ``` After computing the forecasts, you can visualize the results for each cross-validation cutoff to better understand model performance over time. 
```python theme={null}
cutoffs = cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.tail(100),
        cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'value']),
        time_col='timestamp',
        target_col='value'
    )
    display(fig)
```

Each plot shows a different cross-validation window (or cutoff) for the time series. This allows you to evaluate how well the predicted intervals capture the true values across multiple, independent forecast windows.

Congratulations! You have successfully generated quantile forecasts using TimeGPT. You also visualized historical quantile predictions and evaluated their performance through cross-validation.

# Bounded Forecasts

Source: https://nixtla.io/docs/forecasting/special-topics/bounded_forecasts

Learn how to generate forecasts with upper and lower bounds to match your business constraints.

## Why Generate Bounded Forecasts?

In forecasting, we often want to make sure the predictions stay within a certain range. For example, when predicting the sales of a product, we may require all forecasts to be positive. In such cases, the forecasts need to be bounded.

This tutorial shows how to generate bounded forecasts with TimeGPT by transforming data prior to forecasting.

## Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/13_bounded_forecasts.ipynb)

### Step 1: Import Packages

First, import the required packages.

```python theme={null}
import pandas as pd
import numpy as np
from nixtla import NixtlaClient
```

Next, initialize your Nixtla client with the API key:

```python theme={null}
nixtla_client = NixtlaClient(
    # defaults to os.environ.get("NIXTLA_API_KEY")
    api_key='my_api_key_provided_by_nixtla'
)
```

### Step 2: Load Data

We use the [annual egg prices](https://github.com/robjhyndman/fpp3package/tree/master/data) dataset from [Forecasting: Principles and Practice](https://otexts.com/fpp3/).
We expect egg prices to be strictly positive, so we want to bound our forecasts to be positive. > NOTE: If you do not have `pyreadr`, you can install it with `pip`: ```shell theme={null} pip install pyreadr ``` ```python theme={null} import pyreadr from pathlib import Path url = 'https://github.com/robjhyndman/fpp3package/raw/master/data/prices.rda' dst_path = str(Path.cwd().joinpath('prices.rda')) result = pyreadr.read_r(pyreadr.download_file(url, dst_path), dst_path) df = result['prices'][['year', 'eggs']] df = df.dropna().reset_index(drop=True) df = df.rename(columns={'year': 'ds', 'eggs': 'y'}) df['ds'] = pd.to_datetime(df['ds'], format='%Y') df['unique_id'] = 'eggs' df.tail(10) ``` | | **ds** | **y** | **unique\_id** | | -- | ---------- | ------ | -------------- | | 84 | 1984-01-01 | 100.58 | eggs | | 85 | 1985-01-01 | 76.84 | eggs | | 86 | 1986-01-01 | 81.10 | eggs | | 87 | 1987-01-01 | 69.60 | eggs | | 88 | 1988-01-01 | 64.55 | eggs | | 89 | 1989-01-01 | 80.36 | eggs | | 90 | 1990-01-01 | 79.79 | eggs | | 91 | 1991-01-01 | 74.79 | eggs | | 92 | 1992-01-01 | 64.86 | eggs | | 93 | 1993-01-01 | 62.27 | eggs | We can have a look at how the prices have evolved in the 20th century, demonstrating that the price is trending down. ```python theme={null} nixtla_client.plot(df) ``` ![Annual Egg Prices Trend](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-12-output-1.png) ### Step 3: Generate Bounded Forecasts with TimeGPT First, we transform the target data. In this case, we will log-transform the data prior to forecasting, such that we can only forecast positive prices. ```python theme={null} df_transformed = df.copy() df_transformed['y'] = np.log(df_transformed['y']) ``` We will create forecasts for the next 10 years, and we include an 80, 90 and 99.5 percentile of our forecast distribution. 
```python theme={null} timegpt_fcst_with_transform = nixtla_client.forecast( df=df_transformed, h=10, freq='Y', level=[80, 90, 99.5] ) ``` After having created the forecasts, we need to inverse the transformation that we applied earlier. With a log-transformation, this simply means we need to exponentiate the forecasts: ```python theme={null} cols_to_transform = [ col for col in timegpt_fcst_with_transform if col not in ['unique_id', 'ds'] ] for col in cols_to_transform: timegpt_fcst_with_transform[col] = np.exp(timegpt_fcst_with_transform[col]) ``` Now, we can plot the forecasts. We include a number of prediction intervals, indicating the 80, 90 and 99.5 percentile of our forecast distribution. ```python theme={null} nixtla_client.plot( df, timegpt_fcst_with_transform, level=[80, 90, 99.5], max_insample_length=20 ) ``` ![Bounded Forecasts with Log Transformation](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-16-output-1.png) The forecast and the prediction intervals look reasonable. ### Step 4: Compare with Unbounded Forecast Let's compare these forecasts to the situation where we don't apply a transformation. In this case, it may be possible to forecast a negative price. 
```python
timegpt_fcst_without_transform = nixtla_client.forecast(
    df=df,
    h=10,
    freq='Y',
    level=[80, 90, 99.5]
)
```

Indeed, we now observe prediction intervals that become negative:

```python
nixtla_client.plot(
    df,
    timegpt_fcst_without_transform,
    level=[80, 90, 99.5],
    max_insample_length=20
)
```

![Unbounded Forecast with Possible Negative Intervals](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/13_bounded_forecasts_files/figure-markdown_strict/cell-18-output-1.png)

For example, in 1995 the lower bounds of the 90 and 99.5 intervals are negative:

```python
timegpt_fcst_without_transform
```

|    | unique\_id |         ds |   TimeGPT | TimeGPT-lo-99.5 | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-hi-99.5 |
| -: | ---------: | ---------: | --------: | --------------: | ------------: | ------------: | ------------: | ------------: | --------------: |
|  0 |       eggs | 1994-01-01 | 66.859756 |       43.103240 |     46.131448 |     49.319034 |     84.400479 |     87.588065 |       90.616273 |
|  1 |       eggs | 1995-01-01 | 64.993477 |      -20.924112 |     -4.750041 |     12.275298 |    117.711656 |    134.736995 |      150.911066 |
|  2 |       eggs | 1996-01-01 | 66.695808 |        6.499170 |      8.291150 |     10.177444 |    123.214173 |    125.100467 |      126.892446 |
|  3 |       eggs | 1997-01-01 | 66.103325 |       17.304282 |     24.966939 |     33.032894 |     99.173756 |    107.239711 |      114.902368 |
|  4 |       eggs | 1998-01-01 | 67.906517 |        4.995371 |     12.349648 |     20.090992 |    115.722042 |    123.463386 |      130.817663 |
|  5 |       eggs | 1999-01-01 | 66.147575 |       29.162207 |     31.804460 |     34.585779 |     97.709372 |    100.490691 |      103.132943 |
|  6 |       eggs | 2000-01-01 | 66.062637 |       14.671932 |     19.305822 |     24.183601 |    107.941673 |    112.819453 |      117.453343 |
|  7 |       eggs | 2001-01-01 | 68.045769 |        3.915282 |     13.188964 |     22.950736 |    113.140802 |    122.902573 |      132.176256 |
|  8 |       eggs | 2002-01-01 | 66.718903 |      -42.212631 |    -30.583703 |    -18.342726 |    151.780531 |    164.021508 |      175.650436 |
|  9 |       eggs | 2003-01-01 | 67.344078 |      -86.239911 |    -44.959745 |     -1.506939 |    136.195095 |    179.647901 |      220.928067 |

## Conclusion

Log-transformations are a simple and effective way to keep predictions positive: the model forecasts on the log scale, and exponentiating the result can never produce a negative value. This tutorial demonstrated how to apply this transformation around TimeGPT to obtain bounded forecasts that are more realistic.

## References

* [**Hyndman, Rob J., and George Athanasopoulos (2021). Forecasting: Principles and Practice (3rd Ed)**](https://otexts.com/fpp3/)

# Hierarchical Forecasting

Source: https://nixtla.io/docs/forecasting/special-topics/hierarchical_forecasting

Learn how to use TimeGPT for hierarchical forecasting across multiple levels.

## What is Hierarchical Forecasting?

Hierarchical forecasting involves generating forecasts for multiple time series that share a hierarchical structure (e.g., product demand by category, department, or region). The goal is to ensure that forecasts are coherent across each level of the hierarchy.

Hierarchical forecasting can be particularly important when you need to generate forecasts at different granularities (e.g., country, state, region) and ensure they align with each other and aggregate correctly at higher levels.

Using TimeGPT, you can create forecasts for multiple related time series and then apply hierarchical forecasting methods from [HierarchicalForecast](https://nixtlaverse.nixtla.io/hierarchicalforecast/index.html) to reconcile those forecasts across your specified hierarchy.

## Why use Hierarchical Forecasting?

* Ensures consistency: Forecasts at lower levels add up to higher-level forecasts.
* Improves accuracy: Reconciliation methods often yield more robust predictions.
* Facilitates deeper insights: Understand how smaller segments contribute to overall trends.

## Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/14_hierarchical_forecasting.ipynb)

### Step 1: Install, Import and Initialize

Start by installing the required packages.
```shell
pip install nixtla
pip install hierarchicalforecast
```

Next, initialize the TimeGPT NixtlaClient.

```python
import pandas as pd
import numpy as np
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)
```

### Step 2: Load and Prepare Data

This tutorial uses the Australian Tourism dataset from [Forecasting: Principles and Practice](https://otexts.com/fpp3/). The dataset contains different levels of hierarchical data, from the entire country of Australia down to individual regions.

*Figure: Map of Australia color coded by state, showing the hierarchical structure.*

The dataset provides only the lowest-level series, so higher-level series need to be aggregated explicitly.

Let's load and preprocess the dataset.

```python
Y_df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv'
)
Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
Y_df.insert(0, 'Country', 'Australia')
Y_df = Y_df[['Country', 'Region', 'State', 'Purpose', 'ds', 'y']]
# Turn '1998 Q1' into '1998-Q1' so pandas can parse it as a quarter
Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
Y_df['ds'] = pd.to_datetime(Y_df['ds'])

Y_df.head(10)
```

| Country   | Region   | State           | Purpose  | ds         | y          |
| --------- | -------- | --------------- | -------- | ---------- | ---------- |
| Australia | Adelaide | South Australia | Business | 1998-01-01 | 135.077690 |
| Australia | Adelaide | South Australia | Business | 1998-04-01 | 109.987316 |
| Australia | Adelaide | South Australia | Business | 1998-07-01 | 166.034687 |
| Australia | Adelaide | South Australia | Business | 1998-10-01 | 127.160464 |
| Australia | Adelaide | South Australia | Business | 1999-01-01 | 137.448533 |
| Australia | Adelaide | South Australia | Business | 1999-04-01 | 199.912586 |
| Australia | Adelaide | South Australia | Business | 1999-07-01 | 169.355090 |
| Australia | Adelaide | South Australia | Business | 1999-10-01 | 134.357937 |
| Australia | Adelaide | South Australia | Business | 2000-01-01 | 154.034398 |
| Australia | Adelaide | South Australia | Business | 2000-04-01 | 168.776364 |

We define the dataset hierarchies explicitly. Each entry in the list describes one level of aggregation, given as the columns to group by:

```python
spec = [
    ['Country'],
    ['Country', 'State'],
    ['Country', 'Purpose'],
    ['Country', 'State', 'Region'],
    ['Country', 'State', 'Purpose'],
    ['Country', 'State', 'Region', 'Purpose']
]
```

Then, use `aggregate` from `HierarchicalForecast` to generate the aggregated series:

```python
from hierarchicalforecast.utils import aggregate

Y_df, S_df, tags = aggregate(Y_df, spec)

Y_df.head(10)
```

| unique\_id | ds         | y            |
| ---------- | ---------- | ------------ |
| Australia  | 1998-01-01 | 23182.197269 |
| Australia  | 1998-04-01 | 20323.380067 |
| Australia  | 1998-07-01 | 19826.640511 |
| Australia  | 1998-10-01 | 20830.129891 |
| Australia  | 1999-01-01 | 22087.353380 |
| Australia  | 1999-04-01 | 21458.373285 |
| Australia  | 1999-07-01 | 19914.192508 |
| Australia  | 1999-10-01 | 20027.925640 |
| Australia  | 2000-01-01 | 22339.294779 |
| Australia  | 2000-04-01 | 19941.063482 |

Next, create the train/test splits. Here, we use the last two years (eight quarters) of data for testing:

```python
Y_test_df = Y_df.groupby('unique_id').tail(8)
Y_train_df = Y_df.drop(Y_test_df.index)
```

### Step 3: Hierarchical Forecasting Using TimeGPT

Now we'll generate base forecasts across all series using TimeGPT and then apply hierarchical reconciliation to ensure the forecasts align across each level.

#### Generate Base Forecasts with TimeGPT

Obtain forecasts with TimeGPT for all series in your training data.
```python
timegpt_fcst = nixtla_client.forecast(
    df=Y_train_df,
    h=8,
    freq='QS',
    add_history=True
)
```

Next, separate the generated forecasts into in-sample (historical) and out-of-sample (forecasted) periods:

```python
timegpt_fcst_insample = timegpt_fcst.query("ds < '2016-01-01'")
timegpt_fcst_outsample = timegpt_fcst.query("ds >= '2016-01-01'")
```

#### Visualize TimeGPT Forecasts

Quickly visualize the forecasts for different hierarchy levels. Here, we look at the entire country, the state of Queensland, the Brisbane region, and holidays in Brisbane:

```python
nixtla_client.plot(
    Y_df,
    timegpt_fcst_outsample,
    max_insample_length=4 * 12,
    unique_ids=[
        'Australia',
        'Australia/Queensland',
        'Australia/Queensland/Brisbane',
        'Australia/Queensland/Brisbane/Holiday'
    ]
)
```

#### Apply Hierarchical Reconciliation

We use `MinTrace` methods to reconcile forecasts across all levels of the hierarchy.

> NOTE: The `S` parameter was renamed to `S_df` in `hierarchicalforecast`. Make sure you are using `S_df` when calling `reconcile`.
```python
from hierarchicalforecast.methods import MinTrace
from hierarchicalforecast.core import HierarchicalReconciliation

reconcilers = [
    MinTrace(method='ols'),
    MinTrace(method='mint_shrink')
]

hrec = HierarchicalReconciliation(reconcilers=reconcilers)

# Attach the observed values to the in-sample forecasts
Y_df_with_insample_fcsts = timegpt_fcst_insample.merge(Y_df.copy())

Y_rec_df = hrec.reconcile(
    Y_hat_df=timegpt_fcst_outsample,
    Y_df=Y_df_with_insample_fcsts,
    S_df=S_df,
    tags=tags
)
```

Now, let's plot the reconciled forecasts to ensure they make sense across the full country → state → region → purpose hierarchy:

```python
nixtla_client.plot(
    Y_df,
    Y_rec_df,
    max_insample_length=4 * 12,
    unique_ids=[
        'Australia',
        'Australia/Queensland',
        'Australia/Queensland/Brisbane',
        'Australia/Queensland/Brisbane/Holiday'
    ]
)
```

### Step 4: Evaluate Forecast Accuracy

Finally, evaluate your forecast performance using RMSE for different levels of the hierarchy, from total (country) to bottom-level (region/purpose).
```python theme={null} from hierarchicalforecast.evaluation import evaluate from utilsforecast.losses import rmse eval_tags = { 'Total': tags['Country'], 'Purpose': tags['Country/Purpose'], 'State': tags['Country/State'], 'Regions': tags['Country/State/Region'], 'Bottom': tags['Country/State/Region/Purpose'] } evaluation = evaluate( df=Y_rec_df.merge(Y_test_df, on=['unique_id', 'ds']), tags=eval_tags, train_df=Y_train_df, metrics=[rmse] ) evaluation[evaluation.select_dtypes(np.number).columns] = evaluation.select_dtypes(np.number).map('{:.2f}'.format) evaluation ``` | | level | metric | TimeGPT | TimeGPT/MinTrace\_method-ols | TimeGPT/MinTrace\_method-mint\_shrink | | - | ------- | ------ | ------- | ---------------------------- | ------------------------------------- | | 0 | Total | rmse | 1433.07 | 1436.07 | 1627.43 | | 1 | Purpose | rmse | 482.09 | 475.64 | 507.50 | | 2 | State | rmse | 275.85 | 278.39 | 294.28 | | 3 | Regions | rmse | 49.40 | 47.91 | 47.99 | | 4 | Bottom | rmse | 19.32 | 19.11 | 18.86 | | 5 | Overall | rmse | 38.66 | 38.21 | 39.16 | ## Conclusion We made a small improvement in overall RMSE by reconciling the forecasts with `MinTrace(ols)`, and made them slightly worse using `MinTrace(mint_shrink)`, indicating that the base forecasts were relatively strong already. However, we now have coherent forecasts too - so not only did we make a (small) accuracy improvement, we also got coherency to the hierarchy as a result of our reconciliation step. ## References * [Hyndman, Rob J., and George Athanasopoulos (2021). Forecasting: Principles and Practice](https://otexts.com/fpp3/). # Irregular Timestamps Source: https://nixtla.io/docs/forecasting/special-topics/irregular_timestamps Learn how to work with both regular and irregular timestamps in TimeGPT for accurate forecasting. ## Why Handle Irregular Timestamps? When working with time series data, it is important to specify its frequency correctly, as this can significantly impact forecasting results. 
TimeGPT is designed to automatically infer the frequency of your timestamps. For commonly used frequencies, such as hourly, daily, or monthly, TimeGPT reliably infers the frequency automatically, so no additional input is required. However, for irregular frequencies, where observations are not recorded at consistent or regular intervals, such as the days the U.S. stock market is open, it is necessary to specify the frequency directly.

In this tutorial, we will show you how to handle irregular and custom frequencies in TimeGPT.

> NOTE: TimeGPT requires that your data does not contain missing values, as this is not
> currently supported. In other words, the irregularity of the data should stem
> from the nature of the recorded phenomenon, not from missing observations.
> If your data contains missing values, please refer to our
> [tutorial on missing dates](/docs/data_requirements/missing_values).

## Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/forecast/11_irregular_timestamps.ipynb)

### Step 1: Import Packages

First, we import the required packages and initialize the Nixtla client.

```python
import pandas as pd
import pandas_market_calendars as mcal
from nixtla import NixtlaClient
```

Initialize NixtlaClient with your API key:

```python
nixtla_client = NixtlaClient(
    api_key='my_api_key_provided_by_nixtla'
)
```

### Step 2: Handling Regular Frequencies

As discussed in the introduction, for time series data with regular frequencies, where observations are recorded at consistent intervals, TimeGPT can automatically infer the frequency of your timestamps if the input data is a **pandas DataFrame**.
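To get a feel for what frequency inference looks like on the pandas side, the sketch below uses `pd.infer_freq`. This is pandas' own helper and is shown only as an illustration; whether TimeGPT's inference behaves the same way is an assumption, and the dates are made up for the example.

```python
import pandas as pd

# Regular timestamps: pandas can name the frequency.
daily = pd.date_range("2023-01-01", periods=6, freq="D")
monthly = pd.date_range("2023-01-01", periods=6, freq="MS")
daily_freq = pd.infer_freq(daily)
monthly_freq = pd.infer_freq(monthly)

# Irregular timestamps (a four-day gap between Friday and Tuesday):
# no standard alias fits, so pandas returns None.
irregular = pd.DatetimeIndex(
    ["2023-08-31", "2023-09-01", "2023-09-05", "2023-09-06"]
)
irregular_freq = pd.infer_freq(irregular)

print(daily_freq, monthly_freq, irregular_freq)
```

When inference returns nothing useful, as in the last case, the frequency has to be supplied explicitly.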
If you prefer not to rely on TimeGPT's automatic inference, you can set the `freq` parameter to a valid [pandas frequency string](https://pandas.pydata.org/docs/user_guide/timeseries.html#offset-aliases), such as `MS` for month-start frequency or `min` for minutely frequency.

When working with **Polars DataFrames**, you must specify the frequency explicitly by using a valid [polars offset](https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.offset_by.html), such as `1d` for daily frequency or `1h` for hourly frequency.

Below is an example of how to specify the frequency for a Polars DataFrame. The data is monthly, so the offset is `1mo`.

```python
import polars as pl

url = 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv'

polars_df = pl.read_csv(url, try_parse_dates=True)

fcst_df = nixtla_client.forecast(
    df=polars_df,
    h=12,
    freq='1mo',
    time_col='timestamp',
    target_col='value',
    level=[80, 95]
)
```

Plot the forecast DataFrame:

```python
nixtla_client.plot(
    polars_df,
    fcst_df,
    time_col='timestamp',
    target_col='value',
    level=[80, 95]
)
```

![Air Passengers Forecast](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/forecast/11_irregular_timestamps_files/figure-markdown_strict/cell-11-output-1.png)

### Step 3: Handling Irregular Frequencies

In this section, we will discuss cases where observations are not recorded at consistent intervals.

#### Load data

We will use the daily stock prices of Palantir Technologies (PLTR) from 2020 to 2023. The dataset includes data up to 2023-09-22, but for this tutorial we will keep only the data before 2023-08-28 and exclude everything from that date onward. This allows us to show how a custom frequency can handle days when the stock market is closed, such as Labor Day in the U.S., within the forecast horizon.
> IMPORTANT NOTE: While we are using TimeGPT to predict stock prices in this
> tutorial, this is done only to show the capability of forecasting with
> irregular timestamps. **Stock prices are [`random
> walks`](https://otexts.com/fpppy/nbs/09-arima.html#random-walk-model) and as
> such cannot be predicted using traditional time series forecasting methods
> (including TimeGPT)**. Predictions for a random walk will be a straight-line
> forecast where tomorrow's price is predicted to be equal to today's price,
> which is not a useful model.

```python
url = 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/openbb/pltr.csv'
pltr_df = pd.read_csv(url, parse_dates=['date'])

# Keep only the observations before 2023-08-28
pltr_df = pltr_df.query('date < "2023-08-28"')

pltr_df.head()
```

|    |       date |  Open |  High |  Low | Close | Adj Close |    Volume | Dividends | Stock Splits |
| -: | ---------: | ----: | ----: | ---: | ----: | --------: | --------: | --------: | -----------: |
|  0 | 2020-09-30 | 10.00 | 11.41 | 9.11 |  9.50 |      9.50 | 338584400 |       0.0 |          0.0 |
|  1 | 2020-10-01 |  9.69 | 10.10 | 9.23 |  9.46 |      9.46 | 124297600 |       0.0 |          0.0 |
|  2 | 2020-10-02 |  9.06 |  9.28 | 8.94 |  9.20 |      9.20 |  55018300 |       0.0 |          0.0 |
|  3 | 2020-10-05 |  9.43 |  9.49 | 8.92 |  9.03 |      9.03 |  36316900 |       0.0 |          0.0 |
|  4 | 2020-10-06 |  9.04 | 10.18 | 8.90 |  9.90 |      9.90 |  90864000 |       0.0 |          0.0 |

We will forecast the **adjusted closing price**, which represents the stock's closing price adjusted for corporate actions such as stock splits, dividends, and rights offerings. Hence, we will exclude the other columns from the dataset.
```python theme={null} pltr_df = pltr_df[['date', 'Adj Close']] nixtla_client.plot( pltr_df, time_col="date", target_col="Adj Close" ) ``` ![PLTR Adjusted Close Prices](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/forecast/11_irregular_timestamps_files/figure-markdown_strict/cell-13-output-1.png) #### Define the Frequency To define a custom frequency, we will first extract and sort the dates from the input data, ensuring they are in the correct datetime format. Next, we will use the [`pandas_market_calendars package`](https://pypi.org/project/pandas-market-calendars/), specifically the `get_calendar` method, to obtain the New York Stock Exchange (NYSE) calendar. Using this calendar, we can create a custom frequency that includes only the days the stock market is open. ```python theme={null} dates = pd.DatetimeIndex(sorted(pltr_df['date'].unique())) nyse = mcal.get_calendar('NYSE') ``` Note that the days the stock market is open need to include all the dates in the input data plus the forecast horizon. In this example, we will forecast 7 days ahead, so we need to make sure our trading days include the last date in the input data as well as the next 7 valid trading days. To avoid dealing with holidays or weekends during the forecast horizon, we will specify an end date well beyond the forecast horizon. For this example, we will use January 1, 2024, as a safe cutoff. ```python theme={null} trading_days = nyse.valid_days( start_date=dates.min(), end_date="2024-01-01" ).tz_localize(None) ``` Now, with the list of trading days, we can identify the days the stock market is closed. These are all weekdays (Monday to Friday) within the range that are not trading days. Using this information, we can define a custom frequency that skips the stock market's closed days. 
```python theme={null} all_weekdays = pd.date_range( start=dates.min(), end="2024-01-01", freq='B' ) closed_days = all_weekdays.difference(trading_days) custom_bday = pd.offsets.CustomBusinessDay( holidays=closed_days ) ``` #### Forecast with TimeGPT With the custom frequency defined, we can now use the forecast method, specifying the custom\_bday frequency in the freq argument. This will make the forecast respect the trading schedule of the stock market. ```python theme={null} fcst_pltr_df = nixtla_client.forecast( df=pltr_df, h=7, freq=custom_bday, time_col='date', target_col='Adj Close', level=[80, 95] ) ``` Finally, plot the forecast results: ```python theme={null} nixtla_client.plot( pltr_df, fcst_pltr_df, time_col="date", target_col="Adj Close", level=[80, 95], max_insample_length=180 ) ``` ![PLTR Forecast (Custom Frequency)](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/capabilities/forecast/11_irregular_timestamps_files/figure-markdown_strict/cell-18-output-1.png) ```python theme={null} fcst_pltr_df[['date']].head(7) ``` | | date | | -: | ---------- | | 0 | 2023-08-28 | | 1 | 2023-08-29 | | 2 | 2023-08-30 | | 3 | 2023-08-31 | | 4 | 2023-09-01 | | 5 | 2023-09-05 | | 6 | 2023-09-06 | Note that the forecast excludes 2023-09-04, which was a Monday when the stock market was closed for Labor Day in the United States. ## Conclusion Below are the key takeaways of this tutorial: * TimeGPT can reliably infer regular frequencies, but you can override this by setting the `freq` parameter to the corresponding pandas alias. * When working with polars data frames, you must always specify the frequency using the correct polars offset. * TimeGPT supports irregular frequencies and allows you to define a custom frequency, generating forecasts exclusively for the specified dates. 
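The custom-frequency idea above can be sketched with pandas alone. The example below is a minimal illustration, not the tutorial's pipeline: it hard-codes Labor Day 2023 as the only closed weekday instead of deriving closures from a market calendar, and it assumes 2023-08-25 (the last weekday before the 2023-08-28 cutoff) is the final date in the input data.

```python
import pandas as pd

# Assumption for illustration: the only closed weekday in the window is
# Labor Day (2023-09-04); the tutorial derives closures from the NYSE calendar.
closed_days = [pd.Timestamp("2023-09-04")]
custom_bday = pd.offsets.CustomBusinessDay(holidays=closed_days)

# Assumed last observed date (the Friday before the 2023-08-28 cutoff).
last_date = pd.Timestamp("2023-08-25")

# Step forward one custom business day, then generate a 7-step horizon.
forecast_dates = pd.date_range(
    start=last_date + custom_bday, periods=7, freq=custom_bday
)
forecast_date_strings = forecast_dates.strftime("%Y-%m-%d").tolist()
print(forecast_date_strings)
```

The generated dates skip both the weekend and the listed holiday, which is the same behavior the forecast relies on when `custom_bday` is passed as `freq`.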
# Temporal Hierarchical Forecasting with TimeGPT

Source: https://nixtla.io/docs/forecasting/special-topics/temporal_hierarchical

Learn how to combine forecasts at different time frequencies to improve accuracy.

## What is Temporal Hierarchical Forecasting?

Temporal hierarchical forecasting is a technique that improves prediction accuracy by leveraging the structure of time series data across multiple temporal resolutions such as hourly, daily, weekly, and monthly. Rather than modeling just one time scale, it generates forecasts at each level of the temporal hierarchy and then reconciles them to ensure consistency (e.g., the sum of hourly forecasts aligns with the daily total).

This approach captures both high-frequency variations and long-term trends, allowing for coherent forecasts across time scales. It is particularly effective in domains like energy demand, retail sales, and transportation planning, where decisions depend on both granular and aggregated time-based insights.

## Tutorial

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/23_temporalhierarchical.ipynb)

In this tutorial, we demonstrate how to use TimeGPT for temporal hierarchical forecasting. We will use a dataset that has an hourly frequency, and we create forecasts with TimeGPT for both the hourly and the 2-hourly frequency level. The latter is the time series obtained by aggregating the hourly data across 2-hour windows.

Subsequently, we can use temporal reconciliation techniques to improve the forecasting performance of TimeGPT.

### Step 1: Import and Initialize

Let's import the NixtlaClient and initialize it with an API key.
```python
import numpy as np
import pandas as pd

from utilsforecast.evaluation import evaluate
from utilsforecast.plotting import plot_series
from utilsforecast.losses import mae, rmse

from nixtla import NixtlaClient

# Initialize NixtlaClient
nixtla_client = NixtlaClient(
    # api_key = 'my_api_key_provided_by_nixtla'
)
```

### Step 2: Load and Prepare Data

First, let's read and process the dataset.

```python
df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv'
)
df['ds'] = pd.to_datetime(df['ds'])

# Keep a single series for this tutorial
df_sub = df.query('unique_id == "DE"')
```

Next, let's create the train-test splits:

```python
df_train = df_sub.query('ds < "2017-12-29"')
df_test = df_sub.query('ds >= "2017-12-29"')

df_train.shape, df_test.shape
```

```bash
((1632, 12), (48, 12))
```

Let's visualize the train and test splits to make sure that they are as expected.

```python
plot_series(
    df_train[['unique_id', 'ds', 'y']][-200:],
    forecasts_df=df_test[['unique_id', 'ds', 'y']].rename(columns={'y': 'test'})
)
```

![Training and Testing Data](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/tutorials/23_temporalhierarchical_files/figure-markdown_strict/cell-11-output-1.png)

### Step 3: Temporal Hierarchical Forecasting

#### Temporal Aggregation

We are interested in generating forecasts for the hourly and 2-hourly windows. We can generate these forecasts using TimeGPT. After generating these forecasts, we make use of hierarchical forecasting techniques to improve the accuracy of each forecast.

We first define the temporal aggregation spec. The spec is a dictionary in which each key is the name of an aggregation and each value is the number of bottom-level timesteps that are combined in that aggregation.

In this example, we choose a temporal aggregation of a 2-hour period and a 1-hour period (the bottom level).
```python theme={null} spec_temporal = { "2-hour-period": 2, "1-hour-period": 1 } ``` We next compute the temporally aggregated train- and test sets using the aggregate\_temporal function from hierarchicalforecast. Note that we have different aggregation matrices S for the train- and test set, as the test set contains temporal hierarchies that are not included in the train set. ```python theme={null} from hierarchicalforecast.utils import aggregate_temporal Y_train, S_train, tags_train = aggregate_temporal( df=df_train[['unique_id', 'ds', 'y']], spec=spec_temporal ) Y_test, S_test, tags_test = aggregate_temporal( df=df_test[['unique_id', 'ds', 'y']], spec=spec_temporal ) ``` `Y_train` contains our training data, for both 1-hour and 2-hour periods. For example, if we look at the first two timestamps of the training data, we have a 2-hour period ending at 2017-10-22 01:00, and two 1-hour periods, the first ending at 2017-10-22 00:00, and the second at 2017-10-22 01:00, the latter corresponding to when the first 2-hour period ends. Also, the ground truth value `y` of the first 2-hour period is 38.13, which is equal to the sum of the first two 1-hour periods (19.10 + 19.03). This showcases how the higher frequency `1-hour-period` has been aggregated into the `2-hour-period` frequency. ```python theme={null} Y_train.query("ds <= '2017-10-22 01:00:00'") ``` | | temporal\_id | unique\_id | ds | y | | --- | --------------- | ---------- | ------------------- | ----- | | 0 | 2-hour-period-1 | DE | 2017-10-22 01:00:00 | 38.13 | | 816 | 1-hour-period-1 | DE | 2017-10-22 00:00:00 | 19.10 | | 817 | 1-hour-period-2 | DE | 2017-10-22 01:00:00 | 19.03 | The aggregation matrices `S_train` and `S_test` detail how the lowest temporal granularity (hour) can be aggregated into the 2-hour periods. 
For example, the first 2-hour period, named `2-hour-period-1`, can be constructed by summing the first two hour-periods, `1-hour-period-1` and `1-hour-period-2`, which we also verified above in our inspection of `Y_train`.

```python
S_train.iloc[:5, :5]
```

|   | temporal\_id    | 1-hour-period-1 | 1-hour-period-2 | 1-hour-period-3 | 1-hour-period-4 |
| - | --------------- | --------------- | --------------- | --------------- | --------------- |
| 0 | 2-hour-period-1 | 1.0             | 1.0             | 0.0             | 0.0             |
| 1 | 2-hour-period-2 | 0.0             | 0.0             | 1.0             | 1.0             |
| 2 | 2-hour-period-3 | 0.0             | 0.0             | 0.0             | 0.0             |
| 3 | 2-hour-period-4 | 0.0             | 0.0             | 0.0             | 0.0             |
| 4 | 2-hour-period-5 | 0.0             | 0.0             | 0.0             | 0.0             |

#### Computing Base Forecasts with TimeGPT

Now, we need to compute base forecasts for each temporal aggregation. The following cell computes the **base forecasts** for each temporal aggregation in `Y_train` using TimeGPT.

Note that both frequency and horizon are different for each temporal aggregation. In this example, the lowest level has an hourly frequency and a horizon of `48`. The `2-hour-period` aggregation thus has a 2-hourly frequency with a horizon of `24`.
```python
Y_hats = []
id_cols = ["unique_id", "temporal_id", "ds", "y"]

# Forecast each temporal aggregation level separately
for level, temporal_ids_train in tags_train.items():
    # Select the train and test rows that belong to this level
    Y_level_train = Y_train.query("temporal_id in @temporal_ids_train")
    temporal_ids_test = tags_test[level]
    Y_level_test = Y_test.query("temporal_id in @temporal_ids_test")

    # Frequency and horizon differ per level
    freq_level = pd.infer_freq(Y_level_train["ds"].unique())
    horizon_level = Y_level_test["ds"].nunique()

    Y_hat_level = nixtla_client.forecast(
        df=Y_level_train[["ds", "unique_id", "y"]],
        h=horizon_level,
        freq=freq_level  # pass the inferred frequency for this level
    )

    # Attach the test values and temporal ids to the forecasts
    Y_hat_level = Y_hat_level.merge(Y_level_test, on=["ds", "unique_id"], how="left")
    Y_hat_cols = id_cols + [col for col in Y_hat_level.columns if col not in id_cols]
    Y_hat_level = Y_hat_level[Y_hat_cols]
    Y_hats.append(Y_hat_level)

Y_hat = pd.concat(Y_hats, ignore_index=True)
```

Observe that `Y_hat` contains all the forecasts, but they are not coherent with each other. For example, consider the forecasts for the first time period of both frequencies.

|    | unique\_id |    temporal\_id |                  ds |     y | TimeGPT   |
| -: | ---------: | --------------: | ------------------: | ----: | --------- |
|  0 |         DE | 2-hour-period-1 | 2017-12-29 01:00:00 | 10.45 | 16.949455 |
| 24 |         DE | 1-hour-period-1 | 2017-12-29 00:00:00 |  9.73 | -0.241482 |
| 25 |         DE | 1-hour-period-2 | 2017-12-29 01:00:00 |  0.72 | -3.456478 |

The ground truth value `y` for the first 2-hour period is 10.45, and the sum of the ground truth values for the first two 1-hour periods is 9.73 + 0.72 = 10.45. Hence, these values are coherent with each other.

However, the forecast for the first 2-hour period is 16.95, while the sum of the forecasts for the first two 1-hour periods is -0.24 + (-3.46) = -3.70. Hence, these forecasts are clearly not coherent with each other.

We will use reconciliation techniques to make these forecasts coherent with each other and improve their accuracy.

#### Forecast Reconciliation

We can use the `HierarchicalReconciliation` class to reconcile the forecasts. In this example we use `MinTrace`.
Note that we have to set `temporal=True` in the `reconcile` function.

The `S` parameter was renamed to `S_df` in `hierarchicalforecast`. Make sure you are using `S_df` when calling `reconcile`.

```python theme={null}
from hierarchicalforecast.methods import MinTrace
from hierarchicalforecast.core import HierarchicalReconciliation

reconcilers = [MinTrace(method="wls_struct")]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec = hrec.reconcile(Y_hat_df=Y_hat, S_df=S_test, tags=tags_test, temporal=True)
```

### Step 4. Evaluation

The `HierarchicalForecast` package includes the `evaluate` function to evaluate the different hierarchies.

We evaluate the temporally aggregated forecasts across **all temporal aggregations**.

```python theme={null}
import hierarchicalforecast.evaluation as hfe
from utilsforecast.losses import mae  # metric passed to evaluate below

evaluation = hfe.evaluate(
    df=Y_rec.drop(columns='unique_id'),
    tags=tags_test,
    metrics=[mae],
    id_col='temporal_id'
)

# Round the numeric columns to three significant digits for display
numeric_cols = evaluation.select_dtypes('number').columns
evaluation[numeric_cols] = evaluation[numeric_cols].map('{:.3}'.format).astype(float)
evaluation
```

|    |         level | metric | TimeGPT | TimeGPT/MinTrace\_method-wls\_struct |
| -: | ------------: | -----: | ------: | -----------------------------------: |
|  0 | 2-hour-period |    mae |    25.2 |                                12.00 |
|  1 | 1-hour-period |    mae |    18.5 |                                 6.16 |
|  2 |       Overall |    mae |    20.8 |                                 8.12 |

As we can see, reconciliation improved TimeGPT's predictions at both levels: the MAE drops from 25.2 to 12.00 for the 2-hour period and from 18.5 to 6.16 for the 1-hour period.

Visually, we can also verify the forecast is better after using reconciliation techniques.
For the 1-hour-period forecasts:

```python theme={null}
plot_series(
    Y_train.query(
        "temporal_id in @tags_train['1-hour-period']"
    )[["y", "ds", "unique_id"]].iloc[-100:],
    forecasts_df=Y_rec.query("temporal_id in @tags_test['1-hour-period']").drop(columns=["temporal_id"])
)
```

[Figure: hier_plot-1hour]

and for the 2-hour period forecasts:

```python theme={null}
plot_series(
    Y_train.query(
        "temporal_id in @tags_train['2-hour-period']"
    )[["y", "ds", "unique_id"]].iloc[-50:],
    forecasts_df=Y_rec.query("temporal_id in @tags_test['2-hour-period']").drop(columns=["temporal_id"])
)
```

[Figure: hier_plot-2hour]

Also, we can now verify that the forecasts are coherent with each other. For the first 2-hour period, our forecast after reconciliation is 6.63, and the sum of the forecasts for the first two 1-hour periods is 4.92 + 1.71 = 6.63. Hence, we now have more accurate and coherent forecasts across frequencies.

```python theme={null}
Y_rec.query(
    "temporal_id in ['2-hour-period-1', '1-hour-period-1', '1-hour-period-2']"
)
```

|    | unique\_id |    temporal\_id |                  ds |     y |   TimeGPT | TimeGPT/MinTrace\_method-wls\_struct |
| -: | ---------: | --------------: | ------------------: | ----: | --------: | -----------------------------------: |
|  0 |         DE | 2-hour-period-1 | 2017-12-29 01:00:00 | 10.45 | 16.949455 |                             6.625748 |
| 24 |         DE | 1-hour-period-1 | 2017-12-29 00:00:00 |  9.73 | -0.241482 |                             4.920372 |
| 25 |         DE | 1-hour-period-2 | 2017-12-29 01:00:00 |  0.72 | -3.456478 |                             1.705376 |

## Conclusion

In this tutorial we have shown:

* How to create forecasts for multiple frequencies for the same dataset with TimeGPT
* How to improve the accuracy of these forecasts using temporal reconciliation techniques

Note that even though we created forecasts for two different frequencies, you are not required to use the 2-hour-period forecast. You can also use this technique simply to improve the 1-hour-period forecast.
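The coherence property used throughout this tutorial can also be checked numerically. The following is a minimal sketch, not part of the original notebook: it copies the base and reconciled values for the first period from the tables above into plain dictionaries, and the `is_coherent` helper is a hypothetical name introduced only for illustration.

```python
import math

# Values copied from the tables in this tutorial (first period only).
base = {
    "2-hour-period-1": 16.949455,
    "1-hour-period-1": -0.241482,
    "1-hour-period-2": -3.456478,
}
reconciled = {
    "2-hour-period-1": 6.625748,
    "1-hour-period-1": 4.920372,
    "1-hour-period-2": 1.705376,
}


def is_coherent(forecasts, parent, children, tol=1e-4):
    """Hypothetical helper: True if the parent equals the sum of its children."""
    total = sum(forecasts[child] for child in children)
    return math.isclose(forecasts[parent], total, abs_tol=tol)


children = ["1-hour-period-1", "1-hour-period-2"]
print(is_coherent(base, "2-hour-period-1", children))        # False
print(is_coherent(reconciled, "2-hour-period-1", children))  # True
```

The tolerance only absorbs rounding in the displayed values. On a full `Y_rec` frame the same comparison would need to be repeated for every 2-hour period.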
# Quickstart (TimeGPT-2)

Source: https://nixtla.io/docs/forecasting/timegpt_2_family

Learn how to use TimeGPT-2 family of time series forecasting models

## TimeGPT-2 Family of Foundation Models

[TimeGPT-2](https://www.nixtla.io/blog/timegpt-2-announcement) and [TimeGPT-2.1](https://www.nixtla.io/blog/timegpt-2-1-announcement) are the latest versions of our enterprise-grade models, built to reliably solve mission-critical time-series problems. The TimeGPT-2 family of models is optimized for enterprise needs, prioritizing accuracy and stability with a privacy-first approach and full support for self-hosted and on-premises deployments.

## Set Up TimeGPT-2 family of models for Python Time Series Forecasting

### Step 1: Confirm Access and get an API Key

* Confirm with [support@nixtla.io](mailto:support@nixtla.io) that your account has access to these latest models.
* Get your API key from [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/forecasting/timegpt_2_family). Note that you can also use your existing keys as long as your account has access to these latest models (Step 1 above).
* Sign in using Google, GitHub, or your email.
* Navigate to **API Keys** in the menu and select **Create New API Key**.
* Your new API key will appear on the screen. Copy this key and save it in a safe place for use later.

![Dashboard for TimeGPT API keys](https://github.com/Nixtla/nixtla/blob/main/nbs/img/dashboard.png?raw=true)

### Step 2: Install Nixtla

Install the Nixtla library in your preferred Python environment. To use the TimeGPT-2 family of models, the client version must be >= 0.7.0. Quote the requirement so that the shell does not interpret `>` as an output redirection.

```bash theme={null}
pip install "nixtla>=0.7.0"
```

You can verify the client version installed using the following code.
It should return a version >= 0.7.0 ```python theme={null} from nixtla import __version__ print(__version__) ``` ```bash theme={null} 0.7.2 ``` ### Step 3: Import the Nixtla TimeGPT client Import the Nixtla client and instantiate it with your API key and base URL for TimeGPT-2 family: ```python theme={null} from nixtla import NixtlaClient nixtla_client = NixtlaClient( base_url = 'https://api-preview.nixtla.io', # Needed for TimeGPT-2 family api_key='my_api_key_provided_by_nixtla' ) ``` Verify the status and validity of your API key: ```python theme={null} nixtla_client.validate_api_key() ``` ```bash theme={null} True ``` ## Forecasting with TimeGPT-2 family ### Load your time series data ```python theme={null} import pandas as pd df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv') df.head() ``` | | timestamp | value | | - | ---------- | ----- | | 0 | 1949-01-01 | 112 | | 1 | 1949-02-01 | 118 | | 2 | 1949-03-01 | 132 | | 3 | 1949-04-01 | 129 | | 4 | 1949-05-01 | 121 | ### Generate the forecast Forecast the next 12 months using the SDK's `forecast` method. You can switch the model to any of the TimeGPT-2 family of models - `timegpt-2-pro`, `timegpt-2-lab`, `timegpt-2-mini`, `timegpt-2.1` ```python theme={null} timegpt_fcst_df = nixtla_client.forecast( df, h=12, time_col="timestamp", target_col="value", model="timegpt-2.1", ) ``` ### Plot the forecast ```python theme={null} nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value') ``` ![Forecasted Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/getting-started/2_quickstart_files/figure-markdown_strict/cell-15-output-1.png) ## Summary Using the TimeGPT-2 family of models is similar to using the TimeGPT-1 family with the following changes. 
In order to use TimeGPT-2 family of models * Make sure that your account has access to these models * Install the latest Nixtla client (>= 0.7.0) * Make sure you use the right `base_url` while instantiating the client along with your API key. Happy forecasting! # Quickstart (TimeGPT-1) Source: https://nixtla.io/docs/forecasting/timegpt_quickstart Learn how to use TimeGPT for accurate time series forecasting ## TimeGPT-1 Family - Foundation Models for Time Series Forecasting TimeGPT is a production-ready generative pretrained transformer for time series forecasting and predictions. It delivers accurate forecasts for retail sales, electricity demand, financial markets, and IoT sensor data with just a few lines of Python code. This quickstart guide will have you making your first forecast in under 5 minutes! ## Set Up TimeGPT for Python Time Series Forecasting ### Step 1: Get an API Key * Visit [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/forecasting/timegpt_quickstart) to activate your free trial and create an account. * Sign in using Google, GitHub, or your email. * Navigate to **API Keys** in the menu and select **Create New API Key**. * Your new API key will appear on the screen. Copy this key and save it in a safe place for use later. 
![Dashboard for TimeGPT API keys](https://github.com/Nixtla/nixtla/blob/main/nbs/img/dashboard.png?raw=true) ### Step 2: Install Nixtla Install the Nixtla library in your preferred Python environment: ```bash theme={null} pip install nixtla ``` ### Step 3: Import the Nixtla TimeGPT client Import the Nixtla client and instantiate it with your API key: ```python theme={null} from nixtla import NixtlaClient nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' ) ``` ### Step 4: Verify your API key Verify the status and validity of your API key: ```python theme={null} nixtla_client.validate_api_key() ``` ```bash theme={null} True ``` For enhanced security practices, see our guide on [Setting Up your API Key](/docs/setup/setting_up_your_api_key). ## Make Your First Time Series Forecast We'll demonstrate TimeGPT's forecasting capabilities using the classic `AirPassengers` dataset, a monthly time series showing international airline passengers from 1949 to 1960. ```python theme={null} import pandas as pd df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv') df.head() ``` | | timestamp | value | | - | ---------- | ----- | | 0 | 1949-01-01 | 112 | | 1 | 1949-02-01 | 118 | | 2 | 1949-03-01 | 132 | | 3 | 1949-04-01 | 129 | | 4 | 1949-05-01 | 121 | If you are using your own data, here are the data requirements: * The target variable must not contain missing or non-numeric values. * The timestamp column must not contain missing values. * Date stamps must form a continuous sequence without gaps for the selected frequency. * pandas must be able to parse the timestamp column as datetime objects. ([see Pandas documentation](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html)). For more details, visit [Data Requirements](/docs/data_requirements/data_requirements). 
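If you want to check your own data against the requirements listed above before calling the API, a small pandas routine can do it. The following is a minimal sketch for a single series, not part of the Nixtla SDK: `check_requirements` and its default column names are assumptions chosen to match the `AirPassengers` example, and the frequency must be supplied by you.

```python
import pandas as pd


def check_requirements(df, time_col="timestamp", target_col="value", freq="MS"):
    """Hypothetical helper: return a list of violated data requirements."""
    problems = []

    # Timestamps must be present and parseable as datetimes.
    times = pd.to_datetime(df[time_col], errors="coerce")
    if times.isna().any():
        problems.append("timestamp column has missing or unparseable values")

    # The target must be numeric with no missing values.
    target = pd.to_numeric(df[target_col], errors="coerce")
    if target.isna().any():
        problems.append("target column has missing or non-numeric values")

    # Timestamps must form a gap-free sequence at the chosen frequency.
    valid = times.dropna()
    if not valid.empty:
        expected = pd.date_range(valid.min(), valid.max(), freq=freq)
        missing = expected.difference(pd.DatetimeIndex(valid))
        if len(missing) > 0:
            problems.append("timestamps have gaps for the selected frequency")

    return problems


good = pd.DataFrame({
    "timestamp": ["1949-01-01", "1949-02-01", "1949-03-01"],
    "value": [112, 118, 132],
})
bad = pd.DataFrame({
    "timestamp": ["1949-01-01", "1949-03-01"],  # February is missing
    "value": [112, None],                       # missing target value
})

print(check_requirements(good))  # []
print(check_requirements(bad))   # two problems reported
```

For a DataFrame with several series, the same checks would need to run per series identifier, since each series has its own date range.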
Plot the dataset: ```python theme={null} nixtla_client.plot(df, time_col='timestamp', target_col='value') ``` ![Time Series Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/getting-started/2_quickstart_files/figure-markdown_strict/cell-13-output-1.png) The `plot` method automatically displays figures in notebook environments. To save a plot locally: ```python theme={null} fig = nixtla_client.plot(df, time_col='timestamp', target_col='value') fig.savefig('plot.png', bbox_inches='tight') ``` ## Real-World Forecasting Applications TimeGPT excels at: * **Retail forecasting**: Predict product demand and inventory needs * **Energy forecasting**: Forecast electricity consumption and renewable energy production * **Financial forecasting**: Project revenue, sales, and market trends * **IoT predictions**: Anticipate sensor readings and equipment metrics ## Short and Long-Term Forecasting Examples ### Generate a longer-term forecast Forecast the next 12 months using the SDK's `forecast` method: ```python theme={null} timegpt_fcst_df = nixtla_client.forecast( df=df, h=12, freq='MS', time_col='timestamp', target_col='value' ) timegpt_fcst_df.head() ``` Plot the forecast: ```python theme={null} nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value') ``` ![Forecasted Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/getting-started/2_quickstart_files/figure-markdown_strict/cell-15-output-1.png) You may also generate forecasts for longer horizons with the `timegpt-1-long-horizon` model. 
For example, 36 months ahead: ```python theme={null} timegpt_fcst_df = nixtla_client.forecast( df=df, h=36, freq='MS', time_col='timestamp', target_col='value', model='timegpt-1-long-horizon' ) timegpt_fcst_df.head() ``` Plot the forecast: ```python theme={null} nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value') ``` ![Longer Forecast Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/getting-started/2_quickstart_files/figure-markdown_strict/cell-17-output-1.png) ### Generate a shorter-term forecast Forecast the next 6 months with a single command: ```python theme={null} timegpt_fcst_df = nixtla_client.forecast( df=df, h=6, freq='MS', time_col='timestamp', target_col='value' ) ``` Plot the forecast: ```python theme={null} nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value') ``` ![Shorter Forecast Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/getting-started/2_quickstart_files/figure-markdown_strict/cell-18-output-2.png) ## Frequently Asked Questions ### How accurate is TimeGPT for forecasting? TimeGPT achieves state-of-the-art accuracy across multiple domains including retail, finance, and electricity forecasting with zero-shot learning. ### Can I use TimeGPT with my own time series data? Yes, TimeGPT works with any time series data in pandas DataFrame format with a timestamp and target value column. ### How long does it take to generate forecasts? TimeGPT typically generates forecasts in seconds, making it suitable for production environments. 
## Next Steps Now that you've made your first forecast, explore these tutorials to unlock TimeGPT's full capabilities: * [Improve Accuracy](/docs/forecasting/improve_accuracy) - Advanced techniques to enhance forecast accuracy * [Fine-Tuning](/docs/forecasting/fine-tuning/steps) - Customize TimeGPT for your specific data * [Exogenous Variables](/docs/forecasting/exogenous-variables/numeric_features) - Include external variables in forecasts * [Uncertainty Quantification](/docs/forecasting/probabilistic/introduction) - Generate prediction intervals and quantile forecasts * [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Assess forecast performance * [Forecasting at Scale](/docs/forecasting/forecasting-at-scale/computing_at_scale) - Process thousands of time series * [Anomaly Detection](/docs/anomaly_detection/historical_anomaly_detection) - Identify outliers in your data # About TimeGPT Source: https://nixtla.io/docs/introduction/about_timegpt Learn about TimeGPT - the foundation model for time series.
TimeGPT is a production-ready generative pretrained transformer model specifically designed for time series forecasting. It accurately forecasts domains such as retail, electricity, finance, and IoT with minimal code. Below you'll find a high-level overview of its features, architecture, and practical examples.
You can access TimeGPT through: * Self-hosted deployment on your infrastructure (recommended): [book a call](https://meetings.hubspot.com/cristian-challu/enterprise-contact-us?uuid=dc037f5a-d93b-4%5B…%5D90b-a611dd9460af\&utm_source=github\&utm_medium=pricing_page) for more information * Hosted APIs: start your [free trial](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/about_timegpt) * Azure Studio (TimeGEN-1) Perform zero-shot inference out-of-the-box to forecast future values or detect anomalies. Fine-tune the model if you need more targeted performance. For detailed instructions and advanced configurations, visit our [Quickstart Guide](/docs/forecasting/timegpt_quickstart) and additional tutorials. ## Features and Capabilities **[Zero-shot Inference](/docs/forecasting/timegpt_quickstart)**: Generate forecasts and detect anomalies immediately without prior training. Quickly gain insights from your data. **[Fine-tuning](/docs/forecasting/fine-tuning/steps)**: Enhance prediction accuracy by training TimeGPT on your own datasets, tailoring it to your unique scenario. **[API Access](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/about_timegpt)**: Integrate forecasts into applications via a robust API. Easily obtain keys at the [Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/about_timegpt). Easily deploy TimeGPT in your own infrastructure or with any cloud provider using [Docker](/docs/setup/docker) or our Python [wheel file](/docs/setup/python_wheel). Also accessible in [Azure Studio](/docs/setup/azureai) or through private deployment. **[Add Exogenous Variables](/docs/forecasting/exogenous-variables/numeric_features)**: Incorporate external variables (e.g., events, prices) to improve forecast accuracy. **[Multiple Series Forecasting](/docs/forecasting/timegpt_quickstart)**: Predict multiple time series at once, improving workflow efficiency. 
**[Specific Loss Function](/docs/forecasting/fine-tuning/custom_loss)**: Customize training with loss functions that match your performance objectives. **[Cross-validation](/docs/forecasting/evaluation/cross_validation)**: Evaluate model reliability and generalization with built-in cross-validation. **[Prediction Intervals](/docs/forecasting/probabilistic/prediction_intervals)**: Generate intervals to capture forecast uncertainty. **[Irregular Timestamps](/docs/forecasting/special-topics/irregular_timestamps)**: Process data with non-uniform timestamps directly, with no extra preprocessing. **[Anomaly Detection](/docs/anomaly_detection/real-time/introduction)**: Identify anomalies automatically, integrating external features for improved precision. Get started quickly with the [Quickstart guide](/docs/forecasting/timegpt_quickstart). Explore in-depth tutorials on TimeGPT capabilities and real-world applications. ## Architecture ![TimeGPT Architecture Overview](https://github.com/Nixtla/nixtla/blob/main/nbs/img/timegpt_archi.png?raw=true) TimeGPT's architecture builds on the self-attention mechanism introduced in the original ["Attention is All You Need"](https://arxiv.org/abs/1706.03762) paper. Unlike typical large language models (LLMs), TimeGPT is independently trained on extensive time series datasets to minimize forecasting errors. TimeGPT employs an encoder-decoder structure with residual connections, layer normalization, and a linear output layer to match the decoder outputs to forecast dimensions. The attention-based mechanisms help the model capture diverse historical patterns to create accurate future predictions. The model processes input sequences from left to right, similar to how humans read sentences, and predicts future values (*"tokens"*) based on historical windows of time series data. 
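To make the idea of "historical windows" more concrete, here is a toy sketch that slices a series into pairs of past values and the value that follows them. This is only an illustration of the windowing concept: it is not TimeGPT's actual tokenization, which this page does not describe, and `make_windows`, `input_size`, and `horizon` are hypothetical names.

```python
def make_windows(series, input_size, horizon):
    """Hypothetical helper: split a series into (history, future) pairs."""
    pairs = []
    last_start = len(series) - input_size - horizon
    for start in range(last_start + 1):
        history = series[start:start + input_size]
        future = series[start + input_size:start + input_size + horizon]
        pairs.append((history, future))
    return pairs


# First five monthly values of the AirPassengers dataset
values = [112, 118, 132, 129, 121]

for history, future in make_windows(values, input_size=3, horizon=1):
    print(history, "->", future)
# [112, 118, 132] -> [129]
# [118, 132, 129] -> [121]
```

Each pair reads from left to right, in the same order the series was recorded.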
## Explore Examples and Use Cases Quickly set up your workflow using our [Quickstart Guide](/docs/forecasting/timegpt_quickstart) or learn to use the API by [setting up your API key](/docs/setup/setting_up_your_api_key). * [Anomaly Detection](/docs/anomaly_detection/real-time/introduction) * Fine-tuning with [custom loss functions](/docs/forecasting/fine-tuning/custom_loss) * Scaling workflows using [Spark](/docs/forecasting/forecasting-at-scale/spark), [Dask](/docs/forecasting/forecasting-at-scale/dask), or [Ray](/docs/forecasting/forecasting-at-scale/ray) * Integrating [exogenous variables](/docs/forecasting/exogenous-variables/numeric_features), validation with [cross-validation](/docs/forecasting/evaluation/cross_validation), and estimating uncertainty via [quantile forecasts](/docs/forecasting/probabilistic/quantiles) or [prediction intervals](/docs/forecasting/probabilistic/prediction_intervals). * [Web Traffic Forecasting](/docs/use_cases/forecasting_web_traffic) * [Bitcoin Price Prediction](/docs/use_cases/bitcoin_price_prediction) With TimeGPT, you can rapidly iterate from initial exploration to high-accuracy forecasting. Dive deeper into the comprehensive tutorials for more sophisticated workflows. # TimeGPT FAQ Source: https://nixtla.io/docs/introduction/faq Frequently asked questions about TimeGPT Get started with TimeGPT in minutes Set up the Python SDK for TimeGPT Review subscription plans and pricing ## Commonly asked questions TimeGPT is the first foundation model for time series forecasting. It produces accurate forecasts for new time series across diverse domains using only historical values as inputs. The model reads time series data sequentially from left to right, similar to how humans read a sentence. It examines windows of past data as "tokens" and predicts what comes next based on identified patterns that extrapolate into the future. 
Beyond forecasting, TimeGPT supports other time series tasks, including what-if scenarios and anomaly detection. TimeGPT is specifically designed for time series data, not text. No, TimeGPT is not based on any large language model. While it follows the principle of training a large transformer model on a vast dataset, its architecture specifically handles time series data and minimizes forecasting errors. To get started with TimeGPT, register for an account at [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq). After confirming your signup via email, you can access your dashboard with account details. Create an account at [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq) Click the confirmation link in your email Find your API key in the dashboard under "API Keys" Run `pip install nixtla` to install the Python SDK For a deeper understanding of TimeGPT, refer to the [research paper](https://arxiv.org/pdf/2310.03589.pdf). While some aspects of the model architecture remain confidential, registration for TimeGPT is open to everyone. You can use TimeGPT through the Python SDK or the REST API. ```python Python SDK Forecast Example theme={null} from nixtla import NixtlaClient # Initialize client with your API key client = NixtlaClient(api_key="your_api_key") # Make a forecast forecast = client.forecast(df, h=7) ``` ```bash REST API Forecast Example theme={null} curl -X POST "https://api.nixtla.io/timegpt" \ -H "accept: application/json" \ -H "x-api-key: your_api_key" \ -H "Content-Type: application/json" \ -d '{"df": [{"ds": "2023-01-01", "y": 100}, ...], "h": 7}' ``` Both methods require an API key, obtained upon registration and available in your dashboard under "API Keys". An API key is a unique string of characters that authenticates your requests when using the Nixtla SDK, ensuring only authorized users can make requests. 
Your API key is personal and should not be shared with anyone or exposed in client-side code.

Upon registration, you receive an API key available in your [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq) under "API Keys". Keep your API key confidential.

To integrate your API key into your development workflow, refer to the [Setting Up Your API Key](/docs/setup/setting_up_your_api_key) tutorial.

```python Python API Key Example theme={null}
from nixtla import NixtlaClient

client = NixtlaClient(api_key="your_api_key")
```

```bash REST API Key Example theme={null}
curl -X POST "https://api.nixtla.io/timegpt" \
  -H "accept: application/json" \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"df": [{"ds": "2023-01-01", "y": 100}, ...], "h": 7}'
```

Check your API key status with the [`validate_api_key` method](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-validate-api-key) of the `NixtlaClient` class.

```python Validate API Key Example theme={null}
from nixtla import NixtlaClient

nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla')
nixtla_client.validate_api_key()
```

```bash Log Output theme={null}
INFO:nixtla.nixtla_client:Happy Forecasting! :), If you have questions or need support, please email support@nixtla.io
True
```

When you validate your API key and it returns `False`:

* If you are targeting an Azure endpoint, getting `False` from the `NixtlaClient.validate_api_key` method is expected. You can skip this step when targeting an Azure endpoint and proceed directly to forecasting instead.
* If you are not targeting an Azure endpoint, then you should check the following:
  * Make sure you are using the latest version of the SDK (Python or R).
  * Check that your API key is active in your dashboard by visiting [https://nixtla.io/free-trial?utm\_source=nixtla.io\&utm\_campaign=/docs/introduction/faq](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq).
  * Consider any firewalls your organization might have. There may be restricted access. If so, you can whitelist our endpoint [https://api.nixtla.io/](https://api.nixtla.io/).
    * To use Nixtla's API, your system must allow access to our endpoint. Whitelisting the endpoint is not something Nixtla can do on its side; it has to be done on the user's system. Here is an [overview on whitelisting](https://www.csoonline.com/article/569493/whitelisting-explained-how-it-works-and-where-it-fits-in-a-security-program.html).
    * If you work in an organization, please work with your IT team. They likely manage the security settings and can address this. If you run your own systems, you should be able to update the settings yourself, depending on the system you are using.

At Nixtla, we take privacy and security very seriously. To ensure you understand our data policies, refer to these documents:

* Our data privacy policies
* Python SDK license
* TimeGPT service terms

We offer a self-hosted version of TimeGPT, which gives you complete control over your data: your data never leaves your premises. You can either use [Docker](/docs/setup/docker) or a [Python wheel file](/docs/setup/python_wheel). If you are interested in these options, contact us at `support@nixtla.io`.

Common errors and warnings

```python Invalid API Key Error theme={null} ApiError: status_code: 401, body: {'data': None, 'message': 'Invalid API key', 'details': 'Key not found', 'code': 'A12', 'requestID': 'E7F2BBTB2P', 'support': 'If you have questions or need support, please email support@nixtla.io'} ``` This error occurs when your TimeGPT API key is invalid or not set up correctly. Use the `validate_api_key` method to verify it or check that you copied it correctly from the "API Keys" section of your [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq). ```python Too Many Requests Error theme={null} ApiError: status_code: 429, body: {'data': None, 'message': 'Too many requests', 'details': 'You need to add a payment method to continue using the API, do so from https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/introduction/faq', 'code': 'A21', 'requestID': 'NCJDK7KSJ6', 'support': 'If you have questions or need support, please email support@nixtla.io'} ``` This error occurs when you have exhausted your free credits and need to add a payment method to continue using TimeGPT. Add a payment method in the "Billing" section of your [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq). A `WriteTimeout` error indicates the request exceeded allowable processing time. This commonly happens with large datasets. To fix this, increase the `num_partitions` parameter in the [`forecast` method](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-forecast) of the `NixtlaClient` class, or use a distributed backend.
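As a rough illustration of the `WriteTimeout` advice above, the sketch below passes `num_partitions` to `forecast` so that a large DataFrame is processed in several partitions. The wrapper function `forecast_in_partitions` and the value `4` are assumptions made for this example, not recommendations from the SDK; a suitable value depends on the size of your data.

```python
def forecast_in_partitions(df, api_key, h=7, num_partitions=4):
    """Sketch: forecast a large DataFrame using several partitions."""
    # Imported lazily so the sketch can be defined without the SDK installed.
    from nixtla import NixtlaClient

    client = NixtlaClient(api_key=api_key)
    # num_partitions is a parameter of NixtlaClient.forecast; 4 is an
    # arbitrary starting point chosen for this illustration.
    return client.forecast(df=df, h=h, num_partitions=num_partitions)
```

If the timeout persists, one option is to raise `num_partitions` step by step before moving to a distributed backend.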
Get Help with TimeGPT For more questions or support, reach out through one of our channels: For technical questions or bugs For general inquiries or support Connect with our team and community When reporting issues, include your API key status, SDK version, and sample code to help us assist you more quickly.
## Features & Capabilities TimeGPT accepts pandas dataframes in [long format](https://www.theanalysisfactor.com/wide-and-long-data/#comments) with these necessary columns: Timestamp in format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS` The target variable to forecast You can also pass a DataFrame with a DatetimeIndex without the `ds` column. ```python Example Input DataFrame theme={null} import pandas as pd # Create sample data data = { 'ds': ['2023-01-01', '2023-01-02', '2023-01-03'], 'y': [10, 12, 15] } df = pd.DataFrame(data) df['ds'] = pd.to_datetime(df['ds']) print(df) ``` ``` ds y 0 2023-01-01 10 1 2023-01-02 12 2 2023-01-03 15 ``` TimeGPT also works with [distributed dataframes](/docs/forecasting/forecasting-at-scale/computing_at_scale) like `dask`, `spark`, and `ray`. Yes, TimeGPT can forecast multiple time series simultaneously. For guidance on forecasting multiple time series at once, consult the [Multiple Series](/docs/forecasting/timegpt_quickstart) tutorial. ```python Multiple Series Forecasting theme={null} # Example of forecasting multiple series from nixtla import NixtlaClient # Initialize client client = NixtlaClient(api_key="your_api_key") # Group identifier for multiple series df['unique_id'] = df['store_id'] + '_' + df['item_id'] # Forecast multiple series at once forecast = client.forecast(df, h=7, level=[80, 90]) ``` Yes, TimeGPT can incorporate external variables into forecasts. For instructions on incorporating exogenous variables to TimeGPT, see the [Exogenous Variables](/docs/forecasting/exogenous-variables/numeric_features) tutorial. For incorporating calendar dates, the [Holidays and Special Dates](https://docs.nixtla.io/docs/tutorials-holidays_and_special_dates) tutorial might help. For categorical variables, refer to the [Categorical Variables](https://docs.nixtla.io/docs/tutorials-categorical_variables) tutorial. 
```python Exogenous Variables Forecast theme={null}
# Forecasting with exogenous variables
forecast = client.forecast(
    df,
    h=7,
    X_df=exog_df  # DataFrame with exogenous variables
)
```

Yes. To forecast historical data using TimeGPT, use cross-validation. See the full tutorial on [cross-validation](/docs/forecasting/evaluation/cross_validation).

```python Historical Forecast theme={null}
# Forecast over historical windows with cross-validation
historical_forecast = client.cross_validation(
    df,
    h=12,
    n_windows=11  # Set as many windows as you want
)
```

TimeGPT has no maximum forecast horizon, but performance decreases as the horizon increases. When the forecast horizon exceeds the data's seasonal length (for example, more than 12 months for monthly data), you will receive this message:

`WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon`

For details, refer to the [Long Horizon in Time Series](/docs/forecasting/model-version/longhorizon_model) tutorial.

For best results, keep your forecast horizon within the seasonal pattern of your data.

Yes, TimeGPT includes anomaly detection capabilities. To learn how to use TimeGPT for anomaly detection, refer to the [Anomaly Detection](/docs/anomaly_detection/real-time/introduction) tutorial.

```python Anomaly Detection Example theme={null}
# Detect anomalies in time series
anomalies = client.detect_anomalies(df)
```

Yes. To learn how to use TimeGPT for cross-validation, refer to the [Cross-Validation](/docs/forecasting/evaluation/cross_validation) tutorial.

```python Cross-Validation Example theme={null}
# Perform cross-validation
cv_results = client.cross_validation(
    df,
    h=7,
    n_windows=3,  # Number of validation windows
    step_size=7   # Steps between the start of consecutive windows
)
```

Yes. For more information, explore the [Prediction Intervals](/docs/forecasting/probabilistic/prediction_intervals) and [Quantile Forecasts](/docs/forecasting/probabilistic/quantiles) tutorials.
```python Prediction Intervals Example theme={null} # Generate prediction intervals forecast_with_intervals = client.forecast( df, h=7, level=[80, 90, 95] # Confidence levels ) ``` Yes, TimeGPT works with distributed computing frameworks for large datasets. For large datasets with hundreds of thousands or millions of time series, we recommend using a distributed backend. TimeGPT works with several [distributed computing frameworks](/docs/forecasting/forecasting-at-scale/computing_at_scale), including [Spark](/docs/forecasting/forecasting-at-scale/spark), [Ray](/docs/forecasting/forecasting-at-scale/ray), and [Dask](/docs/forecasting/forecasting-at-scale/dask). ```python Using Dask Example theme={null} import dask.dataframe as dd # Convert to Dask DataFrame dask_df = dd.from_pandas(df, npartitions=4) # Forecast using Dask backend forecast = client.forecast(dask_df, h=7) ``` TimeGPT supports any amount of data for generating point forecasts and can produce results with just one observation per series. When using arguments such as `level`, `finetune_steps`, `X_df` (exogenous variables), or `add_history`, additional data points are necessary depending on data frequency. For more details, refer to the [Data Requirements](/docs/data_requirements/data_requirements) tutorial. While TimeGPT can work with minimal data, more historical data typically produces better forecasts. TimeGPT cannot handle missing values or series with irregular timestamps. For more information, see the [Forecasting Time Series with Irregular Timestamps](/docs/forecasting/special-topics/irregular_timestamps) and [Dealing with Missing Values](/docs/data_requirements/missing_values) tutorials. The `NixtlaClient` class has a [`plot` method](/docs/reference/sdk_reference#nixtlaclient-plot) for visualizing forecasts. This method works only in interactive environments such as Jupyter notebooks, not in Python scripts. 
```python Plotting Forecast Example theme={null}
# Plot historical data together with the forecast
client.plot(
    df=df,
    forecasts_df=forecast,
    level=[80, 95]  # Optional: show prediction intervals (the forecast must include these levels)
)
```

Currently, TimeGPT does not support polars.

Yes. TimeGPT is engineered for stability: given the same input data, the model produces the same forecasts.

Although simple patterns such as straight lines are not TimeGPT's primary use case, it can produce solid results on them. Zero-shot predictions might not always meet expectations, but fine-tuning allows TimeGPT to quickly grasp trends and produce accurate forecasts. For more details, refer to the [Improve Forecast Accuracy with TimeGPT](/docs/forecasting/improve_accuracy) tutorial.

Fine-tuning improves TimeGPT's performance for your specific data patterns.

TimeGPT was trained on the largest publicly available time series dataset, covering domains including finance, retail, healthcare, and more. This comprehensive training enables TimeGPT to produce accurate forecasts for new time series without additional training (zero-shot learning).

While the zero-shot model provides a solid baseline, TimeGPT performance often improves through fine-tuning. During this process, the TimeGPT model undergoes additional training using your specific dataset, starting from the pre-trained parameters.

```python Fine-tuning Example theme={null}
# Fine-tune with 100 steps
forecast = client.forecast(
    df,
    h=7,
    finetune_steps=100,
    finetune_loss="mse"  # Mean Squared Error
)
```

For a comprehensive guide on fine-tuning, refer to the [fine-tuning](/docs/forecasting/fine-tuning/steps) and [fine-tuning with a specific loss function](/docs/forecasting/fine-tuning/custom_loss) tutorials.

No, you do not need to fine-tune every series individually. When using the `finetune_steps` parameter, the model fine-tunes across all series in your dataset simultaneously.
This cross-learning approach allows the model to learn from multiple series at once, which can improve individual forecasts.

Selecting the right number of fine-tuning steps may require experimentation. As fine-tuning steps increase, the model becomes more specialized to your dataset but takes longer to train and may become more prone to overfitting.

Yes. You can fine-tune the TimeGPT model, save it, and reuse it later. For detailed instructions, see our guide on [Re-using Fine-tuned Models](/docs/forecasting/fine-tuning/save_reuse_delete_finetuned_models).

```python Save Fine-tuned Model theme={null}
# Fine-tune once; the resulting model is saved and identified by an ID
finetuned_model_id = client.finetune(
    df,
    finetune_steps=100,
    output_model_id="my-finetuned-model"  # Optional: choose your own ID
)
```

```python Load Fine-tuned Model theme={null}
# Reuse the saved model by passing its ID
forecast = client.forecast(
    new_df,
    h=7,
    finetuned_model_id=finetuned_model_id
)
```

Need more help? Contact our [support team](mailto:support@nixtla.io).

# Introduction

Source: https://nixtla.io/docs/introduction/introduction

Welcome to TimeGPT - The foundational model for time series forecasting and anomaly detection

## Power your time series analysis with TimeGPT

TimeGPT is the first foundation model for time series, providing state-of-the-art forecasting and anomaly detection capabilities to help you make better decisions with your time series data.

* Get started with TimeGPT in minutes with our simple Python interface
* Set up your environment to start using TimeGPT right away

## Core Capabilities

Explore the powerful features that TimeGPT offers for your time series needs.
Generate accurate predictions for your time series data Identify unusual patterns in historical data Detect anomalies as they happen with online detection ## Learn & Explore Enhance your skills with our comprehensive tutorials and use cases. Practical guides to get the most out of TimeGPT See how TimeGPT solves real business problems Learn how to use TimeGPT with big data frameworks Take your models further with fine-tuning and specialized techniques ## Resources Find additional resources to help you succeed with TimeGPT. Detailed SDK documentation for developers Get answers to commonly asked questions # TimeGPT Subscription Plans Source: https://nixtla.io/docs/introduction/timegpt_subscription_plans Overview of TimeGPT's Enterprise subscription plans with deployment options, support, and trial details. ## Overview TimeGPT provides multiple Enterprise subscription plans that can be tailored to meet your specific forecasting requirements. This includes customization of API call limits, user seats, and varying levels of support. * Scalable API calls to match your organization’s growth * Flexible user access management * High-level support options (email, chat, phone, or dedicated support) We offer three main options to use TimeGPT: The easiest option, you don't need to worry about any infrastructure just make calls directly to TimeGPT using any of our SDKs or plugins. Host TimeGPT on Azure, managed by Nixtla.\ • Quick setup with minimal maintenance requirements.\ • Automatic updates and patches.\ • Ideal for teams wanting a fully managed solution. Host TimeGPT in your own infrastructure.\ • Greater control over data and security.\ • Customizable configurations for specific compliance needs.\ • Ideal for organizations requiring on-premise or private cloud solutions. 
## Get in Touch If you'd like to explore custom plan options—for instance, adjusting the number of API calls, user limits, or support level—reach out to us at [support@nixtla.io](mailto:support@nixtla.io).\ You can also schedule a demo through this [link](https://meetings.hubspot.com/cristian-challu/enterprise-contact-us?uuid=dc037f5a-d93b-4%5B…%5D90b-a611dd9460af\&utm_source=github\&utm_medium=pricing_page) to see TimeGPT in action and discuss your needs in more detail. **Free Trial Available!**\\ When you [**create your account**](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/timegpt_subscription_plans), you receive a 30-day free trial with no credit card required. Your access expires after 30 days unless you upgrade to a paid plan. If you need more time to evaluate or want to continue using TimeGPT, please **contact us** for flexible plan options. Visit [our dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/timegpt_subscription_plans) and sign up for a new account. Log in and explore the TimeGPT interface to set up forecasting tasks, configure APIs, and invite team members. Before the trial ends, decide if you’d like to upgrade to an Enterprise plan or contact us for a custom plan. **Pricing and Billing Information**\\ Additional pricing details and frequently asked questions can be found on our [FAQ page](introduction/faq). Ready to see TimeGPT in action? Schedule a personalized demo to learn how TimeGPT can enhance your forecasting capabilities. Have questions, need custom requirements, or want more info on our plans and deployments? Contact us any time. # Why TimeGPT? Source: https://nixtla.io/docs/introduction/why_timegpt Understand the benefits of using TimeGPT for time series analysis. ## Why TimeGPT? TimeGPT is a powerful, general-purpose time series forecasting solution. 
Throughout this notebook, we compare TimeGPT's performance against three popular forecasting approaches:

* Classical model (ARIMA)
* Machine learning model (LightGBM)
* Deep learning model (N-HiTS)

Below are three core benefits that our users value the most:

* TimeGPT consistently outperforms traditional models by accurately capturing complex patterns.
* Quickly generates forecasts with minimal training and tuning requirements per series.
* Minimal setup and no complex preprocessing make TimeGPT immediately accessible for use.

## TimeGPT Advantage

TimeGPT delivers **superior results with minimal effort** compared to traditional approaches. In head-to-head testing against ARIMA, LightGBM, and N-HiTS models on M5 competition data, TimeGPT achieves the best accuracy metrics (**lowest RMSE at 592.6** and **SMAPE at 4.94%**).

Unlike other models, which require:

* *Extensive preprocessing*
* *Parameter tuning*
* *Significant computational resources*

TimeGPT provides **powerful forecasting capabilities** with a simple API interface, making advanced time series analysis **accessible to users of all technical backgrounds**.

This notebook uses an aggregated subset from the M5 Forecasting Accuracy competition.
The dataset:

* Consists of **7 daily time series**
* Has **1,941 observations** per series
* Reserves the last **28 observations** for evaluation on unseen data

```python Data Loading and Stats Preview theme={null}
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from nixtla import NixtlaClient
from utilsforecast.plotting import plot_series
from utilsforecast.losses import mae, rmse, smape
from utilsforecast.evaluation import evaluate

nixtla_client = NixtlaClient(
    # api_key='my_api_key_provided_by_nixtla'
)

df = pd.read_csv(
    'https://datasets-nixtla.s3.amazonaws.com/demand_example.csv',
    parse_dates=['ds']
)

# Display aggregated statistics per time series
df.groupby('unique_id').agg({
    "ds": ["min", "max", "count"],
    "y": ["min", "mean", "median", "max"]
})
```

Below is a preview of the aggregated statistics for each of the 7 time series.

| unique\_id | min date   | max date   | count | min y | mean y   | median y | max y  |
| ---------- | ---------- | ---------- | ----- | ----- | -------- | -------- | ------ |
| FOODS\_1   | 2011-01-29 | 2016-05-22 | 1941  | 0.0   | 2674.086 | 2665.0   | 5493.0 |
| FOODS\_2   | 2011-01-29 | 2016-05-22 | 1941  | 0.0   | 4015.984 | 3894.0   | 9069.0 |
| ...        | ...        | ...        | ...   | ...   | ...      | ...      | ...    |

Next, we split our dataset into training and test sets. Here, we use data up to "2016-04-24" for training and the remaining data for testing.

```python Train-Test Split Example theme={null}
df_train = df.query('ds <= "2016-04-24"')
df_test = df.query('ds > "2016-04-24"')

print(df_train.shape, df_test.shape)
# (13391, 3) (196, 3)
```

We compare four modeling approaches: TimeGPT, ARIMA, LightGBM, and N-HiTS. Each approach forecasts the final 28 days of our dataset, and we compare results using Root Mean Squared Error (RMSE) and Symmetric Mean Absolute Percentage Error (SMAPE).

TimeGPT offers a streamlined solution for time series forecasting with minimal setup.
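For readers unfamiliar with the two metrics named above, the following is a minimal NumPy sketch of how they can be computed. The function names `rmse_score` and `smape_score` are hypothetical (chosen so they do not shadow the `utilsforecast` imports), and the SMAPE variant shown, scaled between 0 and 1, is an assumption based on the reported value 0.049403 being quoted as 4.94%.

```python
import numpy as np


def rmse_score(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared difference."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def smape_score(y_true, y_pred):
    """SMAPE on a 0-1 scale: mean of |y - y_hat| / (|y| + |y_hat|).

    Assumption: points where both values are zero contribute 0.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    scale = np.abs(y_true) + np.abs(y_pred)
    ratio = np.divide(
        np.abs(y_true - y_pred),
        scale,
        out=np.zeros_like(scale),  # value used where scale == 0
        where=scale != 0,
    )
    return float(np.mean(ratio))


# Toy example: only the last point is mispredicted
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(rmse_score(actual, predicted))
print(smape_score(actual, predicted))
```

Because RMSE squares the errors, it penalizes large misses more heavily, while SMAPE is scale-free and therefore comparable across series with different magnitudes.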
```python TimeGPT Forecasting with NixtlaClient theme={null}
fcst_timegpt = nixtla_client.forecast(
    df=df_train,
    target_col='y',
    h=28,
    model='timegpt-1-long-horizon',
    finetune_steps=10,
    level=[90]
)

# Join the forecasts with the held-out observations and score them
test_df = df_test.merge(fcst_timegpt, how='left', on=['unique_id', 'ds'])
evaluation_timegpt = evaluate(test_df, metrics=[rmse, smape], models=['TimeGPT'])

evaluation_timegpt.groupby(['metric'])['TimeGPT'].mean()
# metric
# rmse     592.607378
# smape      0.049403
# Name: TimeGPT, dtype: float64
```

ARIMA is a common baseline for time series, though it often requires more data preprocessing and does not handle multiple series as efficiently.

```python ARIMA Forecasting Using StatsForecast theme={null}
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA

sf = StatsForecast(models=[AutoARIMA()], freq='D')
fcst_arima = sf.forecast(h=28, df=df_train)
# Evaluation methods omitted here for brevity
```

LightGBM is a popular gradient-boosted tree approach. However, careful feature engineering is typically required for optimal results.

```python LightGBM Modeling with AutoMLForecast theme={null}
import optuna
from mlforecast.auto import AutoMLForecast, AutoLightGBM

mlf = AutoMLForecast(models=[AutoLightGBM()], freq='D')
# n_windows and num_samples are illustrative values for the tuning search
mlf.fit(df_train, n_windows=2, h=28, num_samples=2)
fcst_lgbm = mlf.predict(28)
# Evaluation methods omitted here for brevity
```

N-HiTS is a deep learning architecture for time series. While powerful, it often requires GPU resources and more hyperparameter tuning.

```python N-HiTS Deep Learning Forecast theme={null}
from neuralforecast.core import NeuralForecast
from neuralforecast.models import NHITS

# NHITS requires a horizon and an input window; input_size=56 is an illustrative choice
nf = NeuralForecast(models=[NHITS(h=28, input_size=56)], freq='D')
nf.fit(df=df_train)
fcst_nhits = nf.predict()
# Evaluation methods omitted here for brevity
```

Below is a summary of the performance metrics (RMSE and SMAPE) on the test dataset.
TimeGPT achieves the lowest RMSE and SMAPE of the four models:

| Model    | RMSE  | SMAPE |
| -------- | ----- | ----- |
| ARIMA    | 724.9 | 5.50% |
| LightGBM | 687.8 | 5.14% |
| N-HiTS   | 605.0 | 5.34% |
| TimeGPT  | 592.6 | 4.94% |

![Performance Chart](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/getting-started/7_why_timegpt_files/figure-markdown_strict/cell-27-output-1.png)

![Benchmarking Results](https://github.com/Nixtla/nixtla/blob/main/nbs/img/timeseries_model_arena.png?raw=true)

TimeGPT stands out with its accuracy, speed, and ease of use. Get started today by visiting the [Nixtla dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/why_timegpt) to generate your `api_key` and access advanced forecasting with minimal overhead.

# Date Features

Source: https://nixtla.io/docs/reference/date_features

Use holiday flags and special dates to improve your accuracy

Date features are an essential part of time series analysis. This document introduces helpful classes (CountryHolidays and SpecialDates) for generating holiday flags and custom date markers, and for adding them to TimeGPT.

## Overview

* Easily attach holiday flags for multiple countries based on a list of countries.
* Add flags for custom events or significant dates you define.

These classes help you enrich your time series datasets with relevant date-based signals. Use them alongside standard data preprocessing techniques to enhance your model's understanding of seasonality and special events.

#### CountryHolidays

> ```text theme={null}
> CountryHolidays (countries:list[str])
> ```

*Given a list of countries, returns a dataframe with holidays for each country.*

```python theme={null}
import pandas as pd

from nixtla.date_features import CountryHolidays, SpecialDates
```

```python theme={null}
c_holidays = CountryHolidays(countries=['US', 'MX'])
periods = 365 * 5
dates = pd.date_range(end='2023-09-01', periods=periods)
holidays_df = c_holidays(dates)
holidays_df.head()
```

|            | US\_New Year's Day | US\_Memorial Day | US\_Independence Day | US\_Labor Day | US\_Veterans Day | US\_Veterans Day (observed) | US\_Thanksgiving | US\_Christmas Day | US\_Martin Luther King Jr. Day | US\_Washington's Birthday | ... | US\_Juneteenth National Independence Day (observed) | US\_Christmas Day (observed) | MX\_Año Nuevo | MX\_Día de la Constitución | MX\_Natalicio de Benito Juárez | MX\_Día del Trabajo | MX\_Día de la Independencia | MX\_Día de la Revolución | MX\_Transmisión del Poder Ejecutivo Federal | MX\_Navidad |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2018-09-03 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2018-09-04 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2018-09-05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2018-09-06 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2018-09-07 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

***

#### SpecialDates

> ```text theme={null}
> SpecialDates (special_dates:dict[str,list[str]])
> ```

*Given a dictionary of categories and dates, returns a dataframe with the special dates.*

```python theme={null}
special_dates = SpecialDates(
    special_dates={
        'Important Dates': ['2021-02-26', '2020-02-26'],
        'Very Important Dates': ['2021-01-26', '2020-01-26', '2019-01-26']
    }
)
periods = 365 * 5
dates = pd.date_range(end='2023-09-01', periods=periods)
holidays_df = special_dates(dates)
holidays_df.head()
```

|            | Important Dates | Very Important Dates |
| ---------- | --------------- | -------------------- |
| 2018-09-03 | 0               | 0                    |
| 2018-09-04 | 0               | 0                    |
| 2018-09-05 | 0               | 0                    |
| 2018-09-06 | 0               | 0                    |
| 2018-09-07 | 0               | 0                    |

# SDK Reference

Source: https://nixtla.io/docs/reference/sdk_reference

***

## NixtlaClient

> ```text theme={null}
> NixtlaClient (api_key:Optional[str]=None, base_url:Optional[str]=None,
>               timeout:Optional[int]=60, max_retries:int=6,
>               retry_interval:int=10, max_wait_time:int=360)
> ```

*Client to interact with the Nixtla API.*

| | **Type** | **Default** | **Details** |
| --- | --- | --- | --- |
| api\_key | Optional | None | The
authorization API key used to interact with the Nixtla API.
If not provided, will use the NIXTLA\_API\_KEY environment variable. | | base\_url | Optional | None | Custom base\_url.
If not provided, will use the NIXTLA\_BASE\_URL environment variable. | | timeout | Optional | 60 | Request timeout in seconds. Set this to `None` to disable it. | | max\_retries | int | 6 | The maximum number of attempts to make when calling the API before giving up.
It defines how many times the client will retry the API call if it fails.
Default value is 6, indicating the client will attempt the API call up to 6 times in total. | | retry\_interval | int | 10 | The interval in seconds between consecutive retry attempts.
This is the waiting period before the client tries to call the API again after a failed attempt.
Default value is 10 seconds, meaning the client waits for 10 seconds between retries. | | max\_wait\_time | int | 360 | The maximum total time in seconds that the client will spend on all retry attempts before giving up.
This sets an upper limit on the cumulative waiting time for all retry attempts.
If this time is exceeded, the client will stop retrying and raise an exception.
Default value is 360 seconds, meaning the client will cease retrying if the total time
spent on retries exceeds 360 seconds.
The client throws a ReadTimeout error after 60 seconds of inactivity. If you want to
catch these errors, use max\_wait\_time >> 60. | *** source ## NixtlaClient.validate\_api\_key > ```text theme={null} > NixtlaClient.validate_api_key (log:bool=True) > ``` *Check API key status.* | | **Type** | **Default** | **Details** | | ----------- | -------- | ----------- | ----------------------------- | | log | bool | True | Show the endpoint’s response. | | **Returns** | **bool** | | **Whether API key is valid.** | *** source ## NixtlaClient.forecast > ```text theme={null} > NixtlaClient.forecast (df:~AnyDFType, h:typing.Annotated[int,Gt(gt=0)], > freq:Union[str,int,pandas._libs.tslibs.offsets.Bas > eOffset,NoneType]=None, id_col:str='unique_id', > time_col:str='ds', target_col:str='y', > X_df:Optional[~AnyDFType]=None, > level:Optional[list[Union[int,float]]]=None, > quantiles:Optional[list[float]]=None, > finetune_steps:typing.Annotated[int,Ge(ge=0)]=0, > finetune_depth:Literal[1,2,3,4,5]=1, finetune_loss > :Literal['default','mae','mse','rmse','mape','smap > e']='default', > finetuned_model_id:Optional[str]=None, > clean_ex_first:bool=True, > hist_exog_list:Optional[list[str]]=None, > validate_api_key:bool=False, > add_history:bool=False, date_features:Union[bool,l > ist[Union[str,Callable]]]=False, date_features_to_ > one_hot:Union[bool,list[str]]=False, model:Literal > ['azureai','timegpt-1','timegpt-1-long- > horizon']='timegpt-1', num_partitions:Optional[Ann > otated[int,Gt(gt=0)]]=None, > feature_contributions:bool=False) > ``` *Forecast your time series using TimeGPT.* | | **Type** | **Default** | **Details** | | ---------------------------- | ------------- | ----------- | 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | df | AnyDFType | | The DataFrame on which the function will operate. Expected to contain at least the following columns:
- time\_col:
Column name in `df` that contains the time indices of the time series. This is typically a datetime
column with regular intervals, e.g., hourly, daily, monthly data points.
- target\_col:
Column name in `df` that contains the target variable of the time series, i.e., the variable we
wish to predict or analyze.
Additionally, you can pass multiple time series (stacked in the dataframe) considering an additional column:
- id\_col:
Column name in `df` that identifies unique time series. Each unique value in this column
corresponds to a unique time series. | | h | Annotated | | Forecast horizon. | | freq | Union | None | Frequency of the timestamps. If `None`, it will be inferred automatically.
See [pandas’ available frequencies](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). | | id\_col | str | unique\_id | Column that identifies each series. | | time\_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. | | target\_col | str | y | Column that contains the target. | | X\_df | Optional | None | DataFrame with \[`unique_id`, `ds`] columns and `df`’s future exogenous. | | level | Optional | None | Confidence levels between 0 and 100 for prediction intervals. | | quantiles | Optional | None | Quantiles to forecast, list between (0, 1).
`level` and `quantiles` should not be used simultaneously.
The output dataframe will have the quantile columns
formatted as TimeGPT-q-(100 \* q) for each q.
100 \* q represents percentiles but we choose this notation
to avoid having dots in column names. | | finetune\_steps | Annotated | 0 | Number of steps used to fine-tune TimeGPT on the
new data. | | finetune\_depth | Literal | 1 | The depth of the finetuning. Uses a scale from 1 to 5, where 1 means little finetuning,
and 5 means that the entire model is finetuned. | | finetune\_loss | Literal | default | Loss function to use for finetuning. Options are: `default`, `mae`, `mse`, `rmse`, `mape`, and `smape`. | | finetuned\_model\_id | Optional | None | ID of previously fine-tuned model to use. | | clean\_ex\_first | bool | True | Clean exogenous signal before making forecasts using TimeGPT. | | hist\_exog\_list | Optional | None | Column names of the historical exogenous features. | | validate\_api\_key | bool | False | If True, validates api\_key before sending requests. | | add\_history | bool | False | Return fitted values of the model. | | date\_features | Union | False | Features computed from the dates.
Can be pandas date attributes or functions that will take the dates as input.
If True automatically adds most used date features for the
frequency of `df`. | | date\_features\_to\_one\_hot | Union | False | Apply one-hot encoding to these date features.
If `date_features=True`, then all date features are
one-hot encoded by default. | | model | Literal | timegpt-1 | Model to use as a string. Options are: `timegpt-1`, and `timegpt-1-long-horizon`.
We recommend using `timegpt-1-long-horizon` for forecasting
if you want to predict more than one seasonal
period given the frequency of your data. | | num\_partitions | Optional | None | Number of partitions to use.
If None, the number of partitions will be equal
to the available parallel resources in distributed environments. | | feature\_contributions | bool | False | | | **Returns** | **AnyDFType** | | **DataFrame with TimeGPT forecasts for point predictions and probabilistic
predictions (if level is not None).** | *** source ## NixtlaClient.cross\_validation > ```text theme={null} > NixtlaClient.cross_validation (df:~AnyDFType, > h:typing.Annotated[int,Gt(gt=0)], freq:Uni > on[str,int,pandas._libs.tslibs.offsets.Bas > eOffset,NoneType]=None, > id_col:str='unique_id', time_col:str='ds', > target_col:str='y', level:Optional[list[Un > ion[int,float]]]=None, > quantiles:Optional[list[float]]=None, > validate_api_key:bool=False, n_windows:typ > ing.Annotated[int,Gt(gt=0)]=1, step_size:O > ptional[Annotated[int,Gt(gt=0)]]=None, fin > etune_steps:typing.Annotated[int,Ge(ge=0)] > =0, finetune_depth:Literal[1,2,3,4,5]=1, f > inetune_loss:Literal['default','mae','mse' > ,'rmse','mape','smape']='default', > finetuned_model_id:Optional[str]=None, > refit:bool=True, clean_ex_first:bool=True, > hist_exog_list:Optional[list[str]]=None, > date_features:Union[bool,list[str]]=False, > date_features_to_one_hot:Union[bool,list[s > tr]]=False, model:Literal['azureai','timeg > pt-1','timegpt-1-long- > horizon']='timegpt-1', num_partitions:Opti > onal[Annotated[int,Gt(gt=0)]]=None) > ``` *Perform cross validation in your time series using TimeGPT.* | | **Type** | **Default** | **Details** | | ---------------------------- | ------------- | ----------- | 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | df | AnyDFType | | The DataFrame on which the function will operate. Expected to contain at least the following columns:
- time\_col:
Column name in `df` that contains the time indices of the time series. This is typically a datetime
column with regular intervals, e.g., hourly, daily, monthly data points.
- target\_col:
Column name in `df` that contains the target variable of the time series, i.e., the variable we
wish to predict or analyze.
Additionally, you can pass multiple time series (stacked in the dataframe) considering an additional column:
- id\_col:
Column name in `df` that identifies unique time series. Each unique value in this column
corresponds to a unique time series. | | h | Annotated | | Forecast horizon. | | freq | Union | None | Frequency of the timestamps. If `None`, it will be inferred automatically.
See [pandas’ available frequencies](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). | | id\_col | str | unique\_id | Column that identifies each series. | | time\_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. | | target\_col | str | y | Column that contains the target. | | level | Optional | None | Confidence level between 0 and 100 for prediction intervals. | | quantiles | Optional | None | Quantiles to forecast, list between (0, 1).
`level` and `quantiles` should not be used simultaneously.
The output dataframe will have the quantile columns
formatted as TimeGPT-q-(100 \* q) for each q.
100 \* q represents percentiles but we choose this notation
to avoid having dots in column names. | | validate\_api\_key | bool | False | If True, validates api\_key before sending requests. | | n\_windows | Annotated | 1 | Number of windows to evaluate. | | step\_size | Optional | None | Step size between each cross validation window. If None it will be equal to `h`. | | finetune\_steps | Annotated | 0 | Number of steps used to finetune TimeGPT in the
new data. | | finetune\_depth | Literal | 1 | The depth of the finetuning. Uses a scale from 1 to 5, where 1 means little finetuning,
and 5 means that the entire model is finetuned. | | finetune\_loss | Literal | default | Loss function to use for finetuning. Options are: `default`, `mae`, `mse`, `rmse`, `mape`, and `smape`. | | finetuned\_model\_id | Optional | None | ID of previously fine-tuned model to use. | | refit | bool | True | Fine-tune the model in each window. If `False`, only fine-tunes on the first window.
Only used if `finetune_steps` > 0. | | clean\_ex\_first | bool | True | Clean exogenous signal before making forecasts using TimeGPT. | | hist\_exog\_list | Optional | None | Column names of the historical exogenous features. | | date\_features | Union | False | Features computed from the dates.
Can be pandas date attributes or functions that will take the dates as input.
If True automatically adds most used date features for the
frequency of `df`. | | date\_features\_to\_one\_hot | Union | False | Apply one-hot encoding to these date features.
If `date_features=True`, then all date features are
one-hot encoded by default. | | model | Literal | timegpt-1 | Model to use as a string. Options are: `timegpt-1`, and `timegpt-1-long-horizon`.
We recommend using `timegpt-1-long-horizon` for forecasting
if you want to predict more than one seasonal
period given the frequency of your data. | | num\_partitions | Optional | None | Number of partitions to use.
If None, the number of partitions will be equal
to the available parallel resources in distributed environments. | | **Returns** | **AnyDFType** | | **DataFrame with cross validation forecasts.** | *** source ## NixtlaClient.detect\_anomalies > ```text theme={null} > NixtlaClient.detect_anomalies (df:~AnyDFType, > freq:Union[str,int,pandas._libs.tslibs.off > sets.BaseOffset,NoneType]=None, > id_col:str='unique_id', time_col:str='ds', > target_col:str='y', > level:Union[int,float]=99, > finetuned_model_id:Optional[str]=None, > clean_ex_first:bool=True, > validate_api_key:bool=False, > date_features:Union[bool,list[str]]=False, > date_features_to_one_hot:Union[bool,list[s > tr]]=False, model:Literal['azureai','timeg > pt-1','timegpt-1-long- > horizon']='timegpt-1', num_partitions:Opti > onal[Annotated[int,Gt(gt=0)]]=None) > ``` *Detect anomalies in your time series using TimeGPT.* | | **Type** | **Default** | **Details** | | ---------------------------- | ------------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | df | AnyDFType | | The DataFrame on which the function will operate. Expected to contain at least the following columns:
- time\_col:
Column name in `df` that contains the time indices of the time series. This is typically a datetime
column with regular intervals, e.g., hourly, daily, monthly data points.
- target\_col:
Column name in `df` that contains the target variable of the time series, i.e., the variable we
wish to predict or analyze.
Additionally, you can pass multiple time series (stacked in the dataframe) considering an additional column:
- id\_col:
Column name in `df` that identifies unique time series. Each unique value in this column
corresponds to a unique time series. | | freq | Union | None | Frequency of the timestamps. If `None`, it will be inferred automatically.
See [pandas’ available frequencies](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). | | id\_col | str | unique\_id | Column that identifies each series. | | time\_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. | | target\_col | str | y | Column that contains the target. | | level | Union | 99 | Confidence level between 0 and 100 for detecting the anomalies. | | finetuned\_model\_id | Optional | None | ID of previously fine-tuned model to use. | | clean\_ex\_first | bool | True | Clean exogenous signal before making forecasts
using TimeGPT. | | validate\_api\_key | bool | False | If True, validates api\_key before sending requests. | | date\_features | Union | False | Features computed from the dates.
Can be pandas date attributes or functions that will take the dates as input.
If True automatically adds most used date features for the
frequency of `df`. | | date\_features\_to\_one\_hot | Union | False | Apply one-hot encoding to these date features.
If `date_features=True`, then all date features are
one-hot encoded by default. | | model | Literal | timegpt-1 | Model to use as a string. Options are: `timegpt-1`, and `timegpt-1-long-horizon`.
We recommend using `timegpt-1-long-horizon` for forecasting
if you want to predict more than one seasonal
period given the frequency of your data. | | num\_partitions | Optional | None | Number of partitions to use.
If None, the number of partitions will be equal
to the available parallel resources in distributed environments. | | **Returns** | **AnyDFType** | | **DataFrame with anomalies flagged by TimeGPT.** | *** source ## NixtlaClient.usage > ```text theme={null} > NixtlaClient.usage () > ``` *Query consumed requests and limits* *** source ## NixtlaClient.finetune > ```text theme={null} > NixtlaClient.finetune > (df:Union[pandas.core.frame.DataFrame,polars.dataf > rame.frame.DataFrame], freq:Union[str,int,pandas._ > libs.tslibs.offsets.BaseOffset,NoneType]=None, > id_col:str='unique_id', time_col:str='ds', > target_col:str='y', > finetune_steps:typing.Annotated[int,Ge(ge=0)]=10, > finetune_depth:Literal[1,2,3,4,5]=1, finetune_loss > :Literal['default','mae','mse','rmse','mape','smap > e']='default', output_model_id:Optional[str]=None, > finetuned_model_id:Optional[str]=None, model:Liter > al['azureai','timegpt-1','timegpt-1-long- > horizon']='timegpt-1') > ``` *Fine-tune TimeGPT to your series.* | | **Type** | **Default** | **Details** | | -------------------- | --------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | df | Union | | The DataFrame on which the function will operate. Expected to contain at least the following columns:
- time\_col:
Column name in `df` that contains the time indices of the time series. This is typically a datetime
column with regular intervals, e.g., hourly, daily, monthly data points.
- target\_col:
Column name in `df` that contains the target variable of the time series, i.e., the variable we
wish to predict or analyze.
Additionally, you can pass multiple time series (stacked in the dataframe) considering an additional column:
- id\_col:
Column name in `df` that identifies unique time series. Each unique value in this column
corresponds to a unique time series. | | freq | Union | None | Frequency of the timestamps. If `None`, it will be inferred automatically.
See [pandas’ available frequencies](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). | | id\_col | str | unique\_id | Column that identifies each series. | | time\_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. | | target\_col | str | y | Column that contains the target. | | finetune\_steps | Annotated | 10 | Number of steps used to fine-tune TimeGPT on the new data. | | finetune\_depth | Literal | 1 | The depth of the finetuning. Uses a scale from 1 to 5, where 1 means little finetuning,
and 5 means that the entire model is finetuned. | | finetune\_loss | Literal | default | Loss function to use for finetuning. Options are: `default`, `mae`, `mse`, `rmse`, `mape`, and `smape`. | | output\_model\_id | Optional | None | ID to assign to the fine-tuned model. If `None`, a UUID is used. | | finetuned\_model\_id | Optional | None | ID of previously fine-tuned model to use as base. | | model | Literal | timegpt-1 | Model to use as a string. Options are: `timegpt-1`, and `timegpt-1-long-horizon`.
We recommend using `timegpt-1-long-horizon` for forecasting
if you want to predict more than one seasonal
period given the frequency of your data. | | **Returns** | **str** | | **ID of the fine-tuned model** | *** source ## NixtlaClient.finetuned\_models > ```text theme={null} > NixtlaClient.finetuned_models (as_df:bool=False) > ``` *List fine-tuned models* | | **Type** | **Default** | **Details** | | ----------- | --------- | ----------- | -------------------------------------------------- | | as\_df | bool | False | Return the fine-tuned models as a pandas dataframe | | **Returns** | **Union** | | **List of available fine-tuned models.** | *** source ## NixtlaClient.finetuned\_model > ```text theme={null} > NixtlaClient.finetuned_model (finetuned_model_id:str) > ``` *Get fine-tuned model metadata* | | **Type** | **Details** | | -------------------- | ------------------ | ------------------------------------------------ | | finetuned\_model\_id | str | ID of the fine-tuned model to get metadata from. | | **Returns** | **FinetunedModel** | **Fine-tuned model metadata.** | *** source ## NixtlaClient.delete\_finetuned\_model > ```text theme={null} > NixtlaClient.delete_finetuned_model (finetuned_model_id:str) > ``` *Delete a previously fine-tuned model* | | **Type** | **Details** | | -------------------- | -------- | ----------------------------------------- | | finetuned\_model\_id | str | ID of the fine-tuned model to be deleted. | | **Returns** | **bool** | **Whether delete was successful.** | *** source ## NixtlaClient.plot > ```text theme={null} > NixtlaClient.plot (df:Union[pandas.core.frame.DataFrame,polars.dataframe. 
> frame.DataFrame,NoneType]=None, forecasts_df:Union[pan > das.core.frame.DataFrame,polars.dataframe.frame.DataFr > ame,NoneType]=None, id_col:str='unique_id', > time_col:str='ds', target_col:str='y', unique_ids:Unio > n[list[str],NoneType,numpy.ndarray]=None, > plot_random:bool=True, max_ids:int=8, > models:Optional[list[str]]=None, > level:Optional[list[Union[int,float]]]=None, > max_insample_length:Optional[int]=None, > plot_anomalies:bool=False, > engine:Literal['matplotlib','plotly','plotly- > resampler']='matplotlib', > resampler_kwargs:Optional[dict]=None, ax:Union[Forward > Ref('plt.Axes'),numpy.ndarray,ForwardRef('plotly.graph > _objects.Figure'),NoneType]=None) > ``` *Plot forecasts and insample values.* | | **Type** | **Default** | **Details** | | --------------------- | -------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | df | Union | None | The DataFrame on which the function will operate. Expected to contain at least the following columns:
- time\_col:
Column name in `df` that contains the time indices of the time series. This is typically a datetime
column with regular intervals, e.g., hourly, daily, monthly data points.
- target\_col:
Column name in `df` that contains the target variable of the time series, i.e., the variable we
wish to predict or analyze.
Additionally, you can pass multiple time series (stacked in the dataframe) considering an additional column:
- id\_col:
Column name in `df` that identifies unique time series. Each unique value in this column
corresponds to a unique time series. | | forecasts\_df | Union | None | DataFrame with columns \[`unique_id`, `ds`] and models. | | id\_col | str | unique\_id | Column that identifies each series. | | time\_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. | | target\_col | str | y | Column that contains the target. | | unique\_ids | Union | None | Time Series to plot.
If None, time series are selected randomly. | | plot\_random | bool | True | Select time series to plot randomly. | | max\_ids | int | 8 | Maximum number of ids to plot. | | models | Optional | None | List of models to plot. | | level | Optional | None | List of prediction intervals to plot if passed. | | max\_insample\_length | Optional | None | Max number of train/insample observations to be plotted. | | plot\_anomalies | bool | False | Plot anomalies for each prediction interval. | | engine | Literal | matplotlib | Library used to plot. ‘matplotlib’, ‘plotly’ or ‘plotly-resampler’. | | resampler\_kwargs | Optional | None | Kwargs to be passed to plotly-resampler constructor.
For further customization (“show\_dash”) call the method,
store the plotting object and add the extra arguments to
its `show_dash` method. | | ax | Union | None | Object where plots will be added. |

# TimeGPT Excel Add-in (Beta)

Source: https://nixtla.io/docs/reference/timegpt_excel_add_in_beta_

Use TimeGPT from Microsoft Excel

## Installation

Head to the [TimeGPT Excel add-in page in Microsoft AppSource](https://appsource.microsoft.com/en-us/product/office/WA200006429?tab=Overview) and click “Get it now”.

## Usage

The TimeGPT Excel Add-in requires an access token. Get your API Key on the [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/reference/timegpt_excel_add_in_beta_).

## Support

If you have questions or need support, please email `support@nixtla.io`.

## How-to

### Settings

If this is your first time using Excel add-ins, find information on how to add Excel add-ins with your version of Excel. In the Office Add-ins Store, search for “TimeGPT”.

Once you have installed the TimeGPT add-in, it opens in a sidebar task pane.

* Read through the Welcome screen.
* Click on the **‘Get Started’** button.
* The API URL is already set to: [https://api.nixtla.io](https://api.nixtla.io).
* Copy your API key from [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/reference/timegpt_excel_add_in_beta_). Paste it into the box that says **API Key, Bearer**.
* Click the gray arrow next to that box on the right.
* You’ll get to a screen with options for ‘Forecast’ and ‘Anomaly Detection’.

To access the settings later, click the gear icon in the top left.

### Data Requirements

* Put your dates in one column and your values in another.
* Ensure your date format is recognized as a valid date by Excel.
* Ensure your values are recognized as valid numbers by Excel.
* All data inputs must exist in the same worksheet. The add-in does not support forecasting using multiple worksheets.
* Do not include headers Example: | dates | values | | :------------ | :----- | | 12/1/16 0:00 | 72 | | 12/1/16 1:00 | 65.8 | | 12/1/16 2:00 | 59.99 | | 12/1/16 3:00 | 50.69 | | 12/1/16 4:00 | 52.58 | | 12/1/16 5:00 | 65.05 | | 12/1/16 6:00 | 80.4 | | 12/1/16 7:00 | 200 | | 12/1/16 8:00 | 200.63 | | 12/1/16 9:00 | 155.47 | | 12/1/16 10:00 | 150.91 | #### Forecasting Once you’ve configured your token and formatted your input data then you’re all ready to forecast! With the add-in open, configure the forecasting settings by selecting the column for each input. * **Frequency** - The frequency of the data (hourly / daily / weekly / monthly) * **Horizon** - The forecasting horizon. This represents the number of time steps into the future that the forecast should predict. * **Dates Range** - The column and range of the timeseries timestamps. Must not include header data, and should be formatted as a range, e.g. A2:A145. * **Values Range** - The column and range of the timeseries values for each point in time. Must not include header data, and should be formatted as a range, e.g. B2:B145. When you’re ready, click **Make Prediction** to generate the predicted values. The add-in will generate a plot and append the forecasted data to the end of the column of your existing data and highlight them in green. So, scroll to the end of your data to see the predicted values. #### Anomaly Detection The requirements are the same as for the forecasting functionality, so if you already tried it you are ready to run the anomaly detection one. Go to the main page in the add-in and select “Anomaly Detection”, then choose your dates and values cell ranges and click on submit. We’ll run the model and mark the anomalies cells in yellow while adding a third column for expected values with a green background. # TimeGPT in R Source: https://nixtla.io/docs/reference/timegpt_in_r Using TimeGPT for time series forecasting in the R programming language
## Introduction **TimeGPT-1**: The first foundation model for time series forecasting and anomaly detection. The `nixtlar` package is the R interface to TimeGPT, allowing you to perform state-of-the-art time series forecasting directly from R. TimeGPT is a production-ready, generative pretrained transformer for time series forecasting, developed by Nixtla. It is capable of accurately predicting various domains such as retail, electricity, finance, and IoT, with just a few lines of code. Additionally, it can detect anomalies in time series data. Version 0.6.2 of nixtlar is now available on CRAN! This version introduces support for TimeGEN-1, TimeGPT optimized for Azure, along with enhanced date support, business-day frequency inference, and various bug fixes. ## How to use To learn how to use `nixtlar`, please refer to the [documentation](https://nixtla.github.io/nixtlar/). To view directly on CRAN, please use this [link](https://cloud.r-project.org/web/packages/nixtlar/index.html). The `nixtlar` package requires an API key. Get yours on the [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/reference/timegpt_in_r). 
## Installation ```r theme={null} # Install nixtlar from CRAN install.packages("nixtlar") # Then load it library(nixtlar) # Set your API key nixtla_set_api_key(api_key = "Your API key here") ``` ## Quick Example ```r theme={null} # Load sample data df <- nixtlar::electricity head(df) # Forecast the next 8 steps ahead nixtla_client_fcst <- nixtla_client_forecast(df, h = 8, level = c(80,95)) # Optionally, plot the results nixtla_client_plot(df, nixtla_client_fcst, max_insample_length = 200) ``` ## Anomaly Detection Example ```r theme={null} # Detect anomalies nixtla_client_anomalies <- nixtlar::nixtla_client_detect_anomalies(df) # Plot with anomalies highlighted nixtlar::nixtla_client_plot(df, nixtla_client_anomalies, plot_anomalies = TRUE) ``` ## Features and Capabilities TimeGPT through the `nixtlar` package provides: * **Zero-shot Inference**: Generate forecasts and detect anomalies with no prior training * **Fine-tuning**: Enhance model performance for your specific datasets * **Add Exogenous Variables**: Incorporate additional variables like special dates or events to improve accuracy * **Multiple Series Forecasting**: Simultaneously forecast multiple time series * **Custom Loss Function**: Tailor the fine-tuning process with specific performance metrics * **Cross Validation**: Implement out-of-the-box validation techniques * **Prediction Intervals**: Quantify uncertainty in your predictions * **Irregular Timestamps**: Handle data with non-uniform intervals ## How to Cite If you find TimeGPT useful for your research, please consider citing: ``` Garza, A., Challu, C., & Mergenthaler-Canseco, M. (2024). TimeGPT-1. arXiv preprint arXiv:2310.03589. Available at https://arxiv.org/abs/2310.03589 ``` ## Support If you have questions or need support, please email `support@nixtla.io`. TimeGPT is closed source. However, this SDK is open source and available under the Apache 2.0 License. 
# TimeGEN-1 Quickstart (Azure)

Source: https://nixtla.io/docs/setup/azureai

Quickstart guide to deploy and use TimeGEN-1 on Azure with the Nixtla Python SDK for time series forecasting.

TimeGEN-1 is TimeGPT optimized for Azure infrastructure. It is a production-ready generative pretrained transformer for time series, capable of accurate forecasts in domains such as retail, electricity, finance, and IoT with minimal code.

Azure-native generative forecasting with TimeGEN-1 for streamlined deployments.

* Demand forecasting
* Electricity load prediction
* Financial time series
* IoT data analysis

1. Visit [ml.azure.com](https://ml.azure.com) and sign in (or create a Microsoft account if needed).
2. Click **Models** in the sidebar.
3. Search for **TimeGEN** in the catalog and select **TimeGEN-1**.
4. Click **Deploy** to create an endpoint.

   ![TimeGEN-1 model catalog deployment option](https://github.com/Nixtla/nixtla/blob/main/nbs/img/azure-deploy.png?raw=true)
5. Click **Endpoint** in the sidebar.
6. Copy the **base URL** and **API Key** shown for your TimeGEN-1 endpoint.

   ![Endpoint URL and API key](https://github.com/Nixtla/nixtla/blob/main/nbs/img/azure-endpoint.png?raw=true)

Install the **nixtla** package using pip:

```shell Install nixtla SDK theme={null}
pip install nixtla
```

Import the Nixtla client into your Python environment:

```python Import NixtlaClient theme={null}
from nixtla import NixtlaClient
```

Then create a client instance using your TimeGEN-1 endpoint credentials:

```python Instantiate NixtlaClient theme={null}
nixtla_client = NixtlaClient(
    base_url="YOUR_BASE_URL",
    api_key="YOUR_API_KEY"
)
```

In this example, we'll use the classic **AirPassengers** dataset to demonstrate forecasting. The dataset contains monthly totals of international airline passengers from 1949 to 1960.
```python Load AirPassengers dataset theme={null} import pandas as pd df = pd.read_csv( 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv' ) df.head() ``` Use the Nixtla client to quickly visualize your data: ```python Visualize time series theme={null} nixtla_client.plot(df, time_col='timestamp', target_col='value') ``` ![AirPassengers time series visualization](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/getting-started/22_azure_quickstart_files/figure-markdown_strict/cell-12-output-1.png) • Ensure the target column has no missing or non-numeric values.\\ • Avoid gaps in date stamps (for the specific frequency) from the initial to final timestamp—missing dates are not automatically imputed.\\ • Datestamps must be in a pandas-readable format. ([See Pandas reference](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html)) See [Data Requirements](/docs/data_requirements/data_requirements) for details. In most notebook environments, figures display automatically. To save a figure locally, run: ```python Save plot figure theme={null} fig = nixtla_client.plot(df, time_col='timestamp', target_col='value') fig.savefig('plot.png', bbox_inches='tight') ``` Use the `forecast` method from the Nixtla client to forecast the next 12 months. • `df`: Pandas DataFrame with time series data\\ • `h`: Forecast horizon (number of steps ahead)\\ • `freq`: Time series frequency ([pandas frequency aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases))\\ • `time_col`: Name of timestamp column\\ • `target_col`: Name of forecast variable ```python Generate 12-month forecast theme={null} timegen_fcst_df = nixtla_client.forecast( df=df, h=12, freq='MS', time_col='timestamp', target_col='value' ) timegen_fcst_df.head() ``` Forecast endpoint call logs will be displayed for validation and preprocessing steps. 
```bash Forecast API call logs theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
```

Example output:

|   | timestamp  | TimeGPT    |
| - | ---------- | ---------- |
| 0 | 1961-01-01 | 437.837921 |
| 1 | 1961-02-01 | 426.062714 |
| 2 | 1961-03-01 | 463.116547 |
| 3 | 1961-04-01 | 478.244507 |
| 4 | 1961-05-01 | 505.646484 |

Visualize the forecast results:

```python Visualize forecast results theme={null}
nixtla_client.plot(df, timegen_fcst_df, time_col='timestamp', target_col='value')
```

![Forecast visualization AirPassengers](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/getting-started/22_azure_quickstart_files/figure-markdown_strict/cell-14-output-1.png)

# Docker Image for TimeGPT

Source: https://nixtla.io/docs/setup/docker

Learn how to access TimeGPT via a Docker image

You can deploy TimeGPT in your own local infrastructure using our provided Docker image. This solution is ideal for enterprise customers who wish to keep their data secure and give access to TimeGPT to everyone in the organization through their own cloud provider or local infrastructure.

Benefits of using the Docker image are:

* Cloud-agnostic installation
* Full control over the server's hardware (CPU only or with GPU), maintenance and uptime
* Data is secure as per your own guidelines

The Docker image is available for enterprise customers. To request access to TimeGPT and deploy it on your own local infrastructure, [book a call with us](https://meetings.hubspot.com/cristian-challu/enterprise-contact-us?uuid=7a3723cd-153e-4901-a81c-f6cee9d6a6a3\&utm_source=documentation\&utm_medium=setup-docs\&utm_campaign=docker-timegpt).
# Python Wheel for TimeGPT Source: https://nixtla.io/docs/setup/python_wheel Learn how to access TimeGPT via a Python wheel Using TimeGPT through API calls might not be the optimal solution for your organization, as it implies sending data to external servers. One way of respecting data security requirements is to use a Python wheel. We can send you a custom Python wheel for your own needs, allowing you to locally install TimeGPT. That way, you can make forecasts and perform anomaly detection locally, without the need of a server. Benefits of using a Python wheel include: * Local installation so there is no need of a dedicated server * Lower latency as there is no data transfer * Data is secure as it never leaves your local machine The Python wheel is available for enterprise clients. To request access via a Python wheel, [book a call with us](https://meetings.hubspot.com/cristian-challu/enterprise-contact-us?uuid=7a3723cd-153e-4901-a81c-f6cee9d6a6a3\&utm_source=documentation\&utm_medium=setup-docs\&utm_campaign=python-wheel-timegpt). Once access is granted, we will send a Python wheel as well as all the necessary instructions to install and use TimeGPT locally. # Setting up your API key Source: https://nixtla.io/docs/setup/setting_up_your_api_key Learn how to securely configure your Nixtla SDK API key using direct code or environment variables. This tutorial explains how to set up your API key when using the Nixtla SDK. It covers both quick and secure methods to configure your API key directly in code or using environment variables. If you haven't done so yet, create an API Key in your [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/setup/setting_up_your_api_key). ![Diagram of the API Key configuration process](https://github.com/Nixtla/nixtla/blob/main/nbs/img/dashboard.png?raw=true) ## Overview Your API key grants access to your Nixtla account and should be treated like a password. 
By securing it, you prevent unauthorized usage and protect your usage credits. Your API key can be generated from your Nixtla Dashboard under the **API Keys** section. Make sure you copy the entire key with no extra spaces. ## How to configure your API key This approach is simple but not secure. Your API key will be stored in your source code, visible to anyone with access to it. **Step 1:** Copy your key from the [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/setup/setting_up_your_api_key). **Step 2:** Paste the key into your Python code, for example: ```python NixtlaClient Initialization with API Key theme={null} from nixtla import NixtlaClient nixtla_client = NixtlaClient(api_key='your_api_key_here') ``` Storing your API key in an environment variable is the recommended approach for security and ease of sharing code without exposing credentials. This method requires setting an environment variable named `NIXTLA_API_KEY`. The Nixtla SDK automatically detects this environment variable without needing to manually pass it into `NixtlaClient`. 
Open your terminal and use the `export` command: ```bash Setting Environment Variable Temporarily on Linux/Mac theme={null} export NIXTLA_API_KEY=your_api_key ``` Open a PowerShell session and set the environment variable: ```powershell Setting Environment Variable Temporarily on Windows PowerShell theme={null} $env:NIXTLA_API_KEY = "your_api_key" ``` After setting the variable, instantiate the `NixtlaClient` without specifying the key: ```python NixtlaClient Initialization Using Environment Variable theme={null} from nixtla import NixtlaClient nixtla_client = NixtlaClient() ``` Create a file named `.env` in the same directory as your Python script with the following content: ```env .env File Content theme={null} NIXTLA_API_KEY=your_api_key ``` Then, in your Python script, load it with the `dotenv` package: ```python Load Environment Variables with dotenv theme={null} from dotenv import load_dotenv load_dotenv() from nixtla import NixtlaClient nixtla_client = NixtlaClient() ``` Be sure not to commit your `.env` file to public repositories. Your API key grants access to your Nixtla account. ## Validate your API key Use the `validate_api_key` method of `NixtlaClient` to confirm that you have correctly configured your API key. This method returns `True` if your API key is valid, or `False` otherwise: ```python Validate API Key Method theme={null} nixtla_client.validate_api_key() ``` You do not need to validate your API key before every request. This method is a convenience function. To fully access **TimeGPT** functionality, ensure you have adequate credits by checking your [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/setup/setting_up_your_api_key). You've now learned how to configure your Nixtla API key through multiple methods, ranging from the simplest copy-and-paste approach to more secure environment variable setups. Remember to keep your API key confidential to prevent unauthorized usage. 
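The environment-variable setup and the validation step described above can be combined into a single startup check. The following is a minimal sketch, not part of the SDK: `read_api_key` and `make_client` are hypothetical helper names introduced here for illustration, and the only SDK calls assumed are `NixtlaClient(api_key=...)` and `validate_api_key()` as shown on this page.

```python
import os


def read_api_key(env=os.environ, var_name="NIXTLA_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for this sketch; it is not provided by the Nixtla SDK.
    """
    # Strip stray whitespace that can sneak in when copying the key.
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set; export it or add it to your .env file")
    return key


def make_client():
    """Build a client from the environment and validate the key once."""
    # Imported lazily so read_api_key can be used without the SDK installed.
    from nixtla import NixtlaClient

    client = NixtlaClient(api_key=read_api_key())
    if not client.validate_api_key():
        raise RuntimeError("NIXTLA_API_KEY was found but the API rejected it")
    return client
```

Calling `make_client()` once at application startup surfaces a missing or invalid key immediately, rather than on the first forecast request.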
# Forecasting Bitcoin Prices Source: https://nixtla.io/docs/use_cases/bitcoin_price_prediction Master Bitcoin price forecasting with TimeGPT. Complete Python tutorial covering cryptocurrency prediction, anomaly detection, uncertainty quantification, and risk management strategies. ## Introduction [Time series forecasting](/docs/forecasting/timegpt_quickstart) is essential in finance for trading, risk management, and strategic planning. However, predicting financial asset prices remains challenging due to market volatility. Whether you believe financial forecasting is possible or your role requires it, [TimeGPT](/docs/introduction/about_timegpt) simplifies the process. This tutorial demonstrates how to use TimeGPT for Bitcoin price prediction and uncertainty quantification for risk management. ### Why Forecast Bitcoin Prices Bitcoin (₿), the first decentralized cryptocurrency, records transactions on a blockchain. Bitcoins are mined by solving cryptographic tasks and can be used for payments, trading, or investment. Bitcoin's volatility and popularity make price forecasting valuable for trading strategies and risk management. ### What You'll Learn * How to load and prepare Bitcoin price data * How to generate [short-term forecasts](/docs/forecasting/timegpt_quickstart) with TimeGPT * How to visualize and interpret forecast results * How to [detect anomalies](/docs/anomaly_detection/real-time/introduction) and add [exogenous variables](/docs/forecasting/exogenous-variables/numeric_features) The procedures in this tutorial apply to many [financial asset forecasting](/docs/use_cases/forecasting_energy_demand) scenarios, not just Bitcoin. 
## How to Forecast Bitcoin Prices with TimeGPT [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/use-cases/2_bitcoin_price_prediction.ipynb) ### Step 1: Load Bitcoin Price Data Start by loading the Bitcoin price data: ```python theme={null} import pandas as pd # Load Bitcoin historical price data from 2020-2023 df = pd.read_csv( 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/bitcoin_price_usd.csv', sep=',' ) df.head() ``` | | Date | Close | | - | ---------- | ----------- | | 0 | 2020-01-01 | 7200.174316 | | 1 | 2020-01-02 | 6985.470215 | | 2 | 2020-01-03 | 7344.884277 | | 3 | 2020-01-04 | 7410.656738 | | 4 | 2020-01-05 | 7411.317383 | This dataset includes daily Bitcoin closing prices (in USD) from 2020 to 2023. "Closing price" refers to the price at a specific daily time, not a traditional market close. Next, rename the columns to match TimeGPT's expected `ds` (date) and `y` (target) format. ```python theme={null} # Rename columns to TimeGPT's expected format (ds=date, y=target value) df.rename(columns={'Date': 'ds', 'Close': 'y'}, inplace=True) ``` ### Step 2: Get Started with TimeGPT Initialize the `NixtlaClient` with your Nixtla API key. To learn more about how to set up your API key, see [Setting up your API key](/docs/setup/setting_up_your_api_key). ```python theme={null} from nixtla import NixtlaClient # Initialize TimeGPT client with your API key nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' ) ``` ### Step 3: Visualize the Data Before attempting any forecasting, it is good practice to visualize the data we want to predict. The `NixtlaClient` class includes a `plot` method for this purpose. The `plot` method has an `engine` argument that allows you to choose between different plotting libraries. Default is `matplotlib`, but you can also use `plotly` for interactive plots. 
```python theme={null}
# Visualize Bitcoin price history
nixtla_client.plot(df)
```

![Bitcoin historical price data from 2020-2023 showing upward trends and significant volatility patterns](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/2_bitcoin_price_prediction_files/figure-markdown_strict/cell-12-output-1.png)

If you did not rename the columns, specify them explicitly:

```python theme={null}
# Use the original column names from the CSV file
nixtla_client.plot(
    df,
    time_col='Date',
    target_col='Close'
)
```

### Step 4: Forecast with TimeGPT

Now we are ready to generate predictions with TimeGPT. To do this, we will use the `forecast` method from the `NixtlaClient` class.

In this tutorial, we pass the following arguments to the `forecast` method:

* `df`: The DataFrame containing the time series data
* `h`: (int) The forecast horizon. In this case, we will forecast the next 7 days.
* `level`: (list) The confidence levels, in percent, for the prediction intervals. Given the inherent volatility of Bitcoin, we will use multiple confidence levels.
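To make the `level` argument more concrete: each level in the list yields one lower and one upper bound column next to the point forecast. The sketch below only builds the expected column names, following the `TimeGPT-lo-<level>` / `TimeGPT-hi-<level>` pattern that appears in this tutorial's forecast table. The helper `interval_columns` is hypothetical and is not part of the `nixtla` package.

```python
def interval_columns(levels, model="TimeGPT"):
    """Return the column names expected for the given interval levels.

    Hypothetical helper for illustration only. It mirrors the naming
    pattern seen in the forecast table: point forecast first, then lower
    bounds from widest to narrowest, then upper bounds from narrowest
    to widest.
    """
    ordered = sorted(levels)
    lower = [f"{model}-lo-{lv}" for lv in reversed(ordered)]
    upper = [f"{model}-hi-{lv}" for lv in ordered]
    return [model] + lower + upper


columns = interval_columns([50, 80, 90])
print(columns)
```

A higher level gives a wider band, so the 90% interval contains the 80% interval, which in turn contains the 50% interval.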
```python theme={null}
# Generate 7-day forecast with 50%, 80%, and 90% prediction intervals
level = [50, 80, 90]

fcst = nixtla_client.forecast(
    df,
    h=7,          # Forecast horizon: 7 days
    level=level   # Confidence intervals for uncertainty quantification
)

fcst.head()
```

|   | ds         | TimeGPT      | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-lo-50 | TimeGPT-hi-50 | TimeGPT-hi-80 | TimeGPT-hi-90 |
| - | ---------- | ------------ | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| 0 | 2024-01-01 | 42269.460938 | 39567.209020  | 40429.953636  | 41380.654646  | 43158.267229  | 44108.968239  | 44971.712855  |
| 1 | 2024-01-02 | 42469.917969 | 39697.941669  | 40578.197049  | 41466.511361  | 43473.324576  | 44361.638888  | 45241.894268  |
| 2 | 2024-01-03 | 42864.078125 | 40538.871243  | 41586.252507  | 42284.316674  | 43443.839576  | 44141.903743  | 45189.285007  |
| 3 | 2024-01-04 | 42881.621094 | 40603.117448  | 41216.106493  | 42058.539392  | 43704.702795  | 44547.135694  | 45160.124739  |
| 4 | 2024-01-05 | 42773.457031 | 40213.699760  | 40665.384780  | 41489.812431  | 44057.101632  | 44881.529282  | 45333.214302  |

We can pass the forecasts we just generated to the `plot` method to visualize the predictions with the historical data.

```python theme={null}
# Plot historical data with forecast and confidence intervals
nixtla_client.plot(df, fcst, level=level)
```

![Bitcoin price forecast showing 7-day ahead predictions with 50%, 80%, and 90% confidence intervals using TimeGPT](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/2_bitcoin_price_prediction_files/figure-markdown_strict/cell-14-output-1.png)

To get a closer look at the predictions, we can zoom in on the plot or specify the maximum number of in-sample observations to be plotted using the `max_insample_length` argument. Note that setting `max_insample_length=60`, for instance, will display the last 60 historical values along with the complete forecast.
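As a rough illustration of what `max_insample_length` does, the sketch below trims a history to its most recent observations per series before it would be plotted. It uses plain Python rather than the `nixtla` client, and `trim_history` is a hypothetical helper. The assumption is that the argument keeps the last N rows of each series, as the paragraph above describes; the data is made up.

```python
from collections import defaultdict
from datetime import date, timedelta


def trim_history(rows, max_insample_length):
    """Keep the last `max_insample_length` rows of each series.

    `rows` is a list of (unique_id, ds, y) tuples sorted by date.
    Hypothetical helper; it only imitates the trimming described above.
    """
    by_series = defaultdict(list)
    for row in rows:
        by_series[row[0]].append(row)
    trimmed = []
    for series_rows in by_series.values():
        trimmed.extend(series_rows[-max_insample_length:])
    return trimmed


# Toy history: 100 made-up daily observations for a single series
start = date(2023, 9, 23)
history = [("BTC", start + timedelta(days=i), 100.0 + i) for i in range(100)]

recent = trim_history(history, max_insample_length=60)
print(len(recent), recent[0][1], recent[-1][1])
```

With the client itself, the corresponding call would be `nixtla_client.plot(df, fcst, level=level, max_insample_length=60)`.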
![Detailed zoom showing Bitcoin 7-day price forecast with 50%, 80%, and 90% prediction intervals for uncertainty quantification](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/2_bitcoin_price_prediction_files/figure-markdown_strict/cell-15-output-1.png)

### Step 5: Extend Bitcoin Price Analysis with TimeGPT

#### Anomaly Detection

Given Bitcoin's volatility, identifying anomalies can be valuable. Use TimeGPT's `detect_anomalies` method to evaluate each observation statistically within its series context. By default, it identifies anomalies using a 99% prediction interval, which you can adjust with the `level` argument.

```python theme={null}
# Detect anomalies in Bitcoin price data
anomalies_df = nixtla_client.detect_anomalies(df)

# Visualize anomalies highlighted on the price chart
nixtla_client.plot(
    df,
    anomalies_df,
    plot_anomalies=True
)
```

![Bitcoin price anomaly detection showing highlighted unusual price movements and volatility spikes identified by TimeGPT](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/2_bitcoin_price_prediction_files/figure-markdown_strict/cell-19-output-1.png)

To learn more about anomaly detection with TimeGPT, see [Real-time Anomaly Detection](/docs/anomaly_detection/real-time/introduction).

#### Add Exogenous Variables

To improve forecasts, include relevant data as exogenous variables, such as other cryptocurrency prices, stock market indices, or Bitcoin network transaction volumes.

To learn how to incorporate exogenous variables into TimeGPT, see [Numeric Features Guide](/docs/forecasting/exogenous-variables/numeric_features).

## Understand the Model Limitations

As stated in the introduction, predicting Bitcoin prices is challenging. The predictions here may appear accurate because they use recent data and update frequently, but the real test is forecasting future prices, not historical performance.
For those who need or want to try to forecast these assets, `TimeGPT` can be an option that simplifies the forecasting process. With just a couple of lines of code, `TimeGPT` can help you: * Produce point forecasts * Quantify the uncertainty of your predictions * Produce in-sample forecasts * Detect anomalies * Incorporate exogenous variables To learn more about TimeGPT capabilities, see the [TimeGPT Introduction](/docs/introduction/introduction). ## Conclusion TimeGPT simplifies Bitcoin price forecasting by providing: * Accurate short-term predictions with quantified uncertainty * Automated anomaly detection for risk management * Support for exogenous variables to improve forecast accuracy This approach applies to various cryptocurrency and financial asset forecasting scenarios, helping traders and analysts make data-driven decisions despite market volatility. ### Next Steps * Explore [energy demand forecasting](/docs/use_cases/forecasting_energy_demand) with TimeGPT * Learn about [cross-validation](/docs/forecasting/evaluation/cross_validation) for model evaluation * Understand [fine-tuning](/docs/forecasting/fine-tuning/steps) for improved accuracy * Scale forecasts with [distributed computing](/docs/forecasting/forecasting-at-scale/computing_at_scale) ## References and Additional Material * [Joaquín Amat Rodrigo and Javier Escobar Ortiz (2022), "Bitcoin price prediction with Python, when the past does not repeat itself"](https://www.cienciadedatos.net/documentos/py41-forecasting-cryptocurrency-bitcoin-machine-learning-python.html) # Forecasting Energy Demand Source: https://nixtla.io/docs/use_cases/forecasting_energy_demand Energy demand forecasting tutorial using TimeGPT AI. Step-by-step Python guide for electricity consumption prediction with 90% faster predictions and superior accuracy. ## Introduction Energy demand forecasting is critical for grid operations, resource allocation, and infrastructure planning. 
Despite advances in methods, predicting consumption remains challenging due to weather, economic activity, and consumer behavior. This tutorial demonstrates how TimeGPT simplifies in-zone electricity forecasting while delivering superior accuracy and speed. We will use the [PJM Hourly Energy Consumption dataset](https://www.pjm.com/) covering five regions from October 2023 to September 2024. ### What You'll Learn * How to load and prepare energy consumption data * How to generate 4-day ahead forecasts with TimeGPT * How to evaluate forecast accuracy using MAE and sMAPE * How TimeGPT compares to deep learning models like N-HiTS The procedures in this tutorial apply to many time series forecasting scenarios beyond energy demand. ## How to Use TimeGPT to Forecast Energy Demand [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/use-cases/3_electricity_demand.ipynb) ### Step 1: Initial Setup Install and import required packages, then create a NixtlaClient instance to interact with TimeGPT. ```python theme={null} import time import requests import pandas as pd from nixtla import NixtlaClient from utilsforecast.losses import mae, smape from utilsforecast.evaluation import evaluate nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' # Defaults to os.environ.get("NIXTLA_API_KEY") ) ``` ### Step 2: Load Energy Consumption Data Load the energy consumption dataset and convert datetime strings to timestamps. 
```python theme={null} df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/refs/heads/main/datasets/pjm_in_zone.csv') df['ds'] = pd.to_datetime(df['ds']) # Examine the dataset df.groupby('unique_id').head(2) ``` | | unique\_id | ds | y | | - | ---------- | ------------------------- | -------- | | 0 | AP-AP | 2023-10-01 04:00:00+00:00 | 4042.513 | | 1 | AP-AP | 2023-10-01 05:00:00+00:00 | 3850.067 | Plot the data series to visualize seasonal patterns. ```python theme={null} nixtla_client.plot( df, max_insample_length=365 ) ``` ![Seasonal patterns in energy consumption showing daily and weekly cycles in PJM electricity demand data](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/3_electricity_demand_files/figure-markdown_strict/cell-12-output-1.png) ### Step 3: Generate Energy Demand Forecasts with TimeGPT We'll split our dataset into: * A training/input set for model calibration * A testing set (last 4 days) to validate performance ```python theme={null} # Prepare test (last 4 days) and input data test_df = df.groupby('unique_id').tail(96) input_df = df.groupby('unique_id').apply(lambda group: group.iloc[-1104:-96]).reset_index(drop=True) # Make forecasts start = time.time() fcst_df = nixtla_client.forecast( df=input_df, h=96, level=[90], finetune_steps=10, finetune_loss='mae', model='timegpt-1-long-horizon', time_col='ds', target_col='y', id_col='unique_id' ) end = time.time() timegpt_duration = end - start print(f"Time (TimeGPT): {timegpt_duration}") # Visualize forecasts against actual values nixtla_client.plot( test_df, fcst_df, models=['TimeGPT'], level=[90], time_col='ds', target_col='y' ) ``` ![TimeGPT energy demand forecast with 90% confidence intervals compared to actual electricity consumption](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/3_electricity_demand_files/figure-markdown_strict/cell-15-output-1.png) ### Step 4: Evaluate Forecast 
Accuracy Compute accuracy metrics (MAE and sMAPE) for TimeGPT. ```python theme={null} fcst_df['ds'] = pd.to_datetime(fcst_df['ds']) test_df = pd.merge(test_df, fcst_df, 'left', ['unique_id', 'ds']) evaluation = evaluate(test_df, [mae, smape], ["TimeGPT"], "y", "unique_id") average_metrics = evaluation.groupby('metric')['TimeGPT'].mean() average_metrics ``` ### Step 5: Forecast with N-HiTS For comparison, we train and forecast using the deep-learning model N-HiTS. ```python theme={null} from neuralforecast.core import NeuralForecast from neuralforecast.models import NHITS # Prepare training dataset by excluding the last 4 days train_df = df.groupby('unique_id').apply(lambda group: group.iloc[:-96]).reset_index(drop=True) models = [ NHITS(h=96, input_size=480, scaler_type='robust', batch_size=16, valid_batch_size=8) ] nf = NeuralForecast(models=models, freq='H') start = time.time() nf.fit(df=train_df) nhits_preds = nf.predict() end = time.time() print(f"Time (N-HiTS): {end - start}") ``` ### Step 6: Evaluate N-HiTS Compute accuracy metrics (MAE and sMAPE) for N-HiTS. ```python theme={null} preds_df = pd.merge(test_df, nhits_preds, 'left', ['unique_id', 'ds']) evaluation = evaluate(preds_df, [mae, smape], ["NHITS"], "y", "unique_id") average_metrics = evaluation.groupby('metric')['NHITS'].mean() print(average_metrics) ``` ## Conclusion TimeGPT demonstrates substantial performance improvements over N-HiTS across key metrics: * **Accuracy**: 18.6% lower MAE (882.6 vs 1084.7) * **Error Rate**: 31.1% lower sMAPE * **Speed**: 90% faster predictions (4.3 seconds vs 44 seconds) These results make TimeGPT a powerful tool for forecasting energy consumption and other time-series tasks where both accuracy and speed are critical. Experiment with the parameters to further optimize performance for your specific use case. ## Related Tutorials Ready to explore more forecasting applications? 
Check out these guides: * [Bitcoin Price Prediction with TimeGPT](/docs/use_cases/bitcoin_price_prediction) - Financial time series forecasting * [Exogenous Variables Guide](/docs/forecasting/exogenous-variables/numeric_features) - Improve forecasts with external data * [Long Horizon Forecasting](/docs/forecasting/model-version/longhorizon_model) - Extended forecast periods Learn more about [TimeGPT capabilities](/docs/introduction/introduction) for time series prediction. # Forecasting Intermittent Demand Source: https://nixtla.io/docs/use_cases/forecasting_intermittent_demand Master intermittent demand forecasting with TimeGPT for inventory optimization. Achieve 14% better accuracy than specialized models using the M5 dataset with exogenous variables and log transforms. ## Introduction Intermittent demand occurs when products or services have irregular purchase patterns with frequent zero-value periods. This is common in retail, spare parts inventory, and specialty products where demand is irregular rather than continuous. Forecasting these patterns accurately is essential for optimizing stock levels, reducing costs, and preventing stockouts. [TimeGPT](/docs/introduction/about_timegpt) excels at intermittent demand forecasting by capturing complex patterns that traditional statistical methods miss. This tutorial demonstrates TimeGPT's capabilities using the M5 dataset of food sales, including [exogenous variables](/docs/forecasting/exogenous-variables/numeric_features) like pricing and promotional events that influence purchasing behavior. ### What You'll Learn * How to prepare and analyze intermittent demand data * How to leverage exogenous variables for better predictions * How to use log transforms to ensure realistic forecasts * How TimeGPT compares to specialized intermittent demand models The methods shown here apply broadly to inventory management and retail forecasting challenges. 
For getting started with TimeGPT, see our [quickstart guide](/docs/forecasting/timegpt_quickstart). ## How to Use TimeGPT to Forecast Intermittent Demand [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/use-cases/4_intermittent_demand.ipynb) ### Step 1: Environment Setup Start by importing the required packages for this tutorial and create an instance of `NixtlaClient`. ```python theme={null} import pandas as pd import numpy as np from nixtla import NixtlaClient from utilsforecast.losses import mae from utilsforecast.evaluation import evaluate nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla') ``` ### Step 2: Load and Visualize the Dataset Load the dataset from the M5 dataset and convert the `ds` column to a datetime object: ```python theme={null} df = pd.read_csv("https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/m5_sales_exog_small.csv") df['ds'] = pd.to_datetime(df['ds']) df.head() ``` | unique\_id | ds | y | sell\_price | event\_type\_Cultural | event\_type\_National | event\_type\_Religious | event\_type\_Sporting | | ------------- | ---------- | - | ----------- | --------------------- | --------------------- | ---------------------- | --------------------- | | FOODS\_1\_001 | 2011-01-29 | 3 | 2.0 | 0 | 0 | 0 | 0 | | FOODS\_1\_001 | 2011-01-30 | 0 | 2.0 | 0 | 0 | 0 | 0 | | FOODS\_1\_001 | 2011-01-31 | 0 | 2.0 | 0 | 0 | 0 | 0 | | FOODS\_1\_001 | 2011-02-01 | 1 | 2.0 | 0 | 0 | 0 | 0 | | FOODS\_1\_001 | 2011-02-02 | 4 | 2.0 | 0 | 0 | 0 | 0 | Visualize the dataset using the `plot` method: ```python theme={null} nixtla_client.plot( df, max_insample_length=365, ) ``` ![Dataset Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/4_intermittent_demand_files/figure-markdown_strict/cell-11-output-1.png) In the figure above, we can see the intermittent nature of this dataset, 
with many periods with zero demand.

Now, let's use TimeGPT to forecast the demand of each product.

### Step 3: Transform the Data

To limit negative predictions from the model, we apply a log transformation to the data. Once the forecasts are mapped back to the original scale they cannot fall below -1, although values slightly below zero (for example, in the lower interval bounds) remain possible.

Note that due to the presence of zeros in our dataset, we add one to all points before taking the log.

```python theme={null}
df_transformed = df.copy()
df_transformed['y'] = np.log(df_transformed['y'] + 1)
```

Now, let's keep the last 28 time steps for the test set and use the rest as input to the model.

```python theme={null}
test_df = df_transformed.groupby('unique_id').tail(28)
input_df = df_transformed.drop(test_df.index).reset_index(drop=True)
```

### Step 4: Forecast with TimeGPT

Forecast with TimeGPT using the `forecast` method:

```python theme={null}
fcst_df = nixtla_client.forecast(
    df=input_df,
    h=28,
    level=[80],
    finetune_steps=10,               # Learn more about fine-tuning: /forecasting/fine-tuning/steps
    finetune_loss='mae',
    model='timegpt-1-long-horizon',  # For long-horizon forecasting: /forecasting/model-version/longhorizon_model
    time_col='ds',
    target_col='y',
    id_col='unique_id'
)
```

```bash theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: D
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
```

Great! We now have predictions. However, those predictions are transformed, so we need to invert the transformation to get back to the original scale. Therefore, we take the exponential and subtract one from each data point.
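The round trip through the log transform can be checked on a few numbers. The sketch below uses only the standard library. The sample values are made up, and clipping at zero is an optional extra step that is not part of this tutorial's pipeline.

```python
import math

# Made-up demand values, including zeros as in intermittent series
demand = [0, 0, 3, 1, 0, 4]

# Forward transform: log(y + 1), defined even when y is zero
transformed = [math.log(y + 1) for y in demand]

# Inverse transform: exp(x) - 1 recovers the original values
recovered = [math.exp(x) - 1 for x in transformed]

# A forecast below zero in log space maps to a value between -1 and 0
negative_log_forecast = -0.3
back_transformed = math.exp(negative_log_forecast) - 1

# Optional: clip at zero if strictly non-negative forecasts are needed
clipped = max(back_transformed, 0.0)

print([round(v, 6) for v in recovered])
print(round(back_transformed, 4), clipped)
```

`math.log1p` and `math.expm1` (or `np.log1p` and `np.expm1`) compute the same pair of transforms with better precision near zero. Whichever scale is used for evaluation, actual values and forecasts should be on the same one. The tutorial applies the inverse to every forecast column, including the interval bounds: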
```python theme={null} cols = [col for col in fcst_df.columns if col not in ['ds', 'unique_id']] fcst_df[cols] = np.exp(fcst_df[cols]) - 1 fcst_df.head() ``` | | unique\_id | ds | TimeGPT | TimeGPT-lo-80 | TimeGPT-hi-80 | | - | ------------- | ---------- | -------- | ------------- | ------------- | | 0 | FOODS\_1\_001 | 2016-05-23 | 0.286841 | -0.267101 | 1.259465 | | 1 | FOODS\_1\_001 | 2016-05-24 | 0.320482 | -0.241236 | 1.298046 | | 2 | FOODS\_1\_001 | 2016-05-25 | 0.287392 | -0.362250 | 1.598791 | | 3 | FOODS\_1\_001 | 2016-05-26 | 0.295326 | -0.145489 | 0.963542 | | 4 | FOODS\_1\_001 | 2016-05-27 | 0.315868 | -0.166516 | 1.077437 | ### Step 5: Evaluate the Forecasts Before measuring the performance metric, let's plot the predictions against the actual values. ```python theme={null} nixtla_client.plot( test_df, fcst_df, models=['TimeGPT'], level=[80], time_col='ds', target_col='y' ) ``` ![Predictions vs Actual Values](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/4_intermittent_demand_files/figure-markdown_strict/cell-16-output-1.png) Finally, we can measure the mean absolute error (MAE) of the model. Learn more about [evaluation metrics](/docs/forecasting/evaluation/evaluation_metrics) in our documentation. ```python theme={null} # Compute MAE test_df = pd.merge(test_df, fcst_df, how='left', on=['unique_id', 'ds']) evaluation = evaluate( test_df, metrics=[mae], models=['TimeGPT'], target_col='y', id_col='unique_id' ) average_metrics = evaluation.groupby('metric')['TimeGPT'].mean() average_metrics ``` ```bash theme={null} metric mae 0.492559 ``` ### Step 6: Compare with Statistical Models The library `statsforecast` by Nixtla provides a suite of statistical models specifically built for intermittent forecasting, such as Croston, IMAPA and TSB. Let's use these models and see how they perform against TimeGPT. 
```python theme={null} from statsforecast import StatsForecast from statsforecast.models import CrostonClassic, CrostonOptimized, IMAPA, TSB sf = StatsForecast( models=[CrostonClassic(), CrostonOptimized(), IMAPA(), TSB(0.1, 0.1)], freq='D', n_jobs=-1 ) ``` Then, we can fit the models on our data. ```python theme={null} sf.fit(df=input_df) sf_preds = sf.predict(h=28) ``` Again, we need to inverse the transformation. Remember that the training data was previously transformed using the log function. ```python theme={null} cols = [col for col in sf_preds.columns if col not in ['ds', 'unique_id']] sf_preds[cols] = np.exp(sf_preds[cols]) - 1 sf_preds.head() ``` | | unique\_id | ds | CrostonClassic | CrostonOptimized | IMAPA | TSB | | - | ------------- | ---------- | -------------- | ---------------- | -------- | -------- | | 0 | FOODS\_1\_001 | 2016-05-23 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | | 1 | FOODS\_1\_001 | 2016-05-24 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | | 2 | FOODS\_1\_001 | 2016-05-25 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | | 3 | FOODS\_1\_001 | 2016-05-26 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | | 4 | FOODS\_1\_001 | 2016-05-27 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | Now, let's combine the predictions from all methods and see which performs best. 
```python theme={null} test_df = pd.merge(test_df, sf_preds, 'left', ['unique_id', 'ds']) test_df.head() ``` | | unique\_id | ds | y | sell\_price | event\_type\_Cultural | event\_type\_National | event\_type\_Religious | event\_type\_Sporting | TimeGPT | TimeGPT-lo-80 | TimeGPT-hi-80 | CrostonClassic | CrostonOptimized | IMAPA | TSB | | - | ------------- | ---------- | -------- | ----------- | --------------------- | --------------------- | ---------------------- | --------------------- | -------- | ------------- | ------------- | -------------- | ---------------- | -------- | -------- | | 0 | FOODS\_1\_001 | 2016-05-23 | 1.386294 | 2.24 | 0 | 0 | 0 | 0 | 0.286841 | -0.267101 | 1.259465 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | | 1 | FOODS\_1\_001 | 2016-05-24 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.320482 | -0.241236 | 1.298046 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | | 2 | FOODS\_1\_001 | 2016-05-25 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.287392 | -0.362250 | 1.598791 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | | 3 | FOODS\_1\_001 | 2016-05-26 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.295326 | -0.145489 | 0.963542 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | | 4 | FOODS\_1\_001 | 2016-05-27 | 1.945910 | 2.24 | 0 | 0 | 0 | 0 | 0.315868 | -0.166516 | 1.077437 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | ```python theme={null} statistical_models = ["CrostonClassic", "CrostonOptimized", "IMAPA", "TSB"] evaluation = evaluate( test_df, metrics=[mae], models=["TimeGPT"] + statistical_models, target_col="y", id_col='unique_id' ) average_metrics = evaluation.groupby('metric')[[ "TimeGPT"] + statistical_models].mean() average_metrics ``` | metric | TimeGPT | CrostonClassic | CrostonOptimized | IMAPA | TSB | | ------ | -------- | -------------- | ---------------- | -------- | -------- | | mae | 0.492559 | 0.564563 | 0.580922 | 0.571943 | 0.567178 | In the table above, we can see that TimeGPT achieves the lowest MAE, achieving a 12.8% improvement over the best 
performing statistical model. These results demonstrate TimeGPT's strong performance without additional features. We can further improve accuracy by incorporating exogenous variables, a capability TimeGPT supports but traditional statistical models do not. ### Step 7: Use Exogenous Variables To forecast with [exogenous variables](/docs/forecasting/exogenous-variables/numeric_features), we need to specify their future values over the forecast horizon. Therefore, let's simply take the types of events, as those dates are known in advance. You can also explore using [date features](/docs/forecasting/exogenous-variables/date_features) and [holidays](/docs/forecasting/exogenous-variables/holiday_and_special_dates) as exogenous variables. ```python theme={null} # Include holiday/event data as exogenous features exog_cols = ['event_type_Cultural', 'event_type_National', 'event_type_Religious', 'event_type_Sporting'] futr_exog_df = test_df[['unique_id', 'ds'] + exog_cols] futr_exog_df.head() ``` | | unique\_id | ds | event\_type\_Cultural | event\_type\_National | event\_type\_Religious | event\_type\_Sporting | | - | ------------- | ---------- | --------------------- | --------------------- | ---------------------- | --------------------- | | 0 | FOODS\_1\_001 | 2016-05-23 | 0 | 0 | 0 | 0 | | 1 | FOODS\_1\_001 | 2016-05-24 | 0 | 0 | 0 | 0 | | 2 | FOODS\_1\_001 | 2016-05-25 | 0 | 0 | 0 | 0 | | 3 | FOODS\_1\_001 | 2016-05-26 | 0 | 0 | 0 | 0 | | 4 | FOODS\_1\_001 | 2016-05-27 | 0 | 0 | 0 | 0 | Then, we simply call the `forecast` method and pass the `futr_exog_df` in the `X_df` parameter. 
```python theme={null}
fcst_df = nixtla_client.forecast(
    df=input_df,
    X_df=futr_exog_df,
    h=28,
    level=[80],                      # Generate an 80% confidence interval
    finetune_steps=10,               # Specify the number of steps for fine-tuning
    finetune_loss='mae',             # Use the MAE as the loss function for fine-tuning
    model='timegpt-1-long-horizon',  # Use the model for long-horizon forecasting
    time_col='ds',
    target_col='y',
    id_col='unique_id'
)
```

```bash theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: D
INFO:nixtla.nixtla_client:Using the following exogenous variables: event_type_Cultural, event_type_National, event_type_Religious, event_type_Sporting
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
```

Great! Remember that the predictions are transformed, so we have to invert the transformation again.

```python theme={null}
fcst_df.rename(columns={'TimeGPT': 'TimeGPT_ex'}, inplace=True)

cols = [col for col in fcst_df.columns if col not in ['ds', 'unique_id']]
fcst_df[cols] = np.exp(fcst_df[cols]) - 1

fcst_df.head()
```

|   | unique\_id    | ds         | TimeGPT\_ex | TimeGPT-lo-80 | TimeGPT-hi-80 |
| - | ------------- | ---------- | ----------- | ------------- | ------------- |
| 0 | FOODS\_1\_001 | 2016-05-23 | 0.281922    | -0.269902     | 1.250828      |
| 1 | FOODS\_1\_001 | 2016-05-24 | 0.313774    | -0.245091     | 1.286372      |
| 2 | FOODS\_1\_001 | 2016-05-25 | 0.285639    | -0.363119     | 1.595252      |
| 3 | FOODS\_1\_001 | 2016-05-26 | 0.295037    | -0.145679     | 0.963104      |
| 4 | FOODS\_1\_001 | 2016-05-27 | 0.315484    | -0.166760     | 1.076830      |

Finally, let's evaluate the performance of TimeGPT with exogenous features.
```python theme={null} test_df['TimeGPT_ex'] = fcst_df['TimeGPT_ex'].values test_df.head() ``` | | unique\_id | ds | y | sell\_price | event\_type\_Cultural | event\_type\_National | event\_type\_Religious | event\_type\_Sporting | TimeGPT | TimeGPT-lo-80 | TimeGPT-hi-80 | CrostonClassic | CrostonOptimized | IMAPA | TSB | TimeGPT\_ex | | - | ------------- | ---------- | -------- | ----------- | --------------------- | --------------------- | ---------------------- | --------------------- | -------- | ------------- | ------------- | -------------- | ---------------- | -------- | -------- | ----------- | | 0 | FOODS\_1\_001 | 2016-05-23 | 1.386294 | 2.24 | 0 | 0 | 0 | 0 | 0.286841 | -0.267101 | 1.259465 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | 0.281922 | | 1 | FOODS\_1\_001 | 2016-05-24 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.320482 | -0.241236 | 1.298046 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | 0.313774 | | 2 | FOODS\_1\_001 | 2016-05-25 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.287392 | -0.362250 | 1.598791 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | 0.285639 | | 3 | FOODS\_1\_001 | 2016-05-26 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.295326 | -0.145489 | 0.963542 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | 0.295037 | | 4 | FOODS\_1\_001 | 2016-05-27 | 1.945910 | 2.24 | 0 | 0 | 0 | 0 | 0.315868 | -0.166516 | 1.077437 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | 0.315484 | ```python theme={null} evaluation = evaluate( test_df, metrics=[mae], models=["TimeGPT"] + statistical_models + ["TimeGPT_ex"], target_col="y", id_col='unique_id' ) average_metrics = ( evaluation.groupby('metric')[["TimeGPT"] + statistical_models + ["TimeGPT_ex"]] ).mean() average_metrics ``` | metric | TimeGPT | CrostonClassic | CrostonOptimized | IMAPA | TSB | TimeGPT\_ex | | ------ | -------- | -------------- | ---------------- | -------- | -------- | ----------- | | mae | 0.492559 | 0.564563 | 0.580922 | 0.571943 | 0.567178 | 0.485352 | From the table above, we can see that using 
exogenous features improved the performance of TimeGPT. Now, it represents a 14% improvement over the best statistical model.

## Conclusion

TimeGPT provides a robust solution for forecasting intermittent demand:

* \~14% MAE improvement over specialized models
* Supports exogenous features for enhanced accuracy

By leveraging TimeGPT and combining both internal series patterns and external factors, organizations can achieve more reliable forecasts even for challenging intermittent demand.

### Next Steps

* Explore [other use cases](/docs/use_cases/forecasting_energy_demand) with TimeGPT
* Learn about [probabilistic forecasting](/docs/forecasting/probabilistic/introduction) with prediction intervals
* Scale your forecasts with [distributed computing](/docs/forecasting/forecasting-at-scale/computing_at_scale)
* Fine-tune models with [custom loss functions](/docs/forecasting/fine-tuning/custom_loss)

# Forecasting Web Traffic
Source: https://nixtla.io/docs/use_cases/forecasting_web_traffic

Learn how to predict website traffic patterns using TimeGPT.

**Goal:** Forecast the next 7 days of daily visits to the website [cienciadedatos.net](https://cienciadedatos.net) using TimeGPT.

This tutorial is adapted from *"Forecasting web traffic with machine learning and Python"* by Joaquín Amat Rodrigo and Javier Escobar Ortiz. You will learn how to:

* Obtain forecasts nearly 10% more accurate than the original method.
* Use significantly fewer lines of code and simpler workflows.
* Generate forecasts in substantially less computation time.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/use-cases/1_forecasting_web_traffic.ipynb)

To start, import the required packages and initialize the Nixtla client with your API key.
```python Nixtla Client Initialization theme={null} import pandas as pd from nixtla import NixtlaClient nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' # Defaults to os.environ.get("NIXTLA_API_KEY") ) ``` **Use an Azure AI endpoint**
If you are using an Azure AI endpoint, also set the `base_url` argument: ```python Azure AI Endpoint Setup theme={null} nixtla_client = NixtlaClient( base_url="your_azure_ai_endpoint", api_key="your_api_key" ) ```
We will load the website visit data directly from a CSV file. Then, we format the dataset by adding an identifier column named `unique_id`, set to `daily_visits` for every row.

```python Load and Format Data theme={null}
url = (
    'https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/'
    'master/data/visitas_por_dia_web_cienciadedatos.csv'
)
df = pd.read_csv(url, sep=',', parse_dates=[0], date_format='%d/%m/%y')
df['unique_id'] = 'daily_visits'

df.head(10)
```

|   | date       | users | unique\_id    |
| - | ---------- | ----- | ------------- |
| 0 | 2020-07-01 | 2324  | daily\_visits |
| 1 | 2020-07-02 | 2201  | daily\_visits |
| 2 | 2020-07-03 | 2146  | daily\_visits |
| 3 | 2020-07-04 | 1666  | daily\_visits |
| 4 | 2020-07-05 | 1433  | daily\_visits |
| 5 | 2020-07-06 | 2195  | daily\_visits |
| 6 | 2020-07-07 | 2240  | daily\_visits |
| 7 | 2020-07-08 | 2295  | daily\_visits |
| 8 | 2020-07-09 | 2279  | daily\_visits |
| 9 | 2020-07-10 | 2155  | daily\_visits |

**Note:** No further preprocessing is required before we start forecasting.

We will set up a rolling window cross-validation using TimeGPT. This will help us evaluate the forecast accuracy across multiple historical windows.

```python Cross-validation Setup theme={null}
timegpt_cv_df = nixtla_client.cross_validation(
    df,
    h=7,
    n_windows=8,
    time_col='date',
    target_col='users',
    freq='D',
    level=[80, 90, 99.5]
)

timegpt_cv_df.head()
```

![CV Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/1_forecasting_web_traffic_files/figure-markdown_strict/cell-12-output-1.png)

The results align closely with those from the original tutorial on [forecasting web traffic with machine learning](https://cienciadedatos.net/documentos/py37-forecasting-web-traffic-machine-learning.html).
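To see how the rolling windows are laid out, the sketch below computes the cutoff date of each window for `h=7` and `n_windows=8`. It is plain Python and does not call the API. `rolling_cutoffs` is a hypothetical helper, the last observation date is made up, and the step between windows is assumed to equal the horizon so that the test windows do not overlap.

```python
from datetime import date, timedelta


def rolling_cutoffs(last_date, h, n_windows, step_size=None):
    """Return the cutoff (last training day) of each window, oldest first.

    Hypothetical helper for daily data. If `step_size` is None, the step
    is assumed to equal the horizon `h`.
    """
    step = h if step_size is None else step_size
    cutoffs = []
    for i in range(n_windows):
        # Days between this cutoff and the final observation
        offset = h + step * (n_windows - 1 - i)
        cutoffs.append(last_date - timedelta(days=offset))
    return cutoffs


last_observation = date(2021, 8, 22)  # made-up end of the series
cutoffs = rolling_cutoffs(last_observation, h=7, n_windows=8)

for cutoff in cutoffs:
    test_start = cutoff + timedelta(days=1)
    test_end = cutoff + timedelta(days=7)
    print(f"train up to {cutoff}, test {test_start} to {test_end}")
```

Each window is forecast using only data up to its cutoff. Under the assumption above, the eight 7-day test windows together cover the last 56 days of the series.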
Next, we compute the Mean Absolute Error (MAE) to quantify forecast accuracy: ```python Calculate MAE theme={null} from utilsforecast.losses import mae mae_timegpt = mae( df=timegpt_cv_df.drop(columns=['cutoff']), models=['TimeGPT'], target_col='users' ) mae_timegpt ``` **MAE Result:** The MAE obtained is `167.69`, outperforming the original pipeline. Exogenous variables can provide additional context that may improve forecast accuracy. In this example, we add binary indicators for each day of the week. ```python Add Weekday Indicators theme={null} for i in range(7): df[f'week_day_{i + 1}'] = 1 * (df['date'].dt.weekday == i) df.head(10) ``` We repeat the cross-validation with these new features: ```python Cross-validation with Exogenous Variables theme={null} timegpt_cv_df_with_ex = nixtla_client.cross_validation( df, h=7, n_windows=8, time_col='date', target_col='users', freq='D', level=[80, 90, 99.5] ) ``` ![Exogenous CV Plot](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/1_forecasting_web_traffic_files/figure-markdown_strict/cell-17-output-1.png) Adding weekday indicators can capture weekly seasonality in user visits. | **Model** | **Exogenous features** | **MAE Backtest** | | --------- | ---------------------- | ---------------- | | TimeGPT | No | 167.6917 | | TimeGPT | Yes | 167.2286 | We see a slight improvement in MAE by including the weekday indicators. This illustrates how TimeGPT can incorporate additional signals without complex data processing or extensive model tuning. **Key Takeaways** * TimeGPT simplifies forecasting workflows by reducing code and tuning overhead. * Feature engineering (like adding weekday variables) further boosts accuracy. * Cross-validation provides a robust way to evaluate model performance. We have demonstrated significant improvements in forecasting accuracy with minimal effort using TimeGPT. 
This avoids the majority of the complex steps required when building custom models—such as extensive feature engineering, validation, model comparisons, and hyperparameter tuning. **Good luck and happy forecasting!**
# Logging and Serving with MLFlow

Source: https://nixtla.io/docs/use_cases/logging_and_serving_with_mlflow

Use MLFlow to log experiment metrics using TimeGPT and serve TimeGPT

## Introduction

[MLFlow](https://mlflow.org/) is an open-source platform that lets you, among other things, track experiments to compare hyperparameters and results, and serve ML models on different platforms.

In this tutorial, we provide basic scripts that you can use to track experiments made with TimeGPT, or to serve TimeGPT through MLFlow. Each script can be customized to your needs. The goal is to provide an easy-to-use template that you can extend.

## Experiment Tracking with TimeGPT and MLFlow

The following scripts provide functions for logging experiment results when testing different parameter combinations in TimeGPT.

To use these scripts, make sure MLFlow and any other required dependencies are installed. To install MLFlow, run `pip install mlflow`.

Here, we use the following packages:

```python theme={null}
import os
import mlflow
import pandas as pd
from datetime import datetime

from nixtla import NixtlaClient  # Client class referenced in the type hints below
from dotenv import load_dotenv  # Used when working with the API directly or a Docker deployment
from api.serverless import make_client  # Used when working with the Python wheel
```

In all scripts, we require that you initialize the `NixtlaClient` and pass it to the functions. You can see how to set up your client with an API key [here](/docs/setup/setting_up_your_api_key).

### Logging experiments with forecasting

The following script can be used to log results when using the `forecast` method.

```python theme={null}
def log_timegpt_forecast(
    client: NixtlaClient,
    df: pd.DataFrame,
    h: int = 12,
    freq: str = "MS",
    level: list = None,
    model: str = "timegpt-2-mini",
    experiment_name: str = "basic_forecast",
    time_col: str = "ds",
    target_col: str = "y",
    id_col: str = "unique_id",
    **kwargs
):
    """
    Perform TimeGPT forecast and log to MLFlow.

    Parameters:
    -----------
    client : NixtlaClient
        Initialized Nixtla client
    df : pd.DataFrame
        Input dataframe with time series data
    h : int
        Forecast horizon
    freq : str
        Frequency of the time series
    level : list
        Confidence levels for prediction intervals
    model : str
        TimeGPT model variant to use
    experiment_name : str
        Name for this MLFlow run
    time_col : str
        Name of the time column in df
    target_col : str
        Name of the target column in df
    id_col : str
        Name of the series identifier column in df
    **kwargs : dict
        Additional parameters to pass to forecast()
    """
    with mlflow.start_run(run_name=experiment_name):
        # Log parameters
        mlflow.log_param("horizon", h)
        mlflow.log_param("frequency", freq)
        mlflow.log_param("model", model)
        mlflow.log_param("n_series", df[id_col].nunique() if id_col in df.columns else 1)
        mlflow.log_param("n_observations", len(df))
        if level:
            mlflow.log_param("prediction_intervals", level)

        # Log any additional parameters
        for key, value in kwargs.items():
            mlflow.log_param(key, value)

        # Log dataset info
        mlflow.log_param("start_date", df[time_col].min())
        mlflow.log_param("end_date", df[time_col].max())

        # Forecast
        forecast_df = client.forecast(
            df=df,
            h=h,
            freq=freq,
            level=level,
            model=model,
            id_col=id_col,
            time_col=time_col,
            target_col=target_col,
            **kwargs
        )

        # Log metrics
        mlflow.log_metric("forecast_points", len(forecast_df))

        # Log forecast as artifact
        forecast_path = "forecast_output.csv"
        forecast_df.to_csv(forecast_path, index=False)
        mlflow.log_artifact(forecast_path)

        # Log input data as artifact
        input_path = "input_data.csv"
        df.to_csv(input_path, index=False)
        mlflow.log_artifact(input_path)

        # Add tags
        mlflow.set_tag("task_type", "forecasting")
        mlflow.set_tag("timestamp", datetime.now().isoformat())

        # Clean up temporary files
        os.remove(forecast_path)
        os.remove(input_path)

    return forecast_df
```

If you want to add more explicit parameters for fine-tuning, or reproduce the exact type hints of the SDK, refer to the [SDK reference](/docs/reference/sdk_reference).
### Logging experiments with cross-validation

The following is a simple script to log metrics when performing experiments with the `cross_validation` method. It requires `utilsforecast` to compute performance metrics, so run `pip install utilsforecast` first.

The script lets you log overall performance metrics across all windows, as well as per-window metrics.

```python theme={null}
def log_timegpt_cross_validation(
    client: NixtlaClient,
    df: pd.DataFrame,
    h: int = 12,
    n_windows: int = 1,
    step_size: int = None,
    freq: str = "MS",
    level: list = None,
    model: str = "timegpt-2-mini",
    experiment_name: str = "cross_validation",
    time_col: str = "ds",
    target_col: str = "y",
    id_col: str = "unique_id",
    **kwargs
):
    """
    Perform TimeGPT cross-validation and log to MLFlow.

    Parameters:
    -----------
    client : NixtlaClient
        Initialized Nixtla client
    df : pd.DataFrame
        Input dataframe with time series data
    h : int
        Forecast horizon
    n_windows : int
        Number of cross-validation windows
    step_size : int
        Step size between windows (default: h)
    freq : str
        Frequency of the time series
    level : list
        Confidence levels for prediction intervals
    model : str
        TimeGPT model variant to use
    experiment_name : str
        Name for this MLFlow run
    time_col : str
        Name of the time column in df
    target_col : str
        Name of the target column in df
    id_col : str
        Name of the series identifier column in df
    **kwargs : dict
        Additional parameters to pass to cross_validation()
    """
    with mlflow.start_run(run_name=experiment_name):
        # Log parameters
        mlflow.log_param("horizon", h)
        mlflow.log_param("n_windows", n_windows)
        mlflow.log_param("step_size", step_size or h)
        mlflow.log_param("frequency", freq)
        mlflow.log_param("model", model)
        mlflow.log_param("n_series", df[id_col].nunique() if id_col in df.columns else 1)
        if level:
            mlflow.log_param("prediction_intervals", level)
        for key, value in kwargs.items():
            mlflow.log_param(key, value)

        # Perform cross-validation
        cv_df = client.cross_validation(
            df=df,
            h=h,
            n_windows=n_windows,
            step_size=step_size,
            freq=freq,
            level=level,
            model=model,
            time_col=time_col,
            target_col=target_col,
            id_col=id_col,
            **kwargs
        )

        # Calculate and log metrics
        from utilsforecast.losses import mae, mse, rmse

        # The loss functions return one row per series, so we average
        # across series to get a single overall value for each metric.
        # MAE
        mae_value = mae(
            df=cv_df.drop(columns=['cutoff']),
            models=['TimeGPT'],
            target_col=target_col,
            id_col=id_col,
        )['TimeGPT'].mean()

        # MSE
        mse_value = mse(
            df=cv_df.drop(columns=['cutoff']),
            models=['TimeGPT'],
            target_col=target_col,
            id_col=id_col,
        )['TimeGPT'].mean()

        # RMSE
        rmse_value = rmse(
            df=cv_df.drop(columns=['cutoff']),
            models=['TimeGPT'],
            target_col=target_col,
            id_col=id_col,
        )['TimeGPT'].mean()

        mlflow.log_metric("mae", mae_value)
        mlflow.log_metric("mse", mse_value)
        mlflow.log_metric("rmse", rmse_value)
        mlflow.log_metric("total_cv_predictions", len(cv_df))

        # Log per-window metrics (MAE as an example)
        cutoffs = cv_df['cutoff'].unique()
        for i, cutoff in enumerate(cutoffs):
            window_df = cv_df[cv_df['cutoff'] == cutoff]
            window_mae = mae(
                df=window_df.drop(columns=['cutoff']),
                models=['TimeGPT'],
                target_col=target_col,
                id_col=id_col,
            )['TimeGPT'].mean()
            mlflow.log_metric(f"mae_window_{i+1}", window_mae)

        # Log artifacts
        cv_path = "cross_validation_results.csv"
        cv_df.to_csv(cv_path, index=False)
        mlflow.log_artifact(cv_path)

        # Log summary statistics
        summary = {
            'metric': ['MAE', 'MSE', 'RMSE'],
            'value': [mae_value, mse_value, rmse_value]
        }
        summary_df = pd.DataFrame(summary)
        summary_path = "metrics_summary.csv"
        summary_df.to_csv(summary_path, index=False)
        mlflow.log_artifact(summary_path)

        mlflow.set_tag("task_type", "cross_validation")
        mlflow.set_tag("timestamp", datetime.now().isoformat())

        # Clean up
        os.remove(cv_path)
        os.remove(summary_path)

    return cv_df
```

If you want to add more explicit parameters for fine-tuning, or reproduce the exact type hints of the SDK, refer to the [SDK reference](/docs/reference/sdk_reference).
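To illustrate what the overall and per-window metrics measure, here is a minimal sketch in plain Python. The rows mimic the `cutoff`, `y`, and `TimeGPT` columns of a cross-validation result, but every number is made up for the example, and `mean_absolute_error` is an illustrative helper rather than the `utilsforecast` function used in the script.

```python
from collections import defaultdict

# Made-up cross-validation rows: two windows (cutoffs), three steps each.
rows = [
    {"cutoff": "2020-01-31", "y": 100.0, "TimeGPT": 90.0},
    {"cutoff": "2020-01-31", "y": 110.0, "TimeGPT": 115.0},
    {"cutoff": "2020-01-31", "y": 120.0, "TimeGPT": 120.0},
    {"cutoff": "2020-02-29", "y": 130.0, "TimeGPT": 136.0},
    {"cutoff": "2020-02-29", "y": 125.0, "TimeGPT": 122.0},
    {"cutoff": "2020-02-29", "y": 140.0, "TimeGPT": 140.0},
]


def mean_absolute_error(records):
    """Average absolute gap between the actual and the forecast value."""
    return sum(abs(r["y"] - r["TimeGPT"]) for r in records) / len(records)


# Group rows by cutoff so each window can be scored on its own.
by_window = defaultdict(list)
for row in rows:
    by_window[row["cutoff"]].append(row)

overall_mae = mean_absolute_error(rows)
window_mae = {cutoff: mean_absolute_error(rs) for cutoff, rs in by_window.items()}

print("overall:", overall_mae)
for cutoff, value in window_mae.items():
    print(cutoff, value)
```

Comparing the per-window values against the overall figure shows whether accuracy is stable over time or driven by one unusually easy or hard window.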
### Logging experiments with anomaly detection

This script shows how you can track experiments run with the `detect_anomalies_online` method.

```python theme={null}
def log_timegpt_online_anomaly_detection(
    client: NixtlaClient,
    df: pd.DataFrame,
    h: int,
    detection_size: int,
    threshold_method: str = "univariate",
    freq: str = "D",
    level: int | float = 99,
    model: str = "timegpt-2-mini",
    experiment_name: str = "anomaly_detection",
    time_col: str = "ds",
    target_col: str = "y",
    id_col: str = "unique_id",
    **kwargs
):
    """
    Perform TimeGPT anomaly detection and log results to MLFlow.

    Parameters:
    -----------
    client : NixtlaClient
        Initialized Nixtla client
    df : pd.DataFrame
        Input dataframe with time series data
    h : int
        Forecast horizon
    detection_size : int
        Number of steps at the end of each series to check for anomalies
    threshold_method : str
        Method used to compute the anomaly threshold (e.g. "univariate")
    freq : str
        Frequency of the time series
    level : int | float
        Confidence level for anomaly detection threshold
    model : str
        TimeGPT model variant to use
    experiment_name : str
        Name for this MLFlow run
    time_col : str
        Name of the time column in df
    target_col : str
        Name of the target column in df
    id_col : str
        Name of the series identifier column in df
    **kwargs : dict
        Additional parameters to pass to detect_anomalies_online()
    """
    with mlflow.start_run(run_name=experiment_name):
        # Log parameters
        mlflow.log_param("horizon", h)
        mlflow.log_param("detection_size", detection_size)
        mlflow.log_param("threshold_method", threshold_method)
        mlflow.log_param("frequency", freq)
        mlflow.log_param("detection_level", level)
        mlflow.log_param("model", model)
        mlflow.log_param("n_observations", len(df))
        for key, value in kwargs.items():
            mlflow.log_param(key, value)

        # Detect anomalies
        anomalies_df = client.detect_anomalies_online(
            df=df,
            h=h,
            detection_size=detection_size,
            threshold_method=threshold_method,
            freq=freq,
            level=level,
            model=model,
            time_col=time_col,
            target_col=target_col,
            id_col=id_col,
            **kwargs
        )

        # Calculate metrics (cast to a plain int before logging)
        n_anomalies = int(anomalies_df['anomaly'].sum())
        mlflow.log_metric("n_anomalies", n_anomalies)

        # Log results
        anomaly_path = "anomaly_detection_results.csv"
        anomalies_df.to_csv(anomaly_path, index=False)
        mlflow.log_artifact(anomaly_path)

        # Log only the detected anomalies
        if n_anomalies > 0:
            detected_anomalies = anomalies_df[anomalies_df['anomaly'] == True]
            detected_path = "detected_anomalies_only.csv"
            detected_anomalies.to_csv(detected_path, index=False)
            mlflow.log_artifact(detected_path)
            os.remove(detected_path)

        mlflow.set_tag("task_type", "anomaly_detection")
        mlflow.set_tag("timestamp", datetime.now().isoformat())

        # Clean up
        os.remove(anomaly_path)

    return anomalies_df
```

If you want to add more explicit parameters for fine-tuning, or reproduce the exact type hints of the SDK, refer to the [SDK reference](/docs/reference/sdk_reference).

### Sample usage

With the above functions, you can now run experiments and log them to MLFlow.

First, instantiate your client using either:

```python theme={null}
from nixtla import NixtlaClient

# Named `client` to match the calls further down
client = NixtlaClient(api_key='your_api_key_here')
```

Or, if you are using a Python wheel:

```python theme={null}
from api.serverless import make_client

client = make_client()
```

Then, load your data. Here, we use the simple air passengers dataset.

```python theme={null}
df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv'
)
df.columns = ['ds', 'y']
df["unique_id"] = 0
df = df[["unique_id", "ds", "y"]]
```

Next, set your tracking URI and experiment name for MLFlow. Note that here, we use the local filesystem for tracking.

```python theme={null}
mlflow.set_tracking_uri("mlruns")
experiment = mlflow.set_experiment("timegpt_experiments")
```

Finally, you can run any function defined above.
```python theme={null}
# Example settings (assumed values; adjust them to your data).
# The air passengers series is monthly, so we use a month-start frequency.
h = 12
freq = "MS"
time_col = "ds"
target_col = "y"
id_col = "unique_id"

forecast_df = log_timegpt_forecast(
    client=client,
    df=df,
    h=h,
    freq=freq,
    level=[80, 90],
    experiment_name="forecast_example",
    time_col=time_col,
    target_col=target_col,
    id_col=id_col,
    model="timegpt-2"
)

cv_df = log_timegpt_cross_validation(
    client=client,
    df=df,
    h=h,
    n_windows=1,
    freq=freq,
    level=[90],
    experiment_name="cv_example",
    time_col=time_col,
    target_col=target_col,
    id_col=id_col,
)

anomaly_df = log_timegpt_online_anomaly_detection(
    client=client,
    df=df,
    h=h,
    detection_size=h,
    threshold_method="univariate",
    freq=freq,
    level=60,
    model="timegpt-2-mini",
    experiment_name="anomaly_detection_example",
    time_col=time_col,
    target_col=target_col,
    id_col=id_col,
)
```

## Serving TimeGPT with MLFlow

You can also use MLFlow for model serving. This is useful if you need a unified interface or want to serve the model on many endpoints. MLFlow is not required to use TimeGPT, but your organization may standardize on it.

The following script shows how you can wrap the TimeGPT model in an `mlflow.pyfunc.PythonModel` to save the model and call it.

```python theme={null}
import json
from typing import Optional


class MLFLowTimeGPTModel(mlflow.pyfunc.PythonModel):
    """
    Unified MLflow pyfunc wrapper for TimeGPT

    Can perform forecasting, cross-validation, and anomaly detection
    based on the 'operation' parameter in the input.
    """

    def __init__(
        self,
        client: NixtlaClient,
        model: str = "timegpt-2-mini",
        default_h: int = 96,
        default_freq: str = "15min",
        default_level: Optional[list] = None,
        default_n_windows: int = 1,
        default_anomaly_level: int = 99,
    ):
        """
        Initialize the unified TimeGPT model wrapper.
Parameters: ----------- model : str TimeGPT model variant (timegpt-2-mini, timegpt-2) default_h : int Default forecast horizon default_freq : str Default frequency default_level : list Default confidence levels for prediction intervals default_n_windows : int Default number of cross-validation windows default_anomaly_level : int Default confidence level for anomaly detection """ self.model = model self.default_h = default_h self.default_freq = default_freq self.default_level = default_level or [80, 90] self.default_n_windows = default_n_windows self.default_anomaly_level = default_anomaly_level self.client = client def load_context(self, context): """ Load the model context. Called once when the model is loaded. Parameters: ----------- context : mlflow.pyfunc.PythonModelContext Context containing artifacts and other model metadata """ # Load configuration from artifacts if present if context.artifacts and "config" in context.artifacts: with open(context.artifacts["config"], 'r') as f: config = json.load(f) self.model = config.get("model", self.model) self.default_h = config.get("h", self.default_h) self.default_freq = config.get("freq", self.default_freq) self.default_level = config.get("level", self.default_level) self.default_n_windows = config.get("n_windows", self.default_n_windows) self.default_anomaly_level = config.get("anomaly_level", self.default_anomaly_level) def predict(self, context, model_input): """Perform operation""" # Parse input if isinstance(model_input, dict): df = model_input.get('data') operation = model_input.get('operation', 'forecast') time_col = model_input.get('time_col', 'ds') target_col = model_input.get('target_col', 'y') id_col = model_input.get('id_col', 'unique_id') model_input.pop('time_col', None) model_input.pop('target_col', None) model_input.pop('id_col', None) else: # If just DataFrame is passed, default to forecast df = model_input operation = 'forecast' id_col = 'unique_id' time_col = 'ds' target_col = 'y' # Validate input 
if df is None or not isinstance(df, pd.DataFrame): raise ValueError("Input must contain a pandas DataFrame under 'data' key") # Route to appropriate operation if operation == 'forecast': return self._forecast(df, model_input, id_col, time_col, target_col) elif operation == 'cross_validation': return self._cross_validation(df, model_input, id_col, time_col, target_col) elif operation == 'anomaly_detection': return self._anomaly_detection(df, model_input, id_col, time_col, target_col) else: raise ValueError(f"Unknown operation: {operation}. Must be 'forecast', 'cross_validation', or 'anomaly_detection'") def _forecast(self, df, params, id_col, time_col, target_col): """Perform forecasting operation.""" h = params.get('h', self.default_h) if isinstance(params, dict) else self.default_h freq = params.get('freq', self.default_freq) if isinstance(params, dict) else self.default_freq level = params.get('level', self.default_level) if isinstance(params, dict) else self.default_level # Extract additional parameters additional_params = {} if isinstance(params, dict): exclude_keys = {'data', 'operation', 'h', 'freq', 'level', 'time_col', 'target_col'} additional_params = {k: v for k, v in params.items() if k not in exclude_keys} result = self.client.forecast( df=df, h=h, freq=freq, level=level, id_col=id_col, time_col=time_col, target_col=target_col, model=self.model, **additional_params ) return result def _cross_validation(self, df, params, id_col, time_col, target_col): """Perform cross-validation operation.""" h = params.get('h', self.default_h) if isinstance(params, dict) else self.default_h n_windows = params.get('n_windows', self.default_n_windows) if isinstance(params, dict) else self.default_n_windows freq = params.get('freq', self.default_freq) if isinstance(params, dict) else self.default_freq level = params.get('level', self.default_level) if isinstance(params, dict) else self.default_level step_size = params.get('step_size') if isinstance(params, dict) else None 
# Extract additional parameters additional_params = {} if isinstance(params, dict): exclude_keys = {'data', 'operation', 'h', 'n_windows', 'freq', 'level', 'step_size', 'time_col', 'target_col'} additional_params = {k: v for k, v in params.items() if k not in exclude_keys} result = self.client.cross_validation( df=df, h=h, n_windows=n_windows, step_size=step_size, freq=freq, level=level, id_col=id_col, time_col=time_col, target_col=target_col, model=self.model, **additional_params ) return result def _anomaly_detection(self, df, params, id_col, time_col, target_col): """Perform anomaly detection operation.""" h = params.get('h', self.default_h) if isinstance(params, dict) else self.default_h detection_size = params.get("detection_size", self.default_h) if isinstance(params, dict) else self.default_h threshold_method = params.get("threshold_method", "univariate") if isinstance(params, dict) else "univariate" freq = params.get('freq', self.default_freq) if isinstance(params, dict) else self.default_freq level = params.get('level', self.default_anomaly_level) if isinstance(params, dict) else self.default_anomaly_level # Extract additional parameters additional_params = {} if isinstance(params, dict): exclude_keys = {'data', 'operation', 'h', 'detection_size', 'threshold_method', 'freq', 'level', 'time_col', 'target_col'} additional_params = {k: v for k, v in params.items() if k not in exclude_keys} result = self.client.detect_anomalies_online( df=df, h=h, detection_size=detection_size, threshold_method=threshold_method, freq=freq, level=level, time_col=time_col, id_col=id_col, target_col=target_col, model=self.model, **additional_params ) return result def save_unified_model( client: NixtlaClient, model_path: str, model_variant: str = "timegpt-2-mini", **default_params ): """ Save a unified TimeGPT model that can perform all operations. 
Parameters: ----------- client : NixtlaClient Initialized Nixtla client model_path : str Path where the model will be saved model_variant : str TimeGPT model variant to use **default_params : dict Default parameters for operations """ python_model = MLFLowTimeGPTModel( client=client, model=model_variant, **default_params ) # Save the model mlflow.pyfunc.save_model( path=model_path, python_model=python_model, ) ``` ### Sample usage First, you must instantiate your client using either: ```python theme={null} from nixtla import NixtlaClient nixtla_client = NixtlaClient(api_key='your_api_key_here') ``` Or, if you are using a Python wheel: ```python theme={null} from api.serverless import make_client client = make_client() ``` Then, load your data. Here, we use the simple air passengers dataset. ```python theme={null} df = pd.read_csv( 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv' ) df.columns = ['ds', 'y'] df["unique_id"] = 0 df = df[["unique_id", "ds", "y"]] ``` Then, save the model with: ```python theme={null} save_unified_model(client=client, model_path="test_model", model_variant="timegpt-2-mini", default_h=h, default_freq=freq) ``` Note that if you want to use another variant of TimeGPT, say `timegpt-2`, then you must save another instance and specify that model variant. Now, you can perform forecasting, cross-validation and anomaly detection with the saved model in MLFlow. 
```python theme={null} # Load model model = mlflow.pyfunc.load_model("test_model") # Forecast forecast = model.predict({'data': df, 'operation': 'forecast', 'h': h, 'id_col': id_col, 'time_col': time_col, 'target_col': target_col, 'freq': freq}) # Cross-validation cv = model.predict({'data': df, 'operation': 'cross_validation', 'h': h, 'n_windows': 1, 'id_col': id_col, 'time_col': time_col, 'target_col': target_col, 'freq': freq}) # Anomaly detection anomalies = model.predict({'data': df, 'operation': 'anomaly_detection', 'detection_size': h, 'id_col': id_col, 'time_col': time_col, 'target_col': target_col, 'freq': freq}) ``` # What-If Forecasting: Price Effects in Retail Source: https://nixtla.io/docs/use_cases/what_if_forecasting_price_effects_in_retail Master what-if forecasting with TimeGPT for retail pricing optimization. Learn scenario analysis to predict demand changes from price adjustments using the M5 dataset. Step-by-step Python tutorial. ## Introduction Pricing decisions significantly impact retail demand. [TimeGPT](/docs/introduction/about_timegpt) makes it possible to forecast product demand while incorporating price as a key factor, enabling retailers to evaluate how different pricing scenarios might affect sales. This approach offers valuable insights for strategic pricing decisions. This tutorial demonstrates how to use TimeGPT for scenario analysis by forecasting demand under various pricing conditions. You'll learn to incorporate price data into forecasts and compare different pricing strategies to understand their impact on consumer demand. 
### What You'll Learn * How to forecast retail demand using price as an [exogenous variable](/docs/forecasting/exogenous-variables/numeric_features) * How to run what-if scenarios with different pricing strategies * How to compare baseline, increased, and decreased price forecasts * How to interpret price sensitivity in demand forecasts ## How to Forecast Sales with Pricing Scenarios [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/use-cases/5_what_if_pricing_scenarios_in_retail.ipynb) ### Step 1: Import required packages Import the packages needed for this tutorial and initialize your Nixtla client: ```python theme={null} import pandas as pd import os from nixtla import NixtlaClient ``` Initialize the Nixtla client: ```python theme={null} nixtla_client = NixtlaClient( api_key='my_api_key_provided_by_nixtla' ) ``` ### Step 2: Load the M5 dataset Let's see an example on predicting sales of products of the [M5 dataset](https://nixtlaverse.nixtla.io/datasetsforecast/m5.html). The M5 dataset contains daily product demand (sales) for 10 retail stores in the US. First, we load the data using `datasetsforecast`. This returns: * `Y_df`, containing the sales (`y` column), for each unique product (`unique_id` column) at every timestamp (`ds` column). * `X_df`, containing additional relevant information for each unique product (`unique_id` column) at every timestamp (`ds` column). 
```python theme={null} from datasetsforecast.m5 import M5 Y_df, X_df, S_df = M5.load(directory=os.getcwd()) Y_df.head(10) ``` | unique\_id | ds | y | | -------------------- | ---------- | --- | | FOODS\_1\_001\_CA\_1 | 2011-01-29 | 3.0 | | FOODS\_1\_001\_CA\_1 | 2011-01-30 | 0.0 | | FOODS\_1\_001\_CA\_1 | 2011-01-31 | 0.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-01 | 1.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-02 | 4.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-03 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-04 | 0.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-05 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-06 | 0.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-07 | 0.0 | For this example, we will only keep the additional relevant information from the column `sell_price`. This column shows the selling price of the product, and we expect demand to fluctuate given a different selling price. ```python theme={null} X_df = X_df[['unique_id', 'ds', 'sell_price']] X_df.head(10) ``` | unique\_id | ds | sell\_price | | -------------------- | ---------- | ----------- | | FOODS\_1\_001\_CA\_1 | 2011-01-29 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-01-30 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-01-31 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-01 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-02 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-03 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-04 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-05 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-06 | 2.0 | | FOODS\_1\_001\_CA\_1 | 2011-02-07 | 2.0 | ### Step 3: Forecast demand using price as an exogenous variable In this example, we forecast for a single product (`FOODS_1_129_`) across all 10 stores. This product exhibits frequent price changes, making it ideal for modeling price effects on demand. Learn more about using [exogenous variables in TimeGPT](/docs/forecasting/exogenous-variables/numeric_features). 
```python theme={null}
products = [
    'FOODS_1_129_CA_1',
    'FOODS_1_129_CA_2',
    'FOODS_1_129_CA_3',
    'FOODS_1_129_CA_4',
    'FOODS_1_129_TX_1',
    'FOODS_1_129_TX_2',
    'FOODS_1_129_TX_3',
    'FOODS_1_129_WI_1',
    'FOODS_1_129_WI_2',
    'FOODS_1_129_WI_3'
]
Y_df_product = Y_df.query('unique_id in @products')
X_df_product = X_df.query('unique_id in @products')
```

Merge the sales (`y`) and price (`sell_price`) data into one DataFrame:

```python theme={null}
df = Y_df_product.merge(X_df_product)
df.head(10)
```

| unique\_id           | ds         | y   | sell\_price |
| -------------------- | ---------- | --- | ----------- |
| FOODS\_1\_129\_CA\_1 | 2011-02-01 | 1.0 | 6.22        |
| FOODS\_1\_129\_CA\_1 | 2011-02-02 | 0.0 | 6.22        |
| FOODS\_1\_129\_CA\_1 | 2011-02-03 | 0.0 | 6.22        |
| FOODS\_1\_129\_CA\_1 | 2011-02-04 | 0.0 | 6.22        |
| FOODS\_1\_129\_CA\_1 | 2011-02-05 | 1.0 | 6.22        |
| FOODS\_1\_129\_CA\_1 | 2011-02-06 | 0.0 | 6.22        |
| FOODS\_1\_129\_CA\_1 | 2011-02-07 | 0.0 | 6.22        |
| FOODS\_1\_129\_CA\_1 | 2011-02-08 | 0.0 | 6.22        |
| FOODS\_1\_129\_CA\_1 | 2011-02-09 | 0.0 | 6.22        |
| FOODS\_1\_129\_CA\_1 | 2011-02-10 | 3.0 | 6.22        |

Let's investigate how the demand (our target `y`) for these products has evolved over the last year of data.

```python theme={null}
nixtla_client.plot(df, unique_ids=products, max_insample_length=365)
```

![Historical retail demand showing intermittent sales patterns across 10 stores for FOODS\_1\_129 product](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/5_what_if_pricing_scenarios_in_retail_files/figure-markdown_strict/cell-15-output-1.png)

We see that in the California stores (those with `CA_` in their identifier), the product has sold intermittently, whereas in the other regions (TX and WI) sales were less intermittent. Note that the plot only shows 8 (out of 10) stores.

Next, we look at the `sell_price` of these products across the entire data available.
```python theme={null} nixtla_client.plot(df, unique_ids=products, target_col='sell_price') ``` ![Historical pricing trends showing approximately 20 price changes from 2011 to 2016 for retail products](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/5_what_if_pricing_scenarios_in_retail_files/figure-markdown_strict/cell-16-output-1.png) We find that there have been relatively few price changes (about 20 in total) over the period 2011 to 2016. Let's turn to our forecasting task. We will forecast the last 28 days in the dataset. To use the `sell_price` exogenous variable in TimeGPT, we have to add it as future values. Therefore, we create a future values dataframe, that contains the `unique_id`, the timestamp `ds`, and `sell_price`. ```python theme={null} future_ex_vars_df = df.drop(columns = ['y']) future_ex_vars_df = future_ex_vars_df.query("ds >= '2016-05-23'") future_ex_vars_df.head(10) ``` | unique\_id | ds | sell\_price | | -------------------- | ---------- | ----------- | | FOODS\_1\_129\_CA\_1 | 2016-05-23 | 5.74 | | FOODS\_1\_129\_CA\_1 | 2016-05-24 | 5.74 | | FOODS\_1\_129\_CA\_1 | 2016-05-25 | 5.74 | | FOODS\_1\_129\_CA\_1 | 2016-05-26 | 5.74 | | FOODS\_1\_129\_CA\_1 | 2016-05-27 | 5.74 | | FOODS\_1\_129\_CA\_1 | 2016-05-28 | 5.74 | | FOODS\_1\_129\_CA\_1 | 2016-05-29 | 5.74 | | FOODS\_1\_129\_CA\_1 | 2016-05-30 | 5.74 | | FOODS\_1\_129\_CA\_1 | 2016-05-31 | 5.74 | | FOODS\_1\_129\_CA\_1 | 2016-06-01 | 5.74 | Next, we limit our input dataframe to all but the 28 forecast days: ```python theme={null} df_train = df.query("ds < '2016-05-23'") df_train.tail(10) ``` | unique\_id | ds | y | sell\_price | | -------------------- | ---------- | --- | ----------- | | FOODS\_1\_129\_WI\_3 | 2016-05-13 | 3.0 | 7.23 | | FOODS\_1\_129\_WI\_3 | 2016-05-14 | 1.0 | 7.23 | | FOODS\_1\_129\_WI\_3 | 2016-05-15 | 2.0 | 7.23 | | FOODS\_1\_129\_WI\_3 | 2016-05-16 | 3.0 | 7.23 | | FOODS\_1\_129\_WI\_3 | 2016-05-17 | 1.0 | 7.23 | | 
FOODS\_1\_129\_WI\_3 | 2016-05-18 | 2.0 | 7.23 | | FOODS\_1\_129\_WI\_3 | 2016-05-19 | 3.0 | 7.23 | | FOODS\_1\_129\_WI\_3 | 2016-05-20 | 1.0 | 7.23 | | FOODS\_1\_129\_WI\_3 | 2016-05-21 | 0.0 | 7.23 | | FOODS\_1\_129\_WI\_3 | 2016-05-22 | 0.0 | 7.23 | Now, we can generate forecasts using TimeGPT (28 days ahead): ```python theme={null} timegpt_fcst_df = nixtla_client.forecast( df=df_train, X_df=future_ex_vars_df, h=28 ) timegpt_fcst_df.head() ``` | unique\_id | ds | TimeGPT | | -------------------- | ---------- | -------- | | FOODS\_1\_129\_CA\_1 | 2016-05-23 | 0.875594 | | FOODS\_1\_129\_CA\_1 | 2016-05-24 | 0.777731 | | FOODS\_1\_129\_CA\_1 | 2016-05-25 | 0.786871 | | FOODS\_1\_129\_CA\_1 | 2016-05-26 | 0.828223 | | FOODS\_1\_129\_CA\_1 | 2016-05-27 | 0.791228 | We plot the forecast, the actuals and the last 28 days before the forecast period: ```python theme={null} nixtla_client.plot( df[['unique_id', 'ds', 'y']], timegpt_fcst_df, max_insample_length=56 ) ``` ![TimeGPT baseline forecast showing actual demand and 28-day ahead predictions for retail products](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/5_what_if_pricing_scenarios_in_retail_files/figure-markdown_strict/cell-20-output-1.png) ### Step 4: What-If Scenario Forecasting with Price Changes What happens when we change the price of the products in our forecast period? Let's see how our forecast changes when we increase and decrease the `sell_price` by 5%. ```python theme={null} price_change = 0.05 future_ex_vars_df_plus = future_ex_vars_df.copy() future_ex_vars_df_plus["sell_price"] *= (1 + price_change) future_ex_vars_df_minus = future_ex_vars_df.copy() future_ex_vars_df_minus["sell_price"] *= (1 - price_change) ``` Let's create a new set of forecasts with TimeGPT. 
```python theme={null}
# Pass the future values via X_df, as in the baseline forecast
timegpt_fcst_df_plus = nixtla_client.forecast(df=df_train, X_df=future_ex_vars_df_plus, h=28)
timegpt_fcst_df_minus = nixtla_client.forecast(df=df_train, X_df=future_ex_vars_df_minus, h=28)
```

Rename and combine the scenario forecasts:

```python theme={null}
# Column names for the two scenarios, e.g. 'TimeGPT-sell_price_plus_5%'
plus_col = f'TimeGPT-sell_price_plus_{price_change * 100:.0f}%'
minus_col = f'TimeGPT-sell_price_minus_{price_change * 100:.0f}%'

timegpt_fcst_df_plus = timegpt_fcst_df_plus.rename(columns={'TimeGPT': plus_col})
timegpt_fcst_df_minus = timegpt_fcst_df_minus.rename(columns={'TimeGPT': minus_col})

# Append the scenario columns to the baseline forecast
timegpt_fcst_df = pd.concat(
    [timegpt_fcst_df, timegpt_fcst_df_plus[plus_col], timegpt_fcst_df_minus[minus_col]],
    axis=1
)
timegpt_fcst_df.head(10)
```

| unique\_id           | ds         | TimeGPT  | TimeGPT-sell\_price\_plus\_5% | TimeGPT-sell\_price\_minus\_5% |
| -------------------- | ---------- | -------- | ----------------------------- | ------------------------------ |
| FOODS\_1\_129\_CA\_1 | 2016-05-23 | 0.875594 | 0.847006                      | 1.370029                       |
| FOODS\_1\_129\_CA\_1 | 2016-05-24 | 0.777731 | 0.749142                      | 1.272166                       |
| FOODS\_1\_129\_CA\_1 | 2016-05-25 | 0.786871 | 0.758283                      | 1.281306                       |
| FOODS\_1\_129\_CA\_1 | 2016-05-26 | 0.828223 | 0.799635                      | 1.322658                       |
| FOODS\_1\_129\_CA\_1 | 2016-05-27 | 0.791228 | 0.762640                      | 1.285663                       |
| FOODS\_1\_129\_CA\_1 | 2016-05-28 | 0.819133 | 0.790545                      | 1.313568                       |
| FOODS\_1\_129\_CA\_1 | 2016-05-29 | 0.839992 | 0.811404                      | 1.334427                       |
| FOODS\_1\_129\_CA\_1 | 2016-05-30 | 0.843070 | 0.814481                      | 1.337505                       |
| FOODS\_1\_129\_CA\_1 | 2016-05-31 | 0.833089 | 0.804500                      | 1.327524                       |
| FOODS\_1\_129\_CA\_1 | 2016-06-01 | 0.855032 | 0.826443                      | 1.349467                       |

As expected, forecast demand rises when we reduce the price and falls when we increase it: a cheaper product leads to higher sales and vice versa.

Finally, let's plot the forecasts for the different pricing scenarios to see how TimeGPT's demand forecast changes when the price of a set of products is changed.
```python theme={null}
nixtla_client.plot(
    df[['unique_id', 'ds', 'y']],
    timegpt_fcst_df,
    max_insample_length=56
)
```

![What-if scenario comparison: baseline, +5% price increase, and -5% price decrease demand forecasts](https://raw.githubusercontent.com/Nixtla/nixtla/readme_docs/nbs/_docs/docs/use-cases/5_what_if_pricing_scenarios_in_retail_files/figure-markdown_strict/cell-24-output-1.png)

The graphs show that for some products and periods the discount increases expected demand, while for other products and periods the price change has a smaller effect on total demand.

## Conclusion

What-if forecasting with TimeGPT enables data-driven pricing decisions by:

* Modeling demand sensitivity to price changes
* Comparing multiple pricing scenarios simultaneously
* Incorporating exogenous variables for realistic predictions

This scenario analysis approach helps retailers optimize pricing strategies and maximize revenue while understanding demand elasticity.

### Next Steps

* Explore [intermittent demand forecasting](/docs/use_cases/forecasting_intermittent_demand) with TimeGPT
* Learn about [fine-tuning models](/docs/forecasting/fine-tuning/steps) for better accuracy
* Understand [cross-validation](/docs/forecasting/evaluation/cross_validation) for model evaluation
* Scale forecasts with [distributed computing](/docs/forecasting/forecasting-at-scale/computing_at_scale)

### Important Considerations

* This method assumes that historical demand and price behavior are predictive of future demand, and it omits other factors that affect demand. To include those factors, add further exogenous variables that give the model more context about what influences demand.
* This method is sensitive to unmodeled events that affect demand, such as sudden market shifts. To account for them, add exogenous variables that flag such shifts, provided similar shifts have also been observed in the past.
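
The considerations above suggest adding further exogenous variables. The sketch below shows one way to prepare such data with pandas, assuming (as with `sell_price` in this tutorial) that every exogenous column in the history must also be supplied as future values. The `is_event` flag, the three-day horizon, and all values are hypothetical and serve only as an illustration; no TimeGPT call is made.

```python
import pandas as pd

# Toy history for a single series. `is_event` is a hypothetical 0/1 flag
# marking days affected by a known shock (for example, a promotion).
dates = pd.date_range("2016-05-16", periods=7, freq="D")
df_train = pd.DataFrame({
    "unique_id": "FOODS_1_129_CA_1",
    "ds": dates,
    "y": [1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 1.0],  # made-up demand values
    "sell_price": 5.74,
    "is_event": [0, 0, 0, 0, 1, 0, 0],
})

# Future values: the same exogenous columns, without the target `y`,
# covering the h days that follow the history.
h = 3  # illustrative horizon
future_dates = pd.date_range(dates[-1] + pd.Timedelta(days=1), periods=h, freq="D")
future_ex_vars_df = pd.DataFrame({
    "unique_id": "FOODS_1_129_CA_1",
    "ds": future_dates,
    "sell_price": 5.74,
    "is_event": [0, 1, 0],  # assumed: an event is planned on the second day
})

# Sanity checks before forecasting: no exogenous column is missing from
# the future frame, and each series has exactly h future rows.
exog_cols = [c for c in df_train.columns if c not in ("unique_id", "ds", "y")]
missing = set(exog_cols) - set(future_ex_vars_df.columns)
rows_per_series = future_ex_vars_df.groupby("unique_id").size()

print("exogenous columns:", exog_cols)
print("missing from future frame:", missing)
print("all series have h rows:", bool((rows_per_series == h).all()))
```

Frames prepared this way would be passed as `df` and `X_df` in the same way as in the forecasting steps above. A what-if scenario on the event flag then only requires editing `is_event` in a copy of the future-values dataframe.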