# Key Concepts
Source: https://nixtla.io/docs/about/key-concepts
Understanding the foundations of time series forecasting with TimeGPT
These key concepts cover the foundations of time series data, how forecasts
are generated, and the role of TimeGPT in predicting future values and
detecting anomalies.
Use these concepts as a reference to better understand how TimeGPT simplifies
tasks such as demand forecasting, anomaly detection, and multi-series
forecasting.
- **Time Series**: A sequence of numerical data points arranged in chronological order.
- **Forecasting**: Predicting future values by analyzing historical data and patterns.
- **Anomaly Detection**: Identifying unusual or unexpected events that deviate from typical behavior.
- **Multiple Series**: Managing and forecasting multiple time series at once.
- **TimeGPT**: Nixtla's generative pre-trained model for time series forecasting.
- **Historical windows (tokens)**: Segments of historical data that inform TimeGPT's forecasting process.
## Time Series
A time series is a sequence of numerical data points arranged in chronological order. In the context of TimeGPT, the observations in a series serve as input to the model, which identifies patterns in the data and uses them to forecast future values. Time series data appear in many domains, such as stock prices, weather recordings, and sales figures.
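As a minimal sketch of what "chronological order" means in practice, the example below stores one series as rows pairing a timestamp `ds` with a value `y`, the column names Nixtla's libraries use by default. Plain Python dictionaries stand in for a DataFrame here, and the sales values are made-up illustrative numbers.

```python
from datetime import date, timedelta

# Hypothetical daily sales figures (illustrative values, not real data).
values = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0]
start = date(2024, 1, 1)

# Each observation pairs a timestamp (`ds`) with a numeric value (`y`).
series = [{"ds": start + timedelta(days=i), "y": v} for i, v in enumerate(values)]


def is_chronological(rows):
    """Return True if timestamps strictly increase from one row to the next."""
    return all(a["ds"] < b["ds"] for a, b in zip(rows, rows[1:]))


print(is_chronological(series))  # True
```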
## Forecasting
Forecasting is a method used in many fields, such as business and environmental studies, to predict future outcomes based on historical information. It involves analyzing past data to detect patterns, trends, or recurring behaviors and extending these insights into the future.
One significant advancement in forecasting is the application of modern
machine-learning methods, including deep learning. Models like TimeGPT can
handle large datasets and identify complex patterns, which can improve
prediction accuracy.
For example, a retailer might analyze past sales to forecast product demand, while an economist might use historical data to anticipate future economic conditions. TimeGPT makes these advanced capabilities accessible even to users without in-depth machine-learning expertise.
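To make "extending past patterns into the future" concrete, here is a minimal sketch of a seasonal naive forecast, a classical baseline that simply repeats the most recent seasonal cycle. This is not how TimeGPT produces forecasts; it only illustrates the idea of projecting a recurring pattern forward. The function name and the two weeks of sales data are hypothetical.

```python
def seasonal_naive(history, season_length, h):
    """Forecast h steps ahead by repeating the last observed season.

    A classical baseline (not TimeGPT's method): it assumes the most
    recent seasonal pattern will recur unchanged.
    """
    if season_length <= 0 or len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(h)]


# Hypothetical daily sales with a weekly pattern (two weeks of data).
sales = [10, 12, 14, 13, 20, 25, 18,
         11, 13, 15, 14, 21, 26, 19]
forecast = seasonal_naive(sales, season_length=7, h=10)
print(forecast)  # [11, 13, 15, 14, 21, 26, 19, 11, 13, 15]
```

Baselines like this are useful as a yardstick: a learned model is only worth its cost if it beats them on held-out data.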

## Anomaly Detection
Analyzing sequential data often requires identifying anomalies or unexpected events that deviate from standard patterns. TimeGPT supports anomaly detection by monitoring data sequences (such as daily temperatures) for unusual fluctuations.
Detecting anomalies is crucial for timely responses. Sudden changes in market
behavior, unusual network activity, or abnormal sensor readings can all
indicate a need for prompt investigation.
For example, in finance, TimeGPT can highlight abrupt market changes; in cybersecurity, it can help uncover suspicious network activity. Anomaly detection complements forecasting by flagging significant outliers that may need investigation.
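The sketch below illustrates the general idea of flagging values that deviate from recent behavior, using a rolling z-score: each point is compared with the mean and standard deviation of the preceding window. This is a simple statistical technique swapped in for illustration, not TimeGPT's anomaly-detection algorithm; the function name, window size, threshold, and temperature readings are all assumptions.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window, threshold=3.0):
    """Return indices whose value deviates from the trailing window's mean
    by more than `threshold` standard deviations (illustrative technique)."""
    anomalies = []
    for i in range(window, len(values)):
        past = values[i - window:i]
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # flat history: the z-score is undefined, so skip
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily temperatures (°C) with one abrupt spike at index 10.
temps = [21.0, 21.5, 22.0, 21.8, 22.3, 21.9, 22.1, 22.4, 21.7, 22.0, 30.5, 22.2]
print(rolling_zscore_anomalies(temps, window=7))  # [10]
```

Note that the spike inflates the window statistics for the points that follow it, which is one reason production systems usually compare observations against a model's forecast rather than raw trailing statistics.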

## Multiple Series
TimeGPT provides robust support for multi-series forecasting, allowing simultaneous analysis of multiple time series. You can forecast many related series at once and fine-tune the model on them, which can improve accuracy and tailor the model to specific forecasting requirements.
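A common way to hold many series together is the "long" format: one row per (series, timestamp), with a `unique_id` column identifying which series each row belongs to, alongside `ds` and `y`. The minimal sketch below, which assumes that column convention and uses hypothetical store data, splits such rows into individual series and forecasts each with a naive last-value baseline. It only illustrates the data layout; it is not how TimeGPT models multiple series.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (series, timestamp).
rows = [
    {"unique_id": "store_a", "ds": "2024-01-01", "y": 100.0},
    {"unique_id": "store_b", "ds": "2024-01-01", "y": 40.0},
    {"unique_id": "store_a", "ds": "2024-01-02", "y": 110.0},
    {"unique_id": "store_b", "ds": "2024-01-02", "y": 42.0},
    {"unique_id": "store_a", "ds": "2024-01-03", "y": 105.0},
    {"unique_id": "store_b", "ds": "2024-01-03", "y": 45.0},
]


def split_series(long_rows):
    """Group long-format rows into one chronologically sorted list per series."""
    groups = defaultdict(list)
    for row in long_rows:
        groups[row["unique_id"]].append((row["ds"], row["y"]))
    # ISO-8601 date strings sort chronologically, so a plain sort suffices.
    return {uid: [y for _, y in sorted(obs)] for uid, obs in groups.items()}


def naive_forecast_all(long_rows, h):
    """Forecast every series with a naive baseline (repeat the last value)."""
    return {uid: [ys[-1]] * h for uid, ys in split_series(long_rows).items()}


print(naive_forecast_all(rows, h=2))
# {'store_a': [105.0, 105.0], 'store_b': [45.0, 45.0]}
```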

## TimeGPT
TimeGPT by Nixtla is a generative pre-trained model designed specifically for time series forecasting. It uses historical series values (and optional exogenous variables) to generate predictions. Beyond general forecasting, TimeGPT supports tasks such as anomaly detection and financial forecasting.
TimeGPT scans time series data similarly to how a person might read text:
sequentially, from left to right. It can interpret historical windows (tokens)
and leverage temporal patterns learned from billions of data points.
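The idea of reading a series left to right in historical windows can be illustrated by slicing a series into fixed-length context windows, each paired with the values that follow it. This is a minimal, generic sketch of windowing under our own assumptions; it does not describe TimeGPT's internal tokenization, and `make_windows`, `input_size`, and `horizon` are hypothetical names.

```python
def make_windows(series, input_size, horizon=1):
    """Split a series into (context window, target) pairs.

    Each context holds `input_size` consecutive observations read left to
    right; the target is the next `horizon` values that follow it.
    """
    pairs = []
    for start in range(len(series) - input_size - horizon + 1):
        context = series[start:start + input_size]
        target = series[start + input_size:start + input_size + horizon]
        pairs.append((context, target))
    return pairs


values = [1, 2, 3, 4, 5, 6]
for context, target in make_windows(values, input_size=3, horizon=1):
    print(context, "->", target)
# [1, 2, 3] -> [4]
# [2, 3, 4] -> [5]
# [3, 4, 5] -> [6]
```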
With the TimeGPT API, you can apply these forecasting capabilities to use cases ranging from scenario planning to anomaly detection.

## Get Started with TimeGPT
Now that you understand the key concepts, you're ready to start using TimeGPT for your forecasting needs.
- Learn more about TimeGPT and how it can transform your time series analysis.
- Get up and running with TimeGPT in minutes with our step-by-step guide.
# Privacy Notice
Source: https://nixtla.io/docs/about/privacy-notice
Details on how Nixtla collects, uses, and protects your personal information.
We at Nixtla Inc. (together with our affiliates, “**Nixtla**”, “**we**”, “**our**” or “**us**”) respect your privacy and are strongly committed to keeping secure any information we obtain from you or about you. This Privacy Policy describes our practices with respect to Personal Information we collect from or about you when you use our website, applications, and services (collectively, “**Services**”). This Privacy Policy does not apply to content that we process on behalf of customers of our business offerings, such as our API. Our use of that data is governed by our customer agreements covering access to and use of those offerings.
# 1. Personal Information we collect
We collect personal information relating to you (“**Personal Information**”) as follows:
**Personal Information You Provide**: We collect Personal Information if you create an account to use our Services or communicate with us as follows:
**Account Information**: When you create an account with us, we will collect information associated with your account, including your name, contact information, account credentials, payment card information, and transaction history (collectively, “**Account Information**”).
**User Content**: When you use our Services, we collect Personal Information that is included in the input, file uploads, or feedback that you provide to our Services (“**Content**”).
**Communication Information**: If you communicate with us, we collect your name, contact information, and the contents of any messages you send (“**Communication Information**”).
**Social Media Information**: We have pages on social media sites like Medium, Twitter, YouTube, and LinkedIn. When you interact with our social media pages, we will collect Personal Information that you elect to provide to us, such as your contact details (collectively, “**Social Information**”). In addition, the companies that host our social media pages may provide us with aggregate information and analytics about our social media activity.
**Personal Information We Receive Automatically From Your Use of the Services**: When you visit, use, or interact with the Services, we receive the following information about your visit, use, or interactions (“**Technical Information**”):
**Log Data**: Information that your browser automatically sends when you use our Services. Log data includes your Internet Protocol address, browser type and settings, the date and time of your request, and how you interact with our website.
**Usage Data**: We may automatically collect information about your use of the Services, such as the types of content that you view or engage with, the features you use, and the actions you take, as well as your time zone, country, the dates and times of access, user agent and version, type of computer or mobile device, and your computer connection.
**Device Information**: Includes name of the device, operating system, device identifiers, and browser you are using. Information collected may depend on the type of device you use and its settings.
**Cookies**: We use cookies to operate and administer our Services and to improve your experience.
A “cookie” is a piece of information sent to your browser by a website you visit. You can set your browser to accept all cookies, to reject all cookies, or to notify you whenever a cookie is offered so that you can decide each time whether to accept it. However, refusing a cookie may in some cases preclude you from using, or negatively affect the display or function of, a website or certain areas or features of a website. For more details on cookies, please visit All About Cookies.
**Analytics**: We may use a variety of online analytics products that use cookies to help us analyze how users use our Services and enhance your experience when you use the Services.
# 2. We may use Personal Information for the following purposes:
1. To provide, administer, maintain, and/or analyze the Services;
2. To improve our Services and conduct research;
3. To communicate with you;
4. To develop new programs and services;
5. To prevent fraud, criminal activity, or misuses of our Services, and to protect the security of our IT systems, architecture, and networks;
6. To carry out business transfers; and
7. To comply with legal obligations and legal processes and to protect our rights, privacy, safety, or property, and/or that of our affiliates, you, or other third parties.
**Aggregated or De-Identified Information**. We may aggregate or de-identify Personal Information so that it may no longer be used to identify you and use such information to analyze the effectiveness of our Services, to improve and add features to our Services, to conduct research and for other similar purposes. In addition, from time to time, we may analyze the general behavior and characteristics of users of our Services and share aggregated information like general user statistics with third parties, publish such aggregated information or make such aggregated information generally available. We may collect aggregated information through the Services, through cookies, and through other means described in this Privacy Policy. We will maintain and use de-identified information in anonymous or de-identified form and we will not attempt to reidentify the information, unless required by law.
As noted above, we may use Content you provide us to improve our Services, for example to train the models that power TimeGPT. Fill out [this form](https://forms.gle/rvF58qkNCt2uNjSX8) to opt out of our use of your Content to train our models.
# 3. Disclosure of personal information
In certain circumstances we may provide your Personal Information to third parties without further notice to you, unless required by law:
**Vendors and Service Providers**. To assist us in meeting business operations needs and to perform certain services and functions, we may provide Personal Information to vendors and service providers, including providers of hosting services, cloud services, and other information technology services providers, email communication software, and web analytics services, among others. Pursuant to our instructions, these parties will access, process, or store Personal Information only in the course of performing their duties to us.
**Business Transfers**. If we are involved in strategic transactions, reorganization, bankruptcy, receivership, or transition of service to another provider (collectively, a “**Transaction**”), your Personal Information and other information may be disclosed in the diligence process with counterparties and others assisting with the Transaction and transferred to a successor or affiliate as part of that Transaction along with other assets.
**Legal Requirements**. We may share your Personal Information, including information about your interaction with our Services, with government authorities, industry peers, or other third parties (i) if required to do so by law or in the good faith belief that such action is necessary to comply with a legal obligation; (ii) to protect and defend our rights or property; (iii) if we determine, in our sole discretion, that there is a violation of our terms, policies, or the law; (iv) to detect or prevent fraud or other illegal activity; (v) to protect the safety, security, and integrity of our products, employees, or users, or the public; or (vi) to protect against legal liability.
**Affiliates**. We may disclose Personal Information to our affiliates, meaning an entity that controls, is controlled by, or is under common control with Nixtla. Our affiliates may use the Personal Information we share in a manner consistent with this Privacy Policy.
# 4. Your choices and controls
Depending on where you live, you may have the right to exercise certain controls and choices regarding our collection, use, and sharing of your Personal Information. To opt out of marketing communications, please email us at [support@nixtla.io](mailto:support@nixtla.io) or follow the instructions included in the email or text correspondence.
Please note that, even if you unsubscribe from certain correspondence, we may still need to contact you with important transactional or administrative information, as permitted by law. Additionally, if you choose not to provide certain Personal Information, we may be unable to provide some or all of our Services to you.
# 5. Children
Our Services are not directed to children under the age of 13. Nixtla does not knowingly collect Personal Information from children under the age of 13. If you have reason to believe that a child under the age of 13 has provided Personal Information to Nixtla through the Services, please email us at [support@nixtla.io](mailto:support@nixtla.io).
We will investigate any notification and, if appropriate, delete the Personal Information from our systems. If you are 13 or older, but under 18, you must have consent from your parent or guardian to use our Services.
# 6. Links to other websites
The Services may contain links to other websites not operated or controlled by Nixtla, including social media services (“**Third Party Sites**”). The information that you share with Third Party Sites will be governed by the specific privacy policies and terms of service of the Third Party Sites and not by this Privacy Policy. By providing these links we do not imply that we endorse or have reviewed these sites. Please contact the Third Party Sites directly for information on their privacy practices and policies.
# 7. Security and Retention
We implement commercially reasonable technical, administrative, and organizational measures to protect Personal Information both online and offline from loss, misuse, and unauthorized access, disclosure, alteration, or destruction. However, no Internet or email transmission is ever fully secure or error-free. In particular, emails sent to or from us may not be secure. Therefore, you should take special care in deciding what information you send to us via the Services or email. In addition, we are not responsible for circumvention of any privacy settings or security measures contained on the Services, or third-party websites.
We’ll retain your Personal Information for only as long as we need in order to provide our Services to you, or for other legitimate business purposes such as resolving disputes, safety and security reasons, or complying with our legal obligations. How long we retain Personal Information will depend on a number of factors, such as the amount, nature, and sensitivity of the information, the potential risk of harm from unauthorized use or disclosure, our purpose for processing the information, and any legal requirements.
# 8. Changes to the privacy policy
We may update this Privacy Policy from time to time. All changes will be effective immediately upon posting to this page. Material changes will be conspicuously posted on this page or otherwise communicated to you as required by law.
# 9. How to contact us
Please contact us at [support@nixtla.io](mailto:support@nixtla.io) if you have any questions or concerns not already addressed in this Privacy Policy.
# Nixtla
Source: https://nixtla.io/docs/about/sub-categoria
About us
Nixtla is to numbers what Anthropic or OpenAI are to language and images. We are the creators of TimeGPT, a pre-trained model that allows enterprises to upload their data and receive predictions within minutes. This approach saves significant money, development time, and maintenance effort.
TimeGPT was trained on the largest collection of time series data in history—over 100 billion rows across financial, weather, energy, and web data. Nixtla has also built the most comprehensive time series ecosystem, with over 5 million downloads worldwide. Our software is trusted and used in production by leading companies such as Amazon, Walmart, and Lyft.
We are a group of hackers driven by curiosity and a profound desire to make a meaningful impact. With backgrounds ranging from research and development to philosophy, we have united to revolutionize the time series field. We embrace diversity, champion inclusivity, and believe that the future belongs to everyone.
We stand by our roots in Latin America. We are queer, we are different, and we take pride in it. Our shared passion for understanding the world guides us in pushing the boundaries of what’s possible with time series analysis.
## Our Open Source Initiatives
TimeGPT is only one part of our story. Before its creation, Nixtla developed an open-source time series ecosystem that quickly flourished, garnering millions of downloads.

Our thriving open-source community is a testament to the power of collaboration. Join us in building innovative tools for time series analysis.
## Our Origin Story
Nixtla began as a side project. We built tools for a company where we previously worked, and then everyone took different paths: some pursued academic careers, others founded companies, and some focused on shipping products.
We eventually reunited to turn what started as a modest open-source library into the most comprehensive time-series ecosystem. By challenging the status quo and giants like Facebook, Amazon, and Google, we proved how a dedicated group of passionate individuals, powered by open-source software, can successfully compete with major players.
As Nixtla’s usage soared, our community grew, fueling our development. Today, Nixtla is the most impactful time series ecosystem worldwide, relied upon by innovators in both industry and academia.
Recognizing this was only the beginning, we set our sights on a new challenge—pioneering foundation models for time series. This breakthrough helps us share the future of data science with everyone.
## Follow Us
Connect with fellow developers, researchers, and enthusiasts in our
[Slack Channel](https://join.slack.com/t/nixtlacommunity/shared_invite/zt-1pmhan9j5-F54XR20edHk0UtYAPcW4KQ).
Stay up-to-date with the latest Nixtla news and community highlights on
[Twitter](https://twitter.com/nixtlainc).
Be part of our open-source evolution by contributing to Nixtla’s core projects on
[GitHub](https://github.com/Nixtla).

Together, we are not just shaping Nixtla; we are defining the future of data science.
# Terms and Conditions
Source: https://nixtla.io/docs/about/terms-and-conditions
Terms and conditions for using Nixtla Services.
Thank you for using Nixtla's TimeGPT and/or TimeGEN!
These Terms of Use apply when you use the services of Nixtla, Inc. or our affiliates, including our application programming interface, software, tools, developer services, data, documentation, and websites ("**Services**"). The Terms include other terms and conditions, documentation, guidelines, or policies we may provide in writing. By using our Services, you agree to these Terms. Our [Privacy Notice](/docs/about/privacy-notice) explains how we collect and use personal information.
# 1. Registration and Access
You must be at least 13 years old to use the Services. If you are under 18 you must have your parent or legal guardian's permission to use the Services. If you use the Services on behalf of another person or entity, you must have the authority to accept the Terms on their behalf. You must provide accurate and complete information to register for an account. You may not make your access credentials or account available to others outside your organization, and you are responsible for all activities that occur using your credentials.
# 2. Usage Requirements
**(a) Use of Services**. You may access, and we grant you a non-exclusive right to use, the Services in accordance with these Terms. You will comply with these Terms and all applicable laws when using the Services. We and our affiliates own all rights, title, and interest in and to the Services.
**(b) Feedback**. We appreciate feedback, comments, ideas, proposals and suggestions for improvements. If you provide any of these things, we may use them without restriction or compensation to you.
**(c) Restrictions**. You may not (i) use the Services in a way that infringes, misappropriates or violates any person's rights; (ii) reverse assemble, reverse compile, decompile, translate or otherwise attempt to discover the source code or underlying components of models, algorithms, and systems of the Services (except to the extent such restrictions are contrary to applicable law); (iii) use output from the Services to develop models that compete with Nixtla; (iv) except as permitted through the API, use any automated or programmatic method to extract data or output from the Services, including scraping, web harvesting, or web data extraction; (v) represent that output from the Services was human-generated when it is not or otherwise violate our policies; (vi) buy, sell, or transfer API keys without our prior consent; or (vii) send us any personal information of children under 13 or the applicable age of consent. You will comply with any rate limits and other requirements in our documentation. You may use the Services only in geographies currently supported by Nixtla.
**(d) Third Party Services**. Any third party software, services, or other products you use in connection with the Services are subject to their own terms, and we are not responsible for third party products.
# 3. Content
**(a) Your Content**. You may provide input to the Services ("**Input**"), and receive output generated and returned by the Services based on the Input ("**Output**"). Input and Output are collectively "**Content**." As between the parties and to the extent permitted by applicable law, you own all Input. Subject to your compliance with these Terms, Nixtla hereby assigns to you all its rights, title, and interest in and to Output. This means you can use Content for any purpose, including commercial purposes such as sale or publication, if you comply with these Terms. Nixtla may use Content to provide and maintain the Services, comply with applicable law, and enforce our policies. You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms.
**(b) Use of Content to Improve Services**. In order to improve our Services, we may use Content that you provide to or receive from our API ("**API Content**") to develop or improve our Services. We may use Content from Services other than our API ("**Non-API Content**") to help develop and improve our Services.
Nixtla may use aggregated, de-identified data to enhance and operate the Services and for other business activities, including creating industry benchmarks and best practice guides for users.
If you do not want your Content used to improve Services, you can opt out by filling out [this form](https://forms.gle/rvF58qkNCt2uNjSX8). If you opt out, we will not use the Content you provide after opt-out to train our machine-learning models or otherwise use your Content in any way to improve our Services. Please note that in some cases this may limit the ability of our Services to better address your specific use case.
**(c) Accuracy**. Artificial intelligence and machine learning are rapidly evolving fields of study. We are constantly working to improve our Services to make them more accurate, reliable, safe, and beneficial. Given the probabilistic nature of machine learning, the use of our Services may in some situations result in incorrect Output. You should always evaluate the accuracy of any Output as appropriate for your use case, including by using human review of the Output.
# 4. Fees and Payments
**(a) Fees and Billing**. You will pay all fees charged to your account ("**Fees**") according to the prices and terms on the applicable pricing page, or as otherwise agreed between us in writing. We have the right to correct pricing errors or mistakes even if we have already issued an invoice or received payment. You will provide complete and accurate billing information including a valid and authorized payment method. We will charge your payment method on an agreed-upon periodic basis, but may reasonably change the date on which the charge is posted. You authorize Nixtla and its affiliates, and our third-party payment processor(s), to charge your payment method for the Fees.
If your payment cannot be completed, we will provide you written notice and may suspend access to the Services until payment is received. Fees are payable in U.S. dollars and are due upon invoice issuance. Payments are nonrefundable except as provided in this Agreement.
**(b) Taxes**. Unless otherwise stated, Fees do not include federal, state, local, and foreign taxes, duties, and other similar assessments ("**Taxes**"). You are responsible for all Taxes associated with your purchase, excluding Taxes based on our net income, and we may invoice you for such Taxes. You agree to timely pay such Taxes and provide us with documentation showing the payment, or additional evidence that we may reasonably require. Nixtla uses the name and address in your account registration as the place of supply for tax purposes, so you must keep this information accurate and up-to-date.
**(c) Price Changes**. We may change our prices by posting notice to your account and/or to our website. Price increases will be effective 14 days after they are posted, except for increases made for legal reasons or increases made to Beta Services, which will be effective immediately. Any price changes will apply to the Fees charged to your account immediately after the effective date of the changes.
**(d) Disputes and Late Payments**. If you want to dispute any Fees or Taxes, please contact [support@nixtla.io](mailto:support@nixtla.io) within thirty (30) days of the date of the disputed invoice. Undisputed amounts past due may be subject to a finance charge of 1.5% of the unpaid balance per month. If any amount of your Fees is past due, we may suspend your access to the Services after we provide you written notice of late payment.
**(e) Free Tier**. You may not create more than one account to benefit from credits provided in the free tier of the Services. If we believe you are not using the free tier in good faith, we may charge you standard fees or stop providing access to the Services.
# 5. Confidentiality, Security and Data Protection
**(a) Confidentiality**. You may be given access to Confidential Information of Nixtla, its affiliates and other third parties. You may use Confidential Information only as needed to use the Services as permitted under these Terms.
You may not disclose Confidential Information to any third party, and you will protect Confidential Information in the same manner that you protect your own confidential information of a similar nature, using at least reasonable care. Confidential Information means nonpublic information that Nixtla or its affiliates or third parties designate as confidential or should reasonably be considered confidential under the circumstances, including software, specifications, and other nonpublic business information.
Confidential Information does not include information that: (i) is or becomes generally available to the public through no fault of yours; (ii) you already possess without any confidentiality obligations when you received it under these Terms; (iii) is rightfully disclosed to you by a third party without any confidentiality obligations; or (iv) you independently developed without using Confidential Information. You may disclose Confidential Information when required by law or the valid order of a court or other governmental authority if you give reasonable prior written notice to Nixtla and use reasonable efforts to limit the scope of disclosure, including assisting us with challenging the disclosure requirement, in each case where possible.
**(b) Security**. You must implement reasonable and appropriate measures designed to help secure your access to and use of the Services. If you discover any vulnerabilities or breaches related to your use of the Services, you must promptly contact Nixtla and provide details of the vulnerability or breach.
**(c) Processing of Personal Data**. If you use the Services to process personal data, you must provide legally adequate privacy notices and obtain necessary consents for the processing of such data, and you represent to us that you are processing such data in accordance with applicable law.
# 6. Term and Termination
**(a) Termination; Suspension**. These Terms take effect when you first use the Services and remain in effect until terminated. You may terminate these Terms at any time for any reason by discontinuing the use of the Services and Content.
We may terminate these Terms for any reason by providing you at least 30 days' advance notice. We may terminate these Terms immediately upon notice to you if you materially breach Sections 2 (Usage Requirements), 5 (Confidentiality, Security and Data Protection), 8 (Dispute Resolution) or 9 (General Terms), if there are changes in relationships with third-party technology providers outside of our control, or to comply with law or government requests. We may suspend your access to the Services if you do not comply with these Terms, if your use poses a security risk to us or any third party, or if we suspect that your use is fraudulent or could subject us or any third party to liability.
**(b) Effect on Termination**. Upon termination, you will stop using the Services and you will promptly return or, if instructed by us, destroy any Confidential Information. The sections of these Terms which by their nature should survive termination or expiration should survive, including but not limited to Sections 3 and 5-9.
# 7. Indemnification; Disclaimer of Warranties; Limitations on Liability
**(a) Indemnity**. You will defend, indemnify, and hold harmless us, our affiliates, and our personnel, from and against any claims, losses, and expenses (including attorneys' fees) arising from or relating to your use of the Services, including your Content, products or services you develop or offer in connection with the Services, and your breach of these Terms or violation of applicable law.
**(b) Disclaimer**. THE SERVICES ARE PROVIDED "AS IS." EXCEPT TO THE EXTENT PROHIBITED BY LAW, WE AND OUR AFFILIATES AND LICENSORS MAKE NO WARRANTIES (EXPRESS, IMPLIED, STATUTORY OR OTHERWISE) WITH RESPECT TO THE SERVICES, AND DISCLAIM ALL WARRANTIES INCLUDING BUT NOT LIMITED TO WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, SATISFACTORY QUALITY, NON-INFRINGEMENT, AND QUIET ENJOYMENT, AND ANY WARRANTIES ARISING OUT OF ANY COURSE OF DEALING OR TRADE USAGE. WE DO NOT WARRANT THAT THE SERVICES WILL BE UNINTERRUPTED, ACCURATE OR ERROR FREE, OR THAT ANY CONTENT WILL BE SECURE OR NOT LOST OR ALTERED.
**(c) Limitations of Liability**. NEITHER WE NOR ANY OF OUR AFFILIATES OR LICENSORS WILL BE LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR EXEMPLARY DAMAGES, INCLUDING DAMAGES FOR LOSS OF PROFITS, GOODWILL, USE, OR DATA OR OTHER LOSSES, EVEN IF WE HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. OUR AGGREGATE LIABILITY UNDER THESE TERMS SHALL NOT EXCEED THE GREATER OF THE AMOUNT YOU PAID FOR THE SERVICE THAT GAVE RISE TO THE CLAIM DURING THE 12 MONTHS BEFORE THE LIABILITY AROSE OR ONE HUNDRED DOLLARS (\$100). THE LIMITATIONS IN THIS SECTION APPLY ONLY TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW.
# 8. Dispute Resolution
YOU AGREE TO THE FOLLOWING MANDATORY ARBITRATION AND CLASS ACTION WAIVER PROVISIONS:
**(a) MANDATORY ARBITRATION**. You and Nixtla agree to resolve any past or present claims relating to these Terms or our Services through final and binding arbitration, except that you have the right to opt out of these arbitration terms, and future changes to these arbitration terms, by emailing [support@nixtla.io](mailto:support@nixtla.io) within 30 days of agreeing to these arbitration terms or the relevant changes.
**(b) Informal Dispute Resolution**. We would like to understand and try to address your concerns prior to formal legal action. Before filing a claim against Nixtla, you agree to try to resolve the dispute informally by sending us notice at [support@nixtla.io](mailto:support@nixtla.io) of your name, a description of the dispute, and the relief you seek. If we are unable to resolve a dispute within 60 days, you may bring a formal proceeding. Any statute of limitations will be tolled during the 60-day resolution process. If you reside in the EU, the European Commission provides for an online dispute resolution platform, which you can access at [https://ec.europa.eu/consumers/odr](https://ec.europa.eu/consumers/odr).
**(c) Arbitration Forum**. Either party may commence binding arbitration through ADR Services, an alternative dispute resolution provider. The parties will pay equal shares of the arbitration fees. If the arbitrator finds that you cannot afford to pay the arbitration fees and cannot obtain a waiver, Nixtla will pay them for you. Nixtla will not seek its attorneys' fees and costs in arbitration unless the arbitrator determines that your claim is frivolous.
**(d) Arbitration Procedures**. The arbitration will be conducted by telephone, based on written submissions, via video conference, or in person in San Francisco, California, or at another mutually agreed location. The arbitration will be conducted by a sole arbitrator by ADR Services under its then-prevailing rules. All issues are for the arbitrator to decide, except a California court has the authority to determine (i) the scope, enforceability, and arbitrability of this Section 8, including the mass filing procedures below, and (ii) whether you have complied with the pre-arbitration requirements in this section. The amount of any settlement offer will not be disclosed to the arbitrator by either party until after the arbitrator determines the final award, if any.
**(e) Exceptions**. This arbitration section does not require arbitration of the following claims: (i) individual claims brought in small claims court; and (ii) injunctive or other equitable relief to stop unauthorized use or abuse of the Services or intellectual property infringement.
**(f) NO CLASS ACTIONS**. Disputes must be brought on an individual basis only, and may not be brought as a plaintiff or class member in any purported class, consolidated, or representative proceeding. Class arbitrations, class actions, private attorney general actions, and consolidation with other arbitrations are not allowed. If for any reason a dispute proceeds in court rather than through arbitration, each party knowingly and irrevocably waives any right to trial by jury in any action, proceeding, or counterclaim. This does not prevent either party from participating in a class-wide settlement of claims.
**(g) Mass Filings**. If, at any time, 30 or more similar demands for arbitration are asserted against Nixtla or related parties by the same or coordinated counsel or entities ("**Mass Filing**"), ADR Services will randomly assign sequential numbers to each of the Mass Filings. Claims numbered 1-10 will be the "Initial Test Cases" and will proceed to arbitration first. The arbitrators will render a final award for the Initial Test Cases within 120 days of the initial pre-hearing conference, unless the claims are resolved in advance or the parties agree to extend the deadline. The parties will then have 90 days (the "**Mediation Period**") to resolve the remaining cases in mediation based on the awards from the Initial Test Cases. If the parties are unable to resolve the outstanding claims during this time, the parties may choose to opt out of the arbitration process and proceed in court by providing written notice to the other party within 60 days after the Mediation Period. Otherwise, the remaining cases will be arbitrated in their assigned order. Any statute of limitations will be tolled from the time the Initial Test Cases are chosen until your case is chosen as described above.
**(h) Severability**. If any part of this Section 8 is found to be illegal or unenforceable, the remainder will remain in effect, except that if a finding of partial illegality or unenforceability would allow Mass Filing or class or representative arbitration, this Section 8 will be unenforceable in its entirety. Nothing in this section will be deemed to waive or otherwise limit the right to seek public injunctive relief or any other non-waivable right, pending a ruling on the substance of such claim from the arbitrator.
# 9. General Terms
**(a) Relationship of the Parties**. These Terms do not create a partnership, joint venture, or agency relationship between you and Nixtla or any of Nixtla's affiliates. Nixtla and you are independent contractors and neither party will have the power to bind the other or to incur obligations on the other's behalf without the other party's prior written consent.
**(b) Use of Brands**. You may not use Nixtla's or any of its affiliates' names, logos, or trademarks, without our prior written consent.
**(c) U.S. Federal Agency Entities**. The Services were developed solely at private expense and are commercial computer software and related documentation within the meaning of the applicable U.S. Federal Acquisition Regulation and agency supplements thereto.
**(d) Copyright Complaints**. If you believe that your intellectual property rights have been infringed, please send notice to the address below or fill out [this form](https://forms.gle/N3xmuZss1Y7rrb889). We may delete or disable content alleged to be infringing and may terminate accounts of repeat infringers.
Written claims concerning copyright infringement must include the following information:
1. A physical or electronic signature of the person authorized to act on behalf of the owner of the copyright interest;
2. A description of the copyrighted work that you claim has been infringed upon;
3. A description of where the material that you claim is infringing is located on the site;
4. Your address, telephone number, and e-mail address;
5. A statement by you that you have a good-faith belief that the disputed use is not authorized by the copyright owner, its agent, or the law; and
6. A statement by you, made under penalty of perjury, that the above information in your notice is accurate and that you are the copyright owner or authorized to act on the copyright owner's behalf.
**(e) Assignment and Delegation**. You may not assign or delegate any rights or obligations under these Terms, including in connection with a change of control. Any purported assignment and delegation shall be null and void. We may assign these Terms in connection with a merger, acquisition, or sale of all or substantially all of our assets, or to any affiliate or as part of a corporate reorganization.
**(f) Modifications**. We may amend these Terms from time to time by posting a revised version on the website, or if an update materially adversely affects your rights or obligations under these Terms we will provide notice to you either by emailing the email associated with your account or providing an in-product notification. Those changes will become effective no sooner than 30 days after we notify you. All other changes will be effective immediately. Your continued use of the Services after any change means you agree to such change.
**(g) Notices**. All notices will be in writing. We may notify you using the registration information you provided or the email address associated with your use of the Services. Service will be deemed given on the date of receipt if delivered by email or on the date sent via courier if delivered by post. Nixtla accepts service of process at this address:
Nixtla, Inc.
166 Geary Str 15th FL #1056
San Francisco, CA 94108
United States.
Attn: Nixtla, Inc. - [support@nixtla.io](mailto:support@nixtla.io)
**(h) Waiver and Severability**. If you do not comply with these Terms, and Nixtla does not take action right away, this does not mean Nixtla is giving up any of our rights. Except as provided in Section 8, if any part of these Terms is determined to be invalid or unenforceable by a court of competent jurisdiction, that term will be enforced to the maximum extent permissible and it will not affect the enforceability of any other terms.
**(i) Export Controls**. The Services may not be used in or for the benefit of, exported, or re-exported (a) into any U.S. embargoed countries (collectively, the "**Embargoed Countries**") or (b) to anyone on the U.S. Treasury Department's list of Specially Designated Nationals, any other restricted party lists (existing now or in the future) identified by the Office of Foreign Assets Control, or the U.S. Department of Commerce Denied Persons List or Entity List, or any other restricted party lists (collectively, "**Restricted Party Lists**").
You represent and warrant that you are not located in any Embargoed Countries and not on any such restricted party lists. You must comply with all applicable laws related to Embargoed Countries or Restricted Party Lists, including any requirements or obligations to know your end users directly.
**(j) Equitable Remedies**. You acknowledge that if you violate or breach these Terms, it may cause irreparable harm to Nixtla and its affiliates, and Nixtla shall have the right to seek injunctive relief against you in addition to any other legal remedies.
**(k) Entire Agreement**. These Terms and any policies incorporated in these Terms contain the entire agreement between you and Nixtla regarding the use of the Services and, other than any Service specific terms of use or any applicable enterprise agreements, supersedes any prior or contemporaneous agreements, communications, or understandings between you and Nixtla on that subject.
**(l) Jurisdiction, Venue and Choice of Law**. These Terms will be governed by the laws of the State of California, excluding California's conflicts of law rules or principles. Except as provided in the "Dispute Resolution" section, all claims arising out of or relating to these Terms will be brought exclusively in the federal or state courts of San Francisco County, California, USA.
# Add Exogenous Variables
Source: https://nixtla.io/docs/anomaly_detection/exogenous_variables
Learn how to improve anomaly detection by incorporating external factors.
## Why Use Exogenous Variables?
Including relevant exogenous variables can greatly improve anomaly detection, especially for time series influenced by external factors such as weather or market indicators.
Key benefits of using exogenous variables:
* Improve anomaly detection accuracy
* Enhance model interpretability
* Provide additional context for anomaly detection
## How to Use Exogenous Variables
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/historical-anomaly-detection/02_anomaly_exogenous.ipynb)
### Step 1: Set Up Data and Client
Follow the steps in the [historical anomaly detection tutorial](/docs/anomaly_detection/historical_anomaly_detection) to set up the data and client.
### Step 2: Detect Anomalies with Exogenous Features
Use the `detect_anomalies` method to identify anomalies. The method will automatically detect and utilize any exogenous features present in your DataFrame:
```python theme={null}
anomalies_df = nixtla_client.detect_anomalies(
df=df,
time_col='ds',
target_col='y'
)
```
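For reference, here is a minimal sketch of a long-format DataFrame that carries exogenous columns alongside `unique_id`, `ds`, and `y`. The column names `temperature` and `is_holiday` and all values are hypothetical placeholders; the sketch only illustrates the layout, where each exogenous feature is an extra column on the same rows as the target.

```python
import pandas as pd

# Hypothetical toy data: one series with two made-up exogenous columns.
df_exog = pd.DataFrame({
    "unique_id": ["store_1"] * 4,
    "ds": pd.date_range("2024-01-01", periods=4, freq="D"),
    "y": [120.0, 135.0, 128.0, 310.0],
    "temperature": [3.1, 4.0, 2.5, 3.3],  # hypothetical external factor
    "is_holiday": [0, 0, 0, 1],           # hypothetical calendar flag
})

# In this sketch, every column other than the id, time, and target columns is exogenous.
exog_cols = [c for c in df_exog.columns if c not in ("unique_id", "ds", "y")]
print(exog_cols)  # ['temperature', 'is_holiday']
```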
### Step 3: Add Date Features (Optional)
Adding date features is a powerful way to enrich datasets for historical anomaly detection—especially when external exogenous variables are unavailable. By passing date components like `['month', 'year']` and enabling `date_features_to_one_hot=True`, TimeGPT automatically encodes these as one-hot vectors. This allows the model to better detect seasonal patterns, calendar effects, and periodic anomalies.
```python theme={null}
anomalies_df = nixtla_client.detect_anomalies(
df=df,
time_col='ds',
target_col='y',
date_features=['month', 'year'],
date_features_to_one_hot=True
)
```
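To make one-hot date features concrete, the sketch below builds month and year indicator columns by hand with `pandas.get_dummies`. It only illustrates one-hot encoding in general; that it resembles what `date_features_to_one_hot=True` produces internally is an assumption, and you do not need to do this yourself when passing `date_features` to TimeGPT.

```python
import pandas as pd

dates = pd.DataFrame({"ds": pd.to_datetime(["2023-11-30", "2023-12-31", "2024-01-31"])})

# Extract the calendar components named in date_features=['month', 'year'].
parts = pd.DataFrame({"month": dates["ds"].dt.month, "year": dates["ds"].dt.year})

# One-hot encode each component: one 0/1 column per observed value.
one_hot = pd.get_dummies(parts, columns=["month", "year"], dtype=int)
print(one_hot.columns.tolist())
# ['month_1', 'month_11', 'month_12', 'year_2023', 'year_2024']
```

Each row has exactly one active month column and one active year column, which lets a model attach a separate effect to each calendar value.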
### Step 4: Visualize Anomalies
Use the `plot` method to visualize the detected anomalies in the time series data.
```python theme={null}
nixtla_client.plot(df, anomalies_df)
```

The plot shows the time series with detected anomalies marked in red. The blue line represents the actual values, while the shaded area indicates the confidence interval. Points that fall outside this interval are flagged as anomalies.
### Step 5: Inspect Model Weights (Optional)
After running detection with exogenous features, inspect the `weights_x` attribute, a DataFrame of the relative weights of the exogenous features, to understand their impact:
```python theme={null}
nixtla_client.weights_x.plot.barh(
x='features',
y='weights'
)
```

The horizontal bar plot shows the relative importance of each exogenous feature in the anomaly detection model. Features with larger weights have a stronger influence on the model's predictions. This visualization helps identify which external factors are most significant in determining anomalies in your time series.
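If you prefer a ranked list to a plot, you can sort the weights by magnitude. The sketch below uses a toy stand-in for `nixtla_client.weights_x` with the same `features` and `weights` columns as the plotting code; the feature names and values are made up, and ranking by absolute value is an assumption so that a strong negative weight is not buried below small positive ones.

```python
import pandas as pd

# Toy stand-in for nixtla_client.weights_x; feature names and weights are made up.
weights_x = pd.DataFrame({
    "features": ["month_1", "month_7", "year_2015"],
    "weights": [0.12, -0.45, 0.30],
})

# Rank features by the magnitude of their weight, largest first.
ranked = (
    weights_x.assign(abs_weight=weights_x["weights"].abs())
    .sort_values("abs_weight", ascending=False)
    .reset_index(drop=True)
)
print(ranked["features"].tolist())  # ['month_7', 'year_2015', 'month_1']
```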
# Quickstart
Source: https://nixtla.io/docs/anomaly_detection/historical_anomaly_detection
Get started with TimeGPT's historical anomaly detection capabilities.
* Understand how TimeGPT detects anomalies in historical time series.
* Set up TimeGPT and detect anomalies with it.
* Plot and interpret identified anomalies.
* Quickly identify outliers in large time series.
* Improve decision-making by focusing on unusual data points.
* Automate anomaly alerts to save time and resources.
## What Is Historical Anomaly Detection?
Historical anomaly detection is a technique that identifies data points that significantly deviate from expected patterns in a time series. This technique is useful for uncovering potential fraud, security breaches, or other unusual events.
## Overview of TimeGPT's Historical Anomaly Detection
TimeGPT's historical anomaly detection works by:
1. Generating predictions for future values (or reconstructing missing values) within your historical time series.
2. Constructing a confidence interval based on the model's predictions.
3. Flagging any historical data point that falls outside your chosen confidence interval as an anomaly.
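The flagging rule in step 3 can be sketched in pandas using the first five rows of the output table from Step 4 of the Quickstart example. This re-implements only the comparison against the interval bounds; it assumes the bounds are already computed and does not reproduce how TimeGPT builds them.

```python
import pandas as pd

# First five rows of the detect_anomalies output in the Quickstart example.
out = pd.DataFrame({
    "y": [8.281724, 8.292799, 8.199189, 9.996522, 10.127071],
    "TimeGPT-lo-99": [6.944788, 6.872135, 6.847845, 7.637861, 7.722928],
    "TimeGPT-hi-99": [9.503586, 9.430932, 9.406642, 10.196658, 10.281725],
})

# A point is anomalous when it falls above the upper or below the lower bound.
is_anomaly = (out["y"] > out["TimeGPT-hi-99"]) | (out["y"] < out["TimeGPT-lo-99"])
print(is_anomaly.tolist())  # [False, False, False, False, False]
```

All five points lie inside their 99% intervals, which matches the `anomaly` column of that table.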
## Quickstart Example
This example shows how historical anomaly detection works by analyzing daily visits to the Wikipedia page of Peyton Manning.
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/historical-anomaly-detection/01_quickstart.ipynb)
### Step 1: Import Packages and Create a NixtlaClient Instance
We'll start by importing required packages and setting up our API key.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla' # Defaults to os.environ.get("NIXTLA_API_KEY")
)
```
### Step 2: Load the Dataset
This dataset tracks the daily visits to the Wikipedia page of Peyton Manning.
```python theme={null}
df = pd.read_csv('https://datasets-nixtla.s3.amazonaws.com/peyton-manning.csv')
df.head()
```
| | unique\_id | ds | y |
| - | ---------- | ---------- | -------- |
| 0 | 0 | 2007-12-10 | 9.590761 |
| 1 | 0 | 2007-12-11 | 8.519590 |
| 2 | 0 | 2007-12-12 | 8.183677 |
| 3 | 0 | 2007-12-13 | 8.072467 |
| 4 | 0 | 2007-12-14 | 7.893572 |
### Step 3: Visualize the Data
You can visualize the time series with the following command:
```python theme={null}
nixtla_client.plot(df, max_insample_length=365)
```

### Step 4: Perform Anomaly Detection
By default, TimeGPT uses a 99% confidence interval. Points outside this interval are flagged as anomalies.
```python theme={null}
anomalies_df = nixtla_client.detect_anomalies(df, freq='D')
anomalies_df.head()
```
| | unique\_id | ds | y | TimeGPT | TimeGPT-hi-99 | TimeGPT-lo-99 | anomaly |
| - | ---------- | ---------- | --------- | -------- | ------------- | ------------- | ------- |
| 0 | 0 | 2008-01-10 | 8.281724 | 8.224187 | 9.503586 | 6.944788 | False |
| 1 | 0 | 2008-01-11 | 8.292799 | 8.151533 | 9.430932 | 6.872135 | False |
| 2 | 0 | 2008-01-12 | 8.199189 | 8.127243 | 9.406642 | 6.847845 | False |
| 3 | 0 | 2008-01-13 | 9.996522 | 8.917259 | 10.196658 | 7.637861 | False |
| 4 | 0 | 2008-01-14 | 10.127071 | 9.002326 | 10.281725 | 7.722928 | False |
A `False` anomaly value indicates a normal data point; `True` identifies an outlier.
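To review only the outliers, filter on the boolean `anomaly` column. The sketch below uses a small hypothetical DataFrame with the same columns as `anomalies_df`; the values are toy numbers, not model output.

```python
import pandas as pd

# Hypothetical toy output with the same key columns as anomalies_df.
anomalies_df = pd.DataFrame({
    "unique_id": [0, 0, 0, 0],
    "ds": pd.to_datetime(["2008-01-10", "2008-01-11", "2008-01-12", "2008-01-13"]),
    "y": [8.28, 8.29, 12.90, 8.10],
    "anomaly": [False, False, True, False],
})

# Keep only the rows flagged as outliers.
flagged = anomalies_df[anomalies_df["anomaly"]]
print(flagged["ds"].dt.strftime("%Y-%m-%d").tolist())  # ['2008-01-12']

# Count flagged points per series (useful when df contains several unique_ids).
counts = anomalies_df.groupby("unique_id")["anomaly"].sum()
print(int(counts.loc[0]))  # 1
```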
### Step 5: Review Anomalies
```python theme={null}
nixtla_client.plot(df, anomalies_df)
```

### Step 6: Inspect and Iterate
Inspect the anomalies flagged by the model; these points are potential indicators of significant deviations in your data. If the model is overly sensitive or misses critical outliers, adjust the `level` parameter or include additional features (e.g., exogenous variables or date features) to improve detection accuracy.
Congratulations! You've successfully performed anomaly detection using TimeGPT. You can now start experimenting with this example or apply it to your own data. For advanced tips on improving detection performance, explore the following sections on using exogenous variables and adjusting confidence intervals.
# Controlling the Anomaly Detection Process
Source: https://nixtla.io/docs/anomaly_detection/real-time/adjusting_detection
Learn how to tune TimeGPT's anomaly detection parameters for optimal accuracy. Step-by-step guide to adjusting detection_size, level, confidence intervals, and fine-tuning strategies with Python code examples.
## Overview
Fine-tuning anomaly detection parameters is essential for reducing false positives and improving detection accuracy in time series data. This guide shows you how to optimize TimeGPT's `detect_anomalies_online` method by adjusting key parameters like detection sensitivity, window sizes, and model fine-tuning options.
For an introduction to real-time anomaly detection, see our [Real-Time Anomaly Detection guide](/docs/anomaly_detection/real-time/introduction). To understand local vs global detection strategies, check out [Local vs Global Anomaly Detection](/docs/anomaly_detection/real-time/univariate_multivariate).
## Why Parameter Tuning Matters
TimeGPT leverages forecast errors to identify anomalies in your time-series data. By optimizing parameters, you can detect subtle deviations, reduce false positives, and customize results for specific use cases.
## Key Parameters for Anomaly Detection
TimeGPT's anomaly detection can be controlled through three primary parameters:
* **detection\_size**: Controls the data window size for threshold calculation, determining how much historical context is used
* **level**: Sets confidence intervals for anomaly thresholds (e.g., 80%, 95%, 99%), controlling detection sensitivity
* **freq**: Aligns detection with data frequency (e.g., 'D' for daily, 'H' for hourly, 'min' for minute-level data)
## Common Use Cases
Adjusting anomaly detection parameters is crucial for:
* **Reducing false positives** in noisy time series data
* **Increasing sensitivity** to detect subtle anomalies
* **Optimizing detection** for different data frequencies (hourly, daily, weekly)
* **Improving accuracy** through model fine-tuning with custom loss functions
## How to Adjust the Anomaly Detection Process
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/online-anomaly-detection/02_adjusting_detection_process.ipynb)
### Step 1: Install and Import Dependencies
Install the `nixtla` package in your environment (for example, with `pip install nixtla`), then import the necessary libraries:
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
import matplotlib.pyplot as plt
```
### Step 2: Initialize the Nixtla Client
Create an instance of NixtlaClient with your API key:
```python theme={null}
nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla')
```
### Step 3: Conduct a Baseline Detection
Load the last 200 observations of the Peyton Manning Wikipedia page-views dataset, a real-world series with natural anomalies and trends, to illustrate the baseline anomaly detection process and the effect of parameter tuning.
```python theme={null}
df = pd.read_csv(
'https://datasets-nixtla.s3.amazonaws.com/peyton-manning.csv',
parse_dates=['ds']
).tail(200)
df.head()
```
|      | unique\_id | ds         | y        |
| ---- | ---------- | ---------- | -------- |
| 2764 | 0 | 2015-07-05 | 6.499787 |
| 2765 | 0 | 2015-07-06 | 6.859615 |
| 2766 | 0 | 2015-07-07 | 6.881411 |
| 2767 | 0 | 2015-07-08 | 6.997596 |
| 2768 | 0 | 2015-07-09 | 7.152269 |
Set a baseline by running detection without any fine-tuning, using a 14-step forecast horizon, an 80% level, and a detection size of 150.
```python theme={null}
anomaly_df = nixtla_client.detect_anomalies_online(
df,
freq='D',
h=14,
level=80,
detection_size=150
)
```
```bash Baseline Detection Log Output theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:Detection size is large. Using the entire series to compute the anomaly threshold...
INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint...
```

### Step 4: Fine-Tuned Detection
TimeGPT detects anomalies based on forecast errors. By improving your model's forecasts, you can strengthen anomaly detection performance. The following parameters can be fine-tuned:
* **finetune\_steps**: Number of additional training iterations
* **finetune\_depth**: How much of the model is fine-tuned, on a scale from 1 (little fine-tuning) to 5 (the entire model)
* **finetune\_loss**: Loss function used during fine-tuning
```python theme={null}
anomaly_online_ft = nixtla_client.detect_anomalies_online(
df,
freq='D',
h=14,
level=80,
detection_size=150,
finetune_steps=10,
finetune_depth=2,
finetune_loss='mae'
)
```
```bash Fine-tuned Detection Log Output theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:Detection size is large. Using the entire series to compute the anomaly threshold...
INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint...
```

From the plot above, we can see that fewer anomalies were detected by the model, since the fine-tuning process helps TimeGPT better forecast the series.
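A quick way to check this on your own data is to compare how many points each run flags. The sketch assumes both results carry a boolean `anomaly` column, as in the output table of the Online (Real-Time) Anomaly Detection guide; the toy DataFrames below are hypothetical stand-ins for `anomaly_df` and `anomaly_online_ft`.

```python
import pandas as pd

def count_anomalies(result: pd.DataFrame) -> int:
    """Number of rows flagged as anomalous in a detect_anomalies_online result."""
    return int(result["anomaly"].sum())

# Toy stand-ins; replace with anomaly_df and anomaly_online_ft from the steps above.
baseline = pd.DataFrame({"anomaly": [True, False, True, True, False]})
finetuned = pd.DataFrame({"anomaly": [False, False, True, False, False]})

print(count_anomalies(baseline), count_anomalies(finetuned))  # 3 1
```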
### Step 5: Adjusting Forecast Horizon and Step Size
Similar to cross-validation, the anomaly detection method generates forecasts for historical data by splitting the time series into multiple windows. The way these windows are defined can impact the anomaly detection results. Two key parameters control this process:
* `h`: Specifies how many steps into the future the forecast is made for each window.
* `step_size`: Determines the interval between the starting points of consecutive windows.
When `step_size` is smaller than `h`, consecutive windows overlap. Overlap can make detection more robust, because TimeGPT forecasts each time step more than once, but it adds computational cost for the same reason.
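The effect of `step_size` relative to `h` can be sketched with plain index arithmetic. This assumes windows are laid out back to back from the first step, a simplification for illustration rather than TimeGPT's exact windowing; `forecast_windows` and `coverage` are hypothetical helpers.

```python
def forecast_windows(n_steps: int, h: int, step_size: int) -> list[range]:
    """Index ranges forecast by each window over n_steps points (simplified scheme)."""
    starts = range(0, n_steps - h + 1, step_size)
    return [range(start, start + h) for start in starts]


def coverage(n_steps: int, h: int, step_size: int) -> list[int]:
    """How many windows forecast each time step."""
    counts = [0] * n_steps
    for window in forecast_windows(n_steps, h, step_size):
        for i in window:
            counts[i] += 1
    return counts


print(coverage(6, h=2, step_size=2))  # [1, 1, 1, 1, 1, 1] -> no overlap
print(coverage(6, h=2, step_size=1))  # [1, 2, 2, 2, 2, 1] -> overlapping windows
print(sum(coverage(6, h=2, step_size=1)))  # 10 forecasted steps instead of 6
```

The total count is a rough proxy for cost: halving `step_size` here nearly doubles the number of forecasted steps.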
```python theme={null}
anomaly_df_horizon = nixtla_client.detect_anomalies_online(
df,
time_col='ds',
target_col='y',
freq='D',
h=2,
step_size=1,
level=80,
detection_size=150
)
```

**Choosing `h` and `step_size`** depends on the nature of your data:
* Frequent or short anomalies: Use smaller `h` and `step_size`
* Smooth or longer trends: Choose larger `h` and `step_size`
## Summary
You've learned how to control TimeGPT's anomaly detection process through:
1. **Baseline detection** without fine-tuning
2. **Fine-tuning** with custom training iterations and loss functions
3. **Window adjustment** using forecast horizon and step size parameters
Experiment with these parameters to optimize detection for your specific use case and data patterns.
## Frequently Asked Questions
**How do I reduce false positives in anomaly detection?**
Increase the `level` parameter (e.g., from 80 to 95 or 99) to make detection stricter, or use fine-tuning parameters like `finetune_steps` to improve forecast accuracy.
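To see why a higher `level` flags fewer points, here is a minimal sketch of a percentile threshold, following the description of `level` in the Online (Real-Time) Anomaly Detection guide as the percentile of the score distribution at which the threshold is set. Applying the percentile to absolute scores is an assumption for illustration, and `flag_by_level` is a hypothetical helper, not TimeGPT's internal code.

```python
import numpy as np


def flag_by_level(scores: np.ndarray, level: float) -> np.ndarray:
    """Flag scores whose magnitude exceeds the `level`-th percentile of all magnitudes."""
    magnitudes = np.abs(scores)
    threshold = np.percentile(magnitudes, level)
    return magnitudes > threshold


# Toy scores with magnitudes 1..100 and alternating signs.
scores = np.arange(1, 101) * np.where(np.arange(100) % 2 == 0, 1, -1)

for level in (80, 95, 99):
    print(level, int(flag_by_level(scores, level).sum()))
# 80 20
# 95 5
# 99 1
```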
**What's the difference between detection\_size and step\_size?**
`detection_size` determines how many data points at the end of the series are analyzed, while `step_size` sets the interval between the starting points of consecutive forecast windows; windows overlap when `step_size` is smaller than `h`.
**When should I use fine-tuning for anomaly detection?**
Use fine-tuning when you have domain-specific patterns or when baseline detection produces too many false positives. Fine-tuning helps TimeGPT better understand your specific time series characteristics.
**How does overlapping windows improve detection?**
When `step_size` \< `h`, TimeGPT analyzes the same time steps multiple times from different perspectives, making detection more robust but requiring more computation.
# Online (Real-Time) Anomaly Detection
Source: https://nixtla.io/docs/anomaly_detection/real-time/introduction
Learn how to detect anomalies in real-time streaming data using TimeGPT's detect_anomalies_online method. Complete Python tutorial with code examples for monitoring server logs, IoT sensors, and live data streams.
## Overview
Real-time anomaly detection enables you to identify unusual patterns in streaming time series data instantly—essential for monitoring server performance, detecting fraud, identifying system failures, and tracking IoT sensor anomalies. TimeGPT's `detect_anomalies_online` method provides:
* **Flexible Control**: Fine-tune detection sensitivity and confidence levels
* **Local & Global Detection**: Analyze individual series or detect system-wide anomalies across multiple correlated metrics
* **Stream Processing**: Monitor live data feeds with rolling window analysis
## Common Use Cases
* **Server Monitoring**: Detect CPU spikes, memory leaks, and downtime
* **IoT Sensors**: Identify equipment failures and sensor malfunctions
* **Fraud Detection**: Flag suspicious transactions in real-time
* **Application Performance**: Monitor API response times and error rates
## Quick Start
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/online-anomaly-detection/01_quickstart.ipynb)
### Step 1: Set up your environment
Initialize your Python environment by importing the required libraries:
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
import matplotlib.pyplot as plt
```
### Step 2: Configure your NixtlaClient
Provide your API key (and optionally a custom base URL).
```python theme={null}
nixtla_client = NixtlaClient(
# defaults to os.environ.get("NIXTLA_API_KEY")
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 3: Load your dataset
We use a minute-level time series dataset that monitors server usage. This dataset is ideal for showcasing streaming data scenarios, where the task is to detect server failures or downtime in real time.
```python theme={null}
df = pd.read_csv(
'https://datasets-nixtla.s3.us-east-1.amazonaws.com/machine-1-1.csv',
parse_dates=['ts']
)
```
We observe that the time series remains stable during the initial period; however, a spike occurs in the last 20 steps, indicating anomalous behavior. Our goal is to capture this abnormal jump as soon as it appears.

### Step 4: Detect anomalies in real time
The `detect_anomalies_online` method detects anomalies using TimeGPT's forecasts: it decides whether a step is anomalous based on its forecast error, so you can specify and tune parameters as you would for the `forecast` method. It returns a DataFrame containing an anomaly flag and an anomaly score for each step; the absolute value of the score quantifies how abnormal the value is.
To perform real-time anomaly detection, set the following parameters:
* `df`: A pandas DataFrame containing the time series data.
* `time_col`: The column that identifies the datestamp.
* `target_col`: The variable to forecast.
* `h`: The forecast horizon, i.e., the number of steps ahead to forecast.
* `freq`: The frequency of the time series, as a pandas frequency string.
* `level`: The percentile of the score distribution at which the threshold is set, controlling how strictly anomalies are flagged. Defaults to 99.
* `detection_size`: The number of steps at the end of the time series to analyze for anomalies.
```python theme={null}
anomaly_online = nixtla_client.detect_anomalies_online(
df,
time_col='ts',
target_col='y',
freq='min', # Specify the frequency of the data
h=10, # Specify the forecast horizon
level=99, # Set the confidence level for anomaly detection
detection_size=100 # Number of steps to analyze for anomalies
)
anomaly_online.tail()
```
```bash Log Output theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint...
```
The last five rows of the output:
| unique\_id | ts | y | TimeGPT | anomaly | anomaly\_score | TimeGPT-hi-99 | TimeGPT-lo-99 |
| ------------------ | ------------------- | -------- | -------- | ------- | -------------- | ------------- | ------------- |
| machine-1-1\_y\_29 | 2020-02-01 22:11:00 | 0.606017 | 0.544625 | True | 18.463266 | 0.553161 | 0.536090 |
| machine-1-1\_y\_29 | 2020-02-01 22:12:00 | 0.044413 | 0.570869 | True | -158.933850 | 0.579404 | 0.562333 |
| machine-1-1\_y\_29 | 2020-02-01 22:13:00 | 0.038682 | 0.560303 | True | -157.474880 | 0.568839 | 0.551767 |
| machine-1-1\_y\_29 | 2020-02-01 22:14:00 | 0.024355 | 0.521797 | True | -150.178240 | 0.530333 | 0.513261 |
| machine-1-1\_y\_29 | 2020-02-01 22:15:00 | 0.044413 | 0.467860 | True | -127.848560 | 0.476396 | 0.459325 |
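Because the absolute value of `anomaly_score` quantifies how abnormal a value is, you can sort flagged rows by severity. The sketch below uses the five rows shown above; in these rows, positive scores coincide with values above the forecast and negative scores with values below it, so sorting by magnitude ignores the direction.

```python
import pandas as pd

# The five rows shown above (anomaly_online.tail()).
last_rows = pd.DataFrame({
    "ts": pd.to_datetime([
        "2020-02-01 22:11:00", "2020-02-01 22:12:00", "2020-02-01 22:13:00",
        "2020-02-01 22:14:00", "2020-02-01 22:15:00",
    ]),
    "anomaly_score": [18.463266, -158.933850, -157.474880, -150.178240, -127.848560],
})

# Sort by |score|, most abnormal first.
by_severity = last_rows.sort_values("anomaly_score", key=lambda s: s.abs(), ascending=False)
print(by_severity["ts"].dt.strftime("%H:%M").tolist())
# ['22:12', '22:13', '22:14', '22:15', '22:11']
```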

From the plot, we observe that the anomalous period is promptly detected.
Here we use a detection size of 100 to illustrate the anomaly detection process. In production, running detections more frequently with smaller detection sizes can help identify anomalies as soon as they occur.
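One way to run detection repeatedly on a stream is to re-submit a trailing slice of the data each time new observations arrive. The sketch below is hypothetical: `stream_detections` is an illustrative helper for a single, time-sorted series, `run_detection` stands in for a call such as `nixtla_client.detect_anomalies_online(...)`, and the slice lengths are illustrative rather than recommended values.

```python
from typing import Callable, Iterator

import pandas as pd


def stream_detections(
    df: pd.DataFrame,
    run_detection: Callable[[pd.DataFrame], pd.DataFrame],
    history: int,
    every: int,
) -> Iterator[pd.DataFrame]:
    """Simulate a live feed: every `every` new rows, run detection on the last `history` rows."""
    for end in range(history, len(df) + 1, every):
        window = df.iloc[end - history:end]
        yield run_detection(window)


# Toy minute-level series and a dummy detector that returns the last row it saw.
toy = pd.DataFrame({"ts": pd.date_range("2020-02-01", periods=10, freq="min"), "y": range(10)})
calls = [
    result["ts"].iloc[-1].strftime("%H:%M")
    for result in stream_detections(toy, lambda w: w.tail(1), history=6, every=2)
]
print(calls)  # ['00:05', '00:07', '00:09']
```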
## Frequently Asked Questions
**What's the difference between online and historical anomaly detection?**
Online detection analyzes recent data windows for immediate alerting, while historical detection analyzes complete datasets for pattern discovery.
**Can I adjust detection sensitivity?**
Yes, tune the `level` parameter (confidence threshold) and `detection_size` (analysis window) to control false positive rates.
## Next Steps
Now that you've detected your first anomalies in real-time, explore these guides to optimize your detection:
* [Controlling the Anomaly Detection Process](/docs/anomaly_detection/real-time/adjusting_detection) - Learn how to fine-tune key parameters for more accurate detection
* [Local vs Global Anomaly Detection](/docs/anomaly_detection/real-time/univariate_multivariate) - Choose the right detection strategy for single vs multiple correlated time series
# Local vs Global Anomaly Detection
Source: https://nixtla.io/docs/anomaly_detection/real-time/univariate_multivariate
Compare local vs global anomaly detection methods for time series. Learn when to use univariate detection for independent metrics vs multivariate detection for correlated server data with Python examples.
## Overview
When monitoring multiple time series simultaneously, such as server metrics (CPU, memory, disk I/O), you need to choose between local and global anomaly detection strategies. This guide demonstrates:
* **Local (Univariate) Detection**: Analyzing each time series independently for isolated metric anomalies
* **Global (Multivariate) Detection**: Analyzing all time series collectively to detect system-wide failures
Both methods use TimeGPT's `detect_anomalies_online` with the `threshold_method` parameter. The main difference is whether anomalies are identified individually per series (local) or collectively across multiple correlated series (global).
For an introduction to real-time anomaly detection, see our [Real-Time Anomaly Detection guide](/docs/anomaly_detection/real-time/introduction). To learn about parameter tuning, check out [Controlling the Anomaly Detection Process](/docs/anomaly_detection/real-time/adjusting_detection).
## When to Use Each Method
### Use Local Detection When:
* Monitoring independent, uncorrelated metrics
* Each metric has distinct baseline behavior
* You need low computational overhead
* False positives in individual series are acceptable
### Use Global Detection When:
* Monitoring correlated server or system metrics
* System-wide failures affect multiple metrics simultaneously
* You need to detect coordinated anomalies (e.g., CPU spike + memory spike + network spike)
* You want to reduce false positives by taking metric relationships into account
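You can check whether your metrics are correlated before you choose a method. The sketch below shows one rough way to do this, and it is not part of TimeGPT. It pivots long-format data (the `ts`, `unique_id`, `y` layout used in this guide) so that each metric gets its own column, then computes pairwise Pearson correlations with pandas. The three metrics and their values are synthetic and only illustrate the idea.

```python
import numpy as np
import pandas as pd

# Synthetic long-format metrics for one hypothetical server (ts, unique_id, y)
ts = pd.date_range('2024-01-01', periods=48, freq='h')
rng = np.random.default_rng(0)
load = rng.normal(size=48).cumsum()  # shared underlying load signal

df = pd.concat([
    pd.DataFrame({'ts': ts, 'unique_id': 'cpu', 'y': load + rng.normal(scale=0.1, size=48)}),
    pd.DataFrame({'ts': ts, 'unique_id': 'memory', 'y': 2 * load + rng.normal(scale=0.1, size=48)}),
    pd.DataFrame({'ts': ts, 'unique_id': 'disk', 'y': rng.normal(size=48)}),  # independent metric
], ignore_index=True)

# Wide format: one row per timestamp, one column per metric
wide = df.pivot(index='ts', columns='unique_id', values='y')

# Pairwise Pearson correlation between metrics
corr = wide.corr()
print(corr.round(2))
```

A strong correlation, such as the one between the hypothetical `cpu` and `memory` series here, suggests that global detection may be worth trying. Pearson correlation only measures linear co-movement, so use it as a guide and not as a rule.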
## How to Detect Anomalies Across Multiple Time Series
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/online-anomaly-detection/03_univariate_vs_multivariate_anomaly_detection.ipynb)
### Step 1: Set Up Your Environment
Import dependencies that you will use in the tutorial.
```python theme={null}
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from nixtla import NixtlaClient
```
Create a `NixtlaClient` instance. Replace `my_api_key_provided_by_nixtla` with your actual API key.
```python theme={null}
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load the Dataset
This tutorial uses the SMD (Server Machine Dataset), a benchmark dataset for anomaly detection across multiple time series. SMD monitors abnormal patterns in server machine data.
We analyze monitoring data from a single server (machine-1-1) containing 38 time series. Each series represents a different server metric: CPU usage, memory usage, disk I/O, network throughput, and other system performance indicators.
```python theme={null}
df = pd.read_csv(
'https://datasets-nixtla.s3.us-east-1.amazonaws.com/SMD_test.csv',
parse_dates=['ts']
)
df.unique_id.nunique()
```
Output:
```bash theme={null}
38
```
### Step 3: Local and Global Anomaly Detection Methods
#### Method Comparison
| Aspect | Local (Univariate) | Global (Multivariate) |
| ------------------------- | ------------------------------- | --------------------------------- |
| **Analysis Scope** | Individual series | All series collectively |
| **Best For** | Independent metrics | Correlated metrics |
| **Computational Cost** | Low | Higher |
| **System-wide Anomalies** | May miss | Detects effectively |
| **Parameter** | `threshold_method='univariate'` | `threshold_method='multivariate'` |
#### Step 3.1: Local Method
Local anomaly detection analyzes each time series in isolation, flagging anomalies based on each series' individual deviation from its expected behavior. This approach is efficient for individual metrics or when correlations between metrics are not relevant. However, it may miss large-scale, system-wide anomalies that are only apparent when multiple series deviate simultaneously.
Example usage:
```python theme={null}
anomaly_online = nixtla_client.detect_anomalies_online(
df[['ts', 'y', 'unique_id']],
time_col='ts',
target_col='y',
freq='h',
h=24,
level=95,
detection_size=475,
threshold_method='univariate' # local anomaly detection
)
```
Log output:
```bash theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:Detection size is large. Using the entire series to compute the anomaly threshold...
INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint...
```
Visualize the anomalies:
```python theme={null}
# Utility function to plot anomalies
def plot_anomalies(df, unique_ids, rows, cols):
    fig, axes = plt.subplots(rows, cols, figsize=(12, rows * 2))
    for ax, uid in zip(axes.flatten(), unique_ids):
        filtered_df = df[df['unique_id'] == uid]
        # Actual values and TimeGPT predictions
        ax.plot(filtered_df['ts'], filtered_df['y'], color='navy', alpha=0.8, label='y')
        ax.plot(filtered_df['ts'], filtered_df['TimeGPT'], color='orchid', alpha=0.7, label='TimeGPT')
        # Mark the points flagged as anomalies
        ax.scatter(
            filtered_df.loc[filtered_df['anomaly'] == 1, 'ts'],
            filtered_df.loc[filtered_df['anomaly'] == 1, 'y'],
            color='orchid', label='Anomalies Detected'
        )
        ax.set_title(f"Unique_id: {uid}", fontsize=8)
        ax.tick_params(axis='x', labelsize=6)
    # One shared legend for the whole figure
    fig.legend(loc='upper center', ncol=3, fontsize=8, labels=['y', 'TimeGPT', 'Anomaly'])
    plt.tight_layout(rect=[0, 0, 1, 0.95])
    plt.show()

display_ids = ['machine-1-1_y_0', 'machine-1-1_y_1', 'machine-1-1_y_6', 'machine-1-1_y_29']
plot_anomalies(anomaly_online, display_ids, rows=2, cols=2)
```

*This figure highlights anomalies detected in four selected metrics. Each metric is analyzed independently, so anomalies reflect unusual behavior within that series alone.*
#### Step 3.2: Global Method
Global anomaly detection considers all time series collectively, flagging a time step as anomalous if the aggregate deviation across all series at that time exceeds a threshold. This approach captures systemic or correlated anomalies that might be missed when analyzing each series in isolation. However, it comes with slightly higher complexity and computational overhead, and may require careful threshold tuning.
Example usage:
```python theme={null}
anomaly_online_multi = nixtla_client.detect_anomalies_online(
df[['ts', 'y', 'unique_id']],
time_col='ts',
target_col='y',
freq='h',
h=24,
level=95,
detection_size=475,
threshold_method='multivariate' # global anomaly detection
)
```
Log output:
```bash theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
WARNING:nixtla.nixtla_client:Detection size is large. Using the entire series to compute the anomaly threshold...
INFO:nixtla.nixtla_client:Calling Online Anomaly Detector Endpoint...
```
Visualize the anomalies:
```python theme={null}
plot_anomalies(anomaly_online_multi, display_ids, rows=2, cols=2)
```

*In global mode, an anomaly is flagged when the combined deviation across these series reaches a threshold. This can reveal system-wide anomalies.*
In global anomaly detection, the anomaly scores of all series at each time step are aggregated, and a step is flagged when the combined score exceeds the threshold. No single series needs to cross its own threshold, so this approach can reveal systemic anomalies that go unnoticed when each series is considered alone.
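The toy NumPy sketch below builds intuition for this difference. It works on made-up z-scores, not on TimeGPT output. The local rule flags any single z-score whose absolute value exceeds 1.96 (about 95%, two-sided). The global rule sums the squared z-scores of all series at each time step and compares that sum to about 9.49, the 95% quantile of a chi-square distribution with 4 degrees of freedom. That distribution is the reference for 4 series only under an independence assumption. Both rules are assumptions chosen for illustration, and TimeGPT's actual scoring and aggregation may differ. The function names are hypothetical.

```python
import numpy as np

def local_flags(z, z_crit=1.96):
    """Flag each (time step, series) cell whose own |z| exceeds z_crit."""
    return np.abs(z) > z_crit

def global_flags(z, threshold):
    """Flag time steps where the summed squared z-scores across series exceed threshold."""
    score = (z ** 2).sum(axis=1)
    return score > threshold

# Toy z-scores: 10 time steps x 4 series
z = np.zeros((10, 4))
z[2, 0] = 3.0  # one metric spikes on its own
z[5, :] = 1.8  # every metric is mildly elevated at the same time

# ~95% quantile of a chi-square distribution with 4 degrees of freedom
chi2_95_4dof = 9.4877

print(np.argwhere(local_flags(z)))                    # [[2 0]]
print(np.flatnonzero(global_flags(z, chi2_95_4dof)))  # [5]
```

Only the local rule catches the isolated spike at step 2, because 3.0 squared is 9.0, which stays below the global threshold. Only the global rule catches the coordinated, moderate rise at step 5, where each z-score of 1.8 stays under 1.96 but their squares sum to about 12.96. This mirrors the trade-off described above.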
## Real-World Use Cases
### Local Detection Examples:
* **Independent application metrics**: Response time, error rates, request counts for different microservices
* **IoT sensor networks**: Temperature sensors at different locations with no correlation
* **Business metrics**: Sales figures across different product categories
### Global Detection Examples:
* **Server monitoring**: CPU, memory, disk I/O, and network metrics from the same server
* **Distributed system health**: Correlated metrics across multiple nodes indicating cluster-wide issues
* **Manufacturing equipment**: Multiple sensor readings from a single machine indicating equipment failure
## Summary
* **Local:** Best for detecting anomalies in a single metric or uncorrelated metrics. Low computational overhead, but may overlook cross-series patterns.
* **Global:** Considers correlations across metrics, capturing system-wide issues. More complex and computationally intensive than local methods.
Both detection approaches use Nixtla's online anomaly detection method. Choose the strategy that best fits your use case and data characteristics.
## Frequently Asked Questions
**What's the difference between univariate and multivariate anomaly detection?**
Univariate (local) detection analyzes each time series independently using the `threshold_method='univariate'` parameter, while multivariate (global) detection analyzes all series together using `threshold_method='multivariate'`, considering correlations between metrics.
**When should I use global detection instead of local?**
Use global detection when your time series are correlated and system-wide failures affect multiple metrics simultaneously, such as monitoring CPU, memory, and network metrics from the same server.
**Does global detection increase computational cost?**
Yes, global detection requires analyzing relationships across all time series, making it more computationally intensive. However, it can reduce overall false positives by considering metric correlations.
**Can I run both local and global detection?**
Yes, you can run both methods and compare results. Local detection may catch metric-specific anomalies while global detection identifies system-wide issues.
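If you run both methods, a small helper can show where they agree. The sketch below is hypothetical: `compare_detections` is not part of the Nixtla SDK. It assumes that both result DataFrames contain the `unique_id`, `ts`, and `anomaly` columns used by the plotting function in this guide, and it labels each point by the method or methods that flagged it.

```python
import pandas as pd

def compare_detections(local_df, global_df, id_col='unique_id', time_col='ts'):
    """Hypothetical helper: label each (series, timestamp) by which method flagged it."""
    merged = local_df[[id_col, time_col, 'anomaly']].merge(
        global_df[[id_col, time_col, 'anomaly']],
        on=[id_col, time_col],
        suffixes=('_local', '_global'),
    )
    loc = merged['anomaly_local'].astype(bool)
    glb = merged['anomaly_global'].astype(bool)
    merged['flagged_by'] = 'neither'
    merged.loc[loc & glb, 'flagged_by'] = 'both'
    merged.loc[loc & ~glb, 'flagged_by'] = 'local_only'
    merged.loc[~loc & glb, 'flagged_by'] = 'global_only'
    return merged

# With the results from this guide (requires the API calls above):
# summary = compare_detections(anomaly_online, anomaly_online_multi)
# print(summary['flagged_by'].value_counts())

# Small synthetic example
ts = pd.date_range('2024-01-01', periods=3, freq='h')
local_res = pd.DataFrame({'unique_id': 'cpu', 'ts': ts, 'anomaly': [True, False, True]})
global_res = pd.DataFrame({'unique_id': 'cpu', 'ts': ts, 'anomaly': [True, True, False]})
print(compare_detections(local_res, global_res)['flagged_by'].tolist())
# ['both', 'global_only', 'local_only']
```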
# Delete Fine-tuned Model
Source: https://nixtla.io/docs/api-reference/delete-fine-tuned-model
**Endpoint:** `DELETE /v2/finetuned_models/{finetuned_model_id}`
Delete a previously saved finetuned model. It takes the ID of the model that you want to delete as a path parameter.
# Foundational Time Series Model Multi Series
Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-multi-series
**Endpoint:** `POST /v2/forecast`
Based on the provided data, this endpoint predicts the future values of multiple time series at once. It takes a JSON object as input containing information such as the series frequency and historical data (see below for a full description of the parameters). The response contains the predicted values for each series based on the input arguments. Get your token for private beta at https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/api-reference.
# Foundational Time Series Model Multi Series Anomaly Detector
Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-multi-series-anomaly-detector
**Endpoint:** `POST /v2/anomaly_detection`
Based on the provided data, this endpoint detects anomalies in the historical period of multiple time series at once. It takes a JSON object as input containing information such as the series frequency and historical data (see below for a full description of the parameters). The response contains a flag indicating whether each date has an anomaly, along with the prediction interval used to decide whether an observation is an anomaly. Get your token for private beta at https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/api-reference.
# Foundational Time Series Model Multi Series Cross Validation
Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-multi-series-cross-validation
**Endpoint:** `POST /v2/cross_validation`
Performs cross-validation for multiple time series at once.
# Foundational Time Series Model Multi Series Finetuning
Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-multi-series-finetuning
**Endpoint:** `POST /v2/finetune`
Fine-tunes the large time model on your data and saves it for later use. It takes a JSON object as input containing information such as the series frequency and historical data (see below for a full description of the parameters). The response contains the ID of the fine-tuned model, which you can pass to other endpoints to make forecasts with that model. Get your token for private beta at https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/api-reference.
# Foundational Time Series Model Multi Series Historic (Deprecated)
Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-multi-series-historic-deprecated
**Endpoint:** `POST /v2/historic_forecast`
**Deprecated:** This endpoint is deprecated and will be removed in a future release. Please use [`/v2/cross_validation`](#tag/default/POST/v2/cross_validation) instead, which offers equivalent in-sample evaluation capabilities through rolling-window cross-validation.
Based on the provided data, this endpoint predicts the in-sample (historical) values of multiple time series at once. It takes a JSON object as input containing information such as the series frequency and historical data (see below for a full description of the parameters). The response contains the predicted values for the historical period, which are typically used for anomaly detection. Get your token for private beta at https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/api-reference.
# Foundational Time Series Model Online Multi Series Anomaly Detector
Source: https://nixtla.io/docs/api-reference/foundational-time-series-model-online-multi-series-anomaly-detector
**Endpoint:** `POST /v2/online_anomaly_detection`
This endpoint performs online anomaly detection based on the provided data. It uses cross-validation to detect anomalies more robustly, and it supports both univariate and multivariate scenarios. It takes a JSON object as input containing information such as the series frequency and historical data (see below for a full description of the parameters). The response contains a flag indicating whether each date has an anomaly, the prediction interval used to make that decision, and the associated z-score for each point. Get your token for private beta at https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/api-reference.
# Get single Fine-tuned Model
Source: https://nixtla.io/docs/api-reference/get-single-fine-tuned-model
**Endpoint:** `GET /v2/finetuned_models/{finetuned_model_id}`
Retrieve metadata for a previously fine-tuned model. The response contains the metadata of a model that you have fine-tuned and is available to make forecasts.
# List Fine-tuned Models
Source: https://nixtla.io/docs/api-reference/list-fine-tuned-models
**Endpoint:** `GET /v2/finetuned_models`
List all the finetuned models that you have created. The response contains a list with the IDs of the models that you have fine-tuned and are available to make forecasts.
# Validate Api Key
Source: https://nixtla.io/docs/api-reference/validate-api-key
**Endpoint:** `GET /validate_api_key`
# Audit and Clean Data
Source: https://nixtla.io/docs/data_requirements/audit_clean
Learn how to audit and clean your data with TimeGPT.
The `audit_data` and `clean_data` methods from TimeGPT can help you identify and fix potential issues in your data.
The `audit_data` method checks for common problems such as duplicates, missing dates, categorical columns, negative values, and leading zeros. While not all issues will result in errors, addressing them can improve the quality of the forecasts, depending on your specific use case.
Once identified, `clean_data` can be used to automatically fix these issues.
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/24_audit_data.ipynb)
## How to Use the Audit and Clean Methods
### Step 1: Import Packages
To use the `audit_data` and `clean_data` methods, you first need to import and instantiate the `NixtlaClient` class.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla' # defaults to os.environ.get("NIXTLA_API_KEY")
)
```
### Step 2: Create Minimal Example
The `audit_data` method performs a series of checks to identify issues in your data. These checks fall into two categories:
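| Check Type    | Description                                                                | Checks Performed                                                                 |
| ------------- | -------------------------------------------------------------------------- | -------------------------------------------------------------------------------- |
| Fail          | Issues that will cause errors when you run TimeGPT                         | Duplicate rows (D001), missing dates (D002), categorical feature columns (F001)  |
| Case-specific | Issues that may not cause errors but could negatively affect your results  | Negative values (V001), leading zeros (V002)                                     |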
To show how the `audit_data` method works, we will create a sample dataset with missing dates, negative values and leading zeros.
```python theme={null}
df = pd.DataFrame({
'unique_id': ['id1', 'id1', 'id1', 'id2', 'id2', 'id2', 'id2', 'id3', 'id3', 'id3', 'id3'],
'ds': ['2023-01-01', '2023-01-03', '2023-01-04', '2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
'y': [1, 1, 1, 0, 0, 1, 2, -1, 0, 1, -2]
})
df
```
| unique\_id | ds | y |
| ---------- | ---------- | -- |
| id1 | 2023-01-01 | 1 |
| id1 | 2023-01-03 | 1 |
| id1 | 2023-01-04 | 1 |
| id2 | 2023-01-01 | 0 |
| id2 | 2023-01-02 | 0 |
| id2 | 2023-01-03 | 1 |
| id2 | 2023-01-04 | 2 |
| id3 | 2023-01-01 | -1 |
| id3 | 2023-01-02 | 0 |
| id3 | 2023-01-03 | 1 |
| id3 | 2023-01-04 | -2 |
### Step 3: Audit Data
The `audit_data` method accepts the following parameters:
* `df` *(required)*: A pandas DataFrame with your input data.
* `freq` *(required)*: The frequency of your time series data (e.g., `D` for daily, `M` for monthly).
* `id_col`: Column name identifying each unique series. Default is `unique_id`.
* `time_col`: Column name containing timestamps. Default is `ds`.
* `target_col`: Column name containing the target variable. Default is `y`.
Additionally, you can use the following optional parameters to specify how missing dates are identified:
* `start`: The initial timestamp for the series.
* `end`: The final timestamp for the series.
Both `start` and `end` can take the following options:
* `per_serie`: Uses the first or last timestamp of each individual series.
* `global`: Uses the earliest or latest timestamp from the entire dataset.
* A specific timestamp or integer (e.g., `2025-01-01`, `2025`, or `datetime(2025, 1, 1)`).
```python theme={null}
all_pass, fail_dfs, case_specific_dfs = nixtla_client.audit_data(
df = df,
freq = 'D',
start = 'per_serie',
end = 'per_serie'
)
```
The `audit_data` method returns three values:
* `all_pass` (bool): `True` if every check passed, otherwise `False`.
* `fail_dfs` (dict): Any failed tests (D001, D002, or F001), each paired with the rows that failed.
* `case_specific_dfs` (dict): Any case-specific tests (V001 or V002), each paired with the rows flagged.
In the example above, the `audit_data` method should find missing dates (D002), negative values (V001), and leading zeros (V002).
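The sketch below makes the check codes more concrete by approximating some of them with plain pandas on the same sample data. It only illustrates the idea and does not show how `audit_data` is implemented. Two definitions here are assumptions: duplicates are treated as repeated (`unique_id`, `ds`) pairs, and leading zeros are treated as the zeros before the first non-zero value of each series. The helper names are hypothetical.

```python
import pandas as pd

# Same sample data as in Step 2
df = pd.DataFrame({
    'unique_id': ['id1', 'id1', 'id1', 'id2', 'id2', 'id2', 'id2', 'id3', 'id3', 'id3', 'id3'],
    'ds': ['2023-01-01', '2023-01-03', '2023-01-04', '2023-01-01', '2023-01-02', '2023-01-03',
           '2023-01-04', '2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
    'y': [1, 1, 1, 0, 0, 1, 2, -1, 0, 1, -2],
})
df['ds'] = pd.to_datetime(df['ds'])

def duplicate_rows(df):
    """Roughly what D001 looks for: repeated (unique_id, ds) pairs."""
    return df[df.duplicated(subset=['unique_id', 'ds'], keep=False)]

def missing_dates(df, freq='D'):
    """Roughly D002 with start/end='per_serie': gaps inside each series' own range."""
    rows = []
    for uid, group in df.groupby('unique_id'):
        expected = pd.date_range(group['ds'].min(), group['ds'].max(), freq=freq)
        for ts in expected.difference(pd.DatetimeIndex(group['ds'])):
            rows.append({'unique_id': uid, 'ds': ts})
    return pd.DataFrame(rows, columns=['unique_id', 'ds'])

def negative_values(df):
    """Roughly V001: rows with a negative target."""
    return df[df['y'] < 0]

def leading_zeros(df):
    """Roughly V002: zeros that appear before the first non-zero value of each series."""
    df = df.sort_values(['unique_id', 'ds'])
    seen_nonzero = df['y'].ne(0).astype(int).groupby(df['unique_id']).cummax()
    return df[seen_nonzero == 0]

print(duplicate_rows(df).empty)                   # True
print(missing_dates(df))                          # one row: id1, 2023-01-02
print(negative_values(df)['unique_id'].tolist())  # ['id3', 'id3']
print(leading_zeros(df)['unique_id'].tolist())    # ['id2', 'id2']
```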
### Step 4: Clean Data
The `clean_data` method fixes the issues identified by the `audit_data` method. It requires the output of `audit_data`, so it must always be run after it. The `clean_data` method takes the following parameters:
* `df` *(required)*: A pandas DataFrame with your input data.
* `fail_dict` *(required)*: A dictionary with failed checks, as returned by the `audit_data` method.
* `case_specific_dict` *(required)*: A dictionary with case-specific checks, also returned by the `audit_data` method.
* `freq` *(required)*: The frequency of your time series data (e.g., `D` for daily, `M` for monthly). Can be a string, integer, or pandas offset.
* `clean_case_specific`: Whether to clean case-specific issues (e.g., negative values, leading zeros). Default is `False`.
* `id_col`: Column name identifying each unique series. Default is `unique_id`.
* `time_col`: Column name containing timestamps or integer steps. Default is `ds`.
* `target_col`: Column name containing the target variable. Default is `y`.
```python theme={null}
clean_df, all_pass, fail_dfs, case_specific_dfs = nixtla_client.clean_data(
df = df,
fail_dict = fail_dfs,
case_specific_dict = case_specific_dfs,
clean_case_specific = True,
freq = 'D'
)
clean_df
```
| unique\_id | ds | y |
| ---------- | ---------- | --- |
| id1 | 2023-01-01 | 1.0 |
| id1 | 2023-01-03 | 1.0 |
| id1 | 2023-01-04 | 1.0 |
| id1 | 2023-01-02 | NaN |
| id2 | 2023-01-03 | 1.0 |
| id2 | 2023-01-04 | 2.0 |
| id3 | 2023-01-01 | 0.0 |
| id3 | 2023-01-02 | 0.0 |
| id3 | 2023-01-03 | 1.0 |
| id3 | 2023-01-04 | 0.0 |
In this example, `clean_data` added the missing date in `id1`, removed the leading zeros in `id2`, and replaced the negative values in `id3`.
However, replacing negative values with zeros introduced new leading zeros in `id3`, so a second run of `clean_data` is required.
```python theme={null}
clean_df2, all_pass, fail_dfs, case_specific_dfs = nixtla_client.clean_data(
df = clean_df,
fail_dict = fail_dfs,
case_specific_dict = case_specific_dfs,
clean_case_specific = True, # if False, the case-specific tests will be ignored
freq = 'D'
)
clean_df2
```
| unique\_id | ds | y |
| ---------- | ---------- | --- |
| id1 | 2023-01-01 | 1.0 |
| id1 | 2023-01-03 | 1.0 |
| id1 | 2023-01-04 | 1.0 |
| id1 | 2023-01-02 | NaN |
| id2 | 2023-01-03 | 1.0 |
| id2 | 2023-01-04 | 2.0 |
| id3 | 2023-01-03 | 1.0 |
| id3 | 2023-01-04 | 0.0 |
After the second run of `clean_data`, the leading zeros in `id3` have been removed.
The only remaining step is to fill the missing value created when the missing date was added in `id1`, and to sort the DataFrame by `unique_id` and `ds`.
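Here is a minimal pandas sketch of that final step. It assumes a fill value of 0 to match the table below, although interpolation or another strategy may suit your data better. The snippet rebuilds the DataFrame from the `clean_df2` table above so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuilds the clean_df2 table above
clean_df2 = pd.DataFrame({
    'unique_id': ['id1', 'id1', 'id1', 'id1', 'id2', 'id2', 'id3', 'id3'],
    'ds': ['2023-01-01', '2023-01-03', '2023-01-04', '2023-01-02',
           '2023-01-03', '2023-01-04', '2023-01-03', '2023-01-04'],
    'y': [1.0, 1.0, 1.0, np.nan, 1.0, 2.0, 1.0, 0.0],
})
clean_df2['ds'] = pd.to_datetime(clean_df2['ds'])

# Fill the inserted missing value with 0 (the choice shown in the table below)
# and restore chronological order within each series
final_df = (
    clean_df2.fillna({'y': 0})
    .sort_values(['unique_id', 'ds'])
    .reset_index(drop=True)
)
print(final_df)
```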
| unique\_id | ds | y |
| ---------- | ---------- | --- |
| id1 | 2023-01-01 | 1.0 |
| id1 | 2023-01-02 | 0.0 |
| id1 | 2023-01-03 | 1.0 |
| id1 | 2023-01-04 | 1.0 |
| id2 | 2023-01-03 | 1.0 |
| id2 | 2023-01-04 | 2.0 |
| id3 | 2023-01-03 | 1.0 |
| id3 | 2023-01-04 | 0.0 |
## Conclusion
The `audit_data` method helps you identify issues that may prevent TimeGPT from running properly.
These include fail tests (duplicate rows, missing dates, and categorical feature columns), which will always result in errors if not addressed.
It also flags case-specific issues (negative values and leading zeros), which may not cause errors but can affect the quality of your forecasts depending on your use case.
The `clean_data` method can automatically fix the issues identified by `audit_data`.
Be cautious when removing negative values or leading zeros, as they may contain important information about your data.
Above all, when auditing and cleaning your data, make decisions based on the needs and context of your specific use case.
# Data Requirements
Source: https://nixtla.io/docs/data_requirements/data_requirements
Overview of the data format and requirements for TimeGPT forecasting.
TimeGPT accepts **pandas** and **polars** dataframes in [long format](https://www.theanalysisfactor.com/wide-and-long-data/#comments). The minimum required columns are:
* **unique\_id**: String or numerical value to label each series.
* **ds** (timestamp): String or datetime in `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS` format.
* **y** (numeric): Numerical target variable to forecast.
If a DataFrame lacks the `ds` column but uses a **DatetimeIndex**, that is also supported.
TimeGPT also supports distributed dataframe libraries such as **dask**, **spark**, and **ray**.
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/getting-started/5_data_requirements.ipynb)
You can include additional exogenous features in the same DataFrame. See the [Exogenous Variables tutorial](/docs/forecasting/exogenous-variables/numeric_features) for details.
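If your data is in wide format, with one column per series, `pandas.DataFrame.melt` can reshape it into the long format described above. The sketch below uses a small, made-up dataset, and the `store_a` and `store_b` column names are hypothetical.

```python
import pandas as pd

# Hypothetical wide-format data: one column per series
wide = pd.DataFrame({
    'ds': pd.date_range('2024-01-01', periods=3, freq='D'),
    'store_a': [10, 12, 11],
    'store_b': [5, 7, 6],
})

# Long format: one row per (series, timestamp) pair
long_df = (
    wide.melt(id_vars='ds', var_name='unique_id', value_name='y')
    [['unique_id', 'ds', 'y']]
)
print(long_df)
```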
***
## Example DataFrame
Below is a sample of a valid input DataFrame for TimeGPT (with columns named `timestamp` and `value` instead of `ds` and `y`):
```python Sample Data Loading theme={null}
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df["unique_id"] = "series1"
df.head()
```
**Sample Data Preview**
| **unique\_id** | **timestamp** | **value** |
| -------------- | ------------- | --------- |
| series1 | 1949-01-01 | 112 |
| series1 | 1949-02-01 | 118 |
| series1 | 1949-03-01 | 132 |
| series1 | 1949-04-01 | 129 |
| series1 | 1949-05-01 | 121 |
In this example:
* `unique_id` identifies the series.
* `timestamp` corresponds to `ds`.
* `value` corresponds to `y`.
***
## Matching Columns to TimeGPT
You can choose how to align your DataFrame columns with TimeGPT’s expected structure:
**Option 1:** Rename `timestamp` to `ds` and `value` to `y`:
```python Rename Columns Example theme={null}
df = df.rename(columns={'timestamp': 'ds', 'value': 'y'})
```
Now your DataFrame has the explicitly required columns:
```python Show Head of DataFrame theme={null}
print(df.head())
```
**Option 2:** Keep your original column names and specify them when calling `nixtla_client.forecast`:
```python NixtlaClient Forecast Example theme={null}
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla')
fcst = nixtla_client.forecast(
df=df,
h=12,
time_col='timestamp',
target_col='value'
)
fcst.head()
```
This way, you don’t need to rename your DataFrame columns, as TimeGPT will know which ones to treat as `ds` and `y`.
***
## Example Forecast
When you run the forecast method:
```python Forecast Example theme={null}
fcst = nixtla_client.forecast(
df=df,
h=12,
time_col='timestamp',
target_col='value'
)
fcst.head()
```
```bash Forecast Logs theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
```
| unique\_id | timestamp | TimeGPT |
| ---------- | ---------- | --------- |
| series1 | 1961-01-01 | 437.83792 |
| series1 | 1961-02-01 | 426.06270 |
| series1 | 1961-03-01 | 463.11655 |
| series1 | 1961-04-01 | 478.24450 |
| series1 | 1961-05-01 | 505.64648 |
TimeGPT attempts to automatically infer your data’s frequency (`freq`). You can override this by specifying the **freq** parameter (e.g., `freq='MS'`).
For more information, see the [TimeGPT Quickstart](/docs/forecasting/timegpt_quickstart).
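As a quick sanity check before you call `forecast`, you can ask pandas which frequency it infers from your timestamps. This check uses `pandas.infer_freq` and not TimeGPT's own inference, so the two results may not always match. If you are unsure, pass `freq` explicitly.

```python
import pandas as pd

# First timestamps of the AirPassengers sample above
timestamps = pd.to_datetime(['1949-01-01', '1949-02-01', '1949-03-01', '1949-04-01', '1949-05-01'])
print(pd.infer_freq(timestamps))  # MS (month start)

# If the inferred frequency looks wrong, set it explicitly, for example:
# fcst = nixtla_client.forecast(df=df, h=12, freq='MS', time_col='timestamp', target_col='value')
```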
***
## Important Considerations
**Warning:** Data passed to TimeGPT must not contain missing values or time gaps.
To handle missing data, see [Dealing with Missing Values in TimeGPT](/docs/data_requirements/missing_values).
***
### Minimum Data Requirements (Azure AI)
These are the minimum data sizes required for each frequency when using Azure AI:
| Frequency | Minimum Size |
| -------------------------------- | ------------ |
| Hourly and subhourly (e.g., "H") | 1008 |
| Daily ("D") | 300 |
| Weekly (e.g., "W-MON") | 64 |
| Monthly and others | 48 |
When preparing your data, also consider:
* The number of future periods you want to predict.
* How many times to test the model's performance.
* The periodic offset between validation windows during cross-validation.
This ensures you have enough data for both training and evaluation.
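To compare your series against the table above, you can count the rows per `unique_id`. The pandas sketch below does this. The dictionary keys and the helper name are hypothetical. The sketch checks only the base minimums and does not account for the extra history needed for the forecast horizon or the validation windows listed above.

```python
import pandas as pd

# Base minimum sizes from the Azure AI table above (keys are hypothetical labels)
AZURE_MIN_SIZE = {'hourly': 1008, 'daily': 300, 'weekly': 64, 'monthly_or_other': 48}

def series_below_minimum(df, freq_group, id_col='unique_id'):
    """Hypothetical helper: row counts of every series shorter than the minimum."""
    counts = df.groupby(id_col).size()
    return counts[counts < AZURE_MIN_SIZE[freq_group]]

# Example: series 'a' has 300 daily rows, series 'b' only 120
df = pd.DataFrame({
    'unique_id': ['a'] * 300 + ['b'] * 120,
    'ds': list(pd.date_range('2023-01-01', periods=300, freq='D'))
          + list(pd.date_range('2023-01-01', periods=120, freq='D')),
    'y': 1.0,
})
print(series_below_minimum(df, 'daily'))
```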
# Missing Values
Source: https://nixtla.io/docs/data_requirements/missing_values
Learn how to handle missing values in time series data for accurate forecasting with TimeGPT.
## Missing Values in Time Series
TimeGPT can handle missing values in your target series, but it needs a continuous series of timestamps.
While you may have multiple series starting and ending on different dates, each one must maintain
a continuous date sequence. Any unobserved values in your target series can be labelled as `NaN`.
Whenever possible, we recommend filling missing values by interpolation or any other method that makes
sense in your particular context.
This tutorial shows you how to handle missing values for use with TimeGPT. For
reference, this tutorial is based on the skforecast tutorial:
[Forecasting Time Series with Missing Values](https://cienciadedatos.net/documentos/py46-forecasting-time-series-missing-values).
Managing missing values ensures your forecasts with TimeGPT are accurate and reliable.
When dates or values are missing, fill or interpolate them according to the nature of your dataset.
If values cannot be filled, you can label them as `NaN`.
## Tutorial
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/15_missing_values.ipynb)
### Step 1: Load Data
Load the daily bike rental counts dataset using pandas. Note that the original column names are in Spanish; you will rename them to match `ds` and `y`.
```python theme={null}
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/master/data/usuarios_diarios_bicimad.csv')
df = df[['fecha', 'Usos bicis total día']]
df.rename(columns={'fecha': 'ds', 'Usos bicis total día': 'y'}, inplace=True)
df.head()
```
| | ds | y |
| - | ---------- | --- |
| 0 | 2014-06-23 | 99 |
| 1 | 2014-06-24 | 72 |
| 2 | 2014-06-25 | 119 |
| 3 | 2014-06-26 | 135 |
| 4 | 2014-06-27 | 149 |
Next, convert your dates to timestamps and assign a unique identifier (`unique_id`) to handle multiple series if needed:
```python theme={null}
df['ds'] = pd.to_datetime(df['ds'])
df['unique_id'] = 'id1'
df = df[['unique_id', 'ds', 'y']]
```
Reserve the last 93 days for testing:
```python theme={null}
train_df = df[:-93]
test_df = df[-93:]
```
To simulate missing data, remove specific date ranges from the training dataset:
```python theme={null}
mask = ~((train_df['ds'] >= '2020-09-01') & (train_df['ds'] <= '2020-10-10')) & \
~((train_df['ds'] >= '2020-11-08') & (train_df['ds'] <= '2020-12-15'))
train_df_gaps = train_df[mask]
```
### Step 2: Initialize TimeGPT
Initialize a `NixtlaClient` object with your Nixtla API key:
```python theme={null}
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla')
```
### Step 3: Visualize Data
Plot your dataset and examine the gaps introduced above:
```python theme={null}
nixtla_client.plot(train_df_gaps)
```

Note that there are two gaps in the data: from September 1, 2020, to October 10,
2020, and from November 8, 2020, to December 15, 2020. To better visualize these
gaps, you can use the `max_insample_length` argument of the `plot` method or you
can simply zoom in on the plot.
```python theme={null}
nixtla_client.plot(train_df_gaps, max_insample_length=800)
```

Additionally, notice a period from March 16, 2020, to April 21, 2020, where the
data shows zero rentals. These are not missing values, but actual zeros
corresponding to the COVID-19 lockdown in the city.
### Step 4: Fill Missing Dates
Before using TimeGPT, we need to ensure that **all timestamps** from the start date to the end date are present in the data.
Once every timestamp is present, any unobserved target values can be represented as `NaN`.
To insert the missing dates, we will use the `fill_gaps` function from `utilsforecast`,
a Python package from Nixtla that provides essential utilities for time series
forecasting, such as functions for data preprocessing, plotting, and evaluation.
The `fill_gaps` function will fill in the missing dates in the data. To do this,
it requires the following arguments:
* `df`: The DataFrame containing the time series data.
* `freq` (str or int): The frequency of the data.
```python theme={null}
from utilsforecast.preprocessing import fill_gaps
print('Number of rows before filling gaps:', len(train_df_gaps))
train_df_complete = fill_gaps(train_df_gaps, freq='D')
print('Number of rows after filling gaps:', len(train_df_complete))
```
```bash theme={null}
Number of rows before filling gaps: 2851
Number of rows after filling gaps: 2929
```
> NOTE: In this tutorial, the data contains only one time series. However, TimeGPT
> supports passing multiple series to the model. In that case, no series may have
> missing timestamps between its own earliest and latest timestamp. If individual
> series have such gaps, you must decide how to fill them. The `fill_gaps` function
> provides additional arguments to assist with this, namely `start` and `end`
> (refer to its documentation for complete details).
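With multiple series, the idea is to reindex each series from its own first timestamp to its own last timestamp. The pandas-only sketch below illustrates this idea, which is roughly what the `start` and `end` arguments of `fill_gaps` let you control. It is not the `fill_gaps` implementation, and `fill_missing_dates_per_series` is a hypothetical helper.

```python
import pandas as pd

def fill_missing_dates_per_series(df, freq, id_col='unique_id', time_col='ds', target_col='y'):
    """Hypothetical helper: insert missing timestamps between each series' own first and last dates."""
    parts = []
    for uid, group in df.groupby(id_col):
        full_range = pd.date_range(group[time_col].min(), group[time_col].max(),
                                   freq=freq, name=time_col)
        # Reindex onto the complete range; newly inserted dates get NaN targets
        filled = group.set_index(time_col)[[target_col]].reindex(full_range).reset_index()
        filled.insert(0, id_col, uid)
        parts.append(filled)
    return pd.concat(parts, ignore_index=True)

# Two series with different start dates and internal gaps
df = pd.DataFrame({
    'unique_id': ['a', 'a', 'a', 'b', 'b'],
    'ds': pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-04', '2024-01-02', '2024-01-05']),
    'y': [1.0, 2.0, 4.0, 10.0, 13.0],
})
out = fill_missing_dates_per_series(df, freq='D')
print(out)
```

Series `b` keeps its own start date (2024-01-02) and is not extended back to 2024-01-01. The inserted dates carry `NaN` targets, which you can then fill as described next.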
Now we need to decide whether to fill the missing values in the target column or not. In
this tutorial, we decide to use interpolation, but it is important to consider the
specific context of your data when selecting a filling strategy. For example,
if you are dealing with daily retail data, a missing value most likely indicates
that there were no sales on that day, and you can fill it with zero. Conversely,
if you are working with hourly temperature data, a missing value probably means
that the sensor was not functioning, and you might prefer to keep the value as `NaN`.
In this case, we will handle the newly inserted missing values by interpolation.
```python theme={null}
train_df_complete['y'] = train_df_complete['y'].interpolate(
method='linear', limit_direction='both'
)
train_df_complete.isna().sum()
```
```bash theme={null}
unique_id 0
ds 0
y 0
dtype: int64
```
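You can compare the strategies described above side by side on a toy series. This pandas sketch uses made-up values. The right choice depends on what a missing value means in your data.

```python
import numpy as np
import pandas as pd

s = pd.Series([5.0, np.nan, np.nan, 8.0])

# Retail-style data: a missing day most likely means no sales, so fill with zero
zero_filled = s.fillna(0)

# Linear interpolation, as used in this tutorial
interpolated = s.interpolate(method='linear', limit_direction='both')

# Sensor-style data: the value was never observed, so keep it as NaN
kept = s.copy()

print(zero_filled.tolist())    # [5.0, 0.0, 0.0, 8.0]
print(interpolated.tolist())   # [5.0, 6.0, 7.0, 8.0]
print(int(kept.isna().sum()))  # 2
```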
### Step 5: Forecast with TimeGPT
Typically, a horizon longer than two seasonal periods is considered long. In
this case, the data has a weekly seasonality of 7 days, so any horizon beyond 2 × 7 = 14 days
is long, and the test horizon is 93 days. Because the horizon is long relative to the
seasonality, we will use the `timegpt-1-long-horizon` model.
```python theme={null}
fcst = nixtla_client.forecast(
train_df_complete,
h=len(test_df),
model='timegpt-1-long-horizon'
)
```
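The rule of thumb above can be written as a small helper that picks the model name. The helpers `is_long_horizon` and `choose_model`, and the `factor=2` default taken from the rule in the text, are illustrative and not part of the Nixtla API. Treating `'timegpt-1'` as the name of the base model is an assumption here; check the API reference.

```python
def is_long_horizon(h: int, season_length: int, factor: int = 2) -> bool:
    """Rule of thumb: a horizon longer than `factor` seasonal cycles counts as long."""
    return h > factor * season_length


def choose_model(h: int, season_length: int) -> str:
    # Assumption: 'timegpt-1' is the base model name
    return "timegpt-1-long-horizon" if is_long_horizon(h, season_length) else "timegpt-1"


print(choose_model(h=93, season_length=7))  # timegpt-1-long-horizon (93 > 14)
print(choose_model(h=7, season_length=7))   # timegpt-1 (7 <= 14)
```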
Visualize the forecasts against the actual test data:
```python theme={null}
nixtla_client.plot(test_df, fcst)
```

Evaluate performance using `utilsforecast`. We will use Mean Absolute Error (MAE)
as the evaluation metric, but you can choose others like MSE, RMSE, etc.:
```python theme={null}
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae
fcst['ds'] = pd.to_datetime(fcst['ds'])
result = test_df.merge(fcst, on=['ds', 'unique_id'], how='left')
evaluate(result, metrics=[mae])
```
| | unique\_id | metric | TimeGPT |
| - | ---------- | ------ | ----------- |
| 0 | id1 | mae | 1824.693059 |
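The `pd.to_datetime` conversion before the merge guards against a dtype mismatch: pandas raises a `ValueError` when asked to merge a datetime `ds` column with a string `ds` column. A minimal sketch on hypothetical frames (not TimeGPT output) shows the failure and the fix, then computes the MAE by hand as a cross-check of what `evaluate(..., metrics=[mae])` reports.

```python
import pandas as pd

actuals = pd.DataFrame({
    "unique_id": ["id1", "id1"],
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02"]),
    "y": [100.0, 110.0],
})
# Hypothetical forecast frame whose timestamps arrived as strings
forecasts = pd.DataFrame({
    "unique_id": ["id1", "id1"],
    "ds": ["2024-01-01", "2024-01-02"],
    "TimeGPT": [98.0, 113.0],
})

try:
    actuals.merge(forecasts, on=["ds", "unique_id"], how="left")
except ValueError as err:
    print("merge failed:", err)

# Converting the key first makes the dtypes match
forecasts["ds"] = pd.to_datetime(forecasts["ds"])
merged = actuals.merge(forecasts, on=["ds", "unique_id"], how="left")
print((merged["y"] - merged["TimeGPT"]).abs().mean())  # (2 + 3) / 2 = 2.5
```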
### Step 6: Conclusion
* Always ensure that your data is free of missing dates before forecasting with TimeGPT.
* Select a gap-filling strategy based on your domain knowledge (linear interpolation, constant filling, etc.).
* You may want to keep missing values as `NaN` if no gap-filling strategy makes sense in your context.
## References
* [Exclude COVID Impact in Time Series Forecasting](https://www.cienciadedatos.net/documentos/py45-weighted-time-series-forecasting.html)
* [Forecasting Time Series with Missing Values](https://cienciadedatos.net/documentos/py46-forecasting-time-series-missing-values.html)
# Multiple Time Series
Source: https://nixtla.io/docs/data_requirements/multiple_series
Learn how to pass multiple time series, and their exogenous variables, to TimeGPT in a single dataset.
You can pass multiple time series within the same dataset to TimeGPT. We can then make forecasts or detect anomalies on all series simultaneously.
To include multiple series, simply include a unique identifier column. By default, we expect this column to be called `unique_id`. The identifier column assigns a value to each series such that we can distinguish between them.
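If your series are stored side by side (one column per series), one way to reshape them into the expected long format with a `unique_id` column is `pandas.melt`. The wide frame and its column names below are hypothetical.

```python
import pandas as pd

# Hypothetical wide data: one column per series
wide = pd.DataFrame({
    "ds": pd.date_range("2024-01-01", periods=3, freq="D"),
    "store_a": [10.0, 12.0, 11.0],
    "store_b": [200.0, 210.0, 205.0],
})

# Long format: one row per (series, timestamp) pair
long_df = wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
long_df = (
    long_df[["unique_id", "ds", "y"]]
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)
print(long_df)
```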
## Load Data with Multiple Series
Here is an example of loading a dataset that contains multiple series.
```python theme={null}
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df['ds'] = pd.to_datetime(df['ds'])
df = df[["unique_id", "ds", "y"]]
df.groupby('unique_id').head(1)
```
| unique\_id | ds | y |
| ---------- | ---------- | ----- |
| BE | 2016-10-22 | 70.00 |
| DE | 2017-10-22 | 19.10 |
| FR | 2016-10-22 | 54.70 |
| NP | 2018-10-15 | 2.17 |
Above, we can see that we have four unique series in the dataset, as there are four different values in `unique_id`. Note that each series can start at different dates.
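To check how many series a dataset contains and where each one starts and ends, a quick pandas summary helps. The sketch below uses a small hypothetical frame and a helper name of our own; on the electricity data above you would call the same function on `df`.

```python
import pandas as pd


def summarize_series(df, id_col="unique_id", time_col="ds"):
    """One row per series: first timestamp, last timestamp and number of observations."""
    return df.groupby(id_col)[time_col].agg(start="min", end="max", n_obs="count")


toy = pd.DataFrame({
    "unique_id": ["A", "A", "B"],
    "ds": pd.to_datetime(["2016-10-22", "2016-10-23", "2017-10-22"]),
    "y": [1.0, 2.0, 3.0],
})
summary = summarize_series(toy)
print(summary)
```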
To forecast multiple series, simply call:
```python Multiple Series Forecast Example theme={null}
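from nixtla import NixtlaClient

nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla')
fcst = nixtla_client.forecast(df=df, h=24)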
fcst.head()
```
TimeGPT will produce forecasts for all unique IDs in your DataFrame simultaneously.
### Specifying the series identifier column
If the series identifier is not stored in a column called `unique_id`, specify the name of that column with `id_col` when calling TimeGPT:
```python Specify the name of the column for the series identifier theme={null}
fcst = nixtla_client.forecast(df=df, h=24, id_col="your_column_name")
fcst.head()
```
***
## Exogenous Variables
TimeGPT supports the use of exogenous features. These are variables that are not part of the series you are trying to forecast but may help explain it.
For example, suppose that you are forecasting electricity consumption, which is affected by the temperature outside. In this case, the temperature is an exogenous feature, meaning that you want to use the information from the temperature to forecast the electricity consumption.
Exogenous features are included as additional columns in the dataset: any column other than the standard `unique_id`, `ds`, and `y` columns is treated as an exogenous feature.
Here is an example of loading a dataset that contains multiple series and exogenous features.
```python Multiple Series Data Loading theme={null}
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df['ds'] = pd.to_datetime(df['ds'])
df.groupby('unique_id').head(1)
```
| unique\_id | ds | y | Exogenous1 | Exogenous2 | day\_0 | day\_1 | day\_2 | day\_3 | day\_4 | day\_5 | day\_6 |
| ---------- | ---------- | ----- | ---------- | ---------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| BE | 2016-10-22 | 70.00 | 57253.00 | 49593 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| DE | 2017-10-22 | 19.10 | 16972.75 | 15779 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| FR | 2016-10-22 | 54.70 | 57253.00 | 49593 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| NP | 2018-10-15 | 2.17 | 34078.00 | 1791 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
Above, the columns from `Exogenous1` to `day_6` will be treated as exogenous features when forecasting with TimeGPT.
For more information on forecasting with exogenous features, read the [Exogenous Variables tutorial](/docs/forecasting/exogenous-variables/numeric_features).
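Because any column beyond `unique_id`, `ds`, and `y` is treated as exogenous, it can be worth listing those columns explicitly before calling TimeGPT, so that a stray helper column is not passed as a feature by accident. A minimal sketch with a hypothetical frame and helper name:

```python
import pandas as pd

STANDARD_COLS = ("unique_id", "ds", "y")


def exogenous_columns(df, standard_cols=STANDARD_COLS):
    """Columns that would be treated as exogenous features."""
    return [c for c in df.columns if c not in standard_cols]


toy = pd.DataFrame({
    "unique_id": ["BE"],
    "ds": pd.to_datetime(["2016-10-22"]),
    "y": [1.0],
    "Exogenous1": [10.0],
    "day_5": [1.0],
    "row_note": ["debug"],  # helper column we do NOT want used as a feature
})
print(exogenous_columns(toy))    # ['Exogenous1', 'day_5', 'row_note']
clean = toy.drop(columns=["row_note"])
print(exogenous_columns(clean))  # ['Exogenous1', 'day_5']
```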
***
# Cross-validation Tutorial
Source: https://nixtla.io/docs/forecasting/evaluation/cross_validation
Master time series cross-validation with TimeGPT. Complete Python tutorial for model validation, rolling-window techniques, and prediction intervals with code examples.
## What is Cross-validation?
Time series cross-validation is essential for validating machine learning models and ensuring accurate forecasts. Unlike traditional k-fold cross-validation, time series validation requires specialized rolling-window techniques that respect temporal order. This comprehensive tutorial shows you how to perform cross-validation in Python using TimeGPT, including prediction intervals, exogenous variables, and model performance evaluation.
One of the primary challenges in time series forecasting is the inherent uncertainty and variability over time, which makes it crucial to validate the accuracy and reliability of the models employed. Cross-validation is well suited to this task: it estimates how a model will perform on unseen data, so you can check that forecasts are reliable before deploying them in real-world scenarios.
TimeGPT provides a `cross_validation` method, available on the `NixtlaClient` class, that streamlines the validation of [time series forecasting models](/docs/forecasting/timegpt_quickstart). It lets you test forecasts against historical data, with support for [prediction intervals](/docs/forecasting/probabilistic/prediction_intervals) and [exogenous variables](/docs/forecasting/exogenous-variables/numeric_features). This tutorial walks you through running cross-validation with `NixtlaClient`, so that your forecasting models are not only well constructed but also validated.
### Why Use Cross-Validation for Time Series?
Cross-validation provides several critical benefits for time series forecasting:
* **Prevent overfitting**: Test model performance across multiple time periods
* **Validate generalization**: Ensure forecasts work on unseen data
* **Quantify uncertainty**: Generate prediction intervals for risk assessment
* **Compare models**: Evaluate different forecasting approaches systematically
* **Optimize hyperparameters**: Fine-tune model parameters with confidence
## How to Perform Cross-validation with TimeGPT
**Quick Summary**: Learn time series cross-validation with TimeGPT in Python. This tutorial covers rolling-window validation, prediction intervals, model performance metrics, and advanced techniques with real-world examples using the Peyton Manning dataset.
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/08_cross_validation.ipynb)
### Step 1: Import Packages and Initialize NixtlaClient
First, import the required packages and initialize an instance of `NixtlaClient`.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
from IPython.display import display
# Initialize TimeGPT client for cross-validation
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load Example Data
Use the Peyton Manning dataset as an example. The dataset can be loaded directly from Nixtla's S3 bucket:
```python theme={null}
pm_df = pd.read_csv(
'https://datasets-nixtla.s3.amazonaws.com/peyton-manning.csv'
)
```
If you are using your own data, ensure your data is properly formatted: you must have a time column (e.g., `ds`), a target column (e.g., `y`), and, if necessary, an identifier column (e.g., `unique_id`) for multiple time series.
### Step 3: Implement Rolling-Window Cross-Validation
The `cross_validation` method of `NixtlaClient` performs systematic validation of time series forecasts. It takes a DataFrame of time-ordered data and uses a rolling-window scheme to evaluate the model's performance across different time periods, which shows whether the model is reliable and stable over time. The animation below shows how TimeGPT performs cross-validation.

Key parameters include:
* `freq`: Frequency of your data (e.g., `'D'` for daily). If not specified, it will be inferred.
* `id_col`, `time_col`, `target_col`: Columns representing series ID, timestamps, and target values.
* `n_windows`: Number of separate validation windows.
* `step_size`: Number of time steps between the cutoffs of consecutive validation windows.
* `h`: Forecast horizon (e.g., the number of days ahead to predict).
In each window, `cross_validation` forecasts the `h` periods after the cutoff so they can be compared with the actual values. Looking at accuracy across windows shows how much the model's performance varies over time and can reveal overfitting to a particular period.
**Key Concepts**: Rolling-window cross-validation splits your dataset into multiple training and testing sets over time. Each window moves forward chronologically: the data up to the window's cutoff serves as history, and the following `h` periods are used for validation. This approach mimics real-world forecasting, where you always predict forward in time.
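The rolling-window scheme can be illustrated by computing where each window's cutoff falls for a single series. This is a sketch of the idea, not Nixtla's internal implementation; the helper name is ours, and we assume that `step_size` defaults to `h` when it is not set. With a daily series ending on 2016-01-20 (assumed here to be the end of the Peyton Manning data), the first cutoff is 2015-12-16, which matches the first rows of the output below.

```python
from typing import List, Optional

import pandas as pd


def rolling_window_cutoffs(ds, h: int, n_windows: int,
                           step_size: Optional[int] = None) -> List[pd.Timestamp]:
    """Illustrative only: cutoff (last history timestamp) of each validation
    window, oldest first, for one series. Assumes step_size defaults to h."""
    step_size = h if step_size is None else step_size
    ds = pd.Series(pd.to_datetime(ds)).sort_values().reset_index(drop=True)
    n = len(ds)
    cutoffs = []
    for k in range(n_windows):
        # Index of the first validation point in window k; the last window ends at the final timestamp
        test_start = n - h - (n_windows - 1 - k) * step_size
        if test_start < 1:
            raise ValueError("series too short for the requested h, n_windows and step_size")
        cutoffs.append(ds.iloc[test_start - 1])
    return cutoffs


dates = pd.date_range("2015-10-01", "2016-01-20", freq="D")  # hypothetical daily index
for c in rolling_window_cutoffs(dates, h=7, n_windows=5):
    print(c.date())  # 2015-12-16, 2015-12-23, 2015-12-30, 2016-01-06, 2016-01-13
```

Each window is validated on the `h` points right after its cutoff, so with `step_size == h` the validation periods tile the end of the series without overlapping.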
Use `cross_validation` on the Peyton Manning dataset:
```python theme={null}
# Perform cross-validation with 5 windows and 7-day forecast horizon
timegpt_cv_df = nixtla_client.cross_validation(
pm_df,
h=7, # Forecast 7 days ahead
n_windows=5, # Test across 5 different time periods
freq='D' # Daily frequency
)
timegpt_cv_df.head()
```
The logs below indicate successful cross-validation calls and data preprocessing.
```bash theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Querying model metadata...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Cross Validation Endpoint...
```
The cross-validation output contains the forecasts (`TimeGPT`) next to the actual values (`y`), along with the `cutoff`, the last timestamp used as history for each window.
| unique\_id | ds | cutoff | y | TimeGPT |
| ---------- | ---------- | ---------- | -------- | -------- |
| 0 | 2015-12-17 | 2015-12-16 | 7.591862 | 7.939553 |
| 0 | 2015-12-18 | 2015-12-16 | 7.528869 | 7.887512 |
| 0 | 2015-12-19 | 2015-12-16 | 7.171657 | 7.766617 |
| 0 | 2015-12-20 | 2015-12-16 | 7.891331 | 7.931502 |
| 0 | 2015-12-21 | 2015-12-16 | 8.360071 | 8.312632 |
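Because each row carries its `cutoff`, accuracy per validation window is a simple group-by. A minimal pandas sketch (the helper name is ours), run on the five rows shown above only. The first window actually has 7 rows, so the number below is not that window's full MAE.

```python
import pandas as pd


def mae_per_window(cv_df: pd.DataFrame, model: str = "TimeGPT") -> pd.DataFrame:
    """MAE of `model` for every (series, cutoff) pair in cross-validation output."""
    abs_error = (cv_df["y"] - cv_df[model]).abs()
    return (
        cv_df.assign(abs_error=abs_error)
        .groupby(["unique_id", "cutoff"], as_index=False)["abs_error"]
        .mean()
        .rename(columns={"abs_error": f"{model}_mae"})
    )


# The five rows shown above
rows = pd.DataFrame({
    "unique_id": [0] * 5,
    "ds": pd.date_range("2015-12-17", periods=5, freq="D"),
    "cutoff": pd.Timestamp("2015-12-16"),
    "y": [7.591862, 7.528869, 7.171657, 7.891331, 8.360071],
    "TimeGPT": [7.939553, 7.887512, 7.766617, 7.931502, 8.312632],
})
print(mae_per_window(rows))  # one row: cutoff 2015-12-16, TimeGPT_mae ~0.2778
```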
### Step 4: Plot Cross-Validation Results
Visualize forecast performance for each cutoff period. Here's an example plotting the last 100 rows of actual data along with cross-validation forecasts for each cutoff.
```python theme={null}
cutoffs = timegpt_cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        pm_df.tail(100),
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
    )
    display(fig)
```





### Step 5: Generate Prediction Intervals for Model Uncertainty
It is also possible to generate prediction intervals during cross-validation. To do so, we simply use the `level` argument.
```python theme={null}
timegpt_cv_df = nixtla_client.cross_validation(
pm_df,
h=7,
n_windows=5,
freq='D',
level=[80, 90],
)
timegpt_cv_df.head()
```
| | unique\_id | ds | cutoff | y | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
| - | ---------- | ---------- | ---------- | -------- | -------- | ------------- | ------------- | ------------- | ------------- |
| 0 | 0 | 2015-12-17 | 2015-12-16 | 7.591862 | 7.939553 | 8.201465 | 8.314956 | 7.677642 | 7.564151 |
| 1 | 0 | 2015-12-18 | 2015-12-16 | 7.528869 | 7.887512 | 8.175414 | 8.207470 | 7.599609 | 7.567553 |
| 2 | 0 | 2015-12-19 | 2015-12-16 | 7.171657 | 7.766617 | 8.267363 | 8.386674 | 7.265871 | 7.146560 |
| 3 | 0 | 2015-12-20 | 2015-12-16 | 7.891331 | 7.931502 | 8.205929 | 8.369983 | 7.657075 | 7.493020 |
| 4 | 0 | 2015-12-21 | 2015-12-16 | 8.360071 | 8.312632 | 9.184893 | 9.625794 | 7.440371 | 6.999469 |
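A quick way to judge prediction intervals from cross-validation output is empirical coverage: the share of actual values that fall inside each interval. Over many windows, a well-calibrated 80% interval should contain roughly 80% of the actuals. The sketch below (helper name ours) uses only the five rows shown above, which is far too few to draw conclusions.

```python
import pandas as pd


def interval_coverage(cv_df: pd.DataFrame, level: int, model: str = "TimeGPT") -> float:
    """Share of actual values inside the model's `level`% prediction interval."""
    lo = cv_df[f"{model}-lo-{level}"]
    hi = cv_df[f"{model}-hi-{level}"]
    inside = (cv_df["y"] >= lo) & (cv_df["y"] <= hi)
    return float(inside.mean())


# The five rows shown above
rows = pd.DataFrame({
    "y": [7.591862, 7.528869, 7.171657, 7.891331, 8.360071],
    "TimeGPT-lo-80": [7.677642, 7.599609, 7.265871, 7.657075, 7.440371],
    "TimeGPT-hi-80": [8.201465, 8.175414, 8.267363, 8.205929, 9.184893],
    "TimeGPT-lo-90": [7.564151, 7.567553, 7.146560, 7.493020, 6.999469],
    "TimeGPT-hi-90": [8.314956, 8.207470, 8.386674, 8.369983, 9.625794],
})
for level in (80, 90):
    print(level, interval_coverage(rows, level))  # 80 -> 0.4, 90 -> 0.8
```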
Plot the prediction intervals for the cross-validation results.
```python theme={null}
cutoffs = timegpt_cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        pm_df.tail(100),
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
        level=[80, 90],
        models=['TimeGPT']
    )
    display(fig)
```





### Step 6: Enhance Forecasts with Exogenous Variables
#### Time Features
It is possible to include exogenous variables when performing cross-validation. Here we use the `date_features` parameter to create month features from the timestamps; the model then uses these features to make predictions during cross-validation. We also pass `level` so that the output includes the prediction intervals shown in the table below.
```python theme={null}
timegpt_cv_df = nixtla_client.cross_validation(
    pm_df,
    h=7,
    n_windows=5,
    freq='D',
    level=[80, 90],
    date_features=['month'],
)
timegpt_cv_df.head()
```
| | unique\_id | ds | cutoff | y | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
| - | ---------- | ---------- | ---------- | -------- | -------- | ------------- | ------------- | ------------- | ------------- |
| 0 | 0 | 2015-12-17 | 2015-12-16 | 7.591862 | 8.426320 | 8.721996 | 8.824101 | 8.130644 | 8.028540 |
| 1 | 0 | 2015-12-18 | 2015-12-16 | 7.528869 | 8.049962 | 8.452083 | 8.658603 | 7.647842 | 7.441321 |
| 2 | 0 | 2015-12-19 | 2015-12-16 | 7.171657 | 7.509098 | 7.984788 | 8.138017 | 7.033409 | 6.880180 |
| 3 | 0 | 2015-12-20 | 2015-12-16 | 7.891331 | 7.739536 | 8.306914 | 8.641355 | 7.172158 | 6.837718 |
| 4 | 0 | 2015-12-21 | 2015-12-16 | 8.360071 | 8.027471 | 8.722828 | 9.152306 | 7.332113 | 6.902636 |
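To see what a month feature carries, you can derive one yourself from `ds` with pandas. This is only an illustration of the idea; we do not claim it reproduces the exact encoding that `date_features` applies internally. The dates are hypothetical.

```python
import pandas as pd

dates = pd.DataFrame({"ds": pd.to_datetime(["2015-12-30", "2015-12-31", "2016-01-01"])})
# Integer month of year (1-12)
dates["month"] = dates["ds"].dt.month
# One-hot version: one indicator column per month present in the data
month_dummies = pd.get_dummies(dates["month"], prefix="month", dtype=int)
print(pd.concat([dates, month_dummies], axis=1))
```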
Plot the cross-validation results with the time features.
```python theme={null}
cutoffs = timegpt_cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        pm_df.tail(100),
        timegpt_cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'y']),
        models=['TimeGPT']
    )
    display(fig)
```





#### Dynamic Features
Additionally, you can pass dynamic exogenous variables to give TimeGPT more information about the data. Simply add the exogenous regressors as columns after the target column.
```python theme={null}
Y_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity.csv')
X_df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/exogenous-vars-electricity.csv')
df = Y_df.merge(X_df)
```
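Before cross-validating, it is worth checking that the merge did not silently drop or duplicate rows and that the target column comes right before the exogenous regressors. A minimal sketch on hypothetical frames, using pandas' `validate='one_to_one'` so the merge raises if a (`unique_id`, `ds`) pair is duplicated:

```python
import pandas as pd

# Hypothetical target and exogenous frames
y_part = pd.DataFrame({
    "unique_id": ["BE", "BE"],
    "ds": pd.to_datetime(["2016-10-22 00:00", "2016-10-22 01:00"]),
    "y": [1.0, 2.0],
})
x_part = pd.DataFrame({
    "unique_id": ["BE", "BE"],
    "ds": pd.to_datetime(["2016-10-22 00:00", "2016-10-22 01:00"]),
    "Exogenous1": [10.0, 20.0],
})

merged = y_part.merge(x_part, on=["unique_id", "ds"], how="inner", validate="one_to_one")
assert len(merged) == len(y_part), "some target rows have no exogenous values"

# Order: identifiers, target, then exogenous regressors
exog_cols = [c for c in merged.columns if c not in ("unique_id", "ds", "y")]
merged = merged[["unique_id", "ds", "y"] + exog_cols]
print(merged)
```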
Now let's cross-validate `TimeGPT` with this information included:
```python theme={null}
timegpt_cv_df_x = nixtla_client.cross_validation(
df.groupby('unique_id').tail(100 * 48),
h=48,
n_windows=2,
level=[80, 90]
)
cutoffs = timegpt_cv_df_x.query('unique_id == "BE"')['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.query('unique_id == "BE"').tail(24 * 7),
        timegpt_cv_df_x.query('cutoff == @cutoff & unique_id == "BE"').drop(columns=['cutoff', 'y']),
        models=['TimeGPT'],
        level=[80, 90],
    )
    display(fig)
```


### Step 7: Long-Horizon Forecasting with TimeGPT
You can also run cross-validation for different versions of `TimeGPT` using the `model` argument. Here we compare the base model with the model for long-horizon forecasting.
```python theme={null}
timegpt_cv_df_x_long_horizon = nixtla_client.cross_validation(
df.groupby('unique_id').tail(100 * 48),
h=48,
n_windows=2,
level=[80, 90],
model='timegpt-1-long-horizon',
)
timegpt_cv_df_x_long_horizon.columns = timegpt_cv_df_x_long_horizon.columns.str.replace('TimeGPT', 'TimeGPT-LongHorizon')
timegpt_cv_df_x_models = timegpt_cv_df_x_long_horizon.merge(timegpt_cv_df_x)
cutoffs = timegpt_cv_df_x_models.query('unique_id == "BE"')['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.query('unique_id == "BE"').tail(24 * 7),
        timegpt_cv_df_x_models.query('cutoff == @cutoff & unique_id == "BE"').drop(columns=['cutoff', 'y']),
        models=['TimeGPT', 'TimeGPT-LongHorizon'],
        level=[80, 90],
    )
    display(fig)
```
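With both models' forecasts in one frame, as in `timegpt_cv_df_x_models`, a per-model MAE turns the visual comparison into a number. A minimal sketch on a small hypothetical frame with the same column names (the helper name is ours):

```python
import pandas as pd


def mae_by_model(cv_df: pd.DataFrame, models, target_col: str = "y") -> pd.Series:
    """Mean absolute error of each model column over all rows (all windows and series)."""
    return pd.Series({m: (cv_df[target_col] - cv_df[m]).abs().mean() for m in models})


toy_cv = pd.DataFrame({
    "y": [10.0, 12.0, 14.0],
    "TimeGPT": [11.0, 12.0, 13.0],              # errors 1, 0, 1
    "TimeGPT-LongHorizon": [10.0, 13.0, 14.0],  # errors 0, 1, 0
})
print(mae_by_model(toy_cv, ["TimeGPT", "TimeGPT-LongHorizon"]))
```

On real cross-validation output you would typically also group by `unique_id` and `cutoff` before averaging, to see whether one model wins consistently or only on some windows.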


## Frequently Asked Questions
**What is time series cross-validation?**
Time series cross-validation is a model validation technique that uses rolling windows to evaluate forecasting accuracy while preserving temporal order, ensuring reliable predictions on unseen data.
**How is time series cross-validation different from k-fold cross-validation?**
Unlike k-fold cross-validation, which randomly shuffles the data, time series cross-validation preserves temporal order, using techniques like walk-forward validation and expanding windows to prevent data leakage.
**What are the key parameters for cross-validation in TimeGPT?**
Key parameters include `h` (forecast horizon), `n_windows` (number of validation windows), `step_size` (window increment), and `level` (prediction interval confidence levels).
**How do you evaluate cross-validation results?**
Evaluate results by comparing forecasted values against actual values across multiple time windows, analyzing prediction intervals, and calculating metrics like MAE, RMSE, and MAPE.
## Conclusion
You've learned how to run time series cross-validation with TimeGPT, including rolling-window validation, prediction intervals, exogenous variables, and long-horizon forecasting. These model validation techniques help ensure your forecasts are accurate and reliable before they go into production.
### Next Steps in Model Validation
* Explore [evaluation metrics](/docs/forecasting/evaluation/evaluation_metrics) to quantify forecast accuracy
* Learn about [fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) for domain-specific data
* Apply cross-validation to [multiple time series](/docs/data_requirements/multiple_series)
Ready to validate your forecasts at scale? [Start your TimeGPT trial](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/forecasting/evaluation/cross_validation) and implement robust cross-validation today.
# Evaluation Metrics
Source: https://nixtla.io/docs/forecasting/evaluation/evaluation_metrics
Learn to select the right evaluation metrics to measure the performance of TimeGPT.
Selecting the right evaluation metric is crucial, as it guides the selection of the best settings for TimeGPT to ensure the model is making accurate forecasts.
## Overview of Common Evaluation Metrics
The following table summarizes the common evaluation metrics used in forecasting depending on the type of forecasts. It also indicates when to use and when to avoid a particular metric.
| Metric | Type of forecast | Properties | When to avoid |
| ------ | ---------------------- | ---------- | ------------- |
| MAE | Point forecast | robust to outliers; easy to interpret; same units as the data | When averaging over series of different scales |
| MSE | Point forecast | penalizes large errors; not in the same units as the data; sensitive to outliers | When there are unrepresentative outliers in the data |
| RMSE | Point forecast | penalizes large errors; same units as the data; sensitive to outliers | When there are unrepresentative outliers in the data |
| MAPE | Point forecast | expressed as a percentage; easy to interpret; favors under-forecasts | When the data has zero values |
| sMAPE | Point forecast | more balanced between over- and under-forecasts than MAPE; expressed as a percentage; easy to interpret | When the data has zero values |
| MASE | Point forecast | like the MAE, but scaled by the naive forecast; inherently compares to a simple benchmark; requires technical knowledge to interpret | When there is only one series to evaluate |
| CRPS | Probabilistic forecast | generalized MAE for probabilistic forecasts; requires technical knowledge to interpret | When evaluating point forecasts |
In the following sections, we dive deeper into each metric. Note that all of these metrics can be used to evaluate the forecasts of TimeGPT using the *utilsforecast* library. For more information, read our tutorial on [evaluating TimeGPT with utilsforecast](/docs/forecasting/evaluation/evaluation_utilsforecast).
## Mean Absolute Error (MAE)
The mean absolute error simply averages the absolute distance between the forecasts and the actual values.
It is a good evaluation metric for the vast majority of forecasting tasks. It is relatively robust to outliers, meaning that it does not magnify large errors, and it is expressed in the same units as the data, making it easy to interpret.
Be careful when averaging the MAE over multiple series of different scales: series with large values will dominate the average, while series with small values will barely affect it.
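The MAE formula and the scale issue fit in a few lines of NumPy. The two series below are hypothetical: both are forecast with the same 10% error at every step, yet the large-scale series dominates the naive average of their MAEs.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error: the average of |y - y_hat|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


small = mae([10, 20, 30], [11, 22, 33])              # errors 1, 2, 3
large = mae([1000, 2000, 3000], [1100, 2200, 3300])  # errors 100, 200, 300
print(small, large, (small + large) / 2)  # 2.0 200.0 101.0
```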
## Mean Squared Error (MSE)
The mean squared error squares the forecast errors before averaging them, which heavily penalizes large errors while giving less weight to small ones.
As such, it is not robust to outliers since a single large error can dramatically inflate the MSE value. Additionally, the units are squared (e.g., dollars²), making it difficult to interpret in practical terms.
Avoid MSE when your data contains outliers or when you need an easily interpretable metric. It's best used in optimization contexts where you specifically want to penalize large errors more severely.
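A small hypothetical example shows how a single large error inflates the MSE far more than the MAE: replacing one error of size 1 with an error of size 9 raises the MAE from 1 to 3, but the MSE from 1 to 21.

```python
import numpy as np


def mse(y_true, y_pred) -> float:
    """Mean squared error: the average of (y - y_hat) ** 2."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(errors ** 2))


y = [10, 10, 10, 10]
regular = [11, 9, 11, 9]    # every error has size 1
one_miss = [11, 9, 11, 19]  # same, except one error of size 9

print(mse(y, regular), mse(y, one_miss))  # 1.0 21.0
```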
## Root Mean Squared Error (RMSE)
The root mean squared error is simply the square root of the MSE, bringing the metric back to the original units of the data while preserving MSE's property of penalizing large errors.
RMSE is more interpretable than MSE since it's expressed in the same units as your data.
You should avoid RMSE when outliers are present or when you want equal treatment of all errors.
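Since RMSE is the square root of the MSE, it is always at least as large as the MAE, with equality only when all errors have the same size. The hypothetical example below keeps the total absolute error fixed and only changes how it is distributed across steps.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    return float(np.mean(np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))))


def rmse(y_true, y_pred) -> float:
    """Root mean squared error: sqrt of the MSE, in the units of the data."""
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2)))


y = [100, 100, 100]
equal_errors = [105, 95, 105]    # three errors of size 5
uneven_errors = [100, 100, 115]  # same total absolute error (15), all in one step

print(mae(y, equal_errors), rmse(y, equal_errors))    # 5.0 5.0
print(mae(y, uneven_errors), rmse(y, uneven_errors))  # 5.0 ~8.66
```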
## Mean Absolute Percentage Error (MAPE)
The mean absolute percentage error expresses forecast errors as percentages of the actual values, making it scale-independent and easy to interpret.
MAPE is excellent for comparing forecast accuracy across different time series with varying scales. It's intuitive and easily understood in business contexts.
Avoid MAPE when your data contains zero or near-zero values (causes division by zero) or when you have intermittent demand patterns.
Note that MAPE is also asymmetric: it penalizes positive errors (over-forecasts) more heavily than negative errors (under-forecasts).
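A minimal NumPy sketch of MAPE, returned as a fraction (multiply by 100 for a percentage). The examples show where the asymmetry comes from: with a non-negative forecast, an under-forecast can cost at most 100%, while an over-forecast is unbounded. The helper raises on zero actuals instead of returning infinity.

```python
import numpy as np


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error, returned as a fraction (0.05 means 5%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs(y_true - y_pred) / np.abs(y_true)))


print(mape([100], [0]))    # 1.0  (worst possible under-forecast)
print(mape([100], [300]))  # 2.0  (over-forecasts have no upper bound)
```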
## Symmetric Mean Absolute Percentage Error (sMAPE)
The symmetric mean absolute percentage error attempts to address MAPE's asymmetry by using the average of actual and forecast values in the denominator, making it more balanced between over- and under-forecasts.
sMAPE is more stable than MAPE and less prone to extreme values. It's still scale-independent and relatively easy to interpret, though not as intuitive as MAPE.
Avoid sMAPE when dealing with zero values or when the sum of actual and forecast values approaches zero. While more symmetric than MAPE, it's still not perfectly symmetric and can behave unexpectedly in edge cases.
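A minimal NumPy sketch of sMAPE with the average of actual and forecast in the denominator, as described above. It also shows the remaining asymmetry: an over-forecast and an under-forecast of the same size score differently. Some implementations divide by the sum instead of the average, which halves the value; to our understanding, utilsforecast's `smape` does this, so check its documentation before comparing numbers.

```python
import numpy as np


def smape(y_true, y_pred) -> float:
    """sMAPE using the average of |y| and |y_hat| as denominator (a fraction from 0 to 2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2
    if np.any(denom == 0):
        raise ValueError("sMAPE is undefined when actual and forecast are both zero")
    return float(np.mean(np.abs(y_true - y_pred) / denom))


print(smape([100], [150]))  # 0.4    (over-forecast by 50)
print(smape([100], [50]))   # ~0.667 (under-forecast by 50)
```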
## Mean Absolute Scaled Error (MASE)
The mean absolute scaled error scales forecast errors relative to the average error of a naive seasonal forecast, providing a scale-independent measure that's robust and interpretable.
MASE is excellent for comparing forecasts across different time series and scales. A MASE value less than 1 indicates your forecast is better than the naive benchmark, while values greater than 1 indicate worse performance.
It's robust to outliers and handles zero values well.
While it is a good metric to compare across multiple series, it might not make sense for you to compare against naive forecasts, and it does require some technical knowledge to interpret correctly.
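A minimal NumPy sketch of MASE, scaling the forecast's MAE by the in-sample MAE of a seasonal naive forecast on the training data. The data are hypothetical, and library implementations may differ in details. This is the role `seasonality` plays in `partial(mase, seasonality=12)` in the evaluation pipeline tutorial.

```python
import numpy as np


def mase(y_true, y_pred, y_train, seasonality: int = 1) -> float:
    """MAE of the forecast divided by the in-sample MAE of the seasonal naive forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Seasonal naive: predict y[t] with y[t - seasonality] on the training data
    naive_mae = np.mean(np.abs(y_train[seasonality:] - y_train[:-seasonality]))
    if naive_mae == 0:
        raise ValueError("the seasonal naive forecast is perfect on the training data")
    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)


y_train = [10, 12, 14, 16, 18]  # one-step naive errors are all 2
print(mase([20, 22], [21, 21], y_train, seasonality=1))  # MAE 1 / 2 = 0.5
```

A value of 0.5 means the forecast's MAE is half that of the naive benchmark.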
## Continuous Ranked Probability Score (CRPS)
The continuous ranked probability score measures the distance between the entire forecast distribution and the observed value, making it ideal for evaluating probabilistic forecasts.
CRPS is a proper scoring rule that reduces to MAE when dealing with deterministic forecasts, making it a natural extension for probabilistic forecasting. It's expressed in the same units as the original data and provides a comprehensive evaluation of forecast distributions, rewarding both accuracy and good uncertainty quantification.
CRPS is specifically designed for probabilistic forecasts, so avoid it when you only have point forecasts. It's also more computationally intensive to calculate than simpler metrics and may be less intuitive for stakeholders unfamiliar with probabilistic forecasting concepts.
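CRPS has a convenient sample-based form: CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, where X and X' are drawn independently from the forecast distribution F. The NumPy sketch below applies it to the empirical distribution of hypothetical forecast samples. It is one standard estimator, not necessarily how utilsforecast's `scaled_crps` works; to our understanding that function is computed from quantiles derived from prediction-interval levels and is scaled.

```python
import numpy as np


def crps_from_samples(samples, y: float) -> float:
    """CRPS of the empirical distribution of `samples` against one observation."""
    x = np.asarray(samples, dtype=float)
    spread_to_obs = np.mean(np.abs(x - y))                          # E|X - y|
    spread_within = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))  # 0.5 * E|X - X'|
    return float(spread_to_obs - spread_within)


# A single sample is a point forecast: CRPS equals the absolute error
print(crps_from_samples([3.0], 5.0))       # 2.0
# A two-sample forecast that brackets the observation
print(crps_from_samples([4.0, 6.0], 5.0))  # 0.5
```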
## Evaluating TimeGPT
To learn how to use any of the metrics outlined above to evaluate the forecasts of TimeGPT, read our tutorial on [evaluating TimeGPT with utilsforecast](/docs/forecasting/evaluation/evaluation_utilsforecast).
# Evaluation Pipeline
Source: https://nixtla.io/docs/forecasting/evaluation/evaluation_utilsforecast
Learn how to evaluate TimeGPT model performance using tools in utilsforecast.
## Overview
After generating forecasts with TimeGPT, the next step is to evaluate how accurate those forecasts are. The `evaluate` function from the utilsforecast library provides a fast and flexible way to assess model performance using a wide range of metrics. This pipeline works seamlessly with TimeGPT and other forecasting models.
With the evaluation pipeline, you can easily select models and define metrics like MAE, MSE, or MAPE to benchmark forecasting performance.
## Step-by-Step Guide
### Step 1. Import Required Packages
Start by importing the necessary libraries and initializing the `NixtlaClient` with your API key.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
from functools import partial
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, mse, rmse, mape, smape, mase, scaled_crps
nixtla_client = NixtlaClient(api_key='your_api_key_here')
```
### Step 2. Load and Prepare the Dataset
For this example, we use the Air Passengers dataset, which records monthly totals of international airline passengers. We load the dataset, convert the timestamps to datetime, and split the data into a training set and a test set, using the last 12 months for testing.
```python theme={null}
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df['unique_id'] = 'passengers'
df['timestamp'] = pd.to_datetime(df['timestamp'])
```
```python theme={null}
df_train = df.iloc[:-12]
df_test = df.iloc[-12:]
```
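`iloc[:-12]` works here because the dataset holds a single series. With several series, each one needs its own split. A minimal sketch with `groupby(...).tail` on a hypothetical two-series frame (the helper name is ours; for this dataset you would pass `time_col='timestamp'`):

```python
import pandas as pd


def train_test_split_last_n(df, n, id_col="unique_id", time_col="ds"):
    """Hold out the last `n` rows of every series as the test set."""
    df = df.sort_values([id_col, time_col])
    test = df.groupby(id_col).tail(n)
    train = df.drop(test.index)
    return train, test


toy = pd.DataFrame({
    "unique_id": ["a"] * 5 + ["b"] * 4,
    "ds": list(pd.date_range("2024-01-01", periods=5, freq="MS"))
    + list(pd.date_range("2024-01-01", periods=4, freq="MS")),
    "y": range(9),
})
train, test = train_test_split_last_n(toy, n=2)
print(len(train), len(test))  # 5 4
```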
### Step 3. Generate Forecast with TimeGPT
Next, we will:
* Use the training set to generate a 12-step forecast with TimeGPT.
* Merge the forecast with the test set for evaluation.
```python theme={null}
fcst_timegpt = nixtla_client.forecast(
    df=df_train,
    h=12,
    time_col='timestamp',
    target_col='value',
    level=[80, 95],
)
fcst_timegpt = fcst_timegpt.merge(df_test, on=['timestamp', 'unique_id'])
```
### Step 4. Define Models and Metrics for Evaluation
Next, we define the models to evaluate and the metrics to use. For more information about supported metrics, refer to the [evaluation metrics tutorial](/docs/forecasting/evaluation/evaluation_metrics).
```python theme={null}
models = ['TimeGPT']
metrics = [
mae,
mse,
rmse,
mape,
smape,
partial(mase, seasonality=12),
scaled_crps
]
```
### Step 5. Run the Evaluation
Finally, call the evaluate function with your merged forecast results. Include `train_df` for metrics that need the training data and `level` if using probabilistic metrics.
```python theme={null}
evaluation = evaluate(
fcst_timegpt,
target_col = 'value',
time_col = 'timestamp',
metrics=metrics,
models=models,
train_df=df_train,
level=[80, 95]
)
```
| unique\_id | metric | TimeGPT |
| ---------- | ------------ | -------- |
| passengers | mae | 12.67930 |
| passengers | mse | 213.9358 |
| passengers | rmse | 14.62654 |
| passengers | mape | 0.026964 |
| passengers | smape | 0.013527 |
| passengers | mase | 0.416397 |
| passengers | scaled\_crps | 0.008991 |
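The long output of `evaluate` (one row per metric) is often easier to read pivoted to one row per series. The sketch below does that with the values reported above and adds two sanity checks that hold for any set of point forecasts: RMSE equals the square root of MSE, and RMSE is never smaller than MAE.

```python
import math

import pandas as pd

# Values copied from the evaluation table above (long format: one row per metric)
reported = pd.DataFrame({
    "unique_id": ["passengers"] * 7,
    "metric": ["mae", "mse", "rmse", "mape", "smape", "mase", "scaled_crps"],
    "TimeGPT": [12.67930, 213.9358, 14.62654, 0.026964, 0.013527, 0.416397, 0.008991],
})

# Wide format: one row per series, one column per metric
wide = reported.pivot(index="unique_id", columns="metric", values="TimeGPT")
row = wide.loc["passengers"]

assert math.isclose(row["rmse"], math.sqrt(row["mse"]), rel_tol=1e-4)  # RMSE = sqrt(MSE)
assert row["rmse"] >= row["mae"]                                          # RMSE >= MAE
print(wide.round(4))
```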
# Categorical Variables
Source: https://nixtla.io/docs/forecasting/exogenous-variables/categorical_features
Learn how to incorporate external categorical variables in your TimeGPT forecasts to improve accuracy.
## What Are Categorical Variables?
Categorical variables are external factors that take on a limited range of discrete values, grouping observations by categories. For example, "Sporting" or "Cultural" events in a dataset describing product demand.
By capturing distinct external conditions, categorical variables can improve the predictive power of your model and reduce forecasting error. They are easy to incorporate: merge each time series data point with its corresponding categorical data.
This tutorial demonstrates how to incorporate categorical (discrete) variables into TimeGPT forecasts.
## How to Use Categorical Variables in TimeGPT
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/03_categorical_variables.ipynb)
### Step 1: Import Packages and Initialize the Nixtla Client
Make sure you have the necessary libraries installed: pandas, nixtla, datasetsforecast, and utilsforecast.
```python theme={null}
import pandas as pd
import os
from nixtla import NixtlaClient
from datasetsforecast.m5 import M5
from utilsforecast.losses import smape
# Initialize the Nixtla Client
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load M5 Data
We use the **M5 dataset**, a collection of daily product sales across 10 US stores, to show how categorical variables can improve forecasts.
Start by loading the M5 dataset and converting the date columns to datetime objects.
```python theme={null}
Y_df, X_df, _ = M5.load(directory=os.getcwd())
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
X_df['ds'] = pd.to_datetime(X_df['ds'])
Y_df.head(10)
```
| unique\_id | ds | y |
| -------------------- | ---------- | --- |
| FOODS\_1\_001\_CA\_1 | 2011-01-29 | 3.0 |
| FOODS\_1\_001\_CA\_1 | 2011-01-30 | 0.0 |
| FOODS\_1\_001\_CA\_1 | 2011-01-31 | 0.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-01 | 1.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-02 | 4.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-03 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-04 | 0.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-05 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-06 | 0.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-07 | 0.0 |
Keep only the identifier, timestamp, and categorical `event_type_1` columns from `X_df`.
```python theme={null}
X_df = X_df[['unique_id', 'ds', 'event_type_1']]
X_df.head(10)
```
| unique\_id | ds | event\_type\_1 |
| -------------------- | ---------- | -------------- |
| FOODS\_1\_001\_CA\_1 | 2011-01-29 | nan |
| FOODS\_1\_001\_CA\_1 | 2011-01-30 | nan |
| FOODS\_1\_001\_CA\_1 | 2011-01-31 | nan |
| FOODS\_1\_001\_CA\_1 | 2011-02-01 | nan |
| FOODS\_1\_001\_CA\_1 | 2011-02-02 | nan |
| FOODS\_1\_001\_CA\_1 | 2011-02-03 | nan |
| FOODS\_1\_001\_CA\_1 | 2011-02-04 | nan |
| FOODS\_1\_001\_CA\_1 | 2011-02-05 | nan |
| FOODS\_1\_001\_CA\_1 | 2011-02-06 | Sporting |
| FOODS\_1\_001\_CA\_1 | 2011-02-07 | nan |
Notice that there is a Sporting event on February 6, 2011, listed under `event_type_1`.
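In Step 4, TimeGPT receives `event_type_1` as a categorical variable and handles the encoding itself. For comparison only, a common manual alternative is to one-hot encode such a column with pandas. The minimal sketch below uses a small made-up frame (not the M5 data); note how days without an event (`NaN`) become rows of zeros. It is not a description of how TimeGPT encodes categorical inputs internally.

```python theme={null}
import numpy as np
import pandas as pd

# Toy frame mimicking the unique_id / ds / event_type_1 layout above (values are made up)
events = pd.DataFrame({
    "unique_id": "FOODS_1_001_CA_1",
    "ds": pd.date_range("2011-02-04", periods=4, freq="D"),
    "event_type_1": [np.nan, np.nan, "Sporting", "Cultural"],
})

# One 0/1 column per observed category; NaN rows get 0 in every column
dummies = pd.get_dummies(events["event_type_1"], prefix="event_type_1", dtype=int)
encoded = pd.concat([events[["unique_id", "ds"]], dummies], axis=1)
print(encoded)
```

Whether such a manual encoding gives the same results as `categorical_exog_list` is not established here.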
### Step 3: Prepare Data for Forecasting
We'll select a specific product to demonstrate how to incorporate categorical features into TimeGPT forecasts.
#### Select a High-Selling Product and Merge Data
Start by selecting a high-selling product and merging the data.
```python theme={null}
product = 'FOODS_3_090_CA_3'
Y_df_product = Y_df.query('unique_id == @product')
X_df_product = X_df.query('unique_id == @product')
df = Y_df_product.merge(X_df_product)
df.head(10)
```
| unique\_id | ds | y | event\_type\_1 |
| -------------------- | ---------- | ----- | -------------- |
| FOODS\_3\_090\_CA\_3 | 2011-01-29 | 108.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2011-01-30 | 132.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2011-01-31 | 102.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2011-02-01 | 120.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2011-02-02 | 106.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2011-02-03 | 123.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2011-02-04 | 279.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2011-02-05 | 175.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2011-02-06 | 186.0 | Sporting |
| FOODS\_3\_090\_CA\_3 | 2011-02-07 | 120.0 | nan |
#### Prepare Future External Variables
Select future external variables for Feb 1-7, 2016.
```python theme={null}
future_ex_vars_df = df.drop(columns=['y']).query("ds >= '2016-02-01' & ds <= '2016-02-07'")
```
Separate training data before Feb 1, 2016.
```python theme={null}
df_train = df.query("ds < '2016-02-01'")
df_train.tail(10)
```
| unique\_id | ds | y | event\_type\_1 |
| -------------------- | ---------- | ----- | -------------- |
| FOODS\_3\_090\_CA\_3 | 2016-01-22 | 94.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2016-01-23 | 144.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2016-01-24 | 146.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2016-01-25 | 87.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2016-01-26 | 73.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2016-01-27 | 62.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2016-01-28 | 64.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2016-01-29 | 102.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2016-01-30 | 113.0 | nan |
| FOODS\_3\_090\_CA\_3 | 2016-01-31 | 98.0 | nan |
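Because the forecast horizon is 7 days, the future exogenous frame should contain exactly the 7 daily timestamps that follow the last training date of each series. The helper `check_future_exog` below is a hypothetical sanity check (not part of the `nixtla` package), shown on a small synthetic frame that reproduces the same Feb 1-7, 2016 cutoff.

```python theme={null}
import pandas as pd

def check_future_exog(train_df, future_df, h, freq="D", id_col="unique_id", time_col="ds"):
    """Hypothetical helper: True if, for every series in train_df, future_df holds
    exactly the h timestamps that follow that series' last training timestamp."""
    last_ts = train_df.groupby(id_col)[time_col].max()
    for uid, last in last_ts.items():
        # The h timestamps right after the last training timestamp
        expected = pd.date_range(last, periods=h + 1, freq=freq)[1:]
        got = pd.DatetimeIndex(
            future_df.loc[future_df[id_col] == uid, time_col].sort_values()
        )
        if not got.equals(expected):
            return False
    return True

# Synthetic daily series reproducing the split used above
ds = pd.date_range("2016-01-25", "2016-02-07", freq="D")
toy = pd.DataFrame({"unique_id": "A", "ds": ds, "y": range(len(ds))})
train = toy.query("ds < '2016-02-01'")
future = toy.drop(columns=["y"]).query("ds >= '2016-02-01' & ds <= '2016-02-07'")
print(check_future_exog(train, future, h=7))  # True
```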
### Step 4: Forecast Product Demand
To evaluate the impact of categorical variables, we'll forecast product demand with and without them.
#### Forecast Without Categorical Variables
```python theme={null}
timegpt_fcst_without_cat_vars_df = nixtla_client.forecast(
df=df_train,
h=7,
level=[80, 90]
)
timegpt_fcst_without_cat_vars_df.head()
```
| unique\_id | ds | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
| -------------------- | ---------- | --------- | ------------- | ------------- | ------------- | ------------- |
| FOODS\_3\_090\_CA\_3 | 2016-02-01 | 73.304090 | 95.887380 | 98.250880 | 50.720802 | 48.357307 |
| FOODS\_3\_090\_CA\_3 | 2016-02-02 | 66.335520 | 75.429660 | 76.663704 | 57.241375 | 56.007330 |
| FOODS\_3\_090\_CA\_3 | 2016-02-03 | 65.881630 | 86.636480 | 87.502810 | 45.126778 | 44.260456 |
| FOODS\_3\_090\_CA\_3 | 2016-02-04 | 72.371864 | 92.362690 | 96.378610 | 52.381035 | 48.365116 |
| FOODS\_3\_090\_CA\_3 | 2016-02-05 | 95.141045 | 111.439224 | 114.115490 | 78.842865 | 76.166595 |
Visualize the forecast without categorical variables.
```python theme={null}
nixtla_client.plot(
df[['unique_id', 'ds', 'y']].query("ds <= '2016-02-07'"),
timegpt_fcst_without_cat_vars_df,
max_insample_length=28,
)
```
TimeGPT already provides a reasonable forecast, but it appears to underforecast the peak on February 6, 2016, the day before the Super Bowl.
#### Forecast With Categorical Variables
To forecast with categorical variables, pass the future values of the exogenous columns through `X_df` and list the names of the columns containing categorical features in the `categorical_exog_list` argument.
```python theme={null}
timegpt_fcst_with_cat_vars_df = nixtla_client.forecast(
df=df_train,
X_df=future_ex_vars_df,
h=7,
level=[80, 90],
categorical_exog_list=["event_type_1"]
)
timegpt_fcst_with_cat_vars_df.head()
```
| unique\_id | ds | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
| -------------------- | ---------- | --------- | ------------- | ------------- | ------------- | ------------- |
| FOODS\_3\_090\_CA\_3 | 2016-02-01 | 73.839455 | 100.905910 | 104.44151 | 46.773006 | 43.237396 |
| FOODS\_3\_090\_CA\_3 | 2016-02-02 | 66.548750 | 75.294970 | 76.62822 | 57.802540 | 56.469284 |
| FOODS\_3\_090\_CA\_3 | 2016-02-03 | 66.694435 | 87.777954 | 88.63922 | 45.610912 | 44.749650 |
| FOODS\_3\_090\_CA\_3 | 2016-02-04 | 74.249530 | 94.813286 | 98.88473 | 53.685770 | 49.614326 |
| FOODS\_3\_090\_CA\_3 | 2016-02-05 | 96.052414 | 112.402090 | 115.22341 | 79.702736 | 76.881420 |
Visualize the forecast with categorical variables.
```python theme={null}
# Visualize the forecast with categorical variables
nixtla_client.plot(
df[['unique_id', 'ds', 'y']].query("ds <= '2016-02-07'"),
timegpt_fcst_with_cat_vars_df,
max_insample_length=28,
)
```
### Step 5: Evaluate Forecast Accuracy
Finally, we calculate the **Symmetric Mean Absolute Percentage Error (sMAPE)** for the forecasts with and without categorical variables.
```python theme={null}
# Create target dataframe
df_target = df[['unique_id', 'ds', 'y']].query("ds >= '2016-02-01' & ds <= '2016-02-07'")
# Rename forecast columns
timegpt_fcst_without_cat_vars_df = timegpt_fcst_without_cat_vars_df.rename(columns={'TimeGPT': 'TimeGPT-without-cat-vars'})
timegpt_fcst_with_cat_vars_df = timegpt_fcst_with_cat_vars_df.rename(columns={'TimeGPT': 'TimeGPT-with-cat-vars'})
# Merge forecasts with target dataframe
df_target = df_target.merge(timegpt_fcst_without_cat_vars_df[['unique_id', 'ds', 'TimeGPT-without-cat-vars']])
df_target = df_target.merge(timegpt_fcst_with_cat_vars_df[['unique_id', 'ds', 'TimeGPT-with-cat-vars']])
# Compute errors
smape_errors = smape(df_target, ['TimeGPT-without-cat-vars', 'TimeGPT-with-cat-vars'])
smape_errors
```
| unique\_id | TimeGPT-without-cat-vars | TimeGPT-with-cat-vars |
| -------------------- | ------------------------ | --------------------- |
| FOODS\_3\_090\_CA\_3 | 0.109241 | 0.108666 |
Including the categorical variable yields a slightly lower sMAPE (0.1087 vs. 0.1092), so in this example it modestly improves forecast accuracy.
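For reference, here is a minimal NumPy sketch of sMAPE, assuming the convention used by `utilsforecast`: the absolute error is divided by the sum of the absolute actual and forecast values, which bounds the metric between 0 and 1 (consistent with the values of about 0.11 above). The function name `smape_np` is hypothetical, chosen so it does not shadow the imported `smape`, and skipping points where both values are zero is a choice made for this sketch.

```python theme={null}
import numpy as np

def smape_np(y, y_hat):
    """sMAPE in [0, 1]: mean of |y - y_hat| / (|y| + |y_hat|).

    Points where both the actual and the forecast are 0 are skipped,
    since the ratio is undefined there (a choice made for this sketch).
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    denom = np.abs(y) + np.abs(y_hat)
    mask = denom > 0
    return float(np.mean(np.abs(y - y_hat)[mask] / denom[mask]))

print(round(smape_np([100, 50], [90, 60]), 4))  # 0.0718
```

Applied to the table above, the drop from 0.109241 to 0.108666 is roughly a 0.5% relative reduction.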
## Conclusion
Categorical variables can be valuable additions to TimeGPT forecasts, helping capture external factors such as events. By passing them to the `categorical_exog_list` parameter, with their future values in `X_df`, you can improve predictive performance.
Continue exploring more advanced techniques or different datasets to further improve your TimeGPT forecasting models.
# Date/Time Features
Source: https://nixtla.io/docs/forecasting/exogenous-variables/date_features
Learn how to incorporate date/time features into your forecasts to improve performance.
## Why incorporate Date/Time Features in your Forecasts
Many time series display patterns that repeat based on the calendar, such as demand
increasing on weekends, sales peaking at the end of the month, or traffic
varying by hour of the day. Recognizing and capturing these time-based patterns
can be a powerful way to improve forecasting accuracy.
While you can forecast a time series based solely on its historical values,
adding date/time features, such as the day of the
week, month, quarter, or hour, can often enhance the model's performance. These
features can be especially useful when your dataset lacks exogenous variables,
but they can also complement external regressors when available.
In this tutorial, we'll walk through how to incorporate these date/time features
into TimeGPT to boost the accuracy of your forecasts.
## How to incorporate Date/Time Features in your Forecasts
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/date_features.ipynb)
### Step 1: Import Packages
Import the necessary libraries and initialize the Nixtla client.
```python theme={null}
import numpy as np
import pandas as pd
from nixtla import NixtlaClient
# For forecast evaluation
from utilsforecast.evaluation import evaluate
from utilsforecast.losses import mae, rmse
```
You can instantiate the `NixtlaClient` class by providing your authentication API key.
```python theme={null}
nixtla_client = NixtlaClient(
# defaults to os.environ.get("NIXTLA_API_KEY")
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load Data
In this tutorial, we use hourly electricity prices as our example dataset, which
consists of 5 time series, each with approximately 1700 data points. For
demonstration purposes, we focus on the German electricity price series. The
time series is split, with the last 240 steps (10 days) set aside as the test set.
For simplicity, this tutorial does not use any additional exogenous variables,
but the same technique extends to datasets that have them.
```python theme={null}
df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv'
)
df['ds'] = pd.to_datetime(df['ds'])
df_sub = df.query('unique_id == "DE"')[['unique_id','ds','y']]
```
```python theme={null}
df_train = df_sub.query('ds < "2017-12-21"')
df_test = df_sub.query('ds >= "2017-12-21"')
df_train.shape, df_test.shape
```
```bash theme={null}
((1440, 3), (240, 3))
```
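The date cutoff above works well for a single series. With several series of different lengths, holding out the last `h` observations of each series is a common alternative. The function `split_last_h` below is a hypothetical helper built on pandas `groupby(...).tail(h)`; the synthetic series is chosen so that its 240-step test set starts at the same 2017-12-21 cutoff used above.

```python theme={null}
import numpy as np
import pandas as pd

def split_last_h(df, h, id_col="unique_id", time_col="ds"):
    """Hypothetical helper: hold out the last h observations of every series."""
    df = df.sort_values([id_col, time_col])
    test = df.groupby(id_col).tail(h)
    train = df.drop(index=test.index)
    return train, test

# Synthetic hourly series with 1,680 points (1,440 train + 240 test)
ds = pd.date_range("2017-10-22", periods=1680, freq="h")
toy = pd.DataFrame({"unique_id": "DE", "ds": ds, "y": np.arange(1680.0)})
train, test = split_last_h(toy, h=24 * 10)
print(train.shape, test.shape)  # (1440, 3) (240, 3)
```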
```python theme={null}
nixtla_client.plot(df_train, df_test.rename(columns={'y': 'test'}))
```
### Step 3: Forecasting
#### Without Datetime Features
First, we forecast the univariate time series without the use of datetime
features.
```python theme={null}
fcst_timegpt_no_dt = nixtla_client.forecast(
df = df_train,
h=24*10,
model="timegpt-1-long-horizon"
)
```
We will rename the forecast column for this approach, so that we can distinguish
it from forecasts created using other methods later.
```python theme={null}
fcst_timegpt_no_dt.rename(columns={"TimeGPT": "TimeGPT_no_dt"}, inplace=True)
```
#### With Inbuilt Datetime Features
Next, let's forecast the same univariate time series with datetime features.
This can be done by specifying the `date_features` argument. The
data is hourly, so both the hour of the day (`hour`) and the day of
the week (`dayofweek`) may affect the price.
For example, electricity demand, and with it the price, may peak in the
afternoon and drop off at night. It can also differ between weekdays and
weekends due to working and holiday patterns. Including these features can
help the model make better forecasts.
> NOTE:
>
> 1. To show how these features are created, we add the
> `feature_contributions` argument. This is for demonstration purposes only in this
> tutorial and is not required to forecast with datetime features.
> 2. If you have a weekly frequency dataset, you can use
> `date_features = ["week", "month", "year"]` or a subset of these features.
> 3. If you have a monthly frequency dataset, you can use
> `date_features = ["month", "year"]` or a subset of these features.
```python theme={null}
fcst_timegpt_dt_no_ohe = nixtla_client.forecast(
df = df_train,
h=24*10,
model="timegpt-1-long-horizon",
date_features=['hour', 'dayofweek'],
feature_contributions=True
)
```
```python theme={null}
shap_df = nixtla_client.feature_contributions
shap_df.head()
```
| | unique\_id | ds | TimeGPT | hour | dayofweek | base\_value |
| -: | ---------: | ------------------: | --------: | ---------: | --------: | ----------: |
| 0 | DE | 2017-12-21 00:00:00 | 34.945976 | -12.797431 | 4.236599 | 43.506810 |
| 1 | DE | 2017-12-21 01:00:00 | 33.700954 | -14.274811 | 4.168986 | 43.806778 |
| 2 | DE | 2017-12-21 02:00:00 | 32.120293 | -15.785894 | 4.123096 | 43.783092 |
| 3 | DE | 2017-12-21 03:00:00 | 32.544914 | -15.623017 | 4.542475 | 43.625454 |
| 4 | DE | 2017-12-21 04:00:00 | 33.698105 | -14.559433 | 4.525819 | 43.731720 |
As we can see, two new exogenous features (`hour` and `dayofweek`) were added to
the dataset and used by the forecast.
However, we need to ensure that the model treats each hour (0, 1, 2, ..., 23)
and each day (0, 1, 2, ..., 6) as a categorical variable and not as a numerical
variable. If treated numerically, the model may exaggerate differences (e.g.,
hour 23 might appear 23 times more influential than hour 1), which doesn't
reflect real patterns. Hours 23 and 0 are adjacent in time and often behave
similarly, yet numerically they are 23 units apart; likewise, day 6 is
immediately followed by day 0.
To avoid this distortion, we one-hot encode these variables using the
`date_features_to_one_hot` argument. This creates a separate exogenous feature
for each hour and each day, allowing the model to capture their effects
independently.
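To see what this change means for the inputs, the sketch below builds integer `hour` and `dayofweek` columns with pandas for two weeks of hourly timestamps and then one-hot encodes them with `pd.get_dummies`. It assumes TimeGPT's one-hot layout resembles this (the `hour_0`, ..., `dayofweek_6` column names in the contributions table suggest so), but it runs entirely locally and does not call the API.

```python theme={null}
import pandas as pd

# Two full weeks of hourly timestamps, starting on a Monday (336 hours)
dates = pd.date_range("2017-12-04", periods=24 * 14, freq="h")
raw = pd.DataFrame(
    {"hour": dates.hour.to_numpy(), "dayofweek": dates.dayofweek.to_numpy()},
    index=dates,
)

# Integer encoding: hours 23 and 0 are adjacent in time but 23 units apart
gap = raw["hour"].max() - raw["hour"].min()
print(gap)  # 23

# One-hot encoding: one 0/1 column per hour and per day of week (24 + 7)
one_hot = pd.get_dummies(raw, columns=["hour", "dayofweek"], dtype=int)
print(one_hot.shape)  # (336, 31)
```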
```python theme={null}
fcst_timegpt_dt = nixtla_client.forecast(
df = df_train,
h=24*10,
model="timegpt-1-long-horizon",
date_features=['hour', 'dayofweek'],
date_features_to_one_hot=['hour', 'dayofweek'],
feature_contributions=True
)
```
```python theme={null}
shap_df = nixtla_client.feature_contributions
shap_df.head()
```
| | unique\_id | ds | TimeGPT | hour\_0 | hour\_1 | hour\_2 | hour\_3 | hour\_4 | hour\_5 | hour\_6 | ... | hour\_22 | hour\_23 | dayofweek\_0 | dayofweek\_1 | dayofweek\_2 | dayofweek\_3 | dayofweek\_4 | dayofweek\_5 | dayofweek\_6 | base\_value |
| -: | ---------: | ------------------: | --------: | ---------: | ---------: | ---------: | ---------: | ---------: | ---------: | ---------: | --: | ---------: | ---------: | -----------: | -----------: | -----------: | -----------: | -----------: | -----------: | -----------: | ----------: |
| 0 | DE | 2017-12-21 00:00:00 | 35.248108 | -13.396377 | 0.387143 | 0.423001 | 0.392672 | 0.373034 | 0.333778 | 0.147671 | ... | 0.271507 | 0.393282 | 0.472389 | -0.377321 | -0.548429 | -0.101086 | -0.133001 | 1.455560 | 2.975230 | 44.333805 |
| 1 | DE | 2017-12-21 01:00:00 | 34.400800 | 0.358443 | -14.488875 | 0.389985 | 0.359990 | 0.341219 | 0.320964 | 0.135058 | ... | 0.266497 | 0.391259 | 0.445456 | -0.306117 | -0.436959 | -0.172850 | -0.151865 | 1.533456 | 3.022358 | 44.539093 |
| 2 | DE | 2017-12-21 02:00:00 | 33.175526 | 0.375983 | 0.372809 | -15.824338 | 0.348533 | 0.351379 | 0.317832 | 0.123833 | ... | 0.273698 | 0.410714 | 0.417348 | -0.279551 | -0.342991 | -0.171547 | -0.142890 | 1.532721 | 3.042772 | 44.515614 |
| 3 | DE | 2017-12-21 03:00:00 | 33.205390 | 0.368333 | 0.366936 | 0.372584 | -15.880591 | 0.346306 | 0.319877 | 0.136488 | ... | 0.276705 | 0.416273 | 0.508190 | -0.274014 | -0.339005 | -0.176228 | -0.152890 | 1.588364 | 3.095226 | 44.391410 |
| 4 | DE | 2017-12-21 04:00:00 | 34.689583 | 0.363581 | 0.363459 | 0.393807 | 0.362043 | -14.755774 | 0.314718 | 0.141911 | ... | 0.274819 | 0.402653 | 0.531417 | -0.277548 | -0.360688 | -0.159342 | -0.169762 | 1.692538 | 3.165733 | 44.505848 |
As we can see above, this now creates a separate feature for each hour of the
day and each day of the week.
> NOTE: With one-hot encoding, the number of features can increase a lot.
> This is especially true if you have weekly frequency data and you are using
> `date_features=["week"]`, because this creates 52 or 53 features after
> one-hot encoding. Make sure your dataset has enough data points; otherwise
> the model may overfit. You can increase the number of data points by
> extending the available history for your time series, or by increasing the
> number of unique time series that share a common pattern in your dataset.
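To make the cardinality concern concrete, the sketch below one-hot encodes the ISO week number for two years of weekly data with pandas. It produces 52 columns for only 104 rows, so each week column is active in just two observations. This is a local illustration and does not reproduce TimeGPT's internal feature construction.

```python theme={null}
import pandas as pd

# Two years of weekly (Monday) timestamps: 104 observations
dates = pd.date_range("2022-01-03", periods=104, freq="W-MON")
weeks = dates.isocalendar().week.astype(int)

# One-hot encoding the ISO week number creates one column per distinct week
one_hot_weeks = pd.get_dummies(weeks, prefix="week", dtype=int)
print(one_hot_weeks.shape)  # (104, 52)
```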
```python theme={null}
fcst_timegpt_dt.rename(columns={"TimeGPT": "fcst_timegpt_dt"}, inplace=True)
```
#### With Custom Datetime Features
In the example above, we saw how to incorporate the built-in datetime features
into the forecast. However, as noted above, one-hot encoding the datetime
features is not always feasible, since it may create too many features for the
dataset size. In that case, we can create a custom datetime feature and use it
in the forecast.
In this example, we will create a sine/cosine encoder for the week. Sine/cosine
(cyclical) encoding is a popular technique for datetime features because of
their circular nature described above (e.g., hour 23 behaves much like hour 0,
and week 52 behaves much like week 1).
```python theme={null}
class SinCosWeekOfYear:
    """
    Adds sine and cosine features for each week of the year. This is useful for
    models that can benefit from understanding the periodicity of weeks in a year.
    """

    def __call__(self, dates: pd.DatetimeIndex):
        df = pd.DataFrame(index=dates)
        # Get week of year (1 to 53)
        weeks = np.array([date.isocalendar().week for date in dates])
        # Calculate sine and cosine features
        df["week_sin"] = np.sin((2 * np.pi) * (weeks - 1) / 53).round(4)
        df["week_cos"] = np.cos((2 * np.pi) * (weeks - 1) / 53).round(4)
        return df

    def __name__(self):
        return "SinCosWeekOfYear"


# Example usage
dates = pd.date_range(start='2023-01-01', periods=55, freq='W-MON')
sin_cos_week = SinCosWeekOfYear()
features = sin_cos_week(dates)
features.tail()
```
| | week\_sin | week\_cos |
| ---------: | --------: | --------: |
| 2023-12-18 | -0.3482 | 0.9374 |
| 2023-12-25 | -0.2349 | 0.9720 |
| 2024-01-01 | 0.0000 | 1.0000 |
| 2024-01-08 | 0.1183 | 0.9930 |
| 2024-01-15 | 0.2349 | 0.9720 |
As we can see above, because of the cyclical encoding, the encoded values
(`week_sin` and `week_cos`) for 2023-12-25 (week 52) are very close to those for
2024-01-01 (week 1). This lets the model treat week 52 and week 1 as similar.
It also reduces the feature cardinality from 53 (with one-hot encoding) to only
2 features.
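The closeness of week 52 and week 1 can be checked numerically. The sketch below reimplements the same mapping as `SinCosWeekOfYear` (without rounding) in hypothetical helpers `encode_week` and `dist`, and compares Euclidean distances on the unit circle. Weeks 52 and 1 end up close (a chord of 2·sin(2π/53), about 0.24), while weeks 26 and 1 sit on nearly opposite sides, even though the raw difference |26 − 1| is smaller than |52 − 1|.

```python theme={null}
import numpy as np

def encode_week(week, period=53):
    """Map an ISO week number (1..53) onto the unit circle, like SinCosWeekOfYear."""
    angle = 2 * np.pi * (week - 1) / period
    return np.array([np.sin(angle), np.cos(angle)])

def dist(a, b):
    """Euclidean distance between two encoded weeks."""
    return float(np.linalg.norm(encode_week(a) - encode_week(b)))

# Raw week numbers: week 52 looks farther from week 1 than week 26 does
print(abs(52 - 1), abs(26 - 1))  # 51 25
# Sine/cosine encoding reverses that: week 52 is now much closer to week 1
print(round(dist(52, 1), 3), round(dist(26, 1), 3))
```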
In our example, the hour feature has a relatively high cardinality after one-hot
encoding (24 columns). Let's encode it with sine and cosine features and use
those instead of the one-hot encoding.
```python theme={null}
class SinCosHourOfDay:
    """
    Adds sine and cosine features for each hour of the day. This is useful for
    models that can benefit from understanding the periodicity of hours in a day.
    """

    def __call__(self, dates: pd.DatetimeIndex):
        df = pd.DataFrame(index=dates)
        # Get hour of day (0 to 23)
        hours = np.array([date.hour for date in dates])
        # Calculate sine and cosine features
        df["hour_sin"] = np.sin((2 * np.pi) * hours / 24).round(4)
        df["hour_cos"] = np.cos((2 * np.pi) * hours / 24).round(4)
        return df

    def __name__(self):
        return "SinCosHourOfDay"


# Example usage
dates = pd.date_range(start='2023-01-01 00:00', periods=26, freq='h')
sin_cos_hour = SinCosHourOfDay()
features = sin_cos_hour(dates)
features.tail()
```
| | hour\_sin | hour\_cos |
| ------------------: | --------: | --------: |
| 2023-01-01 21:00:00 | -0.7071 | 0.7071 |
| 2023-01-01 22:00:00 | -0.5000 | 0.8660 |
| 2023-01-01 23:00:00 | -0.2588 | 0.9659 |
| 2023-01-02 00:00:00 | 0.0000 | 1.0000 |
| 2023-01-02 01:00:00 | 0.2588 | 0.9659 |
To use this custom datetime feature, pass an instance of the class to the
`date_features` argument. Since it is already encoded, we do not need to include
it in the `date_features_to_one_hot` argument.
```python theme={null}
fcst_timegpt_dt_custom = nixtla_client.forecast(
df = df_train,
h=24*10,
model="timegpt-1-long-horizon",
date_features=[SinCosHourOfDay(), 'dayofweek'],
date_features_to_one_hot=['dayofweek'],
feature_contributions=True
)
```
```python theme={null}
shap_df = nixtla_client.feature_contributions
shap_df.head()
```
| | unique\_id | ds | TimeGPT | hour\_sin | hour\_cos | dayofweek\_0 | dayofweek\_1 | dayofweek\_2 | dayofweek\_3 | dayofweek\_4 | dayofweek\_5 | dayofweek\_6 | base\_value |
| -: | ---------: | ------------------: | --------: | --------: | ---------: | -----------: | -----------: | -----------: | -----------: | -----------: | -----------: | -----------: | ----------: |
| 0 | DE | 2017-12-21 00:00:00 | 35.801600 | -3.609636 | -9.003666 | 0.805974 | -0.424078 | -0.343238 | -0.428668 | -0.055370 | 1.462214 | 3.295479 | 44.102590 |
| 1 | DE | 2017-12-21 01:00:00 | 34.419390 | -3.824628 | -10.493365 | 0.714771 | -0.400898 | -0.282606 | -0.331269 | -0.115753 | 1.539153 | 3.245723 | 44.368263 |
| 2 | DE | 2017-12-21 02:00:00 | 32.892105 | -4.959243 | -10.772224 | 0.712402 | -0.439891 | -0.261654 | -0.207954 | -0.191223 | 1.481960 | 3.206257 | 44.323673 |
| 3 | DE | 2017-12-21 03:00:00 | 32.727295 | -5.161374 | -10.812295 | 0.771099 | -0.417504 | -0.262543 | -0.146066 | -0.258350 | 1.578070 | 3.268950 | 44.167310 |
| 4 | DE | 2017-12-21 04:00:00 | 34.121994 | -3.687167 | -11.353230 | 0.846524 | -0.387008 | -0.278475 | -0.169525 | -0.255498 | 1.788180 | 3.362950 | 44.255240 |
As we can see above, the hour is now encoded using sine and cosine features
instead of one-hot encoding.
```python theme={null}
fcst_timegpt_dt_custom.rename(columns={"TimeGPT": "fcst_timegpt_dt_custom"}, inplace=True)
```
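The same pattern could, in principle, be applied to the day of the week. The hypothetical class `SinCosDayOfWeek` below mirrors `SinCosHourOfDay` with a period of 7, and an instance could be passed alongside it, e.g. `date_features=[SinCosHourOfDay(), SinCosDayOfWeek()]`. This tutorial does not evaluate that variant, and with only 7 levels, one-hot encoding `dayofweek` is usually affordable anyway.

```python theme={null}
import numpy as np
import pandas as pd

class SinCosDayOfWeek:
    """
    Hypothetical custom feature: sine and cosine of the day of week
    (Monday=0 ... Sunday=6), mirroring SinCosHourOfDay with a period of 7.
    """

    def __call__(self, dates: pd.DatetimeIndex):
        df = pd.DataFrame(index=dates)
        days = np.asarray(dates.dayofweek)
        df["dayofweek_sin"] = np.sin((2 * np.pi) * days / 7).round(4)
        df["dayofweek_cos"] = np.cos((2 * np.pi) * days / 7).round(4)
        return df

    def __name__(self):
        return "SinCosDayOfWeek"


# Example usage: 2017-12-18 is a Monday, so this covers Monday through Sunday
dates = pd.date_range(start="2017-12-18", periods=7, freq="D")
dow_features = SinCosDayOfWeek()(dates)
print(dow_features)
```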
### Step 4: Compare Results
#### Visual Comparison
Let's compare the results visually first. For this, we merge all the
forecasts together. This is why we renamed the forecast columns above: so
that we can distinguish the forecasts generated by the different methods.
```python theme={null}
all_fcst = (
fcst_timegpt_no_dt
.merge(fcst_timegpt_dt, on=['unique_id', 'ds'])
.merge(fcst_timegpt_dt_custom, on=['unique_id', 'ds'])
)
all_fcst.head()
```
| | unique\_id | ds | TimeGPT\_no\_dt | fcst\_timegpt\_dt | fcst\_timegpt\_dt\_custom |
| -: | ---------: | ------------------: | --------------: | ----------------: | ------------------------: |
| 0 | DE | 2017-12-21 00:00:00 | 34.340740 | 35.248108 | 35.801600 |
| 1 | DE | 2017-12-21 01:00:00 | 34.376488 | 34.400800 | 34.419390 |
| 2 | DE | 2017-12-21 02:00:00 | 32.215570 | 33.175526 | 32.892105 |
| 3 | DE | 2017-12-21 03:00:00 | 34.485695 | 33.205390 | 32.727295 |
| 4 | DE | 2017-12-21 04:00:00 | 34.359673 | 34.689583 | 34.121994 |
```python theme={null}
nixtla_client.plot(df_sub, all_fcst)
```
Visual inspection suggests that the forecasts with datetime features are
closer to the actuals than the forecast without datetime features.
#### Metric Comparison
Next, let's compare the forecasts with the actual data quantitatively, using
two common metrics: mean absolute error (MAE) and root mean squared error (RMSE).
```python theme={null}
all_fcst_with_actuals = (
df_test[["unique_id", "ds", "y"]]
.merge(all_fcst, on=['unique_id', 'ds'])
)
all_fcst_with_actuals.head()
```
| | unique\_id | ds | y | TimeGPT\_no\_dt | fcst\_timegpt\_dt | fcst\_timegpt\_dt\_custom |
| -: | ---------: | ------------------: | ----: | --------------: | ----------------: | ------------------------: |
| 0 | DE | 2017-12-21 00:00:00 | 33.09 | 34.340740 | 35.248108 | 35.801600 |
| 1 | DE | 2017-12-21 01:00:00 | 35.26 | 34.376488 | 34.400800 | 34.419390 |
| 2 | DE | 2017-12-21 02:00:00 | 31.88 | 32.215570 | 33.175526 | 32.892105 |
| 3 | DE | 2017-12-21 03:00:00 | 33.04 | 34.485695 | 33.205390 | 32.727295 |
| 4 | DE | 2017-12-21 04:00:00 | 33.60 | 34.359673 | 34.689583 | 34.121994 |
```python theme={null}
metrics = [mae, rmse]
evaluation = evaluate(
all_fcst_with_actuals,
metrics=metrics,
)
evaluation
```
| | unique\_id | metric | TimeGPT\_no\_dt | fcst\_timegpt\_dt | fcst\_timegpt\_dt\_custom |
| -: | ---------: | -----: | --------------: | ----------------: | ------------------------: |
| 0 | DE | mae | 27.527012 | 21.644545 | 21.139603 |
| 1 | DE | rmse | 33.478168 | 28.099654 | 27.616988 |
As we can see, adding the datetime features reduced MAE from about 27.5 to 21.6
(one-hot) and 21.1 (custom sine/cosine), and RMSE from about 33.5 to 28.1 and
27.6, compared to the baseline model created without these features.
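For reference, both metrics are simple to compute directly. The NumPy sketch below uses hypothetical function names (chosen so they do not shadow the `mae` and `rmse` imported from `utilsforecast`) and also computes the relative MAE reduction of the custom-feature forecast from the evaluation table above.

```python theme={null}
import numpy as np

def mean_absolute_error(y, y_hat):
    """Average absolute difference between actuals and forecasts."""
    err = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(err)))

def root_mean_squared_error(y, y_hat):
    """Square root of the mean squared error; penalizes large errors more than MAE."""
    err = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

# Relative MAE reduction of the custom-feature forecast vs. the baseline
baseline_mae, custom_mae = 27.527012, 21.139603
rel_reduction = (baseline_mae - custom_mae) / baseline_mae
print(f"{rel_reduction:.1%}")  # 23.2%
```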
## Conclusion
As demonstrated in this tutorial:
1. Providing datetime features to the model during forecasting can improve the
metrics substantially.
2. However, users must be careful of the cardinality of the features after
datetime features have been added. If the feature cardinality is too large for
the dataset, it may lead to overfitting.
3. In case of high cardinality, users may consider a custom encoding approach
as demonstrated.
# Holidays & Special Dates
Source: https://nixtla.io/docs/forecasting/exogenous-variables/holiday_and_special_dates
Guide to using holiday calendar variables and special dates to improve forecast accuracy in time series.
## What Are Holiday Variables and Special Dates?
Special dates, such as holidays, promotions, or significant events, often cause notable deviations from normal patterns in your time series. By incorporating these special dates into your forecasting model, you can better capture these expected variations and improve prediction accuracy.
## How to Add Holiday Variables and Special Dates
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/02_holidays.ipynb)
### Step 1: Import Packages
Import the required libraries and initialize the Nixtla client.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
```
```python theme={null}
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load Data
We use a Google Trends dataset on "chocolate" with monthly frequency:
```python theme={null}
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/google_trend_chocolate.csv')
df['month'] = pd.to_datetime(df['month']).dt.to_period('M').dt.to_timestamp('M')
df.head()
```
| | month | chocolate |
| - | ---------- | --------- |
| 0 | 2004-01-31 | 35 |
| 1 | 2004-02-29 | 45 |
| 2 | 2004-03-31 | 28 |
| 3 | 2004-04-30 | 30 |
| 4 | 2004-05-31 | 29 |
### Step 3: Create a Future Dataframe
When adding exogenous variables (like holidays) to time series forecasting, we need a future DataFrame because:
* Historical data already exists: Our training data contains past values of both the target variable and exogenous features
* Future exogenous features are known: Unlike the target variable, we can determine future values of exogenous features (like holidays) in advance
For example, we know that Christmas will occur on December 25th next year, so we can include this information in our future DataFrame to help the model understand seasonal patterns during the forecast period.
Start by creating a future DataFrame with 14 months of dates, starting from May 2024.
```python theme={null}
# Create future Dataframe for adding US holidays
start_date = "2024-05"
dates = pd.date_range(start=start_date, periods=14, freq="ME")
dates = dates.to_period("M").to_timestamp("M")
future_df = pd.DataFrame(dates, columns=["month"])
future_df.tail()
```
| | month |
| -- | ------------------- |
| 9 | 2025-02-28 00:00:00 |
| 10 | 2025-03-31 00:00:00 |
| 11 | 2025-04-30 00:00:00 |
| 12 | 2025-05-31 00:00:00 |
| 13 | 2025-06-30 00:00:00 |
### Step 4: Forecast with Holidays and Special Dates
TimeGPT automatically generates standard date-based features (like month, day of week, etc.) during forecasting. For more specialized temporal patterns, you can manually add holiday indicators to both your historical and future datasets.
#### Create a Function to Add Date Features
To make it easier to add date features to a DataFrame, we'll create the `add_date_features_to_DataFrame` function that takes:
* A pandas DataFrame
* A date extractor, such as a `CountryHolidays` or `SpecialDates` instance
* A time column name (default `"month"`)
* A resampling frequency (default `"ME"`, month end)
```python theme={null}
def add_date_features_to_DataFrame(df, date_extractor, time_col="month", freq="ME"):
    # Work on a copy so the input DataFrame is not modified
    df = df.copy()
    # Ensure the time column has a datetime dtype
    if not pd.api.types.is_datetime64_any_dtype(df[time_col]):
        raise ValueError(
            f"Column '{time_col}' must be datetime type, got {df[time_col].dtype}"
        )
    # Create a daily date range spanning the time column
    dates_range = pd.date_range(
        start=df[time_col].min(), end=df[time_col].max(), freq="D"
    )
    # Get daily date-feature indicators and aggregate them to the target frequency
    features_df = date_extractor(dates_range)
    features = features_df.resample(freq).max()
    features = features.reset_index(names=time_col)
    # Merge the indicators into the input DataFrame on the time column
    result_df = df.merge(features, on=time_col)
    return result_df
```
#### Add Holiday Features
To add holiday features, we'll use the `CountryHolidays` class to compute US holidays and merge them into the future DataFrame.
```python theme={null}
from nixtla.date_features import CountryHolidays
us_holidays = CountryHolidays(countries=["US"])
future_df_holidays = add_date_features_to_DataFrame(future_df, us_holidays)
print(f"Future DataFrame shape: {future_df_holidays.shape}")
future_df_holidays.head()
```
| | month | US\_New Year's Day | US\_Memorial Day | US\_Juneteenth National Independence Day | US\_Independence Day | US\_Labor Day | US\_Veterans Day | US\_Thanksgiving Day | US\_Christmas Day | US\_Martin Luther King Jr. Day | US\_Washington's Birthday | US\_Columbus Day |
| -: | :------------------ | -----------------: | ---------------: | ---------------------------------------: | -------------------: | ------------: | ---------------: | -------------------: | ----------------: | -----------------------------: | ------------------------: | ---------------: |
| 0 | 2024-05-31 00:00:00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2024-06-30 00:00:00 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2024-07-31 00:00:00 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 2024-08-31 00:00:00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 2024-09-30 00:00:00 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
This DataFrame now includes columns for each identified US holiday as binary indicators.
Next, add holiday indicators to the historical DataFrame.
```python theme={null}
df_with_holidays = add_date_features_to_DataFrame(df, us_holidays)
df_with_holidays.tail()
```
| | month | chocolate | US\_New Year's Day | US\_New Year's Day (observed) | US\_Memorial Day | US\_Independence Day | US\_Independence Day (observed) | US\_Labor Day | US\_Veterans Day | US\_Thanksgiving Day | US\_Christmas Day | US\_Christmas Day (observed) | US\_Martin Luther King Jr. Day | US\_Washington's Birthday | US\_Columbus Day | US\_Veterans Day (observed) | US\_Juneteenth National Independence Day | US\_Juneteenth National Independence Day (observed) |
| --: | :------------------ | --------: | -----------------: | ----------------------------: | ---------------: | -------------------: | ------------------------------: | ------------: | ---------------: | -------------------: | ----------------: | ---------------------------: | -----------------------------: | ------------------------: | ---------------: | --------------------------: | ---------------------------------------: | --------------------------------------------------: |
| 239 | 2023-12-31 00:00:00 | 90 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 240 | 2024-01-31 00:00:00 | 64 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 241 | 2024-02-29 00:00:00 | 66 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 242 | 2024-03-31 00:00:00 | 59 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 243 | 2024-04-30 00:00:00 | 51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Now, your historical DataFrame also contains holiday flags for each month.
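The forecast call below lists five `(observed)` holiday columns in `hist_exog_list`. These are exactly the holiday columns that appear in the historical DataFrame but not in the future DataFrame shown earlier. Instead of typing them by hand, they could be derived with a set difference, as in this minimal sketch on toy frames (the variable names are illustrative; with the tutorial's data you would use `df_with_holidays` and `future_df_holidays`).

```python theme={null}
import pandas as pd

# Toy frames mirroring the column layouts of the historical and future holiday frames
hist = pd.DataFrame(
    columns=["month", "chocolate", "US_Christmas Day", "US_Christmas Day (observed)"]
)
future = pd.DataFrame(columns=["month", "US_Christmas Day"])
target_col = "chocolate"

# Columns known only in the past: present historically, absent from the future frame
hist_only = [c for c in hist.columns if c not in future.columns and c != target_col]
print(hist_only)  # ['US_Christmas Day (observed)']
```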
Finally, forecast with the holiday features.
```python theme={null}
fcst_df_holidays = nixtla_client.forecast(
df=df_with_holidays,
h=14,
freq="ME",
time_col="month",
target_col="chocolate",
X_df=future_df_holidays,
model="timegpt-1-long-horizon",
hist_exog_list=[
"US_New Year's Day (observed)",
"US_Independence Day (observed)",
"US_Christmas Day (observed)",
"US_Veterans Day (observed)",
"US_Juneteenth National Independence Day (observed)",
],
feature_contributions=True, # for shapley values
)
fcst_df_holidays.head()
```
Plot the forecast with holiday effects.
```python theme={null}
nixtla_client.plot(
df_with_holidays,
fcst_df_holidays,
time_col='month',
target_col='chocolate',
)
```

We can then plot the SHAP values of each holiday to see which holidays contribute most to the forecast of interest in chocolate. We will use the [SHAP library](https://shap.readthedocs.io/en/latest/) for the plot.
> For more details on how to use the shap library, see our [tutorial on model interpretability](/docs/forecasting/exogenous-variables/interpretability_with_shap).
```python theme={null}
import shap
import matplotlib.pyplot as plt

def plot_shap_values(ds_column, title):
    # SHAP values populated by the most recent forecast call with feature_contributions=True
    shap_df = nixtla_client.feature_contributions
    # Keep only the feature columns: drop the id, timestamp, forecast, and base value
    shap_columns = shap_df.columns.difference(
        ["unique_id", ds_column, "TimeGPT", "base_value"]
    )
    shap_obj = shap.Explanation(
        values=shap_df[shap_columns].values,
        base_values=shap_df["base_value"].values,
        feature_names=shap_columns,
    )
    shap.plots.bar(shap_obj, max_display=len(shap_columns), show=False)
    plt.title(title)
    plt.show()

plot_shap_values(ds_column="month", title="SHAP values for holidays")
```
The SHAP values reveal that Christmas, Independence Day, and Labor Day have the strongest influence on chocolate interest forecasting. These holidays show the highest feature importance weights, indicating they significantly impact consumer behavior patterns. This aligns with expectations since these are major US holidays associated with gift-giving, celebrations, and seasonal consumption patterns that drive chocolate sales.
#### Add Special Dates
Beyond country holidays, you can create custom special dates with `SpecialDates`. These can represent unique one-time events or recurring patterns on specific dates of your choice.
Assume we already have a future DataFrame with monthly dates. We'll create Valentine's Day and Halloween as custom special dates and add them to the future DataFrame.
```python theme={null}
from nixtla.date_features import SpecialDates
# Generate special dates programmatically for the full data range (2004-2025)
valentine_dates = [f"{year}-02-14" for year in range(2004, 2026)]
halloween_dates = [f"{year}-10-31" for year in range(2004, 2026)]
# Define custom special dates - chocolate-related seasonal events
special_dates = SpecialDates(
special_dates={
"Valentine_season": valentine_dates,
"Halloween_season": halloween_dates,
}
)
# Apply special dates to future data
future_df_special = add_date_features_to_DataFrame(future_df, special_dates)
future_df_special.head()
```
| | month | Valentine\_season | Halloween\_season |
| -: | :------------------ | ----------------: | ----------------: |
| 0 | 2024-05-31 00:00:00 | 0 | 0 |
| 1 | 2024-06-30 00:00:00 | 0 | 0 |
| 2 | 2024-07-31 00:00:00 | 0 | 0 |
| 3 | 2024-08-31 00:00:00 | 0 | 0 |
| 4 | 2024-09-30 00:00:00 | 0 | 0 |
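`add_date_features_to_DataFrame` is defined earlier in this tutorial, so its exact logic is not shown here. Judging from the output above, a month-end timestamp gets a 1 when a special date falls in the same calendar month. The sketch below reproduces that behavior in plain Python; `monthly_special_flags` is a hypothetical helper and the month-matching rule is an assumption inferred from the table, not the library's implementation.

```python
from datetime import date

def monthly_special_flags(month_ends, special_dates):
    """Hypothetical helper: flag each month-end date with 1 if any special
    date of a given name falls in the same calendar month, else 0."""
    # Pre-compute the (year, month) pairs covered by each named special date
    covered = {
        name: {(d.year, d.month) for d in (date.fromisoformat(s) for s in dates)}
        for name, dates in special_dates.items()
    }
    rows = []
    for ts in month_ends:
        row = {"month": ts}
        for name, months in covered.items():
            row[name] = int((ts.year, ts.month) in months)
        rows.append(row)
    return rows

special = {
    "Valentine_season": [f"{y}-02-14" for y in range(2004, 2026)],
    "Halloween_season": [f"{y}-10-31" for y in range(2004, 2026)],
}
months = [date(2024, 1, 31), date(2024, 2, 29), date(2024, 10, 31)]
for row in monthly_special_flags(months, special):
    print(row)
```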
We will also add custom special dates to the historical DataFrame.
```python theme={null}
# Apply special dates to historical data as well
df_special = add_date_features_to_DataFrame(df, special_dates)
df_special.tail()
```
| | month | chocolate | Valentine\_season | Halloween\_season |
| --: | :------------------ | --------: | ----------------: | ----------------: |
| 239 | 2023-12-31 00:00:00 | 90 | 0 | 0 |
| 240 | 2024-01-31 00:00:00 | 64 | 0 | 0 |
| 241 | 2024-02-29 00:00:00 | 66 | 1 | 0 |
| 242 | 2024-03-31 00:00:00 | 59 | 0 | 0 |
| 243 | 2024-04-30 00:00:00 | 51 | 0 | 0 |
Now, forecast with the special date features.
```python theme={null}
fcst_df_special = nixtla_client.forecast(
df=df_special,
h=14,
freq="ME",
time_col="month",
target_col="chocolate",
X_df=future_df_special,
model="timegpt-1-long-horizon",
feature_contributions=True,
)
```
Plot the forecast with special date effects.
```python theme={null}
nixtla_client.plot(
df_special,
fcst_df_special,
time_col='month',
target_col='chocolate',
)
```
Examine the feature importance of the special dates.
```python theme={null}
plot_shap_values(ds_column="month", title="SHAP values for special dates")
```
The SHAP values show that Valentine's Day has the strongest influence on the chocolate forecasts among the special dates. This aligns with consumer behavior patterns, as chocolate is a popular gift choice during Valentine's Day celebrations.
Congratulations! You have successfully integrated holiday and special date features into your time series forecasts. Use these steps as a starting point for further experimentation with advanced date features.
# Model Interpretability
Source: https://nixtla.io/docs/forecasting/exogenous-variables/interpretability_with_shap
Learn how to interpret model predictions using SHAP values to understand the impact of exogenous variables.
## What Are SHAP Values?
SHAP (SHapley Additive exPlanations) values use game theory concepts to explain how each feature influences machine learning forecasts. They're particularly useful when working with exogenous (external) variables, letting you understand contributions both at individual prediction steps and across entire forecast horizons.
SHAP values can be seamlessly combined with visualization methods from the [SHAP](https://shap.readthedocs.io/en/latest/) Python package for powerful plots and insights. Before proceeding, make sure you understand forecasting with exogenous features. For reference, see our [tutorial on exogenous variables](/docs/forecasting/exogenous-variables/numeric_features).
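To make the game-theory idea concrete, here is a small sketch that computes exact Shapley values by enumerating every coalition of features. The toy model `f`, the all-zeros baseline, and the convention of "removing" a feature by replacing it with its baseline value are illustrative assumptions; TimeGPT's feature contributions are computed by Nixtla, not by this code. The final comparison shows the additivity property: the base value plus all contributions equals the prediction.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for model(x), using `baseline` values for
    features absent from a coalition (an assumed convention)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition keep their actual value; the rest use the baseline
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Toy model with an interaction term between features 0 and 2
def f(z):
    return 3.0 * z[0] + 2.0 * z[1] + z[0] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(f, x, baseline)
base_value = f(baseline)
print(phi, base_value + sum(phi), f(x))
```

The interaction term is split evenly between features 0 and 2, which is why neither feature's contribution equals its linear coefficient alone.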
## How to Use SHAP Values for TimeGPT
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/21_shap_values.ipynb)
### Install SHAP
Install the SHAP library.
```bash theme={null}
pip install shap
```
For more details, visit the [official SHAP documentation](https://shap.readthedocs.io/en/latest/).
### Step 1: Import Packages
Import the necessary libraries and initialize the Nixtla client.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla' # Or use os.environ.get("NIXTLA_API_KEY")
)
```
### Step 2: Load Data
We'll use exogenous variables (covariates) to enhance electricity market forecasting accuracy. The widely known EPF dataset is available at
[this link](https://zenodo.org/records/4624805). It contains hourly prices and relevant exogenous factors for five different electricity markets.
For this tutorial, we'll focus on the Belgian electricity market (BE). The data includes:
* Hourly prices (y)
* Forecasts for load (Exogenous1) and generation (Exogenous2)
* Day-of-week indicators (one-hot encoded)
If your data relies on factors such as weather, holiday calendars, marketing, or other elements, ensure they're similarly structured.
```python theme={null}
market = "BE"
df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv'
)
df.head()
```
| unique\_id | ds | y | Exogenous1 | Exogenous2 | day\_0 | day\_1 | day\_2 | day\_3 | day\_4 | day\_5 | day\_6 |
| ---------- | ------------------- | ----- | ---------- | ---------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| BE | 2016-10-22 00:00:00 | 70.00 | 57253.0 | 49593.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-10-22 01:00:00 | 37.10 | 51887.0 | 46073.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-10-22 02:00:00 | 37.10 | 51896.0 | 44927.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-10-22 03:00:00 | 44.75 | 48428.0 | 44483.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-10-22 04:00:00 | 37.10 | 46721.0 | 44338.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
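The `day_0` to `day_6` columns appear to follow Python's `datetime.weekday()` convention (Monday is 0, Sunday is 6): 2016-10-22 was a Saturday, and the rows above have `day_5` set to 1. Assuming that convention, a minimal sketch for building the same indicators from a timestamp looks like this (`day_of_week_dummies` is a hypothetical helper name).

```python
from datetime import datetime

def day_of_week_dummies(ts):
    """One-hot encode the weekday of `ts` as day_0 (Monday) ... day_6 (Sunday)."""
    return {f"day_{k}": float(ts.weekday() == k) for k in range(7)}

ts = datetime(2016, 10, 22, 3, 0)  # a Saturday
print(day_of_week_dummies(ts))
```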
### Step 3: Forecast with Exogenous Variables
To make forecasts with exogenous variables, you must have future data for these variables available at the time of prediction.
Before generating forecasts, ensure you have (or can generate) future exogenous values. Below, we load future exogenous features to obtain 24-step-ahead predictions:
```python theme={null}
future_ex_vars_df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv'
)
future_ex_vars_df.head()
```
| unique\_id | ds | Exogenous1 | Exogenous2 | day\_0 | day\_1 | day\_2 | day\_3 | day\_4 | day\_5 | day\_6 |
| ---------- | ------------------- | ---------- | ---------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| BE | 2016-12-31 00:00:00 | 70318.0 | 64108.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-12-31 01:00:00 | 67898.0 | 62492.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-12-31 02:00:00 | 68379.0 | 61571.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-12-31 03:00:00 | 64972.0 | 60381.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-12-31 04:00:00 | 62900.0 | 60298.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
Next, create forecasts using the Nixtla API:
```python theme={null}
timegpt_fcst_ex_vars_df = nixtla_client.forecast(
df=df,
X_df=future_ex_vars_df,
h=24,
level=[80, 90],
feature_contributions=True
)
timegpt_fcst_ex_vars_df.head()
```
| unique\_id | ds | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-lo-80 | TimeGPT-lo-90 |
| ---------- | ------------------- | --------- | ------------- | ------------- | ------------- | ------------- |
| BE | 2016-12-31 00:00:00 | 51.632830 | 61.598820 | 66.088295 | 41.666843 | 37.177372 |
| BE | 2016-12-31 01:00:00 | 45.750877 | 54.611988 | 60.176445 | 36.889767 | 31.325312 |
| BE | 2016-12-31 02:00:00 | 39.650543 | 46.256210 | 52.842808 | 33.044876 | 26.458277 |
| BE | 2016-12-31 03:00:00 | 34.000072 | 44.015310 | 47.429000 | 23.984835 | 20.571144 |
| BE | 2016-12-31 04:00:00 | 33.785370 | 43.140503 | 48.581240 | 24.430239 | 18.989498 |
### Step 4: Extract SHAP Values
After forecasting, you can retrieve SHAP values to see how each feature contributed to the model's predictions.
```python theme={null}
shap_df = nixtla_client.feature_contributions
shap_df = shap_df.query("unique_id == @market")
shap_df.head()
```
### Step 5: Visualization with SHAP
Visualizing SHAP values helps interpret the impact of exogenous features in detail. Below, we demonstrate three common SHAP plots.
#### Bar Plot
Use a bar plot to see the average impact of each feature across predictions:
```python theme={null}
import shap
import matplotlib.pyplot as plt
shap_columns = shap_df.columns.difference(['unique_id', 'ds', 'TimeGPT', 'base_value'])
shap_obj = shap.Explanation(
values=shap_df[shap_columns].values,
base_values=shap_df['base_value'].values,
feature_names=shap_columns
)
shap.plots.bar(
shap_obj,
max_display=len(shap_columns),
show=False
)
plt.title(f'SHAP values for {market}')
plt.show()
```
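When `shap.plots.bar` receives an `Explanation` with many rows, it ranks features by their mean absolute SHAP value across those rows. The sketch below reproduces that aggregation in plain Python so you can get the ranking as data rather than a plot. The feature names and numbers are made up for illustration.

```python
def mean_abs_importance(values, feature_names):
    """Rank features by mean |SHAP value| across rows (global importance)."""
    n_rows = len(values)
    scores = {
        name: sum(abs(row[j]) for row in values) / n_rows
        for j, name in enumerate(feature_names)
    }
    # Highest mean absolute contribution first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up contributions: 3 forecast steps x 3 features
values = [
    [0.5, -2.0, 0.1],
    [-0.5, -1.0, 0.0],
    [1.0, 3.0, -0.2],
]
ranking = mean_abs_importance(values, ["feat_a", "feat_b", "feat_c"])
print(ranking)
```

Note that `feat_b` ranks first even though its signed contributions average to zero: the bar plot shows how much a feature moves the forecast, not in which direction.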

#### Waterfall Plot
A waterfall plot shows how each feature contributes to a single prediction step. Here, we select the earliest date for illustration:
```python theme={null}
selected_ds = shap_df['ds'].min()
filtered_df = shap_df[shap_df['ds'] == selected_ds]
shap_obj = shap.Explanation(
values=filtered_df[shap_columns].values.flatten(),
base_values=filtered_df['base_value'].values[0],
feature_names=shap_columns
)
shap.plots.waterfall(shap_obj, show=False)
plt.title(f'Waterfall Plot: {market}, date: {selected_ds}')
plt.show()
```
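A waterfall plot starts at the base value and adds one feature contribution at a time until it reaches the prediction. The sketch below prints that running total for a single made-up row, walking contributions from largest to smallest magnitude. It assumes additivity holds, so the final total equals the forecast whatever order the contributions are added in.

```python
def waterfall_steps(base_value, contributions):
    """Return (feature, contribution, running_total) tuples, largest |contribution| first."""
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    running = base_value
    steps = []
    for name, value in ordered:
        running += value
        steps.append((name, value, running))
    return steps

# Made-up base value and contributions for one forecast step
steps = waterfall_steps(40.0, {"feat_a": 5.0, "feat_b": -8.0, "feat_c": 1.5})
for name, value, total in steps:
    print(f"{name:>7}: {value:+6.2f} -> {total:6.2f}")
```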

#### Heatmap
Visualize how feature impacts vary across each forecast step. This often reveals time-dependent effects of certain variables.
```python theme={null}
shap_obj = shap.Explanation(
values=shap_df[shap_columns].values,
feature_names=shap_columns
)
shap.plots.heatmap(shap_obj, show=False)
plt.title(f'SHAP Heatmap (Unique ID: {market})')
plt.show()
```
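To read the heatmap as data, you can find which feature has the largest absolute contribution at each forecast step. The helper below is hypothetical and runs on made-up values. With the real `shap_df`, the steps would be the `ds` values and the columns would be `shap_columns`.

```python
def dominant_feature_per_step(steps, values, feature_names):
    """For each forecast step, return the feature with the largest |SHAP value|."""
    result = {}
    for step, row in zip(steps, values):
        j = max(range(len(row)), key=lambda k: abs(row[k]))
        result[step] = feature_names[j]
    return result

# Made-up contributions: one row per forecast step
steps = ["00:00", "01:00", "02:00"]
values = [[0.5, -2.0, 0.1], [1.2, -1.0, 0.0], [0.3, 0.2, -0.9]]
print(dominant_feature_per_step(steps, values, ["feat_a", "feat_b", "feat_c"]))
```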

# Numeric Variables
Source: https://nixtla.io/docs/forecasting/exogenous-variables/numeric_features
Learn how to incorporate external numeric variables to improve your forecasting accuracy.
## What Are Exogenous Variables?
Exogenous variables or external factors are crucial in time series forecasting
as they provide additional information that might influence the prediction.
These variables could include holiday markers, marketing spending, weather data,
or any other external data that correlate with the time series data you are
forecasting.
For example, if you're forecasting ice cream sales, temperature data could serve
as a useful exogenous variable. On hotter days, ice cream sales may increase.
## How to Use Exogenous Variables
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/01_exogenous_variables_reworked.ipynb)
To incorporate exogenous variables in TimeGPT, you'll need to pair each point
in your time series data with the corresponding external data.
### Step 1: Import Packages
Import the required libraries and initialize the Nixtla client.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
# defaults to os.environ.get("NIXTLA_API_KEY")
api_key="my_api_key_provided_by_nixtla"
)
```
### Step 2: Load Dataset
In this tutorial, we'll predict day-ahead electricity prices. The dataset contains:
* Hourly electricity prices (`y`) from various markets (identified by `unique_id`)
* Exogenous variables: `Exogenous1` and `Exogenous2` (forecasts for load and generation) and `day_0` to `day_6` (one-hot day-of-week indicators)
```python theme={null}
df = pd.read_csv("https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv")
df.head()
```
| unique\_id | ds | y | Exogenous1 | Exogenous2 | day\_0 | day\_1 | day\_2 | day\_3 | day\_4 | day\_5 | day\_6 |
| ---------- | ------------------- | ----- | ---------- | ---------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| BE | 2016-10-22 00:00:00 | 70.00 | 57253.0 | 49593.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-10-22 01:00:00 | 37.10 | 51887.0 | 46073.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-10-22 02:00:00 | 37.10 | 51896.0 | 44927.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-10-22 03:00:00 | 44.75 | 48428.0 | 44483.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-10-22 04:00:00 | 37.10 | 46721.0 | 44338.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
### Step 3: Forecast without Exogenous Variables
First, let's create a baseline forecast without using any exogenous variables.
```python theme={null}
timegpt_fcst_no_ex_vars = nixtla_client.forecast(
df=df[["unique_id", "ds", "y"]],
h=24,
level=[80, 90]
)
```
### Step 4: Forecasting with Exogenous Variables
Next, let's create a forecast using the exogenous variables. To do so, you need
to provide both historical and future exogenous values. Below is an example
dataset containing future exogenous variables. Note that it only contains the
future values of the exogenous variables, not the target variable `y`, which is
what we forecast.
```python theme={null}
future_ex_vars_df = pd.read_csv("https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-future-ex-vars.csv")
future_ex_vars_df.head()
```
| unique\_id | ds | Exogenous1 | Exogenous2 | day\_0 | day\_1 | day\_2 | day\_3 | day\_4 | day\_5 | day\_6 |
| ---------- | ------------------- | ---------- | ---------- | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| BE | 2016-12-31 00:00:00 | 70318.0 | 64108.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-12-31 01:00:00 | 67898.0 | 62492.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-12-31 02:00:00 | 68379.0 | 61571.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-12-31 03:00:00 | 64972.0 | 60381.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| BE | 2016-12-31 04:00:00 | 62900.0 | 60298.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
Ensure you maintain consistent data formatting and columns in both historical
and future exogenous datasets (e.g., dates, unique\_id, variable names).
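A quick pre-flight check can catch misaligned inputs before you call the API. The sketch below is a hypothetical helper (`check_future_exog` is not part of the Nixtla SDK) that works on plain Python records and assumes hourly data. For each series, it verifies that the future data has exactly `h` consecutive timestamps right after the last historical one and carries the same exogenous columns as the history.

```python
from datetime import datetime, timedelta

def check_future_exog(history, future, h, step=timedelta(hours=1)):
    """Hypothetical check: `history` and `future` are lists of dicts with
    'unique_id' and 'ds' keys; history also has 'y'. Raises ValueError on mismatch."""
    hist_cols = set(history[0]) - {"unique_id", "ds", "y"}
    fut_cols = set(future[0]) - {"unique_id", "ds"}
    if hist_cols != fut_cols:
        raise ValueError(f"exogenous columns differ: {sorted(hist_cols ^ fut_cols)}")
    # Last observed timestamp per series
    last_ds = {}
    for row in history:
        uid = row["unique_id"]
        last_ds[uid] = max(last_ds.get(uid, row["ds"]), row["ds"])
    # Future timestamps must be exactly the next h steps for every series
    for uid, last in last_ds.items():
        future_ds = sorted(r["ds"] for r in future if r["unique_id"] == uid)
        expected = [last + step * (k + 1) for k in range(h)]
        if future_ds != expected:
            raise ValueError(f"{uid}: future timestamps do not cover the next {h} steps")

t0 = datetime(2016, 12, 30, 22)
history = [
    {"unique_id": "BE", "ds": t0 + timedelta(hours=k), "y": 40.0, "Exogenous1": 1.0}
    for k in range(2)
]
future = [
    {"unique_id": "BE", "ds": datetime(2016, 12, 31, k), "Exogenous1": 1.0}
    for k in range(24)
]
check_future_exog(history, future, h=24)  # passes silently
```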
```python theme={null}
timegpt_fcst_ex_vars = nixtla_client.forecast(
df=df,
X_df=future_ex_vars_df,
h=24,
level=[80, 90]
)
```
### Step 5: Forecast Visualization
Once you have generated your forecasts, you can visualize the results to compare
forecasts between the two methods above.
```python theme={null}
timegpt_fcst_no_ex_vars.rename(columns={"TimeGPT": "TimeGPT_no_ex_vars"}, inplace=True)
timegpt_fcst_ex_vars.rename(columns={"TimeGPT": "TimeGPT_ex_vars"}, inplace=True)
all_forecasts = (
timegpt_fcst_no_ex_vars
.merge(
timegpt_fcst_ex_vars,
how='outer',
on=["unique_id", "ds"]
)
)
```
```python theme={null}
nixtla_client.plot(
df[["unique_id", "ds", "y"]],
all_forecasts,
max_insample_length=1000,
)
```
## Key Takeaways
* Exogenous variables enrich time series forecasting.
* Ensure proper alignment of historical and future exogenous data.
## Next Steps
Congratulations! You have mastered the fundamentals of adding exogenous
variables to your TimeGPT forecasts. Keep refining your approach by:
* Exploring feature engineering to create domain-specific exogenous data.
* Experimenting with different modeling approaches for external variables.
* Validating forecast accuracy by comparing with real future data.
# Fine-tuning with a Specific Loss Function
Source: https://nixtla.io/docs/forecasting/fine-tuning/custom_loss
Learn how to fine-tune a model using specific loss functions, configure the Nixtla client, and evaluate performance improvements.
## Fine-tuning with a Specific Loss Function
When you fine-tune, the model trains on your dataset to tailor its predictions
to your data. You can specify the loss function to be used during
fine-tuning using the `finetune_loss` argument. Below are the available loss
functions:
* `"default"`: A proprietary function robust to outliers.
* `"mae"`: Mean Absolute Error
* `"mse"`: Mean Squared Error
* `"rmse"`: Root Mean Squared Error
* `"mape"`: Mean Absolute Percentage Error
* `"smape"`: Symmetric Mean Absolute Percentage Error
## How to Fine-tune with a Specific Loss Function
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/07_loss_function_finetuning.ipynb)
### Step 1: Import Packages and Initialize Client
First, we import the required packages and initialize the Nixtla client.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
from utilsforecast.losses import mae, mse, rmse, mape, smape
```
```python theme={null}
nixtla_client = NixtlaClient(
# defaults to os.environ.get("NIXTLA_API_KEY")
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load Data
Load your data and prepare it for fine-tuning. Here, we will demonstrate using
an example dataset of air passenger counts.
```python theme={null}
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.insert(loc=0, column='unique_id', value=1)
df.head()
```
| | unique\_id | timestamp | value |
| - | ---------- | ---------- | ----- |
| 0 | 1 | 1949-01-01 | 112 |
| 1 | 1 | 1949-02-01 | 118 |
| 2 | 1 | 1949-03-01 | 132 |
| 3 | 1 | 1949-04-01 | 129 |
| 4 | 1 | 1949-05-01 | 121 |
### Step 3: Fine-Tune the Model
Let's fine-tune the model on a dataset using the mean absolute error (MAE).
For that, we simply pass the appropriate string representing the loss function
to the `finetune_loss` parameter of the `forecast` method.
```python theme={null}
timegpt_fcst_finetune_mae_df = nixtla_client.forecast(
df=df,
h=12,
finetune_steps=10,
finetune_loss='mae', # Select desired loss function
time_col='timestamp',
target_col='value',
)
```
After training completes, you can visualize the forecast:
```python theme={null}
nixtla_client.plot(
df,
timegpt_fcst_finetune_mae_df,
time_col='timestamp',
target_col='value',
)
```

## Explanation of Loss Functions
Now, depending on your data, you will use a specific error metric to accurately
evaluate your forecasting model's performance.
Below is a non-exhaustive guide on which metric to use depending on your use case.
**Mean absolute error (MAE)**
* Robust to outliers
* Easy to understand
* You care equally about all error sizes
* Same units as your data
**Mean squared error (MSE)**
* You want to penalize large errors more than small ones
* Sensitive to outliers
* Used when large errors must be avoided
* *Not* the same units as your data
**Root mean squared error (RMSE)**
* Brings the MSE back to original units of data
* Penalizes large errors more than small ones
**Mean absolute percentage error (MAPE)**
* Easy to understand for non-technical stakeholders
* Expressed as a percentage
* Heavier penalty on positive errors over negative errors
* To be avoided if your data has values close to 0 or equal to 0
**Symmetric mean absolute percentage error (sMAPE)**
* Designed to correct the asymmetry of MAPE
* Aims to be equally sensitive to over- and under-forecasting (though it is not perfectly symmetric)
* To be avoided if your data has values close to 0 or equal to 0
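For intuition, the metrics above can be written in a few lines of plain Python using their common textbook definitions. This is a sketch, not the `utilsforecast` implementation used later. sMAPE in particular has several variants (some omit the factor of 2), so compare numbers only across tools that use the same definition. The long function names avoid shadowing the `utilsforecast` imports.

```python
import math

def mean_absolute_error(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mean_squared_error(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def root_mean_squared_error(y, yhat):
    # Square root brings MSE back to the units of the data
    return math.sqrt(mean_squared_error(y, yhat))

def mean_absolute_percentage_error(y, yhat):
    # Divides by the actual value: raises ZeroDivisionError when an actual is 0
    return 100 * sum(abs(a - b) / abs(a) for a, b in zip(y, yhat)) / len(y)

def symmetric_mape(y, yhat):
    # One common variant, bounded between 0 and 200
    return 100 * sum(2 * abs(a - b) / (abs(a) + abs(b)) for a, b in zip(y, yhat)) / len(y)

y = [100.0, 200.0, 300.0]
yhat = [110.0, 190.0, 330.0]
for fn in (mean_absolute_error, mean_squared_error, root_mean_squared_error,
           mean_absolute_percentage_error, symmetric_mape):
    print(fn.__name__, round(fn(y, yhat), 3))
```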
With TimeGPT, you can choose the loss function used during fine-tuning so as to
optimize the performance metric that matters most for your use case.
## Experimentation with Loss Function
Let's run a small experiment to see how much each loss function improves its
associated metric compared to the default setting.
Let's split the dataset into training and testing sets.
```python theme={null}
train = df[:-36]
test = df[-36:]
```
Next, let's compute the forecasts with the various loss functions.
```python theme={null}
losses = ['default', 'mae', 'mse', 'rmse', 'mape', 'smape']
test = test.copy()
for loss in losses:
    # Fine-tune on the training set with this loss, then forecast the test horizon
    preds_df = nixtla_client.forecast(
        df=train,
        h=36,
        finetune_steps=10,
        finetune_loss=loss,
        time_col='timestamp',
        target_col='value',
    )
    preds = preds_df['TimeGPT'].values
    test.loc[:, f'TimeGPT_{loss}'] = preds
```
Great! We have predictions from TimeGPT using all the different loss functions.
We can evaluate the performance using their associated metric and measure the
improvement.
```python theme={null}
loss_fct_dict = {
    "mae": mae,
    "mse": mse,
    "rmse": rmse,
    "mape": mape,
    "smape": smape,
}
pct_improv = []
for loss in losses[1:]:
    # Score the default-loss and loss-specific forecasts with the matching metric
    evaluation = loss_fct_dict[loss](
        test,
        models=['TimeGPT_default', f'TimeGPT_{loss}'],
        id_col='unique_id',
        target_col='value',
    )
    # Relative improvement over the default loss (positive means lower error)
    pct_diff = (evaluation['TimeGPT_default'] - evaluation[f'TimeGPT_{loss}']) / evaluation['TimeGPT_default'] * 100
    pct_improv.append(round(pct_diff, 2))
```
```python theme={null}
data = {
'mae': pct_improv[0].values,
'mse': pct_improv[1].values,
'rmse': pct_improv[2].values,
'mape': pct_improv[3].values,
'smape': pct_improv[4].values
}
metrics_df = pd.DataFrame(data)
metrics_df.index = ['Metric improvement (%)']
metrics_df
```
| | mae | mse | rmse | mape | smape |
| ---------------------- | ---- | ---- | ---- | ----- | ----- |
| Metric improvement (%) | 8.54 | 0.31 | 0.64 | 31.02 | 7.36 |
From the table above, we can see that, in this experiment, fine-tuning with a
specific loss function improved its associated error metric compared to the
default loss function.
In this example, using the MAE as the loss function improves the metric by
8.54% when compared to using the default loss function.
That way, depending on your use case and performance metric, you can use the
appropriate loss function to maximize the accuracy of the forecasts.
# Controlling the Level of Fine-Tuning
Source: https://nixtla.io/docs/forecasting/fine-tuning/depth
Learn how to use the finetune_depth parameter to control the extent of fine-tuning in TimeGPT models.
## Controlling the Level of Fine-Tuning
It is possible to control the depth of fine-tuning with the `finetune_depth`
parameter.
`finetune_depth` takes integer values from 1 to 5. By default, it is set to
1, which means that only a small set of the model's parameters is adjusted,
whereas a value of 5 fine-tunes the largest number of parameters.
Increasing `finetune_depth` also increases the time to generate predictions.
While it can generate better results, we must be careful to not overfit the
model, in which case the predictions may not be as accurate.
Let's run a small experiment to see how `finetune_depth` impacts the performance.
## How to Control the Level of Fine-Tuning
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/23_finetune_depth_finetuning.ipynb)
### Step 1: Import Packages
First, we import the required packages and initialize the Nixtla client.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
from utilsforecast.losses import mae, mse
from utilsforecast.evaluation import evaluate
```
```python theme={null}
nixtla_client = NixtlaClient(
# defaults to os.environ.get("NIXTLA_API_KEY")
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load Data
Next, load the dataset.
```python theme={null}
df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv'
)
df.head()
```
Now, we split the data into a training and test set so that we can measure the
performance of the model as we vary `finetune_depth`.
```python theme={null}
train = df[:-24]
test = df[-24:]
```
### Step 3: Fine-Tuning With finetune\_depth
As mentioned above, `finetune_depth` controls how many parameters from TimeGPT
are fine-tuned on your particular dataset. If the value is set to 1, only a few
parameters are fine-tuned. Setting it to 5 means that all parameters of the
model will be fine-tuned.
Using a large value for `finetune_depth` can lead to better performance on
large datasets with complex patterns. However, it can also lead to overfitting,
in which case the accuracy of the forecasts may degrade, as we will see from the
small experiment below.
```python theme={null}
depths = [1, 2, 3, 4, 5]
test = test.copy()
for depth in depths:
    # Fine-tune at this depth, then forecast the 24-step test horizon
    preds_df = nixtla_client.forecast(
        df=train,
        h=24,
        finetune_steps=5,
        finetune_depth=depth,
        time_col='timestamp',
        target_col='value',
    )
    preds = preds_df['TimeGPT'].values
    test.loc[:, f'TimeGPT_depth{depth}'] = preds
```
Evaluate the forecasts using MAE and MSE metrics:
```python theme={null}
test['unique_id'] = 0
evaluation = evaluate(
test,
metrics=[mae, mse],
time_col="timestamp",
target_col="value"
)
evaluation
```
| unique\_id | metric | TimeGPT\_depth1 | TimeGPT\_depth2 | TimeGPT\_depth3 | TimeGPT\_depth4 | TimeGPT\_depth5 |
| ---------- | ------ | --------------- | --------------- | --------------- | --------------- | --------------- |
| 0 | mae | 22.675540 | 17.908963 | 21.318518 | 24.745096 | 28.734302 |
| 0 | mse | 677.254283 | 461.320852 | 676.202126 | 991.835359 | 1119.722602 |
From the result above, we can see that a `finetune_depth` of 2 achieves the best
results since it has the lowest MAE and MSE.
Also notice that with a `finetune_depth` of 4 and 5, the performance degrades,
which suggests overfitting.
Thus, keep in mind that fine-tuning can be a bit of trial and error. You might
need to adjust the number of `finetune_steps` and the level of `finetune_depth`
based on your specific needs and the complexity of your data. Usually, a higher
`finetune_depth` works better for large datasets. In this specific tutorial,
since we were forecasting a single series with a very short dataset, increasing
the depth led to overfitting.
It's recommended to monitor the model's performance during fine-tuning and
adjust as needed. Be aware that more `finetune_steps` and a larger value of
`finetune_depth` may lead to longer training times and could potentially lead
to overfitting if not managed properly.
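Rather than reading the winner off the evaluation table in Step 3 by eye, you can select it programmatically. The sketch below hard-codes the MAE and MSE values reported in that table (in practice you would read them from the `evaluation` DataFrame) and picks the depth with the lowest value for each metric.

```python
# MAE and MSE per finetune_depth, copied from the evaluation table in Step 3
results = {
    "mae": {1: 22.675540, 2: 17.908963, 3: 21.318518, 4: 24.745096, 5: 28.734302},
    "mse": {1: 677.254283, 2: 461.320852, 3: 676.202126, 4: 991.835359, 5: 1119.722602},
}

# For each metric, the depth whose score is lowest
best = {metric: min(scores, key=scores.get) for metric, scores in results.items()}
print(best)
```

A single holdout window is a noisy basis for this choice, so evaluating each depth over several windows before committing to one is usually more reliable.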
# Re-using fine-tuned models
Source: https://nixtla.io/docs/forecasting/fine-tuning/save_reuse_delete_finetuned_models
Learn how to save, fine-tune, list, and delete TimeGPT models to optimize forecasting.
## Re-using Fine-tuned Models
Reusing previously fine-tuned TimeGPT models can help reduce computation time
and costs while maintaining or improving forecast accuracy. This guide walks you
through the steps to save, fine-tune, list, and delete your TimeGPT models effectively.
## How to Re-use Fine-tuned Models
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/061_reusing_finetuned_models.ipynb)
### Step 1: Import Packages
First, we import the required packages and initialize the Nixtla client.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
from utilsforecast.losses import rmse
from utilsforecast.evaluation import evaluate
```
```python theme={null}
nixtla_client = NixtlaClient(
# defaults to os.environ["NIXTLA_API_KEY"]
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load Data
Load the forecasting dataset and prepare the train/validation split.
```python theme={null}
df = pd.read_parquet('https://datasets-nixtla.s3.amazonaws.com/m4-hourly.parquet')
h = 48
valid = df.groupby('unique_id', observed=True).tail(h)
train = df.drop(valid.index)
train.head()
```
| | unique\_id | ds | y |
| - | ---------- | -- | ----- |
| 0 | H1 | 1 | 605.0 |
| 1 | H1 | 2 | 586.0 |
| 2 | H1 | 3 | 586.0 |
| 3 | H1 | 4 | 559.0 |
| 4 | H1 | 5 | 511.0 |
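`groupby('unique_id').tail(h)` keeps the last `h` rows of each series for validation, and `drop(valid.index)` leaves everything earlier for training, so no series leaks its future values into the training data. The plain-Python sketch below mirrors that logic on toy records (`holdout_last_h` is a hypothetical name). Like `tail`, it assumes rows are already sorted by `ds` within each series.

```python
from collections import defaultdict

def holdout_last_h(rows, h):
    """Split records into (train, valid), holding out the last h rows per series.
    Assumes rows are already sorted by 'ds' within each 'unique_id'."""
    if h < 1:
        raise ValueError("h must be at least 1")
    by_id = defaultdict(list)
    for row in rows:
        by_id[row["unique_id"]].append(row)
    train, valid = [], []
    for series in by_id.values():
        train.extend(series[:-h])
        valid.extend(series[-h:])
    return train, valid

# Two toy series with timestamps 1..5
rows = [{"unique_id": uid, "ds": t, "y": float(t)} for uid in ("H1", "H2") for t in range(1, 6)]
train, valid = holdout_last_h(rows, h=2)
print(len(train), len(valid))
```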
### Step 3: Zero-shot forecast
We can try forecasting without any fine-tuning to see how well TimeGPT does.
```python theme={null}
fcst_kwargs = {
'df': train,
'freq': 1,
'model': 'timegpt-1-long-horizon'
}
fcst = nixtla_client.forecast(h=h, **fcst_kwargs)
zero_shot_eval = evaluate(fcst.merge(valid), metrics=[rmse], agg_fn='mean')
zero_shot_eval
```
| metric | TimeGPT |
| ------ | ----------- |
| rmse | 1504.474342 |
### Step 4: Fine-tune the model
We can now fine-tune TimeGPT a little and save our model for later use. We can
define the ID that we want that model to have by providing it through `output_model_id`.
This ID is also returned as the output of the `finetune` method.
```python theme={null}
first_model_id = 'my-first-finetuned-model'
nixtla_client.finetune(output_model_id=first_model_id, **fcst_kwargs)
```
```bash theme={null}
'my-first-finetuned-model'
```
We can now forecast using this fine-tuned model by providing its ID through
the `finetuned_model_id` argument.
```python theme={null}
first_finetune_fcst = nixtla_client.forecast(
h=h,
finetuned_model_id=first_model_id,
**fcst_kwargs
)
first_finetune_eval = evaluate(
first_finetune_fcst.merge(valid),
metrics=[rmse],
agg_fn='mean'
)
zero_shot_eval.merge(
first_finetune_eval,
on=['metric'],
suffixes=('_zero_shot', '_first_finetune')
)
```
| metric | TimeGPT\_zero\_shot | TimeGPT\_first\_finetune |
| ------ | ------------------- | ------------------------ |
| rmse | 1504.474342 | 1472.024619 |
We can see the error was reduced.
### Step 5: Further fine-tune the model
We can now take this model and fine-tune it a bit further by using the
`NixtlaClient.finetune` method but providing our already fine-tuned model as
`finetuned_model_id`, which will take that model and fine-tune it a bit more.
We can also change the fine-tuning settings, like using `finetune_depth=3`, for
example. As before, the new fine-tuned model ID is returned by the `finetune` method.
```python theme={null}
second_model_id = nixtla_client.finetune(
finetuned_model_id=first_model_id,
finetune_depth=3,
**fcst_kwargs
)
second_model_id
```
```bash theme={null}
'468b13fb-4b26-447a-bd87-87a64b50d913'
```
Since we didn't provide `output_model_id` this time, it was assigned a UUID.
We can now use this model to forecast.
```python theme={null}
second_finetune_fcst = nixtla_client.forecast(
h=h,
finetuned_model_id=second_model_id,
**fcst_kwargs
)
second_finetune_eval = evaluate(
second_finetune_fcst.merge(valid),
metrics=[rmse],
agg_fn='mean'
)
first_finetune_eval.merge(
second_finetune_eval,
on=['metric'],
suffixes=('_first_finetune', '_second_finetune')
)
```
| metric | TimeGPT\_first\_finetune | TimeGPT\_second\_finetune |
| ------ | ------------------------ | ------------------------- |
| rmse | 1472.024619 | 1435.365211 |
We can see the error was reduced a bit more.
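To put these reductions in numbers, the relative change can be computed from the RMSE values reported in the tables above. This is simple arithmetic on those reported values, and the improvements are specific to this dataset and run.

```python
# RMSE values reported in the evaluation tables above
rmse_zero_shot = 1504.474342
rmse_first = 1472.024619
rmse_second = 1435.365211

def pct_reduction(before, after):
    """Percentage reduction in error; positive means the error went down."""
    return (before - after) / before * 100

print(f"first fine-tune vs zero-shot:  {pct_reduction(rmse_zero_shot, rmse_first):.2f}%")
print(f"second vs first fine-tune:     {pct_reduction(rmse_first, rmse_second):.2f}%")
print(f"second fine-tune vs zero-shot: {pct_reduction(rmse_zero_shot, rmse_second):.2f}%")
```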
### Step 6: List fine-tuned models
We can list our fine-tuned models with the `NixtlaClient.finetuned_models` method.
```python theme={null}
finetuned_models = nixtla_client.finetuned_models()
finetuned_models
```
```bash theme={null}
[FinetunedModel(id='468b13fb-4b26-447a-bd87-87a64b50d913', created_at=datetime.datetime(2024, 12, 30, 17, 57, 31, 241455, tzinfo=TzInfo(UTC)), created_by='user', base_model_id='my-first-finetuned-model', steps=10, depth=3, loss='default', model='timegpt-1-long-horizon', freq='MS'),
FinetunedModel(id='my-first-finetuned-model', created_at=datetime.datetime(2024, 12, 30, 17, 57, 16, 978907, tzinfo=TzInfo(UTC)), created_by='user', base_model_id='None', steps=10, depth=1, loss='default', model='timegpt-1-long-horizon', freq='MS')]
```
While that representation may be useful for programmatic use, in this exploratory
setting it's nicer to see them as a dataframe, which we can get by providing `as_df=True`.
```python theme={null}
nixtla_client.finetuned_models(as_df=True)
```
| id | created\_at | created\_by | base\_model\_id | steps | depth | loss | model | freq |
| ------------------------------------ | -------------------------------- | ----------- | ------------------------ | ----- | ----- | ------- | ---------------------- | ---- |
| 468b13fb-4b26-447a-bd87-87a64b50d913 | 2024-12-30 17:57:31.241455+00:00 | user | my-first-finetuned-model | 10 | 3 | default | timegpt-1-long-horizon | MS |
| my-first-finetuned-model | 2024-12-30 17:57:16.978907+00:00 | user | None | 10 | 1 | default | timegpt-1-long-horizon | MS |
We can see that the `base_model_id` of our second model is our first model,
along with other metadata.
### Step 7: Delete fine-tuned models
To keep things organized, and because there is a limit of 50 fine-tuned
models, you can delete models that weren't promising to make room for more
experiments. For example, we can delete our first fine-tuned model. Even though
it was used as the base for our second model, the two are saved independently,
so removing it won't affect the second model, apart from leaving its
`base_model_id` pointing to a model that no longer exists.
```python theme={null}
nixtla_client.delete_finetuned_model(first_model_id)
nixtla_client.finetuned_models(as_df=True)
```
| id | created\_at | created\_by | base\_model\_id | steps | depth | loss | model | freq |
| ------------------------------------ | -------------------------------- | ----------- | ------------------------ | ----- | ----- | ------- | ---------------------- | ---- |
| 468b13fb-4b26-447a-bd87-87a64b50d913 | 2024-12-30 17:57:31.241455+00:00 | user | my-first-finetuned-model | 10 | 3 | default | timegpt-1-long-horizon | MS |
> WARNING: Deleting a fine-tuned model is irreversible. Make sure to back up any
> necessary information before removal.
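With a 50-model limit, it can help to decide systematically which models to prune. The following is a minimal sketch, assuming you have already computed a validation score (here RMSE, lower is better) for each fine-tuned model ID; the helper `models_to_delete` is hypothetical and not part of the `nixtla` API. The scores are the RMSE values from the comparison earlier on this page.
```python
def models_to_delete(scores: dict[str, float], keep: int) -> list[str]:
    """Return the IDs of all but the `keep` best-scoring models (lower score is better)."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ranked = sorted(scores, key=scores.get)  # best (lowest error) first
    return ranked[keep:]


# Validation RMSE per fine-tuned model ID
scores = {
    "my-first-finetuned-model": 1472.024619,
    "468b13fb-4b26-447a-bd87-87a64b50d913": 1435.365211,
}
to_delete = models_to_delete(scores, keep=1)
print(to_delete)  # ['my-first-finetuned-model']
```
Each returned ID could then be passed to `nixtla_client.delete_finetuned_model(model_id)`. Since deletion is irreversible, review the list before running such a loop.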
## Conclusion
Congratulations! You have learned how to save, refine, and manage your fine-tuned TimeGPT models.
Using a saved fine-tuned model as the starting point for further fine-tuning lets you iterate on your forecasting pipeline without starting from scratch.
# Fine-tuning Tutorial TimeGPT
Source: https://nixtla.io/docs/forecasting/fine-tuning/steps
Adapt TimeGPT to your specific datasets for more accurate forecasts
Fine-tuning lets you use TimeGPT more effectively. Foundation models such as
TimeGPT are pre-trained on vast amounts of data, capturing wide-ranging
features and patterns, and can then be specialized for specific contexts or
domains. Fine-tuning refines the model's parameters for a new task, tailoring
its pre-existing knowledge to the requirements of your data. It thus bridges
TimeGPT's broad capabilities and the specifics of your task.
Concretely, fine-tuning runs a number of training iterations on your input data
to minimize the forecasting error. The forecasts are then produced with the
updated model. To control the number of iterations, use the `finetune_steps`
argument of the `forecast` method.
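To build intuition for what `finetune_steps` controls, here is a toy analogy in plain Python. It is not how TimeGPT is trained internally: a one-parameter "pretrained" model `y_hat = w * x` is adapted to new data by running a given number of gradient-descent iterations on the mean squared error. The names `finetune` and `w_pretrained` and the learning rate are illustrative assumptions.
```python
def mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Mean squared error of the toy model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def finetune(w: float, xs: list[float], ys: list[float],
             finetune_steps: int, lr: float = 0.01) -> float:
    """Run `finetune_steps` gradient-descent updates on the MSE, starting from w."""
    n = len(xs)
    for _ in range(finetune_steps):
        # Gradient of mean((w*x - y)^2) with respect to w is mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # the "new task": y = 2x
w_pretrained = 1.0         # the "pretrained" starting point

for steps in (0, 10, 50):
    w = finetune(w_pretrained, xs, ys, finetune_steps=steps)
    print(f"finetune_steps={steps:>2}  w={w:.3f}  mse={mse(w, xs, ys):.4f}")
```
More steps move the parameter closer to the new data. With a real model and noisy data, too many steps can also start fitting noise, which is why the number of steps is worth tuning.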
## Tutorial
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/06_finetuning.ipynb)
### Step 1: Import Packages and Initialize Client
First, we import the required packages and initialize the Nixtla client.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
from utilsforecast.losses import mae, mse
from utilsforecast.evaluation import evaluate
```
Next, initialize the `NixtlaClient` instance with your API key (or omit it and
rely on the `NIXTLA_API_KEY` environment variable):
```python initialize-client theme={null}
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla' # Defaults to os.environ.get("NIXTLA_API_KEY")
)
```
### Step 2: Load Data
Load the dataset from the provided CSV URL:
```python load-data theme={null}
df = pd.read_csv(
"https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv"
)
df.head()
```
| | timestamp | value |
| - | ---------- | ----- |
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |
### Step 3: Fine-tune the Model
Set the number of fine-tuning iterations with the `finetune_steps` parameter.
Here, `finetune_steps=10` means the model goes through 10 training iterations
on your time series data.
```python theme={null}
timegpt_fcst_finetune_df = nixtla_client.forecast(
df=df,
h=12,
finetune_steps=10,
time_col='timestamp',
target_col='value',
)
```
Plot the forecasts alongside the historical data to inspect them visually:
```python theme={null}
nixtla_client.plot(
df,
timegpt_fcst_finetune_df,
time_col='timestamp',
target_col='value',
)
```

## Conclusion
Keep in mind that fine-tuning can involve some trial and error. You might need to
adjust `finetune_steps` based on your specific needs and the complexity of your
data; larger values usually work better for large datasets.
Monitor the model's performance and adjust as needed. More `finetune_steps`
mean longer training times and can lead to overfitting if not managed properly.
Fine-tuning is a powerful feature, but it should be used thoughtfully.
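One way to make this trial and error systematic is to hold out the last `h` observations and compare a few `finetune_steps` values on them. The sketch below assumes a user-supplied `forecast_fn(train, h, finetune_steps)` that returns `h` predictions; in practice it might wrap `nixtla_client.forecast(..., finetune_steps=...)`. Both `pick_finetune_steps` and the dummy forecaster are hypothetical and exist only for illustration.
```python
from typing import Callable, Sequence

ForecastFn = Callable[[Sequence[float], int, int], Sequence[float]]


def pick_finetune_steps(series: Sequence[float], h: int,
                        candidates: Sequence[int],
                        forecast_fn: ForecastFn) -> tuple[int, dict[int, float]]:
    """Score each candidate on a holdout of the last h points; return the best by MAE."""
    if not 0 < h < len(series):
        raise ValueError("h must be between 1 and len(series) - 1")
    train, valid = series[:-h], series[-h:]
    scores: dict[int, float] = {}
    for steps in candidates:
        preds = forecast_fn(train, h, steps)
        scores[steps] = sum(abs(p - y) for p, y in zip(preds, valid)) / h
    best = min(scores, key=scores.get)
    return best, scores


# Dummy forecaster with a U-shaped error in `finetune_steps` (too few steps
# underfit, too many overfit); it only exists to make the sketch runnable.
def dummy_forecast(train: Sequence[float], h: int, finetune_steps: int) -> list[float]:
    return [train[-1] + 0.1 * abs(finetune_steps - 20)] * h


series = [10.0] * 30
best, scores = pick_finetune_steps(series, h=6, candidates=[0, 10, 20, 50],
                                   forecast_fn=dummy_forecast)
print(best, scores)
```
Choosing the value on a validation holdout, rather than on the final test set, keeps the final evaluation unbiased.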
## Additional Resources
* For a detailed guide on using a specific loss function for fine-tuning, check out the
[Fine-tuning with a specific loss function](/docs/forecasting/fine-tuning/custom_loss)
tutorial.
* Also, read our detailed tutorial on [controlling the level of fine-tuning](/docs/forecasting/fine-tuning/depth)
using `finetune_depth`.
# Distributed Forecasting with Spark, Dask & Ray
Source: https://nixtla.io/docs/forecasting/forecasting-at-scale/computing_at_scale
Scale your time series forecasting with TimeGPT using Spark, Dask, or Ray. Learn distributed computing for millions of time series with Python code examples and best practices.
## Distributed Computing for Large-Scale Forecasting
Handling large datasets is a common challenge in time series forecasting. For example, when working with retail data, you may need to forecast sales for 100,000+ products across hundreds of stores—generating millions of forecasts daily. Similarly, when dealing with electricity consumption data, you may need to predict consumption for millions of smart meters across multiple regions in real-time.
### Why Distributed Computing for Forecasting?
Distributed computing offers significant advantages for time series forecasting:
* **Speed**: Reduce computation time by 10-100x compared to single-machine processing
* **Scalability**: Handle datasets that don't fit in memory on a single machine
* **Cost-efficiency**: Process more forecasts in less time, optimizing resource utilization
* **Reliability**: Fault-tolerant processing ensures forecasts complete even if individual nodes fail
Nixtla's **TimeGPT** enables you to efficiently handle expansive datasets by integrating distributed computing frameworks (**[Spark](https://spark.apache.org/)**, **[Dask](https://www.dask.org/)**, and **[Ray](https://www.ray.io/)** through **Fugue**) that parallelize forecasts across multiple time series and drastically reduce computation times.
## Getting Started
Before getting started, ensure you have your TimeGPT API key. Upon [registration](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/forecasting/forecasting-at-scale/computing_at_scale), you'll receive an email prompting you to confirm your signup. Once confirmed, access your dashboard and navigate to the **API Keys** section to retrieve your key. For detailed setup instructions, see the [Setting Up Your Authentication Key tutorial](/docs/setup/setting_up_your_api_key).
## How to Use TimeGPT with Distributed Computing Frameworks
Using TimeGPT with distributed computing frameworks is straightforward: the process differs only slightly from non-distributed usage.
### Step 1: Instantiate a NixtlaClient class
```python theme={null}
from nixtla import NixtlaClient
# Replace 'YOUR_API_KEY' with the key obtained from your Nixtla dashboard
client = NixtlaClient(api_key="YOUR_API_KEY")
```
### Step 2: Load your data into a pandas DataFrame
Make sure your data is properly formatted, with each time series uniquely identified (e.g., by store or product).
```python theme={null}
import pandas as pd
data = pd.read_csv("your_time_series_data.csv")
```
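Before distributing the work, it can be worth checking that the data is in the long format TimeGPT expects: one row per series and timestamp, with an ID column identifying each series. This minimal sketch assumes the default column names `unique_id`, `ds`, and `y` (configurable through the `id_col`, `time_col`, and `target_col` arguments of the client methods); the helper `check_long_format` is hypothetical.
```python
import pandas as pd


def check_long_format(df: pd.DataFrame, id_col: str = "unique_id",
                      time_col: str = "ds", target_col: str = "y") -> None:
    """Raise ValueError if required columns are missing or (id, time) pairs repeat."""
    missing = {id_col, time_col, target_col} - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    dupes = df.duplicated(subset=[id_col, time_col])
    if dupes.any():
        raise ValueError(f"{int(dupes.sum())} duplicated ({id_col}, {time_col}) rows")


example = pd.DataFrame({
    "unique_id": ["store_1", "store_1", "store_2", "store_2"],
    "ds": pd.to_datetime(["2024-01-01", "2024-01-02"] * 2),
    "y": [10.0, 12.0, 7.0, 8.0],
})
check_long_format(example)  # passes silently
print(example["unique_id"].nunique(), "series")
```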
### Step 3: Initialize a distributed computing framework
Currently, TimeGPT supports:
* [Spark](/docs/forecasting/forecasting-at-scale/spark)
* [Dask](/docs/forecasting/forecasting-at-scale/dask)
* [Ray](/docs/forecasting/forecasting-at-scale/ray)
Follow the links above for examples on setting up each framework.
### Step 4: Use NixtlaClient methods to forecast at scale
Once your framework is initialized and your data is loaded, you can apply the forecasting methods:
```python theme={null}
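# Example function call within the distributed environment.
# `data` should first be converted to a Spark, Dask, or Ray DataFrame (see Step 3);
# a pandas DataFrame is forecast locally instead.
forecast_results = client.forecast(
    df=data,
    h=14,  # forecast horizon, e.g. 14 days for daily data
)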
```
### Step 5: Stop the distributed computing framework
When you're finished, you may need to terminate your Spark, Dask, or Ray session. This depends on your environment and setup.
Parallelization in these frameworks operates across multiple time series within your dataset. Ensure each series is uniquely identified so the parallelization can be fully leveraged.
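The frameworks handle partitioning for you, but the idea behind parallelizing across series can be sketched in plain Python: every row of a series is routed to the same partition, so each worker sees complete series. This is a conceptual illustration under that assumption, not Fugue's or Nixtla's actual partitioning code; `assign_partitions` and the series IDs are hypothetical.
```python
import zlib
from collections import defaultdict


def assign_partitions(rows, n_partitions: int) -> dict[int, list]:
    """Group (unique_id, ds, y) rows by a stable hash of unique_id.

    zlib.crc32 is deterministic across processes, unlike Python's built-in
    hash() for strings, so the same series always maps to the same partition.
    """
    parts: dict[int, list] = defaultdict(list)
    for uid, ds, y in rows:
        parts[zlib.crc32(uid.encode("utf-8")) % n_partitions].append((uid, ds, y))
    return dict(parts)


rows = [(uid, t, float(t)) for uid in ("BE", "DE", "FR", "NP", "PJM") for t in range(24)]
parts = assign_partitions(rows, n_partitions=2)
for pid, part in sorted(parts.items()):
    print(pid, sorted({uid for uid, _, _ in part}))
```
With fewer distinct series than workers, some workers sit idle, which is why datasets with many independent series benefit most from distribution.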
## Real-World Use Cases
Distributed forecasting with TimeGPT is essential for:
* **Retail & E-commerce**: Forecast demand for 100,000+ SKUs across multiple locations simultaneously
* **Energy & Utilities**: Predict consumption patterns for millions of smart meters in real-time
* **Finance**: Generate forecasts for thousands of stocks, currencies, or commodities
* **IoT & Manufacturing**: Process sensor data from thousands of devices for predictive maintenance
* **Telecommunications**: Forecast network traffic across thousands of cell towers
The distributed approach reduces forecast generation time from hours to minutes, enabling faster decision-making at scale.
## Important Considerations
### When to Use a Distributed Computing Framework
Consider a distributed framework if your dataset:
* Contains millions of observations across multiple time series
* Cannot fit into memory on a single machine
* Requires extensive processing time that is impractical on a single machine
### Choosing the Right Framework
When selecting Spark, Dask, or Ray, weigh your existing infrastructure and your team's expertise. TimeGPT works with each of these frameworks with minimal code changes, so pick the one that aligns with your organization's tools and resources.
### Framework Comparison
| Framework | Best For | Ideal Dataset Size | Learning Curve |
| --------- | ----------------------------------------------------------- | --------------------- | -------------- |
| **Spark** | Enterprise environments with existing Hadoop infrastructure | 100M+ observations | Medium |
| **Dask** | Python-native workflows, easy scaling from pandas | 10M-100M observations | Low |
| **Ray** | Machine learning pipelines, complex task dependencies | 10M+ observations | Medium |
Each framework integrates seamlessly with TimeGPT through Fugue, requiring minimal code changes to scale from single-machine to distributed forecasting.
### Best Practices
To maximize the benefits of distributed forecasting:
* **Distribute workloads efficiently**: Spread your forecasts across multiple compute nodes to handle huge datasets without exhausting memory or overwhelming single-machine resources.
* **Use proper identifiers**: Ensure your data has distinct identifiers for each series. Correct labeling is crucial for successful multi-series parallel forecasts.
## Frequently Asked Questions
**Q: Which distributed framework should I choose for TimeGPT?**
Choose **Spark** if you have existing Hadoop infrastructure, **Dask** if you're already using Python/pandas and want the easiest transition, or **Ray** if you're building complex ML pipelines.
**Q: How much faster is distributed forecasting compared to single-machine?**
Speed improvements typically range from 10-100x depending on your dataset size, number of time series, and cluster configuration. Datasets with more independent time series see greater parallelization benefits.
**Q: Do I need to change my TimeGPT code to use distributed computing?**
Minimal changes are required. After initializing your chosen framework (Spark/Dask/Ray), pass a Spark, Dask, or Ray DataFrame to the same `NixtlaClient` methods instead of a pandas DataFrame, and TimeGPT distributes the work accordingly. The API calls remain the same.
**Q: Can I use distributed computing with fine-tuning and cross-validation?**
Yes, TimeGPT supports distributed [fine-tuning](/docs/forecasting/fine-tuning/steps) and [cross-validation](/docs/forecasting/evaluation/cross_validation) across all supported frameworks.
## Related Resources
Explore more TimeGPT capabilities:
* [Spark Integration Guide](/docs/forecasting/forecasting-at-scale/spark) - Detailed Spark setup and examples
* [Dask Integration Guide](/docs/forecasting/forecasting-at-scale/dask) - Dask configuration for TimeGPT
* [Ray Integration Guide](/docs/forecasting/forecasting-at-scale/ray) - Ray distributed forecasting tutorial
* [Fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) - Improve accuracy at scale
* [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Validate distributed forecasts
# Time Series Forecasting with Dask
Source: https://nixtla.io/docs/forecasting/forecasting-at-scale/dask
Scale pandas workflows with Dask and TimeGPT for distributed time series forecasting. Learn to process 10M+ observations in Python with minimal code changes.
## Overview
[Dask](https://www.dask.org/) is an open-source parallel computing library for Python that scales pandas workflows seamlessly. This guide explains how to use TimeGPT from Nixtla with Dask for distributed forecasting tasks.
Dask is ideal when you're already using pandas and need to scale beyond single-machine memory limits—typically for datasets with 10-100 million observations across multiple time series. Unlike Spark, Dask requires minimal code changes from your existing pandas workflow.
## Why Use Dask for Time Series Forecasting?
Dask offers unique advantages for scaling time series forecasting:
* **Pandas-like API**: Minimal code changes from your existing pandas workflows
* **Easy scaling**: Convert pandas DataFrames to Dask with a single line of code
* **Python-native**: Pure Python implementation, no JVM required (unlike Spark)
* **Flexible deployment**: Run on your laptop or scale to a cluster
* **Memory efficiency**: Process datasets larger than RAM through intelligent chunking
Choose Dask when you need to scale from 10 million to 100 million observations and want the smoothest transition from pandas.
**What you'll learn:**
* Simplify distributed computing with Fugue
* Run TimeGPT at scale on a Dask cluster
* Seamlessly convert pandas DataFrames to Dask
## Prerequisites
Before proceeding, make sure you have an [API key from Nixtla](/docs/setup/setting_up_your_api_key).
## How to Use TimeGPT with Dask
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/17_computing_at_scale_dask_distributed.ipynb)
### Step 1: Install Fugue and Dask
Fugue provides an easy-to-use interface for distributed computing over frameworks like Dask.
You can install Fugue with:
```bash theme={null}
pip install "fugue[dask]"  # quotes keep shells such as zsh from expanding the brackets
```
If running on a distributed Dask cluster, ensure the `nixtla` library is installed on all worker nodes.
### Step 2: Load Your Data
You can start by loading data into a pandas DataFrame. In this example, we use hourly electricity prices from multiple markets:
```python theme={null}
import pandas as pd
df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
parse_dates=['ds'],
)
df.head()
```
Example pandas DataFrame:
| | unique\_id | ds | y |
| - | ---------- | ------------------- | ----- |
| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE | 2016-10-22 04:00:00 | 37.10 |
### Step 3: Import Dask
Convert the pandas DataFrame into a Dask DataFrame for parallel processing.
```python theme={null}
import dask.dataframe as dd
dask_df = dd.from_pandas(df, npartitions=2)
dask_df
```
When converting to a Dask DataFrame, you can specify the number of partitions based on your data size or system resources.
### Step 4: Use TimeGPT on Dask
To use TimeGPT with Dask, provide a Dask DataFrame to Nixtla's client methods instead of a pandas DataFrame.
Instantiate the `NixtlaClient` class to interact with Nixtla's API:
```python theme={null}
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
You can use any method from the `NixtlaClient`, such as `forecast` or `cross_validation`.
```python Forecast with TimeGPT and Dask theme={null}
fcst_df = nixtla_client.forecast(dask_df, h=12)
fcst_df.compute().head()
```
| | unique\_id | ds | TimeGPT |
| - | ---------- | ------------------- | --------- |
| 0 | BE | 2016-12-31 00:00:00 | 45.190453 |
| 1 | BE | 2016-12-31 01:00:00 | 43.244446 |
| 2 | BE | 2016-12-31 02:00:00 | 41.958389 |
| 3 | BE | 2016-12-31 03:00:00 | 39.796486 |
| 4 | BE | 2016-12-31 04:00:00 | 39.204533 |
```python Cross-validation with TimeGPT and Dask theme={null}
cv_df = nixtla_client.cross_validation(
dask_df,
h=12,
n_windows=5,
step_size=2
)
cv_df.compute().head()
```
| | unique\_id | ds | cutoff | TimeGPT |
| - | ---------- | ------------------- | ------------------- | --------- |
| 0 | BE | 2016-12-30 04:00:00 | 2016-12-30 03:00:00 | 39.375439 |
| 1 | BE | 2016-12-30 05:00:00 | 2016-12-30 03:00:00 | 40.039215 |
| 2 | BE | 2016-12-30 06:00:00 | 2016-12-30 03:00:00 | 43.455849 |
| 3 | BE | 2016-12-30 07:00:00 | 2016-12-30 03:00:00 | 47.716408 |
| 4 | BE | 2016-12-30 08:00:00 | 2016-12-30 03:00:00 | 50.316650 |
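To see how `h`, `n_windows`, and `step_size` relate to the `cutoff` column, the sketch below computes rolling-window cutoffs under the usual convention: the last window ends at the final observation, and each earlier window starts `step_size` periods before the next. Assuming the last observation for `BE` is 2016-12-30 23:00 (the hour before the first forecast timestamp shown in the forecast output above), it yields 2016-12-30 03:00 as the earliest cutoff, matching the first row of the table. `cv_cutoffs` is a hypothetical helper, not part of `nixtla`.
```python
from datetime import datetime, timedelta


def cv_cutoffs(last_ds: datetime, h: int, n_windows: int,
               step_size: int, freq: timedelta) -> list[datetime]:
    """Cutoffs (last training timestamp of each window), oldest first."""
    last_cutoff = last_ds - h * freq  # the final window ends at last_ds
    return [last_cutoff - (n_windows - 1 - i) * step_size * freq
            for i in range(n_windows)]


cutoffs = cv_cutoffs(datetime(2016, 12, 30, 23), h=12, n_windows=5,
                     step_size=2, freq=timedelta(hours=1))
for c in cutoffs:
    print(c)  # 03:00, 05:00, 07:00, 09:00 and 11:00 on 2016-12-30
```
Each window then forecasts the `h` timestamps after its cutoff, which is why the first row has `ds` one hour after its cutoff. Because `step_size` (2) is smaller than `h` (12), consecutive windows overlap.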
## Working with Exogenous Variables
TimeGPT with Dask also supports exogenous variables. Refer to the [Exogenous Variables Tutorial](/docs/forecasting/exogenous-variables/numeric_features) for details. Simply substitute pandas DataFrames with Dask DataFrames—the API remains identical.
## Related Resources
Explore more distributed forecasting options:
* [Distributed Computing Overview](/docs/forecasting/forecasting-at-scale/computing_at_scale) - Compare Spark, Dask, and Ray
* [Spark Integration](/docs/forecasting/forecasting-at-scale/spark) - For datasets with 100M+ observations
* [Ray Integration](/docs/forecasting/forecasting-at-scale/ray) - For ML pipeline integration
* [Fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) - Improve accuracy at scale
* [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Validate distributed forecasts
# Time Series Forecasting with Ray
Source: https://nixtla.io/docs/forecasting/forecasting-at-scale/ray
Scale machine learning pipelines with Ray and TimeGPT for distributed time series forecasting. Learn to integrate TimeGPT with Ray for complex ML workflows in Python.
## Overview
[Ray](https://www.ray.io/) is an open-source unified compute framework that helps scale Python workloads for distributed computing. This guide demonstrates how to distribute TimeGPT forecasting jobs on top of Ray.
Ray is well suited to machine learning pipelines with complex task dependencies and to datasets with 10+ million observations. Its unified framework excels at orchestrating distributed ML workflows, which makes it a good fit for integrating TimeGPT into broader AI applications.
## Why Use Ray for Time Series Forecasting?
Ray offers unique advantages for ML-focused time series forecasting:
* **ML pipeline integration**: Seamlessly integrate TimeGPT into complex ML workflows with Ray Tune and Ray Serve
* **Task parallelism**: Handle complex task dependencies beyond data parallelism
* **Python-native**: Pure Python with minimal boilerplate code
* **Flexible architecture**: Scale from laptop to cluster with the same code
* **Actor model**: Stateful computations for advanced forecasting scenarios
Choose Ray when you're building ML pipelines, need complex task orchestration, or want to integrate TimeGPT with other ML frameworks like PyTorch or TensorFlow.
**What you'll learn:**
* Install Fugue with Ray support for distributed computing
* Initialize Ray clusters for distributed forecasting
* Run TimeGPT forecasting and cross-validation on Ray
## Prerequisites
Before proceeding, make sure you have an [API key from Nixtla](/docs/setup/setting_up_your_api_key).
When executing on a distributed Ray cluster, ensure the `nixtla` library is installed on all workers.
## How to Use TimeGPT with Ray
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/19_computing_at_scale_ray_distributed.ipynb)
### Step 1: Install Fugue and Ray
Fugue provides an easy-to-use interface for distributed computation across frameworks like Ray.
Install Fugue with Ray support:
```bash theme={null}
pip install "fugue[ray]"  # quotes keep shells such as zsh from expanding the brackets
```
### Step 2: Load Your Data
Load your dataset into a pandas DataFrame. This tutorial uses hourly electricity prices from various markets:
```python theme={null}
import pandas as pd
df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
parse_dates=['ds'],
)
df.head()
```
Example pandas DataFrame:
| | unique\_id | ds | y |
| - | ---------- | ------------------- | ----- |
| 0 | BE | 2016-10-22 00:00:00 | 70.00 |
| 1 | BE | 2016-10-22 01:00:00 | 37.10 |
| 2 | BE | 2016-10-22 02:00:00 | 37.10 |
| 3 | BE | 2016-10-22 03:00:00 | 44.75 |
| 4 | BE | 2016-10-22 04:00:00 | 37.10 |
### Step 3: Initialize Ray
Create a Ray cluster locally by initializing a head node. You can scale this to multiple machines in a real cluster environment.
```python theme={null}
import ray
from ray.cluster_utils import Cluster
ray_cluster = Cluster(
initialize_head=True,
head_node_args={"num_cpus": 2}
)
ray.init(address=ray_cluster.address, ignore_reinit_error=True)
# Convert your DataFrame to Ray format:
ray_df = ray.data.from_pandas(df)
ray_df
```
### Step 4: Use TimeGPT on Ray
To use TimeGPT with Ray, provide a Ray Dataset to Nixtla's client methods instead of a pandas DataFrame. The API remains the same as local usage.
Instantiate the `NixtlaClient` class to interact with Nixtla's API:
```python theme={null}
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
You can use any method from the `NixtlaClient`, such as `forecast` or `cross_validation`.
```python theme={null}
fcst_df = nixtla_client.forecast(ray_df, h=12)
fcst_df.to_pandas().tail()
```
Public API models supported include `timegpt-1` (default) and `timegpt-1-long-horizon`. For long horizon forecasting, see the [long-horizon model tutorial](/docs/forecasting/model-version/longhorizon_model).
```python theme={null}
cv_df = nixtla_client.cross_validation(
ray_df,
h=12,
freq='H',
n_windows=5,
step_size=2
)
cv_df.to_pandas().tail()
```
### Step 5: Shutdown Ray
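Always shut down Ray after you finish your tasks to free up resources. If you started a local `Cluster` as in Step 3, shut it down as well:
```python theme={null}
ray.shutdown()
ray_cluster.shutdown()  # remove the local head node created in Step 3
```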
## Working with Exogenous Variables
TimeGPT with Ray also supports exogenous variables. Refer to the [Exogenous Variables Tutorial](/docs/forecasting/exogenous-variables/numeric_features) for details. Simply substitute pandas DataFrames with Ray Datasets—the API remains identical.
## Related Resources
Explore more distributed forecasting options:
* [Distributed Computing Overview](/docs/forecasting/forecasting-at-scale/computing_at_scale) - Compare Spark, Dask, and Ray
* [Spark Integration](/docs/forecasting/forecasting-at-scale/spark) - For datasets with 100M+ observations
* [Dask Integration](/docs/forecasting/forecasting-at-scale/dask) - For datasets with 10M-100M observations
* [Fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) - Improve accuracy at scale
* [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Validate distributed forecasts
# Time Series Forecasting with Spark
Source: https://nixtla.io/docs/forecasting/forecasting-at-scale/spark
Scale enterprise time series forecasting with Spark and TimeGPT. Learn to process 100M+ observations across distributed clusters with Python and PySpark.
## Overview
[Spark](https://spark.apache.org/) is an open-source distributed compute framework designed for large-scale data processing. This guide demonstrates how to use TimeGPT with Spark to perform forecasting and cross-validation across distributed clusters.
Spark is ideal for enterprise environments with existing Hadoop infrastructure and datasets exceeding 100 million observations. Its robust distributed architecture handles massive-scale time series forecasting with fault tolerance and high performance.
## Why Use Spark for Time Series Forecasting?
Spark offers unique advantages for enterprise-scale time series forecasting:
* **Enterprise-grade scalability**: Handle datasets with 100M+ observations across distributed clusters
* **Hadoop integration**: Seamlessly integrate with existing HDFS and Hadoop ecosystems
* **Fault tolerance**: Automatic recovery from node failures ensures reliable computation
* **Mature ecosystem**: Leverage Spark's rich ecosystem of tools and libraries
* **Multi-language support**: Work with Python (PySpark), Scala, or Java
Choose Spark when you have enterprise infrastructure, datasets exceeding 100 million observations, or need robust fault tolerance for mission-critical forecasting.
**What you'll learn:**
* Install Fugue with Spark support for distributed computing
* Convert pandas DataFrames to Spark DataFrames
* Run TimeGPT forecasting and cross-validation on Spark clusters
## Prerequisites
Before proceeding, make sure you have an [API key from Nixtla](/docs/setup/setting_up_your_api_key).
If executing on a distributed Spark cluster, ensure the `nixtla` library is installed on all worker nodes for consistent execution.
## How to Use TimeGPT with Spark
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/16_computing_at_scale_spark_distributed.ipynb)
### Step 1: Install Fugue and Spark
Fugue provides a convenient interface to distribute Python code across frameworks like Spark.
Install Fugue with Spark support:
```bash theme={null}
pip install "fugue[spark]"  # quotes keep shells such as zsh from expanding the brackets
```
To work with TimeGPT, make sure you have the `nixtla` library installed as well.
### Step 2: Load Your Data
Load the dataset into a pandas DataFrame. In this example, we use hourly electricity price data from different markets:
```python theme={null}
import pandas as pd
df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short.csv',
parse_dates=['ds'],
)
df.head()
```
### Step 3: Initialize Spark
Create a Spark session and convert your pandas DataFrame to a Spark DataFrame:
```python theme={null}
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
spark_df = spark.createDataFrame(df)
spark_df.show(5)
```
### Step 4: Use TimeGPT on Spark
To use TimeGPT with Spark, provide a Spark DataFrame to Nixtla's client methods instead of a pandas DataFrame; otherwise, usage is the same as locally.
Instantiate the `NixtlaClient` class to interact with Nixtla's API:
```python theme={null}
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
You can use any method from the `NixtlaClient`, such as `forecast` or `cross_validation`.
```python theme={null}
fcst_df = nixtla_client.forecast(spark_df, h=12)
fcst_df.show(5)
```
When using Azure AI endpoints, specify `model="azureai"`:
```python theme={null}
nixtla_client.forecast(
spark_df,
h=12,
model="azureai"
)
```
The public API supports two models: `timegpt-1` (default) and `timegpt-1-long-horizon`. For long horizon forecasting, see the [long-horizon model tutorial](/docs/forecasting/model-version/longhorizon_model).
```python theme={null}
cv_df = nixtla_client.cross_validation(
spark_df,
h=12,
n_windows=5,
step_size=2
)
cv_df.show(5)
```
### Step 5: Stop Spark
After completing your tasks, stop the Spark session to free resources:
```python theme={null}
spark.stop()
```
## Working with Exogenous Variables
TimeGPT with Spark also supports exogenous variables. Refer to the [Exogenous Variables Tutorial](/docs/forecasting/exogenous-variables/numeric_features) for details. Simply substitute pandas DataFrames with Spark DataFrames—the API remains identical.
## Related Resources
Explore more distributed forecasting options:
* [Distributed Computing Overview](/docs/forecasting/forecasting-at-scale/computing_at_scale) - Compare Spark, Dask, and Ray
* [Dask Integration](/docs/forecasting/forecasting-at-scale/dask) - For datasets with 10M-100M observations
* [Ray Integration](/docs/forecasting/forecasting-at-scale/ray) - For ML pipeline integration
* [Fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) - Improve accuracy at scale
* [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Validate distributed forecasts
# Improve Forecast Accuracy with TimeGPT
Source: https://nixtla.io/docs/forecasting/improve_accuracy
Advanced techniques to enhance TimeGPT forecast accuracy for energy and electricity.
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/22_how_to_improve_forecast_accuracy.ipynb)
This guide demonstrates how to improve forecast accuracy using TimeGPT. We use hourly electricity price data from Germany as an illustrative example. Before you begin, make sure you have initialized the `NixtlaClient` object with your API key.
## Forecasting Results Overview
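Below is a summary of our experiments and the corresponding accuracy improvements. We progressively refine forecasts by adding fine-tuning steps, adjusting loss functions, increasing the number of fine-tuned parameters, incorporating exogenous variables, and switching to a long-horizon model. Improvement percentages are measured against the zero-shot baseline (step 0). Not every change helps on this dataset: step 3 has a slightly higher RMSE than step 2, and step 5 has higher errors than step 4.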
| Steps | Description | MAE | MAE Improvement (%) | RMSE | RMSE Improvement (%) |
| ----- | ---------------------------- | ---- | ------------------- | ---- | -------------------- |
| 0 | Zero-Shot TimeGPT | 18.5 | N/A | 20.0 | N/A |
| 1 | Add Fine-Tuning Steps | 11.5 | 38% | 12.6 | 37% |
| 2 | Adjust Fine-Tuning Loss | 9.6 | 48% | 11.0 | 45% |
| 3 | Fine-tune More Parameters | 9.0 | 51% | 11.3 | 44% |
| 4 | Add Exogenous Variables | 4.6 | 75% | 6.4 | 68% |
| 5 | Switch to Long-Horizon Model | 6.4 | 65% | 7.7 | 62% |
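The improvement columns can be reproduced as `(1 - metric / baseline) * 100`, with step 0 as the baseline. A quick check of a few rows:
```python
def improvement_pct(baseline: float, value: float) -> float:
    """Relative error reduction versus the baseline, in percent."""
    return (1 - value / baseline) * 100


# Zero-shot baselines from step 0 of the table
MAE_0, RMSE_0 = 18.5, 20.0

print(round(improvement_pct(MAE_0, 11.5)))  # step 1 MAE  -> 38
print(round(improvement_pct(MAE_0, 4.6)))   # step 4 MAE  -> 75
print(round(improvement_pct(RMSE_0, 6.4)))  # step 4 RMSE -> 68
```
Because the table's metric values are themselves rounded, recomputing other rows this way can differ from the table by about one percentage point.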
***
## Step-by-Step Guide
### Step 1: Install and Import Packages
Make sure all necessary libraries are installed and imported. Then set up the Nixtla client (replace with your actual API key).
```python theme={null}
import numpy as np
import pandas as pd
from utilsforecast.evaluation import evaluate
from utilsforecast.plotting import plot_series
from utilsforecast.losses import mae, rmse
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
# api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load the Dataset
We use hourly electricity price data from Germany (`unique_id == "DE"`). The final two days (`48` data points) form the test set.
```python theme={null}
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv')
df['ds'] = pd.to_datetime(df['ds'])
df_sub = df.query('unique_id == "DE"')
df_train = df_sub.query('ds < "2017-12-29"')
df_test = df_sub.query('ds >= "2017-12-29"')
df_train.shape, df_test.shape
```
```bash Dataset Shape Output theme={null}
((1632, 12), (48, 12))
```
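The split above uses a fixed date. When series end at different times, a date-agnostic alternative is to hold out the last `h` rows of each series. A minimal pandas sketch, assuming one row per timestamp with no gaps; `split_last_h` is a hypothetical helper and the data is synthetic.
```python
import pandas as pd


def split_last_h(df: pd.DataFrame, h: int, id_col: str = "unique_id",
                 time_col: str = "ds") -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (train, test) where test holds the last h rows of every series."""
    df = df.sort_values([id_col, time_col])
    test = df.groupby(id_col).tail(h)
    train = df.drop(index=test.index)
    return train, test


# Two synthetic hourly series of 100 points each
example = pd.DataFrame({
    "unique_id": ["DE"] * 100 + ["FR"] * 100,
    "ds": list(pd.date_range("2017-12-25", periods=100, freq=pd.Timedelta(hours=1))) * 2,
    "y": range(200),
})
train, test = split_last_h(example, h=48)
print(train.shape, test.shape)  # (104, 3) (96, 3)
```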

### Step 3: Benchmark Forecast with TimeGPT
**Info:** We first generate a zero-shot forecast using TimeGPT, which captures overall trends but may struggle with short-term fluctuations.
```python theme={null}
fcst_timegpt = nixtla_client.forecast(
df=df_train[['unique_id', 'ds', 'y']],
h=2*24,
target_col='y',
level=[90, 95]
)
```
#### Evaluation Metrics
| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE | mae | 18.519 |
| DE | rmse | 20.038 |
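The tables in this guide report MAE and RMSE over the 48 test hours. The guide imports `evaluate`, `mae`, and `rmse` from `utilsforecast` but does not show the call, so for reference here is a dependency-free sketch of the two metrics for a single series. The function names `mae_score` and `rmse_score` and the sample numbers are illustrative only.

```python
import math


def mae_score(actual, forecast):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse_score(actual, forecast):
    """Root mean squared error; penalizes large misses more heavily than MAE."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Illustrative values only (not taken from the German price data).
actual = [30.0, 42.0, 55.0, 38.0]
forecast = [28.0, 45.0, 50.0, 38.0]
print(mae_score(actual, forecast))  # 2.5
print(rmse_score(actual, forecast))
```

With DataFrames, calling `evaluate(df, metrics=[mae, rmse], models=['TimeGPT'])` from `utilsforecast.evaluation`, where `df` holds `unique_id`, `ds`, `y`, and the `TimeGPT` forecast column, should produce per-series metrics in the `unique_id | metric | TimeGPT` layout shown above.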

### Step 4: Methods to Enhance Forecasting Accuracy
Use the following strategies to refine and improve your forecast:
#### 4.1 Add Fine-tuning Steps
Adding fine-tuning steps (`finetune_steps`) adjusts the internal weights of the TimeGPT model to your specific data, which typically reduces forecasting errors.
```python theme={null}
fcst_df = nixtla_client.forecast(
    df=df_train[['unique_id', 'ds', 'y']],
    h=24*2,
    finetune_steps=30,
    level=[90, 95],
)
```
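Conceptually, each fine-tuning step is one gradient update of pretrained weights on your series. The toy below is not TimeGPT's procedure: it fine-tunes a single hypothetical "pretrained" weight `w` of the one-parameter model `y ≈ w * x` with plain gradient descent on MSE, to show why more steps usually lower the error on the target data (until it plateaus or starts to overfit).

```python
def finetune(w, xs, ys, steps, lr=0.01):
    """Run `steps` gradient-descent updates of w on the MSE of y ≈ w * x."""
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((w*x - y)^2) with respect to w is mean(2 * (w*x - y) * x).
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


def mse(w, xs, ys):
    """Mean squared error of the one-weight model on (xs, ys)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # roughly y = 2x
w_pretrained = 1.0         # hypothetical starting weight

for steps in (0, 10, 30):
    w = finetune(w_pretrained, xs, ys, steps)
    print(steps, round(w, 3), round(mse(w, xs, ys), 4))
```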

Evaluation result:
| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE | mae | 11.458 |
| DE | rmse | 12.643 |
#### 4.2 Fine-tune Using Different Loss Functions
Changing the loss function used during fine-tuning with `finetune_loss` (e.g., `'mae'`, `'mse'`) can yield better results for specific use cases.
```python theme={null}
fcst_df = nixtla_client.forecast(
    df=df_train[['unique_id', 'ds', 'y']],
    h=24*2,
    finetune_steps=30,
    finetune_loss='mae',
    level=[90, 95],
)
```
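One reason the loss matters: for a constant forecast, MSE is minimized by the mean and MAE by the median, so an MAE objective is less pulled by occasional extreme values such as price spikes. The sketch below illustrates this on made-up numbers; it is an analogy for what `finetune_loss='mae'` optimizes, not a description of TimeGPT internals.

```python
import statistics


def best_constant(values, loss, candidates):
    """Return the candidate constant forecast with the lowest total loss."""
    return min(candidates, key=lambda c: sum(loss(v, c) for v in values))


def abs_loss(y, c):
    return abs(y - c)


def sq_loss(y, c):
    return (y - c) ** 2


prices = [40.0, 42.0, 41.0, 43.0, 39.0, 120.0, 44.0]  # one spike (illustrative)
grid = [x / 2 for x in range(0, 301)]                  # candidates 0.0, 0.5, ..., 150.0

# The squared-loss optimum tracks the mean (pulled up by the spike) ...
print(statistics.mean(prices), best_constant(prices, sq_loss, grid))
# ... while the absolute-loss optimum is the median.
print(statistics.median(prices), best_constant(prices, abs_loss, grid))
```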

Evaluation result:
| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE | mae | 9.641 |
| DE | rmse | 10.956 |
#### 4.3 Adjust Number of Fine-tuned Parameters
The `finetune_depth` parameter controls how many model layers are fine-tuned. It ranges from 1 (fewest parameters) to 5 (most parameters).
```python theme={null}
fcst_df = nixtla_client.forecast(
    df=df_train[['unique_id', 'ds', 'y']],
    h=24*2,
    finetune_steps=30,
    finetune_depth=2,
    finetune_loss='mae',
    level=[90, 95],
)
```

Evaluation result:
| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE | mae | 9.002 |
| DE | rmse | 11.348 |
#### 4.4 Forecast with Exogenous Variables
Incorporate external data (e.g., weather conditions) to boost predictive performance.
```python theme={null}
# Future values of the exogenous variables over the forecast horizon (all columns except the target)
future_ex_vars_df = df_test.drop(columns=['y'])
future_ex_vars_df.head()
# Forecast using historical (df) and future (X_df) exogenous variables
fcst_df = nixtla_client.forecast(
    df=df_train,
    X_df=future_ex_vars_df,
    h=24*2,
    level=[90, 95],
)
```
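`X_df` needs the future values of every exogenous column in `df` (all columns except `y`) for each of the `h` forecast steps of each series. A small pre-flight check can catch mismatches before the API call. This sketch works on plain row dictionaries, such as those returned by `DataFrame.to_dict('records')`; `check_future_exog` and the column names in the example are hypothetical, not part of the `nixtla` package.

```python
from collections import Counter


def check_future_exog(train_columns, future_rows, h, id_col="unique_id", target_col="y"):
    """Raise ValueError if future exogenous rows do not match the training frame.

    train_columns: column names of the training frame (including the target).
    future_rows: list of dicts, one per future timestamp and series.
    """
    if not future_rows:
        raise ValueError("future_rows is empty")
    expected = set(train_columns) - {target_col}
    for row in future_rows:
        if set(row) != expected:
            missing = sorted(expected - set(row))
            extra = sorted(set(row) - expected)
            raise ValueError(f"column mismatch: missing={missing}, extra={extra}")
    # Every series needs exactly h future rows.
    counts = Counter(row[id_col] for row in future_rows)
    for series_id, n in counts.items():
        if n != h:
            raise ValueError(f"series {series_id!r} has {n} future rows, expected h={h}")


# Hypothetical exogenous column names; `ds` is a step index here for brevity.
train_cols = ["unique_id", "ds", "y", "exog_1", "exog_2"]
future = [{"unique_id": "DE", "ds": i, "exog_1": 0.0, "exog_2": 1.0} for i in range(48)]
check_future_exog(train_cols, future, h=48)  # passes silently
```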

Evaluation result:
| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE | mae | 4.603 |
| DE | rmse | 6.359 |
#### 4.5 Use a Long-Horizon Model
For longer forecasting periods, models optimized for multi-step predictions tend to perform better. You can enable this by setting the model parameter to `timegpt-1-long-horizon`.
```python theme={null}
fcst_df = nixtla_client.forecast(
    df=df_train[['unique_id', 'ds', 'y']],
    h=24*2,
    model='timegpt-1-long-horizon',
    level=[90, 95],
)
```

Evaluation result:
| unique\_id | metric | TimeGPT |
| ---------- | ------ | ------- |
| DE | mae | 6.366 |
| DE | rmse | 7.738 |
### Step 5: Conclusion and Next Steps
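Key takeaways:
In this example, every strategy improved on the zero-shot baseline, although the gains varied: for this 48-hour horizon, the long-horizon model did not beat the forecast with exogenous variables. We recommend systematically experimenting with each approach to find the best combination for your data.
* Increase the number of fine-tuning steps (`finetune_steps`).
* Experiment with different fine-tuning loss functions (`finetune_loss`).
* Adjust the number of fine-tuned parameters (`finetune_depth`).
* Incorporate exogenous data (`X_df`).
* Switch to the long-horizon model (`model='timegpt-1-long-horizon'`) for extended forecasting periods.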
**Success:** Small refinements—like adding exogenous data or adjusting fine-tuning parameters—can significantly improve your forecasting results.
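To experiment systematically, it can help to keep each configuration's keyword arguments in one place and loop over them. The sketch below mirrors the parameters used in steps 4.1, 4.2, 4.3, and 4.5 (the exogenous experiment is left out because it needs a different `df` plus `X_df`). `run_experiments` is a hypothetical helper; any object with a `forecast(**kwargs)` method, such as `nixtla_client`, can be passed in, and a recording stand-in is used here so the example runs offline.

```python
BASE = {"h": 48, "level": [90, 95]}

CONFIGS = {
    "zero_shot": {},
    "finetune_steps": {"finetune_steps": 30},
    "finetune_loss": {"finetune_steps": 30, "finetune_loss": "mae"},
    "finetune_depth": {"finetune_steps": 30, "finetune_loss": "mae", "finetune_depth": 2},
    "long_horizon": {"model": "timegpt-1-long-horizon"},
}


def run_experiments(client, df, configs=CONFIGS, base=BASE):
    """Call client.forecast once per configuration and collect results by name."""
    results = {}
    for name, overrides in configs.items():
        kwargs = {**base, **overrides}  # per-experiment settings override the shared ones
        results[name] = client.forecast(df=df, **kwargs)
    return results


class _RecordingClient:
    """Offline stand-in for NixtlaClient that returns the kwargs it receives."""

    def forecast(self, **kwargs):
        return kwargs


calls = run_experiments(_RecordingClient(), df="df_train[['unique_id', 'ds', 'y']]")
print(calls["finetune_depth"])
```

With a real client, `run_experiments(nixtla_client, df_train[['unique_id', 'ds', 'y']])` would return one forecast DataFrame per configuration, ready to evaluate side by side.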
***
# Long-Horizon Forecasting with TimeGPT
Source: https://nixtla.io/docs/forecasting/model-version/longhorizon_model
Master long-horizon time series forecasting in Python using TimeGPT. Learn to predict 2+ seasonal periods ahead with prediction intervals and uncertainty quantification.
## What is Long-Horizon Forecasting?
Long-horizon forecasting refers to predictions far into the future, typically exceeding two seasonal periods. For example, forecasting electricity demand 3 months ahead for hourly data, or predicting sales 2 years ahead for monthly data. The exact threshold depends on data frequency. The further you forecast, the more uncertainty you face.
The key challenge with long-horizon forecasting is that these predictions extend so far into the future that they may be influenced by unforeseen factors not present in the initial dataset. This means long-horizon forecasts generally involve greater risk and uncertainty compared to short-term predictions.
To address these unique challenges, Nixtla provides the specialized `timegpt-1-long-horizon` model in TimeGPT. You can access this model by simply specifying `model="timegpt-1-long-horizon"` when calling `nixtla_client.forecast`.
## When to Use Long-Horizon Forecasting
Long-horizon forecasting is ideal for:
* **Supply chain planning**: Predict inventory needs 3-6 months ahead
* **Financial forecasting**: Model quarterly or annual revenue projections
* **Energy demand**: Forecast power consumption weeks or months in advance
* **Climate modeling**: Predict seasonal weather patterns
Use the `timegpt-1-long-horizon` model when your forecast horizon exceeds two complete seasonal cycles in your data.
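The rule of thumb above can be written as a tiny helper. This is a sketch: `choose_model` is hypothetical, it assumes you know the dominant seasonal period (for example 24 for hourly data with a daily cycle), and it treats `timegpt-1` as the default model name, as in the FAQ below.

```python
def choose_model(h: int, season_length: int) -> str:
    """Suggest a TimeGPT model name from the horizon-to-season ratio.

    Uses the long-horizon model when the horizon exceeds two full seasonal cycles.
    """
    if h <= 0 or season_length <= 0:
        raise ValueError("h and season_length must be positive")
    return "timegpt-1-long-horizon" if h > 2 * season_length else "timegpt-1"


print(choose_model(h=96, season_length=24))  # hourly data, daily cycle, 4 days ahead
print(choose_model(h=48, season_length=24))  # exactly two cycles: not "exceeding"
```

For the ETTh1 example below (96 hourly steps, a daily cycle of 24), the helper selects the long-horizon model.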
## How to Use the Long-Horizon Model
[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/04_longhorizon.ipynb)
### Step 1: Import Packages
Start by installing and importing the required packages, then initialize the Nixtla client:
```python theme={null}
from nixtla import NixtlaClient
from datasetsforecast.long_horizon import LongHorizon
from utilsforecast.losses import mae
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla' # defaults to os.environ.get("NIXTLA_API_KEY")
)
```
### Step 2: Load the Data
We'll demonstrate long-horizon forecasting using the ETTh1 dataset, which measures oil temperatures and load variations on an electricity transformer in China. Here, we only forecast oil temperatures (`y`):
```python theme={null}
Y_df, *_ = LongHorizon.load(directory='./', group='ETTh1')
Y_df.head()
```
| | unique\_id | ds | y |
| - | ---------- | ------------------- | -------- |
| 0 | OT | 2016-07-01 00:00:00 | 1.460552 |
| 1 | OT | 2016-07-01 01:00:00 | 1.161527 |
| 2 | OT | 2016-07-01 02:00:00 | 1.161527 |
| 3 | OT | 2016-07-01 03:00:00 | 0.862611 |
| 4 | OT | 2016-07-01 04:00:00 | 0.525227 |
We'll set our horizon to 96 timestamps (4 days) for testing and use the previous 42 days as input to the model:
```python theme={null}
test = Y_df[-96:] # 96 timestamps (4 days × 24 hours/day)
input_seq = Y_df[-1104:-96] # 1008 timestamps (42 days × 24 hours/day)
```
### Step 3: Forecasting with the Long-Horizon Model
TimeGPT's `timegpt-1-long-horizon` model is optimized for predictions far into the future. Specify it like so:
```python theme={null}
fcst_df = nixtla_client.forecast(
df=input_seq,
h=96,
level=[90],
finetune_steps=10,
finetune_loss='mae',
model='timegpt-1-long-horizon',
time_col='ds',
target_col='y'
)
```
Next, plot the forecast along with the 90% prediction interval:
```python theme={null}
nixtla_client.plot(
Y_df[-168:],
fcst_df,
models=['TimeGPT'],
level=[90],
time_col='ds',
target_col='y'
)
```

### Step 4: Evaluation
Finally, assess forecast performance using Mean Absolute Error (MAE):
```python theme={null}
test = test.copy()
test.loc[:, 'TimeGPT'] = fcst_df['TimeGPT'].values
evaluation = mae(
test,
models=['TimeGPT'],
id_col='unique_id',
target_col='y'
)
```
Evaluation result:
| unique\_id | TimeGPT |
| ---------- | -------- |
| OT | 0.145393 |
The model achieves an MAE of approximately 0.145 over the 96-hour (4-day) test window.
## Frequently Asked Questions
**Q: What's the difference between timegpt-1 and timegpt-1-long-horizon?**
The `timegpt-1-long-horizon` model is specifically trained for extended forecast horizons (2+ seasonal periods), providing better accuracy for long-range predictions.
**Q: How far ahead can I forecast with the long-horizon model?**
The optimal horizon depends on your data frequency and patterns. Generally, the model performs well up to 4-6 seasonal cycles ahead.
**Q: Can I use exogenous variables with long-horizon forecasting?**
Yes, TimeGPT supports exogenous variables for improved long-horizon accuracy. See our [exogenous variables guide](/docs/forecasting/exogenous-variables/numeric_features) for details.
## Related Resources
Learn more about TimeGPT capabilities:
* [Fine-tuning TimeGPT](/docs/forecasting/fine-tuning/steps) - Improve accuracy for your specific dataset
* [Prediction Intervals](/docs/forecasting/probabilistic/prediction_intervals) - Quantify forecast uncertainty
* [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Validate model performance
* [Anomaly Detection](/docs/anomaly_detection/historical_anomaly_detection) - Identify unusual patterns in time series
# Uncertainty Quantification with TimeGPT
Source: https://nixtla.io/docs/forecasting/probabilistic/introduction
Learn how to generate quantile forecasts and prediction intervals to capture uncertainty in your forecasts.
In time series forecasting, it is important to consider the full probability distribution of the predictions rather than a single point estimate. This provides a more accurate representation of the uncertainty around the forecasts and allows better decision-making.
**TimeGPT** supports uncertainty quantification through quantile forecasts and prediction intervals.
## Why Consider the Full Probability Distribution?
When you focus on a single point prediction, you lose valuable information about the range of possible outcomes. By quantifying uncertainty, you can:
* Identify best-case and worst-case scenarios
* Improve risk management and contingency planning
* Gain confidence in decisions that rely on forecast accuracy
## What You Will Learn
Learn how to compute quantile forecasts using **TimeGPT**.
Discover how to create prediction intervals with **TimeGPT**.
# Prediction Intervals
Source: https://nixtla.io/docs/forecasting/probabilistic/prediction_intervals
Learn how to create prediction intervals with TimeGPT
## What Are Prediction Intervals?
A prediction interval provides a range where a future observation of a time series is expected to fall, with a specific level of probability.
For example, a 95% prediction interval means that the true future value is expected to lie within this range 95 times out of 100.
Wider intervals reflect greater uncertainty, while narrower intervals indicate higher confidence in the forecast.
With TimeGPT, you can easily generate prediction intervals for any confidence level between 0% and 100%.
These intervals are constructed using **[conformal prediction](https://en.wikipedia.org/wiki/Conformal_prediction)**, a distribution-free framework for uncertainty quantification.
Prediction intervals differ from confidence intervals:
* **Prediction Intervals**: Capture the uncertainty in future observations.
* **Confidence Intervals**: Quantify the uncertainty in the estimated model parameters (e.g., the mean).
As a result, prediction intervals are typically wider, as they account for both model and data variability.
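The idea behind conformal prediction can be sketched in a few lines: measure the model's errors on data it was not fitted to, then widen the point forecast by an empirical quantile of those errors. This is the generic split-conformal recipe with absolute-error scores; it is not necessarily the exact procedure TimeGPT uses internally, and `conformal_interval` is a hypothetical helper.

```python
import math


def conformal_interval(point_forecast, calib_actuals, calib_forecasts, level):
    """Symmetric split-conformal prediction interval at `level` percent.

    Nonconformity scores are absolute errors on a held-out calibration set.
    """
    if not 0 < level < 100:
        raise ValueError("level must be between 0 and 100")
    scores = sorted(abs(a - f) for a, f in zip(calib_actuals, calib_forecasts))
    n = len(scores)
    # Finite-sample corrected rank: the ceil((n + 1) * level / 100)-th smallest score.
    k = math.ceil((n + 1) * level / 100)
    if k > n:
        # Too few calibration points to guarantee this level: the interval is unbounded.
        return -math.inf, math.inf
    q = scores[k - 1]
    return point_forecast - q, point_forecast + q


# Calibration errors of 1, 2, ..., 10 (illustrative).
calib_actuals = [float(i) for i in range(1, 11)]
calib_forecasts = [0.0] * 10
for level in (80, 90, 99):
    print(level, conformal_interval(100.0, calib_actuals, calib_forecasts, level))
```

Higher levels select a larger error quantile, which is why higher-level intervals are wider; with only ten calibration errors, a 99% guarantee is not attainable and the sketch returns an unbounded interval.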
## How to Generate Prediction Intervals
[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/forecast/10_prediction_intervals.ipynb)
### Step 1: Import Packages
Import the required packages and initialize the Nixtla client.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla' # defaults to os.environ.get("NIXTLA_API_KEY")
)
```
### Step 2: Load Data
In this tutorial, we will use the Air Passengers dataset.
```python theme={null}
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.head()
```
| | timestamp | value |
| - | ---------- | ----- |
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |
### Step 3: Forecast with Prediction Intervals
To generate prediction intervals with TimeGPT, provide a list of desired confidence levels using the `level` argument.
Note that accepted values are between 0 and 100.
* Higher confidence levels provide more certainty that the true value will be captured, but result in wider, less precise intervals.
* Lower confidence levels provide less certainty that the true value will be captured, but result in narrower, more precise intervals.
```python theme={null}
timegpt_fcst_pred_int_df = nixtla_client.forecast(
df=df,
h=12,
level=[80, 90, 99],
time_col='timestamp',
target_col='value',
)
timegpt_fcst_pred_int_df.head()
```
| timestamp | TimeGPT | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-hi-99 | TimeGPT-lo-80 | TimeGPT-lo-90 | TimeGPT-lo-99 |
| ---------- | ------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| 1961-01-01 | 437.84 | 443.69 | 451.89 | 459.28 | 431.99 | 423.78 | 416.40 |
| 1961-02-01 | 426.06 | 439.42 | 444.43 | 448.94 | 412.70 | 407.70 | 403.19 |
| 1961-03-01 | 463.12 | 488.83 | 495.92 | 502.31 | 437.41 | 430.31 | 423.93 |
| 1961-04-01 | 478.24 | 507.77 | 509.72 | 511.47 | 448.72 | 446.77 | 445.02 |
| 1961-05-01 | 505.65 | 532.89 | 539.32 | 545.12 | 478.41 | 471.97 | 466.18 |
You can visualize the prediction intervals using the `plot` method. To do so, specify the confidence levels to display using the `level` argument.
```python theme={null}
nixtla_client.plot(
df,
timegpt_fcst_pred_int_df,
time_col='timestamp',
target_col='value',
level=[80, 90, 99]
)
```
### Step 4: Historical Forecast
You can also generate prediction intervals for historical forecasts by setting `add_history=True`.
```python theme={null}
timegpt_fcst_pred_int_historical_df = nixtla_client.forecast(
df=df,
h=12,
level=[80, 90],
time_col='timestamp',
target_col='value',
add_history=True,
)
timegpt_fcst_pred_int_historical_df.head()
```
Plot the prediction intervals for the historical forecasts.
```python theme={null}
nixtla_client.plot(
df,
timegpt_fcst_pred_int_historical_df,
time_col='timestamp',
target_col='value',
level=[80, 90]
)
```
### Step 5: Cross-Validation
You can use the `cross_validation` method to generate prediction intervals for each time window.
```python theme={null}
cv_df = nixtla_client.cross_validation(
df=df,
h=12,
n_windows=4,
level=[80, 90, 99],
time_col='timestamp',
target_col='value'
)
cv_df.head()
```
After computing the forecasts, you can visualize the results for each cross-validation cutoff to better understand model performance over time.
```python theme={null}
from IPython.display import display

cutoffs = cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.tail(100),
        cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'value']),
        level=[80, 90, 99],
        time_col='timestamp',
        target_col='value',
    )
    display(fig)
```
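The plots give a visual impression of coverage. To quantify it, count how often the actual value falls inside each interval across all cross-validation windows; for a well-calibrated 80% interval the result should be near 0.8, although with only 4 windows of 12 points the estimate is noisy. A minimal sketch, assuming you pass rows from `cv_df.to_dict('records')` and the `TimeGPT-lo-<level>` / `TimeGPT-hi-<level>` column naming shown in the forecast table above; `empirical_coverage` is a hypothetical helper.

```python
def empirical_coverage(rows, level, model="TimeGPT", target_col="value"):
    """Fraction of rows whose actual value lies inside the model's `level`% interval."""
    if not rows:
        raise ValueError("no rows to evaluate")
    lo_col, hi_col = f"{model}-lo-{level}", f"{model}-hi-{level}"
    inside = sum(row[lo_col] <= row[target_col] <= row[hi_col] for row in rows)
    return inside / len(rows)


# Illustrative rows; in practice use cv_df.to_dict('records').
rows = [
    {"value": 10.0, "TimeGPT-lo-80": 8.0, "TimeGPT-hi-80": 12.0},
    {"value": 13.0, "TimeGPT-lo-80": 9.0, "TimeGPT-hi-80": 12.5},
    {"value": 11.0, "TimeGPT-lo-80": 10.5, "TimeGPT-hi-80": 14.0},
    {"value": 9.5, "TimeGPT-lo-80": 9.0, "TimeGPT-hi-80": 11.0},
]
print(empirical_coverage(rows, 80))
```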
Congratulations! You have successfully generated prediction intervals using TimeGPT.
You also visualized historical forecasts with intervals and inspected how the intervals behave across multiple cross-validation windows.
# Quantile Forecasts
Source: https://nixtla.io/docs/forecasting/probabilistic/quantiles
Learn how to generate quantile forecasts with TimeGPT
## What Are Quantile Forecasts?
Quantile forecasts correspond to specific percentiles of the forecast distribution and provide a more complete representation of the range of possible outcomes.
* The 0.5 quantile (or 50th percentile) is the median forecast, meaning there is a 50% chance that the actual value will fall below or above this point.
* The 0.1 quantile (or 10th percentile) forecast represents a value that the actual observation is expected to fall below 10% of the time.
* The 0.9 quantile (or 90th percentile) forecast represents a value that the actual observation is expected to fall below 90% of the time.
TimeGPT supports quantile forecasts. In this tutorial, we will show you how to generate them.
## Why Use Quantile Forecasts
* Quantile forecasts can provide information about best and worst-case scenarios, allowing you to make better decisions under uncertainty.
* In many real-world scenarios, being wrong in one direction is more costly than being wrong in the other. Quantile forecasts allow you to focus on the specific percentiles that matter most for your particular use case.
[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/10_uncertainty_quantification_with_quantile_forecasts.ipynb)
## How to Generate Quantile Forecasts
### Step 1: Import Packages
Import the required packages and initialize a Nixtla client to connect with TimeGPT.
```python theme={null}
import pandas as pd
from nixtla import NixtlaClient
from IPython.display import display
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla' # Defaults to os.environ.get("NIXTLA_API_KEY")
)
```
### Step 2: Load Data
In this tutorial, we will use the Air Passengers dataset.
```python theme={null}
df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv'
)
df.head()
```
| | timestamp | value |
| - | ---------- | ----- |
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |
### Step 3: Forecast with Quantiles
To specify the desired quantiles, you need to pass a list of quantiles to the `quantiles` parameter. Choose quantiles between 0 and 1 based on your uncertainty analysis needs.
```python theme={null}
quantiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
timegpt_quantile_fcst_df = nixtla_client.forecast(
df=df,
h=12,
quantiles=quantiles,
time_col='timestamp',
target_col='value'
)
timegpt_quantile_fcst_df.head()
```
| timestamp | TimeGPT | TimeGPT-q-10 | TimeGPT-q-20 | TimeGPT-q-30 | TimeGPT-q-40 | TimeGPT-q-50 | TimeGPT-q-60 | TimeGPT-q-70 | TimeGPT-q-80 | TimeGPT-q-90 |
| ---------- | ------- | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 1961-01-01 | 437.84 | 431.99 | 435.04 | 435.38 | 436.40 | 437.84 | 439.27 | 440.29 | 440.63 | 443.69 |
| 1961-02-01 | 426.06 | 412.70 | 414.83 | 416.04 | 421.72 | 426.06 | 430.41 | 436.08 | 437.29 | 439.42 |
| 1961-03-01 | 463.12 | 437.41 | 444.23 | 446.42 | 450.71 | 463.12 | 475.53 | 479.81 | 482.00 | 488.82 |
| 1961-04-01 | 478.24 | 448.72 | 455.43 | 465.57 | 469.88 | 478.24 | 486.61 | 490.92 | 501.06 | 507.76 |
| 1961-05-01 | 505.65 | 478.41 | 493.16 | 497.99 | 499.14 | 505.65 | 512.15 | 513.30 | 518.14 | 532.89 |
TimeGPT returns multiple columns in the forecast output:
* Each requested quantile gets its own column named `TimeGPT-q-<percentile>` (for example, `TimeGPT-q-10` for the 0.1 quantile).
* The `TimeGPT` column holds the point forecast.
* In this output, the point forecast (`TimeGPT`) is identical to the 0.5 quantile (`TimeGPT-q-50`).
### Step 4: Plot the Quantile Forecasts
To plot the quantile forecasts, you can use the `plot` method.
```python theme={null}
nixtla_client.plot(
df,
timegpt_quantile_fcst_df,
time_col='timestamp',
target_col='value'
)
```
The plot displays:
* The actual time series data in blue.
* Multiple forecast intervals represented by different quantiles:
* The 0.5 quantile (50th percentile) represents the median forecast.
* The 0.1 and 0.9 quantiles (10th and 90th percentiles) show the outer bounds of the forecast.
* Additional quantiles (0.2, 0.3, 0.4, 0.6, 0.7, 0.8) are shown in between, creating a gradient of uncertainty.
This type of visualization is particularly useful because it:
* Shows the full distribution of possible outcomes rather than just a single point forecast.
* Helps identify best and worst-case scenarios.
* Allows decision-makers to understand the range of uncertainty in the predictions.
### Step 5: Historical Forecast
You can also generate quantile forecasts for the historical (in-sample) period by setting the `add_history` parameter to `True`.
```python theme={null}
timegpt_quantile_fcst_df = nixtla_client.forecast(
df=df,
h=12,
quantiles=quantiles,
time_col='timestamp',
target_col='value',
add_history=True, # Add historical data to the forecast
)
nixtla_client.plot(
df,
timegpt_quantile_fcst_df,
time_col='timestamp',
target_col='value'
)
```
The plot now includes quantile forecasts for the historical data. This allows you to evaluate how well the quantile forecasts capture the true variability and identify any systematic bias.
### Step 6: Cross-Validation
To evaluate the performance of the quantile forecasts across multiple time windows, you can use the `cross_validation` method.
```python theme={null}
cv_df = nixtla_client.cross_validation(
df=df,
h=12,
n_windows=4,
quantiles=quantiles,
time_col='timestamp',
target_col='value'
)
```
After computing the forecasts, you can visualize the results for each cross-validation cutoff to better understand model performance over time.
```python theme={null}
cutoffs = cv_df['cutoff'].unique()
for cutoff in cutoffs:
    fig = nixtla_client.plot(
        df.tail(100),
        cv_df.query('cutoff == @cutoff').drop(columns=['cutoff', 'value']),
        time_col='timestamp',
        target_col='value',
    )
    display(fig)
```
Each plot shows a different cross-validation window (or cutoff) for the time series. This allows you to evaluate how well the predicted quantiles capture the true values across multiple forecast windows.
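To go beyond visual inspection, you can check calibration: for a well-calibrated forecast at quantile level q, roughly a fraction q of the actual values should fall at or below it. A minimal sketch, assuming rows from `cv_df.to_dict('records')` that contain the target column `value` and the `TimeGPT-q-<percentile>` columns; `fraction_below` is a hypothetical helper.

```python
def fraction_below(rows, quantile_col, target_col="value"):
    """Share of rows whose actual value falls at or below the quantile forecast."""
    if not rows:
        raise ValueError("no rows to evaluate")
    return sum(row[target_col] <= row[quantile_col] for row in rows) / len(rows)


# Illustrative rows; a well-calibrated 0.9 quantile should give a value near 0.9.
rows = [
    {"value": 100.0, "TimeGPT-q-90": 110.0},
    {"value": 120.0, "TimeGPT-q-90": 115.0},
    {"value": 95.0, "TimeGPT-q-90": 105.0},
    {"value": 101.0, "TimeGPT-q-90": 101.0},
]
print(fraction_below(rows, "TimeGPT-q-90"))
```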
Congratulations! You have successfully generated quantile forecasts using TimeGPT. You also visualized historical quantile predictions and evaluated their performance through cross-validation.
# Bounded Forecasts
Source: https://nixtla.io/docs/forecasting/special-topics/bounded_forecasts
Learn how to generate forecasts with upper and lower bounds to match your business constraints.
## Why Generate Bounded Forecasts?
In forecasting, we often want to make sure the predictions stay within a certain
range. For example, when predicting the sales of a product, we may require all
forecasts to be positive. In such cases, the forecasts need to be bounded.
This tutorial shows how to generate bounded forecasts with TimeGPT by
transforming data prior to forecasting.
## Tutorial
[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/13_bounded_forecasts.ipynb)
### Step 1: Import Packages
First, we install and import the required packages.
```python theme={null}
import pandas as pd
import numpy as np
from nixtla import NixtlaClient
```
Next, initialize your Nixtla client with the API key:
```python theme={null}
nixtla_client = NixtlaClient(
# defaults to os.environ.get("NIXTLA_API_KEY")
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load Data
We use the [annual egg prices](https://github.com/robjhyndman/fpp3package/tree/master/data)
dataset from [Forecasting: Principles and Practice](https://otexts.com/fpp3/).
We expect egg prices to be strictly positive, so we want to bound our forecasts
to be positive.
> NOTE: If you do not have `pyreadr`, you can install it with `pip`:
```shell theme={null}
pip install pyreadr
```
```python theme={null}
import pyreadr
from pathlib import Path
url = 'https://github.com/robjhyndman/fpp3package/raw/master/data/prices.rda'
dst_path = str(Path.cwd().joinpath('prices.rda'))
result = pyreadr.read_r(pyreadr.download_file(url, dst_path))
df = result['prices'][['year', 'eggs']]
df = df.dropna().reset_index(drop=True)
df = df.rename(columns={'year': 'ds', 'eggs': 'y'})
df['ds'] = pd.to_datetime(df['ds'], format='%Y')
df['unique_id'] = 'eggs'
df.tail(10)
```
| | **ds** | **y** | **unique\_id** |
| -- | ---------- | ------ | -------------- |
| 84 | 1984-01-01 | 100.58 | eggs |
| 85 | 1985-01-01 | 76.84 | eggs |
| 86 | 1986-01-01 | 81.10 | eggs |
| 87 | 1987-01-01 | 69.60 | eggs |
| 88 | 1988-01-01 | 64.55 | eggs |
| 89 | 1989-01-01 | 80.36 | eggs |
| 90 | 1990-01-01 | 79.79 | eggs |
| 91 | 1991-01-01 | 74.79 | eggs |
| 92 | 1992-01-01 | 64.86 | eggs |
| 93 | 1993-01-01 | 62.27 | eggs |
Plotting the series shows how prices evolved over the 20th century, with a
clear downward trend.
```python theme={null}
nixtla_client.plot(df)
```

### Step 3: Generate Bounded Forecasts with TimeGPT
First, we transform the target data. In this case, we log-transform the data
prior to forecasting, so that the back-transformed forecasts are guaranteed to be positive.
```python theme={null}
df_transformed = df.copy()
df_transformed['y'] = np.log(df_transformed['y'])
```
We create forecasts for the next 10 years and include 80%, 90%, and 99.5%
prediction intervals.
```python theme={null}
timegpt_fcst_with_transform = nixtla_client.forecast(
df=df_transformed,
h=10,
freq='Y',
level=[80, 90, 99.5]
)
```
After creating the forecasts, we need to invert the transformation we applied
earlier. For a log-transformation, this means exponentiating the forecasts and
the interval bounds:
```python theme={null}
cols_to_transform = [
col for col in timegpt_fcst_with_transform if col not in ['unique_id', 'ds']
]
for col in cols_to_transform:
    # Exponentiate the point forecast and every interval bound
    timegpt_fcst_with_transform[col] = np.exp(timegpt_fcst_with_transform[col])
```
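The same back-transform idea extends to an upper and a lower bound. Following the scaled-logit approach in Hyndman and Athanasopoulos (FPP3), values constrained to an interval (a, b) can be modeled as log((y - a) / (b - y)). The sketch below pairs it with the log transform used here; the function names and the 0 to 400 egg-price bounds in the example are hypothetical.

```python
import math


def to_log(y):
    """Forward transform for a strictly positive series."""
    return math.log(y)


def from_log(z):
    """Inverse transform: any real z maps to a strictly positive value."""
    return math.exp(z)


def to_scaled_logit(y, lower, upper):
    """Forward transform for values strictly inside (lower, upper)."""
    return math.log((y - lower) / (upper - y))


def from_scaled_logit(z, lower, upper):
    """Inverse transform: maps any real z back into (lower, upper).

    In floating point, very large |z| can round to the bound itself.
    """
    return lower + (upper - lower) / (1 + math.exp(-z))


# Even a strongly negative forecast in log space maps back to a positive price.
print(from_log(-5.0) > 0)
# Hypothetical bounds: keep prices between 0 and 400.
z = to_scaled_logit(62.27, 0, 400)
print(round(from_scaled_logit(z, 0, 400), 2))
```

Because both inverse transforms are monotonic, interval bounds back-transform directly; the back-transformed point forecast is usually read as a median on the original scale, and FPP3 discusses bias adjustment if a mean is needed.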
Now, we can plot the forecasts together with the 80%, 90%, and 99.5%
prediction intervals.
```python theme={null}
nixtla_client.plot(
df,
timegpt_fcst_with_transform,
level=[80, 90, 99.5],
max_insample_length=20
)
```

The forecast and the prediction intervals look reasonable.
### Step 4: Compare with Unbounded Forecast
Let's compare these forecasts to the situation where we don't apply a
transformation. In this case, it may be possible to forecast a negative price.
```python theme={null}
timegpt_fcst_without_transform = nixtla_client.forecast(
df=df,
h=10,
freq='Y',
level=[80, 90, 99.5]
)
```
Indeed, we now observe prediction intervals that become negative:
```python theme={null}
nixtla_client.plot(
df,
timegpt_fcst_without_transform,
level=[80, 90, 99.5],
max_insample_length=20
)
```

For example, in 1995 the lower bounds of the 90% and 99.5% intervals are negative:
```python theme={null}
timegpt_fcst_without_transform
```
| | unique\_id | ds | TimeGPT | TimeGPT-lo-99.5 | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-hi-80 | TimeGPT-hi-90 | TimeGPT-hi-99.5 |
| -: | ---------: | ---------: | --------: | --------------: | ------------: | ------------: | ------------: | ------------: | --------------: |
| 0 | eggs | 1994-01-01 | 66.859756 | 43.103240 | 46.131448 | 49.319034 | 84.400479 | 87.588065 | 90.616273 |
| 1 | eggs | 1995-01-01 | 64.993477 | -20.924112 | -4.750041 | 12.275298 | 117.711656 | 134.736995 | 150.911066 |
| 2 | eggs | 1996-01-01 | 66.695808 | 6.499170 | 8.291150 | 10.177444 | 123.214173 | 125.100467 | 126.892446 |
| 3 | eggs | 1997-01-01 | 66.103325 | 17.304282 | 24.966939 | 33.032894 | 99.173756 | 107.239711 | 114.902368 |
| 4 | eggs | 1998-01-01 | 67.906517 | 4.995371 | 12.349648 | 20.090992 | 115.722042 | 123.463386 | 130.817663 |
| 5 | eggs | 1999-01-01 | 66.147575 | 29.162207 | 31.804460 | 34.585779 | 97.709372 | 100.490691 | 103.132943 |
| 6 | eggs | 2000-01-01 | 66.062637 | 14.671932 | 19.305822 | 24.183601 | 107.941673 | 112.819453 | 117.453343 |
| 7 | eggs | 2001-01-01 | 68.045769 | 3.915282 | 13.188964 | 22.950736 | 113.140802 | 122.902573 | 132.176256 |
| 8 | eggs | 2002-01-01 | 66.718903 | -42.212631 | -30.583703 | -18.342726 | 151.780531 | 164.021508 | 175.650436 |
| 9 | eggs | 2003-01-01 | 67.344078 | -86.239911 | -44.959745 | -1.506939 | 136.195095 | 179.647901 | 220.928067 |
## Conclusion
Log-transformations are a simple and effective way to enforce strictly positive
predictions. This tutorial demonstrated how to combine TimeGPT with a data
transformation to produce bounded forecasts that are more realistic and reliable.
## References
* [**Hyndman, Rob J., and George Athanasopoulos (2021). Forecasting: Principles and Practice (3rd Ed)**](https://otexts.com/fpp3/)
# Hierarchical Forecasting
Source: https://nixtla.io/docs/forecasting/special-topics/hierarchical_forecasting
Learn how to use TimeGPT for hierarchical forecasting across multiple levels.
## What is Hierarchical Forecasting?
Hierarchical forecasting involves generating forecasts for multiple time series that share a hierarchical structure (e.g., product demand by category, department, or region). The goal is to ensure that forecasts are coherent across each level of the hierarchy.
Hierarchical forecasting can be particularly important when you need to generate forecasts at different granularities (e.g., country, state, region) and ensure they align with each other and aggregate correctly at higher levels.
Using TimeGPT, you can create forecasts for multiple related time series and then apply hierarchical forecasting methods from [HierarchicalForecast](https://nixtlaverse.nixtla.io/hierarchicalforecast/index.html) to reconcile those forecasts across your specified hierarchy.
## Why use Hierarchical Forecasting?
* Ensures consistency: Forecasts at lower levels add up to higher-level forecasts.
* Improves accuracy: Reconciliation methods often yield more robust predictions.
* Facilitates deeper insights: Understand how smaller segments contribute to overall trends.
## Tutorial
[](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/14_hierarchical_forecasting.ipynb)
### Step 1: Install, Import and Initialize
Start by installing the required packages.
```shell theme={null}
pip install nixtla
pip install hierarchicalforecast
```
Next, import the packages and initialize the `NixtlaClient`.
```python theme={null}
import pandas as pd
import numpy as np
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load and Prepare Data
This tutorial uses the Australian Tourism dataset from
[Forecasting: Principles and Practice](https://otexts.com/fpp3/). The dataset
contains different levels of hierarchical data, from the entire country of
Australia down to individual regions.
The dataset provides only the lowest-level series, so higher-level series need
to be aggregated explicitly. Let's load and preprocess the dataset.
```python theme={null}
Y_df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/tourism.csv'
)
Y_df = Y_df.rename({'Trips': 'y', 'Quarter': 'ds'}, axis=1)
Y_df.insert(0, 'Country', 'Australia')
Y_df = Y_df[['Country', 'Region', 'State', 'Purpose', 'ds', 'y']]
Y_df['ds'] = Y_df['ds'].str.replace(r'(\d+) (Q\d)', r'\1-\2', regex=True)
Y_df['ds'] = pd.to_datetime(Y_df['ds'])
Y_df.head(10)
```
| Country | Region | State | Purpose | ds | y |
| --------- | -------- | --------------- | -------- | ---------- | ---------- |
| Australia | Adelaide | South Australia | Business | 1998-01-01 | 135.077690 |
| Australia | Adelaide | South Australia | Business | 1998-04-01 | 109.987316 |
| Australia | Adelaide | South Australia | Business | 1998-07-01 | 166.034687 |
| Australia | Adelaide | South Australia | Business | 1998-10-01 | 127.160464 |
| Australia | Adelaide | South Australia | Business | 1999-01-01 | 137.448533 |
| Australia | Adelaide | South Australia | Business | 1999-04-01 | 199.912586 |
| Australia | Adelaide | South Australia | Business | 1999-07-01 | 169.355090 |
| Australia | Adelaide | South Australia | Business | 1999-10-01 | 134.357937 |
| Australia | Adelaide | South Australia | Business | 2000-01-01 | 154.034398 |
| Australia | Adelaide | South Australia | Business | 2000-04-01 | 168.776364 |
We define the dataset hierarchies explicitly. Each level in the list describes
one view of the hierarchy:
```python theme={null}
spec = [
['Country'],
['Country', 'State'],
['Country', 'Purpose'],
['Country', 'State', 'Region'],
['Country', 'State', 'Purpose'],
['Country', 'State', 'Region', 'Purpose']
]
```
Then, use `aggregate` from `HierarchicalForecast` to generate the aggregated series:
```python theme={null}
from hierarchicalforecast.utils import aggregate
Y_df, S_df, tags = aggregate(Y_df, spec)
Y_df.head(10)
```
| unique\_id | ds | y |
| ---------- | ---------- | ------------ |
| Australia | 1998-01-01 | 23182.197269 |
| Australia | 1998-04-01 | 20323.380067 |
| Australia | 1998-07-01 | 19826.640511 |
| Australia | 1998-10-01 | 20830.129891 |
| Australia | 1999-01-01 | 22087.353380 |
| Australia | 1999-04-01 | 21458.373285 |
| Australia | 1999-07-01 | 19914.192508 |
| Australia | 1999-10-01 | 20027.925640 |
| Australia | 2000-01-01 | 22339.294779 |
| Australia | 2000-04-01 | 19941.063482 |
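Each entry in `spec` yields one aggregated series per combination of its columns, named by joining the column values with `/` (as in `Australia/Queensland/Brisbane`). The sketch below mimics that idea with a plain pandas `groupby` on a toy bottom-level frame with made-up values. `aggregate_level` is a hypothetical helper written for illustration, not part of `hierarchicalforecast`, and unlike `aggregate` it does not build `S_df` or `tags`.

```python
import pandas as pd

# Toy bottom-level data: two regions of one state for a single quarter (made-up values)
bottom = pd.DataFrame({
    'Country': ['Australia', 'Australia'],
    'State': ['Queensland', 'Queensland'],
    'Region': ['Brisbane', 'Gold Coast'],
    'ds': pd.to_datetime(['1998-01-01', '1998-01-01']),
    'y': [100.0, 50.0],
})

def aggregate_level(df, cols):
    """Hypothetical helper: sum y over every combination of `cols` and date."""
    out = df.groupby(cols + ['ds'], as_index=False)['y'].sum()
    # Build the hierarchical identifier, e.g. "Australia/Queensland"
    out['unique_id'] = out[cols].astype(str).apply('/'.join, axis=1)
    return out[['unique_id', 'ds', 'y']]

levels = [['Country'], ['Country', 'State'], ['Country', 'State', 'Region']]
agg = pd.concat([aggregate_level(bottom, cols) for cols in levels], ignore_index=True)
print(agg)
```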
Next, create the train/test splits. Here, we use the last two years (eight
quarters) of data for testing:
```python theme={null}
Y_test_df = Y_df.groupby('unique_id').tail(8)
Y_train_df = Y_df.drop(Y_test_df.index)
```
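`groupby('unique_id').tail(8)` keeps the last eight rows of each series in their current order, so the split assumes every series is sorted by `ds`. A toy check with two short series (made-up values, and a 2-row test window instead of 8 to keep it small):

```python
import pandas as pd

# Two toy series with four quarterly observations each
df = pd.DataFrame({
    'unique_id': ['A'] * 4 + ['B'] * 4,
    'ds': list(pd.date_range('2000-01-01', periods=4, freq='QS')) * 2,
    'y': [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

# Sort defensively: tail() takes the last rows in the current row order
df = df.sort_values(['unique_id', 'ds'])
test_df = df.groupby('unique_id').tail(2)  # last 2 rows of each series
train_df = df.drop(test_df.index)          # all remaining rows
print(train_df.groupby('unique_id').size())
```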
### Step 3: Hierarchical Forecasting Using TimeGPT
Now we'll generate base forecasts across all series using TimeGPT and then apply
hierarchical reconciliation to ensure the forecasts align across each level.
#### Generate Base Forecasts with TimeGPT
Obtain forecasts with TimeGPT for all series in your training data.
```python theme={null}
timegpt_fcst = nixtla_client.forecast(
df=Y_train_df,
h=8,
freq='QS',
add_history=True
)
```
Next, separate the generated forecasts into in-sample (historical) and
out-of-sample (forecasted) periods:
```python theme={null}
timegpt_fcst_insample = timegpt_fcst.query("ds < '2016-01-01'")
timegpt_fcst_outsample = timegpt_fcst.query("ds >= '2016-01-01'")
```
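The cutoff `'2016-01-01'` is the first of the eight forecast quarters. The sketch below confirms this on a synthetic quarter-start index spanning 1998 Q1 to 2017 Q4, the period covered by the tourism data. Deriving the cutoff from the test set instead of hard-coding it, as in the commented lines, is a suggestion rather than what the tutorial does.

```python
import pandas as pd

# Quarter-start timestamps from 1998 Q1 to 2017 Q4
ds = pd.date_range('1998-01-01', '2017-10-01', freq='QS')

# The test window is the last 8 quarters; its first timestamp is the cutoff
cutoff = ds[-8:].min()
print(cutoff)  # 2016-01-01 00:00:00

# With the tutorial's frames, an equivalent would be:
# cutoff = Y_test_df['ds'].min()
# timegpt_fcst_insample = timegpt_fcst.query("ds < @cutoff")
# timegpt_fcst_outsample = timegpt_fcst.query("ds >= @cutoff")
```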
#### Visualize TimeGPT Forecasts
Quickly visualize the forecasts at different hierarchy levels. Here, we look at
the entire country, the state of Queensland, the Brisbane region, and holiday
trips in the Brisbane region:
```python theme={null}
nixtla_client.plot(
Y_df,
timegpt_fcst_outsample,
max_insample_length=4 * 12,
unique_ids=[
'Australia',
'Australia/Queensland',
'Australia/Queensland/Brisbane',
'Australia/Queensland/Brisbane/Holiday'
]
)
```
#### Apply Hierarchical Reconciliation
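We use two `MinTrace` variants, `ols` and `mint_shrink`, to reconcile the
forecasts across all levels of the hierarchy. `mint_shrink` estimates the
covariance of the base-forecast errors from in-sample residuals, which is why
the in-sample TimeGPT forecasts are merged with the actuals and passed as `Y_df`.
> NOTE: The `S` parameter was renamed to `S_df` in `hierarchicalforecast`.
> Make sure you pass `S_df` when calling `reconcile`.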
```python theme={null}
from hierarchicalforecast.methods import MinTrace
from hierarchicalforecast.core import HierarchicalReconciliation
reconcilers = [
MinTrace(method='ols'),
MinTrace(method='mint_shrink')
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_df_with_insample_fcsts = timegpt_fcst_insample.merge(Y_df.copy())
Y_rec_df = hrec.reconcile(
Y_hat_df=timegpt_fcst_outsample,
Y_df=Y_df_with_insample_fcsts,
S_df=S_df,
tags=tags
)
```
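For intuition about what reconciliation does, the OLS variant of MinTrace projects the stacked base forecasts `y_hat` onto the coherent subspace: `y_tilde = S @ inv(S.T @ S) @ S.T @ y_hat`. The NumPy sketch below applies that formula to a toy two-level hierarchy (Total = A + B) with made-up base forecasts. It illustrates the math only and is not how `hierarchicalforecast` is implemented internally.

```python
import numpy as np

# Summing matrix: rows are [Total, A, B], columns are the bottom series [A, B]
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Incoherent base forecasts: 4 + 5 != 10
y_hat = np.array([10.0, 4.0, 5.0])

# OLS reconciliation: least-squares estimate of the bottom level, then re-aggregate
bottom = np.linalg.solve(S.T @ S, S.T @ y_hat)
y_tilde = S @ bottom
print(y_tilde)  # approximately [9.667, 4.333, 5.333]
```

After reconciliation the total equals the sum of its parts, which is the coherence property the tutorial relies on.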
Now, let's plot the reconciled forecasts to ensure they make sense across the
full country → state → region → purpose hierarchy:
```python theme={null}
nixtla_client.plot(
Y_df,
Y_rec_df,
max_insample_length=4 * 12,
unique_ids=[
'Australia',
'Australia/Queensland',
'Australia/Queensland/Brisbane',
'Australia/Queensland/Brisbane/Holiday'
]
)
```
### Step 4: Evaluate Forecast Accuracy
Finally, evaluate your forecast performance using RMSE for different levels of
the hierarchy, from total (country) to bottom-level (region/purpose).
```python theme={null}
from hierarchicalforecast.evaluation import evaluate
from utilsforecast.losses import rmse
eval_tags = {
'Total': tags['Country'],
'Purpose': tags['Country/Purpose'],
'State': tags['Country/State'],
'Regions': tags['Country/State/Region'],
'Bottom': tags['Country/State/Region/Purpose']
}
evaluation = evaluate(
df=Y_rec_df.merge(Y_test_df, on=['unique_id', 'ds']),
tags=eval_tags,
train_df=Y_train_df,
metrics=[rmse]
)
evaluation[evaluation.select_dtypes(np.number).columns] = evaluation.select_dtypes(np.number).map('{:.2f}'.format)
evaluation
```
| | level | metric | TimeGPT | TimeGPT/MinTrace\_method-ols | TimeGPT/MinTrace\_method-mint\_shrink |
| - | ------- | ------ | ------- | ---------------------------- | ------------------------------------- |
| 0 | Total | rmse | 1433.07 | 1436.07 | 1627.43 |
| 1 | Purpose | rmse | 482.09 | 475.64 | 507.50 |
| 2 | State | rmse | 275.85 | 278.39 | 294.28 |
| 3 | Regions | rmse | 49.40 | 47.91 | 47.99 |
| 4 | Bottom | rmse | 19.32 | 19.11 | 18.86 |
| 5 | Overall | rmse | 38.66 | 38.21 | 39.16 |
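RMSE is the square root of the mean squared error between actuals and forecasts, so it is expressed in the same units as `y`. That is why the Total level shows much larger values than the Bottom level: the aggregated series themselves are much larger. A minimal NumPy version of the metric, on made-up numbers:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Errors are 0, 0 and 2, so RMSE = sqrt(4 / 3)
print(rmse([1, 2, 3], [1, 2, 5]))
```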
## Conclusion
Reconciling the forecasts with `MinTrace(method='ols')` slightly improved the
overall RMSE (from 38.66 to 38.21), while `MinTrace(method='mint_shrink')` made
it slightly worse (39.16), which suggests that the TimeGPT base forecasts were
already strong.
More importantly, the reconciled forecasts are coherent: every aggregate
forecast equals the sum of the forecasts beneath it in the hierarchy. So, in
addition to a small accuracy gain, reconciliation gives us forecasts that are
consistent across all levels.
## References
* [Hyndman, Rob J., and George Athanasopoulos (2021). Forecasting: Principles and Practice](https://otexts.com/fpp3/).
# Irregular Timestamps
Source: https://nixtla.io/docs/forecasting/special-topics/irregular_timestamps
Learn how to work with both regular and irregular timestamps in TimeGPT for accurate forecasting.
## Why Handle Irregular Timestamps?
When working with time series data, specifying its frequency correctly is
important, as it can significantly affect forecasting results. TimeGPT
automatically infers the frequency of your timestamps, and for commonly used
regular frequencies, such as hourly, daily, or monthly, this inference is
reliable, so no additional input is required.
However, for irregular frequencies, where observations are not recorded at
consistent intervals (for example, only on the days the U.S. stock market is
open), you must specify the frequency explicitly.
In this tutorial, we show you how to handle irregular and custom frequencies in
TimeGPT.
> NOTE: TimeGPT does not currently support missing values, so your data must not
> contain any. In other words, any irregularity in the timestamps should stem
> from the nature of the recorded phenomenon, not from missing observations.
> If your data contains missing values, please refer to our
> [tutorial on missing dates](/docs/data_requirements/missing_values).
## Tutorial
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/capabilities/forecast/11_irregular_timestamps.ipynb)
### Step 1: Import Packages
First, we import the required packages and initialize the Nixtla client.
```python theme={null}
import pandas as pd
import pandas_market_calendars as mcal
from nixtla import NixtlaClient
```
Initialize NixtlaClient with your API key:
```python theme={null}
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Handling Regular Frequencies
As discussed in the introduction, for time series data with regular frequencies,
where observations are recorded at consistent intervals, TimeGPT can automatically
infer the frequency of your timestamps if the input data is a **pandas DataFrame**.
If you prefer not to rely on TimeGPT's automatic inference, you can set the
`freq` parameter to a valid
[pandas frequency string](https://pandas.pydata.org/docs/user_guide/timeseries.html#offset-aliases),
such as `MS` for month-start frequency or `min` for minutely frequency.
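pandas ships its own frequency inference, `pd.infer_freq`, which is a handy way to check whether your timestamps follow a regular pattern before calling TimeGPT. This sketch does not claim TimeGPT uses `pd.infer_freq` internally; it only shows the distinction: monthly-start dates yield `MS`, while trading days with a holiday gap yield `None`, a sign that you should pass `freq` yourself.

```python
import pandas as pd

# Regular monthly data, like the AirPassengers series
monthly = pd.to_datetime(['1949-01-01', '1949-02-01', '1949-03-01', '1949-04-01'])
print(pd.infer_freq(monthly))  # MS

# Weekday trading dates around Labor Day 2023 (Monday 2023-09-04 is missing)
trading = pd.to_datetime(['2023-08-28', '2023-08-29', '2023-08-30',
                          '2023-08-31', '2023-09-01', '2023-09-05'])
print(pd.infer_freq(trading))  # None: no standard pandas alias fits
```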
When working with **Polars DataFrames**, you must specify the frequency explicitly
by using a valid [polars offset](https://docs.pola.rs/api/python/stable/reference/expressions/api/polars.Expr.dt.offset_by.html),
such as `1d` for daily frequency or `1h` for hourly frequency.
Below is an example of how to specify the frequency for a Polars DataFrame.
```python theme={null}
import polars as pl
url = 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv'
polars_df = pl.read_csv(url, try_parse_dates=True)
fcst_df = nixtla_client.forecast(
df=polars_df,
h=12,
freq='1mo',
time_col='timestamp',
target_col='value',
level=[80, 95]
)
```
Plot the forecast DataFrame:
```python theme={null}
nixtla_client.plot(
polars_df,
fcst_df,
time_col='timestamp',
target_col='value',
level=[80, 95]
)
```

### Step 3: Handling Irregular Frequencies
In this section, we will discuss cases where observations are not recorded at
consistent intervals.
#### Load data
We will use the daily stock prices of Palantir Technologies (PLTR) from 2020 to 2023.
The dataset includes data up to 2023-09-22, but for this tutorial, we exclude
all data from 2023-08-28 onward. This allows us to show how a custom frequency
handles days when the stock market is closed, such as Labor Day in the U.S.
> IMPORTANT NOTE: We use TimeGPT to predict a stock price in this tutorial only
> to demonstrate forecasting with irregular timestamps. **Stock prices are
> [random walks](https://otexts.com/fpppy/nbs/09-arima.html#random-walk-model)
> and, as such, cannot be predicted using traditional time series forecasting
> methods (including TimeGPT)**. The forecast for a random walk is a flat line in
> which tomorrow's price is predicted to equal today's price, which is not a
> useful model.
```python theme={null}
url = 'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/openbb/pltr.csv'
pltr_df = pd.read_csv(url, parse_dates=['date'])
pltr_df = pltr_df.query('date < "2023-08-28"')
pltr_df.head()
```
| | date | Open | High | Low | Close | Adj Close | Volume | Dividends | Stock Splits |
| -: | ---------: | ----: | ----: | ---: | ----: | --------: | --------: | --------: | -----------: |
| 0 | 2020-09-30 | 10.00 | 11.41 | 9.11 | 9.50 | 9.50 | 338584400 | 0.0 | 0.0 |
| 1 | 2020-10-01 | 9.69 | 10.10 | 9.23 | 9.46 | 9.46 | 124297600 | 0.0 | 0.0 |
| 2 | 2020-10-02 | 9.06 | 9.28 | 8.94 | 9.20 | 9.20 | 55018300 | 0.0 | 0.0 |
| 3 | 2020-10-05 | 9.43 | 9.49 | 8.92 | 9.03 | 9.03 | 36316900 | 0.0 | 0.0 |
| 4 | 2020-10-06 | 9.04 | 10.18 | 8.90 | 9.90 | 9.90 | 90864000 | 0.0 | 0.0 |
We will forecast the **adjusted closing price**, which represents the stock's
closing price adjusted for corporate actions such as stock splits, dividends,
and rights offerings. Hence, we will exclude the other columns from the dataset.
```python theme={null}
pltr_df = pltr_df[['date', 'Adj Close']]
nixtla_client.plot(
pltr_df,
time_col="date",
target_col="Adj Close"
)
```

#### Define the Frequency
To define a custom frequency, we first extract and sort the dates from the
input data, ensuring they are in datetime format. Next, we use the
`get_calendar` function from the
[`pandas_market_calendars`](https://pypi.org/project/pandas-market-calendars/)
package to obtain the New York Stock Exchange (NYSE) calendar. Using this
calendar, we can create a custom frequency that includes only the days the
stock market is open.
```python theme={null}
dates = pd.DatetimeIndex(sorted(pltr_df['date'].unique()))
nyse = mcal.get_calendar('NYSE')
```
Note that the trading days must cover all the dates in the input data plus the
forecast horizon. In this example, we forecast 7 trading days ahead, so our
trading days must include the last date in the input data as well as the next
7 valid trading days.
To avoid dealing with holidays or weekends during the forecast horizon, we will
specify an end date well beyond the forecast horizon. For this example, we will
use January 1, 2024, as a safe cutoff.
```python theme={null}
trading_days = nyse.valid_days(
start_date=dates.min(),
end_date="2024-01-01"
).tz_localize(None)
```
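To see why the calendar must extend past the last observation, the sketch below takes the next `h` trading days after the final input date and checks that enough of them exist. It stands in for the NYSE calendar with `pd.bdate_range` plus a hard-coded Labor Day holiday (an assumption for illustration), and it assumes the last date kept in `pltr_df` is Friday 2023-08-25. In the tutorial, `trading_days` comes from `pandas_market_calendars`.

```python
import pandas as pd

h = 7
last_date = pd.Timestamp('2023-08-25')  # assumed last date in pltr_df (a Friday)

# Stand-in calendar: weekdays through the cutoff, minus Labor Day 2023
trading_days = pd.bdate_range('2023-08-01', '2024-01-01').difference(
    pd.DatetimeIndex(['2023-09-04'])
)

# Trading days strictly after the last observation, limited to the horizon
future_days = trading_days[trading_days > last_date][:h]
if len(future_days) < h:
    raise ValueError('Calendar ends too early: extend end_date beyond the horizon')
print(future_days.strftime('%Y-%m-%d').tolist())
```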
Now, with the list of trading days, we can identify the days the stock market is
closed. These are all weekdays (Monday to Friday) within the range that are not
trading days. Using this information, we can define a custom frequency that skips
the stock market's closed days.
```python theme={null}
all_weekdays = pd.date_range(
start=dates.min(),
end="2024-01-01",
freq='B'
)
closed_days = all_weekdays.difference(trading_days)
custom_bday = pd.offsets.CustomBusinessDay(
holidays=closed_days
)
```
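A `CustomBusinessDay` offset can be checked without calling the API: generating seven dates with it should skip the weekend and the holiday. The sketch below hard-codes Labor Day 2023 as the only holiday (an assumption; the tutorial derives `closed_days` from the NYSE calendar) and should reproduce the forecast dates that TimeGPT returns later in this tutorial.

```python
import pandas as pd

# Custom business day that also skips Labor Day 2023 (hard-coded for illustration)
labor_day_bday = pd.offsets.CustomBusinessDay(
    holidays=pd.DatetimeIndex(['2023-09-04'])
)

# Seven steps starting on the first date after the training data
horizon_dates = pd.date_range(start='2023-08-28', periods=7, freq=labor_day_bday)
print(horizon_dates.strftime('%Y-%m-%d').tolist())
```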
#### Forecast with TimeGPT
With the custom frequency defined, we can now call the `forecast` method,
passing `custom_bday` as the `freq` argument. This makes the forecast respect
the trading schedule of the stock market.
```python theme={null}
fcst_pltr_df = nixtla_client.forecast(
df=pltr_df,
h=7,
freq=custom_bday,
time_col='date',
target_col='Adj Close',
level=[80, 95]
)
```
Finally, plot the forecast results:
```python theme={null}
nixtla_client.plot(
pltr_df,
fcst_pltr_df,
time_col="date",
target_col="Adj Close",
level=[80, 95],
max_insample_length=180
)
```

```python theme={null}
fcst_pltr_df[['date']].head(7)
```
| | date |
| -: | ---------- |
| 0 | 2023-08-28 |
| 1 | 2023-08-29 |
| 2 | 2023-08-30 |
| 3 | 2023-08-31 |
| 4 | 2023-09-01 |
| 5 | 2023-09-05 |
| 6 | 2023-09-06 |
Note that the forecast excludes 2023-09-04, which was a Monday when the stock
market was closed for Labor Day in the United States.
## Conclusion
Below are the key takeaways of this tutorial:
* TimeGPT can reliably infer regular frequencies, but you can override this by
setting the `freq` parameter to the corresponding pandas alias.
* When working with Polars DataFrames, you must always specify the frequency
using a valid Polars offset, such as `1d` or `1mo`.
* TimeGPT supports irregular frequencies: you can define a custom frequency,
and TimeGPT will generate forecasts only for the dates that frequency allows.
# Temporal Hierarchical Forecasting with TimeGPT
Source: https://nixtla.io/docs/forecasting/special-topics/temporal_hierarchical
Learn how to combine forecasts at different time frequencies to improve accuracy.
## What is Temporal Hierarchical Forecasting?
Temporal hierarchical forecasting is a technique that improves prediction accuracy by leveraging the structure of time series data across multiple temporal resolutions such as hourly, daily, weekly, and monthly.
Rather than modeling just one time scale, it generates forecasts at each level of the temporal hierarchy and then reconciles them to ensure consistency (e.g., the sum of hourly forecasts aligns with the daily total).
This approach captures both high-frequency variations and long-term trends, allowing for coherent forecasts across time scales.
It is particularly effective in domains like energy demand, retail sales, and transportation planning, where decisions depend on both granular and aggregated time-based insights.
## Tutorial
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/tutorials/23_temporalhierarchical.ipynb)
In this tutorial, we demonstrate how to use TimeGPT for temporal hierarchical forecasting. We use a dataset with an hourly frequency and create TimeGPT forecasts at both the hourly and the 2-hourly level, where the 2-hourly series is the hourly series aggregated over 2-hour windows. We then use temporal reconciliation techniques to improve the forecasting performance of TimeGPT.
### Step 1: Import and Initialize
Let's import `NixtlaClient` and initialize it with an API key.
```python theme={null}
import numpy as np
import pandas as pd
from utilsforecast.evaluation import evaluate
from utilsforecast.plotting import plot_series
from utilsforecast.losses import mae, rmse
from nixtla import NixtlaClient
# Initialize NixtlaClient
nixtla_client = NixtlaClient(
    # Alternatively, omit api_key to read it from the NIXTLA_API_KEY environment variable
    api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load and Prepare Data
First, let's read and process the dataset.
```python theme={null}
df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/electricity-short-with-ex-vars.csv'
)
df['ds'] = pd.to_datetime(df['ds'])
df_sub = df.query('unique_id == "DE"')
```
Next, let's split the data: the last 48 hours (from 2017-12-29 onward) form the test set.
```python theme={null}
df_train = df_sub.query('ds < "2017-12-29"')
df_test = df_sub.query('ds >= "2017-12-29"')
df_train.shape, df_test.shape
```
```bash theme={null}
((1632, 12), (48, 12))
```
Let's visualize the train and test splits to make sure that they are as expected.
```python theme={null}
plot_series(
df_train[['unique_id', 'ds', 'y']][-200:],
forecasts_df=df_test[['unique_id', 'ds', 'y']].rename(columns={'y': 'test'})
)
```

### Step 3: Temporal Hierarchical Forecasting
#### Temporal Aggregation
We want forecasts for both the hourly and the 2-hourly windows. We generate
these forecasts with TimeGPT and then apply hierarchical forecasting techniques
to improve the accuracy of each forecast.
We first define the temporal aggregation spec: a dictionary in which each key
is the name of an aggregation and each value is the number of bottom-level
timesteps aggregated in it. In this example, we use a 2-hour period and a
1-hour period (the bottom level).
```python theme={null}
spec_temporal = { "2-hour-period": 2, "1-hour-period": 1 }
```
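Conceptually, the `2-hour-period` level sums each consecutive pair of hourly values and stamps the result with the end of its window. The pandas sketch below mimics that on four hourly observations: the first two values (19.10 and 19.03) come from the `Y_train` excerpt shown later in this step, and the other two are made up. It illustrates the idea and is not the `aggregate_temporal` implementation.

```python
import numpy as np
import pandas as pd

hourly = pd.DataFrame({
    'ds': pd.date_range('2017-10-22 00:00', periods=4, freq=pd.offsets.Hour()),
    'y': [19.10, 19.03, 20.00, 21.00],  # last two values are illustrative
})

k = 2  # bottom-level steps per aggregate, as in spec_temporal["2-hour-period"]
window = np.arange(len(hourly)) // k  # window labels 0, 0, 1, 1

# Sum each window and label it with its last timestamp
two_hourly = hourly.groupby(window).agg(ds=('ds', 'last'), y=('y', 'sum'))
print(two_hourly)
```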
Next, we compute the temporally aggregated training and test sets using the
`aggregate_temporal` function from `hierarchicalforecast`. Note that the training
and test sets get separate aggregation matrices S (`S_train` and `S_test`),
because the test set contains temporal aggregations that are not in the
training set.
```python theme={null}
from hierarchicalforecast.utils import aggregate_temporal
Y_train, S_train, tags_train = aggregate_temporal(
df=df_train[['unique_id', 'ds', 'y']], spec=spec_temporal
)
Y_test, S_test, tags_test = aggregate_temporal(
df=df_test[['unique_id', 'ds', 'y']], spec=spec_temporal
)
```
`Y_train` contains our training data for both the 1-hour and 2-hour periods.
For example, the first rows of the training data show a 2-hour period ending
at 2017-10-22 01:00 and two 1-hour periods, the first ending at 2017-10-22 00:00
and the second at 2017-10-22 01:00, which is also when the first 2-hour period
ends.
Also, the ground-truth value `y` of the first 2-hour period is 38.13, which
equals the sum of the first two 1-hour periods (19.10 + 19.03). This shows how
the higher-frequency `1-hour-period` series has been aggregated into the
`2-hour-period` series.
```python theme={null}
Y_train.query("ds <= '2017-10-22 01:00:00'")
```
| | temporal\_id | unique\_id | ds | y |
| --- | --------------- | ---------- | ------------------- | ----- |
| 0 | 2-hour-period-1 | DE | 2017-10-22 01:00:00 | 38.13 |
| 816 | 1-hour-period-1 | DE | 2017-10-22 00:00:00 | 19.10 |
| 817 | 1-hour-period-2 | DE | 2017-10-22 01:00:00 | 19.03 |
The aggregation matrices `S_train` and `S_test` describe how the lowest temporal
granularity (hour) aggregates into the 2-hour periods. For example, the first
2-hour period, `2-hour-period-1`, is the sum of the first two 1-hour periods,
`1-hour-period-1` and `1-hour-period-2`, which we also verified above when
inspecting `Y_train`.
```python theme={null}
S_train.iloc[:5, :5]
```
| | temporal\_id | 1-hour-period-1 | 1-hour-period-2 | 1-hour-period-3 | 1-hour-period-4 |
| - | --------------- | --------------- | --------------- | --------------- | --------------- |
| 0 | 2-hour-period-1 | 1.0 | 1.0 | 0.0 | 0.0 |
| 1 | 2-hour-period-2 | 0.0 | 0.0 | 1.0 | 1.0 |
| 2 | 2-hour-period-3 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 2-hour-period-4 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 2-hour-period-5 | 0.0 | 0.0 | 0.0 | 0.0 |
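Because every 2-hour period covers two non-overlapping hours, the summing matrix has a simple block structure: one row with two adjacent ones per 2-hour period, stacked on an identity matrix for the hourly (bottom) level. The NumPy sketch below builds it for four hours, with the 2-hour rows first as in the excerpt above; the sizes are chosen for illustration.

```python
import numpy as np

n_hours, k = 4, 2  # bottom-level steps and steps per aggregate

# One row of k ones per aggregate period, placed block-diagonally
top = np.kron(np.eye(n_hours // k), np.ones((1, k)))
# The bottom level maps to itself
bottom = np.eye(n_hours)
S = np.vstack([top, bottom])
print(S)
```

The first two rows match `2-hour-period-1` and `2-hour-period-2` in the `S_train` excerpt.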
#### Computing Base Forecasts with TimeGPT
Now, we compute the **base forecasts** with TimeGPT for each temporal
aggregation in `Y_train`.
Note that both the frequency and the horizon differ between temporal
aggregations. In this example, the lowest level has an hourly frequency and a
horizon of `48`, so the `2-hour-period` aggregation has a 2-hourly frequency
and a horizon of `24`.
```python theme={null}
Y_hats = []
id_cols = ["unique_id", "temporal_id", "ds", "y"]
for level, temporal_ids_train in tags_train.items():
    # Select the training and test rows that belong to this temporal level
    Y_level_train = Y_train.query("temporal_id in @temporal_ids_train")
    temporal_ids_test = tags_test[level]
    Y_level_test = Y_test.query("temporal_id in @temporal_ids_test")
    # Each level has its own frequency (hourly or 2-hourly) and horizon (48 or 24)
    freq_level = pd.infer_freq(Y_level_train["ds"].unique())
    horizon_level = Y_level_test["ds"].nunique()
    # Base forecast for this level, with the inferred frequency passed explicitly
    Y_hat_level = nixtla_client.forecast(
        df=Y_level_train[["ds", "unique_id", "y"]],
        h=horizon_level,
        freq=freq_level,
    )
    # Attach temporal_id and the actuals, then put the id columns first
    Y_hat_level = Y_hat_level.merge(Y_level_test, on=["ds", "unique_id"], how="left")
    Y_hat_cols = id_cols + [col for col in Y_hat_level.columns if col not in id_cols]
    Y_hat_level = Y_hat_level[Y_hat_cols]
    Y_hats.append(Y_hat_level)
Y_hat = pd.concat(Y_hats, ignore_index=True)
```
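The loop derives each level's frequency and horizon from the data rather than hard-coding them. The sketch below shows what those two computations return for synthetic timestamps shaped like the test window: 48 hourly steps from 2017-12-29 00:00, with 2-hour periods stamped on odd hours as in the tables in this step. The exact alias string from `pd.infer_freq` (`2h` or `2H`) depends on your pandas version, so the check compares durations instead.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Bottom level: 48 hourly timestamps in the test window
hourly_ds = pd.date_range('2017-12-29 00:00', periods=48, freq=pd.offsets.Hour())
# 2-hour level: each period is stamped with its last hour (01:00, 03:00, ...)
two_hour_ds = hourly_ds[1::2]

for name, ds in [('1-hour-period', hourly_ds), ('2-hour-period', two_hour_ds)]:
    freq = pd.infer_freq(ds)  # alias spelling varies by pandas version
    horizon = ds.nunique()    # number of distinct future timestamps
    print(name, freq, horizon)
```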
Observe that `Y_hat` contains all the forecasts but they are not coherent
with each other. For example, consider the forecasts for the first time
period of both frequencies.
| | unique\_id | temporal\_id | ds | y | TimeGPT |
| -: | ---------: | --------------: | ------------------: | ----: | --------- |
| 0 | DE | 2-hour-period-1 | 2017-12-29 01:00:00 | 10.45 | 16.949455 |
| 24 | DE | 1-hour-period-1 | 2017-12-29 00:00:00 | 9.73 | -0.241482 |
| 25 | DE | 1-hour-period-2 | 2017-12-29 01:00:00 | 0.72 | -3.456478 |
The ground-truth value `y` for the first 2-hour period is 10.45, and the sum
of the ground-truth values for the first two 1-hour periods is 9.73 + 0.72 =
10.45. Hence, the actual values are coherent with each other.
However, the forecast for the first 2-hour period is 16.95, while the sum of
the forecasts for the first two 1-hour periods is -0.24 + (-3.46) = -3.70.
These forecasts are clearly not coherent with each other.
We will use reconciliation techniques to make these forecasts coherent and to
improve their accuracy.
#### Forecast Reconciliation
We can use the `HierarchicalReconciliation` class to reconcile the forecasts.
In this example, we use `MinTrace` with the `wls_struct` method. Note that we
have to set `temporal=True` when calling `reconcile`.
> NOTE: The `S` parameter was renamed to `S_df` in `hierarchicalforecast`.
> Make sure you pass `S_df` when calling `reconcile`.
```python theme={null}
from hierarchicalforecast.methods import MinTrace
from hierarchicalforecast.core import HierarchicalReconciliation
reconcilers = [MinTrace(method="wls_struct")]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)
Y_rec = hrec.reconcile(Y_hat_df=Y_hat, S_df=S_test, tags=tags_test, temporal=True)
```
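With `wls_struct`, MinTrace weights each row by the number of bottom-level series it aggregates, here `W = diag(2, 1, 1)` for one 2-hour period and its two hours, and estimates the bottom level as `b = inv(S.T @ inv(W) @ S) @ S.T @ inv(W) @ y_hat`. Because each 2-hour period involves only its own two hours, the calculation can be done one block at a time. The NumPy sketch below applies this formula, as we understand the method, to the base TimeGPT forecasts for the first 2-hour period from the table above; it is an illustration, not the library's code.

```python
import numpy as np

# Rows: 2-hour-period-1, 1-hour-period-1, 1-hour-period-2
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

# Structural weights: number of bottom-level series each row aggregates
W_inv = np.diag(1.0 / S.sum(axis=1))

# Base TimeGPT forecasts for these three rows (from the table above)
y_hat = np.array([16.949455, -0.241482, -3.456478])

# Weighted least-squares estimate of the bottom level, then re-aggregate
bottom = np.linalg.solve(S.T @ W_inv @ S, S.T @ W_inv @ y_hat)
y_rec = S @ bottom
print(y_rec.round(6))
```

The result should agree with the reconciled values reported for this period in the comparison table at the end of Step 4 (about 6.6257, 4.9204 and 1.7054).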
### Step 4: Evaluation
The `HierarchicalForecast` package includes an `evaluate` function that scores
the forecasts at each level of the hierarchy.
We evaluate the temporally aggregated forecasts across **all temporal aggregations**.
```python theme={null}
import hierarchicalforecast.evaluation as hfe
evaluation = hfe.evaluate(
df=Y_rec.drop(columns='unique_id'),
tags=tags_test,
metrics=[mae],
id_col='temporal_id'
)
numeric_cols = evaluation.select_dtypes('number').columns
evaluation[numeric_cols] = evaluation[numeric_cols].map('{:.3}'.format).astype(float)
evaluation
```
| | level | metric | TimeGPT | TimeGPT/MinTrace\_method-wls\_struct |
| -: | ------------: | -----: | ------: | -----------------------------------: |
| 0 | 2-hour-period | mae | 25.2 | 12.00 |
| 1 | 1-hour-period | mae | 18.5 | 6.16 |
| 2 | Overall | mae | 20.8 | 8.12 |
Reconciliation improved the accuracy of TimeGPT's predictions at both levels:
MAE fell from 25.2 to 12.0 for the 2-hour periods and from 18.5 to 6.16 for
the 1-hour periods.
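MAE averages the absolute forecast errors, so, like RMSE, it is expressed in the units of `y`. The short sketch below defines it and computes the relative MAE reduction implied by the evaluation table: roughly 52% for the 2-hour level and 67% for the 1-hour level.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

print(mae([1, 2, 3], [2, 2, 5]))  # (1 + 0 + 2) / 3 = 1.0

# MAE per level from the evaluation table: (base TimeGPT, reconciled)
table = {'2-hour-period': (25.2, 12.00), '1-hour-period': (18.5, 6.16)}
for level, (base, rec) in table.items():
    print(f'{level}: {1 - rec / base:.0%} lower MAE after reconciliation')
```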
Visually, we can also verify the forecast is better after using reconciliation
techniques.
For the 1-hour-period forecasts:
```python theme={null}
plot_series(
Y_train.query(
"temporal_id in @tags_train['1-hour-period']"
)[["y", "ds", "unique_id"]].iloc[-100:],
forecasts_df=Y_rec.query("temporal_id in @tags_test['1-hour-period']").drop(columns=["temporal_id"])
)
```
and for the 2-hour period forecasts:
```python theme={null}
plot_series(
Y_train.query(
"temporal_id in @tags_train['2-hour-period']"
)[["y", "ds", "unique_id"]].iloc[-50:],
forecasts_df=Y_rec.query("temporal_id in @tags_test['2-hour-period']").drop(columns=["temporal_id"])
)
```
We can also verify that the reconciled forecasts are coherent with each other.
For the first 2-hour period, the reconciled forecast is 6.63, and the sum of
the reconciled forecasts for the first two 1-hour periods is 4.92 + 1.71 =
6.63. Hence, we now have forecasts that are both more accurate and coherent
across frequencies.
```python theme={null}
Y_rec.query(
"temporal_id in ['2-hour-period-1', '1-hour-period-1', '1-hour-period-2']"
)
```
| | unique\_id | temporal\_id | ds | y | TimeGPT | TimeGPT/MinTrace\_method-wls\_struct |
| -: | ---------: | --------------: | ------------------: | ----: | --------: | -----------------------------------: |
| 0 | DE | 2-hour-period-1 | 2017-12-29 01:00:00 | 10.45 | 16.949455 | 6.625748 |
| 24 | DE | 1-hour-period-1 | 2017-12-29 00:00:00 | 9.73 | -0.241482 | 4.920372 |
| 25 | DE | 1-hour-period-2 | 2017-12-29 01:00:00 | 0.72 | -3.456478 | 1.705376 |
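Coherence can also be checked numerically: for each 2-hour period, the aggregate forecast minus the sum of its two hourly forecasts should be (close to) zero. Applying a small hypothetical helper, written only for this illustration, to the values in the table above gives a gap of about 20.65 for the base forecasts and essentially zero after reconciliation.

```python
def coherence_gap(aggregate, parts):
    """Difference between an aggregate forecast and the sum of its components."""
    return aggregate - sum(parts)

# Values for 2-hour-period-1 and its two hours, copied from the table above
base_gap = coherence_gap(16.949455, [-0.241482, -3.456478])
rec_gap = coherence_gap(6.625748, [4.920372, 1.705376])
print(round(base_gap, 6), round(rec_gap, 6))
```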
## Conclusion
In this tutorial we have shown:
* How to create forecasts for multiple frequencies for the same dataset with TimeGPT
* How to improve the accuracy of these forecasts using temporal reconciliation techniques
Note that even though we created forecasts at two different frequencies, you
do not have to use the 2-hour-period forecasts. You can also use this technique
simply to improve the 1-hour-period forecasts.
# Quickstart (TimeGPT-2)
Source: https://nixtla.io/docs/forecasting/timegpt_2_family
Learn how to use TimeGPT-2 family of time series forecasting models
## TimeGPT-2 Family of Foundation Models
[TimeGPT-2](https://www.nixtla.io/blog/timegpt-2-announcement) and [TimeGPT-2.1](https://www.nixtla.io/blog/timegpt-2-1-announcement) are the latest versions of our enterprise-grade models, built to reliably solve mission-critical time-series problems. The TimeGPT-2 family of models is optimized for enterprise needs, prioritizing accuracy and stability with a privacy-first approach and full support for self-hosted and on-premises deployments.
## Set Up TimeGPT-2 family of models for Python Time Series Forecasting
### Step 1: Confirm Access and get an API Key
* Confirm with [support@nixtla.io](mailto:support@nixtla.io) that your account has access to these latest models.
* Get your API key from the [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/forecasting/timegpt_2_family). You can also use your existing keys, as long as your account has access to these models (as confirmed in the previous bullet).
* Sign in using Google, GitHub, or your email.
* Navigate to **API Keys** in the menu and select **Create New API Key**.
* Your new API key will appear on the screen. Copy this key and save it in a safe place for use later.

### Step 2: Install Nixtla
Install the Nixtla library in your preferred Python environment. In order to use the TimeGPT-2 family of models, the client version must be >= 0.7.0.
```bash theme={null}
pip install "nixtla>=0.7.0"
```
You can verify the installed client version with the following code. It should print a version >= 0.7.0.
```python theme={null}
from nixtla import __version__
print(__version__)
```
```bash theme={null}
0.7.2
```
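To fail fast in scripts, you can also check the installed version programmatically. The sketch below reads it with the standard library's `importlib.metadata` and compares the first three numeric components against 0.7.0. The simplified parsing is an assumption that does not handle pre-release suffixes such as `rc1`; the `packaging` library handles those properly.

```python
from importlib.metadata import PackageNotFoundError, version

def meets_minimum(installed, minimum=(0, 7, 0)):
    """Compare a 'major.minor.patch' string against a minimum version tuple."""
    parts = tuple(int(p) for p in installed.split('.')[:3])
    return parts >= minimum

try:
    installed = version('nixtla')
    print(installed, 'OK' if meets_minimum(installed) else 'too old for TimeGPT-2')
except PackageNotFoundError:
    print('nixtla is not installed')
```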
### Step 3: Import the Nixtla TimeGPT client
Import the Nixtla client and instantiate it with your API key and the base URL for the TimeGPT-2 family:
```python theme={null}
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
    base_url='https://api-preview.nixtla.io',  # Needed for TimeGPT-2 family
    api_key='my_api_key_provided_by_nixtla'
)
```
Verify the status and validity of your API key:
```python theme={null}
nixtla_client.validate_api_key()
```
```bash theme={null}
True
```
## Forecasting with TimeGPT-2 family
### Load your time series data
```python theme={null}
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.head()
```
| | timestamp | value |
| - | ---------- | ----- |
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |
### Generate the forecast
Forecast the next 12 months using the SDK's `forecast` method. You can set `model` to any model in the TimeGPT-2 family: `timegpt-2-pro`, `timegpt-2-lab`, `timegpt-2-mini`, or `timegpt-2.1`.
```python theme={null}
timegpt_fcst_df = nixtla_client.forecast(
df,
h=12,
time_col="timestamp",
target_col="value",
model="timegpt-2.1",
)
```
### Plot the forecast
```python theme={null}
nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
```

## Summary
Using the TimeGPT-2 family of models is similar to using the TimeGPT-1 family, with the following changes:
* Make sure that your account has access to these models.
* Install the latest Nixtla client (>= 0.7.0).
* Pass the TimeGPT-2 `base_url` along with your API key when instantiating the client.
Happy forecasting!
# Quickstart (TimeGPT-1)
Source: https://nixtla.io/docs/forecasting/timegpt_quickstart
Learn how to use TimeGPT for accurate time series forecasting
## TimeGPT-1 Family - Foundation Models for Time Series Forecasting
TimeGPT is a production-ready generative pretrained transformer for time series forecasting and predictions. It delivers accurate forecasts for retail sales, electricity demand, financial markets, and IoT sensor data with just a few lines of Python code. This quickstart guide will have you making your first forecast in under 5 minutes!
## Set Up TimeGPT for Python Time Series Forecasting
### Step 1: Get an API Key
* Visit the [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/forecasting/timegpt_quickstart) to activate your free trial and create an account.
* Sign in using Google, GitHub, or your email.
* Navigate to **API Keys** in the menu and select **Create New API Key**.
* Your new API key will appear on the screen. Copy this key and save it in a safe place for use later.

### Step 2: Install Nixtla
Install the Nixtla library in your preferred Python environment:
```bash theme={null}
pip install nixtla
```
### Step 3: Import the Nixtla TimeGPT client
Import the Nixtla client and instantiate it with your API key:
```python theme={null}
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 4: Verify your API key
Verify the status and validity of your API key:
```python theme={null}
nixtla_client.validate_api_key()
```
```bash theme={null}
True
```
For enhanced security practices, see our guide on
[Setting Up your API Key](/docs/setup/setting_up_your_api_key).
## Make Your First Time Series Forecast
We'll demonstrate TimeGPT's forecasting capabilities using the classic `AirPassengers` dataset, a monthly time series showing international airline passengers from 1949 to 1960.
```python theme={null}
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv')
df.head()
```
| | timestamp | value |
| - | ---------- | ----- |
| 0 | 1949-01-01 | 112 |
| 1 | 1949-02-01 | 118 |
| 2 | 1949-03-01 | 132 |
| 3 | 1949-04-01 | 129 |
| 4 | 1949-05-01 | 121 |
If you are using your own data, here are the data requirements:
* The target variable must not contain missing or non-numeric values.
* The timestamp column must not contain missing values.
* Date stamps must form a continuous sequence without gaps for the selected frequency.
* pandas must be able to parse the timestamp column as datetime objects (see the [pandas documentation](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html)).
For more details, visit [Data Requirements](/docs/data_requirements/data_requirements).
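Before sending your own data, you may want to check these requirements locally. The sketch below does that for a single series with pandas. `check_data_requirements` is a hypothetical helper written for this guide, not part of the Nixtla SDK, and it assumes you pass the pandas frequency alias your data should follow (such as `'MS'` for month-start data). For multiple series, run it once per `unique_id`.

```python theme={null}
import pandas as pd


def check_data_requirements(
    df: pd.DataFrame, time_col: str, target_col: str, freq: str
) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    # The target must be numeric and complete (non-numeric values become NaN here)
    target = pd.to_numeric(df[target_col], errors="coerce")
    if target.isna().any():
        problems.append(f"'{target_col}' has missing or non-numeric values")
    # The timestamp column must not contain missing values
    if df[time_col].isna().any():
        problems.append(f"'{time_col}' has missing values")
        return problems
    # pandas must be able to parse the timestamps as datetimes
    try:
        timestamps = pd.to_datetime(df[time_col])
    except (ValueError, TypeError):
        problems.append(f"'{time_col}' cannot be parsed as datetimes")
        return problems
    # Timestamps must form a continuous sequence at the given frequency
    expected = pd.date_range(timestamps.min(), timestamps.max(), freq=freq)
    missing = expected.difference(timestamps)
    if len(missing) > 0:
        problems.append(f"{len(missing)} timestamp(s) missing for freq='{freq}'")
    return problems
```

For the AirPassengers data loaded above, you would call `check_data_requirements(df, time_col='timestamp', target_col='value', freq='MS')`; an empty list means no problems were detected.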
Plot the dataset:
```python theme={null}
nixtla_client.plot(df, time_col='timestamp', target_col='value')
```

The `plot` method automatically displays figures in notebook environments. To save a plot locally:
```python theme={null}
fig = nixtla_client.plot(df, time_col='timestamp', target_col='value')
fig.savefig('plot.png', bbox_inches='tight')
```
## Real-World Forecasting Applications
TimeGPT excels at:
* **Retail forecasting**: Predict product demand and inventory needs
* **Energy forecasting**: Forecast electricity consumption and renewable energy production
* **Financial forecasting**: Project revenue, sales, and market trends
* **IoT predictions**: Anticipate sensor readings and equipment metrics
## Short and Long-Term Forecasting Examples
### Generate a longer-term forecast
Forecast the next 12 months using the SDK's `forecast` method:
```python theme={null}
timegpt_fcst_df = nixtla_client.forecast(
    df=df,
    h=12,
    freq='MS',
    time_col='timestamp',
    target_col='value'
)
timegpt_fcst_df.head()
```
Plot the forecast:
```python theme={null}
nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
```

You may also generate forecasts for longer horizons with the `timegpt-1-long-horizon` model. For example, 36 months ahead:
```python theme={null}
timegpt_fcst_df = nixtla_client.forecast(
    df=df,
    h=36,
    freq='MS',
    time_col='timestamp',
    target_col='value',
    model='timegpt-1-long-horizon'
)
timegpt_fcst_df.head()
```
Plot the forecast:
```python theme={null}
nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
```

### Generate a shorter-term forecast
Forecast the next 6 months with a single command:
```python theme={null}
timegpt_fcst_df = nixtla_client.forecast(
    df=df,
    h=6,
    freq='MS',
    time_col='timestamp',
    target_col='value'
)
```
Plot the forecast:
```python theme={null}
nixtla_client.plot(df, timegpt_fcst_df, time_col='timestamp', target_col='value')
```

## Frequently Asked Questions
### How accurate is TimeGPT for forecasting?
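In Nixtla's benchmarks, TimeGPT achieves state-of-the-art zero-shot accuracy across multiple domains, including retail, finance, and electricity. Accuracy on your own data depends on its patterns, so measure it with [cross-validation](/docs/forecasting/evaluation/cross_validation).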
### Can I use TimeGPT with my own time series data?
Yes, TimeGPT works with any time series data in pandas DataFrame format with a timestamp and target value column.
### How long does it take to generate forecasts?
TimeGPT typically generates forecasts in seconds, making it suitable for production environments.
## Next Steps
Now that you've made your first forecast, explore these tutorials to unlock TimeGPT's full capabilities:
* [Improve Accuracy](/docs/forecasting/improve_accuracy) - Advanced techniques to enhance forecast accuracy
* [Fine-Tuning](/docs/forecasting/fine-tuning/steps) - Customize TimeGPT for your specific data
* [Exogenous Variables](/docs/forecasting/exogenous-variables/numeric_features) - Include external variables in forecasts
* [Uncertainty Quantification](/docs/forecasting/probabilistic/introduction) - Generate prediction intervals and quantile forecasts
* [Cross-Validation](/docs/forecasting/evaluation/cross_validation) - Assess forecast performance
* [Forecasting at Scale](/docs/forecasting/forecasting-at-scale/computing_at_scale) - Process thousands of time series
* [Anomaly Detection](/docs/anomaly_detection/historical_anomaly_detection) - Identify outliers in your data
# About TimeGPT
Source: https://nixtla.io/docs/introduction/about_timegpt
Learn about TimeGPT - the foundation model for time series.
TimeGPT is a production-ready generative pretrained transformer model specifically designed for time series forecasting. It produces accurate forecasts across domains such as retail, electricity, finance, and IoT with minimal code. Below you'll find a high-level overview of its features, architecture, and practical examples.
You can access TimeGPT through:
* Self-hosted deployment on your infrastructure (recommended): [book a call](https://meetings.hubspot.com/cristian-challu/enterprise-contact-us?uuid=dc037f5a-d93b-4%5B…%5D90b-a611dd9460af\&utm_source=github\&utm_medium=pricing_page) for more information
* Hosted APIs: start your [free trial](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/about_timegpt)
* Azure Studio (TimeGEN-1)
Perform zero-shot inference out-of-the-box to forecast future values or detect anomalies. Fine-tune the model if you need more targeted performance.
For detailed instructions and advanced configurations, visit our
[Quickstart Guide](/docs/forecasting/timegpt_quickstart) and additional tutorials.
## Features and Capabilities
**[Zero-shot Inference](/docs/forecasting/timegpt_quickstart)**:
Generate forecasts and detect anomalies immediately without prior training. Quickly gain insights from your data.
**[Fine-tuning](/docs/forecasting/fine-tuning/steps)**:
Enhance prediction accuracy by training TimeGPT on your own datasets, tailoring it to your unique scenario.
**[API Access](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/about_timegpt)**:
Integrate forecasts into applications via a robust API. Easily obtain keys at the
[Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/about_timegpt).
Easily deploy TimeGPT in your own infrastructure or with any cloud provider using [Docker](/docs/setup/docker) or our Python [wheel file](/docs/setup/python_wheel).
Also accessible in [Azure Studio](/docs/setup/azureai) or through private deployment.
**[Add Exogenous Variables](/docs/forecasting/exogenous-variables/numeric_features)**:
Incorporate external variables (e.g., events, prices) to improve forecast accuracy.
**[Multiple Series Forecasting](/docs/forecasting/timegpt_quickstart)**:
Predict multiple time series at once, improving workflow efficiency.
**[Specific Loss Function](/docs/forecasting/fine-tuning/custom_loss)**:
Customize training with loss functions that match your performance objectives.
**[Cross-validation](/docs/forecasting/evaluation/cross_validation)**:
Evaluate model reliability and generalization with built-in cross-validation.
**[Prediction Intervals](/docs/forecasting/probabilistic/prediction_intervals)**:
Generate intervals to capture forecast uncertainty.
**[Irregular Timestamps](/docs/forecasting/special-topics/irregular_timestamps)**:
Forecast series with irregular timestamps, such as business-day data, by specifying the matching frequency.
**[Anomaly Detection](/docs/anomaly_detection/real-time/introduction)**:
Identify anomalies automatically, integrating external features for improved precision.
Get started quickly with the
[Quickstart guide](/docs/forecasting/timegpt_quickstart). Explore in-depth tutorials on TimeGPT capabilities and real-world applications.
## Architecture

TimeGPT's architecture builds on the self-attention mechanism introduced in the original ["Attention is All You Need"](https://arxiv.org/abs/1706.03762) paper. Unlike typical large language models (LLMs), TimeGPT is independently trained on extensive time series datasets to minimize forecasting errors.
TimeGPT employs an encoder-decoder structure with residual connections, layer normalization, and a linear output layer that maps the decoder outputs to the forecast dimensions. The attention mechanism helps the model capture diverse historical patterns and turn them into accurate predictions.
The model processes input sequences from left to right, similar to how humans read sentences, and predicts future values (*"tokens"*) based on historical windows of time series data.
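To make the idea of windows as tokens more concrete, the toy NumPy sketch below cuts a series into fixed-length windows and applies single-head causal scaled dot-product self-attention, the mechanism from the Transformer paper, so each window attends only to itself and earlier windows. It is an illustration under our own assumptions (non-overlapping windows, random rather than learned weights, and no normalization, residual connections, or decoder), not TimeGPT's architecture or code.

```python theme={null}
import numpy as np


def make_windows(series: np.ndarray, window: int) -> np.ndarray:
    """Split a 1-D series into consecutive non-overlapping windows ("tokens")."""
    n_tokens = len(series) // window
    # Drop the oldest points that do not fill a whole window
    trimmed = series[len(series) - n_tokens * window:]
    return trimmed.reshape(n_tokens, window)


def causal_self_attention(tokens: np.ndarray, d_model: int, seed: int = 0) -> np.ndarray:
    """Single-head scaled dot-product attention with a causal (left-to-right) mask.

    Random projection matrices stand in for learned weights.
    """
    rng = np.random.default_rng(seed)
    w_q, w_k, w_v = (rng.normal(size=(tokens.shape[1], d_model)) for _ in range(3))
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(d_model)
    # Mask out future tokens: token i may attend only to tokens 0..i
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over each row
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v


# Toy usage: 24 observations -> 4 tokens of 6 observations each
series = np.arange(24, dtype=float)
tokens = make_windows(series, window=6)
context = causal_self_attention(tokens, d_model=8)
print(tokens.shape, context.shape)  # (4, 6) (4, 8)
```

Because of the causal mask, the first window's output depends only on that window, while the last window's output mixes information from the whole history.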
## Explore Examples and Use Cases
Quickly set up your workflow using our
[Quickstart Guide](/docs/forecasting/timegpt_quickstart)
or learn to use the API by
[setting up your API key](/docs/setup/setting_up_your_api_key).
* [Anomaly Detection](/docs/anomaly_detection/real-time/introduction)
* Fine-tuning with
[custom loss functions](/docs/forecasting/fine-tuning/custom_loss)
* Scaling workflows using
[Spark](/docs/forecasting/forecasting-at-scale/spark),
[Dask](/docs/forecasting/forecasting-at-scale/dask), or
[Ray](/docs/forecasting/forecasting-at-scale/ray)
* Integrating
[exogenous variables](/docs/forecasting/exogenous-variables/numeric_features),
validation with
[cross-validation](/docs/forecasting/evaluation/cross_validation),
and estimating uncertainty via
[quantile forecasts](/docs/forecasting/probabilistic/quantiles)
or
[prediction intervals](/docs/forecasting/probabilistic/prediction_intervals).
* [Web Traffic Forecasting](/docs/use_cases/forecasting_web_traffic)
* [Bitcoin Price Prediction](/docs/use_cases/bitcoin_price_prediction)
With TimeGPT, you can rapidly iterate from initial exploration to high-accuracy forecasting. Dive deeper into the comprehensive tutorials for more sophisticated workflows.
# TimeGPT FAQ
Source: https://nixtla.io/docs/introduction/faq
Frequently asked questions about TimeGPT
Get started with TimeGPT in minutes
Set up the Python SDK for TimeGPT
Review subscription plans and pricing
## Commonly asked questions
TimeGPT is the first foundation model for time series forecasting. It produces accurate forecasts for new time series across diverse domains using only historical values as inputs. The model reads time series data sequentially from left to right, similar to how humans read a sentence. It examines windows of past data as "tokens" and predicts what comes next based on identified patterns that extrapolate into the future. Beyond forecasting, TimeGPT supports other time series tasks, including what-if scenarios and anomaly detection.
TimeGPT is specifically designed for time series data, not text.
No, TimeGPT is not based on any large language model. While it follows the principle of training a large transformer model on a vast dataset, its architecture specifically handles time series data and minimizes forecasting errors.
To get started with TimeGPT, register for an account at the [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq). After confirming your signup via email, you can access your dashboard with your account details.
1. Create an account at the [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq).
2. Click the confirmation link in your email.
3. Find your API key in the dashboard under "API Keys".
4. Run `pip install nixtla` to install the Python SDK.
For a deeper understanding of TimeGPT, refer to the [research paper](https://arxiv.org/pdf/2310.03589.pdf). While some aspects of the model architecture remain confidential, registration for TimeGPT is open to everyone.
You can use TimeGPT through the Python SDK or the REST API.
```python Python SDK Forecast Example theme={null}
from nixtla import NixtlaClient
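# Initialize client with your API key
client = NixtlaClient(api_key="your_api_key")
# Make a 7-step forecast; df is a pandas DataFrame with 'ds' (timestamp)
# and 'y' (target) columns, the SDK's default column names
forecast = client.forecast(df, h=7)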
```
```bash REST API Forecast Example theme={null}
curl -X POST "https://api.nixtla.io/timegpt" \
-H "accept: application/json" \
-H "x-api-key: your_api_key" \
-H "Content-Type: application/json" \
-d '{"df": [{"ds": "2023-01-01", "y": 100}, ...], "h": 7}'
```
Both methods require an API key, obtained upon registration and available in your dashboard under "API Keys".
An API key is a unique string of characters that authenticates your requests when using the Nixtla SDK, ensuring only authorized users can make requests.
Your API key is personal and should not be shared with anyone or exposed in client-side code.
Upon registration, you receive an API key available in your [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq) under "API Keys". Keep your API key confidential.
To integrate your API key into your development workflow, refer to the [Setting Up Your API Key](/docs/setup/setting_up_your_api_key) tutorial.
```python Python API Key Example theme={null}
from nixtla import NixtlaClient
client = NixtlaClient(api_key="your_api_key")
```
```bash REST API Key Example theme={null}
curl -X POST "https://api.nixtla.io/timegpt" \
-H "accept: application/json" \
-H "x-api-key: your_api_key" \
-H "Content-Type: application/json" \
-d '{"df": [{"ds": "2023-01-01", "y": 100}, ...], "h": 7}'
```
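To keep the key out of your source code, you can read it from an environment variable. The helper below is a hypothetical sketch; the variable name `NIXTLA_API_KEY` is our assumption here, and the [Setting Up Your API Key](/docs/setup/setting_up_your_api_key) tutorial describes the conventions the SDK itself supports.

```python theme={null}
import os


def get_nixtla_api_key(env_var: str = "NIXTLA_API_KEY") -> str:
    """Read the API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Nixtla API key"
        )
    return key
```

You would then create the client with `NixtlaClient(api_key=get_nixtla_api_key())`, and the key never appears in your code or version control.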
Check your API key status with the [`validate_api_key` method](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-validate-api-key) of the `NixtlaClient` class.
```python Validate API Key Example theme={null}
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla')
nixtla_client.validate_api_key()
```
```bash Log Output theme={null}
INFO:nixtla.nixtla_client:Happy Forecasting! :), If you have questions or need support, please email support@nixtla.io
True
```
When you validate your API key and it returns `False`:
* If you are targeting an Azure endpoint, getting `False` from the `NixtlaClient.validate_api_key` method is expected. You can skip this step and proceed directly to forecasting.
* If you are not targeting an Azure endpoint, check the following:
  * Make sure you are using the latest version of the SDK (Python or R).
  * Check that your API key is active in your dashboard by visiting [https://nixtla.io/free-trial?utm\_source=nixtla.io\&utm\_campaign=/docs/introduction/faq](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq).
  * Consider any firewalls your organization might have, since they may restrict access. If so, whitelist our endpoint [https://api.nixtla.io/](https://api.nixtla.io/).
    * Whitelisting must be configured on your system so that it allows requests to our endpoint; Nixtla cannot do this on our side. For background, see this [overview on whitelisting](https://www.csoonline.com/article/569493/whitelisting-explained-how-it-works-and-where-it-fits-in-a-security-program.html).
    * If you work in an organization, contact your IT team, since they likely manage the security settings. If you run your own systems, you can update the settings yourself, depending on the system you use.
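To check whether a firewall is blocking the endpoint, you can test a plain TCP connection to it from the machine that runs your code. This standard-library sketch uses a hypothetical helper, `can_reach`; it only shows that the host is reachable on port 443, and it does not validate your API key or account for HTTP proxies.

```python theme={null}
import socket


def can_reach(host: str = "api.nixtla.io", port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts
        return False


print(can_reach())
```

If `can_reach()` returns `False` while other HTTPS sites work from the same machine, the endpoint is likely blocked; share that result with your IT team.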
At Nixtla, we take privacy and security very seriously. To ensure you understand our data policies, refer to these documents:
Our data privacy policies
Python SDK license
TimeGPT service terms
We offer a self-hosted version of TimeGPT that gives you complete control over your data: your data never leaves your premises. You can use either [Docker](/docs/setup/docker) or a [Python wheel file](/docs/setup/python_wheel). If you are interested in these options, contact us at `support@nixtla.io`.
Common errors and warnings
```python Invalid API Key Error theme={null}
ApiError: status_code: 401, body: {'data': None, 'message': 'Invalid API key', 'details': 'Key not found', 'code': 'A12', 'requestID': 'E7F2BBTB2P', 'support': 'If you have questions or need support, please email support@nixtla.io'}
```
This error occurs when your TimeGPT API key is invalid or not set up correctly. Use the `validate_api_key` method to verify it or check that you copied it correctly from the "API Keys" section of your [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq).
```python Too Many Requests Error theme={null}
ApiError: status_code: 429, body: {'data': None, 'message': 'Too many requests', 'details': 'You need to add a payment method to continue using the API, do so from https://nixtla.io/free-trial?utm_source=nixtla.io&utm_campaign=/docs/introduction/faq', 'code': 'A21', 'requestID': 'NCJDK7KSJ6', 'support': 'If you have questions or need support, please email support@nixtla.io'}
```
This error occurs when you have exhausted your free credits and need to add a payment method to continue using TimeGPT. Add a payment method in the "Billing" section of your [dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/faq).
A `WriteTimeout` error indicates the request exceeded allowable processing time. This commonly happens with large datasets. To fix this, increase the `num_partitions` parameter in the [`forecast` method](https://nixtlaverse.nixtla.io/nixtla/nixtla_client.html#nixtlaclient-forecast) of the `NixtlaClient` class, or use a distributed backend.
Get Help with TimeGPT
For more questions or support, reach out through one of our channels:
For technical questions or bugs
For general inquiries or support
Connect with our team and community
When reporting issues, include your API key status, SDK version, and sample code to help us assist you more quickly.
## Features & Capabilities
TimeGPT accepts pandas DataFrames in [long format](https://www.theanalysisfactor.com/wide-and-long-data/#comments) with these required columns:
* `ds`: Timestamp in format `YYYY-MM-DD` or `YYYY-MM-DD HH:MM:SS`
* `y`: The target variable to forecast
You can also pass a DataFrame with a DatetimeIndex without the `ds` column.
```python Example Input DataFrame theme={null}
import pandas as pd
# Create sample data
data = {
    'ds': ['2023-01-01', '2023-01-02', '2023-01-03'],
    'y': [10, 12, 15]
}
df = pd.DataFrame(data)
df['ds'] = pd.to_datetime(df['ds'])
print(df)
```
```
ds y
0 2023-01-01 10
1 2023-01-02 12
2 2023-01-03 15
```
TimeGPT also works with [distributed dataframes](/docs/forecasting/forecasting-at-scale/computing_at_scale) like `dask`, `spark`, and `ray`.
Yes, TimeGPT can forecast multiple time series simultaneously.
For guidance on forecasting multiple time series at once, consult the [Multiple Series](/docs/forecasting/timegpt_quickstart) tutorial.
```python Multiple Series Forecasting theme={null}
# Example of forecasting multiple series
from nixtla import NixtlaClient
# Initialize client
client = NixtlaClient(api_key="your_api_key")
# Assumes df has string columns 'store_id' and 'item_id';
# combine them into the 'unique_id' column that identifies each series
df['unique_id'] = df['store_id'] + '_' + df['item_id']
# Forecast all series at once, with 80% and 90% prediction intervals
forecast = client.forecast(df, h=7, level=[80, 90])
```
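The example above assumes `df` is already in long format with `store_id` and `item_id` columns. If your data instead has one column per series (wide format), you can reshape it first. Below is a minimal pandas sketch with made-up data, assuming the SDK's default column names `unique_id`, `ds`, and `y`.

```python theme={null}
import pandas as pd

# Hypothetical wide table: one column per (store, item) series
wide = pd.DataFrame({
    "ds": pd.date_range("2023-01-01", periods=3, freq="D"),
    "store1_itemA": [10, 12, 15],
    "store2_itemB": [7, 9, 8],
})
# Reshape to long format: one row per (series, timestamp)
long_df = wide.melt(id_vars="ds", var_name="unique_id", value_name="y")
long_df = (
    long_df[["unique_id", "ds", "y"]]
    .sort_values(["unique_id", "ds"])
    .reset_index(drop=True)
)
print(long_df)
```

Each original column becomes a separate series identified by `unique_id`, so `long_df` can be passed to `forecast` in the same way as the example above.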
Yes, TimeGPT can incorporate external variables into forecasts.
For instructions on incorporating exogenous variables to TimeGPT, see the [Exogenous Variables](/docs/forecasting/exogenous-variables/numeric_features) tutorial. For incorporating calendar dates, the [Holidays and Special Dates](https://docs.nixtla.io/docs/tutorials-holidays_and_special_dates) tutorial might help. For categorical variables, refer to the [Categorical Variables](https://docs.nixtla.io/docs/tutorials-categorical_variables) tutorial.
```python Exogenous Variables Forecast theme={null}
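# Forecasting with exogenous variables:
# df holds the target plus historical values of each exogenous column;
# exog_df holds future values of those columns for the next h periods
forecast = client.forecast(
    df,
    h=7,
    X_df=exog_df
)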
```
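The exogenous example above leaves the shapes of `df` and `exog_df` implicit. As we understand the convention described in the Exogenous Variables tutorial, `df` carries the target plus the historical values of each exogenous column, while `X_df` carries only the future values of those columns for the next `h` periods. The sketch below builds both frames for one series with a hypothetical `price` column and made-up data.

```python theme={null}
import pandas as pd

h = 3
# Historical data: target plus a hypothetical exogenous 'price' column
df = pd.DataFrame({
    "unique_id": "product_1",
    "ds": pd.date_range("2023-01-01", periods=6, freq="D"),
    "y": [10, 12, 15, 14, 13, 16],
    "price": [5.0, 5.0, 4.5, 4.5, 5.0, 5.0],
})
# Future values of the same exogenous column for the next h periods (no target)
future_ds = pd.date_range(df["ds"].max() + pd.Timedelta(days=1), periods=h, freq="D")
exog_df = pd.DataFrame({
    "unique_id": "product_1",
    "ds": future_ds,
    "price": [4.0, 4.0, 5.0],  # planned prices, known in advance
})
# forecast = client.forecast(df, h=h, X_df=exog_df)
```

The future frame covers exactly `h` timestamps after the last historical one, and its exogenous columns match those in `df`.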
Yes. To forecast historical data using TimeGPT, use cross-validation. See the full tutorial on [cross-validation](/docs/forecasting/evaluation/cross_validation).
```python Historical Forecast theme={null}
# Produce forecasts for past periods using rolling windows
historical_forecast = client.cross_validation(
    df,
    h=12,
    n_windows=11  # Number of historical windows to forecast
)
```
TimeGPT has no maximum forecast horizon, but performance decreases as the horizon increases. When the forecast horizon exceeds the data's seasonal length (for example, more than 12 months for monthly data), you will receive this message:
`WARNING:nixtla.nixtla_client:The specified horizon "h" exceeds the model horizon. This may lead to less accurate forecasts. Please consider using a smaller horizon`
For details, refer to the [Long Horizon in Time Series](/docs/forecasting/model-version/longhorizon_model) tutorial.
For best results, keep your forecast horizon within the seasonal pattern of your data.
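To catch this before calling the API, you can compare `h` with one seasonal cycle of your data. The hypothetical helper below encodes typical seasonal lengths for a few pandas frequency aliases (for example, 12 for month-start data and 7 for daily data). These values are our assumptions, not the thresholds the SDK uses internally.

```python theme={null}
# Typical seasonal lengths for a few pandas frequency aliases (assumed values)
SEASON_LENGTH = {"h": 24, "D": 7, "W": 52, "MS": 12, "QS": 4}


def horizon_exceeds_season(h: int, freq: str) -> bool:
    """Return True if h is longer than one seasonal cycle for `freq`."""
    if freq not in SEASON_LENGTH:
        raise ValueError(f"No seasonal length assumed for freq={freq!r}")
    return h > SEASON_LENGTH[freq]
```

For example, `horizon_exceeds_season(36, 'MS')` returns `True`, a case where the `timegpt-1-long-horizon` model is worth considering.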
Yes, TimeGPT includes anomaly detection capabilities.
To learn how to use TimeGPT for anomaly detection, refer to the [Anomaly Detection](/docs/anomaly_detection/real-time/introduction) tutorial.
```python Anomaly Detection Example theme={null}
# Detect anomalies in time series
anomalies = client.detect_anomalies(df)
```
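A common way to turn prediction intervals into anomaly flags is to mark observations that fall outside the interval. The sketch below applies that rule to a DataFrame holding observed values and interval columns named in the SDK's `TimeGPT-lo-{level}` / `TimeGPT-hi-{level}` style. `flag_anomalies` is a hypothetical helper that illustrates the interval idea; it does not reproduce what `detect_anomalies` does internally.

```python theme={null}
import pandas as pd


def flag_anomalies(df: pd.DataFrame, level: int = 99, model: str = "TimeGPT") -> pd.Series:
    """Flag rows whose observed 'y' falls outside the model's prediction interval."""
    lo = df[f"{model}-lo-{level}"]
    hi = df[f"{model}-hi-{level}"]
    return (df["y"] < lo) | (df["y"] > hi)
```

A wider interval (higher `level`) flags fewer points, so the level acts as a sensitivity knob.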
Yes. To learn how to use TimeGPT for cross-validation, refer to the [Cross-Validation](/docs/forecasting/evaluation/cross_validation) tutorial.
```python Cross-Validation Example theme={null}
# Perform cross-validation
cv_results = client.cross_validation(
    df,
    h=7,
    n_windows=3,  # Number of validation windows
    step_size=7   # Number of periods between consecutive windows
)
```
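Cross-validation here is a rolling-origin backtest: the model forecasts several windows of `h` periods at the end of the series, and each window uses only the data before it. The sketch below computes where those windows fall for a single series of length `n_obs`. `rolling_cutoffs` is a hypothetical helper that illustrates the general scheme; it is not the SDK's internal code.

```python theme={null}
def rolling_cutoffs(n_obs: int, h: int, n_windows: int, step_size: int) -> list[tuple[int, int]]:
    """Return (train_end, test_end) index pairs for each validation window.

    Training uses observations [0, train_end); the window forecasts
    observations [train_end, test_end), where test_end = train_end + h.
    Windows are spaced step_size apart, and the last one ends at n_obs.
    """
    windows = []
    for i in range(n_windows):
        train_end = n_obs - h - (n_windows - 1 - i) * step_size
        if train_end <= 0:
            raise ValueError("Not enough observations for the requested windows")
        windows.append((train_end, train_end + h))
    return windows
```

For example, `rolling_cutoffs(100, h=7, n_windows=3, step_size=7)` returns `[(79, 86), (86, 93), (93, 100)]`: three back-to-back 7-period windows ending at the last observation. A `step_size` smaller than `h` would make the windows overlap.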
Yes. For more information, explore the [Prediction Intervals](/docs/forecasting/probabilistic/prediction_intervals) and [Quantile Forecasts](/docs/forecasting/probabilistic/quantiles) tutorials.
```python Prediction Intervals Example theme={null}
# Generate prediction intervals
forecast_with_intervals = client.forecast(
    df,
    h=7,
    level=[80, 90, 95]  # Prediction interval levels (%)
)
```
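A `level` of 80 asks for a central 80% prediction interval, which is bounded by the 10th and 90th percentiles of the forecast distribution. The small sketch below computes that mapping under the assumption of symmetric central intervals. It can help when relating `level` to the quantiles described in the Quantile Forecasts tutorial.

```python theme={null}
def level_to_quantiles(level: float) -> tuple[float, float]:
    """Map a central prediction-interval level (in %) to its lower/upper quantiles."""
    if not 0 < level < 100:
        raise ValueError("level must be strictly between 0 and 100")
    alpha = 1 - level / 100  # probability mass outside the interval
    return alpha / 2, 1 - alpha / 2
```

So `level=[80, 90, 95]` corresponds to the quantile pairs (0.1, 0.9), (0.05, 0.95), and (0.025, 0.975).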
Yes, TimeGPT works with distributed computing frameworks for large datasets.
For large datasets with hundreds of thousands or millions of time series, we recommend using a distributed backend. TimeGPT works with several [distributed computing frameworks](/docs/forecasting/forecasting-at-scale/computing_at_scale), including [Spark](/docs/forecasting/forecasting-at-scale/spark), [Ray](/docs/forecasting/forecasting-at-scale/ray), and [Dask](/docs/forecasting/forecasting-at-scale/dask).
```python Using Dask Example theme={null}
import dask.dataframe as dd
# Convert to Dask DataFrame
dask_df = dd.from_pandas(df, npartitions=4)
# Forecast using Dask backend
forecast = client.forecast(dask_df, h=7)
```
TimeGPT supports any amount of data for generating point forecasts and can produce results with just one observation per series. When using arguments such as `level`, `finetune_steps`, `X_df` (exogenous variables), or `add_history`, additional data points are necessary depending on data frequency. For more details, refer to the [Data Requirements](/docs/data_requirements/data_requirements) tutorial.
While TimeGPT can work with minimal data, more historical data typically produces better forecasts.
TimeGPT cannot handle missing values, and series with irregular timestamps require specifying the correct frequency.
For more information, see the [Forecasting Time Series with Irregular Timestamps](/docs/forecasting/special-topics/irregular_timestamps) and [Dealing with Missing Values](/docs/data_requirements/missing_values) tutorials.
The `NixtlaClient` class has a [`plot` method](/docs/reference/sdk_reference#nixtlaclient-plot) for visualizing forecasts. In interactive environments such as Jupyter notebooks, figures display automatically; in Python scripts, save the returned figure instead, for example with `fig.savefig('plot.png')`.
```python Plotting Forecast Example theme={null}
# Plot the historical data together with the forecast
client.plot(
    df=df,
    forecasts_df=forecast,
    level=[80, 95]  # Optional: intervals to show; must have been requested in forecast()
)
```
Currently, TimeGPT does not support polars.
Yes, TimeGPT produces consistent results for identical inputs.
TimeGPT is engineered for stability, ensuring consistent results for identical input data. Given the same dataset, the model will produce the same forecasts.
While not the primary use case for TimeGPT, it can generate solid results on simple data patterns like straight lines. Zero-shot predictions might not always meet expectations, but fine-tuning allows TimeGPT to quickly grasp trends and produce accurate forecasts. For more details, refer to the [Improve Forecast Accuracy with TimeGPT](/docs/forecasting/improve_accuracy) tutorial.
Fine-tuning improves TimeGPT's performance for your specific data patterns.
TimeGPT was trained on the largest collection of publicly available time series, covering domains including finance, retail, healthcare, and more. This broad training enables TimeGPT to produce accurate forecasts for new time series without additional training (zero-shot learning).
While the zero-shot model provides a solid baseline, TimeGPT performance often improves through fine-tuning. During this process, the TimeGPT model undergoes additional training using your specific dataset, starting from the pre-trained parameters.
```python Fine-tuning Example theme={null}
# Fine-tune with 100 steps
forecast = client.forecast(
    df,
    h=7,
    finetune_steps=100,
    finetune_loss="mse"  # Mean Squared Error
)
```
For a comprehensive guide on fine-tuning, refer to the [fine-tuning](/docs/forecasting/fine-tuning/steps) and [fine-tuning with a specific loss function](/docs/forecasting/fine-tuning/custom_loss) tutorials.
No, you do not need to fine-tune every series individually. When using the `finetune_steps` parameter, the model fine-tunes across all series in your dataset simultaneously. This cross-learning approach allows the model to learn from multiple series at once, which can improve individual forecasts.
Selecting the right number of fine-tuning steps may require experimentation. As fine-tuning steps increase, the model becomes more specialized to your dataset but takes longer to train and may become more prone to overfitting.
Yes, you can save and reuse fine-tuned models.
You can fine-tune the TimeGPT model, save it, and reuse it later. For detailed instructions, see our guide on [Re-using Fine-tuned Models](/docs/forecasting/fine-tuning/save_reuse_delete_finetuned_models).
```python Save Fine-tuned Model theme={null}
from nixtla import NixtlaClient
client = NixtlaClient(api_key="your_api_key")
# Fine-tune on your data; the fine-tuned model is saved
# and identified by the returned model ID
model_id = client.finetune(
    df,
    finetune_steps=100,
    output_model_id="my-finetuned-model"  # Optional: choose your own ID
)
```
```python Load Fine-tuned Model theme={null}
# Reuse the saved fine-tuned model later by passing its ID
forecast = client.forecast(
    new_df,
    h=7,
    finetuned_model_id=model_id
)
```
Need more help? Contact our [support team](mailto:support@nixtla.io).
# Introduction
Source: https://nixtla.io/docs/introduction/introduction
Welcome to TimeGPT - The foundational model for time series forecasting and anomaly detection
## Power your time series analysis with TimeGPT
TimeGPT is the first foundation model for time series, providing state-of-the-art forecasting and anomaly detection capabilities to help you make better decisions with your time series data.
Get started with TimeGPT in minutes with our simple Python interface
Set up your environment to start using TimeGPT right away
## Core Capabilities
Explore the powerful features that TimeGPT offers for your time series needs.
Generate accurate predictions for your time series data
Identify unusual patterns in historical data
Detect anomalies as they happen with online detection
## Learn & Explore
Enhance your skills with our comprehensive tutorials and use cases.
Practical guides to get the most out of TimeGPT
See how TimeGPT solves real business problems
Learn how to use TimeGPT with big data frameworks
Take your models further with fine-tuning and specialized techniques
## Resources
Find additional resources to help you succeed with TimeGPT.
Detailed SDK documentation for developers
Get answers to commonly asked questions
# TimeGPT Subscription Plans
Source: https://nixtla.io/docs/introduction/timegpt_subscription_plans
Overview of TimeGPT's Enterprise subscription plans with deployment options, support, and trial details.
## Overview
TimeGPT provides multiple Enterprise subscription plans that can be tailored
to meet your specific forecasting requirements. This includes customization of
API call limits, user seats, and varying levels of support.
* Scalable API calls to match your organization’s growth
* Flexible user access management
* High-level support options (email, chat, phone, or dedicated support)
We offer three main options to use TimeGPT:
The easiest option: you don't need to manage any infrastructure. Just call TimeGPT directly using any of our SDKs or plugins.
Host TimeGPT on Azure, managed by Nixtla.\
• Quick setup with minimal maintenance requirements.\
• Automatic updates and patches.\
• Ideal for teams wanting a fully managed solution.
Host TimeGPT in your own infrastructure.\
• Greater control over data and security.\
• Customizable configurations for specific compliance needs.\
• Ideal for organizations requiring on-premise or private cloud solutions.
## Get in Touch
If you'd like to explore custom plan options—for instance, adjusting the number of API calls, user limits, or support level—reach out to us at [support@nixtla.io](mailto:support@nixtla.io).\
You can also schedule a demo through this [link](https://meetings.hubspot.com/cristian-challu/enterprise-contact-us?uuid=dc037f5a-d93b-4%5B…%5D90b-a611dd9460af\&utm_source=github\&utm_medium=pricing_page) to see TimeGPT in action and discuss your needs in more detail.
**Free Trial Available!**\
When you [**create your account**](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/timegpt_subscription_plans), you receive a 30-day free trial with no credit card required. Your access expires after 30 days unless you upgrade to a paid plan. If you need more time to evaluate or want to continue using TimeGPT, please **contact us** for flexible plan options.
1. Visit [our dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/timegpt_subscription_plans) and sign up for a new account.
2. Log in and explore the TimeGPT interface to set up forecasting tasks, configure APIs, and invite team members.
3. Before the trial ends, decide if you’d like to upgrade to an Enterprise plan or contact us for a custom plan.
**Pricing and Billing Information**\
Additional pricing details and frequently asked questions can be found on our [FAQ page](/docs/introduction/faq).
Ready to see TimeGPT in action? Schedule a personalized demo to learn how
TimeGPT can enhance your forecasting capabilities.
Have questions, need custom requirements, or want more info on our plans and
deployments? Contact us any time.
# Why TimeGPT?
Source: https://nixtla.io/docs/introduction/why_timegpt
Understand the benefits of using TimeGPT for time series analysis.
## Why TimeGPT?
TimeGPT is a powerful, general-purpose time series forecasting solution. Throughout this notebook, we compare TimeGPT's performance against three popular forecasting approaches:
* Classical model (ARIMA)
* Machine learning model (LightGBM)
* Deep learning model (N-HiTS)
Below are three core benefits that our users value the most:
TimeGPT consistently outperforms traditional models by accurately capturing complex patterns.
Quickly generates forecasts with minimal training and tuning requirements per series.
Minimal setup and no complex preprocessing make TimeGPT immediately accessible for use.
## TimeGPT Advantage
TimeGPT delivers **superior results with minimal effort** compared to traditional approaches. In head-to-head testing against ARIMA, LightGBM, and N-HiTS models on M5 competition data, TimeGPT achieves the best accuracy metrics (**lowest RMSE at 592.6** and **lowest SMAPE at 4.94%**).
Unlike other models, which require:
* *Extensive preprocessing*
* *Parameter tuning*
* *Significant computational resources*
TimeGPT provides **powerful forecasting capabilities** with a simple API interface, making advanced time series analysis **accessible to users of all technical backgrounds**.
This notebook uses an aggregated subset from the M5 Forecasting Accuracy competition. The dataset:
* Consists of **7 daily time series**
* Has **1,941 observations** per series
* Reserves the last **28 observations** for evaluation on unseen data
```python Data Loading and Stats Preview theme={null}
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from nixtla import NixtlaClient
from utilsforecast.plotting import plot_series
from utilsforecast.losses import mae, rmse, smape
from utilsforecast.evaluation import evaluate
nixtla_client = NixtlaClient(
    # api_key='my_api_key_provided_by_nixtla'
)
df = pd.read_csv(
    'https://datasets-nixtla.s3.amazonaws.com/demand_example.csv',
    parse_dates=['ds']
)
# Display aggregated statistics per time series
df.groupby('unique_id').agg({
    "ds": ["min", "max", "count"],
    "y": ["min", "mean", "median", "max"]
})
```
Below is a preview of the aggregated statistics for each of the 7 time series.
| unique\_id | min date | max date | count | min y | mean y | median y | max y |
| ---------- | ---------- | ---------- | ----- | ----- | -------- | -------- | ------ |
| FOODS\_1 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 2674.086 | 2665.0 | 5493.0 |
| FOODS\_2 | 2011-01-29 | 2016-05-22 | 1941 | 0.0 | 4015.984 | 3894.0 | 9069.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
Next, we split our dataset into training and test sets. Here, we use data up to "2016-04-24" for training and the remaining data for testing.
```python Train-Test Split Example theme={null}
df_train = df.query('ds <= "2016-04-24"')
df_test = df.query('ds > "2016-04-24"')
print(df_train.shape, df_test.shape)
# (13391, 3) (196, 3)
```
TimeGPT is compared against three different modeling approaches. Each approach forecasts the final 28 days of the dataset, and we compare results using Root Mean Squared Error (RMSE) and Symmetric Mean Absolute Percentage Error (SMAPE).
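For reference, here is a minimal plain-Python sketch of the two metrics on a single series. It assumes the bounded SMAPE convention |y − ŷ| / (|y| + |ŷ|), which ranges from 0 to 1 and matches reading the 0.049403 value reported for TimeGPT as 4.94%. These functions are illustrative and are not the `utilsforecast.losses` implementations. In the comparison itself, `evaluate` scores each series separately, and the per-metric mean then averages over the seven series.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    squared = [(y - yhat) ** 2 for y, yhat in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def smape(y_true, y_pred):
    """Symmetric MAPE in [0, 1]: mean of |y - yhat| / (|y| + |yhat|).

    Assumption: this mirrors the bounded convention described above;
    multiply by 100 to read it as a percentage. Pairs where both values
    are 0 contribute 0 (also an assumption).
    """
    terms = []
    for y, yhat in zip(y_true, y_pred):
        denom = abs(y) + abs(yhat)
        terms.append(abs(y - yhat) / denom if denom else 0.0)
    return sum(terms) / len(terms)


# Hypothetical 3-step series: actuals vs. forecasts
actuals = [100.0, 120.0, 80.0]
forecast = [110.0, 120.0, 70.0]
print(round(rmse(actuals, forecast), 3))   # 8.165
print(round(smape(actuals, forecast), 4))  # 0.0381
```

RMSE penalizes large misses quadratically and is expressed in the units of `y`. SMAPE is scale-free, so it can be compared across series with very different sales volumes.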
TimeGPT offers a streamlined solution for time series forecasting with minimal setup.
```python TimeGPT Forecasting with NixtlaClient theme={null}
fcst_timegpt = nixtla_client.forecast(
df=df_train,
target_col='y',
h=28,
model='timegpt-1-long-horizon',
finetune_steps=10,
level=[90]
)
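# Join the forecasts with the held-out actuals, then score each series
test_timegpt = df_test.merge(fcst_timegpt, on=['unique_id', 'ds'], how='left')
evaluation_timegpt = evaluate(test_timegpt, metrics=[rmse, smape], models=['TimeGPT'])
# Average each metric across the 7 series
evaluation_timegpt.groupby(['metric'])['TimeGPT'].mean()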
# metric
# rmse 592.607378
# smape 0.049403
# Name: TimeGPT, dtype: float64
```
ARIMA is a common baseline for time series, though it often requires more data preprocessing and does not handle multiple series as efficiently.
```python ARIMA Forecasting Using StatsForecast theme={null}
from statsforecast import StatsForecast
from statsforecast.models import AutoARIMA
sf = StatsForecast(models=[AutoARIMA()], freq='D')
fcst_arima = sf.forecast(h=28, df=df_train)
# Evaluation methods omitted here for brevity
```
LightGBM is a popular gradient-boosted tree approach. However, careful feature engineering is typically required for optimal results.
```python LightGBM Modeling with AutoMLForecast theme={null}
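import optuna
from mlforecast.auto import AutoMLForecast, AutoLightGBM
# Silence Optuna's per-trial logging during the hyperparameter search
optuna.logging.set_verbosity(optuna.logging.ERROR)
mlf = AutoMLForecast(models=[AutoLightGBM()], freq='D', season_length=7)
# fit also needs the validation setup: number of windows, horizon, and number of trials
mlf.fit(df_train, n_windows=1, h=28, num_samples=10)
fcst_lgbm = mlf.predict(28)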
# Evaluation methods omitted here for brevity
```
N-HiTS is a deep learning architecture for time series. While powerful, it often requires GPU resources and more hyperparameter tuning.
```python N-HiTS Deep Learning Forecast theme={null}
from neuralforecast.core import NeuralForecast
from neuralforecast.models import NHITS
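# NHITS requires h and input_size; input_size=56 (two 28-day horizons) is an illustrative choice
nf = NeuralForecast(models=[NHITS(h=28, input_size=56)], freq='D')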
nf.fit(df=df_train)
fcst_nhits = nf.predict()
# Evaluation methods omitted here for brevity
```
Below is a summary of the performance metrics (RMSE and SMAPE) on the test dataset. TimeGPT achieves the lowest RMSE and the lowest SMAPE of the four models:
| Model | RMSE | SMAPE |
| -------- | ----- | ----- |
| ARIMA | 724.9 | 5.50% |
| LightGBM | 687.8 | 5.14% |
| N-HiTS | 605.0 | 5.34% |
| TimeGPT | 592.6 | 4.94% |
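
To make the margins in the table concrete, the snippet below computes TimeGPT's relative error reduction against each baseline, (baseline − TimeGPT) / baseline. It uses only the numbers from the table.

```python
# RMSE and SMAPE (%) copied from the results table
results = {
    "ARIMA": (724.9, 5.50),
    "LightGBM": (687.8, 5.14),
    "N-HiTS": (605.0, 5.34),
}
timegpt_rmse, timegpt_smape = 592.6, 4.94


def relative_reduction(baseline, candidate):
    """Fraction by which `candidate` lowers the error of `baseline`."""
    return (baseline - candidate) / baseline


for model, (base_rmse, base_smape) in results.items():
    print(
        f"{model:9s} RMSE -{relative_reduction(base_rmse, timegpt_rmse):.1%} "
        f"SMAPE -{relative_reduction(base_smape, timegpt_smape):.1%}"
    )
```

On these figures, TimeGPT's RMSE is about 18.3% lower than ARIMA's, 13.8% lower than LightGBM's, and 2.0% lower than N-HiTS's. Its SMAPE is about 10.2%, 3.9%, and 7.5% lower, respectively. The RMSE margin over N-HiTS is the smallest of the three.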


TimeGPT stands out with its accuracy, speed, and ease of use. Get started today by visiting the
[Nixtla dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/introduction/why_timegpt) to generate your
`api_key` and access advanced forecasting with minimal overhead.
# Date Features
Source: https://nixtla.io/docs/reference/date_features
Use holiday flags and special dates to improve forecast accuracy
Date features are an essential part of time series analysis. This document introduces two helper classes, `CountryHolidays` and `SpecialDates`, that generate holiday flags and custom date markers you can add to your TimeGPT inputs.
## Overview
* **CountryHolidays**: Attach holiday flags for one or more countries from a list of country codes.
* **SpecialDates**: Add flags for custom events or significant dates you define.
These classes help you enrich your time series datasets with relevant date-based signals. Use them alongside standard data preprocessing techniques to enhance your model's understanding of seasonality and special events.
#### CountryHolidays
> ```text theme={null}
> CountryHolidays (countries:list[str])
> ```
*Given a list of countries, returns a dataframe with holidays for each
country.*
```python theme={null}
import pandas as pd
from nixtla.date_features import CountryHolidays
c_holidays = CountryHolidays(countries=['US', 'MX'])
periods = 365 * 5
dates = pd.date_range(end='2023-09-01', periods=periods)
holidays_df = c_holidays(dates)
holidays_df.head()
```
| | US\_New Year's Day | US\_Memorial Day | US\_Independence Day | US\_Labor Day | US\_Veterans Day | US\_Veterans Day (observed) | US\_Thanksgiving | US\_Christmas Day | US\_Martin Luther King Jr. Day | US\_Washington's Birthday | ... | US\_Juneteenth National Independence Day (observed) | US\_Christmas Day (observed) | MX\_Año Nuevo | MX\_Día de la Constitución | MX\_Natalicio de Benito Juárez | MX\_Día del Trabajo | MX\_Día de la Independencia | MX\_Día de la Revolución | MX\_Transmisión del Poder Ejecutivo Federal | MX\_Navidad |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2018-09-03 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2018-09-04 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2018-09-05 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2018-09-06 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2018-09-07 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
***
#### SpecialDates
> ```text theme={null}
> SpecialDates (special_dates:dict[str,list[str]])
> ```
*Given a dictionary of categories and dates, returns a dataframe with
the special dates.*
```python theme={null}
import pandas as pd
from nixtla.date_features import SpecialDates
special_dates = SpecialDates(
    special_dates={
        'Important Dates': ['2021-02-26', '2020-02-26'],
        'Very Important Dates': ['2021-01-26', '2020-01-26', '2019-01-26']
    }
)
periods = 365 * 5
dates = pd.date_range(end='2023-09-01', periods=periods)
special_dates_df = special_dates(dates)
special_dates_df.head()
```
| | Important Dates | Very Important Dates |
| ---------- | --------------- | -------------------- |
| 2018-09-03 | 0 | 0 |
| 2018-09-04 | 0 | 0 |
| 2018-09-05 | 0 | 0 |
| 2018-09-06 | 0 | 0 |
| 2018-09-07 | 0 | 0 |
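
Conceptually, the output is one 0/1 indicator column per category, with a 1 on each listed date. The plain-Python sketch below reproduces that idea with a hypothetical `special_date_flags` helper. It is not part of the nixtla package and does not describe how `SpecialDates` is implemented internally. To stay dependency-free, it returns a dictionary keyed by day instead of a DataFrame.

```python
from datetime import date, timedelta


def special_date_flags(special_dates, start, periods):
    """Return {day: {category: 0 or 1}} for `periods` consecutive days from `start`.

    `special_dates` maps a category name to a list of ISO date strings,
    mirroring the dictionary passed to SpecialDates.
    """
    # Parse each category's dates once into a set for O(1) membership checks
    lookup = {
        category: {date.fromisoformat(d) for d in days}
        for category, days in special_dates.items()
    }
    flags = {}
    for offset in range(periods):
        day = start + timedelta(days=offset)
        flags[day] = {category: int(day in days) for category, days in lookup.items()}
    return flags


flags = special_date_flags(
    {
        "Important Dates": ["2021-02-26", "2020-02-26"],
        "Very Important Dates": ["2021-01-26", "2020-01-26", "2019-01-26"],
    },
    start=date(2021, 1, 25),
    periods=3,
)
print(flags[date(2021, 1, 26)])  # {'Important Dates': 0, 'Very Important Dates': 1}
```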
# SDK Reference
Source: https://nixtla.io/docs/reference/sdk_reference
***
## NixtlaClient
> ```text theme={null}
> NixtlaClient (api_key:Optional[str]=None, base_url:Optional[str]=None,
> timeout:Optional[int]=60, max_retries:int=6,
> retry_interval:int=10, max_wait_time:int=360)
> ```
*Client to interact with the Nixtla API.*
| | **Type** | **Default** | **Details** |
| --------------- | -------- | ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| api\_key         | Optional | None        | The API key used to authenticate with the Nixtla API. If not provided, the `NIXTLA_API_KEY` environment variable is used. |
| base\_url       | Optional | None        | Custom base URL. If not provided, the `NIXTLA_BASE_URL` environment variable is used. |
| timeout | Optional | 60 | Request timeout in seconds. Set this to `None` to disable it. |
| max\_retries    | int      | 6           | The maximum number of attempts to make when calling the API before giving up, i.e., how many times the client will retry a failed API call. The default of 6 means the client will attempt the API call up to 6 times in total. |
| retry\_interval | int      | 10          | The interval in seconds between consecutive retry attempts, i.e., the waiting period before the client calls the API again after a failed attempt. The default of 10 means the client waits 10 seconds between retries. |
| max\_wait\_time  | int      | 360         | The maximum total time in seconds that the client will spend on all retry attempts before giving up. This sets an upper limit on the cumulative waiting time; if it is exceeded, the client stops retrying and raises an exception. The default of 360 means the client ceases retrying once the total time spent on retries exceeds 360 seconds. The client throws a `ReadTimeout` error after 60 seconds of inactivity; to catch these errors, use `max_wait_time` >> 60. |
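
The three retry parameters work together. The client makes at most `max_retries` attempts, waits `retry_interval` seconds between them, and stops early once the cumulative wait would exceed `max_wait_time`. The sketch below illustrates that policy in plain Python. `call_with_retries` is a hypothetical helper, not the client's actual implementation. It also makes two assumptions: only the sleeps between attempts count toward the budget, and the budget is checked before each sleep.

```python
import time


def call_with_retries(fn, max_retries=6, retry_interval=10, max_wait_time=360,
                      sleep=time.sleep):
    """Call `fn` until it succeeds, following the documented retry parameters.

    Assumptions: only the sleeps between attempts count toward
    `max_wait_time`, and the budget is checked before each sleep.
    """
    waited = 0
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception:
            out_of_attempts = attempt == max_retries
            out_of_time = waited + retry_interval > max_wait_time
            if out_of_attempts or out_of_time:
                raise  # give up: re-raise the last error
            sleep(retry_interval)
            waited += retry_interval


# Usage: a hypothetical call that fails twice before succeeding
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


print(call_with_retries(flaky_request, retry_interval=0))  # ok
```

With the defaults (6 attempts, 10 seconds apart), this sketch hits the attempt limit first. It sleeps at most five times, for 50 seconds of waiting, which is well under the 360-second budget.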
***
## NixtlaClient.validate\_api\_key
> ```text theme={null}
> NixtlaClient.validate_api_key (log:bool=True)
> ```
*Check API key status.*
| | **Type** | **Default** | **Details** |
| ----------- | -------- | ----------- | ----------------------------- |
| log | bool | True | Show the endpoint’s response. |
| **Returns** | **bool** | | **Whether API key is valid.** |
***
## NixtlaClient.forecast
> ```text theme={null}
> NixtlaClient.forecast (df:~AnyDFType, h:typing.Annotated[int,Gt(gt=0)],
>     freq:Union[str,int,pandas._libs.tslibs.offsets.BaseOffset,NoneType]=None,
>     id_col:str='unique_id', time_col:str='ds', target_col:str='y',
>     X_df:Optional[~AnyDFType]=None,
>     level:Optional[list[Union[int,float]]]=None,
>     quantiles:Optional[list[float]]=None,
>     finetune_steps:typing.Annotated[int,Ge(ge=0)]=0,
>     finetune_depth:Literal[1,2,3,4,5]=1,
>     finetune_loss:Literal['default','mae','mse','rmse','mape','smape']='default',
>     finetuned_model_id:Optional[str]=None,
>     clean_ex_first:bool=True,
>     hist_exog_list:Optional[list[str]]=None,
>     validate_api_key:bool=False,
>     add_history:bool=False,
>     date_features:Union[bool,list[Union[str,Callable]]]=False,
>     date_features_to_one_hot:Union[bool,list[str]]=False,
>     model:Literal['azureai','timegpt-1','timegpt-1-long-horizon']='timegpt-1',
>     num_partitions:Optional[Annotated[int,Gt(gt=0)]]=None,
>     feature_contributions:bool=False)
> ```
*Forecast your time series using TimeGPT.*
| | **Type** | **Default** | **Details** |
| ---------------------------- | ------------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| df                           | AnyDFType     |             | The DataFrame on which the function will operate. Expected to contain at least the following columns: `time_col`, the column with the time indices of the time series (typically a datetime column with regular intervals, e.g., hourly, daily, or monthly data points), and `target_col`, the column with the target variable to predict or analyze. To pass multiple time series stacked in one DataFrame, also include `id_col`, the column whose unique values identify each series. |
| h | Annotated | | Forecast horizon. |
| freq                         | Union         | None        | Frequency of the timestamps. If `None`, it will be inferred automatically. See [pandas’ available frequencies](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). |
| id\_col | str | unique\_id | Column that identifies each series. |
| time\_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. |
| target\_col | str | y | Column that contains the target. |
| X\_df                        | Optional      | None        | DataFrame with \[`unique_id`, `ds`] columns and the future values of `df`’s exogenous variables. |
| level | Optional | None | Confidence levels between 0 and 100 for prediction intervals. |
| quantiles                    | Optional      | None        | Quantiles to forecast, as a list of values in (0, 1). Do not use `level` and `quantiles` together. The output DataFrame has one column per quantile, named `TimeGPT-q-(100 * q)`; expressing each quantile as a percentile avoids dots in column names. |
| finetune\_steps              | Annotated     | 0           | Number of steps used to fine-tune TimeGPT on the new data. |
| finetune\_depth              | Literal       | 1           | The depth of the fine-tuning, on a scale from 1 to 5, where 1 means little fine-tuning and 5 means that the entire model is fine-tuned. |
| finetune\_loss | Literal | default | Loss function to use for finetuning. Options are: `default`, `mae`, `mse`, `rmse`, `mape`, and `smape`. |
| finetuned\_model\_id | Optional | None | ID of previously fine-tuned model to use. |
| clean\_ex\_first | bool | True | Clean exogenous signal before making forecasts using TimeGPT. |
| hist\_exog\_list | Optional | None | Column names of the historical exogenous features. |
| validate\_api\_key | bool | False | If True, validates api\_key before sending requests. |
| add\_history | bool | False | Return fitted values of the model. |
| date\_features               | Union         | False       | Features computed from the dates. Can be pandas date attributes or functions that take the dates as input. If True, automatically adds the most commonly used date features for the frequency of `df`. |
| date\_features\_to\_one\_hot | Union         | False       | Apply one-hot encoding to these date features. If `date_features=True`, then all date features are one-hot encoded by default. |
| model                        | Literal       | timegpt-1   | Model to use as a string. Options are `timegpt-1` and `timegpt-1-long-horizon`. We recommend `timegpt-1-long-horizon` if you want to predict more than one seasonal period given the frequency of your data. |
| num\_partitions              | Optional      | None        | Number of partitions to use. If None, the number of partitions will be equal to the available parallel resources in distributed environments. |
| feature\_contributions | bool | False | |
| **Returns**                  | **AnyDFType** |             | **DataFrame with TimeGPT forecasts for point predictions and probabilistic predictions (if `level` or `quantiles` is not None).** |
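
`level` and `quantiles` are two ways of requesting probabilistic output, which is why they should not be combined. This sketch assumes the usual central-interval convention, which the table does not state: a level of L% corresponds to the quantile pair (100 − L)/200 and (100 + L)/200. For example, `level=[90]` spans quantiles 0.05 and 0.95. The sketch also builds column names in the `TimeGPT-q-(100 * q)` format described for `quantiles`. Both helpers are hypothetical. Rounding 100 * q to an integer is our assumption and only suits whole percentiles such as 0.1 or 0.95.

```python
def level_to_quantiles(level):
    """Map a central prediction-interval level (0-100) to its (lower, upper) quantiles."""
    if not 0 < level < 100:
        raise ValueError("level must be strictly between 0 and 100")
    return (100 - level) / 200, (100 + level) / 200


def quantile_column(q, model="TimeGPT"):
    """Name a quantile column as <model>-q-<100 * q>, avoiding dots in the name."""
    if not 0 < q < 1:
        raise ValueError("quantiles must lie strictly between 0 and 1")
    # round() absorbs float noise such as 100 * 0.1 == 10.000000000000002
    return f"{model}-q-{round(100 * q)}"


print(level_to_quantiles(90))                         # (0.05, 0.95)
print([quantile_column(q) for q in (0.1, 0.5, 0.9)])  # ['TimeGPT-q-10', 'TimeGPT-q-50', 'TimeGPT-q-90']
```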
***
## NixtlaClient.cross\_validation
> ```text theme={null}
> NixtlaClient.cross_validation (df:~AnyDFType, h:typing.Annotated[int,Gt(gt=0)],
>     freq:Union[str,int,pandas._libs.tslibs.offsets.BaseOffset,NoneType]=None,
>     id_col:str='unique_id', time_col:str='ds', target_col:str='y',
>     level:Optional[list[Union[int,float]]]=None,
>     quantiles:Optional[list[float]]=None,
>     validate_api_key:bool=False,
>     n_windows:typing.Annotated[int,Gt(gt=0)]=1,
>     step_size:Optional[Annotated[int,Gt(gt=0)]]=None,
>     finetune_steps:typing.Annotated[int,Ge(ge=0)]=0,
>     finetune_depth:Literal[1,2,3,4,5]=1,
>     finetune_loss:Literal['default','mae','mse','rmse','mape','smape']='default',
>     finetuned_model_id:Optional[str]=None,
>     refit:bool=True, clean_ex_first:bool=True,
>     hist_exog_list:Optional[list[str]]=None,
>     date_features:Union[bool,list[str]]=False,
>     date_features_to_one_hot:Union[bool,list[str]]=False,
>     model:Literal['azureai','timegpt-1','timegpt-1-long-horizon']='timegpt-1',
>     num_partitions:Optional[Annotated[int,Gt(gt=0)]]=None)
> ```
*Perform cross validation in your time series using TimeGPT.*
| | **Type** | **Default** | **Details** |
| ---------------------------- | ------------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| df                           | AnyDFType     |             | The DataFrame on which the function will operate. Expected to contain at least the following columns: `time_col`, the column with the time indices of the time series (typically a datetime column with regular intervals, e.g., hourly, daily, or monthly data points), and `target_col`, the column with the target variable to predict or analyze. To pass multiple time series stacked in one DataFrame, also include `id_col`, the column whose unique values identify each series. |
| h | Annotated | | Forecast horizon. |
| freq                         | Union         | None        | Frequency of the timestamps. If `None`, it will be inferred automatically. See [pandas’ available frequencies](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). |
| id\_col | str | unique\_id | Column that identifies each series. |
| time\_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. |
| target\_col | str | y | Column that contains the target. |
| level | Optional | None | Confidence level between 0 and 100 for prediction intervals. |
| quantiles                    | Optional      | None        | Quantiles to forecast, as a list of values in (0, 1). Do not use `level` and `quantiles` together. The output DataFrame has one column per quantile, named `TimeGPT-q-(100 * q)`; expressing each quantile as a percentile avoids dots in column names. |
| validate\_api\_key | bool | False | If True, validates api\_key before sending requests. |
| n\_windows | Annotated | 1 | Number of windows to evaluate. |
| step\_size                   | Optional      | None        | Step size between each cross validation window. If None, it will be equal to `h`. |
| finetune\_steps              | Annotated     | 0           | Number of steps used to fine-tune TimeGPT on the new data. |
| finetune\_depth              | Literal       | 1           | The depth of the fine-tuning, on a scale from 1 to 5, where 1 means little fine-tuning and 5 means that the entire model is fine-tuned. |
| finetune\_loss | Literal | default | Loss function to use for finetuning. Options are: `default`, `mae`, `mse`, `rmse`, `mape`, and `smape`. |
| finetuned\_model\_id | Optional | None | ID of previously fine-tuned model to use. |
| refit                        | bool          | True        | Fine-tune the model in each window. If `False`, only fine-tunes on the first window. Only used if `finetune_steps` > 0. |
| clean\_ex\_first | bool | True | Clean exogenous signal before making forecasts using TimeGPT. |
| hist\_exog\_list | Optional | None | Column names of the historical exogenous features. |
| date\_features               | Union         | False       | Features computed from the dates. Can be pandas date attributes or functions that take the dates as input. If True, automatically adds the most commonly used date features for the frequency of `df`. |
| date\_features\_to\_one\_hot | Union         | False       | Apply one-hot encoding to these date features. If `date_features=True`, then all date features are one-hot encoded by default. |
| model                        | Literal       | timegpt-1   | Model to use as a string. Options are `timegpt-1` and `timegpt-1-long-horizon`. We recommend `timegpt-1-long-horizon` if you want to predict more than one seasonal period given the frequency of your data. |
| num\_partitions              | Optional      | None        | Number of partitions to use. If None, the number of partitions will be equal to the available parallel resources in distributed environments. |
| **Returns** | **AnyDFType** | | **DataFrame with cross validation forecasts.** |
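
Together, `n_windows`, `step_size`, and `h` determine where each validation window starts. Below is a minimal sketch of the common rolling-origin layout. It assumes the last window ends at the final observation and that windows are spaced `step_size` observations apart, defaulting to `h` as the table states. `cv_cutoffs` is a hypothetical helper that works on positions rather than timestamps.

```python
def cv_cutoffs(n_obs, h, n_windows=1, step_size=None):
    """Return (train_end, test_end) index pairs, one per cross-validation window.

    Each window trains on observations [0, train_end) and forecasts
    observations [train_end, test_end), where test_end = train_end + h.
    The last window ends exactly at the final observation.
    """
    step = h if step_size is None else step_size
    windows = []
    for i in range(n_windows):
        train_end = n_obs - h - (n_windows - 1 - i) * step
        if train_end <= 0:
            raise ValueError("series too short for the requested windows")
        windows.append((train_end, train_end + h))
    return windows


# 1,941 daily observations and a 28-day horizon, as in the M5 example
print(cv_cutoffs(1941, h=28, n_windows=3))  # [(1857, 1885), (1885, 1913), (1913, 1941)]
```

With `step_size=7`, the same call gives overlapping windows, [(1899, 1927), (1906, 1934), (1913, 1941)], because each window advances by less than the horizon.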
***
## NixtlaClient.detect\_anomalies
> ```text theme={null}
> NixtlaClient.detect_anomalies (df:~AnyDFType,
>     freq:Union[str,int,pandas._libs.tslibs.offsets.BaseOffset,NoneType]=None,
>     id_col:str='unique_id', time_col:str='ds', target_col:str='y',
>     level:Union[int,float]=99,
>     finetuned_model_id:Optional[str]=None,
>     clean_ex_first:bool=True,
>     validate_api_key:bool=False,
>     date_features:Union[bool,list[str]]=False,
>     date_features_to_one_hot:Union[bool,list[str]]=False,
>     model:Literal['azureai','timegpt-1','timegpt-1-long-horizon']='timegpt-1',
>     num_partitions:Optional[Annotated[int,Gt(gt=0)]]=None)
> ```
*Detect anomalies in your time series using TimeGPT.*
| | **Type** | **Default** | **Details** |
| ---------------------------- | ------------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| df                           | AnyDFType     |             | The DataFrame on which the function will operate. Expected to contain at least the following columns: `time_col`, the column with the time indices of the time series (typically a datetime column with regular intervals, e.g., hourly, daily, or monthly data points), and `target_col`, the column with the target variable to predict or analyze. To pass multiple time series stacked in one DataFrame, also include `id_col`, the column whose unique values identify each series. |
| freq                         | Union         | None        | Frequency of the timestamps. If `None`, it will be inferred automatically. See [pandas’ available frequencies](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). |
| id\_col | str | unique\_id | Column that identifies each series. |
| time\_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. |
| target\_col | str | y | Column that contains the target. |
| level | Union | 99 | Confidence level between 0 and 100 for detecting the anomalies. |
| finetuned\_model\_id | Optional | None | ID of previously fine-tuned model to use. |
| clean\_ex\_first             | bool          | True        | Clean exogenous signal before making forecasts using TimeGPT. |
| validate\_api\_key | bool | False | If True, validates api\_key before sending requests. |
| date\_features               | Union         | False       | Features computed from the dates. Can be pandas date attributes or functions that take the dates as input. If True, automatically adds the most commonly used date features for the frequency of `df`. |
| date\_features\_to\_one\_hot | Union         | False       | Apply one-hot encoding to these date features. If `date_features=True`, then all date features are one-hot encoded by default. |
| model                        | Literal       | timegpt-1   | Model to use as a string. Options are `timegpt-1` and `timegpt-1-long-horizon`. We recommend `timegpt-1-long-horizon` if you want to predict more than one seasonal period given the frequency of your data. |
| num\_partitions              | Optional      | None        | Number of partitions to use. If None, the number of partitions will be equal to the available parallel resources in distributed environments. |
| **Returns** | **AnyDFType** | | **DataFrame with anomalies flagged by TimeGPT.** |
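
A common way to turn a prediction interval into anomaly flags, and the idea we assume lies behind the `level` parameter here, is to mark an observation as anomalous when it falls outside the interval at that confidence level. The sketch below applies that rule to precomputed bounds. `flag_anomalies` is hypothetical, and the sketch does not show how TimeGPT computes the bounds themselves.

```python
def flag_anomalies(actuals, lower, upper):
    """Return 1 where an actual value falls outside [lower, upper], else 0."""
    return [
        int(y < lo or y > hi)
        for y, lo, hi in zip(actuals, lower, upper)
    ]


# Hypothetical values: the third point is above its upper bound
print(flag_anomalies([10, 12, 30, 11], [8, 9, 9, 8], [14, 15, 16, 15]))  # [0, 0, 1, 0]
```

Raising `level` widens the interval, so fewer points are flagged. With the default of 99, only observations outside the 99% interval count as anomalies.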
***
## NixtlaClient.usage
> ```text theme={null}
> NixtlaClient.usage ()
> ```
*Query consumed requests and limits*
***
## NixtlaClient.finetune
> ```text theme={null}
> NixtlaClient.finetune (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.DataFrame],
>     freq:Union[str,int,pandas._libs.tslibs.offsets.BaseOffset,NoneType]=None,
>     id_col:str='unique_id', time_col:str='ds', target_col:str='y',
>     finetune_steps:typing.Annotated[int,Ge(ge=0)]=10,
>     finetune_depth:Literal[1,2,3,4,5]=1,
>     finetune_loss:Literal['default','mae','mse','rmse','mape','smape']='default',
>     output_model_id:Optional[str]=None,
>     finetuned_model_id:Optional[str]=None,
>     model:Literal['azureai','timegpt-1','timegpt-1-long-horizon']='timegpt-1')
> ```
*Fine-tune TimeGPT to your series.*
| | **Type** | **Default** | **Details** |
| -------------------- | --------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| df                   | Union     |             | The DataFrame on which the function will operate. Expected to contain at least the following columns: `time_col`, the column with the time indices of the time series (typically a datetime column with regular intervals, e.g., hourly, daily, or monthly data points), and `target_col`, the column with the target variable to predict or analyze. To pass multiple time series stacked in one DataFrame, also include `id_col`, the column whose unique values identify each series. |
| freq                 | Union     | None        | Frequency of the timestamps. If `None`, it will be inferred automatically. See [pandas’ available frequencies](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases). |
| id\_col | str | unique\_id | Column that identifies each series. |
| time\_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. |
| target\_col | str | y | Column that contains the target. |
| finetune\_steps      | Annotated | 10          | Number of steps used to fine-tune TimeGPT on the new data. |
| finetune\_depth      | Literal   | 1           | The depth of the fine-tuning, on a scale from 1 to 5, where 1 means little fine-tuning and 5 means that the entire model is fine-tuned. |
| finetune\_loss | Literal | default | Loss function to use for finetuning. Options are: `default`, `mae`, `mse`, `rmse`, `mape`, and `smape`. |
| output\_model\_id    | Optional  | None        | ID to assign to the fine-tuned model. If `None`, a UUID is used. |
| finetuned\_model\_id | Optional | None | ID of previously fine-tuned model to use as base. |
| model                | Literal   | timegpt-1   | Model to use as a string. Options are `timegpt-1` and `timegpt-1-long-horizon`. We recommend `timegpt-1-long-horizon` if you want to predict more than one seasonal period given the frequency of your data. |
| **Returns** | **str** | | **ID of the fine-tuned model** |
***
## NixtlaClient.finetuned\_models
> ```text theme={null}
> NixtlaClient.finetuned_models (as_df:bool=False)
> ```
*List fine-tuned models*
| | **Type** | **Default** | **Details** |
| ----------- | --------- | ----------- | -------------------------------------------------- |
| as\_df | bool | False | Return the fine-tuned models as a pandas dataframe |
| **Returns** | **Union** | | **List of available fine-tuned models.** |
***
## NixtlaClient.finetuned\_model
> ```text theme={null}
> NixtlaClient.finetuned_model (finetuned_model_id:str)
> ```
*Get fine-tuned model metadata*
| | **Type** | **Details** |
| -------------------- | ------------------ | ------------------------------------------------ |
| finetuned\_model\_id | str | ID of the fine-tuned model to get metadata from. |
| **Returns** | **FinetunedModel** | **Fine-tuned model metadata.** |
***
## NixtlaClient.delete\_finetuned\_model
> ```text theme={null}
> NixtlaClient.delete_finetuned_model (finetuned_model_id:str)
> ```
*Delete a previously fine-tuned model*
| | **Type** | **Details** |
| -------------------- | -------- | ----------------------------------------- |
| finetuned\_model\_id | str | ID of the fine-tuned model to be deleted. |
| **Returns** | **bool** | **Whether delete was successful.** |
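
The fine-tuning methods above can be combined so that a model is fine-tuned once and then reused. The minimal sketch below assumes `client` is a `NixtlaClient`. It fine-tunes with `finetune`, forecasts several horizons by passing the returned ID as `finetuned_model_id`, and then removes the model with `delete_finetuned_model`. The helper name and the cleanup policy are our own choices, and the sketch passes only parameters listed in the tables above.

```python
def forecast_with_finetuned_model(client, df, horizons, model_id, finetune_steps=10):
    """Fine-tune once, forecast several horizons with the stored model, then delete it.

    `client` is assumed to be a NixtlaClient, or any object exposing
    finetune, forecast, and delete_finetuned_model with these parameters.
    """
    # finetune returns the ID of the stored fine-tuned model
    finetuned_id = client.finetune(
        df=df, finetune_steps=finetune_steps, output_model_id=model_id
    )
    try:
        # Reuse the fine-tuned model for every horizon instead of re-tuning each call
        return {
            h: client.forecast(df=df, h=h, finetuned_model_id=finetuned_id)
            for h in horizons
        }
    finally:
        # Clean up so the model does not linger in the list of fine-tuned models
        client.delete_finetuned_model(finetuned_id)
```

For example, you could call `forecast_with_finetuned_model(nixtla_client, df_train, horizons=[7, 28], model_id="my-finetuned-model")`, where the model ID is a placeholder. Drop the `finally` clause to keep the model for later calls, and use `finetuned_models()` to list what is stored.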
***
## NixtlaClient.plot
> ```text theme={null}
> NixtlaClient.plot (df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.DataFrame,NoneType]=None,
>     forecasts_df:Union[pandas.core.frame.DataFrame,polars.dataframe.frame.DataFrame,NoneType]=None,
>     id_col:str='unique_id', time_col:str='ds', target_col:str='y',
>     unique_ids:Union[list[str],NoneType,numpy.ndarray]=None,
>     plot_random:bool=True, max_ids:int=8,
>     models:Optional[list[str]]=None,
>     level:Optional[list[Union[int,float]]]=None,
>     max_insample_length:Optional[int]=None,
>     plot_anomalies:bool=False,
>     engine:Literal['matplotlib','plotly','plotly-resampler']='matplotlib',
>     resampler_kwargs:Optional[dict]=None,
>     ax:Union[ForwardRef('plt.Axes'),numpy.ndarray,ForwardRef('plotly.graph_objects.Figure'),NoneType]=None)
> ```
*Plot forecasts and insample values.*
| | **Type** | **Default** | **Details** |
| --------------------- | -------- | ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| df                    | Union    | None        | The DataFrame on which the function will operate. Expected to contain at least the following columns: `time_col`, the column with the time indices of the time series (typically a datetime column with regular intervals, e.g., hourly, daily, or monthly data points), and `target_col`, the column with the target variable to predict or analyze. To pass multiple time series stacked in one DataFrame, also include `id_col`, the column whose unique values identify each series. |
| forecasts\_df | Union | None | DataFrame with columns \[`unique_id`, `ds`] and models. |
| id\_col | str | unique\_id | Column that identifies each series. |
| time\_col | str | ds | Column that identifies each timestep, its values can be timestamps or integers. |
| target\_col | str | y | Column that contains the target. |
| unique\_ids | Union | None | Time Series to plot.
If None, time series are selected randomly. |
| plot\_random | bool | True | Select time series to plot randomly. |
| max\_ids | int | 8 | Maximum number of ids to plot. |
| models | Optional | None | list of models to plot. |
| level | Optional | None | list of prediction intervals to plot if paseed. |
| max\_insample\_length | Optional | None | Max number of train/insample observations to be plotted. |
| plot\_anomalies | bool | False | Plot anomalies for each prediction interval. |
| engine | Literal | matplotlib | Library used to plot. ‘matplotlib’, ‘plotly’ or ‘plotly-resampler’. |
| resampler\_kwargs | Optional | None | Kwargs to be passed to plotly-resampler constructor.
For further custumization (“show\_dash”) call the method,
store the plotting object and add the extra arguments to
its `show_dash` method. |
| ax | Union | None | Object where plots will be added. |
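The table above expects long-format data. As a minimal sketch (assuming pandas and made-up values), the following builds a `df` with two stacked series in the default `unique_id`, `ds`, and `y` columns, plus a `forecasts_df` whose model column is named `TimeGPT` and whose interval columns follow the `<model>-lo-<level>` / `<model>-hi-<level>` naming used in TimeGPT forecast outputs (for example `TimeGPT-lo-80`). The series names, dates, and values are hypothetical.

```python
import pandas as pd

# Two hypothetical daily series stacked in one long-format DataFrame,
# using the default column names id_col='unique_id', time_col='ds', target_col='y'.
dates = pd.date_range("2024-01-01", periods=5, freq="D")
df = pd.concat(
    [
        pd.DataFrame({"unique_id": "series_a", "ds": dates, "y": [10.0, 12.0, 11.0, 13.0, 14.0]}),
        pd.DataFrame({"unique_id": "series_b", "ds": dates, "y": [5.0, 6.0, 5.5, 6.5, 7.0]}),
    ],
    ignore_index=True,
)

# A matching forecasts_df: one point-forecast column per model plus
# interval columns named "<model>-lo-<level>" and "<model>-hi-<level>".
future = pd.date_range("2024-01-06", periods=2, freq="D")
forecasts_df = pd.DataFrame(
    {
        "unique_id": ["series_a", "series_a", "series_b", "series_b"],
        "ds": list(future) * 2,
        "TimeGPT": [14.5, 15.0, 7.2, 7.4],
    }
)
# Hypothetical symmetric 80% interval, for illustration only.
forecasts_df["TimeGPT-lo-80"] = forecasts_df["TimeGPT"] - 1.0
forecasts_df["TimeGPT-hi-80"] = forecasts_df["TimeGPT"] + 1.0

print(df.head())
print(forecasts_df)
```

With an existing `NixtlaClient` instance, these frames could then be passed as `nixtla_client.plot(df, forecasts_df, models=['TimeGPT'], level=[80])`; each argument name comes from the signature above.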
# TimeGPT Excel Add-in (Beta)
Source: https://nixtla.io/docs/reference/timegpt_excel_add_in_beta_
Use TimeGPT from Microsoft Excel
## Installation
Head to the [TimeGPT Excel add-in page in Microsoft
AppSource](https://appsource.microsoft.com/en-us/product/office/WA200006429?tab=Overview)
and click **Get it now**.
## Usage
The TimeGPT Excel Add-in requires an access token. Get your API Key on
the [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/reference/timegpt_excel_add_in_beta_).
## Support
If you have questions or need support, please email `support@nixtla.io`.
## How-to
### Settings
If this is your first time using Excel add-ins, look up how to add them
in your version of Excel. In the Office Add-ins Store, search for “TimeGPT”.
Once installed, the TimeGPT add-in opens in a sidebar task pane:
* Read through the Welcome screen.
* Click the **Get Started** button.
* The API URL is already set to [https://api.nixtla.io](https://api.nixtla.io).
* Copy your API key from the [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/reference/timegpt_excel_add_in_beta_) and paste it into the box labeled **API Key, Bearer**.
* Click the gray arrow to the right of that box.
* You’ll reach a screen with options for ‘Forecast’ and ‘Anomaly Detection’.
To access the settings later, click the gear icon in the top left.
### Data Requirements
* Put your dates in one column and your values in another.
* Ensure your dates are recognized as valid dates by Excel.
* Ensure your values are recognized as valid numbers by Excel.
* All data inputs must be in the same worksheet. The add-in does
not support forecasting using multiple worksheets.
* Do not include headers.
Example:
| dates | values |
| :------------ | :----- |
| 12/1/16 0:00 | 72 |
| 12/1/16 1:00 | 65.8 |
| 12/1/16 2:00 | 59.99 |
| 12/1/16 3:00 | 50.69 |
| 12/1/16 4:00 | 52.58 |
| 12/1/16 5:00 | 65.05 |
| 12/1/16 6:00 | 80.4 |
| 12/1/16 7:00 | 200 |
| 12/1/16 8:00 | 200.63 |
| 12/1/16 9:00 | 155.47 |
| 12/1/16 10:00 | 150.91 |
#### Forecasting
Once you’ve configured your token and formatted your input data then
you’re all ready to forecast!
With the add-in open, configure the forecasting settings by selecting
the column for each input.
* **Frequency** - The frequency of the data (hourly / daily / weekly /
monthly)
* **Horizon** - The forecasting horizon. This represents the number of
time steps into the future that the forecast should predict.
* **Dates Range** - The column and range of the time series timestamps.
Must not include header data, and should be formatted as a range,
e.g. A2:A145.
* **Values Range** - The column and range of the time series values for
each point in time. Must not include header data, and should be
formatted as a range, e.g. B2:B145.
When you’re ready, click **Make Prediction** to generate the predicted
values. The add-in generates a plot, appends the forecasted values to
the end of your existing data column, and highlights them in green.
Scroll to the end of your data to see the predicted values.
#### Anomaly Detection
The requirements are the same as for forecasting, so if you have already
run a forecast, you are ready to run anomaly detection.
Go to the main page in the add-in and select “Anomaly Detection”, then
choose your dates and values cell ranges and click Submit. We’ll run
the model and mark the anomalous cells in yellow, while adding a third
column of expected values with a green background.
# TimeGPT in R
Source: https://nixtla.io/docs/reference/timegpt_in_r
Using TimeGPT for time series forecasting in the R programming language
## Introduction
**TimeGPT-1**: The first foundation model for time series forecasting and anomaly detection.
The `nixtlar` package is the R interface to TimeGPT, allowing you to perform state-of-the-art time series forecasting directly from R. TimeGPT is a production-ready, generative pretrained transformer for time series forecasting, developed by Nixtla. It can accurately forecast time series from various domains, such as retail, electricity, finance, and IoT, with just a few lines of code. It can also detect anomalies in time series data.
Version 0.6.2 of nixtlar is now available on CRAN! This version introduces support for TimeGEN-1, TimeGPT optimized for Azure, along with enhanced date support, business-day frequency inference, and various bug fixes.
## How to use
To learn how to use `nixtlar`, please refer to the
[documentation](https://nixtla.github.io/nixtlar/).
To view directly on CRAN, please use this
[link](https://cloud.r-project.org/web/packages/nixtlar/index.html).
The `nixtlar` package requires an API key. Get yours on the [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/reference/timegpt_in_r).
## Installation
```r theme={null}
# Install nixtlar from CRAN
install.packages("nixtlar")
# Then load it
library(nixtlar)
# Set your API key
nixtla_set_api_key(api_key = "Your API key here")
```
## Quick Example
```r theme={null}
# Load sample data
df <- nixtlar::electricity
head(df)
# Forecast the next 8 steps ahead
nixtla_client_fcst <- nixtla_client_forecast(df, h = 8, level = c(80,95))
# Optionally, plot the results
nixtla_client_plot(df, nixtla_client_fcst, max_insample_length = 200)
```
## Anomaly Detection Example
```r theme={null}
# Detect anomalies
nixtla_client_anomalies <- nixtlar::nixtla_client_detect_anomalies(df)
# Plot with anomalies highlighted
nixtlar::nixtla_client_plot(df, nixtla_client_anomalies, plot_anomalies = TRUE)
```
## Features and Capabilities
TimeGPT through the `nixtlar` package provides:
* **Zero-shot Inference**: Generate forecasts and detect anomalies with no prior training
* **Fine-tuning**: Enhance model performance for your specific datasets
* **Add Exogenous Variables**: Incorporate additional variables like special dates or events to improve accuracy
* **Multiple Series Forecasting**: Simultaneously forecast multiple time series
* **Custom Loss Function**: Tailor the fine-tuning process with specific performance metrics
* **Cross Validation**: Implement out-of-the-box validation techniques
* **Prediction Intervals**: Quantify uncertainty in your predictions
* **Irregular Timestamps**: Handle data with non-uniform intervals
## How to Cite
If you find TimeGPT useful for your research, please consider citing:
```
Garza, A., Challu, C., & Mergenthaler-Canseco, M. (2024). TimeGPT-1.
arXiv preprint arXiv:2310.03589. Available at
https://arxiv.org/abs/2310.03589
```
## Support
If you have questions or need support, please email `support@nixtla.io`.
TimeGPT is closed source. However, this SDK is open source and available under the Apache 2.0 License.
# TimeGEN-1 Quickstart (Azure)
Source: https://nixtla.io/docs/setup/azureai
Quickstart guide to deploy and use TimeGEN-1 on Azure with the Nixtla Python SDK for time series forecasting.
TimeGEN-1 is TimeGPT optimized for Azure infrastructure. It is a production-ready generative pretrained transformer for time series, capable of accurately forecasting time series from domains such as retail, electricity, finance, and IoT with minimal code.
Azure-native generative forecasting with TimeGEN-1 for streamlined deployments. Typical use cases include:
* Demand forecasting
* Electricity load prediction
* Financial time series
* IoT data analysis
1. Visit [ml.azure.com](https://ml.azure.com) and sign in (or create a Microsoft account if needed).
2. Click **Models** in the sidebar.
3. Search for **TimeGEN** in the catalog and select **TimeGEN-1**.
4. Click **Deploy** to create an endpoint.

5. Click **Endpoint** in the sidebar.
6. Copy the **base URL** and **API Key** shown for your TimeGEN-1 endpoint.

Install the **nixtla** package using pip:
```shell Install nixtla SDK theme={null}
pip install nixtla
```
Import the Nixtla client into your Python environment:
```python Import NixtlaClient theme={null}
from nixtla import NixtlaClient
```
Then create a client instance using your TimeGEN-1 endpoint credentials:
```python Instantiate NixtlaClient theme={null}
nixtla_client = NixtlaClient(
base_url="YOUR_BASE_URL",
api_key="YOUR_API_KEY"
)
```
In this example, we'll use the classic **AirPassengers** dataset to demonstrate forecasting. The dataset contains monthly totals of international airline passengers from 1949 to 1960.
```python Load AirPassengers dataset theme={null}
import pandas as pd
df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv'
)
df.head()
```
Use the Nixtla client to quickly visualize your data:
```python Visualize time series theme={null}
nixtla_client.plot(df, time_col='timestamp', target_col='value')
```

* Ensure the target column has no missing or non-numeric values.
* Avoid gaps in datestamps (for the given frequency) between the first and last timestamp; missing dates are not automatically imputed.
* Datestamps must be in a pandas-readable format ([see the pandas reference](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html)).
See [Data Requirements](/docs/data_requirements/data_requirements) for details.
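These checks can be run locally before calling the API. The following is a minimal pandas sketch, assuming monthly data at month start (`freq='MS'`) in the `timestamp`/`value` columns used in this guide. The helper name `check_series` and the sample data are hypothetical and not part of the Nixtla SDK.

```python
import pandas as pd


def check_series(df: pd.DataFrame, time_col: str, target_col: str, freq: str) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    # Datestamps must be parseable by pandas.
    ds = pd.to_datetime(df[time_col], errors="coerce")
    if ds.isna().any():
        problems.append(f"{int(ds.isna().sum())} unparseable timestamps")

    # Target must be present and numeric.
    y = pd.to_numeric(df[target_col], errors="coerce")
    if y.isna().any():
        problems.append(f"{int(y.isna().sum())} missing or non-numeric target values")

    # No gaps between the first and last timestamp at the given frequency.
    ds = ds.dropna()
    expected = pd.date_range(ds.min(), ds.max(), freq=freq)
    missing = expected.difference(pd.DatetimeIndex(ds))
    if len(missing) > 0:
        problems.append(f"{len(missing)} missing dates, first: {missing[0].date()}")

    return problems


# Hypothetical data with a gap (1949-03 is missing) and a non-numeric value.
sample = pd.DataFrame(
    {
        "timestamp": ["1949-01-01", "1949-02-01", "1949-04-01", "1949-05-01"],
        "value": [112, 118, "n/a", 121],
    }
)
print(check_series(sample, "timestamp", "value", "MS"))
```

Since missing dates are not imputed automatically, any gap reported here needs to be filled or otherwise handled before forecasting.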
In most notebook environments, figures display automatically. To save a figure locally, run:
```python Save plot figure theme={null}
fig = nixtla_client.plot(df, time_col='timestamp', target_col='value')
fig.savefig('plot.png', bbox_inches='tight')
```
Use the `forecast` method from the Nixtla client to forecast the next 12 months. The main arguments are:
* `df`: Pandas DataFrame with time series data
* `h`: Forecast horizon (number of steps ahead)
* `freq`: Time series frequency ([pandas frequency aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases))
* `time_col`: Name of timestamp column
* `target_col`: Name of forecast variable
```python Generate 12-month forecast theme={null}
timegen_fcst_df = nixtla_client.forecast(
df=df,
h=12,
freq='MS',
time_col='timestamp',
target_col='value'
)
timegen_fcst_df.head()
```
The client logs its validation and preprocessing steps as it calls the forecast endpoint:
```bash Forecast API call logs theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: MS
INFO:nixtla.nixtla_client:Restricting input...
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
```
Example output:
| | timestamp | TimeGPT |
| - | ---------- | ---------- |
| 0 | 1961-01-01 | 437.837921 |
| 1 | 1961-02-01 | 426.062714 |
| 2 | 1961-03-01 | 463.116547 |
| 3 | 1961-04-01 | 478.244507 |
| 4 | 1961-05-01 | 505.646484 |
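The `h` and `freq` arguments together determine which timestamps appear in the output. As an illustration using pandas only (no API call), the 12 month-start dates that follow the last AirPassengers observation, assumed here to be 1960-12-01, can be generated like this:

```python
import pandas as pd

last_observed = pd.Timestamp("1960-12-01")  # assumed last timestamp in df
h = 12
freq = "MS"  # month start, as in the forecast call above

# Generate h + 1 month-start dates beginning at the last observation,
# then drop the first one so only the h future dates remain.
future_dates = pd.date_range(start=last_observed, periods=h + 1, freq=freq)[1:]
print(future_dates[0].date(), future_dates[-1].date())  # 1961-01-01 1961-12-01
```

This matches the first rows of the example output, which start at 1961-01-01.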
Visualize the forecast results:
```python Visualize forecast results theme={null}
nixtla_client.plot(df, timegen_fcst_df, time_col='timestamp', target_col='value')
```

# Docker Image for TimeGPT
Source: https://nixtla.io/docs/setup/docker
Learn how to access TimeGPT via a Docker image
You can deploy TimeGPT in your own local infrastructure using our provided Docker image.
This solution is ideal for enterprise customers who wish to keep their data secure and give access to TimeGPT to everyone in the organization through their own cloud provider or local infrastructure.
Benefits of using the Docker image are:
* Cloud-agnostic installation
* Full control over the server's hardware (CPU only or with GPU), maintenance and uptime
* Data is secure as per your own guidelines
The Docker image is available for enterprise customers. To request access to TimeGPT and deploy it on your own local infrastructure, [book a call with us](https://meetings.hubspot.com/cristian-challu/enterprise-contact-us?uuid=7a3723cd-153e-4901-a81c-f6cee9d6a6a3\&utm_source=documentation\&utm_medium=setup-docs\&utm_campaign=docker-timegpt).
# Python Wheel for TimeGPT
Source: https://nixtla.io/docs/setup/python_wheel
Learn how to access TimeGPT via a Python wheel
Using TimeGPT through API calls might not be the optimal solution for your organization, as it implies sending data to external servers.
One way of respecting data security requirements is to use a Python wheel.
We can send you a custom Python wheel for your own needs, allowing you to install TimeGPT locally. That way, you can make forecasts and perform anomaly detection locally, without needing a server.
Benefits of using a Python wheel include:
* Local installation, so there is no need for a dedicated server
* Lower latency as there is no data transfer
* Data is secure as it never leaves your local machine
The Python wheel is available for enterprise clients. To request access via a Python wheel, [book a call with us](https://meetings.hubspot.com/cristian-challu/enterprise-contact-us?uuid=7a3723cd-153e-4901-a81c-f6cee9d6a6a3\&utm_source=documentation\&utm_medium=setup-docs\&utm_campaign=python-wheel-timegpt).
Once access is granted, we will send a Python wheel as well as all the necessary instructions to install and use TimeGPT locally.
# Setting up your API key
Source: https://nixtla.io/docs/setup/setting_up_your_api_key
Learn how to securely configure your Nixtla SDK API key using direct code or environment variables.
This tutorial explains how to set up your API key when using the Nixtla SDK. It covers both quick and secure methods to configure your API key directly in code or using environment variables. If you haven't done so yet, create an API Key in your [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/setup/setting_up_your_api_key).

## Overview
Your API key grants access to your Nixtla account and should be treated like a password. By securing it, you prevent unauthorized usage and protect your usage credits.
Your API key can be generated from your Nixtla Dashboard under the **API Keys** section. Make sure you copy the entire key with no extra spaces.
## How to configure your API key
The quickest method is to paste the key directly into your code. This approach is simple but not secure: your API key will be stored in your source code, visible to anyone with access to it.
**Step 1:** Copy your key from the [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/setup/setting_up_your_api_key).
**Step 2:** Paste the key into your Python code, for example:
```python NixtlaClient Initialization with API Key theme={null}
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(api_key='your_api_key_here')
```
Storing your API key in an environment variable is the recommended approach for security and ease of sharing code without exposing credentials.
This method requires setting an environment variable named `NIXTLA_API_KEY`. The Nixtla SDK automatically detects this environment variable without needing to manually pass it into `NixtlaClient`.
Open your terminal and use the `export` command:
```bash Setting Environment Variable Temporarily on Linux/Mac theme={null}
export NIXTLA_API_KEY=your_api_key
```
Open a PowerShell session and set the environment variable:
```powershell Setting Environment Variable Temporarily on Windows PowerShell theme={null}
$env:NIXTLA_API_KEY = "your_api_key"
```
After setting the variable, instantiate the `NixtlaClient` without specifying the key:
```python NixtlaClient Initialization Using Environment Variable theme={null}
from nixtla import NixtlaClient
nixtla_client = NixtlaClient()
```
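If you prefer your script to fail early with a clear message when the variable is missing, you can check it yourself before creating the client. This is a minimal sketch using only the standard library; `require_api_key` is a hypothetical helper, not part of the Nixtla SDK.

```python
import os


def require_api_key(var_name: str = "NIXTLA_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error."""
    # Strip stray whitespace picked up when copying the key.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell or add it to a .env file."
        )
    return key


# Usage (requires the nixtla package): NixtlaClient(api_key=require_api_key())
```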
Create a file named `.env` in the same directory as your Python script with the following content:
```env .env File Content theme={null}
NIXTLA_API_KEY=your_api_key
```
Then, in your Python script, load it with `load_dotenv` from the `python-dotenv` package (imported as `dotenv`):
```python Load Environment Variables with dotenv theme={null}
from dotenv import load_dotenv
load_dotenv()
from nixtla import NixtlaClient
nixtla_client = NixtlaClient()
```
Be sure not to commit your `.env` file to public repositories. Your API key grants access to your Nixtla account.
## Validate your API key
Use the `validate_api_key` method of `NixtlaClient` to confirm that you have correctly configured your API key. This method returns `True` if your API key is valid, or `False` otherwise:
```python Validate API Key Method theme={null}
nixtla_client.validate_api_key()
```
You do not need to validate your API key before every request. This method is a convenience function. To fully access **TimeGPT** functionality, ensure you have adequate credits by checking your [Nixtla Dashboard](https://nixtla.io/free-trial?utm_source=nixtla.io\&utm_campaign=/docs/setup/setting_up_your_api_key).
You've now learned how to configure your Nixtla API key through multiple methods, ranging from the simplest copy-and-paste approach to more secure environment variable setups. Remember to keep your API key confidential to prevent unauthorized usage.
# Forecasting Bitcoin Prices
Source: https://nixtla.io/docs/use_cases/bitcoin_price_prediction
Master Bitcoin price forecasting with TimeGPT. Complete Python tutorial covering cryptocurrency prediction, anomaly detection, uncertainty quantification, and risk management strategies.
## Introduction
[Time series forecasting](/docs/forecasting/timegpt_quickstart) is essential in finance for trading, risk management, and strategic planning. However, predicting financial asset prices remains challenging due to market volatility.
Whether you believe financial forecasting is possible or your role requires it, [TimeGPT](/docs/introduction/about_timegpt) simplifies the process.
This tutorial demonstrates how to use TimeGPT for Bitcoin price prediction and uncertainty quantification for risk management.
### Why Forecast Bitcoin Prices
Bitcoin (₿), the first decentralized cryptocurrency, records transactions on a blockchain. Bitcoins are mined by solving cryptographic tasks and can be used for payments, trading, or investment.
Bitcoin's volatility and popularity make price forecasting valuable for trading strategies and risk management.
### What You'll Learn
* How to load and prepare Bitcoin price data
* How to generate [short-term forecasts](/docs/forecasting/timegpt_quickstart) with TimeGPT
* How to visualize and interpret forecast results
* How to [detect anomalies](/docs/anomaly_detection/real-time/introduction) and add [exogenous variables](/docs/forecasting/exogenous-variables/numeric_features)
The procedures in this tutorial apply to many [financial asset forecasting](/docs/use_cases/forecasting_energy_demand) scenarios, not just Bitcoin.
## How to Forecast Bitcoin Prices with TimeGPT
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/use-cases/2_bitcoin_price_prediction.ipynb)
### Step 1: Load Bitcoin Price Data
Start by loading the Bitcoin price data:
```python theme={null}
import pandas as pd
# Load Bitcoin historical price data from 2020-2023
df = pd.read_csv(
'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/bitcoin_price_usd.csv',
sep=','
)
df.head()
```
| | Date | Close |
| - | ---------- | ----------- |
| 0 | 2020-01-01 | 7200.174316 |
| 1 | 2020-01-02 | 6985.470215 |
| 2 | 2020-01-03 | 7344.884277 |
| 3 | 2020-01-04 | 7410.656738 |
| 4 | 2020-01-05 | 7411.317383 |
This dataset includes daily Bitcoin closing prices (in USD) from 2020 to 2023. Because Bitcoin trades around the clock, "closing price" refers to the price recorded at a specific time each day, not a traditional market close.
Next, rename the columns to match TimeGPT's expected `ds` (date) and `y` (target) format.
```python theme={null}
# Rename columns to TimeGPT's expected format (ds=date, y=target value)
df.rename(columns={'Date': 'ds', 'Close': 'y'}, inplace=True)
```
### Step 2: Get Started with TimeGPT
Initialize the `NixtlaClient` with your Nixtla API key. To learn more about how to set up your API key, see [Setting up your API key](/docs/setup/setting_up_your_api_key).
```python theme={null}
from nixtla import NixtlaClient
# Initialize TimeGPT client with your API key
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 3: Visualize the Data
Before attempting any forecasting, it is good practice to visualize the data we want to predict. The `NixtlaClient` class includes a `plot` method for this purpose.
The `plot` method has an `engine` argument that lets you choose the plotting library. The default is `matplotlib`, but you can also use `plotly` for interactive plots.
```python theme={null}
# Visualize Bitcoin price history
nixtla_client.plot(df)
```

If you did not rename the columns, specify them explicitly:
```python theme={null}
nixtla_client.plot(
    df,
    time_col='Date',   # original date column name
    target_col='Close'  # original price column name
)
```
### Step 4: Forecast with TimeGPT
Now we are ready to generate predictions with TimeGPT. To do this, we will use the `forecast` method from the `NixtlaClient` class.
The `forecast` method takes the following arguments:
* `df`: The DataFrame containing the time series data
* `h`: (int) The forecast horizon. In this case, we will forecast the next 7 days.
* `level`: (list) The confidence levels for the prediction intervals. Given Bitcoin's inherent volatility, we will use multiple levels.
```python theme={null}
# Generate 7-day forecast with 50%, 80%, and 90% prediction intervals
level = [50, 80, 90]
fcst = nixtla_client.forecast(
df,
h=7, # Forecast horizon: 7 days
level=level # Confidence intervals for uncertainty quantification
)
fcst.head()
```
| | ds | TimeGPT | TimeGPT-lo-90 | TimeGPT-lo-80 | TimeGPT-lo-50 | TimeGPT-hi-50 | TimeGPT-hi-80 | TimeGPT-hi-90 |
| - | ---------- | ------------ | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| 0 | 2024-01-01 | 42269.460938 | 39567.209020 | 40429.953636 | 41380.654646 | 43158.267229 | 44108.968239 | 44971.712855 |
| 1 | 2024-01-02 | 42469.917969 | 39697.941669 | 40578.197049 | 41466.511361 | 43473.324576 | 44361.638888 | 45241.894268 |
| 2 | 2024-01-03 | 42864.078125 | 40538.871243 | 41586.252507 | 42284.316674 | 43443.839576 | 44141.903743 | 45189.285007 |
| 3 | 2024-01-04 | 42881.621094 | 40603.117448 | 41216.106493 | 42058.539392 | 43704.702795 | 44547.135694 | 45160.124739 |
| 4 | 2024-01-05 | 42773.457031 | 40213.699760 | 40665.384780 | 41489.812431 | 44057.101632 | 44881.529282 | 45333.214302 |
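Each `TimeGPT-lo-<level>` / `TimeGPT-hi-<level>` pair bounds the prediction interval for that level, so higher levels give wider intervals. As a quick illustration with pandas, using the first two rows of the output above copied by hand, you can compute the width of each interval:

```python
import pandas as pd

# First two rows of the forecast output shown above.
fcst = pd.DataFrame(
    {
        "ds": ["2024-01-01", "2024-01-02"],
        "TimeGPT": [42269.460938, 42469.917969],
        "TimeGPT-lo-90": [39567.209020, 39697.941669],
        "TimeGPT-lo-80": [40429.953636, 40578.197049],
        "TimeGPT-lo-50": [41380.654646, 41466.511361],
        "TimeGPT-hi-50": [43158.267229, 43473.324576],
        "TimeGPT-hi-80": [44108.968239, 44361.638888],
        "TimeGPT-hi-90": [44971.712855, 45241.894268],
    }
)

level = [50, 80, 90]
for lv in level:
    # Width of the lv% prediction interval, in USD.
    fcst[f"width-{lv}"] = fcst[f"TimeGPT-hi-{lv}"] - fcst[f"TimeGPT-lo-{lv}"]

print(fcst[["ds"] + [f"width-{lv}" for lv in level]].round(2))
```

On the first day, the 90% interval spans roughly USD 5,400, about three times the width of the 50% interval.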
We can pass the forecasts we just generated to the `plot` method to visualize the predictions with the historical data.
```python theme={null}
# Plot historical data with forecast and confidence intervals
nixtla_client.plot(df, fcst, level=level)
```

To get a closer look at the predictions, we can zoom in on the plot or specify the maximum number of in-sample observations to be plotted using the `max_insample_length` argument. Note that setting `max_insample_length=60`, for instance, will display the last 60 historical values along with the complete forecast.

### Step 5: Extend Bitcoin Price Analysis with TimeGPT
#### Anomaly Detection
Given Bitcoin's volatility, identifying anomalies can be valuable. Use TimeGPT's `detect_anomalies` method to evaluate each observation statistically within its series context. By default, it identifies anomalies using a 99% prediction interval, which you can adjust with the `level` argument.
```python theme={null}
# Detect anomalies in Bitcoin price data
anomalies_df = nixtla_client.detect_anomalies(df)
# Visualize anomalies highlighted on the price chart
nixtla_client.plot(
df,
anomalies_df,
plot_anomalies=True
)
```
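The flagging rule can be sketched as follows, assuming an observation counts as anomalous when it falls outside the prediction interval. The example uses pandas on hypothetical values. The column names `TimeGPT-lo-99` and `TimeGPT-hi-99` are assumed to follow the same `<model>-lo-<level>` pattern as the forecast output, and the real `detect_anomalies` output may contain additional columns.

```python
import pandas as pd

# Hypothetical observations with 99% interval bounds around the model's fit.
insample = pd.DataFrame(
    {
        "ds": pd.date_range("2023-06-01", periods=4, freq="D"),
        "y": [27000.0, 27150.0, 30500.0, 26900.0],
        "TimeGPT-lo-99": [26200.0, 26300.0, 26400.0, 26100.0],
        "TimeGPT-hi-99": [27900.0, 28000.0, 28100.0, 27800.0],
    }
)

# An observation is flagged when it lies outside [lo, hi].
insample["is_anomaly"] = (insample["y"] < insample["TimeGPT-lo-99"]) | (
    insample["y"] > insample["TimeGPT-hi-99"]
)
print(insample[["ds", "y", "is_anomaly"]])
```

Lowering the `level` argument narrows the interval, so more points are flagged.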

To learn more about anomaly detection with TimeGPT, see [Real-time Anomaly Detection](/docs/anomaly_detection/real-time/introduction).
#### Add Exogenous Variables
To improve forecasts, include relevant data as exogenous variables, such as other cryptocurrency prices, stock market indices, or Bitcoin network transaction volumes.
To learn how to incorporate exogenous variables into TimeGPT, see the [Numeric Features Guide](/docs/forecasting/exogenous-variables/numeric_features).
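Exogenous variables are extra columns aligned with the target on `ds`. The sketch below uses pandas to join a hypothetical second price series onto the Bitcoin frame. The `eth_usd` column and its values are made up for illustration. How future exogenous values must be supplied to the SDK is covered in the Numeric Features Guide.

```python
import pandas as pd

# Target series in TimeGPT's format (values from the head of the Bitcoin data).
btc = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
        "y": [7200.174316, 6985.470215, 7344.884277],
    }
)

# Hypothetical exogenous series observed on the same dates.
exog = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
        "eth_usd": [130.0, 127.0, 134.0],
    }
)

# Left join keeps every target row; a missing exogenous value would show up as NaN.
df_exog = btc.merge(exog, on="ds", how="left")
assert df_exog["eth_usd"].notna().all(), "exogenous data has gaps"
print(df_exog)
```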
## Understand the Model Limitations
As stated in the introduction, predicting Bitcoin prices is challenging. The predictions here may appear accurate because they use recent data and update frequently, but the real test is forecasting future prices, not historical performance.
For those who need or want to try to forecast these assets, `TimeGPT` can be an option that simplifies the forecasting process. With just a couple of lines of code, `TimeGPT` can help you:
* Produce point forecasts
* Quantify the uncertainty of your predictions
* Produce in-sample forecasts
* Detect anomalies
* Incorporate exogenous variables
To learn more about TimeGPT capabilities, see the [TimeGPT Introduction](/docs/introduction/introduction).
## Conclusion
TimeGPT simplifies Bitcoin price forecasting by providing:
* Short-term point forecasts with quantified uncertainty
* Automated anomaly detection for risk management
* Support for exogenous variables to improve forecast accuracy
This approach applies to various cryptocurrency and financial asset forecasting scenarios, helping traders and analysts make data-driven decisions despite market volatility.
### Next Steps
* Explore [energy demand forecasting](/docs/use_cases/forecasting_energy_demand) with TimeGPT
* Learn about [cross-validation](/docs/forecasting/evaluation/cross_validation) for model evaluation
* Understand [fine-tuning](/docs/forecasting/fine-tuning/steps) for improved accuracy
* Scale forecasts with [distributed computing](/docs/forecasting/forecasting-at-scale/computing_at_scale)
## References and Additional Material
* [Joaquín Amat Rodrigo and Javier Escobar Ortiz (2022), "Bitcoin price prediction with Python, when the past does not repeat itself"](https://www.cienciadedatos.net/documentos/py41-forecasting-cryptocurrency-bitcoin-machine-learning-python.html)
# Forecasting Energy Demand
Source: https://nixtla.io/docs/use_cases/forecasting_energy_demand
Energy demand forecasting tutorial using TimeGPT AI. Step-by-step Python guide for electricity consumption prediction with 90% faster predictions and superior accuracy.
## Introduction
Energy demand forecasting is critical for grid operations, resource allocation, and infrastructure planning. Despite advances in methods, predicting consumption remains challenging due to weather, economic activity, and consumer behavior.
This tutorial demonstrates how TimeGPT simplifies in-zone electricity forecasting while delivering superior accuracy and speed. We will use the [PJM Hourly Energy Consumption dataset](https://www.pjm.com/) covering five regions from October 2023 to September 2024.
### What You'll Learn
* How to load and prepare energy consumption data
* How to generate 4-day ahead forecasts with TimeGPT
* How to evaluate forecast accuracy using MAE and sMAPE
* How TimeGPT compares to deep learning models like N-HiTS
The procedures in this tutorial apply to many time series forecasting scenarios beyond energy demand.
## How to Use TimeGPT to Forecast Energy Demand
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/use-cases/3_electricity_demand.ipynb)
### Step 1: Initial Setup
Install and import required packages, then create a NixtlaClient instance to interact with TimeGPT.
```python theme={null}
import time
import requests
import pandas as pd
from nixtla import NixtlaClient
from utilsforecast.losses import mae, smape
from utilsforecast.evaluation import evaluate
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla' # Defaults to os.environ.get("NIXTLA_API_KEY")
)
```
### Step 2: Load Energy Consumption Data
Load the energy consumption dataset and convert datetime strings to timestamps.
```python theme={null}
df = pd.read_csv('https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/refs/heads/main/datasets/pjm_in_zone.csv')
df['ds'] = pd.to_datetime(df['ds'])
# Examine the dataset
df.groupby('unique_id').head(2)
```
| | unique\_id | ds | y |
| - | ---------- | ------------------------- | -------- |
| 0 | AP-AP | 2023-10-01 04:00:00+00:00 | 4042.513 |
| 1 | AP-AP | 2023-10-01 05:00:00+00:00 | 3850.067 |
Plot the data series to visualize seasonal patterns.
```python theme={null}
nixtla_client.plot(
df,
max_insample_length=365
)
```

### Step 3: Generate Energy Demand Forecasts with TimeGPT
We'll split our dataset into:
* A training/input set for model calibration
* A testing set (last 4 days) to validate performance
```python theme={null}
# Prepare test (last 4 days) and input data
test_df = df.groupby('unique_id').tail(96)
input_df = df.groupby('unique_id').apply(lambda group: group.iloc[-1104:-96]).reset_index(drop=True)
# Make forecasts
start = time.time()
fcst_df = nixtla_client.forecast(
df=input_df,
h=96,
level=[90],
finetune_steps=10,
finetune_loss='mae',
model='timegpt-1-long-horizon',
time_col='ds',
target_col='y',
id_col='unique_id'
)
end = time.time()
timegpt_duration = end - start
print(f"Time (TimeGPT): {timegpt_duration}")
# Visualize forecasts against actual values
nixtla_client.plot(
test_df,
fcst_df,
models=['TimeGPT'],
level=[90],
time_col='ds',
target_col='y'
)
```
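The slicing above works in hourly steps. `tail(96)` keeps the last 96 hours (4 days) for testing, and `iloc[-1104:-96]` keeps the 1,008 hours (42 days) immediately before them as input. A small pandas check on synthetic hourly data confirms the two windows are contiguous and do not overlap. The series name and values are hypothetical.

```python
import pandas as pd

# Synthetic hourly series long enough to cover both windows.
n = 1200
toy = pd.DataFrame(
    {
        "unique_id": "zone_x",
        "ds": pd.date_range("2024-01-01", periods=n, freq=pd.Timedelta(hours=1)),
        "y": range(n),
    }
)

test_part = toy.groupby("unique_id").tail(96)
input_part = toy.iloc[-1104:-96]  # same slice as above, for a single series

print(len(input_part), len(test_part))  # 1008 96
# The input window ends exactly one hour before the test window starts.
gap = test_part["ds"].min() - input_part["ds"].max()
print(gap)  # 0 days 01:00:00
```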

### Step 4: Evaluate Forecast Accuracy
Compute accuracy metrics (MAE and sMAPE) for TimeGPT.
```python theme={null}
fcst_df['ds'] = pd.to_datetime(fcst_df['ds'])
test_df = pd.merge(test_df, fcst_df, 'left', ['unique_id', 'ds'])
evaluation = evaluate(test_df, metrics=[mae, smape], models=["TimeGPT"], target_col="y", id_col="unique_id")
average_metrics = evaluation.groupby('metric')['TimeGPT'].mean()
average_metrics
```
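For reference, MAE is the mean absolute difference between actual and forecast values. sMAPE has several variants. The sketch below uses the bounded form `mean(|y - y_hat| / (|y| + |y_hat|))`, which we understand to match `utilsforecast.losses.smape`. Check the utilsforecast documentation before comparing numbers, since other variants multiply by 2 or 100. The functions are given distinct names so they do not shadow the imported `mae` and `smape`.

```python
def mae_manual(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape_manual(y_true: list[float], y_pred: list[float]) -> float:
    """Bounded sMAPE in [0, 1]: mean of |y - y_hat| / (|y| + |y_hat|).

    Terms where both values are zero are counted as 0 to avoid division by zero.
    """
    terms = []
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        terms.append(abs(t - p) / denom if denom else 0.0)
    return sum(terms) / len(terms)


actual = [100.0, 200.0]
forecast = [110.0, 190.0]
print(mae_manual(actual, forecast))  # 10.0
print(round(smape_manual(actual, forecast), 4))
```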
### Step 5: Forecast with N-HiTS
For comparison, we train and forecast using the deep-learning model N-HiTS.
```python theme={null}
from neuralforecast.core import NeuralForecast
from neuralforecast.models import NHITS
# Prepare training dataset by excluding the last 4 days
train_df = df.groupby('unique_id').apply(lambda group: group.iloc[:-96]).reset_index(drop=True)
models = [
NHITS(h=96, input_size=480, scaler_type='robust', batch_size=16, valid_batch_size=8)
]
nf = NeuralForecast(models=models, freq='H')
start = time.time()
nf.fit(df=train_df)
nhits_preds = nf.predict()
end = time.time()
print(f"Time (N-HiTS): {end - start}")
```
### Step 6: Evaluate N-HiTS
Compute accuracy metrics (MAE and sMAPE) for N-HiTS.
```python theme={null}
preds_df = pd.merge(test_df, nhits_preds, 'left', ['unique_id', 'ds'])
evaluation = evaluate(preds_df, metrics=[mae, smape], models=["NHITS"], target_col="y", id_col="unique_id")
average_metrics = evaluation.groupby('metric')['NHITS'].mean()
print(average_metrics)
```
## Conclusion
In this experiment, TimeGPT outperforms N-HiTS on all three key metrics:
* **Accuracy**: 18.6% lower MAE (882.6 vs 1084.7)
* **Error Rate**: 31.1% lower sMAPE
* **Speed**: about 90% less run time (4.3 seconds vs 44 seconds; the N-HiTS time includes model training)
These results make TimeGPT a powerful tool for forecasting energy consumption and other time-series tasks where both accuracy and speed are critical.
Experiment with the parameters to further optimize performance for your specific use case.
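The percentages above are relative reductions, with N-HiTS as the baseline. The following sketch reproduces the arithmetic from the figures reported in this section. The helper name `pct_reduction` is illustrative.

```python
def pct_reduction(baseline: float, new: float) -> float:
    """Relative reduction of `new` versus `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline

print(round(pct_reduction(1084.7, 882.6), 1))  # 18.6  (MAE)
print(round(pct_reduction(44.0, 4.3), 1))      # 90.2  (run time in seconds)
```

Expressed as a ratio, the same timing figures correspond to roughly a 10x speed-up (44 / 4.3 ≈ 10.2).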
## Related Tutorials
Ready to explore more forecasting applications? Check out these guides:
* [Bitcoin Price Prediction with TimeGPT](/docs/use_cases/bitcoin_price_prediction) - Financial time series forecasting
* [Exogenous Variables Guide](/docs/forecasting/exogenous-variables/numeric_features) - Improve forecasts with external data
* [Long Horizon Forecasting](/docs/forecasting/model-version/longhorizon_model) - Extended forecast periods
Learn more about [TimeGPT capabilities](/docs/introduction/introduction) for time series prediction.
# Forecasting Intermittent Demand
Source: https://nixtla.io/docs/use_cases/forecasting_intermittent_demand
Master intermittent demand forecasting with TimeGPT for inventory optimization. On a subset of the M5 dataset, TimeGPT with exogenous variables and a log transform achieves a 14% lower MAE than specialized intermittent-demand models.
## Introduction
Intermittent demand occurs when products or services have irregular purchase patterns with frequent zero-value periods. This is common in retail, spare parts inventory, and specialty products, where demand is sporadic rather than continuous.
Forecasting these patterns accurately is essential for optimizing stock levels, reducing costs, and preventing stockouts. [TimeGPT](/docs/introduction/about_timegpt) excels at intermittent demand forecasting by capturing complex patterns that traditional statistical methods miss.
This tutorial demonstrates TimeGPT's capabilities using the M5 dataset of food sales, including [exogenous variables](/docs/forecasting/exogenous-variables/numeric_features) like pricing and promotional events that influence purchasing behavior.
### What You'll Learn
* How to prepare and analyze intermittent demand data
* How to leverage exogenous variables for better predictions
* How to use log transforms to ensure realistic forecasts
* How TimeGPT compares to specialized intermittent demand models
The methods shown here apply broadly to inventory management and retail forecasting challenges. For getting started with TimeGPT, see our [quickstart guide](/docs/forecasting/timegpt_quickstart).
## How to Use TimeGPT to Forecast Intermittent Demand
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/use-cases/4_intermittent_demand.ipynb)
### Step 1: Environment Setup
Start by importing the required packages for this tutorial and creating an instance of `NixtlaClient`.
```python theme={null}
import pandas as pd
import numpy as np
from nixtla import NixtlaClient
from utilsforecast.losses import mae
from utilsforecast.evaluation import evaluate
nixtla_client = NixtlaClient(api_key='my_api_key_provided_by_nixtla')
```
### Step 2: Load and Visualize the Dataset
Load a small subset of the M5 dataset, which already includes exogenous variables, and convert the `ds` column to a datetime object:
```python theme={null}
df = pd.read_csv("https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/m5_sales_exog_small.csv")
df['ds'] = pd.to_datetime(df['ds'])
df.head()
```
| unique\_id | ds | y | sell\_price | event\_type\_Cultural | event\_type\_National | event\_type\_Religious | event\_type\_Sporting |
| ------------- | ---------- | - | ----------- | --------------------- | --------------------- | ---------------------- | --------------------- |
| FOODS\_1\_001 | 2011-01-29 | 3 | 2.0 | 0 | 0 | 0 | 0 |
| FOODS\_1\_001 | 2011-01-30 | 0 | 2.0 | 0 | 0 | 0 | 0 |
| FOODS\_1\_001 | 2011-01-31 | 0 | 2.0 | 0 | 0 | 0 | 0 |
| FOODS\_1\_001 | 2011-02-01 | 1 | 2.0 | 0 | 0 | 0 | 0 |
| FOODS\_1\_001 | 2011-02-02 | 4 | 2.0 | 0 | 0 | 0 | 0 |
Visualize the dataset using the `plot` method:
```python theme={null}
nixtla_client.plot(
df,
max_insample_length=365,
)
```

In the figure above, we can see the intermittent nature of this dataset, with many periods of zero demand.
Now, let's use TimeGPT to forecast the demand of each product.
### Step 3: Transform the Data
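To keep the model from producing large negative predictions, we apply a log transformation to the data. Because the dataset contains zeros and log(0) is undefined, we add one to every value before taking the log; that is, we model log(y + 1).
The inverse transformation, exp(x) - 1, maps any value on the log scale to a number greater than -1. Predictions that fall below zero on the log scale therefore become small negative values on the original scale, as you will see in the lower bound of the prediction interval below. You can clip these values at zero if your application requires it.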
```python theme={null}
df_transformed = df.copy()
df_transformed['y'] = np.log(df_transformed['y'] + 1)
```
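NumPy's `np.log1p` and `np.expm1` compute log(1 + x) and exp(x) - 1 directly, and are more accurate for small x. The sketch below applies the forward and inverse transforms to a hypothetical toy series. It also shows that a slightly negative value on the log scale becomes a value between -1 and 0 after the inverse. The final clip at zero is an optional step we added for illustration; it is not part of this tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical intermittent series with zero-demand periods
toy_y = pd.Series([3, 0, 0, 1, 4, 0])

# Forward transform: log(1 + y) is defined at y = 0
toy_log = np.log1p(toy_y)

# Inverse transform recovers the original values (up to floating-point error)
toy_back = np.expm1(toy_log)

# A lower prediction bound of -0.3 on the log scale maps into (-1, 0),
# not to a non-negative value; clipping floors it at zero
lo_bound_log = pd.Series([-0.3, 0.1])
lo_bound = np.expm1(lo_bound_log).clip(lower=0)
```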
Now, let's keep the last 28 time steps for the test set and use the rest as input to the model.
```python theme={null}
test_df = df_transformed.groupby('unique_id').tail(28)
input_df = df_transformed.drop(test_df.index).reset_index(drop=True)
```
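This split works because `groupby(...).tail(n)` keeps the original row index, so `drop(test_df.index)` removes exactly the held-out rows. The sketch below uses two hypothetical series of different lengths. It checks that each series contributes `h` test rows and that all training dates precede the test dates. The `toy_*` names are illustrative and avoid overwriting the tutorial's variables.

```python
import pandas as pd

# Two hypothetical daily series of different lengths
toy_df = pd.DataFrame({
    "unique_id": ["A"] * 5 + ["B"] * 4,
    "ds": list(pd.date_range("2024-01-01", periods=5))
    + list(pd.date_range("2024-01-01", periods=4)),
    "y": [1, 0, 2, 0, 3, 0, 0, 1, 2],
})

h = 2
toy_test = toy_df.groupby("unique_id").tail(h)                  # last h rows per series
toy_input = toy_df.drop(toy_test.index).reset_index(drop=True)  # everything before them

print(toy_test.groupby("unique_id").size().to_dict())   # {'A': 2, 'B': 2}
print(toy_input.groupby("unique_id").size().to_dict())  # {'A': 3, 'B': 2}
```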
### Step 4: Forecast with TimeGPT
Forecast with TimeGPT using the `forecast` method:
```python theme={null}
fcst_df = nixtla_client.forecast(
df=input_df,
h=28,
level=[80],
finetune_steps=10, # Learn more about fine-tuning: /forecasting/fine-tuning/steps
finetune_loss='mae',
model='timegpt-1-long-horizon', # For long-horizon forecasting: /forecasting/model-version/longhorizon_model
time_col='ds',
target_col='y',
id_col='unique_id'
)
```
```bash theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: D
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
```
Great! We now have predictions. However, they are on the transformed scale, so we need to invert the transformation to return to the original scale: we take the exponential of each value and subtract one.
```python theme={null}
cols = [col for col in fcst_df.columns if col not in ['ds', 'unique_id']]
fcst_df[cols] = np.exp(fcst_df[cols]) - 1
fcst_df.head()
```
| | unique\_id | ds | TimeGPT | TimeGPT-lo-80 | TimeGPT-hi-80 |
| - | ------------- | ---------- | -------- | ------------- | ------------- |
| 0 | FOODS\_1\_001 | 2016-05-23 | 0.286841 | -0.267101 | 1.259465 |
| 1 | FOODS\_1\_001 | 2016-05-24 | 0.320482 | -0.241236 | 1.298046 |
| 2 | FOODS\_1\_001 | 2016-05-25 | 0.287392 | -0.362250 | 1.598791 |
| 3 | FOODS\_1\_001 | 2016-05-26 | 0.295326 | -0.145489 | 0.963542 |
| 4 | FOODS\_1\_001 | 2016-05-27 | 0.315868 | -0.166516 | 1.077437 |
### Step 5: Evaluate the Forecasts
Before measuring the performance metric, let's plot the predictions against the actual values.
```python theme={null}
nixtla_client.plot(
test_df,
fcst_df,
models=['TimeGPT'],
level=[80],
time_col='ds',
target_col='y'
)
```

Finally, we can measure the mean absolute error (MAE) of the model. Learn more about [evaluation metrics](/docs/forecasting/evaluation/evaluation_metrics) in our documentation.
```python theme={null}
# Compute MAE
test_df = pd.merge(test_df, fcst_df, how='left', on=['unique_id', 'ds'])
evaluation = evaluate(
test_df,
metrics=[mae],
models=['TimeGPT'],
target_col='y',
id_col='unique_id'
)
average_metrics = evaluation.groupby('metric')['TimeGPT'].mean()
average_metrics
```
```bash theme={null}
metric
mae 0.492559
```
### Step 6: Compare with Statistical Models
The `statsforecast` library by Nixtla provides a suite of statistical models built specifically for intermittent demand, such as Croston, IMAPA, and TSB. Let's use these models and see how they perform against TimeGPT.
```python theme={null}
from statsforecast import StatsForecast
from statsforecast.models import CrostonClassic, CrostonOptimized, IMAPA, TSB
sf = StatsForecast(
models=[CrostonClassic(), CrostonOptimized(), IMAPA(), TSB(0.1, 0.1)],
freq='D',
n_jobs=-1
)
```
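Croston's classic method illustrates how these models differ from a standard smoother. It applies simple exponential smoothing separately to the non-zero demand sizes and to the intervals between them, then forecasts their ratio as a flat per-period rate. The sketch below is an illustrative re-implementation under one common initialization (the first observed size and interval). It is not the `statsforecast` code, so its numbers may differ slightly from `CrostonClassic`.

```python
import numpy as np

def croston_sketch(y, alpha=0.1):
    """Croston's classic method: flat forecast = smoothed size / smoothed interval."""
    y = np.asarray(y, dtype=float)
    nz = np.flatnonzero(y)  # positions of non-zero demand
    if nz.size == 0:
        return 0.0
    sizes = y[nz]
    # Periods since the previous demand (the first one counts from before the series starts)
    intervals = np.diff(np.concatenate(([-1], nz)))
    z, p = sizes[0], float(intervals[0])  # initialize with the first observation
    for size, interval in zip(sizes[1:], intervals[1:]):
        z += alpha * (size - z)      # smooth demand size
        p += alpha * (interval - p)  # smooth inter-demand interval
    return z / p

print(round(croston_sketch([0, 3, 0, 0, 3, 0, 3]), 3))  # 1.435
```

Every future period receives the same value. This is why the `CrostonClassic` column in the predictions table below is constant across dates.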
Then, we can fit the models on our data.
```python theme={null}
sf.fit(df=input_df)
sf_preds = sf.predict(h=28)
```
Again, we need to invert the transformation, because the models were trained on log-transformed data.
```python theme={null}
cols = [col for col in sf_preds.columns if col not in ['ds', 'unique_id']]
sf_preds[cols] = np.exp(sf_preds[cols]) - 1
sf_preds.head()
```
| | unique\_id | ds | CrostonClassic | CrostonOptimized | IMAPA | TSB |
| - | ------------- | ---------- | -------------- | ---------------- | -------- | -------- |
| 0 | FOODS\_1\_001 | 2016-05-23 | 0.599093 | 0.599093 | 0.445779 | 0.396258 |
| 1 | FOODS\_1\_001 | 2016-05-24 | 0.599093 | 0.599093 | 0.445779 | 0.396258 |
| 2 | FOODS\_1\_001 | 2016-05-25 | 0.599093 | 0.599093 | 0.445779 | 0.396258 |
| 3 | FOODS\_1\_001 | 2016-05-26 | 0.599093 | 0.599093 | 0.445779 | 0.396258 |
| 4 | FOODS\_1\_001 | 2016-05-27 | 0.599093 | 0.599093 | 0.445779 | 0.396258 |
Now, let's combine the predictions from all methods and see which performs best.
```python theme={null}
test_df = pd.merge(test_df, sf_preds, 'left', ['unique_id', 'ds'])
test_df.head()
```
| | unique\_id | ds | y | sell\_price | event\_type\_Cultural | event\_type\_National | event\_type\_Religious | event\_type\_Sporting | TimeGPT | TimeGPT-lo-80 | TimeGPT-hi-80 | CrostonClassic | CrostonOptimized | IMAPA | TSB |
| - | ------------- | ---------- | -------- | ----------- | --------------------- | --------------------- | ---------------------- | --------------------- | -------- | ------------- | ------------- | -------------- | ---------------- | -------- | -------- |
| 0 | FOODS\_1\_001 | 2016-05-23 | 1.386294 | 2.24 | 0 | 0 | 0 | 0 | 0.286841 | -0.267101 | 1.259465 | 0.599093 | 0.599093 | 0.445779 | 0.396258 |
| 1 | FOODS\_1\_001 | 2016-05-24 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.320482 | -0.241236 | 1.298046 | 0.599093 | 0.599093 | 0.445779 | 0.396258 |
| 2 | FOODS\_1\_001 | 2016-05-25 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.287392 | -0.362250 | 1.598791 | 0.599093 | 0.599093 | 0.445779 | 0.396258 |
| 3 | FOODS\_1\_001 | 2016-05-26 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.295326 | -0.145489 | 0.963542 | 0.599093 | 0.599093 | 0.445779 | 0.396258 |
| 4 | FOODS\_1\_001 | 2016-05-27 | 1.945910 | 2.24 | 0 | 0 | 0 | 0 | 0.315868 | -0.166516 | 1.077437 | 0.599093 | 0.599093 | 0.445779 | 0.396258 |
```python theme={null}
statistical_models = ["CrostonClassic", "CrostonOptimized", "IMAPA", "TSB"]
evaluation = evaluate(
test_df,
metrics=[mae],
models=["TimeGPT"] + statistical_models,
target_col="y",
id_col='unique_id'
)
average_metrics = evaluation.groupby('metric')[
    ["TimeGPT"] + statistical_models
].mean()
average_metrics
```
| metric | TimeGPT | CrostonClassic | CrostonOptimized | IMAPA | TSB |
| ------ | -------- | -------------- | ---------------- | -------- | -------- |
| mae | 0.492559 | 0.564563 | 0.580922 | 0.571943 | 0.567178 |
In the table above, TimeGPT achieves the lowest MAE, a 12.8% improvement over the best-performing statistical model (CrostonClassic).
These results show TimeGPT's strong performance without additional features. We can further improve accuracy by incorporating exogenous variables, which TimeGPT supports but the intermittent-demand models used here (Croston, IMAPA, and TSB) do not.
### Step 7: Use Exogenous Variables
To forecast with [exogenous variables](/docs/forecasting/exogenous-variables/numeric_features), we need to specify their future values over the forecast horizon. Therefore, let's simply take the types of events, as those dates are known in advance. You can also explore using [date features](/docs/forecasting/exogenous-variables/date_features) and [holidays](/docs/forecasting/exogenous-variables/holiday_and_special_dates) as exogenous variables.
```python theme={null}
# Include holiday/event data as exogenous features
exog_cols = ['event_type_Cultural', 'event_type_National', 'event_type_Religious', 'event_type_Sporting']
futr_exog_df = test_df[['unique_id', 'ds'] + exog_cols]
futr_exog_df.head()
```
| | unique\_id | ds | event\_type\_Cultural | event\_type\_National | event\_type\_Religious | event\_type\_Sporting |
| - | ------------- | ---------- | --------------------- | --------------------- | ---------------------- | --------------------- |
| 0 | FOODS\_1\_001 | 2016-05-23 | 0 | 0 | 0 | 0 |
| 1 | FOODS\_1\_001 | 2016-05-24 | 0 | 0 | 0 | 0 |
| 2 | FOODS\_1\_001 | 2016-05-25 | 0 | 0 | 0 | 0 |
| 3 | FOODS\_1\_001 | 2016-05-26 | 0 | 0 | 0 | 0 |
| 4 | FOODS\_1\_001 | 2016-05-27 | 0 | 0 | 0 | 0 |
Then, we simply call the `forecast` method and pass `futr_exog_df` to the `X_df` parameter.
```python theme={null}
fcst_df = nixtla_client.forecast(
df=input_df,
X_df=futr_exog_df,
h=28,
level=[80], # Generate an 80% prediction interval
finetune_steps=10, # Specify the number of steps for fine-tuning
finetune_loss='mae', # Use the MAE as the loss function for fine-tuning
model='timegpt-1-long-horizon', # Use the model for long-horizon forecasting
time_col='ds',
target_col='y',
id_col='unique_id'
)
```
```bash theme={null}
INFO:nixtla.nixtla_client:Validating inputs...
INFO:nixtla.nixtla_client:Preprocessing dataframes...
INFO:nixtla.nixtla_client:Inferred freq: D
INFO:nixtla.nixtla_client:Using the following exogenous variables: event_type_Cultural, event_type_National, event_type_Religious, event_type_Sporting
INFO:nixtla.nixtla_client:Calling Forecast Endpoint...
```
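In this tutorial, the future event flags come from the held-out test set. For a genuine out-of-sample forecast, you would build `X_df` yourself, with one row per series and future date containing exogenous values known in advance. The pandas sketch below makes several hypothetical assumptions: the last observed dates, the `event_calendar` table, and the July 4 event are all made up for illustration. It reuses the same `exog_cols` list.

```python
import pandas as pd

exog_cols = ['event_type_Cultural', 'event_type_National',
             'event_type_Religious', 'event_type_Sporting']

# Hypothetical last observed date per series
last_dates = pd.DataFrame({
    'unique_id': ['FOODS_1_001', 'FOODS_1_002'],
    'ds': pd.to_datetime(['2016-06-19', '2016-06-19']),
})
h = 28

# Hypothetical calendar of known future events, already one-hot encoded
event_calendar = pd.DataFrame({
    'ds': pd.to_datetime(['2016-07-04']),
    'event_type_National': [1],
})

# One row per series and future day over the horizon
future = pd.concat(
    [pd.DataFrame({'unique_id': uid,
                   'ds': pd.date_range(last + pd.Timedelta(days=1), periods=h, freq='D')})
     for uid, last in zip(last_dates['unique_id'], last_dates['ds'])],
    ignore_index=True,
)

# Attach event flags; days and event types without an entry get 0
future_exog = future.merge(event_calendar, on='ds', how='left')
future_exog = future_exog.reindex(columns=['unique_id', 'ds'] + exog_cols).fillna(0)
future_exog[exog_cols] = future_exog[exog_cols].astype(int)
print(future_exog.shape)  # (56, 6)
```

You would then pass `X_df=future_exog` together with a `df` whose history ends on each series' last observed date.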
Great! Remember that the predictions are on the log scale, so we have to invert the transformation again.
```python theme={null}
fcst_df.rename(columns={'TimeGPT': 'TimeGPT_ex'}, inplace=True)
cols = [col for col in fcst_df.columns if col not in ['ds', 'unique_id']]
fcst_df[cols] = np.exp(fcst_df[cols]) - 1
fcst_df.head()
```
| | unique\_id | ds | TimeGPT\_ex | TimeGPT-lo-80 | TimeGPT-hi-80 |
| - | ------------- | ---------- | ----------- | ------------- | ------------- |
| 0 | FOODS\_1\_001 | 2016-05-23 | 0.281922 | -0.269902 | 1.250828 |
| 1 | FOODS\_1\_001 | 2016-05-24 | 0.313774 | -0.245091 | 1.286372 |
| 2 | FOODS\_1\_001 | 2016-05-25 | 0.285639 | -0.363119 | 1.595252 |
| 3 | FOODS\_1\_001 | 2016-05-26 | 0.295037 | -0.145679 | 0.963104 |
| 4 | FOODS\_1\_001 | 2016-05-27 | 0.315484 | -0.166760 | 1.076830 |
Finally, let's evaluate the performance of TimeGPT with exogenous features.
```python theme={null}
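# Merge on the keys instead of relying on both frames having the same row order
test_df = test_df.merge(fcst_df[['unique_id', 'ds', 'TimeGPT_ex']], on=['unique_id', 'ds'], how='left')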
test_df.head()
```
| | unique\_id | ds | y | sell\_price | event\_type\_Cultural | event\_type\_National | event\_type\_Religious | event\_type\_Sporting | TimeGPT | TimeGPT-lo-80 | TimeGPT-hi-80 | CrostonClassic | CrostonOptimized | IMAPA | TSB | TimeGPT\_ex |
| - | ------------- | ---------- | -------- | ----------- | --------------------- | --------------------- | ---------------------- | --------------------- | -------- | ------------- | ------------- | -------------- | ---------------- | -------- | -------- | ----------- |
| 0 | FOODS\_1\_001 | 2016-05-23 | 1.386294 | 2.24 | 0 | 0 | 0 | 0 | 0.286841 | -0.267101 | 1.259465 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | 0.281922 |
| 1 | FOODS\_1\_001 | 2016-05-24 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.320482 | -0.241236 | 1.298046 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | 0.313774 |
| 2 | FOODS\_1\_001 | 2016-05-25 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.287392 | -0.362250 | 1.598791 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | 0.285639 |
| 3 | FOODS\_1\_001 | 2016-05-26 | 0.000000 | 2.24 | 0 | 0 | 0 | 0 | 0.295326 | -0.145489 | 0.963542 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | 0.295037 |
| 4 | FOODS\_1\_001 | 2016-05-27 | 1.945910 | 2.24 | 0 | 0 | 0 | 0 | 0.315868 | -0.166516 | 1.077437 | 0.599093 | 0.599093 | 0.445779 | 0.396258 | 0.315484 |
```python theme={null}
evaluation = evaluate(
test_df,
metrics=[mae],
models=["TimeGPT"] + statistical_models + ["TimeGPT_ex"],
target_col="y",
id_col='unique_id'
)
average_metrics = evaluation.groupby('metric')[
    ["TimeGPT"] + statistical_models + ["TimeGPT_ex"]
].mean()
average_metrics
```
| metric | TimeGPT | CrostonClassic | CrostonOptimized | IMAPA | TSB | TimeGPT\_ex |
| ------ | -------- | -------------- | ---------------- | -------- | -------- | ----------- |
| mae | 0.492559 | 0.564563 | 0.580922 | 0.571943 | 0.567178 | 0.485352 |
From the table above, we can see that using exogenous features improved the performance of TimeGPT: its MAE is now 14% lower than that of the best statistical model.
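The improvement figures are measured against the best-performing statistical model. The small pandas sketch below uses the MAE values copied from the table above and selects that baseline automatically. The names `mae_table` and `statistical` are illustrative.

```python
import pandas as pd

# MAE values copied from the table above
mae_table = pd.Series({
    'TimeGPT': 0.492559,
    'CrostonClassic': 0.564563,
    'CrostonOptimized': 0.580922,
    'IMAPA': 0.571943,
    'TSB': 0.567178,
    'TimeGPT_ex': 0.485352,
})
statistical = ['CrostonClassic', 'CrostonOptimized', 'IMAPA', 'TSB']

best_stat = mae_table[statistical].idxmin()  # baseline with the lowest MAE
baseline = mae_table[best_stat]
improvement = 100 * (baseline - mae_table[['TimeGPT', 'TimeGPT_ex']]) / baseline
print(best_stat)                        # CrostonClassic
print(improvement.round(1).to_dict())   # {'TimeGPT': 12.8, 'TimeGPT_ex': 14.0}
```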
## Conclusion
TimeGPT provides a robust solution for forecasting intermittent demand:
* \~14% MAE improvement over specialized models
* Supports exogenous features for enhanced accuracy
By leveraging TimeGPT and combining both internal series patterns and external factors, organizations can achieve more reliable forecasts even for challenging intermittent demands.
### Next Steps
* Explore [other use cases](/docs/use_cases/forecasting_energy_demand) with TimeGPT
* Learn about [probabilistic forecasting](/docs/forecasting/probabilistic/introduction) with prediction intervals
* Scale your forecasts with [distributed computing](/docs/forecasting/forecasting-at-scale/computing_at_scale)
* Fine-tune models with [custom loss functions](/docs/forecasting/fine-tuning/custom_loss)
# Forecasting Web Traffic
Source: https://nixtla.io/docs/use_cases/forecasting_web_traffic
Learn how to predict website traffic patterns using TimeGPT.
**Goal:** Forecast the next 7 days of daily visits to the website [cienciadedatos.net](https://cienciadedatos.net) using TimeGPT.
This tutorial is adapted from *"Forecasting web traffic with machine learning and Python"* by Joaquín Amat Rodrigo and Javier Escobar Ortiz. You will learn how to:
* Obtain forecasts nearly 10% more accurate than the original method.
* Use significantly fewer lines of code and simpler workflows.
* Generate forecasts in substantially less computation time.
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/use-cases/1_forecasting_web_traffic.ipynb)
To start, import the required packages and initialize the Nixtla client with your API key.
```python Nixtla Client Initialization theme={null}
import pandas as pd
from nixtla import NixtlaClient
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla' # Defaults to os.environ.get("NIXTLA_API_KEY")
)
```
**Use an Azure AI endpoint**
If you are using an Azure AI endpoint, also set the `base_url` argument:
```python Azure AI Endpoint Setup theme={null}
nixtla_client = NixtlaClient(
base_url="your_azure_ai_endpoint",
api_key="your_api_key"
)
```
We will load the website visit data directly from a CSV file. Then, we format the dataset by adding a `unique_id` column whose value, `daily_visits`, identifies the series.
```python Load and Format Data theme={null}
url = (
'https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/'
'master/data/visitas_por_dia_web_cienciadedatos.csv'
)
df = pd.read_csv(url, sep=',', parse_dates=[0], date_format='%d/%m/%y')
df['unique_id'] = 'daily_visits'
df.head(10)
```
| | date | users | unique\_id |
| - | ---------- | ----- | ------------- |
| 0 | 2020-07-01 | 2324 | daily\_visits |
| 1 | 2020-07-02 | 2201 | daily\_visits |
| 2 | 2020-07-03 | 2146 | daily\_visits |
| 3 | 2020-07-04 | 1666 | daily\_visits |
| 4 | 2020-07-05 | 1433 | daily\_visits |
| 5 | 2020-07-06 | 2195 | daily\_visits |
| 6 | 2020-07-07 | 2240 | daily\_visits |
| 7 | 2020-07-08 | 2295 | daily\_visits |
| 8 | 2020-07-09 | 2279 | daily\_visits |
| 9 | 2020-07-10 | 2155 | daily\_visits |
**Note:** No further preprocessing is required before we start forecasting.
We will set up a rolling-window cross-validation using TimeGPT. This helps us evaluate forecast accuracy across multiple historical windows.
```python Cross-validation Setup theme={null}
timegpt_cv_df = nixtla_client.cross_validation(
df,
h=7,
n_windows=8,
time_col='date',
target_col='users',
freq='D',
level=[80, 90, 99.5]
)
timegpt_cv_df.head()
```
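With `h=7` and `n_windows=8`, the last 56 days are split into eight consecutive 7-day test windows. Each window is forecast using only the data up to its cutoff. The sketch below lays out these windows using positional indices rather than dates. It assumes that the step between windows equals `h`, which we understand to be the default when `step_size` is not set. The function name `rolling_windows` is hypothetical.

```python
from typing import List, Optional, Tuple

def rolling_windows(
    n_obs: int, h: int, n_windows: int, step_size: Optional[int] = None
) -> List[Tuple[int, List[int]]]:
    """Return (cutoff_idx, test_indices) for each window, oldest first.

    cutoff_idx is the position of the last training observation; the next
    h positions form that window's test set.
    """
    step = h if step_size is None else step_size
    last_cutoff = n_obs - h - 1  # the final window ends at the last observation
    windows = []
    for k in range(n_windows - 1, -1, -1):
        cutoff = last_cutoff - k * step
        windows.append((cutoff, list(range(cutoff + 1, cutoff + 1 + h))))
    return windows

# Example: 100 daily observations, 7-day horizon, 8 windows
cv_windows = rolling_windows(n_obs=100, h=7, n_windows=8)
print(cv_windows[0][0], cv_windows[-1][0])  # 43 92
```

In the TimeGPT output, the `cutoff` column plays the same role as `cutoff_idx` here, but it holds a date rather than a position.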

The results align closely with those from the original tutorial on [forecasting web traffic with machine learning](https://cienciadedatos.net/documentos/py37-forecasting-web-traffic-machine-learning.html).
Next, we compute the Mean Absolute Error (MAE) to quantify forecast accuracy:
```python Calculate MAE theme={null}
from utilsforecast.losses import mae
mae_timegpt = mae(
df=timegpt_cv_df.drop(columns=['cutoff']),
models=['TimeGPT'],
target_col='users'
)
mae_timegpt
```
**MAE Result:** The MAE obtained is `167.69`, outperforming the original pipeline.
Exogenous variables can provide additional context that may improve forecast accuracy. In this example, we add binary indicators for each day of the week.
```python Add Weekday Indicators theme={null}
for i in range(7):
    df[f'week_day_{i + 1}'] = 1 * (df['date'].dt.weekday == i)
df.head(10)
```
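The loop one-hot encodes the day of the week; `week_day_1` is Monday, because `dt.weekday` returns 0 for Monday. The sketch below uses a hypothetical two-week frame to show that `pd.get_dummies` produces the same columns. The `reindex` call keeps all seven columns even if a short slice of data is missing some weekdays.

```python
import pandas as pd

# Hypothetical dates spanning two weeks
toy = pd.DataFrame({'date': pd.date_range('2020-07-01', periods=14, freq='D')})

# Loop version, as in the tutorial
for i in range(7):
    toy[f'week_day_{i + 1}'] = 1 * (toy['date'].dt.weekday == i)

# Equivalent one-hot encoding with get_dummies (Monday -> week_day_1)
dummies = pd.get_dummies(toy['date'].dt.weekday + 1, prefix='week_day', dtype=int)
dummies = dummies.reindex(columns=[f'week_day_{i}' for i in range(1, 8)], fill_value=0)
```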
We repeat the cross-validation with these new features:
```python Cross-validation with Exogenous Variables theme={null}
timegpt_cv_df_with_ex = nixtla_client.cross_validation(
df,
h=7,
n_windows=8,
time_col='date',
target_col='users',
freq='D',
level=[80, 90, 99.5]
)
```

Adding weekday indicators can capture weekly seasonality in user visits.
| **Model** | **Exogenous features** | **MAE Backtest** |
| --------- | ---------------------- | ---------------- |
| TimeGPT | No | 167.6917 |
| TimeGPT | Yes | 167.2286 |
We see a slight improvement in MAE by including the weekday indicators. This illustrates how TimeGPT can incorporate additional signals without complex data processing or extensive model tuning.
**Key Takeaways**
* TimeGPT simplifies forecasting workflows by reducing code and tuning overhead.
* Feature engineering (like adding weekday variables) can further improve accuracy, although the gain in this example is small.
* Cross-validation provides a robust way to evaluate model performance.
We have demonstrated significant improvements in forecasting accuracy with minimal effort using TimeGPT. This avoids the majority of the complex steps required when building custom models—such as extensive feature engineering, validation, model comparisons, and hyperparameter tuning.
**Good luck and happy forecasting!**
# Logging and Serving with MLFlow
Source: https://nixtla.io/docs/use_cases/logging_and_serving_with_mlflow
Use MLFlow to log experiment metrics using TimeGPT and serve TimeGPT
## Introduction
[MLFlow](https://mlflow.org/) is an open-source platform that allows you, among other things, to track experiments to compare different hyperparameters and results, and to serve ML models on different platforms.
In this tutorial, we provide basic scripts that you can use to track experiments made with TimeGPT, or to serve TimeGPT through MLFlow.
Each script is customizable to your own needs. The goal is to provide an easy-to-use template that you can extend.
## Experiment Tracking with TimeGPT and MLFlow
The following scripts provide functions for logging experiment results when testing different parameter combinations in TimeGPT.
To use these scripts, make sure you have MLFlow and any other required dependencies installed. To install MLFlow, run `pip install mlflow`. Here, we use the following packages:
```python theme={null}
import os
import mlflow
import pandas as pd
from datetime import datetime
from nixtla import NixtlaClient  # Used in the type hints of the functions below
from dotenv import load_dotenv  # Used when working with the API directly or a Docker deployment
from api.serverless import make_client  # Only needed when working with the Python wheel
```
In all scripts, you must initialize the `NixtlaClient` and pass it to the functions. You can see how to set up your client with an API key [here](/docs/setup/setting_up_your_api_key).
### Logging experiments with forecasting
The following script can be used to log results when using the `forecast` method.
```python theme={null}
def log_timegpt_forecast(
    client: NixtlaClient,
    df: pd.DataFrame,
    h: int = 12,
    freq: str = "MS",
    level: list = None,
    model: str = "timegpt-2-mini",
    experiment_name: str = "basic_forecast",
    time_col: str = "ds",
    target_col: str = "y",
    id_col: str = "unique_id",
    **kwargs
):
    """
    Perform TimeGPT forecast and log to MLFlow.
    Parameters:
    -----------
    client : NixtlaClient
        Initialized Nixtla client
    df : pd.DataFrame
        Input dataframe with time series data
    h : int
        Forecast horizon
    freq : str
        Frequency of the time series
    level : list
        Confidence levels for prediction intervals
    model : str
        TimeGPT model variant to use
    experiment_name : str
        Name for this MLFlow run
    time_col : str
        Name of the time column in df
    target_col : str
        Name of the target column in df
    id_col : str
        Name of the series identifier column in df
    **kwargs : dict
        Additional parameters to pass to forecast()
    """
    with mlflow.start_run(run_name=experiment_name):
        # Log parameters
        mlflow.log_param("horizon", h)
        mlflow.log_param("frequency", freq)
        mlflow.log_param("model", model)
        mlflow.log_param("n_series", df[id_col].nunique() if id_col in df.columns else 1)
        mlflow.log_param("n_observations", len(df))
        if level:
            mlflow.log_param("prediction_intervals", level)
        # Log any additional parameters
        for key, value in kwargs.items():
            mlflow.log_param(key, value)
        # Log dataset info
        mlflow.log_param("start_date", df[time_col].min())
        mlflow.log_param("end_date", df[time_col].max())
        # Forecast
        forecast_df = client.forecast(
            df=df,
            h=h,
            freq=freq,
            level=level,
            model=model,
            id_col=id_col,
            time_col=time_col,
            target_col=target_col,
            **kwargs
        )
        # Log metrics
        mlflow.log_metric("forecast_points", len(forecast_df))
        # Log forecast as artifact
        forecast_path = "forecast_output.csv"
        forecast_df.to_csv(forecast_path, index=False)
        mlflow.log_artifact(forecast_path)
        # Log input data as artifact
        input_path = "input_data.csv"
        df.to_csv(input_path, index=False)
        mlflow.log_artifact(input_path)
        # Add tags
        mlflow.set_tag("task_type", "forecasting")
        mlflow.set_tag("timestamp", datetime.now().isoformat())
        # Clean up temporary files
        os.remove(forecast_path)
        os.remove(input_path)
    return forecast_df
If you want to add more explicit parameters for fine-tuning, or to reproduce the exact type hints of the SDK, refer to the [SDK reference](/docs/reference/sdk_reference).
### Logging experiments with cross-validation
The following is a simple script to log metrics when performing experiments with the `cross_validation` method. This script requires `utilsforecast` to compute performance metrics, so make sure to run `pip install utilsforecast`.
The script allows you to log overall performance metrics across multiple windows, and also per-window metrics.
```python theme={null}
def log_timegpt_cross_validation(
    client: NixtlaClient,
    df: pd.DataFrame,
    h: int = 12,
    n_windows: int = 1,
    step_size: int = None,
    freq: str = "MS",
    level: list = None,
    model: str = "timegpt-2-mini",
    experiment_name: str = "cross_validation",
    time_col: str = "ds",
    target_col: str = "y",
    id_col: str = "unique_id",
    **kwargs
):
    """
    Perform TimeGPT cross-validation and log to MLFlow.
    Parameters:
    -----------
    client : NixtlaClient
        Initialized Nixtla client
    df : pd.DataFrame
        Input dataframe with time series data
    h : int
        Forecast horizon
    n_windows : int
        Number of cross-validation windows
    step_size : int
        Step size between windows (default: h)
    freq : str
        Frequency of the time series
    level : list
        Confidence levels for prediction intervals
    model : str
        TimeGPT model variant to use
    experiment_name : str
        Name for this MLFlow run
    time_col : str
        Name of the time column in df
    target_col : str
        Name of the target column in df
    id_col : str
        Name of the series identifier column in df
    **kwargs : dict
        Additional parameters to pass to cross_validation()
    """
    with mlflow.start_run(run_name=experiment_name):
        # Log parameters
        mlflow.log_param("horizon", h)
        mlflow.log_param("n_windows", n_windows)
        mlflow.log_param("step_size", step_size or h)
        mlflow.log_param("frequency", freq)
        mlflow.log_param("model", model)
        mlflow.log_param("n_series", df[id_col].nunique() if id_col in df.columns else 1)
        if level:
            mlflow.log_param("prediction_intervals", level)
        for key, value in kwargs.items():
            mlflow.log_param(key, value)
        # Perform cross-validation
        cv_df = client.cross_validation(
            df=df,
            h=h,
            n_windows=n_windows,
            step_size=step_size,
            freq=freq,
            level=level,
            model=model,
            time_col=time_col,
            target_col=target_col,
            id_col=id_col,
            **kwargs
        )
        # Calculate and log metrics (each loss returns one row per series, so average across series)
        from utilsforecast.losses import mae, mse, rmse
        # MAE
        mae_value = mae(
            df=cv_df.drop(columns=['cutoff']),
            models=['TimeGPT'],
            target_col=target_col,
            id_col=id_col,
        )['TimeGPT'].mean()
        # MSE
        mse_value = mse(
            df=cv_df.drop(columns=['cutoff']),
            models=['TimeGPT'],
            target_col=target_col,
            id_col=id_col,
        )['TimeGPT'].mean()
        # RMSE
        rmse_value = rmse(
            df=cv_df.drop(columns=['cutoff']),
            models=['TimeGPT'],
            target_col=target_col,
            id_col=id_col,
        )['TimeGPT'].mean()
        mlflow.log_metric("mae", mae_value)
        mlflow.log_metric("mse", mse_value)
        mlflow.log_metric("rmse", rmse_value)
        mlflow.log_metric("total_cv_predictions", len(cv_df))
        # Log per-window metrics (MAE as an example)
        cutoffs = cv_df['cutoff'].unique()
        for i, cutoff in enumerate(cutoffs):
            window_df = cv_df[cv_df['cutoff'] == cutoff]
            window_mae = mae(
                df=window_df.drop(columns=['cutoff']),
                models=['TimeGPT'],
                target_col=target_col,
                id_col=id_col,
            )['TimeGPT'].mean()
            mlflow.log_metric(f"mae_window_{i+1}", window_mae)
        # Log artifacts
        cv_path = "cross_validation_results.csv"
        cv_df.to_csv(cv_path, index=False)
        mlflow.log_artifact(cv_path)
        # Log summary statistics
        summary = {
            'metric': ['MAE', 'MSE', 'RMSE'],
            'value': [mae_value, mse_value, rmse_value]
        }
        summary_df = pd.DataFrame(summary)
        summary_path = "metrics_summary.csv"
        summary_df.to_csv(summary_path, index=False)
        mlflow.log_artifact(summary_path)
        mlflow.set_tag("task_type", "cross_validation")
        mlflow.set_tag("timestamp", datetime.now().isoformat())
        # Clean up
        os.remove(cv_path)
        os.remove(summary_path)
    return cv_df
```
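You can also compute the per-window MAE with a single pandas `groupby`. The MAE is first computed for each series within a window, then averaged across series. The sketch below uses a hypothetical cross-validation result with one series and two windows. Only the column names mirror the TimeGPT output; the values are made up.

```python
import pandas as pd

# Hypothetical cross-validation output: one series, two windows
cv_example = pd.DataFrame({
    'unique_id': ['s1'] * 4,
    'cutoff': pd.to_datetime(['2024-01-02'] * 2 + ['2024-01-04'] * 2),
    'y': [10.0, 12.0, 11.0, 13.0],
    'TimeGPT': [11.0, 12.0, 14.0, 12.0],
})

# MAE per (window, series), then averaged across series within each window
per_window_mae = (
    cv_example.assign(abs_err=(cv_example['y'] - cv_example['TimeGPT']).abs())
    .groupby(['cutoff', 'unique_id'])['abs_err'].mean()
    .groupby(level='cutoff').mean()
)
print(per_window_mae.tolist())  # [0.5, 2.0]
```

Each value can then be logged inside the run, for example with `mlflow.log_metric(f"mae_window_{i}", value)` while iterating over `enumerate(per_window_mae, start=1)`.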
If you want to add more explicit parameters for fine-tuning, or to reproduce the exact type hints of the SDK, refer to the [SDK reference](/docs/reference/sdk_reference).
### Logging experiments with anomaly detection
This script showcases how you can track experiments done with the `detect_anomalies_online` method.
```python theme={null}
def log_timegpt_online_anomaly_detection(
    client: NixtlaClient,
    df: pd.DataFrame,
    h: int,
    detection_size: int,
    threshold_method: str = "univariate",
    freq: str = "D",
    level: int | float = 99,
    model: str = "timegpt-2-mini",
    experiment_name: str = "anomaly_detection",
    time_col: str = "ds",
    target_col: str = "y",
    id_col: str = "unique_id",
    **kwargs
):
    """
    Perform TimeGPT anomaly detection and log results to MLFlow.
    Parameters:
    -----------
    client : NixtlaClient
        Initialized Nixtla client
    df : pd.DataFrame
        Input dataframe with time series data
    h : int
        Forecast horizon
    detection_size : int
        Number of most recent observations in which to detect anomalies
    threshold_method : str
        Method used to set the anomaly threshold
    freq : str
        Frequency of the time series
    level : int | float
        Confidence level for anomaly detection threshold
    model : str
        TimeGPT model variant to use
    experiment_name : str
        Name for this MLFlow run
    time_col : str
        Name of the time column in df
    target_col : str
        Name of the target column in df
    id_col : str
        Name of the series identifier column in df
    **kwargs : dict
        Additional parameters to pass to detect_anomalies_online()
    """
    with mlflow.start_run(run_name=experiment_name):
        # Log parameters
        mlflow.log_param("horizon", h)
        mlflow.log_param("detection_size", detection_size)
        mlflow.log_param("threshold_method", threshold_method)
        mlflow.log_param("frequency", freq)
        mlflow.log_param("detection_level", level)
        mlflow.log_param("model", model)
        mlflow.log_param("n_observations", len(df))
        for key, value in kwargs.items():
            mlflow.log_param(key, value)
        # Detect anomalies
        anomalies_df = client.detect_anomalies_online(
            df=df,
            h=h,
            detection_size=detection_size,
            threshold_method=threshold_method,
            freq=freq,
            level=level,
            model=model,
            time_col=time_col,
            target_col=target_col,
            id_col=id_col,
            **kwargs
        )
        # Calculate metrics
        n_anomalies = int(anomalies_df['anomaly'].sum())
        mlflow.log_metric("n_anomalies", n_anomalies)
        # Log results
        anomaly_path = "anomaly_detection_results.csv"
        anomalies_df.to_csv(anomaly_path, index=False)
        mlflow.log_artifact(anomaly_path)
        # Log only the detected anomalies
        if n_anomalies > 0:
            detected_anomalies = anomalies_df[anomalies_df['anomaly'].astype(bool)]
            detected_path = "detected_anomalies_only.csv"
            detected_anomalies.to_csv(detected_path, index=False)
            mlflow.log_artifact(detected_path)
            os.remove(detected_path)
        mlflow.set_tag("task_type", "anomaly_detection")
        mlflow.set_tag("timestamp", datetime.now().isoformat())
        # Clean up
        os.remove(anomaly_path)
    return anomalies_df
```
To add more explicit parameters (for example, for fine-tuning) or to reproduce the SDK's exact type hints, see the [SDK reference](/docs/reference/sdk_reference).
### Sample usage
With the functions above, you can now run experiments and log them to MLflow.
First, you must instantiate your client using either:
```python theme={null}
from nixtla import NixtlaClient
client = NixtlaClient(api_key='your_api_key_here')
```
Or, if you are using a Python wheel:
```python theme={null}
from api.serverless import make_client
client = make_client()
```
Then, load your data. Here, we use the simple air passengers dataset.
```python theme={null}
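import pandas as pd
df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv'
)
df.columns = ['ds', 'y']
df["unique_id"] = 0
df = df[["unique_id", "ds", "y"]]
# Settings used by the calls below (assumed values for this monthly dataset)
id_col, time_col, target_col = "unique_id", "ds", "y"
freq = "MS"  # month-start frequency
h = 12       # forecast horizon in months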
```
Next, set the tracking URI and experiment name for MLflow. Here, we track runs on the local filesystem.
```python theme={null}
mlflow.set_tracking_uri("mlruns")
experiment = mlflow.set_experiment("timegpt_experiments")
```
Finally, you can run any function defined above.
```python theme={null}
forecast_df = log_timegpt_forecast(
client=client,
df=df,
h=h,
freq=freq,
level=[80, 90],
experiment_name="forecast_example",
time_col=time_col,
target_col=target_col,
id_col=id_col,
model="timegpt-2"
)
cv_df = log_timegpt_cross_validation(
client=client,
df=df,
h=h,
n_windows=1,
freq=freq,
level=[90],
experiment_name="cv_example",
time_col=time_col,
target_col=target_col,
id_col=id_col,
)
anomaly_df = log_timegpt_online_anomaly_detection(
client=client,
df=df,
h=h,
detection_size=h,
threshold_method="univariate",
freq=freq,
level=60,
model="timegpt-2-mini",
experiment_name="anomaly_detection_example",
time_col=time_col,
target_col=target_col,
id_col=id_col,
)
```
## Serving TimeGPT with MLflow
You can also use MLflow for model serving. This is useful when you need a unified interface to serve the model across many endpoints. MLflow is not required to use TimeGPT, but your organization may require it as a standard practice.
The following script shows how you can wrap the TimeGPT model in an `mlflow.pyfunc.PythonModel` to save the model and call it.
```python theme={null}
import json
from typing import Optional

import mlflow
import pandas as pd
from nixtla import NixtlaClient


class MLFLowTimeGPTModel(mlflow.pyfunc.PythonModel):
    """
    Unified MLflow pyfunc wrapper for TimeGPT.
    Can perform forecasting, cross-validation, and anomaly detection
    based on the 'operation' parameter in the input.
    """

    def __init__(self,
                 client: NixtlaClient,
                 model: str = "timegpt-2-mini",
                 default_h: int = 96,
                 default_freq: str = "15min",
                 default_level: Optional[list] = None,
                 default_n_windows: int = 1,
                 default_anomaly_level: int = 99,
                 ):
        """
        Initialize the unified TimeGPT model wrapper.

        Parameters:
        -----------
        client : NixtlaClient
            Initialized Nixtla client used for all API calls
        model : str
            TimeGPT model variant (timegpt-2-mini, timegpt-2)
        default_h : int
            Default forecast horizon
        default_freq : str
            Default frequency
        default_level : list
            Default confidence levels for prediction intervals
        default_n_windows : int
            Default number of cross-validation windows
        default_anomaly_level : int
            Default confidence level for anomaly detection
        """
        self.model = model
        self.default_h = default_h
        self.default_freq = default_freq
        self.default_level = default_level or [80, 90]
        self.default_n_windows = default_n_windows
        self.default_anomaly_level = default_anomaly_level
        self.client = client

    def load_context(self, context):
        """
        Load the model context. Called once when the model is loaded.

        Parameters:
        -----------
        context : mlflow.pyfunc.PythonModelContext
            Context containing artifacts and other model metadata
        """
        # Load configuration from artifacts if present
        if context.artifacts and "config" in context.artifacts:
            with open(context.artifacts["config"], 'r') as f:
                config = json.load(f)
            self.model = config.get("model", self.model)
            self.default_h = config.get("h", self.default_h)
            self.default_freq = config.get("freq", self.default_freq)
            self.default_level = config.get("level", self.default_level)
            self.default_n_windows = config.get("n_windows", self.default_n_windows)
            self.default_anomaly_level = config.get("anomaly_level", self.default_anomaly_level)

    def predict(self, context, model_input):
        """Route the request to forecasting, cross-validation, or anomaly detection."""
        # Parse input
        if isinstance(model_input, dict):
            # Copy so the pop() calls below do not mutate the caller's dict
            model_input = dict(model_input)
            df = model_input.get('data')
            operation = model_input.get('operation', 'forecast')
            time_col = model_input.get('time_col', 'ds')
            target_col = model_input.get('target_col', 'y')
            id_col = model_input.get('id_col', 'unique_id')
            model_input.pop('time_col', None)
            model_input.pop('target_col', None)
            model_input.pop('id_col', None)
        else:
            # If just a DataFrame is passed, default to forecast
            df = model_input
            operation = 'forecast'
            id_col = 'unique_id'
            time_col = 'ds'
            target_col = 'y'

        # Validate input
        if df is None or not isinstance(df, pd.DataFrame):
            raise ValueError("Input must contain a pandas DataFrame under 'data' key")

        # Route to appropriate operation
        if operation == 'forecast':
            return self._forecast(df, model_input, id_col, time_col, target_col)
        elif operation == 'cross_validation':
            return self._cross_validation(df, model_input, id_col, time_col, target_col)
        elif operation == 'anomaly_detection':
            return self._anomaly_detection(df, model_input, id_col, time_col, target_col)
        else:
            raise ValueError(f"Unknown operation: {operation}. Must be 'forecast', 'cross_validation', or 'anomaly_detection'")

    def _forecast(self, df, params, id_col, time_col, target_col):
        """Perform forecasting operation."""
        h = params.get('h', self.default_h) if isinstance(params, dict) else self.default_h
        freq = params.get('freq', self.default_freq) if isinstance(params, dict) else self.default_freq
        level = params.get('level', self.default_level) if isinstance(params, dict) else self.default_level
        # Extract additional parameters
        additional_params = {}
        if isinstance(params, dict):
            exclude_keys = {'data', 'operation', 'h', 'freq', 'level', 'time_col', 'target_col'}
            additional_params = {k: v for k, v in params.items() if k not in exclude_keys}
        result = self.client.forecast(
            df=df,
            h=h,
            freq=freq,
            level=level,
            id_col=id_col,
            time_col=time_col,
            target_col=target_col,
            model=self.model,
            **additional_params
        )
        return result

    def _cross_validation(self, df, params, id_col, time_col, target_col):
        """Perform cross-validation operation."""
        h = params.get('h', self.default_h) if isinstance(params, dict) else self.default_h
        n_windows = params.get('n_windows', self.default_n_windows) if isinstance(params, dict) else self.default_n_windows
        freq = params.get('freq', self.default_freq) if isinstance(params, dict) else self.default_freq
        level = params.get('level', self.default_level) if isinstance(params, dict) else self.default_level
        step_size = params.get('step_size') if isinstance(params, dict) else None
        # Extract additional parameters
        additional_params = {}
        if isinstance(params, dict):
            exclude_keys = {'data', 'operation', 'h', 'n_windows', 'freq', 'level', 'step_size', 'time_col', 'target_col'}
            additional_params = {k: v for k, v in params.items() if k not in exclude_keys}
        result = self.client.cross_validation(
            df=df,
            h=h,
            n_windows=n_windows,
            step_size=step_size,
            freq=freq,
            level=level,
            id_col=id_col,
            time_col=time_col,
            target_col=target_col,
            model=self.model,
            **additional_params
        )
        return result

    def _anomaly_detection(self, df, params, id_col, time_col, target_col):
        """Perform anomaly detection operation."""
        h = params.get('h', self.default_h) if isinstance(params, dict) else self.default_h
        detection_size = params.get("detection_size", self.default_h) if isinstance(params, dict) else self.default_h
        threshold_method = params.get("threshold_method", "univariate") if isinstance(params, dict) else "univariate"
        freq = params.get('freq', self.default_freq) if isinstance(params, dict) else self.default_freq
        level = params.get('level', self.default_anomaly_level) if isinstance(params, dict) else self.default_anomaly_level
        # Extract additional parameters
        additional_params = {}
        if isinstance(params, dict):
            exclude_keys = {'data', 'operation', 'h', 'detection_size',
                            'threshold_method', 'freq', 'level', 'time_col', 'target_col'}
            additional_params = {k: v for k, v in params.items() if k not in exclude_keys}
        result = self.client.detect_anomalies_online(
            df=df,
            h=h,
            detection_size=detection_size,
            threshold_method=threshold_method,
            freq=freq,
            level=level,
            time_col=time_col,
            id_col=id_col,
            target_col=target_col,
            model=self.model,
            **additional_params
        )
        return result


def save_unified_model(
    client: NixtlaClient,
    model_path: str,
    model_variant: str = "timegpt-2-mini",
    **default_params
):
    """
    Save a unified TimeGPT model that can perform all operations.

    Parameters:
    -----------
    client : NixtlaClient
        Initialized Nixtla client
    model_path : str
        Path where the model will be saved
    model_variant : str
        TimeGPT model variant to use
    **default_params : dict
        Default parameters for operations
    """
    python_model = MLFLowTimeGPTModel(
        client=client,
        model=model_variant,
        **default_params
    )
    # Save the model
    mlflow.pyfunc.save_model(
        path=model_path,
        python_model=python_model,
    )
```
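The trickiest part of the wrapper is how `predict` splits the incoming dictionary into the operation, the column names, the operation's own parameters, and extra keyword arguments forwarded to the Nixtla client. The sketch below isolates that logic in a hypothetical `split_payload` helper (not part of MLflow or the Nixtla SDK) so it can be tested without either library; `some_extra_arg` is a placeholder for any additional keyword argument.

```python
from typing import Any

# Keys each operation consumes itself, mirroring the exclude_keys sets in the wrapper
OWN_KEYS = {
    "forecast": {"h", "freq", "level"},
    "cross_validation": {"h", "n_windows", "freq", "level", "step_size"},
    "anomaly_detection": {"h", "detection_size", "threshold_method", "freq", "level"},
}
# Column-name keys and the defaults the wrapper falls back to
COLUMN_DEFAULTS = {"id_col": "unique_id", "time_col": "ds", "target_col": "y"}


def split_payload(payload: dict[str, Any]) -> tuple[str, dict, dict, dict]:
    """Hypothetical helper: split a request into (operation, columns, own params, extra kwargs)."""
    request = dict(payload)      # shallow copy: the caller's dict is never mutated
    request.pop("data", None)    # the DataFrame itself is handled separately
    operation = request.pop("operation", "forecast")
    if operation not in OWN_KEYS:
        raise ValueError(f"Unknown operation: {operation}")
    # Column names fall back to the defaults when not supplied
    columns = {key: request.pop(key, default) for key, default in COLUMN_DEFAULTS.items()}
    own = {k: v for k, v in request.items() if k in OWN_KEYS[operation]}
    extra = {k: v for k, v in request.items() if k not in OWN_KEYS[operation]}
    return operation, columns, own, extra


payload = {"data": None, "operation": "anomaly_detection", "detection_size": 12,
           "freq": "MS", "time_col": "ds", "some_extra_arg": 1}
operation, columns, own, extra = split_payload(payload)
print(operation, columns, own, extra)
```

Working on a copy of the request is a deliberate choice: the same payload dict can then be reused across several `predict` calls without losing its column-name keys.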
### Sample usage
First, you must instantiate your client using either:
```python theme={null}
from nixtla import NixtlaClient
client = NixtlaClient(api_key='your_api_key_here')
```
Or, if you are using a Python wheel:
```python theme={null}
from api.serverless import make_client
client = make_client()
```
Then, load your data. Here, we use the simple air passengers dataset.
```python theme={null}
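import pandas as pd
df = pd.read_csv(
    'https://raw.githubusercontent.com/Nixtla/transfer-learning-time-series/main/datasets/air_passengers.csv'
)
df.columns = ['ds', 'y']
df["unique_id"] = 0
df = df[["unique_id", "ds", "y"]]
# Settings used by the calls below (assumed values for this monthly dataset)
id_col, time_col, target_col = "unique_id", "ds", "y"
freq = "MS"  # month-start frequency
h = 12       # forecast horizon in months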
```
Then, save the model with:
```python theme={null}
save_unified_model(client=client, model_path="test_model", model_variant="timegpt-2-mini", default_h=h, default_freq=freq)
```
To use another TimeGPT variant, such as `timegpt-2`, save a separate model instance that specifies that variant.
You can now run forecasting, cross-validation, and anomaly detection with the model saved in MLflow format.
```python theme={null}
# Load model
model = mlflow.pyfunc.load_model("test_model")
# Forecast
forecast = model.predict({'data': df, 'operation': 'forecast', 'h': h, 'id_col': id_col, 'time_col': time_col, 'target_col': target_col, 'freq': freq})
# Cross-validation
cv = model.predict({'data': df, 'operation': 'cross_validation', 'h': h, 'n_windows': 1, 'id_col': id_col, 'time_col': time_col, 'target_col': target_col, 'freq': freq})
# Anomaly detection
anomalies = model.predict({'data': df, 'operation': 'anomaly_detection', 'detection_size': h, 'id_col': id_col, 'time_col': time_col, 'target_col': target_col, 'freq': freq})
```
# What-If Forecasting: Price Effects in Retail
Source: https://nixtla.io/docs/use_cases/what_if_forecasting_price_effects_in_retail
Master what-if forecasting with TimeGPT for retail pricing optimization. Learn scenario analysis to predict demand changes from price adjustments using the M5 dataset. Step-by-step Python tutorial.
## Introduction
Pricing decisions significantly impact retail demand. [TimeGPT](/docs/introduction/about_timegpt) makes it possible to forecast product demand while incorporating price as a key factor, enabling retailers to evaluate how different pricing scenarios might affect sales. This approach offers valuable insights for strategic pricing decisions.
This tutorial demonstrates how to use TimeGPT for scenario analysis by forecasting demand under various pricing conditions. You'll learn to incorporate price data into forecasts and compare different pricing strategies to understand their impact on consumer demand.
### What You'll Learn
* How to forecast retail demand using price as an [exogenous variable](/docs/forecasting/exogenous-variables/numeric_features)
* How to run what-if scenarios with different pricing strategies
* How to compare baseline, increased, and decreased price forecasts
* How to interpret price sensitivity in demand forecasts
## How to Forecast Sales with Pricing Scenarios
[Open in Colab](https://colab.research.google.com/github/Nixtla/nixtla/blob/main/nbs/docs/use-cases/5_what_if_pricing_scenarios_in_retail.ipynb)
### Step 1: Import required packages
Import the packages needed for this tutorial:
```python theme={null}
import pandas as pd
import os
from nixtla import NixtlaClient
```
Initialize the Nixtla client:
```python theme={null}
nixtla_client = NixtlaClient(
api_key='my_api_key_provided_by_nixtla'
)
```
### Step 2: Load the M5 dataset
Let's look at an example of predicting product sales from the [M5 dataset](https://nixtlaverse.nixtla.io/datasetsforecast/m5.html). The M5 dataset contains daily product demand (sales) for 10 retail stores in the US.
First, we load the data using `datasetsforecast`. This returns three dataframes:
* `Y_df`, containing the sales (`y` column) for each unique product (`unique_id` column) at every timestamp (`ds` column).
* `X_df`, containing additional relevant information, including `sell_price`, for each unique product (`unique_id` column) at every timestamp (`ds` column).
* `S_df`, which is not used in this tutorial.
```python theme={null}
from datasetsforecast.m5 import M5
Y_df, X_df, S_df = M5.load(directory=os.getcwd())
Y_df.head(10)
```
| unique\_id | ds | y |
| -------------------- | ---------- | --- |
| FOODS\_1\_001\_CA\_1 | 2011-01-29 | 3.0 |
| FOODS\_1\_001\_CA\_1 | 2011-01-30 | 0.0 |
| FOODS\_1\_001\_CA\_1 | 2011-01-31 | 0.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-01 | 1.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-02 | 4.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-03 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-04 | 0.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-05 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-06 | 0.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-07 | 0.0 |
For this example, we keep only the `sell_price` column from the additional information. This column holds the product's selling price, and we expect demand to change when the selling price changes.
```python theme={null}
X_df = X_df[['unique_id', 'ds', 'sell_price']]
X_df.head(10)
```
| unique\_id | ds | sell\_price |
| -------------------- | ---------- | ----------- |
| FOODS\_1\_001\_CA\_1 | 2011-01-29 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-01-30 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-01-31 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-01 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-02 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-03 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-04 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-05 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-06 | 2.0 |
| FOODS\_1\_001\_CA\_1 | 2011-02-07 | 2.0 |
### Step 3: Forecast demand using price as an exogenous variable
In this example, we forecast for a single product (`FOODS_1_129_`) across all 10 stores. This product exhibits frequent price changes, making it ideal for modeling price effects on demand. Learn more about using [exogenous variables in TimeGPT](/docs/forecasting/exogenous-variables/numeric_features).
```python theme={null}
products = [
'FOODS_1_129_CA_1', 'FOODS_1_129_CA_2', 'FOODS_1_129_CA_3', 'FOODS_1_129_CA_4',
'FOODS_1_129_TX_1', 'FOODS_1_129_TX_2', 'FOODS_1_129_TX_3',
'FOODS_1_129_WI_1', 'FOODS_1_129_WI_2', 'FOODS_1_129_WI_3'
]
Y_df_product = Y_df.query('unique_id in @products')
X_df_product = X_df.query('unique_id in @products')
```
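The ten IDs follow M5's `<item>_<state>_<store number>` naming pattern, so the list can also be generated instead of typed. A minimal sketch, assuming the store counts per state shown in the list above (four in CA, three each in TX and WI):

```python
# Store counts per state, taken from the hand-written list above
stores = {"CA": 4, "TX": 3, "WI": 3}
item = "FOODS_1_129"

# Build IDs in the same order as the hand-written list: CA_1..CA_4, TX_1..TX_3, WI_1..WI_3
products = [
    f"{item}_{state}_{n}"
    for state, n_stores in stores.items()
    for n in range(1, n_stores + 1)
]
print(len(products), products[0], products[-1])  # 10 FOODS_1_129_CA_1 FOODS_1_129_WI_3
```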
Merge the sales (`y`) and price (`sell_price`) data into one DataFrame:
```python theme={null}
df = Y_df_product.merge(X_df_product)
df.head(10)
```
| unique\_id | ds | y | sell\_price |
| -------------------- | ---------- | --- | ----------- |
| FOODS\_1\_129\_CA\_1 | 2011-02-01 | 1.0 | 6.22 |
| FOODS\_1\_129\_CA\_1 | 2011-02-02 | 0.0 | 6.22 |
| FOODS\_1\_129\_CA\_1 | 2011-02-03 | 0.0 | 6.22 |
| FOODS\_1\_129\_CA\_1 | 2011-02-04 | 0.0 | 6.22 |
| FOODS\_1\_129\_CA\_1 | 2011-02-05 | 1.0 | 6.22 |
| FOODS\_1\_129\_CA\_1 | 2011-02-06 | 0.0 | 6.22 |
| FOODS\_1\_129\_CA\_1 | 2011-02-07 | 0.0 | 6.22 |
| FOODS\_1\_129\_CA\_1 | 2011-02-08 | 0.0 | 6.22 |
| FOODS\_1\_129\_CA\_1 | 2011-02-09 | 0.0 | 6.22 |
| FOODS\_1\_129\_CA\_1 | 2011-02-10 | 3.0 | 6.22 |
Let's look at how demand (our target `y`) for these products evolved over the last year of data.
```python theme={null}
nixtla_client.plot(df, unique_ids=products, max_insample_length=365)
```

In the California stores (IDs containing `CA_`), the product sold intermittently, whereas in the other regions (TX and WI) sales were less intermittent. Note that the plot shows only 8 of the 10 stores.
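To put a number on how intermittent a series is, two simple measures are the share of zero-sales days and the average demand interval (ADI), here computed with one common definition: the number of periods divided by the number of periods with non-zero demand. The sketch below uses a hypothetical `intermittency_stats` helper on the first ten days of `FOODS_1_129_CA_1` from the merged table above; in practice you would apply it to each series' full history.

```python
def intermittency_stats(sales: list[float]) -> tuple[float, float]:
    """Return (share of zero-sales days, average demand interval) for one series.

    ADI = number of periods / number of periods with non-zero demand;
    values well above 1 indicate intermittent demand.
    """
    n_nonzero = sum(1 for value in sales if value > 0)
    zero_share = 1 - n_nonzero / len(sales)
    adi = len(sales) / n_nonzero if n_nonzero else float("inf")
    return zero_share, adi


# First ten days of FOODS_1_129_CA_1 (2011-02-01 to 2011-02-10) from the merged table
ca_1 = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 3.0]
zero_share, adi = intermittency_stats(ca_1)
print(f"zero share = {zero_share:.0%}, ADI = {adi:.2f}")
```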
Next, we look at the `sell_price` of these products across the entire data available.
```python theme={null}
nixtla_client.plot(df, unique_ids=products, target_col='sell_price')
```

We find that there have been relatively few price changes (about 20 in total) over the period 2011 to 2016.
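Counting price changes comes down to comparing each day's price with the previous day's. A minimal sketch on a toy price sequence (hypothetical values, not the M5 data); `count_price_changes` is an illustrative helper that you would apply to each `unique_id`'s `sell_price` values in date order.

```python
def count_price_changes(prices: list[float]) -> int:
    """Count the days on which the price differs from the previous day's price."""
    return sum(1 for prev, curr in zip(prices, prices[1:]) if curr != prev)


# Toy price history (hypothetical values, not taken from M5)
prices = [6.22, 6.22, 6.22, 5.74, 5.74, 6.22, 6.22, 7.23]
print(count_price_changes(prices))  # 3
```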
Let's turn to the forecasting task: we forecast the last 28 days in the dataset.
To use `sell_price` as an exogenous variable, TimeGPT needs its values for the forecast period. We therefore create a dataframe of future values containing `unique_id`, the timestamp `ds`, and `sell_price`.
```python theme={null}
future_ex_vars_df = df.drop(columns = ['y'])
future_ex_vars_df = future_ex_vars_df.query("ds >= '2016-05-23'")
future_ex_vars_df.head(10)
```
| unique\_id | ds | sell\_price |
| -------------------- | ---------- | ----------- |
| FOODS\_1\_129\_CA\_1 | 2016-05-23 | 5.74 |
| FOODS\_1\_129\_CA\_1 | 2016-05-24 | 5.74 |
| FOODS\_1\_129\_CA\_1 | 2016-05-25 | 5.74 |
| FOODS\_1\_129\_CA\_1 | 2016-05-26 | 5.74 |
| FOODS\_1\_129\_CA\_1 | 2016-05-27 | 5.74 |
| FOODS\_1\_129\_CA\_1 | 2016-05-28 | 5.74 |
| FOODS\_1\_129\_CA\_1 | 2016-05-29 | 5.74 |
| FOODS\_1\_129\_CA\_1 | 2016-05-30 | 5.74 |
| FOODS\_1\_129\_CA\_1 | 2016-05-31 | 5.74 |
| FOODS\_1\_129\_CA\_1 | 2016-06-01 | 5.74 |
Next, we restrict the input dataframe to the dates before the 28 forecast days:
```python theme={null}
df_train = df.query("ds < '2016-05-23'")
df_train.tail(10)
```
| unique\_id | ds | y | sell\_price |
| -------------------- | ---------- | --- | ----------- |
| FOODS\_1\_129\_WI\_3 | 2016-05-13 | 3.0 | 7.23 |
| FOODS\_1\_129\_WI\_3 | 2016-05-14 | 1.0 | 7.23 |
| FOODS\_1\_129\_WI\_3 | 2016-05-15 | 2.0 | 7.23 |
| FOODS\_1\_129\_WI\_3 | 2016-05-16 | 3.0 | 7.23 |
| FOODS\_1\_129\_WI\_3 | 2016-05-17 | 1.0 | 7.23 |
| FOODS\_1\_129\_WI\_3 | 2016-05-18 | 2.0 | 7.23 |
| FOODS\_1\_129\_WI\_3 | 2016-05-19 | 3.0 | 7.23 |
| FOODS\_1\_129\_WI\_3 | 2016-05-20 | 1.0 | 7.23 |
| FOODS\_1\_129\_WI\_3 | 2016-05-21 | 0.0 | 7.23 |
| FOODS\_1\_129\_WI\_3 | 2016-05-22 | 0.0 | 7.23 |
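TimeGPT expects future exogenous values for every step of the horizon in every series, so a quick sanity check before calling `forecast` can catch misaligned dates early. A minimal stdlib sketch for one daily series, using the dates from the tables above (training ends on 2016-05-22 and the 28-day horizon starts on 2016-05-23); `check_future_exog` is a hypothetical helper, and with pandas you would apply the same comparison per `unique_id`.

```python
from datetime import date, timedelta


def check_future_exog(train_end: date, future_dates: list[date], h: int) -> bool:
    """Return True if future_dates are exactly the h consecutive days after train_end."""
    expected = [train_end + timedelta(days=step) for step in range(1, h + 1)]
    return sorted(future_dates) == expected


train_end = date(2016, 5, 22)  # last date in df_train
future = [date(2016, 5, 23) + timedelta(days=i) for i in range(28)]
print(check_future_exog(train_end, future, h=28))       # True
print(check_future_exog(train_end, future[:-1], h=28))  # False: one day missing
print(future[-1])                                       # last day of the horizon: 2016-06-19
```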
Now, we can generate forecasts using TimeGPT (28 days ahead):
```python theme={null}
timegpt_fcst_df = nixtla_client.forecast(
df=df_train,
X_df=future_ex_vars_df,
h=28
)
timegpt_fcst_df.head()
```
| unique\_id | ds | TimeGPT |
| -------------------- | ---------- | -------- |
| FOODS\_1\_129\_CA\_1 | 2016-05-23 | 0.875594 |
| FOODS\_1\_129\_CA\_1 | 2016-05-24 | 0.777731 |
| FOODS\_1\_129\_CA\_1 | 2016-05-25 | 0.786871 |
| FOODS\_1\_129\_CA\_1 | 2016-05-26 | 0.828223 |
| FOODS\_1\_129\_CA\_1 | 2016-05-27 | 0.791228 |
We plot the forecast, the actuals and the last 28 days before the forecast period:
```python theme={null}
nixtla_client.plot(
df[['unique_id', 'ds', 'y']],
timegpt_fcst_df,
max_insample_length=56
)
```

### Step 4: What-If Scenario Forecasting with Price Changes
What happens when we change the price of the products in our forecast period? Let's see how our forecast changes when we increase and decrease the `sell_price` by 5%.
```python theme={null}
price_change = 0.05
future_ex_vars_df_plus = future_ex_vars_df.copy()
future_ex_vars_df_plus["sell_price"] *= (1 + price_change)
future_ex_vars_df_minus = future_ex_vars_df.copy()
future_ex_vars_df_minus["sell_price"] *= (1 - price_change)
```
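Running more than two scenarios is a loop over relative price changes. The stdlib sketch below uses a hypothetical `price_scenarios` helper on a plain list of prices to show the arithmetic and the label convention used in the next steps (`plus_5%`, `minus_5%`). With pandas, each scenario would instead be a copy of `future_ex_vars_df` with `sell_price` scaled as above, followed by one `forecast` call per scenario.

```python
def price_scenarios(prices: list[float], changes: list[float]) -> dict[str, list[float]]:
    """Scale a price path by each relative change, mirroring `sell_price *= (1 + change)`."""
    scenarios = {}
    for change in changes:
        label = f"{'plus' if change >= 0 else 'minus'}_{abs(change) * 100:.0f}%"
        # Round to 4 decimals to hide floating-point noise in the printed prices
        scenarios[label] = [round(p * (1 + change), 4) for p in prices]
    return scenarios


baseline = [5.74, 5.74, 5.74]  # first forecast-period prices of FOODS_1_129_CA_1
scenarios = price_scenarios(baseline, [0.05, -0.05, 0.10])
for label, path in scenarios.items():
    print(label, path)
```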
Let's create a new set of forecasts with TimeGPT.
```python theme={null}
# Pass X_df by keyword: the second positional argument of forecast() is h
timegpt_fcst_df_plus = nixtla_client.forecast(df=df_train, X_df=future_ex_vars_df_plus, h=28)
timegpt_fcst_df_minus = nixtla_client.forecast(df=df_train, X_df=future_ex_vars_df_minus, h=28)
```
Rename and combine the scenario forecasts:
```python theme={null}
timegpt_fcst_df_plus = timegpt_fcst_df_plus.rename(columns={'TimeGPT':f'TimeGPT-sell_price_plus_{price_change * 100:.0f}%'})
timegpt_fcst_df_minus = timegpt_fcst_df_minus.rename(columns={'TimeGPT':f'TimeGPT-sell_price_minus_{price_change * 100:.0f}%'})
timegpt_fcst_df = pd.concat([timegpt_fcst_df,
timegpt_fcst_df_plus[f'TimeGPT-sell_price_plus_{price_change * 100:.0f}%'],
timegpt_fcst_df_minus[f'TimeGPT-sell_price_minus_{price_change * 100:.0f}%']], axis=1)
timegpt_fcst_df.head(10)
```
| unique\_id | ds | TimeGPT | TimeGPT-sell\_price\_plus\_5% | TimeGPT-sell\_price\_minus\_5% |
| -------------------- | ---------- | -------- | ----------------------------- | ------------------------------ |
| FOODS\_1\_129\_CA\_1 | 2016-05-23 | 0.875594 | 0.847006 | 1.370029 |
| FOODS\_1\_129\_CA\_1 | 2016-05-24 | 0.777731 | 0.749142 | 1.272166 |
| FOODS\_1\_129\_CA\_1 | 2016-05-25 | 0.786871 | 0.758283 | 1.281306 |
| FOODS\_1\_129\_CA\_1 | 2016-05-26 | 0.828223 | 0.799635 | 1.322658 |
| FOODS\_1\_129\_CA\_1 | 2016-05-27 | 0.791228 | 0.762640 | 1.285663 |
| FOODS\_1\_129\_CA\_1 | 2016-05-28 | 0.819133 | 0.790545 | 1.313568 |
| FOODS\_1\_129\_CA\_1 | 2016-05-29 | 0.839992 | 0.811404 | 1.334427 |
| FOODS\_1\_129\_CA\_1 | 2016-05-30 | 0.843070 | 0.814481 | 1.337505 |
| FOODS\_1\_129\_CA\_1 | 2016-05-31 | 0.833089 | 0.804500 | 1.327524 |
| FOODS\_1\_129\_CA\_1 | 2016-06-01 | 0.855032 | 0.826443 | 1.349467 |
As expected, forecast demand increases when we reduce the price and decreases when we raise it.
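To read the scenario table in economic terms, you can convert each scenario into an implied price elasticity: the percentage change in forecast demand divided by the percentage change in price. The sketch below is a rough, illustrative calculation on the first forecast row of `FOODS_1_129_CA_1` (2016-05-23), not a formal elasticity estimate; `implied_elasticity` is a hypothetical helper.

```python
def implied_elasticity(base_demand: float, scenario_demand: float, price_change: float) -> float:
    """Percentage change in forecast demand divided by percentage change in price."""
    demand_change = (scenario_demand - base_demand) / base_demand
    return demand_change / price_change


# First forecast row of FOODS_1_129_CA_1 (2016-05-23) from the table above
base = 0.875594
e_plus = implied_elasticity(base, 0.847006, +0.05)   # price raised by 5%
e_minus = implied_elasticity(base, 1.370029, -0.05)  # price cut by 5%
print(round(e_plus, 2), round(e_minus, 2))  # -0.65 -11.29
```

For this row, the 5% price cut implies an elasticity of about -11.3 while the 5% increase implies about -0.65, so the model's response to price is far from symmetric here.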
Finally, let's plot the forecasts for our different pricing scenarios, showing how TimeGPT forecasts a different demand when the price of a set of products is changed.
```python theme={null}
nixtla_client.plot(
df[['unique_id', 'ds', 'y']],
timegpt_fcst_df,
max_insample_length=56
)
```

The plots show that, for some products and periods, the discount noticeably increases expected demand, while for other products and periods the price change has a smaller effect.
## Conclusion
What-if forecasting with TimeGPT enables data-driven pricing decisions by:
* Modeling demand sensitivity to price changes
* Comparing multiple pricing scenarios simultaneously
* Incorporating exogenous variables for realistic predictions
This scenario analysis helps retailers compare pricing strategies before implementing them and understand how sensitive demand is to price (demand elasticity).
### Next Steps
* Explore [intermittent demand forecasting](/docs/use_cases/forecasting_intermittent_demand) with TimeGPT
* Learn about [fine-tuning models](/docs/forecasting/fine-tuning/steps) for better accuracy
* Understand [cross-validation](/docs/forecasting/evaluation/cross_validation) for model evaluation
* Scale forecasts with [distributed computing](/docs/forecasting/forecasting-at-scale/computing_at_scale)
### Important Considerations
* This method assumes that historical demand and price behavior are predictive of future demand, and it omits other factors that affect demand. To include these factors, add exogenous variables that give the model more context about what drives demand.
* This method is sensitive to unmodeled events that affect demand, such as sudden market shifts. To account for them, add exogenous variables that flag such shifts, provided similar shifts have also been observed in the historical data.
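An exogenous variable for sudden shifts can be as simple as a 0/1 indicator aligned with `ds`. A minimal sketch with a hypothetical event date and a hypothetical `event_flag` helper; in this tutorial's setup, such a column would sit next to `sell_price` in both the training dataframe and the future values dataframe, which requires knowing or assuming whether the event occurs during the forecast horizon.

```python
from datetime import date, timedelta


def event_flag(dates: list[date], event_dates: set[date]) -> list[int]:
    """Return a 0/1 indicator that is 1 on dates where a known event occurred."""
    return [1 if d in event_dates else 0 for d in dates]


days = [date(2016, 5, 20) + timedelta(days=i) for i in range(5)]
# Hypothetical event (e.g. a competitor promotion) on 2016-05-22
flags = event_flag(days, {date(2016, 5, 22)})
print(flags)  # [0, 0, 1, 0, 0]
```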