Google Search Appliance: External Metadata Indexing Guide
2. Create a <record> element for each primary document. In the <metadata> element, insert one or
more <meta> elements, as shown in the following example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
  <header>
    <datasource>sample</datasource>
    <feedtype>full</feedtype>
  </header>
  <group>
    <record url="http://www.corp.enterprise.com/hello01"
            mimetype="text/plain" last-modified="Tue, 17 Feb 2009 12:45:26 GMT">
      <metadata>
        <meta name="author" content="Jones"/>
        <meta name="project" content="hello01"/>
        <meta name="department" content="engineering"/>
      </metadata>
3. Insert the contents of the primary document into the <content> element of the record, as shown in
the following example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
  <header>
    <datasource>sample</datasource>
    <feedtype>full</feedtype>
  </header>
  <group>
    <record url="http://www.corp.enterprise.com/hello01"
            mimetype="text/plain" last-modified="Tue, 17 Feb 2009 12:45:26 GMT">
      <metadata>
        <meta name="author" content="Jones"/>
        <meta name="project" content="hello01"/>
        <meta name="department" content="engineering"/>
      </metadata>
      <content> This is hello01 content. </content>
    </record>
  </group>
</gsafeed>
Note: The previous example uses a full feed (<feedtype>full</feedtype>). A full feed replaces the
previous contents of its data source: documents that were fed earlier but are missing from the new
feed are removed from the index within six hours. For more information, see “Removing Feed
Content From the Index” in the Feeds Protocol Developer’s Guide. To keep previously fed documents
in the index, use an incremental feed by replacing the <feedtype>full</feedtype> element with the
<feedtype>incremental</feedtype> element.
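As a sketch of the change the note describes, the sample feed could be rewritten as an incremental feed like the one below. Only the <feedtype> value differs from the full-feed example. The action="add" attribute is included for explicitness and is an assumption about how you want the record handled, not something the example above requires.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
  <header>
    <datasource>sample</datasource>
    <!-- incremental: records not listed in this feed stay in the index -->
    <feedtype>incremental</feedtype>
  </header>
  <group>
    <!-- action="add" is an illustrative assumption; it adds or updates this record -->
    <record url="http://www.corp.enterprise.com/hello01" action="add"
            mimetype="text/plain" last-modified="Tue, 17 Feb 2009 12:45:26 GMT">
      <metadata>
        <meta name="author" content="Jones"/>
        <meta name="project" content="hello01"/>
        <meta name="department" content="engineering"/>
      </metadata>
      <content> This is hello01 content. </content>
    </record>
  </group>
</gsafeed>
```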
If the content is text-based, it can be inserted directly into the feed XML file. If it is non-text
content (.pdf, .doc, and other file types), you need to base64-encode the content and set the
encoding attribute of the record’s <content> element to encoding="base64binary", as described in
the Feeds Protocol Developer’s Guide.
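A minimal sketch of a base64-encoded record follows. In the feeds DTD, the encoding attribute belongs to the <content> element. The payload aGVsbG8gd29ybGQ= is simply the base64 encoding of the text "hello world", used here as a stand-in. For a real .pdf or .doc file you would set the matching mimetype (for example, application/pdf) and paste the base64 encoding of the file's bytes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
  <header>
    <datasource>sample</datasource>
    <feedtype>incremental</feedtype>
  </header>
  <group>
    <record url="http://www.corp.enterprise.com/hello01" mimetype="text/plain">
      <metadata>
        <meta name="author" content="Jones"/>
      </metadata>
      <!-- Stand-in payload: base64 of "hello world"; replace with the encoded file bytes -->
      <content encoding="base64binary">aGVsbG8gd29ybGQ=</content>
    </record>
  </group>
</gsafeed>
```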
When you update a content feed for a primary document, the search appliance does not automatically
update the associated metadata feed in its index unless the corresponding record has a <metadata>
section. Similarly, if you update a metadata feed, the search appliance does not automatically
update the associated primary document feed in its index. To update both a content feed and a
metadata feed, explicitly push both feeds to the index.
To update the metadata of a URL by using a metadata-and-url feed, the search appliance must be able
to crawl the URL specified in the <record> tag. Therefore, this process cannot be used for content
that was indexed through a content feed (for example, by using the File System connector).
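For comparison, a metadata-and-url feed for a crawlable document might look like the following sketch. It has no <content> element, because the search appliance fetches the document by crawling the URL. The URL and metadata values are reused from the earlier example for illustration and assume that the search appliance can reach that URL.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">
<gsafeed>
  <header>
    <datasource>sample</datasource>
    <feedtype>metadata-and-url</feedtype>
  </header>
  <group>
    <!-- Assumes this URL is crawlable by the search appliance -->
    <record url="http://www.corp.enterprise.com/hello01" mimetype="text/plain">
      <metadata>
        <meta name="author" content="Jones"/>
        <meta name="project" content="hello01"/>
        <meta name="department" content="engineering"/>
      </metadata>
    </record>
  </group>
</gsafeed>
```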